diff --git a/cv/distiller/CWD/pytorch/README.md b/cv/distiller/CWD/pytorch/README.md new file mode 100644 index 0000000000000000000000000000000000000000..4f02782d0972b2e9239c819efffd68181b1cccbc --- /dev/null +++ b/cv/distiller/CWD/pytorch/README.md @@ -0,0 +1,88 @@ + +# CWD + +> [Channel-wise Knowledge Distillation for Dense Prediction](https://arxiv.org/abs/2011.13256) + + + +## Abstract + +Knowledge distillation (KD) has been proven to be a simple and effective tool for training compact models. Almost all KD variants for dense prediction tasks align the student and teacher networks' feature maps in the spatial domain, typically by minimizing point-wise and/or pair-wise discrepancy. Observing that in semantic segmentation, some layers' feature activations of each channel tend to encode saliency of scene categories (analogue to class activation mapping), we propose to align features channel-wise between the student and teacher networks. To this end, we first transform the feature map of each channel into a probability map using softmax normalization, and then minimize the Kullback-Leibler (KL) divergence of the corresponding channels of the two networks. By doing so, our method focuses on mimicking the soft distributions of channels between networks. In particular, the KL divergence enables learning to pay more attention to the most salient regions of the channel-wise maps, presumably corresponding to the most useful signals for semantic segmentation. Experiments demonstrate that our channel-wise distillation outperforms almost all existing spatial distillation methods for semantic segmentation considerably, and requires less computational cost during training. We consistently achieve superior performance on three benchmarks with various network structures. 
+ +## Environment + +## install libGL +yum install mesa-libGL + +## install zlib +wget http://www.zlib.net/fossils/zlib-1.2.9.tar.gz +tar xvf zlib-1.2.9.tar.gz +cd zlib-1.2.9/ +./configure && make install +cd .. +rm -rf zlib-1.2.9.tar.gz zlib-1.2.9/ + + +``` +pip3 install cityscapesscripts addict opencv-python +cd mmcv +bash clean_mmcv.sh +bash build_mmcv.sh +bash install_mmcv.sh +cd ../mmrazor +pip3 install -r requirements.txt +pip3 install mmcls==v1.0.0rc6 +pip3 install mmsegmentation==v1.0.0 +pip3 install mmengine==0.7.3 +python3 setup.py develop +``` + +## Cityscapes +Cityscapes 官方网站可以下载 [Cityscapes]() 数据集,按照官网要求注册并登陆后,数据可以在[这里]()找到。 + +``` +mkdir data +cd data +# download data +``` + +按照惯例,**labelTrainIds.png 用于 cityscapes 训练。 我们提供了一个基于 cityscapesscripts 的脚本用于生成 **labelTrainIds.png。 +```shell + ├── data + │ ├── cityscapes + │ │ ├── leftImg8bit + │ │ │ ├── train + │ │ │ ├── val + │ │ ├── gtFine + │ │ │ ├── train + │ │ │ ├── val + ``` +## --nproc 表示 8 个转换进程,也可以省略。 +``` +cd .. +python3 tools/dataset_converters/cityscapes.py data/cityscapes --nproc 8 +``` +## Training + +### On single GPU + +```bash +python3 tools/train.py configs/distill/mmseg/cwd/cwd_logits_pspnet_r101-d8_pspnet_r18-d8_4xb2-80k_cityscapes-512x1024.py +``` + +### Multiple GPUs on one machine + +```bash +bash tools/dist_train.sh configs/distill/mmseg/cwd/cwd_logits_pspnet_r101-d8_pspnet_r18-d8_4xb2-80k_cityscapes-512x1024.py 8 +``` + + + + +## Segmentation + +| model | GPU | FP32 | +|-------------------| ----------- | ------------------------------------ | +| pspnet_r18(student) | 8 cards | mIoU = 75.32 | + + diff --git a/cv/distiller/CWD/pytorch/mmcv/.dockerignore b/cv/distiller/CWD/pytorch/mmcv/.dockerignore new file mode 100644 index 0000000000000000000000000000000000000000..8c22f226d3e2d8a625515290691d2cfc6ed87f2e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/.dockerignore @@ -0,0 +1,6 @@ +.git +.gitignore +*.egg-info +.eggs/ +.mypy-cache +pip-wheel-metadata diff --git 
a/cv/distiller/CWD/pytorch/mmcv/.pre-commit-config.yaml b/cv/distiller/CWD/pytorch/mmcv/.pre-commit-config.yaml new file mode 100644 index 0000000000000000000000000000000000000000..19e9f8d4813d7dafc5d7659b0e95cca161b966ff --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/.pre-commit-config.yaml @@ -0,0 +1,54 @@ +exclude: ^tests/data/ +repos: + - repo: https://gitlab.com/pycqa/flake8.git + rev: 3.8.3 + hooks: + - id: flake8 + - repo: https://github.com/asottile/seed-isort-config + rev: v2.2.0 + hooks: + - id: seed-isort-config + - repo: https://github.com/timothycrosley/isort + rev: 4.3.21 + hooks: + - id: isort + - repo: https://github.com/pre-commit/mirrors-yapf + rev: v0.30.0 + hooks: + - id: yapf + - repo: https://github.com/pre-commit/pre-commit-hooks + rev: v3.1.0 + hooks: + - id: trailing-whitespace + - id: check-yaml + - id: end-of-file-fixer + - id: requirements-txt-fixer + - id: double-quote-string-fixer + - id: check-merge-conflict + - id: fix-encoding-pragma + args: ["--remove"] + - id: mixed-line-ending + args: ["--fix=lf"] + - repo: https://github.com/jumanjihouse/pre-commit-hooks + rev: 2.1.4 + hooks: + - id: markdownlint + args: ["-r", "~MD002,~MD013,~MD029,~MD033,~MD034", + "-t", "allow_different_nesting"] + - repo: https://github.com/codespell-project/codespell + rev: v2.1.0 + hooks: + - id: codespell + - repo: https://github.com/myint/docformatter + rev: v1.3.1 + hooks: + - id: docformatter + args: ["--in-place", "--wrap-descriptions", "79"] + # - repo: local + # hooks: + # - id: clang-format + # name: clang-format + # description: Format files with ClangFormat + # entry: clang-format -style=google -i + # language: system + # files: \.(c|cc|cxx|cpp|cu|h|hpp|hxx|cuh|proto)$ diff --git a/cv/distiller/CWD/pytorch/mmcv/.readthedocs.yml b/cv/distiller/CWD/pytorch/mmcv/.readthedocs.yml new file mode 100644 index 0000000000000000000000000000000000000000..7d5f1c2060a64e5cf9c2bec433cd24532a283164 --- /dev/null +++ 
b/cv/distiller/CWD/pytorch/mmcv/.readthedocs.yml @@ -0,0 +1,9 @@ +version: 2 + +formats: all + +python: + version: 3.7 + install: + - requirements: requirements/runtime.txt + - requirements: requirements/docs.txt diff --git a/cv/distiller/CWD/pytorch/mmcv/CITATION.cff b/cv/distiller/CWD/pytorch/mmcv/CITATION.cff new file mode 100644 index 0000000000000000000000000000000000000000..786117aac3e063efc18ad1b55e163d570a09e379 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/CITATION.cff @@ -0,0 +1,8 @@ +cff-version: 1.2.0 +message: "If you use this software, please cite it as below." +authors: + - name: "MMCV Contributors" +title: "OpenMMLab Computer Vision Foundation" +date-released: 2018-08-22 +url: "https://github.com/open-mmlab/mmcv" +license: Apache-2.0 diff --git a/cv/distiller/CWD/pytorch/mmcv/CONTRIBUTING.md b/cv/distiller/CWD/pytorch/mmcv/CONTRIBUTING.md new file mode 100644 index 0000000000000000000000000000000000000000..184a6bd2c6dacbba0866a91ca0226854b8d06f01 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/CONTRIBUTING.md @@ -0,0 +1,258 @@ +## Contributing to OpenMMLab + +Welcome to the MMCV community, we are committed to building a cutting-edge computer vision foundational library and all kinds of contributions are welcomed, including but not limited to + +**Fix bug** + +You can directly post a Pull Request to fix typos in code or documents + +The steps to fix the bug of code implementation are as follows. + +1. If the modification involves significant changes, you should create an issue first and describe the error information and how to trigger the bug. Other developers will discuss with you and propose a proper solution. + +2. Post a pull request after fixing the bug and adding corresponding unit test. + +**New Feature or Enhancement** + +1. If the modification involves significant changes, you should create an issue to discuss with our developers to propose a proper design. +2. 
Post a Pull Request after implementing the new feature or enhancement and add corresponding unit test. + +**Document** + +You can directly post a pull request to fix documents. If you want to add a document, you should first create an issue to check if it is reasonable. + +### Pull Request Workflow + +If you're not familiar with Pull Request, don't worry! The following guidance will tell you how to create a Pull Request step by step. If you want to dive into the develop mode of Pull Request, you can refer to the [official documents](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests) + +#### 1. Fork and clone + +If you are posting a pull request for the first time, you should fork the OpenMMLab repositories by clicking the **Fork** button in the top right corner of the GitHub page, and the forked repositories will appear under your GitHub profile. + + + +Then, you can clone the repositories to local: + +```shell +git clone git@github.com:{username}/mmcv.git +``` + +After that, you should add the official repository as the upstream repository + +```bash +git remote add upstream git@github.com:open-mmlab/mmcv +``` + +Check whether remote repository has been added successfully by `git remote -v` + +```bash +origin git@github.com:{username}/mmcv.git (fetch) +origin git@github.com:{username}/mmcv.git (push) +upstream git@github.com:open-mmlab/mmcv (fetch) +upstream git@github.com:open-mmlab/mmcv (push) +``` + +> Here's a brief introduction to origin and upstream. When we use "git clone", we create an "origin" remote by default, which points to the repository cloned from. As for "upstream", we add it ourselves to point to the target repository. Of course, if you don't like the name "upstream", you could name it as you wish. Usually, we'll push the code to "origin". 
If the pushed code conflicts with the latest code in official("upstream"), we should pull the latest code from upstream to resolve the conflicts, and then push to "origin" again. The posted Pull Request will be updated automatically. + +#### 2. Configure pre-commit + +You should configure [pre-commit](https://pre-commit.com/#intro) in the local development environment to make sure the code style matches that of OpenMMLab. **Note**: The following code should be executed under the MMCV directory. + +```shell +pip install -U pre-commit +pre-commit install +``` + +Check that pre-commit is configured successfully, and install the hooks defined in `.pre-commit-config.yaml`. + +```shell +pre-commit run --all-files +``` + + + + + +If the installation process is interrupted, you can repeatedly run `pre-commit run ... ` to continue the installation. + +If the code does not conform to the code style specification, pre-commit will raise a warning and fixes some of the errors automatically. + + + +If we want to commit our code bypassing the pre-commit hook, we can use the `--no-verify` option(**only for temporarily commit**). + +```shell +git commit -m "xxx" --no-verify +``` + +#### 3. Create a development branch + +After configuring the pre-commit, we should create a branch based on the master branch to develop the new feature or fix the bug. The proposed branch name is `username/pr_name` + +```shell +git checkout -b yhc/refactor_contributing_doc +``` + +In subsequent development, if the master branch of the local repository is behind the master branch of "upstream", we need to pull the upstream for synchronization, and then execute the above command: + +```shell +git pull upstream master +``` + +#### 4. Commit the code and pass the unit test + +- MMCV introduces mypy to do static type checking to increase the robustness of the code. Therefore, we need to add Type Hints to our code and pass the mypy check. 
If you are not familiar with Type Hints, you can refer to [this tutorial](https://docs.python.org/3/library/typing.html). + +- The committed code should pass through the unit test + + ```shell + # Pass all unit tests + pytest tests + + # Pass the unit test of runner + pytest tests/test_runner/test_runner.py + ``` + + If the unit test fails for lack of dependencies, you can install the dependencies referring to the [guidance](#unit-test) + +- If the documents are modified/added, we should check the rendering result referring to [guidance](#document-rendering) + +#### 5. Push the code to remote + +We could push the local commits to remote after passing through the check of unit test and pre-commit. You can associate the local branch with remote branch by adding `-u` option. + +```shell +git push -u origin {branch_name} +``` + +This will allow you to use the `git push` command to push code directly next time, without having to specify a branch or the remote repository. + +#### 6. Create a Pull Request + +(1) Create a pull request in GitHub's Pull request interface + + + +(2) Modify the PR description according to the guidelines so that other developers can better understand your changes + + + +Find more details about Pull Request description in [pull request guidelines](#pr-specs). + +**note** + +(a) The Pull Request description should contain the reason for the change, the content of the change, and the impact of the change, and be associated with the relevant Issue (see [documentation](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue) + +(b) If it is your first contribution, please sign the CLA + + + +(c) Check whether the Pull Request pass through the CI + + + +MMCV will run unit test for the posted Pull Request on different platforms (Linux, Window, Mac), based on different versions of Python, PyTorch, CUDA to make sure the code is correct. 
We can see the specific test information by clicking `Details` in the above image so that we can modify the code. + +(3) If the Pull Request passes the CI, then you can wait for the review from other developers. You'll modify the code based on the reviewer's comments, and repeat the steps [4](#4-commit-the-code-and-pass-the-unit-test)-[5](#5-push-the-code-to-remote) until all reviewers approve it. Then, we will merge it ASAP. + + + +#### 7. Resolve conflicts + +If your local branch conflicts with the latest master branch of "upstream", you'll need to resolve them. There are two ways to do this: + +```shell +git fetch --all --prune +git rebase upstream/master +``` + +or + +```shell +git fetch --all --prune +git merge upstream/master +``` + +If you are very good at handling conflicts, then you can use rebase to resolve conflicts, as this will keep your commit logs tidy. If you are not familiar with `rebase`, then you can use `merge` to resolve conflicts. + +### Guidance + +#### Unit test + +If you cannot run the unit test of some modules for lack of some dependencies, such as [video](https://github.com/open-mmlab/mmcv/tree/master/mmcv/video) module, you can try to install the following dependencies: + +```shell +# Linux +sudo apt-get update -y +sudo apt-get install -y libturbojpeg +sudo apt-get install -y ffmpeg + +# Windows +conda install ffmpeg +``` + +We should also make sure the committed code will not decrease the coverage of unit test, we could run the following command to check the coverage of unit test: + +```shell +python -m coverage run -m pytest /path/to/test_file +python -m coverage html +# check file in htmlcov/index.html +``` + +#### Document rendering + +If the documents are modified/added, we should check the rendering result. 
We could install the dependencies and run the following command to render the documents and check the results: + +```shell +pip install -r requirements/docs.txt +cd docs/zh_cn/ +# or docs/en +make html +# check file in ./docs/zh_cn/_build/html/index.html +``` + +### Code style + +#### Python + +We adopt [PEP8](https://www.python.org/dev/peps/pep-0008/) as the preferred code style. + +We use the following tools for linting and formatting: + +- [flake8](https://github.com/PyCQA/flake8): A wrapper around some linter tools. +- [isort](https://github.com/timothycrosley/isort): A Python utility to sort imports. +- [yapf](https://github.com/google/yapf): A formatter for Python files. +- [codespell](https://github.com/codespell-project/codespell): A Python utility to fix common misspellings in text files. +- [mdformat](https://github.com/executablebooks/mdformat): Mdformat is an opinionated Markdown formatter that can be used to enforce a consistent style in Markdown files. +- [docformatter](https://github.com/myint/docformatter): A formatter to format docstring. + +Style configurations of yapf and isort can be found in [setup.cfg](./setup.cfg). + +We use [pre-commit hook](https://pre-commit.com/) that checks and formats for `flake8`, `yapf`, `isort`, `trailing whitespaces`, `markdown files`, +fixes `end-of-files`, `double-quoted-strings`, `python-encoding-pragma`, `mixed-line-ending`, sorts `requirements.txt` automatically on every commit. +The config for a pre-commit hook is stored in [.pre-commit-config](./.pre-commit-config.yaml). + +#### C++ and CUDA + +We follow the [Google C++ Style Guide](https://google.github.io/styleguide/cppguide.html). + +### PR Specs + +1. Use [pre-commit](https://pre-commit.com) hook to avoid issues of code style + +2. One short-time branch should be matched with only one PR + +3. Accomplish a detailed change in one PR. 
Avoid large PR + + - Bad: Support Faster R-CNN + - Acceptable: Add a box head to Faster R-CNN + - Good: Add a parameter to box head to support custom conv-layer number + +4. Provide clear and significant commit message + +5. Provide clear and meaningful PR description + + - Task name should be clarified in title. The general format is: \[Prefix\] Short description of the PR (Suffix) + - Prefix: add new feature \[Feature\], fix bug \[Fix\], related to documents \[Docs\], in developing \[WIP\] (which will not be reviewed temporarily) + - Introduce main changes, results and influences on other modules in short description + - Associate related issues and pull requests with a milestone diff --git a/cv/distiller/CWD/pytorch/mmcv/CONTRIBUTING_zh-CN.md b/cv/distiller/CWD/pytorch/mmcv/CONTRIBUTING_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..52cc1ab5b2d399557647604018c494e4f93a1d24 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/CONTRIBUTING_zh-CN.md @@ -0,0 +1,274 @@ +## 贡献代码 + +欢迎加入 MMCV 社区,我们致力于打造最前沿的计算机视觉基础库,我们欢迎任何类型的贡献,包括但不限于 + +**修复错误** + +修复代码实现错误的步骤如下: + +1. 如果提交的代码改动较大,建议先提交 issue,并正确描述 issue 的现象、原因和复现方式,讨论后确认修复方案。 +2. 修复错误并补充相应的单元测试,提交拉取请求。 + +**新增功能或组件** + +1. 如果新功能或模块涉及较大的代码改动,建议先提交 issue,确认功能的必要性。 +2. 实现新增功能并添单元测试,提交拉取请求。 + +**文档补充** + +修复文档可以直接提交拉取请求 + +添加文档或将文档翻译成其他语言步骤如下 + +1. 提交 issue,确认添加文档的必要性。 +2. 添加文档,提交拉取请求。 + +### 拉取请求工作流 + +如果你对拉取请求不了解,没关系,接下来的内容将会从零开始,一步一步地指引你如何创建一个拉取请求。如果你想深入了解拉取请求的开发模式,可以参考 github [官方文档](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests) + +#### 1. 
复刻仓库 + +当你第一次提交拉取请求时,先复刻 OpenMMLab 原代码库,点击 GitHub 页面右上角的 **Fork** 按钮,复刻后的代码库将会出现在你的 GitHub 个人主页下。 + + + +将代码克隆到本地 + +```shell +git clone git@github.com:{username}/mmcv.git +``` + +添加原代码库为上游代码库 + +```bash +git remote add upstream git@github.com:open-mmlab/mmcv +``` + +检查 remote 是否添加成功,在终端输入 `git remote -v` + +```bash +origin git@github.com:{username}/mmcv.git (fetch) +origin git@github.com:{username}/mmcv.git (push) +upstream git@github.com:open-mmlab/mmcv (fetch) +upstream git@github.com:open-mmlab/mmcv (push) +``` + +> 这里对 origin 和 upstream 进行一个简单的介绍,当我们使用 git clone 来克隆代码时,会默认创建一个 origin 的 remote,它指向我们克隆的代码库地址,而 upstream 则是我们自己添加的,用来指向原始代码库地址。当然如果你不喜欢他叫 upstream,也可以自己修改,比如叫 open-mmlab。我们通常向 origin 提交代码(即 fork 下来的远程仓库),然后向 upstream 提交一个 pull request。如果提交的代码和最新的代码发生冲突,再从 upstream 拉取最新的代码,和本地分支解决冲突,再提交到 origin。 + +#### 2. 配置 pre-commit + +在本地开发环境中,我们使用 [pre-commit](https://pre-commit.com/#intro) 来检查代码风格,以确保代码风格的统一。在提交代码,需要先安装 pre-commit(需要在 MMCV 目录下执行): + +```shell +pip install -U pre-commit +pre-commit install +``` + +检查 pre-commit 是否配置成功,并安装 `.pre-commit-config.yaml` 中的钩子: + +```shell +pre-commit run --all-files +``` + + + + + +> 如果你是中国用户,由于网络原因,可能会出现安装失败的情况,这时可以使用国内源 + +> pre-commit install -c .pre-commit-config-zh-cn.yaml + +> pre-commit run --all-files -c .pre-commit-config-zh-cn.yaml + +如果安装过程被中断,可以重复执行 `pre-commit run ...` 继续安装。 + +如果提交的代码不符合代码风格规范,pre-commit 会发出警告,并自动修复部分错误。 + + + +如果我们想临时绕开 pre-commit 的检查提交一次代码,可以在 `git commit` 时加上 `--no-verify`(需要保证最后推送至远程仓库的代码能够通过 pre-commit 检查)。 + +```shell +git commit -m "xxx" --no-verify +``` + +#### 3. 创建开发分支 + +安装完 pre-commit 之后,我们需要基于 master 创建开发分支,建议的分支命名规则为 `username/pr_name`。 + +```shell +git checkout -b yhc/refactor_contributing_doc +``` + +在后续的开发中,如果本地仓库的 master 分支落后于 upstream 的 master 分支,我们需要先拉取 upstream 的代码进行同步,再执行上面的命令 + +```shell +git pull upstream master +``` + +#### 4. 
提交代码并在本地通过单元测试 + +- MMCV 引入了 mypy 来做静态类型检查,以增加代码的鲁棒性。因此我们在提交代码时,需要补充 Type Hints。具体规则可以参考[教程](https://zhuanlan.zhihu.com/p/519335398)。 + +- 提交的代码同样需要通过单元测试 + + ```shell + # 通过全量单元测试 + pytest tests + + # 我们需要保证提交的代码能够通过修改模块的单元测试,以 runner 为例 + pytest tests/test_runner/test_runner.py + ``` + + 如果你由于缺少依赖无法运行修改模块的单元测试,可以参考[指引-单元测试](#单元测试) + +- 如果修改/添加了文档,参考[指引](#文档渲染)确认文档渲染正常。 + +#### 5. 推送代码到远程 + +代码通过单元测试和 pre-commit 检查后,将代码推送到远程仓库,如果是第一次推送,可以在 `git push` 后加上 `-u` 参数以关联远程分支 + +```shell +git push -u origin {branch_name} +``` + +这样下次就可以直接使用 `git push` 命令推送代码了,而无需指定分支和远程仓库。 + +#### 6. 提交拉取请求(PR) + +(1) 在 GitHub 的 Pull request 界面创建拉取请求 + + +(2) 根据指引修改 PR 描述,以便于其他开发者更好地理解你的修改 + + + +描述规范详见[拉取请求规范](#拉取请求规范) + +  + +**注意事项** + +(a) PR 描述应该包含修改理由、修改内容以及修改后带来的影响,并关联相关 Issue(具体方式见[文档](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue)) + +(b) 如果是第一次为 OpenMMLab 做贡献,需要签署 CLA + + + +(c) 检查提交的 PR 是否通过 CI(集成测试) + + + +MMCV 会在不同的平台(Linux、Window、Mac),基于不同版本的 Python、PyTorch、CUDA 对提交的代码进行单元测试,以保证代码的正确性,如果有任何一个没有通过,我们可点击上图中的 `Details` 来查看具体的测试信息,以便于我们修改代码。 + +(3) 如果 PR 通过了 CI,那么就可以等待其他开发者的 review,并根据 reviewer 的意见,修改代码,并重复 [4](#4-提交代码并本地通过单元测试)-[5](#5-推送代码到远程) 步骤,直到 reviewer 同意合入 PR。 + + + +所有 reviewer 同意合入 PR 后,我们会尽快将 PR 合并到主分支。 + +#### 7. 
解决冲突 + +随着时间的推移,我们的代码库会不断更新,这时候,如果你的 PR 与主分支存在冲突,你需要解决冲突,解决冲突的方式有两种: + +```shell +git fetch --all --prune +git rebase upstream/master +``` + +或者 + +```shell +git fetch --all --prune +git merge upstream/master +``` + +如果你非常善于处理冲突,那么可以使用 rebase 的方式来解决冲突,因为这能够保证你的 commit log 的整洁。如果你不太熟悉 `rebase` 的使用,那么可以使用 `merge` 的方式来解决冲突。 + +### 指引 + +#### 单元测试 + +如果你无法正常执行部分模块的单元测试,例如 [video](https://github.com/open-mmlab/mmcv/tree/master/mmcv/video) 模块,可能是你的当前环境没有安装以下依赖 + +```shell +# Linux +sudo apt-get update -y +sudo apt-get install -y libturbojpeg +sudo apt-get install -y ffmpeg + +# Windows +conda install ffmpeg +``` + +在提交修复代码错误或新增特性的拉取请求时,我们应该尽可能的让单元测试覆盖所有提交的代码,计算单元测试覆盖率的方法如下 + +```shell +python -m coverage run -m pytest /path/to/test_file +python -m coverage html +# check file in htmlcov/index.html +``` + +#### 文档渲染 + +在提交修复代码错误或新增特性的拉取请求时,可能会需要修改/新增模块的 docstring。我们需要确认渲染后的文档样式是正确的。 +本地生成渲染后的文档的方法如下 + +```shell +pip install -r requirements/docs.txt +cd docs/zh_cn/ +# or docs/en +make html +# check file in ./docs/zh_cn/_build/html/index.html +``` + +### 代码风格 + +#### Python + +[PEP8](https://www.python.org/dev/peps/pep-0008/) 作为 OpenMMLab 算法库首选的代码规范,我们使用以下工具检查和格式化代码 + +- [flake8](https://github.com/PyCQA/flake8): Python 官方发布的代码规范检查工具,是多个检查工具的封装 +- [isort](https://github.com/timothycrosley/isort): 自动调整模块导入顺序的工具 +- [yapf](https://github.com/google/yapf): Google 发布的代码规范检查工具 +- [codespell](https://github.com/codespell-project/codespell): 检查单词拼写是否有误 +- [mdformat](https://github.com/executablebooks/mdformat): 检查 markdown 文件的工具 +- [docformatter](https://github.com/myint/docformatter): 格式化 docstring 的工具 + +yapf 和 isort 的配置可以在 [setup.cfg](./setup.cfg) 找到 + +通过配置 [pre-commit hook](https://pre-commit.com/) ,我们可以在提交代码时自动检查和格式化 `flake8`、`yapf`、`isort`、`trailing whitespaces`、`markdown files`, +修复 `end-of-files`、`float-quoted-strings`、`python-encoding-pragma`、`mixed-line-ending`,调整 `requirments.txt` 的包顺序。 +pre-commit 钩子的配置可以在 [.pre-commit-config](./.pre-commit-config.yaml) 找到。 + 
+pre-commit 具体的安装使用方式见[拉取请求](#2-配置-pre-commit)。 + +更具体的规范请参考 [OpenMMLab 代码规范](code_style.md)。 + +#### C++ and CUDA + +C++ 和 CUDA 的代码规范遵从 [Google C++ Style Guide](https://google.github.io/styleguide/cppguide.html) + +### 拉取请求规范 + +1. 使用 [pre-commit hook](https://pre-commit.com),尽量减少代码风格相关问题 + +2. 一个`拉取请求`对应一个短期分支 + +3. 粒度要细,一个`拉取请求`只做一件事情,避免超大的`拉取请求` + + - Bad:实现 Faster R-CNN + - Acceptable:给 Faster R-CNN 添加一个 box head + - Good:给 box head 增加一个参数来支持自定义的 conv 层数 + +4. 每次 Commit 时需要提供清晰且有意义 commit 信息 + +5. 提供清晰且有意义的`拉取请求`描述 + + - 标题写明白任务名称,一般格式:\[Prefix\] Short description of the pull request (Suffix) + - prefix: 新增功能 \[Feature\], 修 bug \[Fix\], 文档相关 \[Docs\], 开发中 \[WIP\] (暂时不会被review) + - 描述里介绍`拉取请求`的主要修改内容,结果,以及对其他部分的影响, 参考`拉取请求`模板 + - 关联相关的`议题` (issue) 和其他`拉取请求` + +6. 如果引入了其他三方库,或借鉴了三方库的代码,请确认他们的许可证和 mmcv 兼容,并在借鉴的代码上补充 `This code is inspired from http://` diff --git a/cv/distiller/CWD/pytorch/mmcv/Jenkinsfile b/cv/distiller/CWD/pytorch/mmcv/Jenkinsfile new file mode 100644 index 0000000000000000000000000000000000000000..f0c19d9f3c3e0efc9ed218efa2259c598e383a06 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/Jenkinsfile @@ -0,0 +1,56 @@ +def docker_images = ["registry.cn-hangzhou.aliyuncs.com/sensetime/openmmlab:cuda10.1-cudnn7-devel-ubuntu18.04-py37-pt1.3", + "registry.cn-hangzhou.aliyuncs.com/sensetime/openmmlab:cuda10.2-cudnn7-devel-ubuntu18.04-py37-pt1.5"] +def torch_versions = ["1.3.0", "1.5.0"] +def torchvision_versions = ["0.4.2", "0.6.0"] + + +def get_stages(docker_image, folder) { + def pip_mirror = "-i https://mirrors.aliyun.com/pypi/simple" + stages = { + docker.image(docker_image).inside('-u root --gpus all --net host') { + sh "rm -rf ${env.WORKSPACE}-${folder} ${env.WORKSPACE}-${folder}@tmp" + sh "cp -r ${env.WORKSPACE} ${env.WORKSPACE}-${folder}" + try { + dir("${env.WORKSPACE}-${folder}") { + stage("before_install") { + sh "apt-get update && apt-get install -y ninja-build" + } + stage("dependencies") { + // torch and torchvision are pre-installed in 
dockers + sh "pip list | grep torch" + sh "apt-get install -y ffmpeg libturbojpeg" + sh "pip install pytest coverage lmdb PyTurboJPEG Cython ${pip_mirror}" + } + stage("build") { + sh "MMCV_WITH_OPS=1 pip install -e . ${pip_mirror}" + } + stage("test") { + sh "coverage run --branch --source=mmcv -m pytest tests/" + sh "coverage xml" + sh "coverage report -m" + } + } + } finally { + sh "rm -rf ${env.WORKSPACE}-${folder} ${env.WORKSPACE}-${folder}@tmp" + } + } + } + return stages +} + + +node('master') { + // fetch latest change from SCM (Source Control Management) + checkout scm + + def stages = [:] + for (int i = 0; i < docker_images.size(); i++) { + def docker_image = docker_images[i] + def torch = torch_versions[i] + def torchvision = torchvision_versions[i] + def tag = docker_image + '_' + torch + '_' + torchvision + def folder = "${i}" + stages[tag] = get_stages(docker_image, folder) + } + parallel stages +} diff --git a/cv/distiller/CWD/pytorch/mmcv/LICENSE b/cv/distiller/CWD/pytorch/mmcv/LICENSE new file mode 100644 index 0000000000000000000000000000000000000000..f02314255d824c0816b0bf1648aac8ab78976199 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/LICENSE @@ -0,0 +1,203 @@ +Copyright (c) OpenMMLab. All rights reserved + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. 
For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. 
For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. 
If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. 
You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. 
Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. 
+ + Copyright 2018-2020 Open-MMLab. All rights reserved. + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/cv/distiller/CWD/pytorch/mmcv/LICENSES.md b/cv/distiller/CWD/pytorch/mmcv/LICENSES.md new file mode 100644 index 0000000000000000000000000000000000000000..5de8358331f4d21529e016807b86b66dc6ca29da --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/LICENSES.md @@ -0,0 +1,8 @@ +# Licenses for special operations + +In this file, we list the operations with other licenses instead of Apache 2.0. Users should be careful about adopting these operations in any commercial matters. 
+ +| Operation | Files | License | +| :--------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------: | +| upfirdn2d | [mmcv/ops/csrc/pytorch/cuda/upfirdn2d_kernel.cu](https://github.com/open-mmlab/mmcv/blob/master/mmcv/ops/csrc/pytorch/cuda/upfirdn2d_kernel.cu) | NVIDIA License | +| fused_leaky_relu | [mmcv/ops/csrc/pytorch/cuda/fused_bias_leakyrelu_cuda.cu](https://github.com/open-mmlab/mmcv/blob/master/mmcv/ops/csrc/pytorch/cuda/fused_bias_leakyrelu_cuda.cu) | NVIDIA License | diff --git a/cv/distiller/CWD/pytorch/mmcv/MANIFEST.in b/cv/distiller/CWD/pytorch/mmcv/MANIFEST.in new file mode 100644 index 0000000000000000000000000000000000000000..622635caa1ec01f78d95c684b87658df87c63b38 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/MANIFEST.in @@ -0,0 +1,6 @@ +include requirements/runtime.txt +include mmcv/ops/csrc/common/cuda/*.cuh mmcv/ops/csrc/common/cuda/*.hpp mmcv/ops/csrc/common/*.hpp +include mmcv/ops/csrc/pytorch/*.cpp mmcv/ops/csrc/pytorch/cuda/*.cu mmcv/ops/csrc/pytorch/cuda/*.cpp mmcv/ops/csrc/pytorch/cpu/*.cpp +include mmcv/ops/csrc/parrots/*.h mmcv/ops/csrc/parrots/*.cpp +include mmcv/ops/csrc/pytorch/mps/*.mm mmcv/ops/csrc/common/mps/*.h mmcv/ops/csrc/common/mps/*.mm +recursive-include mmcv/ops/csrc/ *.h *.hpp *.cpp *.cuh *.cu *.mm diff --git a/cv/distiller/CWD/pytorch/mmcv/README.md b/cv/distiller/CWD/pytorch/mmcv/README.md new file mode 100644 index 0000000000000000000000000000000000000000..25d290f3dac27c8f0e87b0256ed8b0964d5bbcc9 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/README.md @@ -0,0 +1,161 @@ +
+ +
 
+
+ OpenMMLab website + + + HOT + + +      + OpenMMLab platform + + + TRY IT OUT + + +
+
 
+
+ +[![docs](https://img.shields.io/badge/docs-2.x-blue)](https://mmcv.readthedocs.io/en/2.x/) +[![platform](https://img.shields.io/badge/platform-Linux%7CWindows%7CmacOS-blue)](https://mmcv.readthedocs.io/en/2.x/get_started/installation.html) +[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/mmcv)](https://pypi.org/project/mmcv/) +[![pytorch](https://img.shields.io/badge/pytorch-1.6~1.13-orange)](https://pytorch.org/get-started/previous-versions/) +[![cuda](https://img.shields.io/badge/cuda-9.2~11.7-green)](https://developer.nvidia.com/cuda-downloads) +[![PyPI](https://img.shields.io/pypi/v/mmcv)](https://pypi.org/project/mmcv) +[![badge](https://github.com/open-mmlab/mmcv/workflows/build/badge.svg)](https://github.com/open-mmlab/mmcv/actions) +[![codecov](https://codecov.io/gh/open-mmlab/mmcv/branch/master/graph/badge.svg)](https://codecov.io/gh/open-mmlab/mmcv) +[![license](https://img.shields.io/github/license/open-mmlab/mmcv.svg)](https://github.com/open-mmlab/mmcv/blob/master/LICENSE) + +English | [简体中文](README_zh-CN.md) + +## Introduction + +MMCV is a foundational library for computer vision research and it provides the following functionalities: + +- [Image/Video processing](https://mmcv.readthedocs.io/en/2.x/understand_mmcv/data_process.html) +- [Image and annotation visualization](https://mmcv.readthedocs.io/en/2.x/understand_mmcv/visualization.html) +- [Image transformation](https://mmcv.readthedocs.io/en/2.x/understand_mmcv/data_transform.html) +- [Various CNN architectures](https://mmcv.readthedocs.io/en/2.x/understand_mmcv/cnn.html) +- [High-quality implementation of common CPU and CUDA ops](https://mmcv.readthedocs.io/en/2.x/understand_mmcv/ops.html) + +It supports the following systems: + +- Linux +- Windows +- macOS + +See the [documentation](http://mmcv.readthedocs.io/en/2.x) for more features and usage. + +Note: MMCV requires Python 3.7+. 
+ +## Installation + +There are two versions of MMCV: + +- **mmcv**: comprehensive, with full features and various CUDA ops out of the box. It takes longer time to build. +- **mmcv-lite**: lite, without CUDA ops but all other features, similar to mmcv\<1.0.0. It is useful when you do not need those CUDA ops. + +**Note**: Do not install both versions in the same environment, otherwise you may encounter errors like `ModuleNotFound`. You need to uninstall one before installing the other. `Installing the full version is highly recommended if CUDA is available`. + +### Install mmcv + +Before installing mmcv, make sure that PyTorch has been successfully installed following the [PyTorch official installation guide](https://github.com/pytorch/pytorch#installation). For apple silicon users, please use PyTorch 1.13+. + +The command to install mmcv: + +```bash +pip install -U openmim +mim install "mmcv>=2.0.0rc1" +``` + +If you need to specify the version of mmcv, you can use the following command: + +```bash +mim install mmcv==2.0.0rc3 +``` + +If you find that the above installation command does not use a pre-built package ending with `.whl` but a source package ending with `.tar.gz`, you may not have a pre-build package corresponding to the PyTorch or CUDA or mmcv version, in which case you can [build mmcv from source](https://mmcv.readthedocs.io/en/2.x/get_started/build.html). + +
+Installation log using pre-built packages + +Looking in links: https://download.openmmlab.com/mmcv/dist/cu102/torch1.8.0/index.html
+Collecting mmcv
+Downloading https://download.openmmlab.com/mmcv/dist/cu102/torch1.8.0/mmcv-2.0.0rc3-cp38-cp38-manylinux1_x86_64.whl + +
+ +
+Installation log using source packages + +Looking in links: https://download.openmmlab.com/mmcv/dist/cu102/torch1.8.0/index.html
+Collecting mmcv==2.0.0rc3
+Downloading mmcv-2.0.0rc3.tar.gz + +
+ +For more installation methods, please refer to the [Installation documentation](https://mmcv.readthedocs.io/en/2.x/get_started/installation.html). + +### Install mmcv-lite + +If you need to use PyTorch-related modules, make sure PyTorch has been successfully installed in your environment by referring to the [PyTorch official installation guide](https://github.com/pytorch/pytorch#installation). + +```bash +pip install -U openmim +mim install "mmcv-lite>=2.0.0rc1" +``` + +## FAQ + +If you face some installation issues, CUDA related issues or RuntimeErrors, +you may first refer to this [Frequently Asked Questions](https://mmcv.readthedocs.io/en/2.x/faq.html). + +If you face installation problems or runtime issues, you may first refer to this [Frequently Asked Questions](https://mmcv.readthedocs.io/en/2.x/faq.html) to see if there is a solution. If the problem is still not solved, feel free to open an [issue](https://github.com/open-mmlab/mmcv/issues). + +## Citation + +If you find this project useful in your research, please consider cite: + +```latex +@misc{mmcv, + title={{MMCV: OpenMMLab} Computer Vision Foundation}, + author={MMCV Contributors}, + howpublished = {\url{https://github.com/open-mmlab/mmcv}}, + year={2018} +} +``` + +## Contributing + +We appreciate all contributions to improve MMCV. Please refer to [CONTRIBUTING.md](CONTRIBUTING.md) for the contributing guideline. + +## License + +MMCV is released under the Apache 2.0 license, while some specific operations in this library are with other licenses. Please refer to [LICENSES.md](LICENSES.md) for the careful check, if you are using our code for commercial matters. + +## Projects in OpenMMLab + +- [MMEngine](https://github.com/open-mmlab/mmengine): OpenMMLab foundational library for training deep learning models. +- [MMCV](https://github.com/open-mmlab/mmcv): OpenMMLab foundational library for computer vision. +- [MIM](https://github.com/open-mmlab/mim): MIM installs OpenMMLab packages. 
+- [MMClassification](https://github.com/open-mmlab/mmclassification): OpenMMLab image classification toolbox and benchmark. +- [MMDetection](https://github.com/open-mmlab/mmdetection): OpenMMLab detection toolbox and benchmark. +- [MMDetection3D](https://github.com/open-mmlab/mmdetection3d): OpenMMLab's next-generation platform for general 3D object detection. +- [MMRotate](https://github.com/open-mmlab/mmrotate): OpenMMLab rotated object detection toolbox and benchmark. +- [MMYOLO](https://github.com/open-mmlab/mmyolo): OpenMMLab YOLO series toolbox and benchmark. +- [MMSegmentation](https://github.com/open-mmlab/mmsegmentation): OpenMMLab semantic segmentation toolbox and benchmark. +- [MMOCR](https://github.com/open-mmlab/mmocr): OpenMMLab text detection, recognition, and understanding toolbox. +- [MMPose](https://github.com/open-mmlab/mmpose): OpenMMLab pose estimation toolbox and benchmark. +- [MMHuman3D](https://github.com/open-mmlab/mmhuman3d): OpenMMLab 3D human parametric model toolbox and benchmark. +- [MMSelfSup](https://github.com/open-mmlab/mmselfsup): OpenMMLab self-supervised learning toolbox and benchmark. +- [MMRazor](https://github.com/open-mmlab/mmrazor): OpenMMLab model compression toolbox and benchmark. +- [MMFewShot](https://github.com/open-mmlab/mmfewshot): OpenMMLab fewshot learning toolbox and benchmark. +- [MMAction2](https://github.com/open-mmlab/mmaction2): OpenMMLab's next-generation action understanding toolbox and benchmark. +- [MMTracking](https://github.com/open-mmlab/mmtracking): OpenMMLab video perception toolbox and benchmark. +- [MMFlow](https://github.com/open-mmlab/mmflow): OpenMMLab optical flow toolbox and benchmark. +- [MMEditing](https://github.com/open-mmlab/mmediting): OpenMMLab image and video editing toolbox. +- [MMGeneration](https://github.com/open-mmlab/mmgeneration): OpenMMLab image and video generative models toolbox. +- [MMDeploy](https://github.com/open-mmlab/mmdeploy): OpenMMLab model deployment framework. 
diff --git a/cv/distiller/CWD/pytorch/mmcv/README_zh-CN.md b/cv/distiller/CWD/pytorch/mmcv/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..d9a81ebf58c7e5578e7b43d9803cd9a2b69bdd9b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/README_zh-CN.md @@ -0,0 +1,164 @@ +
+ +
 
+
+ OpenMMLab 官网 + + + HOT + + +      + OpenMMLab 开放平台 + + + TRY IT OUT + + +
+
 
+
+ +[![docs](https://img.shields.io/badge/docs-2.x-blue)](https://mmcv.readthedocs.io/zh_CN/2.x/) +[![platform](https://img.shields.io/badge/platform-Linux%7CWindows%7CmacOS-blue)](https://mmcv.readthedocs.io/zh_CN/2.x/get_started/installation.html) +[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/mmcv)](https://pypi.org/project/mmcv/) +[![pytorch](https://img.shields.io/badge/pytorch-1.6~1.13-orange)](https://pytorch.org/get-started/previous-versions/) +[![cuda](https://img.shields.io/badge/cuda-9.2~11.7-green)](https://developer.nvidia.com/cuda-downloads) +[![PyPI](https://img.shields.io/pypi/v/mmcv)](https://pypi.org/project/mmcv) +[![badge](https://github.com/open-mmlab/mmcv/workflows/build/badge.svg)](https://github.com/open-mmlab/mmcv/actions) +[![codecov](https://codecov.io/gh/open-mmlab/mmcv/branch/master/graph/badge.svg)](https://codecov.io/gh/open-mmlab/mmcv) +[![license](https://img.shields.io/github/license/open-mmlab/mmcv.svg)](https://github.com/open-mmlab/mmcv/blob/master/LICENSE) + +[English](README.md) | 简体中文 + +## 简介 + +MMCV 是一个面向计算机视觉的基础库,它提供了以下功能: + +- [图像和视频处理](https://mmcv.readthedocs.io/zh_CN/2.x/understand_mmcv/data_process.html) +- [图像和标注结果可视化](https://mmcv.readthedocs.io/zh_CN/2.x/understand_mmcv/visualization.html) +- [图像变换](https://mmcv.readthedocs.io/zh_CN/2.x/understand_mmcv/data_transform.html) +- [多种 CNN 网络结构](https://mmcv.readthedocs.io/zh_CN/2.x/understand_mmcv/cnn.html) +- [高质量实现的常见 CUDA 算子](https://mmcv.readthedocs.io/zh_CN/2.x/understand_mmcv/ops.html) + +MMCV 支持多种平台,包括: + +- Linux +- Windows +- macOS + +如想了解更多特性和使用,请参考[文档](http://mmcv.readthedocs.io/zh_CN/2.x)。 + +提示: MMCV 需要 Python 3.7 以上版本。 + +## 安装 + +MMCV 有两个版本: + +- **mmcv**: 完整版,包含所有的特性以及丰富的开箱即用的 CUDA 算子。注意完整版本可能需要更长时间来编译。 +- **mmcv-lite**: 精简版,不包含 CUDA 算子但包含其余所有特性和功能,类似 MMCV 1.0 之前的版本。如果你不需要使用 CUDA 算子的话,精简版可以作为一个考虑选项。 + +**注意**: 请不要在同一个环境中安装两个版本,否则可能会遇到类似 `ModuleNotFound` 的错误。在安装一个版本之前,需要先卸载另一个。`如果 CUDA 可用,强烈推荐安装 mmcv`。 + +### 安装 mmcv + +在安装 mmcv 之前,请确保 
PyTorch 已经成功安装在环境中,可以参考 [PyTorch 官方安装文档](https://github.com/pytorch/pytorch#installation)。如果你使用的是搭载 apple silicon 的 mac 设备,请安装 PyTorch 1.13+ 的版本。 + +安装 mmcv 的命令如下: + +```bash +pip install -U openmim +mim install "mmcv>=2.0.0rc1" +``` + +如果需要指定 mmcv 的版本,可以使用以下命令 + +```bash +mim install mmcv==2.0.0rc3 +``` + +如果发现上述的安装命令没有使用预编译包(以 `.whl` 结尾)而是使用源码包(以 `.tar.gz` 结尾)安装,则有可能是我们没有提供和当前环境的 PyTorch 版本、CUDA 版本相匹配的 mmcv 预编译包,此时,你可以[源码安装 mmcv](https://mmcv.readthedocs.io/zh_CN/2.x/get_started/build.html)。 + +
+使用预编译包的安装日志 + +Looking in links: https://download.openmmlab.com/mmcv/dist/cu102/torch1.8.0/index.html
+Collecting mmcv
+Downloading https://download.openmmlab.com/mmcv/dist/cu102/torch1.8.0/mmcv-2.0.0rc3-cp38-cp38-manylinux1_x86_64.whl + +
+ +
+使用源码包的安装日志 + +Looking in links: https://download.openmmlab.com/mmcv/dist/cu102/torch1.8.0/index.html
+Collecting mmcv==2.0.0rc3
+Downloading mmcv-2.0.0rc3.tar.gz + +
+ +更多安装方式请参考[安装文档](https://mmcv.readthedocs.io/zh_CN/2.x/get_started/installation.html)。 + +### 安装 mmcv-lite + +如果你需要使用和 PyTorch 相关的模块,请确保 PyTorch 已经成功安装在环境中,可以参考 [PyTorch 官方安装文档](https://github.com/pytorch/pytorch#installation)。 + +```bash +pip install -U openmim +mim install "mmcv-lite>=2.0.0rc1" +``` + +## FAQ + +如果你遇到了安装问题或者运行时问题,请查看[问题解决页面](https://mmcv.readthedocs.io/zh_CN/2.x/faq.html)是否已有解决方案。如果问题仍然没有解决,欢迎提 [issue](https://github.com/open-mmlab/mmcv/issues)。 + +## 贡献指南 + +我们感谢所有的贡献者为改进和提升 MMCV 所作出的努力。请参考[贡献指南](CONTRIBUTING.md)来了解参与项目贡献的相关指引。 + +## 许可证 + +`MMCV` 目前以 Apache 2.0 的许可证发布,但是其中有一部分功能并不是使用的 Apache2.0 许可证,我们在 [许可证](LICENSES.md) 中详细地列出了这些功能以及他们对应的许可证,如果您正在从事盈利性活动,请谨慎参考此文档。 + +## OpenMMLab 的其他项目 + +- [MMEngine](https://github.com/open-mmlab/mmengine): OpenMMLab 深度学习模型训练基础库 +- [MMCV](https://github.com/open-mmlab/mmcv): OpenMMLab 计算机视觉基础库 +- [MIM](https://github.com/open-mmlab/mim): MIM 是 OpenMMlab 项目、算法、模型的统一入口 +- [MMClassification](https://github.com/open-mmlab/mmclassification): OpenMMLab 图像分类工具箱 +- [MMDetection](https://github.com/open-mmlab/mmdetection): OpenMMLab 目标检测工具箱 +- [MMDetection3D](https://github.com/open-mmlab/mmdetection3d): OpenMMLab 新一代通用 3D 目标检测平台 +- [MMRotate](https://github.com/open-mmlab/mmrotate): OpenMMLab 旋转框检测工具箱与测试基准 +- [MMYOLO](https://github.com/open-mmlab/mmyolo): OpenMMLab YOLO 系列工具箱与测试基准 +- [MMSegmentation](https://github.com/open-mmlab/mmsegmentation): OpenMMLab 语义分割工具箱 +- [MMOCR](https://github.com/open-mmlab/mmocr): OpenMMLab 全流程文字检测识别理解工具箱 +- [MMPose](https://github.com/open-mmlab/mmpose): OpenMMLab 姿态估计工具箱 +- [MMHuman3D](https://github.com/open-mmlab/mmhuman3d): OpenMMLab 人体参数化模型工具箱与测试基准 +- [MMSelfSup](https://github.com/open-mmlab/mmselfsup): OpenMMLab 自监督学习工具箱与测试基准 +- [MMRazor](https://github.com/open-mmlab/mmrazor): OpenMMLab 模型压缩工具箱与测试基准 +- [MMFewShot](https://github.com/open-mmlab/mmfewshot): OpenMMLab 少样本学习工具箱与测试基准 +- [MMAction2](https://github.com/open-mmlab/mmaction2): OpenMMLab 新一代视频理解工具箱 +- 
[MMTracking](https://github.com/open-mmlab/mmtracking): OpenMMLab 一体化视频目标感知平台 +- [MMFlow](https://github.com/open-mmlab/mmflow): OpenMMLab 光流估计工具箱与测试基准 +- [MMEditing](https://github.com/open-mmlab/mmediting): OpenMMLab 图像视频编辑工具箱 +- [MMGeneration](https://github.com/open-mmlab/mmgeneration): OpenMMLab 图片视频生成模型工具箱 +- [MMDeploy](https://github.com/open-mmlab/mmdeploy): OpenMMLab 模型部署框架 + +## 欢迎加入 OpenMMLab 社区 + +扫描下方的二维码可关注 OpenMMLab 团队的 [知乎官方账号](https://www.zhihu.com/people/openmmlab),加入 OpenMMLab 团队的 [官方交流 QQ 群](https://jq.qq.com/?_wv=1027&k=K0QI8ByU),或添加微信小助手”OpenMMLabwx“加入官方交流微信群。 + +
+ +
+ +我们会在 OpenMMLab 社区为大家 + +- 📢 分享 AI 框架的前沿核心技术 +- 💻 解读 PyTorch 常用模块源码 +- 📰 发布 OpenMMLab 的相关新闻 +- 🚀 介绍 OpenMMLab 开发的前沿算法 +- 🏃 获取更高效的问题答疑和意见反馈 +- 🔥 提供与各行各业开发者充分交流的平台 + +干货满满 📘,等你来撩 💗,OpenMMLab 社区期待您的加入 👬 diff --git a/cv/distiller/CWD/pytorch/mmcv/TERMINOLOGY.md b/cv/distiller/CWD/pytorch/mmcv/TERMINOLOGY.md new file mode 100644 index 0000000000000000000000000000000000000000..07411b7774c2ed713f472c1287b98b871c7f4d02 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/TERMINOLOGY.md @@ -0,0 +1,30 @@ +# English-Chinese terminology comparison (英汉术语对照) + +This document is used as a reference for English-Chinese terminology translation. + +该文档用作中英文翻译对照参考。 + +| English | 中文 | +| :---------------: | :----------: | +| annotation | 标注 | +| backbone | 主干网络 | +| benchmark | 基准测试 | +| checkpoint | 模型权重文件 | +| classifier | 分类器 | +| cls_head | 分类头 | +| decoder | 解码器 | +| detector | 检测器 | +| encoder | 编码器 | +| finetune | 微调 | +| ground truth | 真实标签 | +| hook | 钩子 | +| localizer | 定位器 | +| neck | 模型颈部 | +| pipeline | 流水线 | +| recognizer | 识别器 | +| register | 注册器 | +| schedule | 调整 | +| scheduler | 调度器 | +| segmentor | 分割器 | +| tensor | 张量 | +| training schedule | 训练策略 | diff --git a/cv/distiller/CWD/pytorch/mmcv/build_mmcv.sh b/cv/distiller/CWD/pytorch/mmcv/build_mmcv.sh new file mode 100644 index 0000000000000000000000000000000000000000..6a40dda04dc4cd6a864a4231928fb6e80a520534 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/build_mmcv.sh @@ -0,0 +1,26 @@ + + + + +#!/bin/bash + +COREX_VERSION=${COREX_VERSION:-latest} +MAX_JOBS=${MAX_JOBS:-$(nproc --all)} +PYTHON_PATH=$(which python3) +${PYTHON_PATH} -m pip list | grep "^torch .*+corex" || { + echo "ERROR: building mmcv requries the corex torch has been installed." 
+ exit 1 +} + +export MAX_JOBS=${MAX_JOBS} + +FORCE_CUDA=1 MMCV_WITH_OPS=1 ${PYTHON_PATH} setup.py build 2>&1 | tee compile.log; [[ ${PIPESTATUS[0]} == 0 ]] || exit + +if [[ "${COREX_VERSION}" == "latest" ]]; then + COREX_VERSION=`date --utc +%Y%m%d%H%M%S` +fi +export MMCV_LOCAL_VERSION_IDENTIFIER="corex.${COREX_VERSION}" +FORCE_CUDA=1 MMCV_WITH_OPS=1 ${PYTHON_PATH} setup.py bdist_wheel -d build_pip || exit + +# Return 0 status if all finished +exit 0 diff --git a/cv/distiller/CWD/pytorch/mmcv/clean_mmcv.sh b/cv/distiller/CWD/pytorch/mmcv/clean_mmcv.sh new file mode 100644 index 0000000000000000000000000000000000000000..afe4df7bbd4d767cfcc136d2367743eb03842142 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/clean_mmcv.sh @@ -0,0 +1,14 @@ + + + + +#!/bin/bash + +PYTHON_PATH=$(which python3) + +rm -rf build +${PYTHON_PATH} setup.py clean || true +rm -rf build_pip + +# Return 0 status if all finished +exit 0 diff --git a/cv/distiller/CWD/pytorch/mmcv/docker/README.md b/cv/distiller/CWD/pytorch/mmcv/docker/README.md new file mode 100644 index 0000000000000000000000000000000000000000..60d5c9de5da8faa7e0ae7e0def19a4320a2a7a5e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docker/README.md @@ -0,0 +1,70 @@ +# Docker images + +There are two `Dockerfile` files to build docker images, one to build an image with the mmcv pre-built package and the other with the mmcv development environment. + +```text +. +|-- README.md +|-- dev # build with mmcv development environment +| `-- Dockerfile +`-- release # build with mmcv pre-built package + `-- Dockerfile +``` + +## Build docker images + +### Build with mmcv pre-built package + +Build with local repository + +```bash +git clone https://github.com/open-mmlab/mmcv.git && cd mmcv +docker build -t mmcv -f docker/release/Dockerfile . 
+```
+
+Or build with remote repository
+
+```bash
+docker build -t mmcv https://github.com/open-mmlab/mmcv.git#master:docker/release
+```
+
+The [Dockerfile](release/Dockerfile) installs latest released version of mmcv by default, but you can specify mmcv versions to install expected versions.
+
+```bash
+docker image build -t mmcv -f docker/release/Dockerfile --build-arg MMCV=2.0.0rc1 .
+```
+
+If you also want to use other versions of PyTorch and CUDA, you can also pass them when building docker images.
+
+An example to build an image with PyTorch 1.9.0 and CUDA 11.1.
+
+```bash
+docker build -t mmcv -f docker/release/Dockerfile \
+    --build-arg PYTORCH=1.9.0 \
+    --build-arg CUDA=11.1 \
+    --build-arg CUDNN=8 \
+    --build-arg MMCV=2.0.0rc1 .
+```
+
+More available versions of PyTorch and CUDA can be found at [dockerhub/pytorch](https://hub.docker.com/r/pytorch/pytorch/tags).
+
+### Build with mmcv development environment
+
+If you want to build a docker image with the mmcv development environment, you can use the following command
+
+```bash
+git clone https://github.com/open-mmlab/mmcv.git && cd mmcv
+docker build -t mmcv -f docker/dev/Dockerfile --build-arg CUDA_ARCH=7.5 .
+```
+
+Note that `CUDA_ARCH` is the compute capability of your GPU and you can find it at [Compute Capability](https://developer.nvidia.com/cuda-gpus#compute).
+
+The building process may take 10 minutes or more.
+
+## Run images
+
+```bash
+docker run --gpus all --shm-size=8g -it mmcv
+```
+
+See [docker run](https://docs.docker.com/engine/reference/commandline/run/) for more usages. 
diff --git a/cv/distiller/CWD/pytorch/mmcv/docker/dev/Dockerfile b/cv/distiller/CWD/pytorch/mmcv/docker/dev/Dockerfile new file mode 100644 index 0000000000000000000000000000000000000000..a4d9e23fcfaa6e1af104aaa0e9cbb2a348b3cd34 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docker/dev/Dockerfile @@ -0,0 +1,31 @@ +ARG PYTORCH="1.8.1" +ARG CUDA="10.2" +ARG CUDNN="7" + +FROM pytorch/pytorch:${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel + +# To fix GPG key error when running apt-get update +RUN rm /etc/apt/sources.list.d/cuda.list \ + && rm /etc/apt/sources.list.d/nvidia-ml.list \ + && apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub \ + && apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/7fa2af80.pub + +# Install git and system dependencies for opencv-python +RUN apt-get update && apt-get install -y git \ + && apt-get update && apt-get install -y libgl1 libglib2.0-0 + +# Install system dependencies for unit tests +RUN apt-get install -y ffmpeg libturbojpeg \ + && apt-get clean \ + && rm -rf /var/lib/apt/lists/* + +# build mmcv from source with develop mode +ARG HTTPS_PROXY="" +ENV https_proxy=${HTTPS_PROXY} +ENV FORCE_CUDA="1" +ARG CUDA_ARCH="" +ENV TORCH_CUDA_ARCH_LIST=${CUDA_ARCH} +RUN git clone https://github.com/open-mmlab/mmcv.git /mmcv +WORKDIR /mmcv +RUN git checkout 2.x && git rev-parse --short HEAD +RUN pip install --no-cache-dir -e .[all] -v && pip install pre-commit && pre-commit install diff --git a/cv/distiller/CWD/pytorch/mmcv/docker/release/Dockerfile b/cv/distiller/CWD/pytorch/mmcv/docker/release/Dockerfile new file mode 100644 index 0000000000000000000000000000000000000000..d5e25e9eb70a87ab1c47a629cc6ed9706ade83c6 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docker/release/Dockerfile @@ -0,0 +1,23 @@ +ARG PYTORCH="1.8.1" +ARG CUDA="10.2" +ARG CUDNN="7" + +FROM pytorch/pytorch:${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel + 
+# To fix GPG key error when running apt-get update +RUN rm /etc/apt/sources.list.d/cuda.list \ + && rm /etc/apt/sources.list.d/nvidia-ml.list \ + && apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub \ + && apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/7fa2af80.pub + +# Install system dependencies for opencv-python +RUN apt-get update && apt-get install -y libgl1 libglib2.0-0 \ + && apt-get clean \ + && rm -rf /var/lib/apt/lists/* + +# Install mmcv +ARG MMCV="" +RUN if [ "${MMCV}" = "" ]; then pip install -U openmim && mim install 'mmcv>=2.0.0rc1'; else pip install -U openmim && mim install mmcv==${MMCV}; fi + +# Verify the installation +RUN python -c 'import mmcv;print(mmcv.__version__)' diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/Makefile b/cv/distiller/CWD/pytorch/mmcv/docs/en/Makefile new file mode 100644 index 0000000000000000000000000000000000000000..51285967a7d9722c5bdee4f6a81c154a56aa0846 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/en/Makefile @@ -0,0 +1,19 @@ +# Minimal makefile for Sphinx documentation +# + +# You can set these variables from the command line. +SPHINXOPTS = +SPHINXBUILD = sphinx-build +SOURCEDIR = . +BUILDDIR = _build + +# Put it first so that "make" without argument is like "make help". +help: + @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) + +.PHONY: help Makefile + +# Catch-all target: route all unknown targets to Sphinx using the new +# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). 
+%: Makefile + @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/community/1.png b/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/community/1.png new file mode 100644 index 0000000000000000000000000000000000000000..1837fbc8ca1dd46fc169d3c16fd2aef73645af92 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/community/1.png differ diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/community/2.png b/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/community/2.png new file mode 100644 index 0000000000000000000000000000000000000000..76e21def858b2f9392a90999d741cb653e766ae5 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/community/2.png differ diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/community/3.png b/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/community/3.png new file mode 100644 index 0000000000000000000000000000000000000000..5c8ef1315f92933436f4be14eb43669a85e9e098 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/community/3.png differ diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/css/readthedocs.css b/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/css/readthedocs.css new file mode 100644 index 0000000000000000000000000000000000000000..9e3a567d5f78aedb606600bb3111034a1003b362 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/css/readthedocs.css @@ -0,0 +1,10 @@ +.header-logo { + background-image: url("../image/mmcv-logo.png"); + background-size: 85px 40px; + height: 40px; + width: 85px; +} + +table.colwidths-auto td { + width: 50% +} diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/flow_img2toimg1.png b/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/flow_img2toimg1.png new file mode 100644 index 0000000000000000000000000000000000000000..12df0a17ddd3290f5f05072c2bcd38ae79d9f100 Binary files /dev/null and 
b/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/flow_img2toimg1.png differ diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/flow_raw_images.png b/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/flow_raw_images.png new file mode 100644 index 0000000000000000000000000000000000000000..b60cb9af087ffe69d56f9f124a86e4166c1198fe Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/flow_raw_images.png differ diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/flow_visualization.png b/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/flow_visualization.png new file mode 100644 index 0000000000000000000000000000000000000000..4b2e026a058f85d31c70d51cabd11c02d7b26c35 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/flow_visualization.png differ diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/flow_warp.png b/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/flow_warp.png new file mode 100644 index 0000000000000000000000000000000000000000..c3764118dde74517a83ad1098e6e4d767341f5cb Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/flow_warp.png differ diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/flow_warp_diff.png b/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/flow_warp_diff.png new file mode 100644 index 0000000000000000000000000000000000000000..8b86474b81a52b863d24f39eac933336e23e36b4 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/flow_warp_diff.png differ diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/image/mmcv-logo.png b/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/image/mmcv-logo.png new file mode 100644 index 0000000000000000000000000000000000000000..bcc5759f8fe3bc7d191d411c38a9e1d3c1c27a84 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/image/mmcv-logo.png differ diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/parallel_progress.gif 
b/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/parallel_progress.gif new file mode 100644 index 0000000000000000000000000000000000000000..943603058e4d4c3652fa37875ec146609db1848a Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/parallel_progress.gif differ diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/parallel_progress.png b/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/parallel_progress.png new file mode 100644 index 0000000000000000000000000000000000000000..3affeeb3cf59a07db44b0025b8e483f06d144c24 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/parallel_progress.png differ diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/progress.gif b/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/progress.gif new file mode 100644 index 0000000000000000000000000000000000000000..f2a6208a84c31c09c6448e495fbd4ca769ce833e Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/progress.gif differ diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/progress.png b/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/progress.png new file mode 100644 index 0000000000000000000000000000000000000000..a4070e0052427373c59967ed07bb9f936ca8df59 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/progress.png differ diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/version.json b/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/version.json new file mode 100644 index 0000000000000000000000000000000000000000..7ee4965d36ed96f63f484137921d156d19cc40da --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/en/_static/version.json @@ -0,0 +1,575 @@ +{ + "Linux": [ + { + "cuda": "11.7", + "torch": "1.13.x", + "mmcv": [ + "2.0.0rc3" + ] + }, + { + "cuda": "11.6", + "torch": "1.13.x", + "mmcv": [ + "2.0.0rc3" + ] + }, + { + "cuda": "11.6", + "torch": "1.12.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "11.5", + "torch": "1.11.x", + "mmcv": [ + 
"2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "11.3", + "torch": "1.12.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "11.3", + "torch": "1.11.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "11.3", + "torch": "1.10.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "11.1", + "torch": "1.10.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "11.1", + "torch": "1.9.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "11.1", + "torch": "1.8.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "11.0", + "torch": "1.7.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "10.2", + "torch": "1.12.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "10.2", + "torch": "1.11.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "10.2", + "torch": "1.10.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "10.2", + "torch": "1.9.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "10.2", + "torch": "1.8.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "10.2", + "torch": "1.7.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "10.2", + "torch": "1.6.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "10.1", + "torch": "1.8.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "10.1", + "torch": "1.7.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "10.1", + "torch": "1.6.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "9.2", + "torch": "1.7.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "9.2", + "torch": "1.6.x", + "mmcv": [ + "2.0.0rc3", + 
"2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "cpu", + "torch": "1.13.x", + "mmcv": [ + "2.0.0rc3" + ] + }, + { + "cuda": "cpu", + "torch": "1.12.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "cpu", + "torch": "1.11.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "cpu", + "torch": "1.10.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "cpu", + "torch": "1.9.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "cpu", + "torch": "1.8.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "cpu", + "torch": "1.7.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "cpu", + "torch": "1.6.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + } + ], + "Windows": [ + { + "cuda": "11.7", + "torch": "1.13.x", + "mmcv": [ + "2.0.0rc3" + ] + }, + { + "cuda": "11.6", + "torch": "1.13.x", + "mmcv": [ + "2.0.0rc3" + ] + }, + { + "cuda": "11.6", + "torch": "1.12.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "11.5", + "torch": "1.11.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "11.3", + "torch": "1.12.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "11.3", + "torch": "1.11.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "11.3", + "torch": "1.10.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "11.1", + "torch": "1.10.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "11.1", + "torch": "1.9.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "11.1", + "torch": "1.8.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "10.2", + "torch": "1.10.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "10.2", + "torch": "1.9.x", + 
"mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "10.2", + "torch": "1.8.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "10.2", + "torch": "1.7.x", + "mmcv": [ + "2.0.0rc3" + ] + }, + { + "cuda": "10.2", + "torch": "1.6.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "10.1", + "torch": "1.8.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "10.1", + "torch": "1.7.x", + "mmcv": [ + "2.0.0rc3" + ] + }, + { + "cuda": "10.1", + "torch": "1.6.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "cpu", + "torch": "1.13.x", + "mmcv": [ + "2.0.0rc3" + ] + }, + { + "cuda": "cpu", + "torch": "1.12.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "cpu", + "torch": "1.11.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "cpu", + "torch": "1.10.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "cpu", + "torch": "1.9.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "cpu", + "torch": "1.8.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "cpu", + "torch": "1.7.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "cpu", + "torch": "1.6.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + } + ], + "macOS": [ + { + "cuda": "cpu", + "torch": "1.13.x", + "mmcv": [ + "2.0.0rc3" + ] + }, + { + "cuda": "mps", + "torch": "1.13.x", + "mmcv": [ + "2.0.0rc3" + ] + }, + { + "cuda": "cpu", + "torch": "1.12.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2" + ] + }, + { + "cuda": "cpu", + "torch": "1.11.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2" + ] + }, + { + "cuda": "cpu", + "torch": "1.10.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2" + ] + }, + { + "cuda": "cpu", + "torch": "1.9.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2" + ] + }, + { + "cuda": "cpu", + "torch": "1.8.x", + 
"mmcv": [ + "2.0.0rc3", + "2.0.0rc2" + ] + }, + { + "cuda": "cpu", + "torch": "1.7.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2" + ] + }, + { + "cuda": "cpu", + "torch": "1.6.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2" + ] + } + ] +} diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/_templates/classtemplate.rst b/cv/distiller/CWD/pytorch/mmcv/docs/en/_templates/classtemplate.rst new file mode 100644 index 0000000000000000000000000000000000000000..4f74842394ec9807fb1ae2d8f05a8a57e9a2e24c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/en/_templates/classtemplate.rst @@ -0,0 +1,14 @@ +.. role:: hidden + :class: hidden-section +.. currentmodule:: {{ module }} + + +{{ name | underline}} + +.. autoclass:: {{ name }} + :members: + + +.. + autogenerated from source/_templates/classtemplate.rst + note it does not have :inherited-members: diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/api/arraymisc.rst b/cv/distiller/CWD/pytorch/mmcv/docs/en/api/arraymisc.rst new file mode 100644 index 0000000000000000000000000000000000000000..28975eb76e94994c50d2fe52b8f34c7ce533e788 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/en/api/arraymisc.rst @@ -0,0 +1,19 @@ +.. role:: hidden + :class: hidden-section + +mmcv.arraymisc +=================================== + +.. contents:: mmcv.arraymisc + :depth: 2 + :local: + :backlinks: top + +.. currentmodule:: mmcv.arraymisc + +.. autosummary:: + :toctree: generated + :nosignatures: + + quantize + dequantize diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/api/cnn.rst b/cv/distiller/CWD/pytorch/mmcv/docs/en/api/cnn.rst new file mode 100644 index 0000000000000000000000000000000000000000..5cbcb191e9e4feb7a76e9d154411fd899a48999e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/en/api/cnn.rst @@ -0,0 +1,70 @@ +.. role:: hidden + :class: hidden-section + +mmcv.cnn +=================================== + +.. contents:: mmcv.cnn + :depth: 2 + :local: + :backlinks: top + +.. 
currentmodule:: mmcv.cnn + +Module +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + :template: classtemplate.rst + + ContextBlock + Conv2d + Conv3d + ConvAWS2d + ConvModule + ConvTranspose2d + ConvTranspose3d + ConvWS2d + DepthwiseSeparableConvModule + GeneralizedAttention + HSigmoid + HSwish + LayerScale + Linear + MaxPool2d + MaxPool3d + NonLocal1d + NonLocal2d + NonLocal3d + Scale + Swish + +Build Function +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + + build_activation_layer + build_conv_layer + build_norm_layer + build_padding_layer + build_plugin_layer + build_upsample_layer + +Miscellaneous +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + + fuse_conv_bn + conv_ws_2d + is_norm + make_res_layer + make_vgg_layer + get_model_complexity_info diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/api/image.rst b/cv/distiller/CWD/pytorch/mmcv/docs/en/api/image.rst new file mode 100644 index 0000000000000000000000000000000000000000..3b93484952cd0c45b9d103088b0677f93fe5615d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/en/api/image.rst @@ -0,0 +1,100 @@ +.. role:: hidden + :class: hidden-section + +mmcv.image +=================================== + +.. contents:: mmcv.image + :depth: 2 + :local: + :backlinks: top + +.. currentmodule:: mmcv.image + +IO +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + + imfrombytes + imread + imwrite + use_backend + +Color Space +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + + bgr2gray + bgr2hls + bgr2hsv + bgr2rgb + bgr2ycbcr + gray2bgr + gray2rgb + hls2bgr + hsv2bgr + imconvert + rgb2bgr + rgb2gray + rgb2ycbcr + ycbcr2bgr + ycbcr2rgb + +Geometric +---------------- + +.. 
autosummary:: + :toctree: generated + :nosignatures: + + cutout + imcrop + imflip + impad + impad_to_multiple + imrescale + imresize + imresize_like + imresize_to_multiple + imrotate + imshear + imtranslate + rescale_size + +Photometric +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + + adjust_brightness + adjust_color + adjust_contrast + adjust_hue + adjust_lighting + adjust_sharpness + auto_contrast + clahe + imdenormalize + imequalize + iminvert + imnormalize + lut_transform + posterize + solarize + +Miscellaneous +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + + tensor2imgs diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/api/ops.rst b/cv/distiller/CWD/pytorch/mmcv/docs/en/api/ops.rst new file mode 100644 index 0000000000000000000000000000000000000000..b0290457bfa0c08f14d7fe346efccb33f388bdae --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/en/api/ops.rst @@ -0,0 +1,135 @@ +.. role:: hidden + :class: hidden-section + +mmcv.ops +=================================== + +.. contents:: mmcv.ops + :depth: 2 + :local: + :backlinks: top + +.. currentmodule:: mmcv.ops + +.. 
autosummary:: + :toctree: generated + :nosignatures: + :template: classtemplate.rst + + BorderAlign + CARAFE + CARAFENaive + CARAFEPack + Conv2d + ConvTranspose2d + CornerPool + Correlation + CrissCrossAttention + DeformConv2d + DeformConv2dPack + DeformRoIPool + DeformRoIPoolPack + DynamicScatter + FusedBiasLeakyReLU + GroupAll + Linear + MaskedConv2d + MaxPool2d + ModulatedDeformConv2d + ModulatedDeformConv2dPack + ModulatedDeformRoIPoolPack + MultiScaleDeformableAttention + PSAMask + PointsSampler + PrRoIPool + QueryAndGroup + RiRoIAlignRotated + RoIAlign + RoIAlignRotated + RoIAwarePool3d + RoIPointPool3d + RoIPool + SAConv2d + SigmoidFocalLoss + SimpleRoIAlign + SoftmaxFocalLoss + SparseConv2d + SparseConv3d + SparseConvTensor + SparseConvTranspose2d + SparseConvTranspose3d + SparseInverseConv2d + SparseInverseConv3d + SparseMaxPool2d + SparseMaxPool3d + SparseModule + SparseSequential + SubMConv2d + SubMConv3d + SyncBatchNorm + TINShift + Voxelization + +.. autosummary:: + :toctree: generated + :nosignatures: + + active_rotated_filter + assign_score_withk + ball_query + batched_nms + bbox_overlaps + border_align + box_iou_rotated + boxes_iou3d + boxes_iou_bev + boxes_overlap_bev + carafe + carafe_naive + chamfer_distance + contour_expand + convex_giou + convex_iou + deform_conv2d + deform_roi_pool + diff_iou_rotated_2d + diff_iou_rotated_3d + dynamic_scatter + furthest_point_sample + furthest_point_sample_with_dist + fused_bias_leakyrelu + gather_points + grouping_operation + knn + masked_conv2d + min_area_polygons + modulated_deform_conv2d + nms + nms3d + nms3d_normal + nms_bev + nms_match + nms_normal_bev + nms_rotated + pixel_group + point_sample + points_in_boxes_all + points_in_boxes_cpu + points_in_boxes_part + points_in_polygons + prroi_pool + rel_roi_point_to_rel_img_point + riroi_align_rotated + roi_align + roi_align_rotated + roi_pool + rotated_feature_align + scatter_nd + sigmoid_focal_loss + soft_nms + softmax_focal_loss + three_interpolate + 
three_nn + tin_shift + upfirdn2d + voxelization diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/api/transforms.rst b/cv/distiller/CWD/pytorch/mmcv/docs/en/api/transforms.rst new file mode 100644 index 0000000000000000000000000000000000000000..56463b304e39734ad55d27a2f5ab54ad529de7ed --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/en/api/transforms.rst @@ -0,0 +1,57 @@ +.. role:: hidden + :class: hidden-section + +mmcv.transforms +=================================== + +.. currentmodule:: mmcv.transforms + +.. autosummary:: + :toctree: generated + :nosignatures: + :template: classtemplate.rst + + BaseTransform + +Loading +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + :template: classtemplate.rst + + LoadAnnotations + LoadImageFromFile + +Processing +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + :template: classtemplate.rst + + CenterCrop + MultiScaleFlipAug + Normalize + Pad + RandomChoiceResize + RandomFlip + RandomGrayscale + RandomResize + Resize + +Wrapper +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + :template: classtemplate.rst + + Compose + KeyMapper + RandomApply + RandomChoice + TransformBroadcaster diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/api/utils.rst b/cv/distiller/CWD/pytorch/mmcv/docs/en/api/utils.rst new file mode 100644 index 0000000000000000000000000000000000000000..f2ff4c2a3872bc9ae0c2942debac5e5b523bd071 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/en/api/utils.rst @@ -0,0 +1,23 @@ +.. role:: hidden + :class: hidden-section + +mmcv.utils +=================================== + +.. contents:: mmcv.utils + :depth: 2 + :local: + :backlinks: top + +.. currentmodule:: mmcv.utils + +.. 
autosummary:: + :toctree: generated + :nosignatures: + + IS_CUDA_AVAILABLE + IS_MLU_AVAILABLE + IS_MPS_AVAILABLE + collect_env + jit + skip_no_elena diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/api/video.rst b/cv/distiller/CWD/pytorch/mmcv/docs/en/api/video.rst new file mode 100644 index 0000000000000000000000000000000000000000..a6ebca0eb73afcf3f3f11aae8520e2782a310f13 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/en/api/video.rst @@ -0,0 +1,56 @@ +.. role:: hidden + :class: hidden-section + +mmcv.video +=================================== + +.. contents:: mmcv.video + :depth: 2 + :local: + :backlinks: top + +.. currentmodule:: mmcv.video + +IO +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + :template: classtemplate.rst + + VideoReader + Cache + +.. autosummary:: + :toctree: generated + :nosignatures: + + frames2video + +Optical Flow +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + + dequantize_flow + flow_from_bytes + flow_warp + flowread + flowwrite + quantize_flow + sparse_flow_from_bytes + +Video Processing +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + + concat_video + convert_video + cut_video + resize_video diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/api/visualization.rst b/cv/distiller/CWD/pytorch/mmcv/docs/en/api/visualization.rst new file mode 100644 index 0000000000000000000000000000000000000000..8f43ef27a441dcd9001a352cf18e97f8e615676d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/en/api/visualization.rst @@ -0,0 +1,50 @@ +.. role:: hidden + :class: hidden-section + +mmcv.visualization +=================================== + +.. contents:: mmcv.visualization + :depth: 2 + :local: + :backlinks: top + +.. currentmodule:: mmcv.visualization + +Color +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + :template: classtemplate.rst + + Color + +.. 
autosummary:: + :toctree: generated + :nosignatures: + + color_val + +Image +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + + imshow + imshow_bboxes + imshow_det_bboxes + +Optical Flow +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + + flow2rgb + flowshow + make_color_wheel diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/community/contributing.md b/cv/distiller/CWD/pytorch/mmcv/docs/en/community/contributing.md new file mode 100644 index 0000000000000000000000000000000000000000..5ac6993021f6ddde9c4d65e07de0690174bd8c5d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/en/community/contributing.md @@ -0,0 +1,267 @@ +## Contributing to OpenMMLab + +Welcome to the MMCV community, we are committed to building a cutting-edge computer vision foundational library and all kinds of contributions are welcomed, including but not limited to + +**Fix bug** + +You can directly post a Pull Request to fix typo in code or documents + +The steps to fix the bug of code implementation are as follows. + +1. If the modification involve significant changes, you should create an issue first and describe the error information and how to trigger the bug. Other developers will discuss with you and propose an proper solution. + +2. Posting a pull request after fixing the bug and adding corresponding unit test. + +**New Feature or Enhancement** + +1. If the modification involve significant changes, you should create an issue to discuss with our developers to propose an proper design. +2. Post a Pull Request after implementing the new feature or enhancement and add corresponding unit test. + +**Document** + +You can directly post a pull request to fix documents. If you want to add a document, you should first create an issue to check if it is reasonable. + +### Pull Request Workflow + +If you're not familiar with Pull Request, don't worry! The following guidance will tell you how to create a Pull Request step by step. 
If you want to dive into the develop mode of Pull Request, you can refer to the [official documents](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests) + +#### 1. Fork and clone + +If you are posting a pull request for the first time, you should fork the OpenMMLab repositories by clicking the **Fork** button in the top right corner of the GitHub page, and the forked repositories will appear under your GitHub profile. + + + +Then, you can clone the repositories to local: + +```shell +git clone git@github.com:{username}/mmcv.git +``` + +After that, you should ddd official repository as the upstream repository + +```bash +git remote add upstream git@github.com:open-mmlab/mmcv +``` + +Check whether remote repository has been added successfully by `git remote -v` + +```bash +origin git@github.com:{username}/mmcv.git (fetch) +origin git@github.com:{username}/mmcv.git (push) +upstream git@github.com:open-mmlab/mmcv (fetch) +upstream git@github.com:open-mmlab/mmcv (push) +``` + +```{note} +Here's a brief introduction to origin and upstream. When we use "git clone", we create an "origin" remote by default, which points to the repository cloned from. As for "upstream", we add it ourselves to point to the target repository. Of course, if you don't like the name "upstream", you could name it as you wish. Usually, we'll push the code to "origin". If the pushed code conflicts with the latest code in official("upstream"), we should pull the latest code from upstream to resolve the conflicts, and then push to "origin" again. The posted Pull Request will be updated automatically. +``` + +#### 2. Configure pre-commit + +You should configure [pre-commit](https://pre-commit.com/#intro) in the local development environment to make sure the code style matches that of OpenMMLab. **Note**: The following code should be executed under the MMCV directory. 
+ +```shell +pip install -U pre-commit +pre-commit install +``` + +Check that pre-commit is configured successfully, and install the hooks defined in `.pre-commit-config.yaml`. + +```shell +pre-commit run --all-files +``` + + + + + +```{note} +Chinese users may fail to download the pre-commit hooks due to the network issue. In this case, you could download these hooks from gitee by setting the .pre-commit-config-zh-cn.yaml + +pre-commit install -c .pre-commit-config-zh-cn.yaml +pre-commit run --all-files -c .pre-commit-config-zh-cn.yaml +``` + +If the installation process is interrupted, you can repeatedly run `pre-commit run ... ` to continue the installation. + +If the code does not conform to the code style specification, pre-commit will raise a warning and fixes some of the errors automatically. + + + +If we want to commit our code bypassing the pre-commit hook, we can use the `--no-verify` option(**only for temporarily commit**. + +```shell +git commit -m "xxx" --no-verify +``` + +#### 3. Create a development branch + +After configuring the pre-commit, we should create a branch based on the master branch to develop the new feature or fix the bug. The proposed branch name is `username/pr_name` + +```shell +git checkout -b yhc/refactor_contributing_doc +``` + +In subsequent development, if the master branch of the local repository is behind the master branch of "upstream", we need to pull the upstream for synchronization, and then execute the above command: + +```shell +git pull upstream master +``` + +#### 4. Commit the code and pass the unit test + +- MMCV introduces mypy to do static type checking to increase the robustness of the code. Therefore, we need to add Type Hints to our code and pass the mypy check. If you are not familiar with Type Hints, you can refer to [this tutorial](https://docs.python.org/3/library/typing.html). 
+ +- The committed code should pass through the unit test + + ```shell + # Pass all unit tests + pytest tests + + # Pass the unit test of runner + pytest tests/test_runner/test_runner.py + ``` + + If the unit test fails for lack of dependencies, you can install the dependencies referring to the [guidance](#unit-test) + +- If the documents are modified/added, we should check the rendering result referring to [guidance](#document-rendering) + +#### 5. Push the code to remote + +We could push the local commits to remote after passing through the check of unit test and pre-commit. You can associate the local branch with remote branch by adding `-u` option. + +```shell +git push -u origin {branch_name} +``` + +This will allow you to use the `git push` command to push code directly next time, without having to specify a branch or the remote repository. + +#### 6. Create a Pull Request + +(1) Create a pull request in GitHub's Pull request interface + + + +(2) Modify the PR description according to the guidelines so that other developers can better understand your changes + + + +Find more details about Pull Request description in [pull request guidelines](#pr-specs). + +**note** + +(a) The Pull Request description should contain the reason for the change, the content of the change, and the impact of the change, and be associated with the relevant Issue (see [documentation](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue) + +(b) If it is your first contribution, please sign the CLA + + + +(c) Check whether the Pull Request pass through the CI + + + +MMCV will run unit test for the posted Pull Request on different platforms (Linux, Window, Mac), based on different versions of Python, PyTorch, CUDA to make sure the code is correct. We can see the specific test information by clicking `Details` in the above image so that we can modify the code. 
+ +(3) If the Pull Request passes the CI, then you can wait for the review from other developers. You'll modify the code based on the reviewer's comments, and repeat the steps [4](#4-commit-the-code-and-pass-the-unit-test)-[5](#5-push-the-code-to-remote) until all reviewers approve it. Then, we will merge it ASAP. + + + +#### 7. Resolve conflicts + +If your local branch conflicts with the latest master branch of "upstream", you'll need to resolove them. There are two ways to do this: + +```shell +git fetch --all --prune +git rebase upstream/master +``` + +or + +```shell +git fetch --all --prune +git merge upstream/master +``` + +If you are very good at handling conflicts, then you can use rebase to resolve conflicts, as this will keep your commit logs tidy. If you are not familiar with `rebase`, then you can use `merge` to resolve conflicts. + +### Guidance + +#### Unit test + +If you cannot run the unit test of some modules for lacking of some dependencies, such as [video](https://github.com/open-mmlab/mmcv/tree/master/mmcv/video) module, you can try to install the following dependencies: + +```shell +# Linux +sudo apt-get update -y +sudo apt-get install -y libturbojpeg +sudo apt-get install -y ffmpeg + +# Windows +conda install ffmpeg +``` + +We should also make sure the committed code will not decrease the coverage of unit test, we could run the following command to check the coverage of unit test: + +```shell +python -m coverage run -m pytest /path/to/test_file +python -m coverage html +# check file in htmlcov/index.html +``` + +#### Document rendering + +If the documents are modified/added, we should check the rendering result. 
We could install the dependencies and run the following command to render the documents and check the results: + +```shell +pip install -r requirements/docs.txt +cd docs/zh_cn/ +# or docs/en +make html +# check file in ./docs/zh_cn/_build/html/index.html +``` + +### Code style + +#### Python + +We adopt [PEP8](https://www.python.org/dev/peps/pep-0008/) as the preferred code style. + +We use the following tools for linting and formatting: + +- [flake8](https://github.com/PyCQA/flake8): A wrapper around some linter tools. +- [isort](https://github.com/timothycrosley/isort): A Python utility to sort imports. +- [yapf](https://github.com/google/yapf): A formatter for Python files. +- [codespell](https://github.com/codespell-project/codespell): A Python utility to fix common misspellings in text files. +- [mdformat](https://github.com/executablebooks/mdformat): Mdformat is an opinionated Markdown formatter that can be used to enforce a consistent style in Markdown files. +- [docformatter](https://github.com/myint/docformatter): A formatter to format docstring. + +Style configurations of yapf and isort can be found in [setup.cfg](./setup.cfg). + +We use [pre-commit hook](https://pre-commit.com/) that checks and formats for `flake8`, `yapf`, `isort`, `trailing whitespaces`, `markdown files`, +fixes `end-of-files`, `float-quoted-strings`, `python-encoding-pragma`, `mixed-line-ending`, sorts `requirments.txt` automatically on every commit. +The config for a pre-commit hook is stored in [.pre-commit-config](./.pre-commit-config.yaml). + +#### C++ and CUDA + +We follow the [Google C++ Style Guide](https://google.github.io/styleguide/cppguide.html). + +### PR Specs + +1. Use [pre-commit](https://pre-commit.com) hook to avoid issues of code style + +2. One short-time branch should be matched with only one PR + +3. Accomplish a detailed change in one PR. 
Avoid large PR + + - Bad: Support Faster R-CNN + - Acceptable: Add a box head to Faster R-CNN + - Good: Add a parameter to box head to support custom conv-layer number + +4. Provide clear and significant commit message + +5. Provide clear and meaningful PR description + + - Task name should be clarified in title. The general format is: \[Prefix\] Short description of the PR (Suffix) + - Prefix: add new feature \[Feature\], fix bug \[Fix\], related to documents \[Docs\], in developing \[WIP\] (which will not be reviewed temporarily) + - Introduce main changes, results and influences on other modules in short description + - Associate related issues and pull requests with a milestone diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/community/pr.md b/cv/distiller/CWD/pytorch/mmcv/docs/en/community/pr.md new file mode 100644 index 0000000000000000000000000000000000000000..1bdd90f2bc41867e5c17403690f6a35cfe2c07b7 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/en/community/pr.md @@ -0,0 +1,3 @@ +## Pull Request (PR) + +Content has been migrated to [contributing guidance](contributing.md). diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/compatibility.md b/cv/distiller/CWD/pytorch/mmcv/docs/en/compatibility.md new file mode 100644 index 0000000000000000000000000000000000000000..c8618388f4bfb9fd7caee84e61e193b323aa852f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/en/compatibility.md @@ -0,0 +1,176 @@ +### v1.3.18 + +Some ops have different implementations on different devices. Lots of macros and type checks are scattered in several files, which makes the code hard to maintain. 
For example: + +```c++ + if (input.device().is_cuda()) { +#ifdef MMCV_WITH_CUDA + CHECK_CUDA_INPUT(input); + CHECK_CUDA_INPUT(rois); + CHECK_CUDA_INPUT(output); + CHECK_CUDA_INPUT(argmax_y); + CHECK_CUDA_INPUT(argmax_x); + + roi_align_forward_cuda(input, rois, output, argmax_y, argmax_x, + aligned_height, aligned_width, spatial_scale, + sampling_ratio, pool_mode, aligned); +#else + AT_ERROR("RoIAlign is not compiled with GPU support"); +#endif + } else { + CHECK_CPU_INPUT(input); + CHECK_CPU_INPUT(rois); + CHECK_CPU_INPUT(output); + CHECK_CPU_INPUT(argmax_y); + CHECK_CPU_INPUT(argmax_x); + roi_align_forward_cpu(input, rois, output, argmax_y, argmax_x, + aligned_height, aligned_width, spatial_scale, + sampling_ratio, pool_mode, aligned); + } +``` + +Registry and dispatcher are added to manage these implementations. + +```c++ + +void ROIAlignForwardCUDAKernelLauncher(Tensor input, Tensor rois, Tensor output, + Tensor argmax_y, Tensor argmax_x, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + int pool_mode, bool aligned); + +void roi_align_forward_cuda(Tensor input, Tensor rois, Tensor output, + Tensor argmax_y, Tensor argmax_x, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + int pool_mode, bool aligned) { + ROIAlignForwardCUDAKernelLauncher( + input, rois, output, argmax_y, argmax_x, aligned_height, aligned_width, + spatial_scale, sampling_ratio, pool_mode, aligned); +} + +// register cuda implementation +void roi_align_forward_impl(Tensor input, Tensor rois, Tensor output, + Tensor argmax_y, Tensor argmax_x, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + int pool_mode, bool aligned); +REGISTER_DEVICE_IMPL(roi_align_forward_impl, CUDA, roi_align_forward_cuda); + +// roi_align.cpp +// use the dispatcher to invoke different implementation depending on device type of input tensors. 
+void roi_align_forward_impl(Tensor input, Tensor rois, Tensor output, + Tensor argmax_y, Tensor argmax_x, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + int pool_mode, bool aligned) { + DISPATCH_DEVICE_IMPL(roi_align_forward_impl, input, rois, output, argmax_y, + argmax_x, aligned_height, aligned_width, spatial_scale, + sampling_ratio, pool_mode, aligned); +} + +``` + +### v1.3.11 + +In order to flexibly support more backends and hardwares like `NVIDIA GPUs` and `AMD GPUs`, the directory of `mmcv/ops/csrc` is refactored. Note that this refactoring will not affect the usage in API. For related information, please refer to [PR1206](https://github.com/open-mmlab/mmcv/pull/1206). + +The original directory was organized as follows. + +``` +. +├── common_cuda_helper.hpp +├── ops_cuda_kernel.cuh +├── pytorch_cpp_helper.hpp +├── pytorch_cuda_helper.hpp +├── parrots_cpp_helper.hpp +├── parrots_cuda_helper.hpp +├── parrots_cudawarpfunction.cuh +├── onnxruntime +│   ├── onnxruntime_register.h +│   ├── onnxruntime_session_options_config_keys.h +│   ├── ort_mmcv_utils.h +│   ├── ... +│   ├── onnx_ops.h +│   └── cpu +│ ├── onnxruntime_register.cpp +│      ├── ... +│      └── onnx_ops_impl.cpp +├── parrots +│   ├── ... +│   ├── ops.cpp +│   ├── ops_cuda.cu +│   ├── ops_parrots.cpp +│   └── ops_pytorch.h +├── pytorch +│   ├── ... +│   ├── ops.cpp +│   ├── ops_cuda.cu +│   ├── pybind.cpp +└── tensorrt + ├── trt_cuda_helper.cuh + ├── trt_plugin_helper.hpp + ├── trt_plugin.hpp + ├── trt_serialize.hpp + ├── ... + ├── trt_ops.hpp + └── plugins +    ├── trt_cuda_helper.cu +    ├── trt_plugin.cpp +    ├── ... +    ├── trt_ops.cpp +    └── trt_ops_kernel.cu +``` + +After refactored, it is organized as follows. + +``` +. 
+├── common +│ ├── box_iou_rotated_utils.hpp +│ ├── parrots_cpp_helper.hpp +│ ├── parrots_cuda_helper.hpp +│ ├── pytorch_cpp_helper.hpp +│ ├── pytorch_cuda_helper.hpp +│   └── cuda +│   ├── common_cuda_helper.hpp +│   ├── parrots_cudawarpfunction.cuh +│   ├── ... +│   └── ops_cuda_kernel.cuh +├── onnxruntime +│   ├── onnxruntime_register.h +│   ├── onnxruntime_session_options_config_keys.h +│   ├── ort_mmcv_utils.h +│   ├── ... +│   ├── onnx_ops.h +│   └── cpu +│ ├── onnxruntime_register.cpp +│      ├── ... +│      └── onnx_ops_impl.cpp +├── parrots +│   ├── ... +│   ├── ops.cpp +│   ├── ops_parrots.cpp +│   └── ops_pytorch.h +├── pytorch +│   ├── info.cpp +│   ├── pybind.cpp +│   ├── ... +│   ├── ops.cpp +│   └── cuda +│      ├── ... +│      └── ops_cuda.cu +└── tensorrt + ├── trt_cuda_helper.cuh + ├── trt_plugin_helper.hpp + ├── trt_plugin.hpp + ├── trt_serialize.hpp + ├── ... + ├── trt_ops.hpp + └── plugins +    ├── trt_cuda_helper.cu +    ├── trt_plugin.cpp +    ├── ... +    ├── trt_ops.cpp +    └── trt_ops_kernel.cu +``` diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/conf.py b/cv/distiller/CWD/pytorch/mmcv/docs/en/conf.py new file mode 100644 index 0000000000000000000000000000000000000000..471bd225adeede01787a236ac0d370d0056b960a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/en/conf.py @@ -0,0 +1,215 @@ +# +# Configuration file for the Sphinx documentation builder. +# +# This file does only contain a selection of the most common options. For a +# full list see the documentation: +# http://www.sphinx-doc.org/en/master/config + +# -- Path setup -------------------------------------------------------------- + +# If extensions (or modules to document with autodoc) are in another directory, +# add these directories to sys.path here. If the directory is relative to the +# documentation root, use os.path.abspath to make it absolute, like shown here. 
+# +import os +import sys + +import pytorch_sphinx_theme +from sphinx.builders.html import StandaloneHTMLBuilder + +sys.path.insert(0, os.path.abspath('../..')) + +version_file = '../../mmcv/version.py' +with open(version_file) as f: + exec(compile(f.read(), version_file, 'exec')) +__version__ = locals()['__version__'] + +# -- Project information ----------------------------------------------------- + +project = 'mmcv' +copyright = '2018-2022, OpenMMLab' +author = 'MMCV Authors' + +# The short X.Y version +version = __version__ +# The full version, including alpha/beta/rc tags +release = __version__ + +# -- General configuration --------------------------------------------------- + +# If your documentation needs a minimal Sphinx version, state it here. +# +# needs_sphinx = '1.0' + +# Add any Sphinx extension module names here, as strings. They can be +# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom +# ones. + +extensions = [ + 'sphinx.ext.autodoc', + 'sphinx.ext.autosummary', + 'sphinx.ext.intersphinx', + 'sphinx.ext.napoleon', + 'sphinx.ext.viewcode', + 'sphinx_markdown_tables', + 'myst_parser', + 'sphinx_copybutton', +] # yapf: disable + +myst_heading_anchors = 4 + +myst_enable_extensions = ['colon_fence'] + +# Configuration for intersphinx +intersphinx_mapping = { + 'python': ('https://docs.python.org/3', None), + 'numpy': ('https://numpy.org/doc/stable', None), + 'torch': ('https://pytorch.org/docs/stable/', None), + 'mmengine': ('https://mmengine.readthedocs.io/en/latest', None), +} + +autodoc_mock_imports = ['mmcv._ext', 'mmcv.utils.ext_loader', 'torchvision'] + +# Add any paths that contain templates here, relative to this directory. +templates_path = ['_templates'] + +# The suffix(es) of source filenames. +# You can specify multiple suffix as a list of string: +# +source_suffix = { + '.rst': 'restructuredtext', + '.md': 'markdown', +} + +# The master toctree document. 
+master_doc = 'index' + +# The language for content autogenerated by Sphinx. Refer to documentation +# for a list of supported languages. +# +# This is also used if you do content translation via gettext catalogs. +# Usually you set "language" from the command line for these cases. +language = None + +# List of patterns, relative to source directory, that match files and +# directories to ignore when looking for source files. +# This pattern also affects html_static_path and html_extra_path. +exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store'] + +# The name of the Pygments (syntax highlighting) style to use. +pygments_style = 'sphinx' + +# -- Options for HTML output ------------------------------------------------- + +# The theme to use for HTML and HTML Help pages. See the documentation for +# a list of builtin themes. +# +# html_theme = 'sphinx_rtd_theme' +html_theme = 'pytorch_sphinx_theme' +html_theme_path = [pytorch_sphinx_theme.get_html_theme_path()] + +# Theme options are theme-specific and customize the look and feel of a theme +# further. For a list of options available for each theme, see the +# documentation. +# +html_theme_options = { + 'menu': [ + { + 'name': 'GitHub', + 'url': 'https://github.com/open-mmlab/mmcv' + }, + ], + # Specify the language of shared menu + 'menu_lang': 'en', +} + +# Add any paths that contain custom static files (such as style sheets) here, +# relative to this directory. They are copied after the builtin static files, +# so a file named "default.css" will overwrite the builtin "default.css". +html_static_path = ['_static'] +html_css_files = ['css/readthedocs.css'] + +# Custom sidebar templates, must be a dictionary that maps document names +# to template names. +# +# The default sidebars (for documents that don't match any pattern) are +# defined by theme itself. Builtin themes are using these templates by +# default: ``['localtoc.html', 'relations.html', 'sourcelink.html', +# 'searchbox.html']``. 
+# +# html_sidebars = {} + +# -- Options for HTMLHelp output --------------------------------------------- + +# Output file base name for HTML help builder. +htmlhelp_basename = 'mmcvdoc' + +# -- Options for LaTeX output ------------------------------------------------ + +latex_elements = { + # The paper size ('letterpaper' or 'a4paper'). + # + # 'papersize': 'letterpaper', + + # The font size ('10pt', '11pt' or '12pt'). + # + # 'pointsize': '10pt', + + # Additional stuff for the LaTeX preamble. + # + # 'preamble': '', + + # Latex figure (float) alignment + # + # 'figure_align': 'htbp', +} + +# Grouping the document tree into LaTeX files. List of tuples +# (source start file, target name, title, +# author, documentclass [howto, manual, or own class]). +latex_documents = [ + (master_doc, 'mmcv.tex', 'mmcv Documentation', 'MMCV Contributors', + 'manual'), +] + +# -- Options for manual page output ------------------------------------------ + +# One entry per manual page. List of tuples +# (source start file, name, description, authors, manual section). +man_pages = [(master_doc, 'mmcv', 'mmcv Documentation', [author], 1)] + +# -- Options for Texinfo output ---------------------------------------------- + +# Grouping the document tree into Texinfo files. List of tuples +# (source start file, target name, title, author, +# dir menu entry, description, category) +texinfo_documents = [ + (master_doc, 'mmcv', 'mmcv Documentation', author, 'mmcv', + 'One line description of project.', 'Miscellaneous'), +] + +# -- Options for Epub output ------------------------------------------------- + +# Bibliographic Dublin Core info. +epub_title = project + +# The unique identifier of the text. This can be a ISBN number +# or the project homepage. +# +# epub_identifier = '' + +# A unique identification for the text. +# +# epub_uid = '' + +# A list of files that should not be packed into the epub file. 
+epub_exclude_files = ['search.html'] + +# set priority when building html +StandaloneHTMLBuilder.supported_image_types = [ + 'image/svg+xml', 'image/gif', 'image/png', 'image/jpeg' +] +# -- Extension configuration ------------------------------------------------- +# Ignore >>> when copying code +copybutton_prompt_text = r'>>> |\.\.\. ' +copybutton_prompt_is_regexp = True diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/deployment/mmcv_ops_definition.md b/cv/distiller/CWD/pytorch/mmcv/docs/en/deployment/mmcv_ops_definition.md new file mode 100644 index 0000000000000000000000000000000000000000..d7eabb33fd41855116ed975d4e48daea81e4d74d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/en/deployment/mmcv_ops_definition.md @@ -0,0 +1,686 @@ +# MMCV Operators + +To make custom operators in MMCV more standard, precise definitions of each operator are listed in this document. + + + +- [MMCV Operators](#mmcv-operators) + - [MMCVBorderAlign](#mmcvborderalign) + - [Description](#description) + - [Parameters](#parameters) + - [Inputs](#inputs) + - [Outputs](#outputs) + - [Type Constraints](#type-constraints) + - [MMCVCARAFE](#mmcvcarafe) + - [Description](#description-1) + - [Parameters](#parameters-1) + - [Inputs](#inputs-1) + - [Outputs](#outputs-1) + - [Type Constraints](#type-constraints-1) + - [MMCVCAWeight](#mmcvcaweight) + - [Description](#description-2) + - [Parameters](#parameters-2) + - [Inputs](#inputs-2) + - [Outputs](#outputs-2) + - [Type Constraints](#type-constraints-2) + - [MMCVCAMap](#mmcvcamap) + - [Description](#description-3) + - [Parameters](#parameters-3) + - [Inputs](#inputs-3) + - [Outputs](#outputs-3) + - [Type Constraints](#type-constraints-3) + - [MMCVCornerPool](#mmcvcornerpool) + - [Description](#description-4) + - [Parameters](#parameters-4) + - [Inputs](#inputs-4) + - [Outputs](#outputs-4) + - [Type Constraints](#type-constraints-4) + - [MMCVDeformConv2d](#mmcvdeformconv2d) + - [Description](#description-5) + - [Parameters](#parameters-5) 
+ - [Inputs](#inputs-5) + - [Outputs](#outputs-5) + - [Type Constraints](#type-constraints-5) + - [MMCVModulatedDeformConv2d](#mmcvmodulateddeformconv2d) + - [Description](#description-6) + - [Parameters](#parameters-6) + - [Inputs](#inputs-6) + - [Outputs](#outputs-6) + - [Type Constraints](#type-constraints-6) + - [MMCVDeformRoIPool](#mmcvdeformroipool) + - [Description](#description-7) + - [Parameters](#parameters-7) + - [Inputs](#inputs-7) + - [Outputs](#outputs-7) + - [Type Constraints](#type-constraints-7) + - [MMCVMaskedConv2d](#mmcvmaskedconv2d) + - [Description](#description-8) + - [Parameters](#parameters-8) + - [Inputs](#inputs-8) + - [Outputs](#outputs-8) + - [Type Constraints](#type-constraints-8) + - [MMCVPSAMask](#mmcvpsamask) + - [Description](#description-9) + - [Parameters](#parameters-9) + - [Inputs](#inputs-9) + - [Outputs](#outputs-9) + - [Type Constraints](#type-constraints-9) + - [NonMaxSuppression](#nonmaxsuppression) + - [Description](#description-10) + - [Parameters](#parameters-10) + - [Inputs](#inputs-10) + - [Outputs](#outputs-10) + - [Type Constraints](#type-constraints-10) + - [MMCVRoIAlign](#mmcvroialign) + - [Description](#description-11) + - [Parameters](#parameters-11) + - [Inputs](#inputs-11) + - [Outputs](#outputs-11) + - [Type Constraints](#type-constraints-11) + - [MMCVRoIAlignRotated](#mmcvroialignrotated) + - [Description](#description-12) + - [Parameters](#parameters-12) + - [Inputs](#inputs-12) + - [Outputs](#outputs-12) + - [Type Constraints](#type-constraints-12) + - [grid_sampler\*](#grid_sampler) + - [Description](#description-13) + - [Parameters](#parameters-13) + - [Inputs](#inputs-13) + - [Outputs](#outputs-13) + - [Type Constraints](#type-constraints-13) + - [cummax\*](#cummax) + - [Description](#description-14) + - [Parameters](#parameters-14) + - [Inputs](#inputs-14) + - [Outputs](#outputs-14) + - [Type Constraints](#type-constraints-14) + - [cummin\*](#cummin) + - [Description](#description-15) + - 
[Parameters](#parameters-15) + - [Inputs](#inputs-15) + - [Outputs](#outputs-15) + - [Type Constraints](#type-constraints-15) + - [Reminders](#reminders) + + + +## MMCVBorderAlign + +### Description + +Applies `border_align` over the input feature based on predicted bboxes. + +For each border line (e.g. top, left, bottom or right) of each box, +border_align does the following: + +- uniformly samples `pool_size`+1 positions on this line, involving the start and end points. +- the corresponding features on these points are computed by bilinear interpolation. +- max pooling over all the `pool_size`+1 positions are used for computing pooled feature. + +Read [BorderDet: Border Feature for Dense Object Detection](https://arxiv.org/abs/2007.11056) for more detailed information. + +### Parameters + +| Type | Parameter | Description | +| ----- | ----------- | ----------------------------------------------------------------------------------- | +| `int` | `pool_size` | number of positions sampled over the boxes' borders(e.g. top, bottom, left, right). | + +### Inputs + +
+
input: T
+
Features with shape [N,4C,H,W]. Channels ranged in [0,C), [C,2C), [2C,3C), [3C,4C) represent the top, left, bottom, right features respectively
+
boxes: T
+
Boxes with shape [N,H*W,4]. Coordinate format (x1,y1,x2,y2).
+
+ +### Outputs + +
+
output: T
+
Pooled features with shape [N,C,H*W,4]. The order is (top,left,bottom,right) for the last dimension.
+
+ +### Type Constraints + +- T:tensor(float32) + +## MMCVCARAFE + +### Description + +CARAFE operator performs feature upsampling. + +Read [CARAFE: Content-Aware ReAssembly of FEatures](https://arxiv.org/abs/1905.02188) for more detailed information. + +### Parameters + +| Type | Parameter | Description | +| ------- | -------------- | --------------------------------------------- | +| `int` | `kernel_size` | reassemble kernel size, should be odd integer | +| `int` | `group_size` | reassemble group size | +| `float` | `scale_factor` | upsample ratio(>=1) | + +### Inputs + +
+
features: T
+
Input features. 4-D tensor of shape (N, C, H, W). N is the batch size.
+
masks: T
+
The input mask
+
+ +### Outputs + +
+
output: T
+
The upsampled features. 4-D tensor of shape (N, C, H * scale_factor, W * scale_factor). N is the batch size.
+
+ +### Type Constraints + +- T:tensor(float32) + +## MMCVCAWeight + +### Description + +Operator for Criss-Cross Attention +Read [CCNet: Criss-Cross Attention for SemanticSegmentation](https://arxiv.org/pdf/1811.11721.pdf) for more detailed information. + +### Parameters + +None + +### Inputs + +
+
t: T
+
The query matrix of shape (N, C', H, W).
+
f: T
+
The key matrix of shape (N, C', H, W).
+
+ +### Outputs + +
+
weight: T
+
The attention map of shape (N, H+W-1, H, W).
+
+ +### Type Constraints + +- T:tensor(float32) + +## MMCVCAMap + +### Description + +Operator for Criss-Cross Attention +Read [CCNet: Criss-Cross Attention for SemanticSegmentation](https://arxiv.org/pdf/1811.11721.pdf) for more detailed information. + +### Parameters + +None + +### Inputs + +
+
weight: T
+
Output from the operator MMCVCAWeight.
+
value: T
+
The value matrix of shape (N, C, H, W).
+
+ +### Outputs + +
+
output: T
+
Output tensor of aggregated contextual information
+
+ +### Type Constraints + +- T:tensor(float32) + +## MMCVCornerPool + +### Description + +Perform CornerPool on `input` features. Read [CornerNet -- Detecting Objects as Paired Keypoints](https://arxiv.org/abs/1808.01244) for more details. + +### Parameters + +| Type | Parameter | Description | +| ----- | --------- | ---------------------------------------------------------------- | +| `int` | `mode` | corner pool mode, (0: `top`, 1: `bottom`, 2: `left`, 3: `right`) | + +### Inputs + +
+
input: T
+
Input features. 4-D tensor of shape (N, C, H, W). N is the batch size.
+
+ +### Outputs + +
+
output: T
+
The pooled features. 4-D tensor of shape (N, C, H, W).
+
+ +### Type Constraints + +- T:tensor(float32) + +## MMCVDeformConv2d + +### Description + +Applies a deformable 2D convolution over an input signal composed of several input planes. + +Read [Deformable Convolutional Networks](https://arxiv.org/pdf/1703.06211.pdf) for detail. + +### Parameters + +| Type | Parameter | Description | +| -------------- | ------------------- | ----------------------------------------------------------------------------------------------------------------- | +| `list of ints` | `stride` | The stride of the convolving kernel, (sH, sW). Defaults to `(1, 1)`. | +| `list of ints` | `padding` | Paddings on both sides of the input, (padH, padW). Defaults to `(0, 0)`. | +| `list of ints` | `dilation` | The spacing between kernel elements (dH, dW). Defaults to `(1, 1)`. | +| `int` | `groups` | Split input into groups. `input_channel` should be divisible by the number of groups. Defaults to `1`. | +| `int` | `deformable_groups` | Groups of deformable offset. Defaults to `1`. | +| `int` | `bias` | Whether to add a learnable bias to the output. `0` stands for `False` and `1` stands for `True`. Defaults to `0`. | +| `int` | `im2col_step` | Groups of deformable offset. Defaults to `32`. | + +### Inputs + +
+
input: T
+
Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of channels, inH and inW are the height and width of the data.
+
offset: T
+
Input offset; 4-D tensor of shape (N, deformable_group* 2* kH* kW, outH, outW), where kH and kW are the height and width of weight, outH and outW is the height and width of offset and output.
+
weight: T
+
Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).
+
+ +### Outputs + +
+
output: T
+
Output feature; 4-D tensor of shape (N, output_channel, outH, outW).
+
+ +### Type Constraints + +- T:tensor(float32, Linear) + +## MMCVModulatedDeformConv2d + +### Description + +Perform Modulated Deformable Convolution on input feature, read [Deformable ConvNets v2: More Deformable, Better Results](https://arxiv.org/abs/1811.11168?from=timeline) for detail. + +### Parameters + +| Type | Parameter | Description | +| -------------- | ------------------- | ------------------------------------------------------------------------------------- | +| `list of ints` | `stride` | The stride of the convolving kernel. (sH, sW) | +| `list of ints` | `padding` | Paddings on both sides of the input. (padH, padW) | +| `list of ints` | `dilation` | The spacing between kernel elements. (dH, dW) | +| `int` | `deformable_groups` | Groups of deformable offset. | +| `int` | `groups` | Split input into groups. `input_channel` should be divisible by the number of groups. | + +### Inputs + +
+
feature: T
+
Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of channels, inH and inW are the height and width of the data.
+
offset: T
+
Input offset; 4-D tensor of shape (N, deformable_group* 2* kH* kW, outH, outW), where kH and kW are the height and width of weight, outH and outW are the height and width of offset and output.
+
mask: T
+
Input mask; 4-D tensor of shape (N, deformable_group* kH* kW, outH, outW), where kH and kW are the height and width of weight, outH and outW are the height and width of offset and output.
+
weight: T
+
Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).
+
bias: T, optional
+
Input bias; 1-D tensor of shape (output_channel).
+
+ +### Outputs + +
+
output: T
+
Output feature; 4-D tensor of shape (N, output_channel, outH, outW).
+
+ +### Type Constraints + +- T:tensor(float32, Linear) + +## MMCVDeformRoIPool + +### Description + +Deformable roi pooling layer + +### Parameters + +| Type | Parameter | Description | +| ------- | ---------------- | ------------------------------------------------------------------------------------------------------------- | +| `int` | `output_height` | height of output roi | +| `int` | `output_width` | width of output roi | +| `float` | `spatial_scale` | used to scale the input boxes | +| `int` | `sampling_ratio` | number of input samples to take for each output sample. `0` means to take samples densely for current models. | +| `float` | `gamma` | gamma | + +### Inputs + +
+
input: T
+
Input feature map; 4D tensor of shape (N, C, H, W), where N is the batch size, C is the numbers of channels, H and W are the height and width of the data.
+
rois: T
+
RoIs (Regions of Interest) to pool over; 2-D tensor of shape (num_rois, 5) given as [[batch_index, x1, y1, x2, y2], ...]. The RoIs' coordinates are the coordinate system of input.
+
offset: T
+
offset of height and width. Defaults to a tensor of zeros
+
+ +### Outputs + +
+
feat: T
+
RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element feat[r-1] is a pooled feature map corresponding to the r-th RoI RoIs[r-1].
+
+ +### Type Constraints + +- T:tensor(float32) + +## MMCVMaskedConv2d + +### Description + +Performs a masked 2D convolution from PixelRNN +Read [Pixel Recurrent Neural Networks](https://arxiv.org/abs/1601.06759) for more detailed information. + +### Parameters + +| Type | Parameter | Description | +| -------------- | --------- | -------------------------------------------------------------------------------- | +| `list of ints` | `stride` | The stride of the convolving kernel. (sH, sW). **Only support stride=1 in mmcv** | +| `list of ints` | `padding` | Paddings on both sides of the input. (padH, padW). Defaults to `(0, 0)`. | + +### Inputs + +
+
features: T
+
Input features; 4D tensor of shape (N, C, H, W), where N is the batch size, C is the numbers of channels, H and W are the height and width of the data.
+
mask: T
+
Input mask; 3D tensor of shape (N, H, W)
+
weight: T
+
The learnable weights of the module
+
bias: T
+
The learnable bias of the module
+
+ +### Outputs + +
+
output: T
+
The output convolved feature
+
+ +### Type Constraints + +- T:tensor(float32) + +## MMCVPSAMask + +### Description + +An operator from PSANet. + +Read [PSANet: Point-wise Spatial Attention Network for Scene Parsing](https://hszhao.github.io/papers/eccv18_psanet.pdf) for more detailed information. + +### Parameters + +| Type | Parameter | Description | +| -------------- | ----------- | -------------------------------------------- | +| `int` | `psa_type` | `0` means collect and `1` means `distribute` | +| `list of ints` | `mask_size` | The size of mask | + +### Inputs + +
+
input: T
+
Input feature; 4D tensor of shape (N, C, H, W), where N is the batch size, C is the numbers of channels, H and W are the height and width of the data.
+
+ +### Outputs + +
+
output: T
+
Output tensor of shape (N, H * W, H, W)
+
+ +### Type Constraints + +- T:tensor(float32) + +## NonMaxSuppression + +### Description + +Filter out boxes has high IoU overlap with previously selected boxes or low score. Output the indices of valid boxes. + +Note this definition is slightly different with [onnx: NonMaxSuppression](https://github.com/onnx/onnx/blob/master/docs/Operators.md#nonmaxsuppression) + +### Parameters + +| Type | Parameter | Description | +| ------- | ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------ | +| `int` | `center_point_box` | 0 - the box data is supplied as \[y1, x1, y2, x2\], 1-the box data is supplied as \[x_center, y_center, width, height\]. | +| `int` | `max_output_boxes_per_class` | The maximum number of boxes to be selected per batch per class. Default to 0, number of output boxes equal to number of input boxes. | +| `float` | `iou_threshold` | The threshold for deciding whether boxes overlap too much with respect to IoU. Value range \[0, 1\]. Default to 0. | +| `float` | `score_threshold` | The threshold for deciding when to remove boxes based on score. | +| `int` | `offset` | 0 or 1, boxes' width or height is (x2 - x1 + offset). | + +### Inputs + +
+
boxes: T
+
Input boxes. 3-D tensor of shape (num_batches, spatial_dimension, 4).
+
scores: T
+
Input scores. 3-D tensor of shape (num_batches, num_classes, spatial_dimension).
+
+ +### Outputs + +
+
indices: tensor(int32, Linear)
+
Selected indices. 2-D tensor of shape (num_selected_indices, 3) as [[batch_index, class_index, box_index], ...].
+
num_selected_indices=num_batches* num_classes* min(max_output_boxes_per_class, spatial_dimension).
+
All invalid indices will be filled with -1.
+
+ +### Type Constraints + +- T:tensor(float32, Linear) + +## MMCVRoIAlign + +### Description + +Perform RoIAlign on output feature, used in bbox_head of most two-stage detectors. + +### Parameters + +| Type | Parameter | Description | +| ------- | ---------------- | ------------------------------------------------------------------------------------------------------------- | +| `int` | `output_height` | height of output roi | +| `int` | `output_width` | width of output roi | +| `float` | `spatial_scale` | used to scale the input boxes | +| `int` | `sampling_ratio` | number of input samples to take for each output sample. `0` means to take samples densely for current models. | +| `str` | `mode` | pooling mode in each bin. `avg` or `max` | +| `int` | `aligned` | If `aligned=0`, use the legacy implementation in MMDetection. Else, align the results more perfectly. | + +### Inputs + +
+
input: T
+
Input feature map; 4D tensor of shape (N, C, H, W), where N is the batch size, C is the numbers of channels, H and W are the height and width of the data.
+
rois: T
+
RoIs (Regions of Interest) to pool over; 2-D tensor of shape (num_rois, 5) given as [[batch_index, x1, y1, x2, y2], ...]. The RoIs' coordinates are the coordinate system of input.
+
+ +### Outputs + +
+
feat: T
+
RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element feat[r-1] is a pooled feature map corresponding to the r-th RoI RoIs[r-1].
+
+ +### Type Constraints + +- T:tensor(float32) + +## MMCVRoIAlignRotated + +### Description + +Perform RoI align pooling for rotated proposals + +### Parameters + +| Type | Parameter | Description | +| ------- | ---------------- | ------------------------------------------------------------------------------------------------------------- | +| `int` | `output_height` | height of output roi | +| `int` | `output_width` | width of output roi | +| `float` | `spatial_scale` | used to scale the input boxes | +| `int` | `sampling_ratio` | number of input samples to take for each output sample. `0` means to take samples densely for current models. | +| `str` | `mode` | pooling mode in each bin. `avg` or `max` | +| `int` | `aligned` | If `aligned=0`, use the legacy implementation in MMDetection. Else, align the results more perfectly. | +| `int` | `clockwise` | If `aligned=0`, use the legacy implementation in MMDetection. Else, align the results more perfectly. | + +### Inputs + +
+
features: T
+
Input feature map; 4D tensor of shape (N, C, H, W)
+
rois: T
+
RoIs (Regions of Interest) to pool over; 2-D tensor of shape (num_rois, 5) given as [[batch_index, x1, y1, x2, y2], ...]. The RoIs' coordinates are the coordinate system of input.
+
+ +### Outputs + +
+
feat: T
+
RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element feat[r-1] is a pooled feature map corresponding to the r-th RoI RoIs[r-1].
+
+ +### Type Constraints + +- T:tensor(float32) + +## grid_sampler\* + +### Description + +Perform sample from `input` with pixel locations from `grid`. + +Check [torch.nn.functional.grid_sample](https://pytorch.org/docs/stable/generated/torch.nn.functional.grid_sample.html?highlight=grid_sample#torch.nn.functional.grid_sample) for more information. + +### Parameters + +| Type | Parameter | Description | +| ----- | -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `int` | `interpolation_mode` | Interpolation mode to calculate output values. (0: `bilinear` , 1: `nearest`) | +| `int` | `padding_mode` | Padding mode for outside grid values. (0: `zeros`, 1: `border`, 2: `reflection`) | +| `int` | `align_corners` | If `align_corners=1`, the extrema (`-1` and `1`) are considered as referring to the center points of the input's corner pixels. If `align_corners=0`, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic. | + +### Inputs + +
+
input: T
+
Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the numbers of channels, inH and inW are the height and width of the data.
+
grid: T
+
Input offset; 4-D tensor of shape (N, outH, outW, 2), where outH and outW are the height and width of offset and output.
+
+ +### Outputs + +
+
output: T
+
Output feature; 4-D tensor of shape (N, C, outH, outW).
+
+ +### Type Constraints + +- T:tensor(float32, Linear) + +## cummax\* + +### Description + +Returns a tuple (`values`, `indices`) where `values` is the cumulative maximum elements of `input` in the dimension `dim`. And `indices` is the index location of each maximum value found in the dimension `dim`. Read [torch.cummax](https://pytorch.org/docs/stable/generated/torch.cummax.html) for more details. + +### Parameters + +| Type | Parameter | Description | +| ----- | --------- | -------------------------------------- | +| `int` | `dim` | the dimension to do the operation over | + +### Inputs + +
+
input: T
+
The input tensor with various shapes. Tensor with empty element is also supported.
+
+ +### Outputs + +
+
output: T
+
Output the cumulative maximum elements of `input` in the dimension `dim`, with the same shape and dtype as `input`.
+
indices: tensor(int64)
+
Output the index location of each cumulative maximum value found in the dimension `dim`, with the same shape as `input`.
+
+ +### Type Constraints + +- T:tensor(float32) + +## cummin\* + +### Description + +Returns a tuple (`values`, `indices`) where `values` is the cumulative minimum elements of `input` in the dimension `dim`. And `indices` is the index location of each minimum value found in the dimension `dim`. Read [torch.cummin](https://pytorch.org/docs/stable/generated/torch.cummin.html) for more details. + +### Parameters + +| Type | Parameter | Description | +| ----- | --------- | -------------------------------------- | +| `int` | `dim` | the dimension to do the operation over | + +### Inputs + +
+
input: T
+
The input tensor with various shapes. Tensor with empty element is also supported.
+
+ +### Outputs + +
+
output: T
+
Output the cumulative minimum elements of `input` in the dimension `dim`, with the same shape and dtype as `input`.
+
indices: tensor(int64)
+
Output the index location of each cumulative minimum value found in the dimension `dim`, with the same shape as `input`.
+
+
+### Type Constraints
+
+- T:tensor(float32)
+
+## Reminders
+
+- Operators ending with `*` are defined in Torch and are included here for the conversion to ONNX.
diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/docutils.conf b/cv/distiller/CWD/pytorch/mmcv/docs/en/docutils.conf
new file mode 100644
index 0000000000000000000000000000000000000000..0c00c84688701117f231fd0c8ec295fb747b7d8f
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/docs/en/docutils.conf
@@ -0,0 +1,2 @@
+[html writers]
+table_style: colwidths-auto
diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/faq.md b/cv/distiller/CWD/pytorch/mmcv/docs/en/faq.md
new file mode 100644
index 0000000000000000000000000000000000000000..02d31c233a9ff66d5e8f3f288b5d5f64e5c5298c
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/docs/en/faq.md
@@ -0,0 +1,93 @@
+## Frequently Asked Questions
+
+We list some common troubles faced by many users and their corresponding solutions here.
+Feel free to enrich the list if you find any frequent issues and have ways to help others to solve them.
+
+### Installation
+
+- KeyError: "xxx: 'yyy is not in the zzz registry'"
+
+  The registry mechanism will be triggered only when the file of the module is imported.
+  So you need to import that file somewhere. More details can be found at [KeyError: "MaskRCNN: 'RefineRoIHead is not in the models registry'"](https://github.com/open-mmlab/mmdetection/issues/5974).
+
+- "No module named 'mmcv.ops'"; "No module named 'mmcv.\_ext'"
+
+  1. Uninstall existing mmcv in the environment using `pip uninstall mmcv`
+  2. Install mmcv-full following the [installation instruction](https://mmcv.readthedocs.io/en/latest/get_started/installation.html) or [Build MMCV from source](https://mmcv.readthedocs.io/en/latest/get_started/build.html)
+
+- "invalid device function" or "no kernel image is available for execution"
+
+  1. Check the CUDA compute capability of your GPU
+  2. 
Run `python mmdet/utils/collect_env.py` to check whether PyTorch, torchvision, and MMCV are built for the correct GPU architecture. You may need to set `TORCH_CUDA_ARCH_LIST` to reinstall MMCV. The compatibility issue could happen when using old GPUs, e.g., Tesla K80 (3.7) on colab.
+  3. Check whether the running environment is the same as that when mmcv/mmdet is compiled. For example, you may compile mmcv using CUDA 10.0 but run it on CUDA 9.0 environments
+
+- "undefined symbol" or "cannot open xxx.so"
+
+  1. If those symbols are CUDA/C++ symbols (e.g., libcudart.so or GLIBCXX), check
+     whether the CUDA/GCC runtimes are the same as those used for compiling mmcv
+  2. If those symbols are Pytorch symbols (e.g., symbols containing caffe, aten, and TH), check whether the Pytorch version is the same as that used for compiling mmcv
+  3. Run `python mmdet/utils/collect_env.py` to check whether PyTorch, torchvision, and MMCV are built by and running on the same environment
+
+- "RuntimeError: CUDA error: invalid configuration argument"
+
+  This error may be caused by the poor performance of GPU. Try to decrease the value of [THREADS_PER_BLOCK](https://github.com/open-mmlab/mmcv/blob/cac22f8cf5a904477e3b5461b1cc36856c2793da/mmcv/ops/csrc/common_cuda_helper.hpp#L10)
+  and recompile mmcv.
+
+- "RuntimeError: nms is not compiled with GPU support"
+
+  This error is because your CUDA environment is not installed correctly.
+  You may try to re-install your CUDA environment and then delete the build/ folder before re-compiling mmcv.
+
+- "Segmentation fault"
+
+  1. Check your GCC version and use GCC >= 5.4. This is usually caused by the incompatibility between PyTorch and the environment (e.g., GCC \< 4.9 for PyTorch). We also recommend the users to avoid using GCC 5.5 because many users report that GCC 5.5 will cause "segmentation fault" and simply changing it to GCC 5.4 could solve the problem
+  2. Check whether PyTorch is correctly installed and could use CUDA op, e.g. 
type the following command in your terminal and see whether they could correctly output results
+     ```shell
+     python -c 'import torch; print(torch.cuda.is_available())'
+     ```
+  3. If PyTorch is correctly installed, check whether MMCV is correctly installed. If MMCV is correctly installed, then there will be no issue of the command
+     ```shell
+     python -c 'import mmcv; import mmcv.ops'
+     ```
+  4. If MMCV and PyTorch are correctly installed, you can use `ipdb` to set breakpoints or directly add `print` to debug and see which part leads to the `segmentation fault`
+
+- "libtorch_cuda_cu.so: cannot open shared object file"
+
+  `mmcv-full` depends on the shared object but it cannot be found. We can check whether the object exists in `~/miniconda3/envs/{environment-name}/lib/python3.7/site-packages/torch/lib` or try to re-install the PyTorch.
+
+- "fatal error C1189: #error: -- unsupported Microsoft Visual Studio version!"
+
+  If you are building mmcv-full on Windows and the version of CUDA is 9.2, you will probably encounter the error `"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2\include\crt/host_config.h(133): fatal error C1189: #error: -- unsupported Microsoft Visual Studio version! Only the versions 2012, 2013, 2015 and 2017 are supported!"`, in which case you can use a lower version of Microsoft Visual Studio like vs2017.
+
+- "error: member "torch::jit::detail::ModulePolicy::all_slots" may not be initialized"
+
+  If your version of PyTorch is 1.5.0 and you are building mmcv-full on Windows, you will probably encounter the error `- torch/csrc/jit/api/module.h(474): error: member "torch::jit::detail::ModulePolicy::all_slots" may not be initialized`. The way to solve the error is to replace all the `static constexpr bool all_slots = false;` with `static bool all_slots = false;` at this file `https://github.com/pytorch/pytorch/blob/v1.5.0/torch/csrc/jit/api/module.h`. 
More details can be found at [member "torch::jit::detail::AttributePolicy::all_slots" may not be initialized](https://github.com/pytorch/pytorch/issues/39394).
+
+- "error: a member with an in-class initializer must be const"
+
+  If your version of PyTorch is 1.6.0 and you are building mmcv-full on Windows, you will probably encounter the error `"- torch/include\torch/csrc/jit/api/module.h(483): error: a member with an in-class initializer must be const"`. The way to solve the error is to replace all the `CONSTEXPR_EXCEPT_WIN_CUDA ` with `const` at `torch/include\torch/csrc/jit/api/module.h`. More details can be found at [Ninja: build stopped: subcommand failed](https://github.com/open-mmlab/mmcv/issues/575).
+
+- "error: member "torch::jit::ProfileOptionalOp::Kind" may not be initialized"
+
+  If your version of PyTorch is 1.7.0 and you are building mmcv-full on Windows, you will probably encounter the error `torch/include\torch/csrc/jit/ir/ir.h(1347): error: member "torch::jit::ProfileOptionalOp::Kind" may not be initialized`. The way to solve the error needs to modify several local files of PyTorch:
+
+  - delete `static constexpr Symbol Kind = ::c10::prim::profile;` and `static constexpr Symbol Kind = ::c10::prim::profile_optional;` at `torch/include\torch/csrc/jit/ir/ir.h`
+  - replace `explicit operator type&() { return *(this->value); }` with `explicit operator type&() { return *((type*)this->value); }` at `torch\include\pybind11\cast.h`
+  - replace all the `CONSTEXPR_EXCEPT_WIN_CUDA` with `const` at `torch/include\torch/csrc/jit/api/module.h`
+
+  More details can be found at [Ensure default extra_compile_args](https://github.com/pytorch/pytorch/pull/45956).
+
+- Compatibility issue between MMCV and MMDetection; "ConvWS is already registered in conv layer"
+
+  Please install the correct version of MMCV for the version of your MMDetection following the [installation instruction](https://mmdetection.readthedocs.io/en/latest/get_started.html#installation). 
+
+### Usage
+
+- "RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one"
+
+  1. This error indicates that your module has parameters that were not used in producing loss. This phenomenon may be caused by running different branches in your code in DDP mode. More details at [Expected to have finished reduction in the prior iteration before starting a new one](https://github.com/pytorch/pytorch/issues/55582).
+  2. You can set `find_unused_parameters = True` in the config to solve the above problems or find those unused parameters manually
+
+- "RuntimeError: Trying to backward through the graph a second time"
+
+  `GradientCumulativeOptimizerHook` and `OptimizerHook` are both set which causes the `loss.backward()` to be called twice so `RuntimeError` was raised. We can only use one of these. More details at [Trying to backward through the graph a second time](https://github.com/open-mmlab/mmcv/issues/1379).
diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/get_started/build.md b/cv/distiller/CWD/pytorch/mmcv/docs/en/get_started/build.md
new file mode 100644
index 0000000000000000000000000000000000000000..e3d48ec7cf486edece6ea9e622937b08602f5e6e
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/docs/en/get_started/build.md
@@ -0,0 +1,292 @@
+## Build MMCV from source
+
+### Build mmcv
+
+Before installing mmcv, make sure that PyTorch has been successfully installed following the [PyTorch official installation guide](https://pytorch.org/get-started/locally/#start-locally). This can be verified using the following command
+
+```bash
+python -c 'import torch;print(torch.__version__)'
+```
+
+If version information is output, then PyTorch is installed.
+
+```{note}
+If you would like to use `opencv-python-headless` instead of `opencv-python`,
+e.g., in a minimum container environment or servers without GUI,
+you can first install it before installing MMCV to skip the installation of `opencv-python`. 
+``` + +#### Build on Linux + +1. Clone the repo + + ```bash + git clone https://github.com/open-mmlab/mmcv.git + cd mmcv + ``` + +2. Install `ninja` and `psutil` to speed up the compilation + + ```bash + pip install -r requirements/optional.txt + ``` + +3. Check the nvcc version (requires 9.2+. Skip if no GPU available.) + + ```bash + nvcc --version + ``` + + If the above command outputs the following message, it means that the nvcc setting is OK, otherwise you need to set CUDA_HOME. + + ``` + nvcc: NVIDIA (R) Cuda compiler driver + Copyright (c) 2005-2020 NVIDIA Corporation + Built on Mon_Nov_30_19:08:53_PST_2020 + Cuda compilation tools, release 11.2, V11.2.67 + Build cuda_11.2.r11.2/compiler.29373293_0 + ``` + + :::{note} + If you want to support ROCm, you can refer to [AMD ROCm](https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html) to install ROCm. + ::: + +4. Check the gcc version (requires 5.4+) + + ```bash + gcc --version + ``` + +5. Start building (takes 10+ min) + + ```bash + pip install -e . -v + ``` + +6. Validate the installation + + ```bash + python .dev_scripts/check_installation.py + ``` + + If no error is reported by the above command, the installation is successful. If there is an error reported, please check [Frequently Asked Questions](../faq.md) to see if there is already a solution. + + If no solution is found, please feel free to open an [issue](https://github.com/open-mmlab/mmcv/issues). + +#### Build on macOS + +```{note} +If you are using a mac with apple silicon chip, install the PyTorch 1.13+, otherwise you will encounter the problem in [issues#2218](https://github.com/open-mmlab/mmcv/issues/2218). +``` + +1. Clone the repo + + ```bash + git clone https://github.com/open-mmlab/mmcv.git + cd mmcv + ``` + +2. Install `ninja` and `psutil` to speed up the compilation + + ```bash + pip install -r requirements/optional.txt + ``` + +3. Start building + + ```bash + MMCV_WITH_OPS=1 pip install -e . + ``` + +4. 
Validate the installation
+
+   ```bash
+   python .dev_scripts/check_installation.py
+   ```
+
+   If no error is reported by the above command, the installation is successful. If there is an error reported, please check [Frequently Asked Questions](../faq.md) to see if there is already a solution.
+
+   If no solution is found, please feel free to open an [issue](https://github.com/open-mmlab/mmcv/issues).
+
+#### Build on Windows
+
+Building MMCV on Windows is a bit more complicated than that on Linux.
+The following instructions show how to get this accomplished.
+
+##### Prerequisite
+
+The following software is required for building MMCV on Windows.
+Install them first.
+
+- [Git](https://git-scm.com/download/win)
+  - During installation, tick **add git to Path**.
+- [Visual Studio Community 2019](https://visualstudio.microsoft.com)
+  - A compiler for C++ and CUDA codes.
+- [Miniconda](https://docs.conda.io/en/latest/miniconda.html)
+  - Official distributions of Python should work too.
+- [CUDA 10.2](https://developer.nvidia.com/cuda-10.2-download-archive)
+  - Not required for building CPU version.
+  - Customize the installation if necessary. As a recommendation, skip the driver installation if a newer version is already installed.
+
+```{note}
+You should know how to set up environment variables, especially `Path`, on Windows. The following instruction relies heavily on this skill.
+```
+
+##### Common steps
+
+1. Launch Anaconda prompt from Windows Start menu
+
+   Do not use raw `cmd.exe`. This instruction is based on PowerShell syntax.
+
+2. Create a new conda environment
+
+   ```powershell
+   (base) PS C:\Users\xxx> conda create --name mmcv python=3.7
+   (base) PS C:\Users\xxx> conda activate mmcv # make sure to activate environment before any operation
+   ```
+
+3. Install PyTorch. Choose a version based on your need. 
+
+   ```powershell
+   # CUDA version
+   (mmcv) PS C:\Users\xxx> conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
+   # CPU version
+   (mmcv) PS C:\Users\xxx> conda install pytorch torchvision cpuonly -c pytorch
+   ```
+
+4. Clone the repo
+
+   ```powershell
+   (mmcv) PS C:\Users\xxx> git clone https://github.com/open-mmlab/mmcv.git
+   (mmcv) PS C:\Users\xxx> cd mmcv
+   ```
+
+5. Install `ninja` and `psutil` to speed up the compilation
+
+   ```powershell
+   (mmcv) PS C:\Users\xxx\mmcv> pip install -r requirements/optional.txt
+   ```
+
+6. Set up MSVC compiler
+
+   Set Environment variable, add `C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\bin\Hostx86\x64` to `PATH`, so that `cl.exe` will be available in prompt, as shown below.
+
+   ```powershell
+   (mmcv) PS C:\Users\xxx\mmcv> cl
+   Microsoft (R) C/C++ Optimizing Compiler Version 19.27.29111 for x64
+   Copyright (C) Microsoft Corporation. All rights reserved.
+
+   usage: cl [ option... ] filename... [ / link linkoption... ]
+   ```
+
+   For compatibility, we use the x86-hosted and x64-targeted compiler. Note `Hostx86\x64` in the path.
+
+   You may want to change the system language to English because pytorch will parse text output from `cl.exe` to check its version. However only utf-8 is recognized. Navigate to Control Panel -> Region -> Administrative -> Language for Non-Unicode programs and change it to English.
+
+##### Build and install MMCV
+
+mmcv can be built in two ways:
+
+1. Full version (CPU ops)
+
+   Module `ops` will be compiled as a pytorch extension, but only x86 code will be compiled. The compiled ops can be executed on CPU only.
+
+2. Full version (CUDA ops)
+
+   Both x86 and CUDA codes of `ops` module will be compiled. The compiled version can be run on both CPU and CUDA-enabled GPU (if implemented). 
+
+###### CPU version
+
+Build and install
+
+```powershell
+(mmcv) PS C:\Users\xxx\mmcv> python setup.py build_ext
+(mmcv) PS C:\Users\xxx\mmcv> python setup.py develop
+```
+
+###### GPU version
+
+1. Make sure `CUDA_PATH` or `CUDA_HOME` is already set in `envs` via `ls env:`, desired output is shown as below:
+
+   ```powershell
+   (mmcv) PS C:\Users\xxx\mmcv> ls env:
+
+   Name                           Value
+   ----                           -----
+   CUDA_PATH                      C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2
+   CUDA_PATH_V10_1                C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1
+   CUDA_PATH_V10_2                C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2
+   ```
+
+   This should already be done by the CUDA installer. If not, or you have multiple versions of CUDA toolkit installed, set it with
+
+   ```powershell
+   (mmcv) PS C:\Users\xxx\mmcv> $env:CUDA_HOME = "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2"
+   # OR
+   (mmcv) PS C:\Users\xxx\mmcv> $env:CUDA_HOME = $env:CUDA_PATH_V10_2 # if CUDA_PATH_V10_2 is in envs:
+   ```
+
+2. Set CUDA target arch
+
+   ```shell
+   # Here you need to change to the target architecture corresponding to your GPU
+   (mmcv) PS C:\Users\xxx\mmcv> $env:TORCH_CUDA_ARCH_LIST="7.5"
+   ```
+
+   :::{note}
+   Check the compute capability of your GPU from [here](https://developer.nvidia.com/cuda-gpus).
+
+   ```powershell
+   (mmcv) PS C:\Users\xxx\mmcv> &"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\extras\demo_suite\deviceQuery.exe"
+   Device 0: "NVIDIA GeForce GTX 1660 SUPER"
+   CUDA Driver Version / Runtime Version          11.7 / 11.1
+   CUDA Capability Major/Minor version number:    7.5
+   ```
+
+   The 7.5 above indicates the target architecture. Note: You need to replace v10.2 with your CUDA version in the above command.
+   :::
+
+3. 
Build and install + + ```powershell + # build + python setup.py build_ext # if success, cl will be launched to compile ops + # install + python setup.py develop + ``` + + ```{note} + If you are compiling against PyTorch 1.6.0, you might meet some errors from PyTorch as described in [this issue](https://github.com/pytorch/pytorch/issues/42467). Follow [this pull request](https://github.com/pytorch/pytorch/pull/43380/files) to modify the source code in your local PyTorch installation. + ``` + +##### Validate installation + +```powershell +(mmcv) PS C:\Users\xxx\mmcv> python .dev_scripts/check_installation.py +``` + +If no error is reported by the above command, the installation is successful. If there is an error reported, please check [Frequently Asked Questions](../faq.md) to see if there is already a solution. +If no solution is found, please feel free to open an [issue](https://github.com/open-mmlab/mmcv/issues). + +### Build mmcv-lite + +If you need to use PyTorch-related modules, make sure PyTorch has been successfully installed in your environment by referring to the [PyTorch official installation guide](https://github.com/pytorch/pytorch#installation). + +1. Clone the repo + + ```bash + git clone https://github.com/open-mmlab/mmcv.git + cd mmcv + ``` + +2. Start building + + ```bash + MMCV_WITH_OPS=0 pip install -e . -v + ``` + +3. Validate installation + + ```bash + python -c 'import mmcv;print(mmcv.__version__)' + ``` diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/get_started/installation.md b/cv/distiller/CWD/pytorch/mmcv/docs/en/get_started/installation.md new file mode 100644 index 0000000000000000000000000000000000000000..12bad000a171c0adf5be01dc7f53a94a5933070d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/en/get_started/installation.md @@ -0,0 +1,348 @@ +## Installation + +There are two versions of MMCV: + +- **mmcv**: comprehensive, with full features and various CUDA ops out of box. It takes longer time to build. 
+
+- **mmcv-lite**: lite, without CUDA ops but all other features, similar to mmcv\<1.0.0. It is useful when you do not need those CUDA ops.
+
+```{warning}
+Do not install both versions in the same environment, otherwise you may encounter errors like `ModuleNotFound`. You need to uninstall one before installing the other. `Installing the full version is highly recommended if CUDA is available`.
+```
+
+### Install mmcv
+
+Before installing mmcv, make sure that PyTorch has been successfully installed following the [PyTorch official installation guide](https://pytorch.org/get-started/locally/#start-locally). This can be verified using the following command
+
+```bash
+python -c 'import torch;print(torch.__version__)'
+```
+
+If version information is output, then PyTorch is installed.
+
+#### Install with mim (recommended)
+
+[mim](https://github.com/open-mmlab/mim) is the package management tool for the OpenMMLab projects, which makes it easy to install mmcv
+
+```bash
+pip install -U openmim
+mim install "mmcv>=2.0.0rc1"
+```
+
+If you find that the above installation command does not use a pre-built package ending with `.whl` but a source package ending with `.tar.gz`, you may not have a pre-built package corresponding to the PyTorch or CUDA or mmcv version, in which case you can [build mmcv from source](build.md).
+
+Installation log using pre-built packages + +Looking in links: https://download.openmmlab.com/mmcv/dist/cu102/torch1.8.0/index.html
+Collecting mmcv
+Downloading https://download.openmmlab.com/mmcv/dist/cu102/torch1.8.0/mmcv-2.0.0rc3-cp38-cp38-manylinux1_x86_64.whl + +
+ +
+Installation log using source packages + +Looking in links: https://download.openmmlab.com/mmcv/dist/cu102/torch1.8.0/index.html
+Collecting mmcv==2.0.0rc3
+Downloading mmcv-2.0.0rc3.tar.gz + +
+ +To install a specific version of mmcv, for example, mmcv version 2.0.0rc3, you can use the following command + +```bash +mim install mmcv==2.0.0rc3 +``` + +:::{note} +If you would like to use `opencv-python-headless` instead of `opencv-python`, +e.g., in a minimum container environment or servers without GUI, +you can first install it before installing MMCV to skip the installation of `opencv-python`. + +Alternatively, if it takes too long to install a dependency library, you can specify the pypi source + +```bash +mim install "mmcv>=2.0.0rc3" -i https://pypi.tuna.tsinghua.edu.cn/simple +``` + +::: + +You can run [check_installation.py](https://github.com/open-mmlab/mmcv/blob/2.x/.dev_scripts/check_installation.py) to check the installation of mmcv-full after running the installation commands. + +#### Install with pip + +Use the following command to check the version of CUDA and PyTorch + +```bash +python -c 'import torch;print(torch.__version__);print(torch.version.cuda)' +``` + +Select the appropriate installation command depending on the type of system, CUDA version, PyTorch version, and MMCV version + + + + +
+ + + + +
+

+
+
+
+
+If you do not find a corresponding version in the dropdown box above, you probably do not have a pre-built package corresponding to the PyTorch or CUDA or mmcv version, at which point you can [build mmcv from source](build.md).
+
+:::{note}
+mmcv is only compiled on PyTorch 1.x.0 because the compatibility
+usually holds between 1.x.0 and 1.x.1. If your PyTorch version is 1.x.1, you
+can install mmcv compiled with PyTorch 1.x.0 and it usually works well.
+For example, if your PyTorch version is 1.8.1, you can feel free to choose 1.8.x.
+:::
+
+:::{note}
+If you would like to use `opencv-python-headless` instead of `opencv-python`,
+e.g., in a minimum container environment or servers without GUI,
+you can first install it before installing MMCV to skip the installation of `opencv-python`.
+
+Alternatively, if it takes too long to install a dependency library, you can specify the pypi source
+
+```bash
+mim install "mmcv>=2.0.0rc1" -i https://pypi.tuna.tsinghua.edu.cn/simple
+```
+
+:::
+
+You can run [check_installation.py](https://github.com/open-mmlab/mmcv/blob/2.x/.dev_scripts/check_installation.py) to check the installation of mmcv after running the installation commands.
+
+#### Using mmcv with Docker
+
+Build with local repository
+
+```bash
+git clone https://github.com/open-mmlab/mmcv.git && cd mmcv
+docker build -t mmcv -f docker/release/Dockerfile .
+```
+
+Or build with remote repository
+
+```bash
+docker build -t mmcv https://github.com/open-mmlab/mmcv.git#2.x:docker/release
+```
+
+The [Dockerfile](release/Dockerfile) installs the latest released version of mmcv by default, but you can specify the mmcv version to install the expected version.
+
+```bash
+docker image build -t mmcv -f docker/release/Dockerfile --build-arg MMCV=2.0.0rc1 .
+```
+
+If you also want to use other versions of PyTorch and CUDA, you can also pass them when building docker images.
+
+An example to build an image with PyTorch 1.11 and CUDA 11.3.
+
+```bash
+docker build -t mmcv -f docker/release/Dockerfile \
+    --build-arg PYTORCH=1.11.0 \
+    --build-arg CUDA=11.3 \
+    --build-arg CUDNN=8 \
+    --build-arg MMCV=2.0.0rc1 .
+```
+
+More available versions of PyTorch and CUDA can be found at [dockerhub/pytorch](https://hub.docker.com/r/pytorch/pytorch/tags).
+
+### Install mmcv-lite
+
+If you need to use PyTorch-related modules, make sure PyTorch has been successfully installed in your environment by referring to the [PyTorch official installation guide](https://github.com/pytorch/pytorch#installation).
+
+```bash
+pip install mmcv-lite
+```
diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/get_started/introduction.md b/cv/distiller/CWD/pytorch/mmcv/docs/en/get_started/introduction.md
new file mode 100644
index 0000000000000000000000000000000000000000..461fcc725bbcf4a84296e95789303b64e7b2e9c5
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/docs/en/get_started/introduction.md
@@ -0,0 +1,36 @@
+## Introduction
+
+MMCV is a foundational library for computer vision research and provides the following functionalities.
+
+- [Image/Video processing](../understand_mmcv/data_process.md)
+- [Image and annotation visualization](../understand_mmcv/visualization.md)
+- [Image transformation](../understand_mmcv/data_transform.md)
+- [Various CNN architectures](../understand_mmcv/cnn.md)
+- [High-quality implementation of common CUDA ops](../understand_mmcv/ops.md)
+
+It supports the following systems:
+
+- Linux
+- Windows
+- macOS
+
+It supports many research projects as below:
+
+- [MMClassification](https://github.com/open-mmlab/mmclassification): OpenMMLab image classification toolbox and benchmark.
+- [MMDetection](https://github.com/open-mmlab/mmdetection): OpenMMLab detection toolbox and benchmark.
+- [MMDetection3D](https://github.com/open-mmlab/mmdetection3d): OpenMMLab's next-generation platform for general 3D object detection.
+- [MMRotate](https://github.com/open-mmlab/mmrotate): OpenMMLab rotated object detection toolbox and benchmark.
+- [MMYOLO](https://github.com/open-mmlab/mmyolo): OpenMMLab YOLO series toolbox and benchmark.
+- [MMSegmentation](https://github.com/open-mmlab/mmsegmentation): OpenMMLab semantic segmentation toolbox and benchmark.
+- [MMOCR](https://github.com/open-mmlab/mmocr): OpenMMLab text detection, recognition, and understanding toolbox.
+- [MMPose](https://github.com/open-mmlab/mmpose): OpenMMLab pose estimation toolbox and benchmark.
+- [MMHuman3D](https://github.com/open-mmlab/mmhuman3d): OpenMMLab 3D human parametric model toolbox and benchmark.
+- [MMSelfSup](https://github.com/open-mmlab/mmselfsup): OpenMMLab self-supervised learning toolbox and benchmark.
+- [MMRazor](https://github.com/open-mmlab/mmrazor): OpenMMLab model compression toolbox and benchmark.
+- [MMFewShot](https://github.com/open-mmlab/mmfewshot): OpenMMLab fewshot learning toolbox and benchmark.
+- [MMAction2](https://github.com/open-mmlab/mmaction2): OpenMMLab's next-generation action understanding toolbox and benchmark.
+- [MMTracking](https://github.com/open-mmlab/mmtracking): OpenMMLab video perception toolbox and benchmark.
+- [MMFlow](https://github.com/open-mmlab/mmflow): OpenMMLab optical flow toolbox and benchmark.
+- [MMEditing](https://github.com/open-mmlab/mmediting): OpenMMLab image and video editing toolbox.
+- [MMGeneration](https://github.com/open-mmlab/mmgeneration): OpenMMLab image and video generative models toolbox.
+- [MMDeploy](https://github.com/open-mmlab/mmdeploy): OpenMMLab model deployment framework.
diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/get_started/previous_versions.md b/cv/distiller/CWD/pytorch/mmcv/docs/en/get_started/previous_versions.md
new file mode 100644
index 0000000000000000000000000000000000000000..a9c3717667fec3e8f338c319413aa6ad639dc6d3
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/docs/en/get_started/previous_versions.md
@@ -0,0 +1,47 @@
+## OTHER VERSIONS OF PYTORCH BUILT FOR MMCV-FULL
+
+We no longer provide `mmcv-full` packages compiled under lower versions of `PyTorch`, but for your convenience, you can find them below.
+
+### PyTorch 1.4
+
+| 1.0.0 \<= mmcv_version \<= 1.2.1
+
+#### CUDA 10.1
+
+```bash
+pip install mmcv-full=={mmcv_version} -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.4.0/index.html
+```
+
+#### CUDA 9.2
+
+```bash
+pip install mmcv-full=={mmcv_version} -f https://download.openmmlab.com/mmcv/dist/cu92/torch1.4.0/index.html
+```
+
+#### CPU
+
+```bash
+pip install mmcv-full=={mmcv_version} -f https://download.openmmlab.com/mmcv/dist/cpu/torch1.4.0/index.html
+```
+
+### PyTorch 1.3
+
+| 1.0.0 \<= mmcv_version \<= 1.3.16
+
+#### CUDA 10.1
+
+```bash
+pip install mmcv-full=={mmcv_version} -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.3.0/index.html
+```
+
+#### CUDA 9.2
+
+```bash
+pip install mmcv-full=={mmcv_version} -f https://download.openmmlab.com/mmcv/dist/cu92/torch1.3.0/index.html
+```
+
+#### CPU
+
+```bash
+pip install mmcv-full=={mmcv_version} -f https://download.openmmlab.com/mmcv/dist/cpu/torch1.3.0/index.html
+```
diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/index.rst b/cv/distiller/CWD/pytorch/mmcv/docs/en/index.rst
new file mode 100644
index 0000000000000000000000000000000000000000..dee2c37507fb77df42fef5e51fe501214c13d7ce
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/docs/en/index.rst
@@ -0,0 +1,69 @@
+Welcome to MMCV's documentation!
+================================
+
+You can switch between Chinese and English documents in the lower-left corner of the layout.
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Get Started
+
+   get_started/introduction.md
+   get_started/installation.md
+   get_started/build.md
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Understand MMCV
+
+   understand_mmcv/data_process.md
+   understand_mmcv/data_transform.md
+   understand_mmcv/visualization.md
+   understand_mmcv/cnn.md
+   understand_mmcv/ops.md
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Deployment
+
+   deployment/mmcv_ops_definition.md
+
+.. toctree::
+   :caption: Switch Language
+
+   switch_language.md
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Compatibility
+
+   compatibility.md
+
+.. toctree::
+
+   faq.md
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Community
+
+   community/contributing.md
+   community/pr.md
+
+.. toctree::
+   :maxdepth: 1
+   :caption: API Reference
+
+   mmcv.image 
+   mmcv.video 
+   mmcv.visualization 
+   mmcv.cnn 
+   mmcv.ops 
+   mmcv.transforms 
+   mmcv.arraymisc 
+   mmcv.utils 
+
+Indices and tables
+==================
+
+* :ref:`genindex`
+* :ref:`search`
diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/make.bat b/cv/distiller/CWD/pytorch/mmcv/docs/en/make.bat
new file mode 100644
index 0000000000000000000000000000000000000000..7893348a1b7dbb588983a48e6991282eae7e1b55
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/docs/en/make.bat
@@ -0,0 +1,35 @@
+@ECHO OFF
+
+pushd %~dp0
+
+REM Command file for Sphinx documentation
+
+if "%SPHINXBUILD%" == "" (
+	set SPHINXBUILD=sphinx-build
+)
+set SOURCEDIR=.
+set BUILDDIR=_build
+
+if "%1" == "" goto help
+
+%SPHINXBUILD% >NUL 2>NUL
+if errorlevel 9009 (
+	echo.
+	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
+	echo.installed, then set the SPHINXBUILD environment variable to point
+	echo.to the full path of the 'sphinx-build' executable. Alternatively you
+	echo.may add the Sphinx directory to PATH.
+	echo.
+	echo.If you don't have Sphinx installed, grab it from
+	echo.http://sphinx-doc.org/
+	exit /b 1
+)
+
+%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
+goto end
+
+:help
+%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
+
+:end
+popd
diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/mmcv-logo.png b/cv/distiller/CWD/pytorch/mmcv/docs/en/mmcv-logo.png
new file mode 100644
index 0000000000000000000000000000000000000000..bcc5759f8fe3bc7d191d411c38a9e1d3c1c27a84
Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/docs/en/mmcv-logo.png differ
diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/switch_language.md b/cv/distiller/CWD/pytorch/mmcv/docs/en/switch_language.md
new file mode 100644
index 0000000000000000000000000000000000000000..9dc7b34b4fac6a972abedd8c2b0b80d03441d2b9
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/docs/en/switch_language.md
@@ -0,0 +1,3 @@
+## English
+
+## 简体中文
diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/understand_mmcv/cnn.md b/cv/distiller/CWD/pytorch/mmcv/docs/en/understand_mmcv/cnn.md
new file mode 100644
index 0000000000000000000000000000000000000000..2c42f25d9d5c5b2886c420bbab4461272cf02b21
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/docs/en/understand_mmcv/cnn.md
@@ -0,0 +1,120 @@
+## CNN
+
+We provide some building bricks for CNNs, including layer building, module bundles and weight initialization.
+
+### Layer building
+
+We may need to try different layers of the same type when running experiments,
+but do not want to modify the code from time to time.
+Here we provide some layer building methods to construct layers from a dict,
+which can be written in configs or specified via command line arguments.
+
+#### Usage
+
+A simplest example is
+
+```python
+from mmcv.cnn import build_conv_layer
+
+cfg = dict(type='Conv3d')
+layer = build_conv_layer(cfg, in_channels=3, out_channels=8, kernel_size=3)
+```
+
+- `build_conv_layer`: Supported types are Conv1d, Conv2d, Conv3d, Conv (alias for Conv2d).
+- `build_norm_layer`: Supported types are BN1d, BN2d, BN3d, BN (alias for BN2d), SyncBN, GN, LN, IN1d, IN2d, IN3d, IN (alias for IN2d).
+- `build_activation_layer`: Supported types are ReLU, LeakyReLU, PReLU, RReLU, ReLU6, ELU, Sigmoid, Tanh, GELU.
+- `build_upsample_layer`: Supported types are nearest, bilinear, deconv, pixel_shuffle.
+- `build_padding_layer`: Supported types are zero, reflect, replicate.
+
+#### Extension
+
+We also allow extending the building methods with custom layers and operators.
+
+1. Write and register your own module.
+
+   ```python
+   from mmengine.registry import MODELS
+
+   @MODELS.register_module()
+   class MyUpsample:
+
+       def __init__(self, scale_factor):
+           pass
+
+       def forward(self, x):
+           pass
+   ```
+
+2. Import `MyUpsample` somewhere (e.g., in `__init__.py`) and then use it.
+
+   ```python
+   from mmcv.cnn import build_upsample_layer
+
+   cfg = dict(type='MyUpsample', scale_factor=2)
+   layer = build_upsample_layer(cfg)
+   ```
+
+### Module bundles
+
+We also provide common module bundles to facilitate the network construction.
+`ConvModule` is a bundle of convolution, normalization and activation layers,
+please refer to the [api](api.html#mmcv.cnn.ConvModule) for details.
+
+```python
+from mmcv.cnn import ConvModule
+
+# conv + bn + relu
+conv = ConvModule(3, 8, 2, norm_cfg=dict(type='BN'))
+# conv + gn + relu
+conv = ConvModule(3, 8, 2, norm_cfg=dict(type='GN', num_groups=2))
+# conv + relu
+conv = ConvModule(3, 8, 2)
+# conv
+conv = ConvModule(3, 8, 2, act_cfg=None)
+# conv + leaky relu
+conv = ConvModule(3, 8, 3, padding=1, act_cfg=dict(type='LeakyReLU'))
+# bn + conv + relu
+conv = ConvModule(
+    3, 8, 2, norm_cfg=dict(type='BN'), order=('norm', 'conv', 'act'))
+```
+
+### Model Zoo
+
+Besides torchvision pre-trained models, we also provide pre-trained models of the following CNNs:
+
+- VGG Caffe
+- ResNet Caffe
+- ResNeXt
+- ResNet with Group Normalization
+- ResNet with Group Normalization and Weight Standardization
+- HRNetV2
+- Res2Net
+- RegNet
+
+#### Model URLs in JSON
+
+The model zoo links in MMCV are managed by JSON files.
+The JSON file consists of key-value pairs mapping each model name to its URL or path.
+An example json file could be like:
+
+```json
+{
+    "model_a": "https://example.com/models/model_a_9e5bac.pth",
+    "model_b": "pretrain/model_b_ab3ef2c.pth"
+}
+```
+
+The default links of the pre-trained models hosted on OpenMMLab AWS could be found [here](https://github.com/open-mmlab/mmcv/blob/master/mmcv/model_zoo/open_mmlab.json).
+
+You may override default links by putting `open-mmlab.json` under `MMCV_HOME`. If `MMCV_HOME` is not found in your environment, `~/.cache/mmcv` will be used by default. You may use your own path with `export MMCV_HOME=/your/path`.
+
+The external json files will be merged into default one. If the same key presents in both external json and default json, the external one will be used.
+
+#### Load Checkpoint
+
+The following types are supported for `filename` of `mmcv.load_checkpoint()`.
+
+- filepath: The filepath of the checkpoint.
+- `http://xxx` and `https://xxx`: The link to download the checkpoint. The `SHA256` postfix should be contained in the filename.
+- `torchvision://xxx`: The model links in `torchvision.models`. Please refer to [torchvision](https://pytorch.org/docs/stable/torchvision/models.html) for details.
+- `open-mmlab://xxx`: The model links or filepath provided in default and additional json files.
diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/understand_mmcv/data_process.md b/cv/distiller/CWD/pytorch/mmcv/docs/en/understand_mmcv/data_process.md
new file mode 100644
index 0000000000000000000000000000000000000000..167928f88528ee6b682a559582a1584c369a5d39
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/docs/en/understand_mmcv/data_process.md
@@ -0,0 +1,286 @@
+## Data Process
+
+### Image
+
+This module provides some image processing methods, which requires `opencv` to be installed first.
+
+#### Read/Write/Show
+
+To read or write images files, use `imread` or `imwrite`.
+
+```python
+import mmcv
+
+img = mmcv.imread('test.jpg')
+img = mmcv.imread('test.jpg', flag='grayscale')
+img_ = mmcv.imread(img)  # nothing will happen, img_ = img
+mmcv.imwrite(img, 'out.jpg')
+```
+
+To read images from bytes
+
+```python
+with open('test.jpg', 'rb') as f:
+    data = f.read()
+img = mmcv.imfrombytes(data)
+```
+
+To show an image file or a loaded image
+
+```python
+mmcv.imshow('tests/data/color.jpg')
+# this is equivalent to reading the image first and then showing it:
+# img = mmcv.imread('tests/data/color.jpg'); mmcv.imshow(img)
+for i in range(10):
+    img = np.random.randint(256, size=(100, 100, 3), dtype=np.uint8)
+    mmcv.imshow(img, win_name='test image', wait_time=200)
+```
+
+#### Color space conversion
+
+Supported conversion methods:
+
+- bgr2gray
+- gray2bgr
+- bgr2rgb
+- rgb2bgr
+- bgr2hsv
+- hsv2bgr
+
+```python
+img = mmcv.imread('tests/data/color.jpg')
+img1 = mmcv.bgr2rgb(img)
+img2 = mmcv.rgb2gray(img1)
+img3 = mmcv.bgr2hsv(img)
+```
+
+#### Resize
+
+There are three resize methods. All `imresize_*` methods have an argument `return_scale`;
+if this argument is `False`, then the return value is merely the resized image, otherwise
+it is a tuple `(resized_img, scale)`.
+
+```python
+# resize to a given size
+mmcv.imresize(img, (1000, 600), return_scale=True)
+
+# resize to the same size of another image
+mmcv.imresize_like(img, dst_img, return_scale=False)
+
+# resize by a ratio
+mmcv.imrescale(img, 0.5)
+
+# resize so that the max edge is no longer than 1000 and the short edge is no longer than 800
+# without changing the aspect ratio
+mmcv.imrescale(img, (1000, 800))
+```
+
+#### Rotate
+
+To rotate an image by some angle, use `imrotate`. The center can be specified,
+which is the center of original image by default. There are two modes of rotating,
+one is to keep the image size unchanged so that some parts of the image will be
+cropped after rotating, the other is to extend the image size to fit the rotated
+image.
+
+```python
+img = mmcv.imread('tests/data/color.jpg')
+
+# rotate the image clockwise by 30 degrees.
+img_ = mmcv.imrotate(img, 30)
+
+# rotate the image counterclockwise by 90 degrees.
+img_ = mmcv.imrotate(img, -90)
+
+# rotate the image clockwise by 30 degrees, and rescale it by 1.5x at the same time.
+img_ = mmcv.imrotate(img, 30, scale=1.5)
+
+# rotate the image clockwise by 30 degrees, with (100, 100) as the center.
+img_ = mmcv.imrotate(img, 30, center=(100, 100))
+
+# rotate the image clockwise by 30 degrees, and extend the image size.
+img_ = mmcv.imrotate(img, 30, auto_bound=True)
+```
+
+#### Flip
+
+To flip an image, use `imflip`.
+
+```python
+img = mmcv.imread('tests/data/color.jpg')
+
+# flip the image horizontally
+mmcv.imflip(img)
+
+# flip the image vertically
+mmcv.imflip(img, direction='vertical')
+```
+
+#### Crop
+
+`imcrop` can crop the image with one or more regions. Each region is represented by the upper left and lower right coordinates as (x1, y1, x2, y2).
+
+```python
+import mmcv
+import numpy as np
+
+img = mmcv.imread('tests/data/color.jpg')
+
+# crop the region (10, 10, 100, 120)
+bboxes = np.array([10, 10, 100, 120])
+patch = mmcv.imcrop(img, bboxes)
+
+# crop two regions (10, 10, 100, 120) and (0, 0, 50, 50)
+bboxes = np.array([[10, 10, 100, 120], [0, 0, 50, 50]])
+patches = mmcv.imcrop(img, bboxes)
+
+# crop two regions, and rescale the patches by 1.2x
+patches = mmcv.imcrop(img, bboxes, scale=1.2)
+```
+
+#### Padding
+
+There are two methods, `impad` and `impad_to_multiple`, to pad an image to the
+specific size with given values.
+
+```python
+img = mmcv.imread('tests/data/color.jpg')
+
+# pad the image to (1000, 1200) with all zeros
+img_ = mmcv.impad(img, shape=(1000, 1200), pad_val=0)
+
+# pad the image to (1000, 1200) with different values for three channels.
+img_ = mmcv.impad(img, shape=(1000, 1200), pad_val=(100, 50, 200))
+
+# pad the image on left, right, top, bottom borders with all zeros
+img_ = mmcv.impad(img, padding=(10, 20, 30, 40), pad_val=0)
+
+# pad the image on left, right, top, bottom borders with different values
+# for three channels.
+img_ = mmcv.impad(img, padding=(10, 20, 30, 40), pad_val=(100, 50, 200))
+
+# pad an image so that each edge is a multiple of some value.
+img_ = mmcv.impad_to_multiple(img, 32)
+```
+
+### Video
+
+This module provides the following functionalities:
+
+- A `VideoReader` class with friendly APIs to read and convert videos.
+- Some methods for editing (cut, concat, resize) videos.
+- Optical flow read/write/warp.
+
+#### VideoReader
+
+The `VideoReader` class provides sequence-like APIs to access video frames.
+It will internally cache the frames which have been visited.
+
+```python
+video = mmcv.VideoReader('test.mp4')
+
+# obtain basic information
+print(len(video))
+print(video.width, video.height, video.resolution, video.fps)
+
+# iterate over all frames
+for frame in video:
+    print(frame.shape)
+
+# read the next frame
+img = video.read()
+
+# read a frame by index
+img = video[100]
+
+# read some frames
+img = video[5:10]
+```
+
+To convert a video to images or generate a video from an image directory.
+
+```python
+# split a video into frames and save to a folder
+video = mmcv.VideoReader('test.mp4')
+video.cvt2frames('out_dir')
+
+# generate video from frames
+mmcv.frames2video('out_dir', 'test.avi')
+```
+
+#### Editing utils
+
+There are also some methods for editing videos, which wraps the commands of ffmpeg.
+
+```python
+# cut a video clip
+mmcv.cut_video('test.mp4', 'clip1.mp4', start=3, end=10, vcodec='h264')
+
+# join a list of video clips
+mmcv.concat_video(['clip1.mp4', 'clip2.mp4'], 'joined.mp4', log_level='quiet')
+
+# resize a video with the specified size
+mmcv.resize_video('test.mp4', 'resized1.mp4', (360, 240))
+
+# resize a video with a scaling ratio of 2
+mmcv.resize_video('test.mp4', 'resized2.mp4', ratio=2)
+```
+
+#### Optical flow
+
+`mmcv` provides the following methods to operate on optical flows.
+
+- IO
+- Visualization
+- Flow warping
+
+We provide two options to dump optical flow files: uncompressed and compressed.
+The uncompressed way just dumps the floating numbers to a binary file. It is
+lossless but the dumped file has a larger size.
+The compressed way quantizes the optical flow to 0-255 and dumps it as a
+jpeg image. The flow of x-dim and y-dim will be concatenated into a single image.
+
+1. IO
+
+```python
+flow = np.random.rand(800, 600, 2).astype(np.float32)
+# dump the flow to a flo file (~3.7M)
+mmcv.flowwrite(flow, 'uncompressed.flo')
+# dump the flow to a jpeg file (~230K)
+# the shape of the dumped image is (800, 1200)
+mmcv.flowwrite(flow, 'compressed.jpg', quantize=True, concat_axis=1)
+
+# read the flow file, the shape of loaded flow is (800, 600, 2) for both ways
+flow = mmcv.flowread('uncompressed.flo')
+flow = mmcv.flowread('compressed.jpg', quantize=True, concat_axis=1)
+```
+
+2. Visualization
+
+It is possible to visualize optical flows with `mmcv.flowshow()`.
+
+```python
+mmcv.flowshow(flow)
+```
+
+![progress](../_static/flow_visualization.png)
+
+3. Flow warping
+
+```python
+img1 = mmcv.imread('img1.jpg')
+flow = mmcv.flowread('flow.flo')
+warped_img2 = mmcv.flow_warp(img1, flow)
+```
+
+img1 (left) and img2 (right)
+
+![raw images](../_static/flow_raw_images.png)
+
+optical flow (img2 -> img1)
+
+![optical flow](../_static/flow_img2toimg1.png)
+
+warped image and difference with ground truth
+
+![warped image](../_static/flow_warp_diff.png)
diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/understand_mmcv/data_transform.md b/cv/distiller/CWD/pytorch/mmcv/docs/en/understand_mmcv/data_transform.md
new file mode 100644
index 0000000000000000000000000000000000000000..64c3af980eab0b07d7a298cee2c41465803911f8
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/docs/en/understand_mmcv/data_transform.md
@@ -0,0 +1,341 @@
+# Data Transformation
+
+In the OpenMMLab algorithm library, dataset construction and data preparation are decoupled. Usually, the construction of the dataset only parses the dataset and records the basic information of each sample, while the data preparation is a series of data transformations including data loading, preprocessing, formatting, and other operations performed according to the basic information of the sample.
+
+## Design of data transformation
+
+In MMCV, we use various callable data transformation classes to manipulate data. These data transformation classes can accept several configuration parameters for the instantiation and then process the input data dictionary by `__call__` method. All data transformation methods accept a dictionary as the input and produce the output as a dictionary as well. A simple example is as follows:
+
+```python
+>>> import numpy as np
+>>> from mmcv.transforms import Resize
+>>>
+>>> transform = Resize(scale=(224, 224))
+>>> data_dict = {'img': np.random.rand(256, 256, 3)}
+>>> data_dict = transform(data_dict)
+>>> print(data_dict['img'].shape)
+(224, 224, 3)
+```
+
+The data transformation class reads some fields of the input dictionary and may add or update some fields. The keys of these fields are mostly fixed. For example, `Resize` will always read fields such as `"img"` in the input dictionary. More information about the conventions for input and output fields could be found in the documentation of the corresponding class.
+
+```{note}
+By convention, the order of image shape which is used as **initialization parameters** in data transformation (such as Resize, Pad) is (width, height). In the dictionary returned by the data transformation, the image related shape, such as `img_shape`, `ori_shape`, `pad_shape`, etc., is (height, width).
+```
+
+MMCV provides a unified base class called `BaseTransform` for all data transformation classes:
+
+```python
+class BaseTransform(metaclass=ABCMeta):
+
+    def __call__(self, results: dict) -> dict:
+
+        return self.transform(results)
+
+    @abstractmethod
+    def transform(self, results: dict) -> dict:
+        pass
+```
+
+All data transformation classes must inherit `BaseTransform` and implement the `transform` method. Both the input and output of the `transform` method are a dictionary. In the **Customize data transformation classes** section, we will describe how to implement a data transformation class in more detail.
+
+## Data pipeline
+
+As mentioned above, the inputs and outputs of all data transformations are dictionaries. Moreover, according to the \[Convention on Datasets\] (TODO) in OpenMMLab, the basic information of each sample in the dataset is also a dictionary. This way, we can connect all data transformation operations end to end and combine them into a data pipeline. This pipeline inputs the information dictionary of the samples in the dataset and outputs the information dictionary after a series of processing.
+
+Taking the classification task as an example, we show a typical data pipeline in the figure below. For each sample, the information stored in the dataset is a dictionary, as shown on the far left in the figure. After each data transformation operation represented by the blue block, a new field (marked in green) will be added to the data dictionary or an existing field (marked in orange) will be updated.
+
+
+ +
+ +The data pipeline is a list of several data transformation configuration dictionaries in the configuration file. Each dataset needs to set the parameter `pipeline` to define the data preparation operations the dataset needs to perform. The configuration of the above data pipeline in the configuration file is as follows: + +```python +pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='Resize', size=256, keep_ratio=True), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375]), + dict(type='ClsFormatBundle') +] + +dataset = dict( + ... + pipeline=pipeline, + ... +) +``` + +## Common data transformation classes + +The commonly used data transformation classes can be roughly divided into data loading, data preprocessing and augmentation, and data formatting. In MMCV, we provide some commonly used classes as follows: + +### Data loading + +To support the loading of large-scale datasets, data is usually not loaded when `Dataset` is initialized. Only the corresponding path is loaded. Therefore, it is necessary to load specific data in the data pipeline. + +| Class | Feature | +| :-------------------------: | :--------------------------------------------: | +| [`LoadImageFromFile`](TODO) | Load from file path | +| [`LoadAnnotations`](TODO) | Load and organize the annotations (bbox, etc.) | + +### Data preprocessing and enhancement + +Data preprocessing and augmentation usually involve transforming the image itself, such as cropping, padding, scaling, etc. 
+ +| Class | Feature | +| :------------------------------: | :----------------------------------------------------: | +| [`Pad`](TODO) | Padding | +| [`CenterCrop`](TODO) | Center crop | +| [`Normalize`](TODO) | Image normalization | +| [`Resize`](TODO) | Resize to the specified size or ratio | +| [`RandomResize`](TODO) | Scale the image randomly within the specified range | +| [`RandomMultiscaleResize`](TODO) | Scale the image to a random size from multiple options | +| [`RandomGrayscale`](TODO) | Random grayscale | +| [`RandomFlip`](TODO) | Random flip | +| [`MultiScaleFlipAug`](TODO) | Support scaling and flipping during the testing | + +### Data formatting + +Data formatting operations are type conversions performed on the data. + +| Class | Feature | +| :---------------------: | :------------------------------------------: | +| [`ToTensor`](TODO) | Convert the specified data to `torch.Tensor` | +| [`ImageToTensor`](TODO) | Convert the image to `torch.Tensor` | + +## Customize data transformation classes + +To implement a new data transformation class, you must inherit `BaseTransform` and implement the `transform` method. Here, we use a simple flip transform (`MyFlip`) as an example: + +```python +import random +import mmcv +from mmcv.transforms import BaseTransform, TRANSFORMS + +@TRANSFORMS.register_module() +class MyFlip(BaseTransform): + def __init__(self, direction: str): + super().__init__() + self.direction = direction + + def transform(self, results: dict) -> dict: + img = results['img'] + results['img'] = mmcv.imflip(img, direction=self.direction) + return results +``` + +Now, we can instantiate `MyFlip` as a callable object to handle our data dictionary. + +```python +import numpy as np + +transform = MyFlip(direction='horizontal') +data_dict = {'img': np.random.rand(224, 224, 3)} +data_dict = transform(data_dict) +processed_img = data_dict['img'] +``` + +Alternatively, use `MyFlip` transform in the `pipeline` of the config file. 
+ +```python +pipeline = [ + ... + dict(type='MyFlip', direction='horizontal'), + ... +] +``` + +It should be noted that if you want to use it in the configuration file, you must ensure that the file where the `MyFlip` class is located can be imported at the runtime. + +## Transform wrapper + +Transform wrappers are a special class of data transformations. They do not operate on images, labels or other information in the data dictionary by themselves. Instead, they enhance the behavior of data transformations defined in them. + +### KeyMapper + +`KeyMapper` is used to map fields in the data dictionary. For example, image processing transforms usually get their values from the `"img"` field in the data dictionary. But sometimes we want these transforms to handle images in other fields in the data dictionary, such as the `"gt_img"` field. + +When used with registry and configuration file, the field map wrapper should be used as follows: + +```python +pipeline = [ + ... + dict(type='KeyMapper', + mapping={ + 'img': 'gt_img', # map "gt_img" to "img" + 'mask': ..., # The "mask" field in the raw data is not used. That is, for wrapped data transformations, the "mask" field is not included in the data + }, + auto_remap=True, # remap "img" back to "gt_img" after the transformation + transforms=[ + # only need to specify "img" in `RandomFlip` + dict(type='RandomFlip'), + ]) + ... +] +``` + +With `KeyMapper`, we don't need to consider various possible input field names in the `transform` method when we implement the data transformation class. We only need to deal with the default fields. + +### RandomChoice and RandomApply + +`RandomChoice` is used to randomly select a data transformation pipeline from the given choices. With this wrapper, we can easily implement some data augmentation functions, such as AutoAugment. + +In configuration file, you can use `RandomChoice` as follows: + +```python +pipeline = [ + ... 
+ dict(type='RandomChoice', + transforms=[ + [ + dict(type='Posterize', bits=4), + dict(type='Rotate', angle=30.) + ], # the first combo option + [ + dict(type='Equalize'), + dict(type='Rotate', angle=30) + ], # the second combo option + ], + prob=[0.4, 0.6] # the prob of each combo + ) + ... +] +``` + +`RandomApply` is used to randomly perform a combination of data transformations with a specified probability. For example: + +```python +pipeline = [ + ... + dict(type='RandomApply', + transforms=[dict(type='Rotate', angle=30.)], + prob=0.3) # perform the transformation with prob as 0.3 + ... +] +``` + +### TransformBroadcaster + +Usually, a data transformation class only reads the target of an operation from one field. While we can also use `KeyMapper` to change the fields read, there is no way to apply transformations to the data of multiple fields at once. To achieve this, we need to use the multi-target extension wrapper `TransformBroadcaster`. + +`TransformBroadcaster` has two uses, one is to apply data transformation to multiple specified fields, and the other is to apply data transformation to a group of targets under a field. + +1. Apply to multiple fields + + Suppose we need to apply a data transformation to images in two fields `"lq"` (low-quality) and `"gt"` (ground-truth). 
+ + ```python + pipeline = [ + dict(type='TransformBroadcaster', + # apply to the "lq" and "gt" fields respectively, and set the "img" field to both + mapping={'img': ['lq', 'gt']}, + # remap the "img" field back to the original field after the transformation + auto_remap=True, + # whether to share random variables in the transformation of each target + # more introduction will be referred in the following chapters (random variable sharing) + share_random_params=True, + transforms=[ + # only need to manipulate the "img" field in the `RandomFlip` class + dict(type='RandomFlip'), + ]) + ] + ``` + + In the `mapping` setting of the multi-target extension, we can also use `...` to ignore the specified original field. As shown in the following example, the wrapped `RandomCrop` will crop the image in the field `"img"` and update the size of the cropped image if the field `"img_shape"` exists. If we want to do the same random cropping for both image fields `"lq"` and `"gt"` at the same time but update the `"img_shape"` field only once, we can do it as in the example: + + ```python + pipeline = [ + dict(type='TransformBroadcaster', + mapping={ + 'img': ['lq', 'gt'], + 'img_shape': ['img_shape', ...], + }, + # remap the "img" and "img_shape" fields back to their original fields after the transformation + auto_remap=True, + # whether to share random variables in the transformation of each target + # more introduction will be referred in the following chapters (random variable sharing) + share_random_params=True, + transforms=[ + # "img" and "img_shape" fields are manipulated in the `RandomCrop` class + # if "img_shape" is missing, only operate on "img" + dict(type='RandomCrop'), + ]) + ] + ``` + +2. A set of targets applied to a field + + Suppose we need to apply a data transformation to the `"images"` field, which is a list of images. 
+ + ```python + pipeline = [ + dict(type='TransformBroadcaster', + # map each image under the "images" field to the "img" field + mapping={'img': 'images'}, + # remap the images under the "img" field back to the list in the "images" field after the transformation + auto_remap=True, + # whether to share random variables in the transformation of each target + share_random_params=True, + transforms=[ + # in the `RandomFlip` transformation class, we only need to manipulate the "img" field + dict(type='RandomFlip'), + ]) + ] + ``` + +#### Decorator `cache_randomness` + +In `TransformBroadcaster`, we provide the `share_random_params` option to support sharing random states across multiple data transformations. For example, in a super-resolution task, we want to apply **the same** random transformations **simultaneously** to the low-resolution image and the original image. If we use this function in a custom data transformation class, we need to mark which random variables support sharing in the class. This can be achieved with the decorator `cache_randomness`. + +Taking `MyFlip` from the above example, we want to perform flipping randomly with a certain probability: + +```python +from mmcv.transforms.utils import cache_randomness + +@TRANSFORMS.register_module() +class MyRandomFlip(BaseTransform): + def __init__(self, prob: float, direction: str): + super().__init__() + self.prob = prob + self.direction = direction + + @cache_randomness # label the output of the method as a shareable random variable + def do_flip(self): + flip = True if random.random() > self.prob else False + return flip + + def transform(self, results: dict) -> dict: + img = results['img'] + if self.do_flip(): + results['img'] = mmcv.imflip(img, direction=self.direction) + return results +``` + +In the above example, we decorate the `do_flip` method with `cache_randomness`, marking the method return value `flip` as a random variable that supports sharing. 
Therefore, in the transformation of `TransformBroadcaster` to multiple targets, the value of this variable will remain the same. + +#### Decorator `avoid_cache_randomness` + +In some cases, we cannot separate the process of generating random variables in data transformation into a class method. For example, modules from third-party libraries used in data transformation encapsulate the relevant parts of random variables inside, making them impossible to be extracted as class methods for data transformation. Such data transformations cannot support shared random variables through the decorator `cache_randomness` annotation, and thus cannot share random variables during multi-objective expansion. + +To avoid misuse of such data transformations in multi-object extensions, we provide another decorator, `avoid_cache_randomness`, to mark such data transformations: + +```python +from mmcv.transforms.utils import avoid_cache_randomness + +@TRANSFORMS.register_module() +@avoid_cache_randomness +class MyRandomTransform(BaseTransform): + + def transform(self, results: dict) -> dict: + ... +``` + +Data transformation classes marked with `avoid_cache_randomness` will throw an exception when their instance is wrapped by `TransformBroadcaster` and the parameter `share_random_params` is set to True. This reminds the user not to use it in this way. + +There are a few things to keep in mind when using `avoid_cache_randomness`: + +1. `avoid_cache_randomness` is only used to decorate data transformation classes (subclasses of `BaseTransfrom`) and cannot be used to decorate other general classes, class methods, or functions +2. When a data transformation decorated with `avoid_cache_randomness` is used as a base class, its subclasses **will not inherit** its feature. If the subclass is still unable to share random variables, `avoid_cache_randomness` should be used again. +3. 
A data transformation needs to be modified with `avoid_cache_randomness` only when a data transformation is random and cannot share its random parameters. Data transformations without randomness require no decoration diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/understand_mmcv/ops.md b/cv/distiller/CWD/pytorch/mmcv/docs/en/understand_mmcv/ops.md new file mode 100644 index 0000000000000000000000000000000000000000..5579cd7757fa344519e69c7fb1091de6fe32fdcc --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/en/understand_mmcv/ops.md @@ -0,0 +1,63 @@ +## ops + +We implement common ops used in detection, segmentation, etc. + +| Device | CPU | CUDA | MLU | MPS | Ascend | +| ---------------------------- | --- | ---- | --- | --- | ------ | +| ActiveRotatedFilter | √ | √ | | | | +| AssignScoreWithK | | √ | | | | +| BallQuery | | √ | | | | +| BBoxOverlaps | | √ | √ | √ | | +| BorderAlign | | √ | | | | +| BoxIouRotated | √ | √ | | | | +| BoxIouQuadri | √ | √ | | | | +| CARAFE | | √ | √ | | | +| ChamferDistance | | √ | | | | +| CrissCrossAttention | | √ | | | | +| ContourExpand | √ | | | | | +| ConvexIoU | | √ | | | | +| CornerPool | | √ | | | | +| Correlation | | √ | | | | +| Deformable Convolution v1/v2 | √ | √ | | | √ | +| Deformable RoIPool | | √ | √ | | √ | +| DiffIoURotated | | √ | | | | +| DynamicScatter | | √ | | | | +| FurthestPointSample | | √ | | | | +| FurthestPointSampleWithDist | | √ | | | | +| FusedBiasLeakyrelu | | √ | | | √ | +| GatherPoints | | √ | | | | +| GroupPoints | | √ | | | | +| Iou3d | | √ | √ | | | +| KNN | | √ | | | | +| MaskedConv | | √ | √ | | √ | +| MergeCells | | √ | | | | +| MinAreaPolygon | | √ | | | | +| ModulatedDeformConv2d | √ | √ | | | √ | +| MultiScaleDeformableAttn | | √ | √ | | | +| NMS | √ | √ | √ | | √ | +| NMSRotated | √ | √ | | | | +| NMSQuadri | √ | √ | | | | +| PixelGroup | √ | | | | | +| PointsInBoxes | √ | √ | | | | +| PointsInPolygons | | √ | | | | +| PSAMask | √ | √ | √ | | √ | +| RotatedFeatureAlign | √ | √ | | | | 
+| RoIPointPool3d | | √ | √ | | | +| RoIPool | | √ | √ | | √ | +| RoIAlignRotated | √ | √ | √ | | | +| RiRoIAlignRotated | | √ | | | | +| RoIAlign | √ | √ | √ | | | +| RoIAwarePool3d | | √ | √ | | | +| SAConv2d | | √ | | | | +| SigmoidFocalLoss | | √ | √ | | √ | +| SoftmaxFocalLoss | | √ | | | √ | +| SoftNMS | | √ | | | | +| Sparse Convolution | | √ | | | | +| Synchronized BatchNorm | | √ | | | | +| ThreeInterpolate | | √ | | | | +| ThreeNN | | √ | √ | | | +| TINShift | | √ | √ | | | +| UpFirDn2d | | √ | | | | +| Voxelization | √ | √ | | | | +| PrRoIPool | | √ | | | | +| BezierAlign | √ | √ | | | | diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/en/understand_mmcv/visualization.md b/cv/distiller/CWD/pytorch/mmcv/docs/en/understand_mmcv/visualization.md new file mode 100644 index 0000000000000000000000000000000000000000..968e350589aafdf79c32593a6b5968329d5afa2a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/en/understand_mmcv/visualization.md @@ -0,0 +1,24 @@ +## Visualization + +`mmcv` can show images and annotations (currently supported types include bounding boxes). + +```python +# show an image file +mmcv.imshow('a.jpg') + +# show a loaded image +img = np.random.rand(100, 100, 3) +mmcv.imshow(img) + +# show image with bounding boxes +img = np.random.rand(100, 100, 3) +bboxes = np.array([[0, 0, 50, 50], [20, 20, 60, 60]]) +mmcv.imshow_bboxes(img, bboxes) +``` + +`mmcv` can also visualize special images such as optical flows. + +```python +flow = mmcv.flowread('test.flo') +mmcv.flowshow(flow) +``` diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/Makefile b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/Makefile new file mode 100644 index 0000000000000000000000000000000000000000..51285967a7d9722c5bdee4f6a81c154a56aa0846 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/Makefile @@ -0,0 +1,19 @@ +# Minimal makefile for Sphinx documentation +# + +# You can set these variables from the command line. 
+SPHINXOPTS = +SPHINXBUILD = sphinx-build +SOURCEDIR = . +BUILDDIR = _build + +# Put it first so that "make" without argument is like "make help". +help: + @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) + +.PHONY: help Makefile + +# Catch-all target: route all unknown targets to Sphinx using the new +# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). +%: Makefile + @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/_static/css/readthedocs.css b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/_static/css/readthedocs.css new file mode 100644 index 0000000000000000000000000000000000000000..9e3a567d5f78aedb606600bb3111034a1003b362 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/_static/css/readthedocs.css @@ -0,0 +1,10 @@ +.header-logo { + background-image: url("../image/mmcv-logo.png"); + background-size: 85px 40px; + height: 40px; + width: 85px; +} + +table.colwidths-auto td { + width: 50% +} diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/_static/image/mmcv-logo.png b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/_static/image/mmcv-logo.png new file mode 100644 index 0000000000000000000000000000000000000000..bcc5759f8fe3bc7d191d411c38a9e1d3c1c27a84 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/_static/image/mmcv-logo.png differ diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/_static/version.json b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/_static/version.json new file mode 100644 index 0000000000000000000000000000000000000000..7ee4965d36ed96f63f484137921d156d19cc40da --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/_static/version.json @@ -0,0 +1,575 @@ +{ + "Linux": [ + { + "cuda": "11.7", + "torch": "1.13.x", + "mmcv": [ + "2.0.0rc3" + ] + }, + { + "cuda": "11.6", + "torch": "1.13.x", + "mmcv": [ + "2.0.0rc3" + ] + }, + { + "cuda": "11.6", + "torch": "1.12.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + 
"2.0.0rc1" + ] + }, + { + "cuda": "11.5", + "torch": "1.11.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "11.3", + "torch": "1.12.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "11.3", + "torch": "1.11.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "11.3", + "torch": "1.10.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "11.1", + "torch": "1.10.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "11.1", + "torch": "1.9.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "11.1", + "torch": "1.8.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "11.0", + "torch": "1.7.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "10.2", + "torch": "1.12.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "10.2", + "torch": "1.11.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "10.2", + "torch": "1.10.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "10.2", + "torch": "1.9.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "10.2", + "torch": "1.8.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "10.2", + "torch": "1.7.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "10.2", + "torch": "1.6.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "10.1", + "torch": "1.8.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "10.1", + "torch": "1.7.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "10.1", + "torch": "1.6.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "9.2", + "torch": "1.7.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + 
}, + { + "cuda": "9.2", + "torch": "1.6.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "cpu", + "torch": "1.13.x", + "mmcv": [ + "2.0.0rc3" + ] + }, + { + "cuda": "cpu", + "torch": "1.12.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "cpu", + "torch": "1.11.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "cpu", + "torch": "1.10.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "cpu", + "torch": "1.9.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "cpu", + "torch": "1.8.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "cpu", + "torch": "1.7.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "cpu", + "torch": "1.6.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + } + ], + "Windows": [ + { + "cuda": "11.7", + "torch": "1.13.x", + "mmcv": [ + "2.0.0rc3" + ] + }, + { + "cuda": "11.6", + "torch": "1.13.x", + "mmcv": [ + "2.0.0rc3" + ] + }, + { + "cuda": "11.6", + "torch": "1.12.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "11.5", + "torch": "1.11.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "11.3", + "torch": "1.12.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "11.3", + "torch": "1.11.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "11.3", + "torch": "1.10.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "11.1", + "torch": "1.10.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "11.1", + "torch": "1.9.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "11.1", + "torch": "1.8.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "10.2", + "torch": "1.10.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", 
+ "2.0.0rc1" + ] + }, + { + "cuda": "10.2", + "torch": "1.9.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "10.2", + "torch": "1.8.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "10.2", + "torch": "1.7.x", + "mmcv": [ + "2.0.0rc3" + ] + }, + { + "cuda": "10.2", + "torch": "1.6.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "10.1", + "torch": "1.8.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "10.1", + "torch": "1.7.x", + "mmcv": [ + "2.0.0rc3" + ] + }, + { + "cuda": "10.1", + "torch": "1.6.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "cpu", + "torch": "1.13.x", + "mmcv": [ + "2.0.0rc3" + ] + }, + { + "cuda": "cpu", + "torch": "1.12.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "cpu", + "torch": "1.11.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "cpu", + "torch": "1.10.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "cpu", + "torch": "1.9.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "cpu", + "torch": "1.8.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "cpu", + "torch": "1.7.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + }, + { + "cuda": "cpu", + "torch": "1.6.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2", + "2.0.0rc1" + ] + } + ], + "macOS": [ + { + "cuda": "cpu", + "torch": "1.13.x", + "mmcv": [ + "2.0.0rc3" + ] + }, + { + "cuda": "mps", + "torch": "1.13.x", + "mmcv": [ + "2.0.0rc3" + ] + }, + { + "cuda": "cpu", + "torch": "1.12.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2" + ] + }, + { + "cuda": "cpu", + "torch": "1.11.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2" + ] + }, + { + "cuda": "cpu", + "torch": "1.10.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2" + ] + }, + { + "cuda": "cpu", + "torch": "1.9.x", + "mmcv": [ + "2.0.0rc3", 
+ "2.0.0rc2" + ] + }, + { + "cuda": "cpu", + "torch": "1.8.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2" + ] + }, + { + "cuda": "cpu", + "torch": "1.7.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2" + ] + }, + { + "cuda": "cpu", + "torch": "1.6.x", + "mmcv": [ + "2.0.0rc3", + "2.0.0rc2" + ] + } + ] +} diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/_templates/classtemplate.rst b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/_templates/classtemplate.rst new file mode 100644 index 0000000000000000000000000000000000000000..4f74842394ec9807fb1ae2d8f05a8a57e9a2e24c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/_templates/classtemplate.rst @@ -0,0 +1,14 @@ +.. role:: hidden + :class: hidden-section +.. currentmodule:: {{ module }} + + +{{ name | underline}} + +.. autoclass:: {{ name }} + :members: + + +.. + autogenerated from source/_templates/classtemplate.rst + note it does not have :inherited-members: diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/api/arraymisc.rst b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/api/arraymisc.rst new file mode 100644 index 0000000000000000000000000000000000000000..28975eb76e94994c50d2fe52b8f34c7ce533e788 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/api/arraymisc.rst @@ -0,0 +1,19 @@ +.. role:: hidden + :class: hidden-section + +mmcv.arraymisc +=================================== + +.. contents:: mmcv.arraymisc + :depth: 2 + :local: + :backlinks: top + +.. currentmodule:: mmcv.arraymisc + +.. autosummary:: + :toctree: generated + :nosignatures: + + quantize + dequantize diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/api/cnn.rst b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/api/cnn.rst new file mode 100644 index 0000000000000000000000000000000000000000..5cbcb191e9e4feb7a76e9d154411fd899a48999e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/api/cnn.rst @@ -0,0 +1,70 @@ +.. role:: hidden + :class: hidden-section + +mmcv.cnn +=================================== + +.. 
contents:: mmcv.cnn + :depth: 2 + :local: + :backlinks: top + +.. currentmodule:: mmcv.cnn + +Module +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + :template: classtemplate.rst + + ContextBlock + Conv2d + Conv3d + ConvAWS2d + ConvModule + ConvTranspose2d + ConvTranspose3d + ConvWS2d + DepthwiseSeparableConvModule + GeneralizedAttention + HSigmoid + HSwish + LayerScale + Linear + MaxPool2d + MaxPool3d + NonLocal1d + NonLocal2d + NonLocal3d + Scale + Swish + +Build Function +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + + build_activation_layer + build_conv_layer + build_norm_layer + build_padding_layer + build_plugin_layer + build_upsample_layer + +Miscellaneous +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + + fuse_conv_bn + conv_ws_2d + is_norm + make_res_layer + make_vgg_layer + get_model_complexity_info diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/api/image.rst b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/api/image.rst new file mode 100644 index 0000000000000000000000000000000000000000..3b93484952cd0c45b9d103088b0677f93fe5615d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/api/image.rst @@ -0,0 +1,100 @@ +.. role:: hidden + :class: hidden-section + +mmcv.image +=================================== + +.. contents:: mmcv.image + :depth: 2 + :local: + :backlinks: top + +.. currentmodule:: mmcv.image + +IO +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + + imfrombytes + imread + imwrite + use_backend + +Color Space +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + + bgr2gray + bgr2hls + bgr2hsv + bgr2rgb + bgr2ycbcr + gray2bgr + gray2rgb + hls2bgr + hsv2bgr + imconvert + rgb2bgr + rgb2gray + rgb2ycbcr + ycbcr2bgr + ycbcr2rgb + +Geometric +---------------- + +.. 
autosummary:: + :toctree: generated + :nosignatures: + + cutout + imcrop + imflip + impad + impad_to_multiple + imrescale + imresize + imresize_like + imresize_to_multiple + imrotate + imshear + imtranslate + rescale_size + +Photometric +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + + adjust_brightness + adjust_color + adjust_contrast + adjust_hue + adjust_lighting + adjust_sharpness + auto_contrast + clahe + imdenormalize + imequalize + iminvert + imnormalize + lut_transform + posterize + solarize + +Miscellaneous +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + + tensor2imgs diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/api/ops.rst b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/api/ops.rst new file mode 100644 index 0000000000000000000000000000000000000000..b0290457bfa0c08f14d7fe346efccb33f388bdae --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/api/ops.rst @@ -0,0 +1,135 @@ +.. role:: hidden + :class: hidden-section + +mmcv.ops +=================================== + +.. contents:: mmcv.ops + :depth: 2 + :local: + :backlinks: top + +.. currentmodule:: mmcv.ops + +.. 
autosummary:: + :toctree: generated + :nosignatures: + :template: classtemplate.rst + + BorderAlign + CARAFE + CARAFENaive + CARAFEPack + Conv2d + ConvTranspose2d + CornerPool + Correlation + CrissCrossAttention + DeformConv2d + DeformConv2dPack + DeformRoIPool + DeformRoIPoolPack + DynamicScatter + FusedBiasLeakyReLU + GroupAll + Linear + MaskedConv2d + MaxPool2d + ModulatedDeformConv2d + ModulatedDeformConv2dPack + ModulatedDeformRoIPoolPack + MultiScaleDeformableAttention + PSAMask + PointsSampler + PrRoIPool + QueryAndGroup + RiRoIAlignRotated + RoIAlign + RoIAlignRotated + RoIAwarePool3d + RoIPointPool3d + RoIPool + SAConv2d + SigmoidFocalLoss + SimpleRoIAlign + SoftmaxFocalLoss + SparseConv2d + SparseConv3d + SparseConvTensor + SparseConvTranspose2d + SparseConvTranspose3d + SparseInverseConv2d + SparseInverseConv3d + SparseMaxPool2d + SparseMaxPool3d + SparseModule + SparseSequential + SubMConv2d + SubMConv3d + SyncBatchNorm + TINShift + Voxelization + +.. autosummary:: + :toctree: generated + :nosignatures: + + active_rotated_filter + assign_score_withk + ball_query + batched_nms + bbox_overlaps + border_align + box_iou_rotated + boxes_iou3d + boxes_iou_bev + boxes_overlap_bev + carafe + carafe_naive + chamfer_distance + contour_expand + convex_giou + convex_iou + deform_conv2d + deform_roi_pool + diff_iou_rotated_2d + diff_iou_rotated_3d + dynamic_scatter + furthest_point_sample + furthest_point_sample_with_dist + fused_bias_leakyrelu + gather_points + grouping_operation + knn + masked_conv2d + min_area_polygons + modulated_deform_conv2d + nms + nms3d + nms3d_normal + nms_bev + nms_match + nms_normal_bev + nms_rotated + pixel_group + point_sample + points_in_boxes_all + points_in_boxes_cpu + points_in_boxes_part + points_in_polygons + prroi_pool + rel_roi_point_to_rel_img_point + riroi_align_rotated + roi_align + roi_align_rotated + roi_pool + rotated_feature_align + scatter_nd + sigmoid_focal_loss + soft_nms + softmax_focal_loss + three_interpolate + 
three_nn + tin_shift + upfirdn2d + voxelization diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/api/transforms.rst b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/api/transforms.rst new file mode 100644 index 0000000000000000000000000000000000000000..56463b304e39734ad55d27a2f5ab54ad529de7ed --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/api/transforms.rst @@ -0,0 +1,57 @@ +.. role:: hidden + :class: hidden-section + +mmcv.transforms +=================================== + +.. currentmodule:: mmcv.transforms + +.. autosummary:: + :toctree: generated + :nosignatures: + :template: classtemplate.rst + + BaseTransform + +Loading +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + :template: classtemplate.rst + + LoadAnnotations + LoadImageFromFile + +Processing +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + :template: classtemplate.rst + + CenterCrop + MultiScaleFlipAug + Normalize + Pad + RandomChoiceResize + RandomFlip + RandomGrayscale + RandomResize + Resize + +Wrapper +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + :template: classtemplate.rst + + Compose + KeyMapper + RandomApply + RandomChoice + TransformBroadcaster diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/api/utils.rst b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/api/utils.rst new file mode 100644 index 0000000000000000000000000000000000000000..f2ff4c2a3872bc9ae0c2942debac5e5b523bd071 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/api/utils.rst @@ -0,0 +1,23 @@ +.. role:: hidden + :class: hidden-section + +mmcv.utils +=================================== + +.. contents:: mmcv.utils + :depth: 2 + :local: + :backlinks: top + +.. currentmodule:: mmcv.utils + +.. 
autosummary:: + :toctree: generated + :nosignatures: + + IS_CUDA_AVAILABLE + IS_MLU_AVAILABLE + IS_MPS_AVAILABLE + collect_env + jit + skip_no_elena diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/api/video.rst b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/api/video.rst new file mode 100644 index 0000000000000000000000000000000000000000..a6ebca0eb73afcf3f3f11aae8520e2782a310f13 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/api/video.rst @@ -0,0 +1,56 @@ +.. role:: hidden + :class: hidden-section + +mmcv.video +=================================== + +.. contents:: mmcv.video + :depth: 2 + :local: + :backlinks: top + +.. currentmodule:: mmcv.video + +IO +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + :template: classtemplate.rst + + VideoReader + Cache + +.. autosummary:: + :toctree: generated + :nosignatures: + + frames2video + +Optical Flow +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + + dequantize_flow + flow_from_bytes + flow_warp + flowread + flowwrite + quantize_flow + sparse_flow_from_bytes + +Video Processing +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + + concat_video + convert_video + cut_video + resize_video diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/api/visualization.rst b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/api/visualization.rst new file mode 100644 index 0000000000000000000000000000000000000000..8f43ef27a441dcd9001a352cf18e97f8e615676d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/api/visualization.rst @@ -0,0 +1,50 @@ +.. role:: hidden + :class: hidden-section + +mmcv.visualization +=================================== + +.. contents:: mmcv.visualization + :depth: 2 + :local: + :backlinks: top + +.. currentmodule:: mmcv.visualization + +Color +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + :template: classtemplate.rst + + Color + +.. 
autosummary:: + :toctree: generated + :nosignatures: + + color_val + +Image +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + + imshow + imshow_bboxes + imshow_det_bboxes + +Optical Flow +---------------- + +.. autosummary:: + :toctree: generated + :nosignatures: + + flow2rgb + flowshow + make_color_wheel diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/community/code_style.md b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/community/code_style.md new file mode 100644 index 0000000000000000000000000000000000000000..8ddb87c2391e07b848aa073287cc2a230da8c3ec --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/community/code_style.md @@ -0,0 +1,609 @@ +## 代码规范 + +### 代码规范标准 + +#### PEP 8 —— Python 官方代码规范 + +[Python 官方的代码风格指南](https://www.python.org/dev/peps/pep-0008/),包含了以下几个方面的内容: + +- 代码布局,介绍了 Python 中空行、断行以及导入相关的代码风格规范。比如一个常见的问题:当我的代码较长,无法在一行写下时,何处可以断行? + +- 表达式,介绍了 Python 中表达式空格相关的一些风格规范。 + +- 尾随逗号相关的规范。当列表较长,无法一行写下而写成如下逐行列表时,推荐在末项后加逗号,从而便于追加选项、版本控制等。 + + ```python + # Correct: + FILES = ['setup.cfg', 'tox.ini'] + # Correct: + FILES = [ + 'setup.cfg', + 'tox.ini', + ] + # Wrong: + FILES = ['setup.cfg', 'tox.ini',] + # Wrong: + FILES = [ + 'setup.cfg', + 'tox.ini' + ] + ``` + +- 命名相关规范、注释相关规范、类型注解相关规范,我们将在后续章节中做详细介绍。 + + "A style guide is about consistency. Consistency with this style guide is important. Consistency within a project is more important. Consistency within one module or function is the most important." 
PEP 8 -- Style Guide for Python Code + +:::{note} +PEP 8 的代码规范并不是绝对的,项目内的一致性要优先于 PEP 8 的规范。OpenMMLab 各个项目都在 setup.cfg 设定了一些代码规范的设置,请遵照这些设置。一个例子是在 PEP 8 中有如下一个例子: + +```python +# Correct: +hypot2 = x*x + y*y +# Wrong: +hypot2 = x * x + y * y +``` + +这一规范是为了指示不同优先级,但 OpenMMLab 的设置中通常没有启用 yapf 的 `ARITHMETIC_PRECEDENCE_INDICATION` 选项,因而格式规范工具不会按照推荐样式格式化,以设置为准。 +::: + +#### Google 开源项目风格指南 + +[Google 使用的编程风格指南](https://google.github.io/styleguide/pyguide.html),包括了 Python 相关的章节。相较于 PEP 8,该指南提供了更为详尽的代码指南。该指南包括了语言规范和风格规范两个部分。 + +其中,语言规范对 Python 中很多语言特性进行了优缺点的分析,并给出了使用指导意见,如异常、Lambda 表达式、列表推导式、metaclass 等。 + +风格规范的内容与 PEP 8 较为接近,大部分约定建立在 PEP 8 的基础上,也有一些更为详细的约定,如函数长度、TODO 注释、文件与 socket 对象的访问等。 + +推荐将该指南作为参考进行开发,但不必严格遵照,一来该指南存在一些 Python 2 兼容需求,例如指南中要求所有无基类的类应当显式地继承 Object, 而在仅使用 Python 3 的环境中,这一要求是不必要的,依本项目中的惯例即可。二来 OpenMMLab 的项目作为框架级的开源软件,不必对一些高级技巧过于避讳,尤其是 MMCV。但尝试使用这些技巧前应当认真考虑是否真的有必要,并寻求其他开发人员的广泛评估。 + +另外需要注意的一处规范是关于包的导入,在该指南中,要求导入本地包时必须使用路径全称,且导入的每一个模块都应当单独成行,通常这是不必要的,而且也不符合目前项目的开发惯例,此处进行如下约定: + +```python +# Correct +from mmcv.cnn.bricks import (Conv2d, build_norm_layer, DropPath, MaxPool2d, + Linear) +from ..utils import ext_loader + +# Wrong +from mmcv.cnn.bricks import Conv2d, build_norm_layer, DropPath, MaxPool2d, \ + Linear # 使用括号进行连接,而不是反斜杠 +from ...utils import is_str # 最多向上回溯一层,过多的回溯容易导致结构混乱 +``` + +OpenMMLab 项目使用 pre-commit 工具自动格式化代码,详情见[贡献代码](./contributing.md#代码风格)。 + +### 命名规范 + +#### 命名规范的重要性 + +优秀的命名是良好代码可读的基础。基础的命名规范对各类变量的命名做了要求,使读者可以方便地根据代码名了解变量是一个类 / 局部变量 / 全局变量等。而优秀的命名则需要代码作者对于变量的功能有清晰的认识,以及良好的表达能力,从而使读者根据名称就能了解其含义,甚至帮助了解该段代码的功能。 + +#### 基础命名规范 + +| 类型 | 公有 | 私有 | +| --------------- | ---------------- | ------------------ | +| 模块 | lower_with_under | \_lower_with_under | +| 包 | lower_with_under | | +| 类 | CapWords | \_CapWords | +| 异常 | CapWordsError | | +| 函数(方法) | lower_with_under | \_lower_with_under | +| 函数 / 方法参数 | lower_with_under | | +| 全局 / 类内常量 | CAPS_WITH_UNDER | \_CAPS_WITH_UNDER | +| 全局 / 类内变量 | lower_with_under | \_lower_with_under | +| 变量 | 
lower_with_under | \_lower_with_under | +| 局部变量 | lower_with_under | | + +注意: + +- 尽量避免变量名与保留字冲突,特殊情况下如不可避免,可使用一个后置下划线,如 class\_ +- 尽量不要使用过于简单的命名,除了约定俗成的循环变量 i,文件变量 f,错误变量 e 等。 +- 不会被用到的变量可以命名为 \_,逻辑检查器会将其忽略。 + +#### 命名技巧 + +良好的变量命名需要保证三点: + +1. 含义准确,没有歧义 +2. 长短适中 +3. 前后统一 + +```python +# Wrong +class Masks(metaclass=ABCMeta): # 命名无法表现基类;Instance or Semantic? + pass + +# Correct +class BaseInstanceMasks(metaclass=ABCMeta): + pass + +# Wrong,不同地方含义相同的变量尽量用统一的命名 +def __init__(self, inplanes, planes): + pass + +def __init__(self, in_channels, out_channels): + pass +``` + +常见的函数命名方法: + +- 动宾命名法:crop_img, init_weights +- 动宾倒置命名法:imread, bbox_flip + +注意函数命名与参数的顺序,保证主语在前,符合语言习惯: + +- check_keys_exist(key, container) +- check_keys_contain(container, key) + +注意避免非常规或统一约定的缩写,如 nb -> num_blocks,in_nc -> in_channels + +### docstring 规范 + +#### 为什么要写 docstring + +docstring 是对一个类、一个函数功能与 API 接口的详细描述,有两个功能,一是帮助其他开发者了解代码功能,方便 debug 和复用代码;二是在 Readthedocs 文档中自动生成相关的 API reference 文档,帮助不了解源代码的社区用户使用相关功能。 + +#### 如何写 docstring + +与注释不同,一份规范的 docstring 有着严格的格式要求,以便于 Python 解释器以及 sphinx 进行文档解析,详细的 docstring 约定参见 [PEP 257](https://www.python.org/dev/peps/pep-0257/)。此处以例子的形式介绍各种文档的标准格式,参考格式为 [Google 风格](https://zh-google-styleguide.readthedocs.io/en/latest/google-python-styleguide/python_style_rules/#comments)。 + +1. 模块文档 + + 代码风格规范推荐为每一个模块(即 Python 文件)编写一个 docstring,但目前 OpenMMLab 项目大部分没有此类 docstring,因此不做硬性要求。 + + ```python + """A one line summary of the module or program, terminated by a period. + + Leave one blank line. The rest of this docstring should contain an + overall description of the module or program. Optionally, it may also + contain a brief description of exported classes and functions and/or usage + examples. + + Typical usage example: + + foo = ClassFoo() + bar = foo.FunctionBar() + """ + ``` + +2. 
类文档 + + 类文档是我们最常需要编写的,此处,按照 OpenMMLab 的惯例,我们使用了与 Google 风格不同的写法。如下例所示,文档中没有使用 Attributes 描述类属性,而是使用 Args 描述 __init__ 函数的参数。 + + 在 Args 中,遵照 `parameter (type): Description.` 的格式,描述每一个参数类型和功能。其中,多种类型可使用 `(float or str)` 的写法,可以为 None 的参数可以写为 `(int, optional)`。 + + ```python + class BaseRunner(metaclass=ABCMeta): + """The base class of Runner, a training helper for PyTorch. + + All subclasses should implement the following APIs: + + - ``run()`` + - ``train()`` + - ``val()`` + - ``save_checkpoint()`` + + Args: + model (:obj:`torch.nn.Module`): The model to be run. + batch_processor (callable, optional): A callable method that process + a data batch. The interface of this method should be + ``batch_processor(model, data, train_mode) -> dict``. + Defaults to None. + optimizer (dict or :obj:`torch.optim.Optimizer`, optional): It can be + either an optimizer (in most cases) or a dict of optimizers + (in models that requires more than one optimizer, e.g., GAN). + Defaults to None. + work_dir (str, optional): The working directory to save checkpoints + and logs. Defaults to None. + logger (:obj:`logging.Logger`): Logger used during training. + Defaults to None. (The default value is just for backward + compatibility) + meta (dict, optional): A dict records some import information such as + environment info and seed, which will be logged in logger hook. + Defaults to None. + max_epochs (int, optional): Total training epochs. Defaults to None. + max_iters (int, optional): Total training iterations. Defaults to None. + """ + + def __init__(self, + model, + batch_processor=None, + optimizer=None, + work_dir=None, + logger=None, + meta=None, + max_iters=None, + max_epochs=None): + ... + ``` + + 另外,在一些算法实现的主体类中,建议加入原论文的链接;如果参考了其他开源代码的实现,则应加入 modified from,而如果是直接复制了其他代码库的实现,则应加入 copied from ,并注意源码的 License。如有必要,也可以通过 .. math:: 来加入数学公式 + + ```python + # 参考实现 + # This func is modified from `detectron2 + # `_. + + # 复制代码 + # This code was copied from the `ubelt + # library`_. 
+ + # 引用论文 & 添加公式 + class LabelSmoothLoss(nn.Module): + r"""Initializer for the label smoothed cross entropy loss. + + Refers to `Rethinking the Inception Architecture for Computer Vision + `_. + + This decreases gap between output scores and encourages generalization. + Labels provided to forward can be one-hot like vectors (NxC) or class + indices (Nx1). + And this accepts linear combination of one-hot like labels from mixup or + cutmix except multi-label task. + + Args: + label_smooth_val (float): The degree of label smoothing. + num_classes (int, optional): Number of classes. Defaults to None. + mode (str): Refers to notes, Options are "original", "classy_vision", + "multi_label". Defaults to "classy_vision". + reduction (str): The method used to reduce the loss. + Options are "none", "mean" and "sum". Defaults to 'mean'. + loss_weight (float): Weight of the loss. Defaults to 1.0. + + Note: + if the ``mode`` is "original", this will use the same label smooth + method as the original paper as: + + .. math:: + (1-\epsilon)\delta_{k, y} + \frac{\epsilon}{K} + + where :math:`\epsilon` is the ``label_smooth_val``, :math:`K` is + the ``num_classes`` and :math:`\delta_{k,y}` is Dirac delta, + which equals 1 for k=y and 0 otherwise. + + if the ``mode`` is "classy_vision", this will use the same label + smooth method as the `facebookresearch/ClassyVision + `_ repo as: + + .. math:: + \frac{\delta_{k, y} + \epsilon/K}{1+\epsilon} + + if the ``mode`` is "multi_label", this will accept labels from + multi-label task and smoothing them as: + + .. math:: + (1-2\epsilon)\delta_{k, y} + \epsilon + ``` + +```{note} +注意 \`\`here\`\`、\`here\`、"here" 三种引号功能是不同。 + +在 reStructured 语法中,\`\`here\`\` 表示一段代码;\`here\` 表示斜体;"here" 无特殊含义,一般可用来表示字符串。其中 \`here\` 的用法与 Markdown 中不同,需要多加留意。 +另外还有 :obj:\`type\` 这种更规范的表示类的写法,但鉴于长度,不做特别要求,一般仅用于表示非常用类型。 +``` + +3. 
方法(函数)文档 + + 函数文档与类文档的结构基本一致,但需要加入返回值文档。对于较为复杂的函数和类,可以使用 Examples 字段加入示例;如果需要对参数加入一些较长的备注,可以加入 Note 字段进行说明。 + + 对于使用较为复杂的类或函数,比起看大段大段的说明文字和参数文档,添加合适的示例更能帮助用户迅速了解其用法。需要注意的是,这些示例最好是能够直接在 Python 交互式环境中运行的,并给出一些相对应的结果。如果存在多个示例,可以使用注释简单说明每段示例,也能起到分隔作用。 + + ```python + def import_modules_from_strings(imports, allow_failed_imports=False): + """Import modules from the given list of strings. + + Args: + imports (list | str | None): The given module names to be imported. + allow_failed_imports (bool): If True, the failed imports will return + None. Otherwise, an ImportError is raise. Defaults to False. + + Returns: + List[module] | module | None: The imported modules. + All these three lines in docstring will be compiled into the same + line in readthedocs. + + Examples: + >>> osp, sys = import_modules_from_strings( + ... ['os.path', 'sys']) + >>> import os.path as osp_ + >>> import sys as sys_ + >>> assert osp == osp_ + >>> assert sys == sys_ + """ + ... + ``` + + 如果函数接口在某个版本发生了变化,需要在 docstring 中加入相关的说明,必要时添加 Note 或者 Warning 进行说明,例如: + + ```python + class CheckpointHook(Hook): + """Save checkpoints periodically. + + Args: + out_dir (str, optional): The root directory to save checkpoints. If + not specified, ``runner.work_dir`` will be used by default. If + specified, the ``out_dir`` will be the concatenation of + ``out_dir`` and the last level directory of ``runner.work_dir``. + Defaults to None. `Changed in version 1.3.15.` + file_client_args (dict, optional): Arguments to instantiate a + FileClient. See :class:`mmcv.fileio.FileClient` for details. + Defaults to None. `New in version 1.3.15.` + + Warning: + Before v1.3.15, the ``out_dir`` argument indicates the path where the + checkpoint is stored. However, in v1.3.15 and later, ``out_dir`` + indicates the root directory and the final path to save checkpoint is + the concatenation of out_dir and the last level directory of + ``runner.work_dir``. 
Suppose the value of ``out_dir`` is + "/path/of/A" and the value of ``runner.work_dir`` is "/path/of/B", + then the final path will be "/path/of/A/B". + ``` + + 如果参数或返回值里带有需要展开描述字段的 dict,则应该采用如下格式: + + ```python + def func(x): + r""" + Args: + x (None): A dict with 2 keys, ``padded_targets``, and ``targets``. + + - ``targets`` (list[Tensor]): A list of tensors. + Each tensor has the shape of :math:`(T_i)`. Each + element is the index of a character. + - ``padded_targets`` (Tensor): A tensor of shape :math:`(N)`. + Each item is the length of a word. + + Returns: + dict: A dict with 2 keys, ``padded_targets``, and ``targets``. + + - ``targets`` (list[Tensor]): A list of tensors. + Each tensor has the shape of :math:`(T_i)`. Each + element is the index of a character. + - ``padded_targets`` (Tensor): A tensor of shape :math:`(N)`. + Each item is the length of a word. + """ + return x + ``` + +```{important} +为了生成 readthedocs 文档,文档的编写需要按照 ReStructrued 文档格式,否则会产生文档渲染错误,在提交 PR 前,最好生成并预览一下文档效果。 +语法规范参考: + +- [reStructuredText Primer - Sphinx documentation](https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#) +- [Example Google Style Python Docstrings ‒ napoleon 0.7 documentation](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html#example-google) +``` + +### 注释规范 + +#### 为什么要写注释 + +对于一个开源项目,团队合作以及社区之间的合作是必不可少的,因而尤其要重视合理的注释。不写注释的代码,很有可能过几个月自己也难以理解,造成额外的阅读和修改成本。 + +#### 如何写注释 + +最需要写注释的是代码中那些技巧性的部分。如果你在下次代码审查的时候必须解释一下,那么你应该现在就给它写注释。对于复杂的操作,应该在其操作开始前写上若干行注释。对于不是一目了然的代码,应在其行尾添加注释。 +—— Google 开源项目风格指南 + +```python +# We use a weighted dictionary search to find out where i is in +# the array. We extrapolate position based on the largest num +# in the array and the array size and then do binary search to +# get the exact number. +if i & (i-1) == 0: # True if i is 0 or a power of 2. +``` + +为了提高可读性, 注释应该至少离开代码2个空格. +另一方面, 绝不要描述代码. 假设阅读代码的人比你更懂Python, 他只是不知道你的代码要做什么. 
+—— Google 开源项目风格指南 + +```python +# Wrong: +# Now go through the b array and make sure whenever i occurs +# the next element is i+1 + +# Wrong: +if i & (i-1) == 0: # True if i bitwise and i-1 is 0. +``` + +在注释中,可以使用 Markdown 语法,因为开发人员通常熟悉 Markdown 语法,这样可以便于交流理解,如可使用单反引号表示代码和变量(注意不要和 docstring 中的 ReStructured 语法混淆) + +```python +# `_reversed_padding_repeated_twice` is the padding to be passed to +# `F.pad` if needed (e.g., for non-zero padding types that are +# implemented as two ops: padding + conv). `F.pad` accepts paddings in +# reverse order than the dimension. +self._reversed_padding_repeated_twice = _reverse_repeat_tuple(self.padding, 2) +``` + +#### 注释示例 + +1. 出自 `mmcv/utils/registry.py`,对于较为复杂的逻辑结构,通过注释,明确了优先级关系。 + + ```python + # self.build_func will be set with the following priority: + # 1. build_func + # 2. parent.build_func + # 3. build_from_cfg + if build_func is None: + if parent is not None: + self.build_func = parent.build_func + else: + self.build_func = build_from_cfg + else: + self.build_func = build_func + ``` + +2. 出自 `mmcv/runner/checkpoint.py`,对于 bug 修复中的一些特殊处理,可以附带相关的 issue 链接,帮助其他人了解 bug 背景。 + + ```python + def _save_ckpt(checkpoint, file): + # The 1.6 release of PyTorch switched torch.save to use a new + # zipfile-based file format. It will cause RuntimeError when a + # checkpoint was saved in high version (PyTorch version>=1.6.0) but + # loaded in low version (PyTorch version<1.6.0). More details at + # https://github.com/open-mmlab/mmpose/issues/904 + if digit_version(TORCH_VERSION) >= digit_version('1.6.0'): + torch.save(checkpoint, file, _use_new_zipfile_serialization=False) + else: + torch.save(checkpoint, file) + ``` + +### 类型注解 + +#### 为什么要写类型注解 + +类型注解是对函数中变量的类型做限定或提示,为代码的安全性提供保障、增强代码的可读性、避免出现类型相关的错误。 +Python 没有对类型做强制限制,类型注解只起到一个提示作用,通常你的 IDE 会解析这些类型注解,然后在你调用相关代码时对类型做提示。另外也有类型注解检查工具,这些工具会根据类型注解,对代码中可能出现的问题进行检查,减少 bug 的出现。 +需要注意的是,通常我们不需要注释模块中的所有函数: + +1. 公共的 API 需要注释 +2. 在代码的安全性,清晰性和灵活性上进行权衡是否注释 +3. 对于容易出现类型相关的错误的代码进行注释 +4. 
难以理解的代码请进行注释 +5. 若代码中的类型已经稳定,可以进行注释. 对于一份成熟的代码,多数情况下,即使注释了所有的函数,也不会丧失太多的灵活性. + +#### 如何写类型注解 + +1. 函数 / 方法类型注解,通常不对 self 和 cls 注释。 + + ```python + from typing import Optional, List, Tuple + + # 全部位于一行 + def my_method(self, first_var: int) -> int: + pass + + # 另起一行 + def my_method( + self, first_var: int, + second_var: float) -> Tuple[MyLongType1, MyLongType1, MyLongType1]: + pass + + # 单独成行(具体的应用场合与行宽有关,建议结合 yapf 自动化格式使用) + def my_method( + self, first_var: int, second_var: float + ) -> Tuple[MyLongType1, MyLongType1, MyLongType1]: + pass + + # 引用尚未被定义的类型 + class MyClass: + def __init__(self, + stack: List["MyClass"]) -> None: + pass + ``` + + 注:类型注解中的类型可以是 Python 内置类型,也可以是自定义类,还可以使用 Python 提供的 wrapper 类对类型注解进行装饰,一些常见的注解如下: + + ```python + # 数值类型 + from numbers import Number + + # 可选类型,指参数可以为 None + from typing import Optional + def foo(var: Optional[int] = None): + pass + + # 联合类型,指同时接受多种类型 + from typing import Union + def foo(var: Union[float, str]): + pass + + from typing import Sequence # 序列类型 + from typing import Iterable # 可迭代类型 + from typing import Any # 任意类型 + from typing import Callable # 可调用类型 + + from typing import List, Dict # 列表和字典的泛型类型 + from typing import Tuple # 元组的特殊格式 + # 虽然在 Python 3.9 中,list, tuple 和 dict 本身已支持泛型,但为了支持之前的版本 + # 我们在进行类型注解时还是需要使用 List, Tuple, Dict 类型 + # 另外,在对参数类型进行注解时,尽量使用 Sequence & Iterable & Mapping + # List, Tuple, Dict 主要用于返回值类型注解 + # 参见 https://docs.python.org/3/library/typing.html#typing.List + ``` + +2. 变量类型注解,一般用于难以直接推断其类型时 + + ```python + # Recommend: 带类型注解的赋值 + a: Foo = SomeUndecoratedFunction() + a: List[int]: [1, 2, 3] # List 只支持单一类型泛型,可使用 Union + b: Tuple[int, int] = (1, 2) # 长度固定为 2 + c: Tuple[int, ...] = (1, 2, 3) # 变长 + d: Dict[str, int] = {'a': 1, 'b': 2} + + # Not Recommend:行尾类型注释 + # 虽然这种方式被写在了 Google 开源指南中,但这是一种为了支持 Python 2.7 版本 + # 而补充的注释方式,鉴于我们只支持 Python 3, 为了风格统一,不推荐使用这种方式。 + a = SomeUndecoratedFunction() # type: Foo + a = [1, 2, 3] # type: List[int] + b = (1, 2, 3) # type: Tuple[int, ...] 
+ c = (1, "2", 3.5) # type: Tuple[int, Text, float] + ``` + +3. 泛型 + + 上文中我们知道,typing 中提供了 list 和 dict 的泛型类型,那么我们自己是否可以定义类似的泛型呢? + + ```python + from typing import TypeVar, Generic + + KT = TypeVar('KT') + VT = TypeVar('VT') + + class Mapping(Generic[KT, VT]): + def __init__(self, data: Dict[KT, VT]): + self._data = data + + def __getitem__(self, key: KT) -> VT: + return self._data[key] + ``` + + 使用上述方法,我们定义了一个拥有泛型能力的映射类,实际用法如下: + + ```python + mapping = Mapping[str, float]({'a': 0.5}) + value: float = example['a'] + ``` + + 另外,我们也可以利用 TypeVar 在函数签名中指定联动的多个类型: + + ```python + from typing import TypeVar, List + + T = TypeVar('T') # Can be anything + A = TypeVar('A', str, bytes) # Must be str or bytes + + + def repeat(x: T, n: int) -> List[T]: + """Return a list containing n references to x.""" + return [x]*n + + + def longest(x: A, y: A) -> A: + """Return the longest of two strings.""" + return x if len(x) >= len(y) else y + ``` + +更多关于类型注解的写法请参考 [typing](https://docs.python.org/3/library/typing.html)。 + +#### 类型注解检查工具 + +[mypy](https://mypy.readthedocs.io/en/stable/) 是一个 Python 静态类型检查工具。根据你的类型注解,mypy 会检查传参、赋值等操作是否符合类型注解,从而避免可能出现的 bug。 + +例如如下的一个 Python 脚本文件 test.py: + +```python +def foo(var: int) -> float: + return float(var) + +a: str = foo('2.0') +b: int = foo('3.0') # type: ignore +``` + +运行 mypy test.py 可以得到如下检查结果,分别指出了第 4 行在函数调用和返回值赋值两处类型错误。而第 5 行同样存在两个类型错误,由于使用了 type: ignore 而被忽略了,只有部分特殊情况可能需要此类忽略。 + +``` +test.py:4: error: Incompatible types in assignment (expression has type "float", variable has type "int") +test.py:4: error: Argument 1 to "foo" has incompatible type "str"; expected "int" +Found 2 errors in 1 file (checked 1 source file) +``` diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/community/contributing.md b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/community/contributing.md new file mode 100644 index 0000000000000000000000000000000000000000..a53dc3cb44a56fae17265a1b0cae79c427d408a5 --- /dev/null +++ 
b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/community/contributing.md @@ -0,0 +1,278 @@ +## 贡献代码 + +欢迎加入 MMCV 社区,我们致力于打造最前沿的计算机视觉基础库,我们欢迎任何类型的贡献,包括但不限于 + +**修复错误** + +修复代码实现错误的步骤如下: + +1. 如果提交的代码改动较大,建议先提交 issue,并正确描述 issue 的现象、原因和复现方式,讨论后确认修复方案。 +2. 修复错误并补充相应的单元测试,提交拉取请求。 + +**新增功能或组件** + +1. 如果新功能或模块涉及较大的代码改动,建议先提交 issue,确认功能的必要性。 +2. 实现新增功能并添单元测试,提交拉取请求。 + +**文档补充** + +修复文档可以直接提交拉取请求 + +添加文档或将文档翻译成其他语言步骤如下 + +1. 提交 issue,确认添加文档的必要性。 +2. 添加文档,提交拉取请求。 + +### 拉取请求工作流 + +如果你对拉取请求不了解,没关系,接下来的内容将会从零开始,一步一步地指引你如何创建一个拉取请求。如果你想深入了解拉取请求的开发模式,可以参考 github [官方文档](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests) + +#### 1. 复刻仓库 + +当你第一次提交拉取请求时,先复刻 OpenMMLab 原代码库,点击 GitHub 页面右上角的 **Fork** 按钮,复刻后的代码库将会出现在你的 GitHub 个人主页下。 + + + +将代码克隆到本地 + +```shell +git clone git@github.com:{username}/mmcv.git +``` + +添加原代码库为上游代码库 + +```bash +git remote add upstream git@github.com:open-mmlab/mmcv +``` + +检查 remote 是否添加成功,在终端输入 `git remote -v` + +```bash +origin git@github.com:{username}/mmcv.git (fetch) +origin git@github.com:{username}/mmcv.git (push) +upstream git@github.com:open-mmlab/mmcv (fetch) +upstream git@github.com:open-mmlab/mmcv (push) +``` + +```{note} +这里对 origin 和 upstream 进行一个简单的介绍,当我们使用 git clone 来克隆代码时,会默认创建一个 origin 的 remote,它指向我们克隆的代码库地址,而 upstream 则是我们自己添加的,用来指向原始代码库地址。当然如果你不喜欢他叫 upstream,也可以自己修改,比如叫 open-mmlab。我们通常向 origin 提交代码(即 fork 下来的远程仓库),然后向 upstream 提交一个 pull request。如果提交的代码和最新的代码发生冲突,再从 upstream 拉取最新的代码,和本地分支解决冲突,再提交到 origin。 +``` + +#### 2. 
配置 pre-commit + +在本地开发环境中,我们使用 [pre-commit](https://pre-commit.com/#intro) 来检查代码风格,以确保代码风格的统一。在提交代码,需要先安装 pre-commit(需要在 MMCV 目录下执行): + +```shell +pip install -U pre-commit +pre-commit install +``` + +检查 pre-commit 是否配置成功,并安装 `.pre-commit-config.yaml` 中的钩子: + +```shell +pre-commit run --all-files +``` + + + + + +```{note} +如果你是中国用户,由于网络原因,可能会出现安装失败的情况,这时可以使用国内源 + +pre-commit install -c .pre-commit-config-zh-cn.yaml + +pre-commit run --all-files -c .pre-commit-config-zh-cn.yaml +``` + +如果安装过程被中断,可以重复执行 `pre-commit run ...` 继续安装。 + +如果提交的代码不符合代码风格规范,pre-commit 会发出警告,并自动修复部分错误。 + + + +如果我们想临时绕开 pre-commit 的检查提交一次代码,可以在 `git commit` 时加上 `--no-verify`(需要保证最后推送至远程仓库的代码能够通过 pre-commit 检查)。 + +```shell +git commit -m "xxx" --no-verify +``` + +#### 3. 创建开发分支 + +安装完 pre-commit 之后,我们需要基于 master 创建开发分支,建议的分支命名规则为 `username/pr_name`。 + +```shell +git checkout -b yhc/refactor_contributing_doc +``` + +在后续的开发中,如果本地仓库的 master 分支落后于 upstream 的 master 分支,我们需要先拉取 upstream 的代码进行同步,再执行上面的命令 + +```shell +git pull upstream master +``` + +#### 4. 提交代码并在本地通过单元测试 + +- MMCV 引入了 mypy 来做静态类型检查,以增加代码的鲁棒性。因此我们在提交代码时,需要补充 Type Hints。具体规则可以参考[教程](https://zhuanlan.zhihu.com/p/519335398)。 + +- 提交的代码同样需要通过单元测试 + + ```shell + # 通过全量单元测试 + pytest tests + + # 我们需要保证提交的代码能够通过修改模块的单元测试,以 runner 为例 + pytest tests/test_runner/test_runner.py + ``` + + 如果你由于缺少依赖无法运行修改模块的单元测试,可以参考[指引-单元测试](#单元测试) + +- 如果修改/添加了文档,参考[指引](#文档渲染)确认文档渲染正常。 + +#### 5. 推送代码到远程 + +代码通过单元测试和 pre-commit 检查后,将代码推送到远程仓库,如果是第一次推送,可以在 `git push` 后加上 `-u` 参数以关联远程分支 + +```shell +git push -u origin {branch_name} +``` + +这样下次就可以直接使用 `git push` 命令推送代码了,而无需指定分支和远程仓库。 + +#### 6. 
提交拉取请求(PR) + +(1) 在 GitHub 的 Pull request 界面创建拉取请求 + + +(2) 根据指引修改 PR 描述,以便于其他开发者更好地理解你的修改 + + + +描述规范详见[拉取请求规范](#拉取请求规范) + +  + +**注意事项** + +(a) PR 描述应该包含修改理由、修改内容以及修改后带来的影响,并关联相关 Issue(具体方式见[文档](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue)) + +(b) 如果是第一次为 OpenMMLab 做贡献,需要签署 CLA + + + +(c) 检查提交的 PR 是否通过 CI(集成测试) + + + +MMCV 会在不同的平台(Linux、Window、Mac),基于不同版本的 Python、PyTorch、CUDA 对提交的代码进行单元测试,以保证代码的正确性,如果有任何一个没有通过,我们可点击上图中的 `Details` 来查看具体的测试信息,以便于我们修改代码。 + +(3) 如果 PR 通过了 CI,那么就可以等待其他开发者的 review,并根据 reviewer 的意见,修改代码,并重复 [4](#4-提交代码并本地通过单元测试)-[5](#5-推送代码到远程) 步骤,直到 reviewer 同意合入 PR。 + + + +所有 reviewer 同意合入 PR 后,我们会尽快将 PR 合并到主分支。 + +#### 7. 解决冲突 + +随着时间的推移,我们的代码库会不断更新,这时候,如果你的 PR 与主分支存在冲突,你需要解决冲突,解决冲突的方式有两种: + +```shell +git fetch --all --prune +git rebase upstream/master +``` + +或者 + +```shell +git fetch --all --prune +git merge upstream/master +``` + +如果你非常善于处理冲突,那么可以使用 rebase 的方式来解决冲突,因为这能够保证你的 commit log 的整洁。如果你不太熟悉 `rebase` 的使用,那么可以使用 `merge` 的方式来解决冲突。 + +### 指引 + +#### 单元测试 + +如果你无法正常执行部分模块的单元测试,例如 [video](https://github.com/open-mmlab/mmcv/tree/master/mmcv/video) 模块,可能是你的当前环境没有安装以下依赖 + +```shell +# Linux +sudo apt-get update -y +sudo apt-get install -y libturbojpeg +sudo apt-get install -y ffmpeg + +# Windows +conda install ffmpeg +``` + +在提交修复代码错误或新增特性的拉取请求时,我们应该尽可能的让单元测试覆盖所有提交的代码,计算单元测试覆盖率的方法如下 + +```shell +python -m coverage run -m pytest /path/to/test_file +python -m coverage html +# check file in htmlcov/index.html +``` + +#### 文档渲染 + +在提交修复代码错误或新增特性的拉取请求时,可能会需要修改/新增模块的 docstring。我们需要确认渲染后的文档样式是正确的。 +本地生成渲染后的文档的方法如下 + +```shell +pip install -r requirements/docs.txt +cd docs/zh_cn/ +# or docs/en +make html +# check file in ./docs/zh_cn/_build/html/index.html +``` + +### 代码风格 + +#### Python + +[PEP8](https://www.python.org/dev/peps/pep-0008/) 作为 OpenMMLab 算法库首选的代码规范,我们使用以下工具检查和格式化代码 + +- [flake8](https://github.com/PyCQA/flake8): Python 官方发布的代码规范检查工具,是多个检查工具的封装 +- 
[isort](https://github.com/timothycrosley/isort): 自动调整模块导入顺序的工具 +- [yapf](https://github.com/google/yapf): Google 发布的代码规范检查工具 +- [codespell](https://github.com/codespell-project/codespell): 检查单词拼写是否有误 +- [mdformat](https://github.com/executablebooks/mdformat): 检查 markdown 文件的工具 +- [docformatter](https://github.com/myint/docformatter): 格式化 docstring 的工具 + +yapf 和 isort 的配置可以在 [setup.cfg](./setup.cfg) 找到 + +通过配置 [pre-commit hook](https://pre-commit.com/) ,我们可以在提交代码时自动检查和格式化 `flake8`、`yapf`、`isort`、`trailing whitespaces`、`markdown files`, +修复 `end-of-files`、`float-quoted-strings`、`python-encoding-pragma`、`mixed-line-ending`,调整 `requirments.txt` 的包顺序。 +pre-commit 钩子的配置可以在 [.pre-commit-config](./.pre-commit-config.yaml) 找到。 + +pre-commit 具体的安装使用方式见[拉取请求](#2-配置-pre-commit)。 + +更具体的规范请参考 [OpenMMLab 代码规范](code_style.md)。 + +#### C++ and CUDA + +C++ 和 CUDA 的代码规范遵从 [Google C++ Style Guide](https://google.github.io/styleguide/cppguide.html) + +### 拉取请求规范 + +1. 使用 [pre-commit hook](https://pre-commit.com),尽量减少代码风格相关问题 + +2. 一个`拉取请求`对应一个短期分支 + +3. 粒度要细,一个`拉取请求`只做一件事情,避免超大的`拉取请求` + + - Bad:实现 Faster R-CNN + - Acceptable:给 Faster R-CNN 添加一个 box head + - Good:给 box head 增加一个参数来支持自定义的 conv 层数 + +4. 每次 Commit 时需要提供清晰且有意义 commit 信息 + +5. 提供清晰且有意义的`拉取请求`描述 + + - 标题写明白任务名称,一般格式:\[Prefix\] Short description of the pull request (Suffix) + - prefix: 新增功能 \[Feature\], 修 bug \[Fix\], 文档相关 \[Docs\], 开发中 \[WIP\] (暂时不会被review) + - 描述里介绍`拉取请求`的主要修改内容,结果,以及对其他部分的影响, 参考`拉取请求`模板 + - 关联相关的`议题` (issue) 和其他`拉取请求` + +6. 
如果引入了其他三方库,或借鉴了三方库的代码,请确认他们的许可证和 mmcv 兼容,并在借鉴的代码上补充 `This code is inspired from http://` diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/community/pr.md b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/community/pr.md new file mode 100644 index 0000000000000000000000000000000000000000..427fdf9e4965e404970c761676e7edd29e7b2e56 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/community/pr.md @@ -0,0 +1,3 @@ +## 拉取请求 + +本文档的内容已迁移到[贡献指南](contributing.md)。 diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/compatibility.md b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/compatibility.md new file mode 100644 index 0000000000000000000000000000000000000000..6bda56092751e4993533008ef0a751e34565e33e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/compatibility.md @@ -0,0 +1,176 @@ +### v1.3.18 + +部分自定义算子对于不同的设备有不同实现,为此添加的大量宏命令与类型检查使得代码变得难以维护。例如: + +```c++ + if (input.device().is_cuda()) { +#ifdef MMCV_WITH_CUDA + CHECK_CUDA_INPUT(input); + CHECK_CUDA_INPUT(rois); + CHECK_CUDA_INPUT(output); + CHECK_CUDA_INPUT(argmax_y); + CHECK_CUDA_INPUT(argmax_x); + + roi_align_forward_cuda(input, rois, output, argmax_y, argmax_x, + aligned_height, aligned_width, spatial_scale, + sampling_ratio, pool_mode, aligned); +#else + AT_ERROR("RoIAlign is not compiled with GPU support"); +#endif + } else { + CHECK_CPU_INPUT(input); + CHECK_CPU_INPUT(rois); + CHECK_CPU_INPUT(output); + CHECK_CPU_INPUT(argmax_y); + CHECK_CPU_INPUT(argmax_x); + roi_align_forward_cpu(input, rois, output, argmax_y, argmax_x, + aligned_height, aligned_width, spatial_scale, + sampling_ratio, pool_mode, aligned); + } +``` + +为此我们设计了注册与分发的机制以更好的管理这些算子实现。 + +```c++ + +void ROIAlignForwardCUDAKernelLauncher(Tensor input, Tensor rois, Tensor output, + Tensor argmax_y, Tensor argmax_x, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + int pool_mode, bool aligned); + +void roi_align_forward_cuda(Tensor input, Tensor rois, Tensor output, + Tensor argmax_y, Tensor 
argmax_x, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + int pool_mode, bool aligned) { + ROIAlignForwardCUDAKernelLauncher( + input, rois, output, argmax_y, argmax_x, aligned_height, aligned_width, + spatial_scale, sampling_ratio, pool_mode, aligned); +} + +// 注册算子的cuda实现 +void roi_align_forward_impl(Tensor input, Tensor rois, Tensor output, + Tensor argmax_y, Tensor argmax_x, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + int pool_mode, bool aligned); +REGISTER_DEVICE_IMPL(roi_align_forward_impl, CUDA, roi_align_forward_cuda); + +// roi_align.cpp +// 使用dispatcher根据参数中的Tensor device类型对实现进行分发 +void roi_align_forward_impl(Tensor input, Tensor rois, Tensor output, + Tensor argmax_y, Tensor argmax_x, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + int pool_mode, bool aligned) { + DISPATCH_DEVICE_IMPL(roi_align_forward_impl, input, rois, output, argmax_y, + argmax_x, aligned_height, aligned_width, spatial_scale, + sampling_ratio, pool_mode, aligned); +} + +``` + +### v1.3.11 + +为了灵活地支持更多的后端和硬件,例如 `NVIDIA GPUs` 、`AMD GPUs`,我们重构了 `mmcv/ops/csrc` 目录。注意,这次重构不会影响 API 的使用。更多相关信息,请参考 [PR1206](https://github.com/open-mmlab/mmcv/pull/1206)。 + +原始的目录结构如下所示 + +``` +. +├── common_cuda_helper.hpp +├── ops_cuda_kernel.cuh +├── pytorch_cpp_helper.hpp +├── pytorch_cuda_helper.hpp +├── parrots_cpp_helper.hpp +├── parrots_cuda_helper.hpp +├── parrots_cudawarpfunction.cuh +├── onnxruntime +│   ├── onnxruntime_register.h +│   ├── onnxruntime_session_options_config_keys.h +│   ├── ort_mmcv_utils.h +│   ├── ... +│   ├── onnx_ops.h +│   └── cpu +│ ├── onnxruntime_register.cpp +│      ├── ... +│      └── onnx_ops_impl.cpp +├── parrots +│   ├── ... +│   ├── ops.cpp +│   ├── ops_cuda.cu +│   ├── ops_parrots.cpp +│   └── ops_pytorch.h +├── pytorch +│   ├── ... 
+│   ├── ops.cpp +│   ├── ops_cuda.cu +│   ├── pybind.cpp +└── tensorrt + ├── trt_cuda_helper.cuh + ├── trt_plugin_helper.hpp + ├── trt_plugin.hpp + ├── trt_serialize.hpp + ├── ... + ├── trt_ops.hpp + └── plugins +    ├── trt_cuda_helper.cu +    ├── trt_plugin.cpp +    ├── ... +    ├── trt_ops.cpp +    └── trt_ops_kernel.cu +``` + +重构之后,它的结构如下所示 + +``` +. +├── common +│ ├── box_iou_rotated_utils.hpp +│ ├── parrots_cpp_helper.hpp +│ ├── parrots_cuda_helper.hpp +│ ├── pytorch_cpp_helper.hpp +│ ├── pytorch_cuda_helper.hpp +│   └── cuda +│   ├── common_cuda_helper.hpp +│   ├── parrots_cudawarpfunction.cuh +│   ├── ... +│   └── ops_cuda_kernel.cuh +├── onnxruntime +│   ├── onnxruntime_register.h +│   ├── onnxruntime_session_options_config_keys.h +│   ├── ort_mmcv_utils.h +│   ├── ... +│   ├── onnx_ops.h +│   └── cpu +│ ├── onnxruntime_register.cpp +│      ├── ... +│      └── onnx_ops_impl.cpp +├── parrots +│   ├── ... +│   ├── ops.cpp +│   ├── ops_parrots.cpp +│   └── ops_pytorch.h +├── pytorch +│   ├── info.cpp +│   ├── pybind.cpp +│   ├── ... +│   ├── ops.cpp +│   └── cuda +│      ├── ... +│      └── ops_cuda.cu +└── tensorrt + ├── trt_cuda_helper.cuh + ├── trt_plugin_helper.hpp + ├── trt_plugin.hpp + ├── trt_serialize.hpp + ├── ... + ├── trt_ops.hpp + └── plugins +    ├── trt_cuda_helper.cu +    ├── trt_plugin.cpp +    ├── ... +    ├── trt_ops.cpp +    └── trt_ops_kernel.cu +``` diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/conf.py b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/conf.py new file mode 100644 index 0000000000000000000000000000000000000000..7bfb9c23a726bb917761c725472d307e6d1d865a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/conf.py @@ -0,0 +1,217 @@ +# +# Configuration file for the Sphinx documentation builder. +# +# This file does only contain a selection of the most common options. 
For a +# full list see the documentation: +# http://www.sphinx-doc.org/en/master/config + +# -- Path setup -------------------------------------------------------------- + +# If extensions (or modules to document with autodoc) are in another directory, +# add these directories to sys.path here. If the directory is relative to the +# documentation root, use os.path.abspath to make it absolute, like shown here. +# +import os +import sys + +import pytorch_sphinx_theme +from sphinx.builders.html import StandaloneHTMLBuilder + +sys.path.insert(0, os.path.abspath('../..')) + +version_file = '../../mmcv/version.py' +with open(version_file) as f: + exec(compile(f.read(), version_file, 'exec')) +__version__ = locals()['__version__'] + +# -- Project information ----------------------------------------------------- + +project = 'mmcv' +copyright = '2018-2022, OpenMMLab' +author = 'MMCV Authors' + +# The short X.Y version +version = __version__ +# The full version, including alpha/beta/rc tags +release = __version__ + +# -- General configuration --------------------------------------------------- + +# If your documentation needs a minimal Sphinx version, state it here. +# +# needs_sphinx = '1.0' + +# Add any Sphinx extension module names here, as strings. They can be +# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom +# ones. 
+ +extensions = [ + 'sphinx.ext.autodoc', + 'sphinx.ext.autosummary', + 'sphinx.ext.intersphinx', + 'sphinx.ext.napoleon', + 'sphinx.ext.viewcode', + 'sphinx.ext.autosectionlabel', + 'sphinx_markdown_tables', + 'myst_parser', + 'sphinx_copybutton', +] # yapf: disable + +myst_heading_anchors = 4 + +myst_enable_extensions = ['colon_fence'] + +# Configuration for intersphinx +intersphinx_mapping = { + 'python': ('https://docs.python.org/3', None), + 'numpy': ('https://numpy.org/doc/stable', None), + 'torch': ('https://pytorch.org/docs/stable/', None), + 'mmengine': ('https://mmengine.readthedocs.io/en/latest', None), +} + +autodoc_mock_imports = ['mmcv._ext', 'mmcv.utils.ext_loader', 'torchvision'] +autosectionlabel_prefix_document = True + +# Add any paths that contain templates here, relative to this directory. +templates_path = ['_templates'] + +# The suffix(es) of source filenames. +# You can specify multiple suffix as a list of string: +# +source_suffix = { + '.rst': 'restructuredtext', + '.md': 'markdown', +} + +# The master toctree document. +master_doc = 'index' + +# The language for content autogenerated by Sphinx. Refer to documentation +# for a list of supported languages. +# +# This is also used if you do content translation via gettext catalogs. +# Usually you set "language" from the command line for these cases. +language = 'zh_CN' + +# List of patterns, relative to source directory, that match files and +# directories to ignore when looking for source files. +# This pattern also affects html_static_path and html_extra_path. +exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store'] + +# The name of the Pygments (syntax highlighting) style to use. +pygments_style = 'sphinx' + +# -- Options for HTML output ------------------------------------------------- + +# The theme to use for HTML and HTML Help pages. See the documentation for +# a list of builtin themes. 
+# +# html_theme = 'sphinx_rtd_theme' +html_theme = 'pytorch_sphinx_theme' +html_theme_path = [pytorch_sphinx_theme.get_html_theme_path()] + +# Theme options are theme-specific and customize the look and feel of a theme +# further. For a list of options available for each theme, see the +# documentation. +# +html_theme_options = { + 'menu': [ + { + 'name': 'GitHub', + 'url': 'https://github.com/open-mmlab/mmcv' + }, + ], + # Specify the language of shared menu + 'menu_lang': 'cn', +} + +# Add any paths that contain custom static files (such as style sheets) here, +# relative to this directory. They are copied after the builtin static files, +# so a file named "default.css" will overwrite the builtin "default.css". +html_static_path = ['_static'] +html_css_files = ['css/readthedocs.css'] + +# Custom sidebar templates, must be a dictionary that maps document names +# to template names. +# +# The default sidebars (for documents that don't match any pattern) are +# defined by theme itself. Builtin themes are using these templates by +# default: ``['localtoc.html', 'relations.html', 'sourcelink.html', +# 'searchbox.html']``. +# +# html_sidebars = {} + +# -- Options for HTMLHelp output --------------------------------------------- + +# Output file base name for HTML help builder. +htmlhelp_basename = 'mmcvdoc' + +# -- Options for LaTeX output ------------------------------------------------ + +latex_elements = { + # The paper size ('letterpaper' or 'a4paper'). + # + # 'papersize': 'letterpaper', + + # The font size ('10pt', '11pt' or '12pt'). + # + # 'pointsize': '10pt', + + # Additional stuff for the LaTeX preamble. + # + # 'preamble': '', + + # Latex figure (float) alignment + # + # 'figure_align': 'htbp', +} + +# Grouping the document tree into LaTeX files. List of tuples +# (source start file, target name, title, +# author, documentclass [howto, manual, or own class]). 
+latex_documents = [ + (master_doc, 'mmcv.tex', 'mmcv Documentation', 'MMCV Contributors', + 'manual'), +] + +# -- Options for manual page output ------------------------------------------ + +# One entry per manual page. List of tuples +# (source start file, name, description, authors, manual section). +man_pages = [(master_doc, 'mmcv', 'mmcv Documentation', [author], 1)] + +# -- Options for Texinfo output ---------------------------------------------- + +# Grouping the document tree into Texinfo files. List of tuples +# (source start file, target name, title, author, +# dir menu entry, description, category) +texinfo_documents = [ + (master_doc, 'mmcv', 'mmcv Documentation', author, 'mmcv', + 'One line description of project.', 'Miscellaneous'), +] + +# -- Options for Epub output ------------------------------------------------- + +# Bibliographic Dublin Core info. +epub_title = project + +# The unique identifier of the text. This can be a ISBN number +# or the project homepage. +# +# epub_identifier = '' + +# A unique identification for the text. +# +# epub_uid = '' + +# A list of files that should not be packed into the epub file. +epub_exclude_files = ['search.html'] + +# set priority when building html +StandaloneHTMLBuilder.supported_image_types = [ + 'image/svg+xml', 'image/gif', 'image/png', 'image/jpeg' +] +# -- Extension configuration ------------------------------------------------- +# Ignore >>> when copying code +copybutton_prompt_text = r'>>> |\.\.\. 
' +copybutton_prompt_is_regexp = True diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/docutils.conf b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/docutils.conf new file mode 100644 index 0000000000000000000000000000000000000000..0c00c84688701117f231fd0c8ec295fb747b7d8f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/docutils.conf @@ -0,0 +1,2 @@ +[html writers] +table_style: colwidths-auto diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/faq.md b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/faq.md new file mode 100644 index 0000000000000000000000000000000000000000..6cfb100c631b101fa0cff0650105a3cc7d735e7b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/faq.md @@ -0,0 +1,91 @@ +## 常见问题 + +在这里我们列出了用户经常遇到的问题以及对应的解决方法。如果您遇到了其他常见的问题,并且知道可以帮到大家的解决办法, +欢迎随时丰富这个列表。 + +### 安装问题 + +- KeyError: "xxx: 'yyy is not in the zzz registry'" + + 只有模块所在的文件被导入时,注册机制才会被触发,所以您需要在某处导入该文件,更多详情请查看 [KeyError: "MaskRCNN: 'RefineRoIHead is not in the models registry'"](https://github.com/open-mmlab/mmdetection/issues/5974)。 + +- "No module named 'mmcv.ops'"; "No module named 'mmcv.\_ext'" + + 1. 使用 `pip uninstall mmcv` 卸载您环境中的 mmcv + 2. 参考 [installation instruction](https://mmcv.readthedocs.io/en/latest/get_started/installation.html) 或者 [Build MMCV from source](https://mmcv.readthedocs.io/en/latest/get_started/build.html) 安装 mmcv-full + +- "invalid device function" 或者 "no kernel image is available for execution" + + 1. 检查 GPU 的 CUDA 计算能力 + 2. 运行 `python mmdet/utils/collect_env.py` 来检查 PyTorch、torchvision 和 MMCV 是否是针对正确的 GPU 架构构建的,您可能需要去设置 `TORCH_CUDA_ARCH_LIST` 来重新安装 MMCV。兼容性问题可能会出现在使用旧版的 GPUs,如:colab 上的 Tesla K80 (3.7) + 3. 检查运行环境是否和 mmcv/mmdet 编译时的环境相同。例如,您可能使用 CUDA 10.0 编译 mmcv,但在 CUDA 9.0 的环境中运行它 + +- "undefined symbol" 或者 "cannot open xxx.so" + + 1. 如果符号和 CUDA/C++ 相关(例如:libcudart.so 或者 GLIBCXX),请检查 CUDA/GCC 运行时的版本是否和编译 mmcv 的一致 + 2. 如果符号和 PyTorch 相关(例如:符号包含 caffe、aten 和 TH),请检查 PyTorch 运行时的版本是否和编译 mmcv 的一致 + 3. 
运行 `python mmdet/utils/collect_env.py` 以检查 PyTorch、torchvision 和 MMCV 构建和运行的环境是否相同 + +- "RuntimeError: CUDA error: invalid configuration argument" + + 这个错误可能是由于您的 GPU 性能不佳造成的。尝试降低 [THREADS_PER_BLOCK](https://github.com/open-mmlab/mmcv/blob/cac22f8cf5a904477e3b5461b1cc36856c2793da/mmcv/ops/csrc/common_cuda_helper.hpp#L10) + 的值并重新编译 mmcv。 + +- "RuntimeError: nms is not compiled with GPU support" + + 这个错误是由于您的 CUDA 环境没有正确安装。 + 您可以尝试重新安装您的 CUDA 环境,然后删除 mmcv/build 文件夹并重新编译 mmcv。 + +- "Segmentation fault" + + 1. 检查 GCC 的版本,通常是因为 PyTorch 版本与 GCC 版本不匹配 (例如 GCC \< 4.9 ),我们推荐用户使用 GCC 5.4,我们也不推荐使用 GCC 5.5, 因为有反馈 GCC 5.5 会导致 "segmentation fault" 并且切换到 GCC 5.4 就可以解决问题 + 2. 检查是否正确安装 CUDA 版本的 PyTorc。输入以下命令并检查是否返回 True + ```shell + python -c 'import torch; print(torch.cuda.is_available())' + ``` + 3. 如果 `torch` 安装成功,那么检查 MMCV 是否安装成功。输入以下命令,如果没有报错说明 mmcv-full 安装成。 + ```shell + python -c 'import mmcv; import mmcv.ops' + ``` + 4. 如果 MMCV 与 PyTorch 都安装成功了,则可以使用 `ipdb` 设置断点或者使用 `print` 函数,分析是哪一部分的代码导致了 `segmentation fault` + +- "libtorch_cuda_cu.so: cannot open shared object file" + + `mmcv-full` 依赖 `libtorch_cuda_cu.so` 文件,但程序运行时没能找到该文件。我们可以检查该文件是否存在 `~/miniconda3/envs/{environment-name}/lib/python3.7/site-packages/torch/lib` 也可以尝试重装 PyTorch。 + +- "fatal error C1189: #error: -- unsupported Microsoft Visual Studio version!" + + 如果您在 Windows 上编译 mmcv-full 并且 CUDA 的版本是 9.2,您很可能会遇到这个问题 `"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2\include\crt/host_config.h(133): fatal error C1189: #error: -- unsupported Microsoft Visual Studio version! 
Only the versions 2012, 2013, 2015 and 2017 are supported!"`,您可以尝试使用低版本的 Microsoft Visual Studio,例如 vs2017。 + +- "error: member "torch::jit::detail::ModulePolicy::all_slots" may not be initialized" + + 如果您在 Windows 上编译 mmcv-full 并且 PyTorch 的版本是 1.5.0,您很可能会遇到这个问题 `- torch/csrc/jit/api/module.h(474): error: member "torch::jit::detail::ModulePolicy::all_slots" may not be initialized`。解决这个问题的方法是将 `torch/csrc/jit/api/module.h` 文件中所有 `static constexpr bool all_slots = false;` 替换为 `static bool all_slots = false;`。更多细节可以查看 [member "torch::jit::detail::AttributePolicy::all_slots" may not be initialized](https://github.com/pytorch/pytorch/issues/39394)。 + +- "error: a member with an in-class initializer must be const" + + 如果您在 Windows 上编译 mmcv-full 并且 PyTorch 的版本是 1.6.0,您很可能会遇到这个问题 `"- torch/include\torch/csrc/jit/api/module.h(483): error: a member with an in-class initializer must be const"`. 解决这个问题的方法是将 `torch/include\torch/csrc/jit/api/module.h` 文件中的所有 `CONSTEXPR_EXCEPT_WIN_CUDA ` 替换为 `const`。更多细节可以查看 [Ninja: build stopped: subcommand failed](https://github.com/open-mmlab/mmcv/issues/575)。 + +- "error: member "torch::jit::ProfileOptionalOp::Kind" may not be initialized" + + 如果您在 Windows 上编译 mmcv-full 并且 PyTorch 的版本是 1.7.0,您很可能会遇到这个问题 `torch/include\torch/csrc/jit/ir/ir.h(1347): error: member "torch::jit::ProfileOptionalOp::Kind" may not be initialized`. 
解决这个问题的方法是修改 PyTorch 中的几个文件: + + - 删除 `torch/include\torch/csrc/jit/ir/ir.h` 文件中的 `static constexpr Symbol Kind = ::c10::prim::profile;` 和 `tatic constexpr Symbol Kind = ::c10::prim::profile_optional;` + - 将 `torch\include\pybind11\cast.h` 文件中的 `explicit operator type&() { return *(this->value); }` 替换为 `explicit operator type&() { return *((type*)this->value); }` + - 将 `torch/include\torch/csrc/jit/api/module.h` 文件中的 所有 `CONSTEXPR_EXCEPT_WIN_CUDA` 替换为 `const` + + 更多细节可以查看 [Ensure default extra_compile_args](https://github.com/pytorch/pytorch/pull/45956)。 + +- MMCV 和 MMDetection 的兼容性问题;"ConvWS is already registered in conv layer" + + 请参考 [installation instruction](https://mmdetection.readthedocs.io/en/latest/get_started.html#installation) 为您的 MMDetection 版本安装正确版本的 MMCV。 + +### 使用问题 + +- "RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one" + + 1. 这个错误是因为有些参数没有参与 loss 的计算,可能是代码中存在多个分支,导致有些分支没有参与 loss 的计算。更多细节见 [Expected to have finished reduction in the prior iteration before starting a new one](https://github.com/pytorch/pytorch/issues/55582)。 + 2. 
你可以设置 DDP 中的 `find_unused_parameters` 为 `True`,或者手动查找哪些参数没有用到。 + +- "RuntimeError: Trying to backward through the graph a second time" + + 不能同时设置 `GradientCumulativeOptimizerHook` 和 `OptimizerHook`,这会导致 `loss.backward()` 被调用两次,于是程序抛出 `RuntimeError`。我们只需设置其中的一个。更多细节见 [Trying to backward through the graph a second time](https://github.com/open-mmlab/mmcv/issues/1379)。 diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/get_started/article.md b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/get_started/article.md new file mode 100644 index 0000000000000000000000000000000000000000..96768502cedb607d58ea2dc8d17b3dd8b9af20b2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/get_started/article.md @@ -0,0 +1,63 @@ +## 解读文章汇总 + +这篇文章汇总了 [OpenMMLab](https://www.zhihu.com/people/openmmlab) 解读的部分文章(更多文章和视频见 [OpenMMLabCourse](https://github.com/open-mmlab/OpenMMLabCourse)),如果您有推荐的文章(不一定是 OpenMMLab 发布的文章,可以是自己写的文章),非常欢迎提 [Pull Request](http://127.0.0.1:5501/mmcv/docs/zh_cn/_build/html/community/pr.html) 添加到这里。 + +### MMCV 解读文章 + +#### 框架解读 + +- [MMCV 核心组件分析(一):整体概述](https://zhuanlan.zhihu.com/p/336081587) +- [MMCV 核心组件分析(二):FileHandler](https://zhuanlan.zhihu.com/p/336097883) +- [MMCV 核心组件分析(三): FileClient](https://zhuanlan.zhihu.com/p/339190576) +- [MMCV 核心组件分析(四): Config](https://zhuanlan.zhihu.com/p/346203167) +- [MMCV 核心组件分析(五): Registry](https://zhuanlan.zhihu.com/p/355271993) +- [MMCV 核心组件分析(六): Hook](https://zhuanlan.zhihu.com/p/355272220) +- [MMCV 核心组件分析(七): Runner](https://zhuanlan.zhihu.com/p/355272459) +- [MMCV Hook 食用指南](https://zhuanlan.zhihu.com/p/448600739) +- [PyTorch & MMCV Dispatcher 机制解析](https://zhuanlan.zhihu.com/p/451671838) + +#### 工具解读 + +- [训练可视化工具哪款是你的菜?MMCV一行代码随你挑](https://zhuanlan.zhihu.com/p/387078211) + +#### 安装指南 + +- [久等了!Windows 平台 MMCV 的预编译包终于来了!](https://zhuanlan.zhihu.com/p/441653536) +- [Windows 环境从零安装 mmcv-full](https://zhuanlan.zhihu.com/p/434491590) + +#### 知乎问答 + +- 
[深度学习科研,如何高效进行代码和实验管理?](https://www.zhihu.com/question/269707221/answer/2480772257) +- [深度学习方面的科研工作中的实验代码有什么规范和写作技巧?如何妥善管理实验数据?](https://www.zhihu.com/question/268193800/answer/2586000037) + +### 下游算法库解读文章 + +- [MMDetection](https://mmdetection.readthedocs.io/zh_CN/latest/article.html) + +### PyTorch 解读文章 + +- [PyTorch1.11 亮点一览:TorchData、functorch、DDP 静态图](https://zhuanlan.zhihu.com/p/486222256) +- [PyTorch1.12 亮点一览:DataPipe + TorchArrow 新的数据加载与处理范式](https://zhuanlan.zhihu.com/p/537868554) +- [PyTorch 源码解读之 nn.Module:核心网络模块接口详解](https://zhuanlan.zhihu.com/p/340453841) +- [PyTorch 源码解读之 torch.autograd:梯度计算详解](https://zhuanlan.zhihu.com/p/321449610) +- [PyTorch 源码解读之 torch.utils.data:解析数据处理全流程](https://zhuanlan.zhihu.com/p/337850513) +- [PyTorch 源码解读之 torch.optim:优化算法接口详解](https://zhuanlan.zhihu.com/p/346205754) +- [PyTorch 源码解读之 DP & DDP:模型并行和分布式训练解析](https://zhuanlan.zhihu.com/p/343951042) +- [PyTorch 源码解读之 BN & SyncBN:BN 与 多卡同步 BN 详解](https://zhuanlan.zhihu.com/p/337732517) +- [PyTorch 源码解读之 torch.cuda.amp: 自动混合精度详解](https://zhuanlan.zhihu.com/p/348554267) +- [PyTorch 源码解读之 cpp_extension:揭秘 C++/CUDA 算子实现和调用全流程](https://zhuanlan.zhihu.com/p/348555597) +- [PyTorch 源码解读之即时编译篇](https://zhuanlan.zhihu.com/p/361101354) +- [PyTorch 源码解读之分布式训练了解一下?](https://zhuanlan.zhihu.com/p/361314953) +- [PyTorch 源码解读之 torch.serialization & torch.hub](https://zhuanlan.zhihu.com/p/364239544) + +### 其他 + +- [困扰我 48 小时的深拷贝,今天终于...](https://zhuanlan.zhihu.com/p/470892209) +- [拿什么拯救我的 4G 显卡](https://zhuanlan.zhihu.com/p/430123077) +- [是谁偷偷动了我的 logger](https://zhuanlan.zhihu.com/p/481383590) +- [三句话,让 logger 言听计从](https://zhuanlan.zhihu.com/p/487524917) +- [Logging 不为人知的二三事](https://zhuanlan.zhihu.com/p/502610682) +- [Type Hints 入门教程,让代码更加规范整洁](https://zhuanlan.zhihu.com/p/519335398) +- [手把手教你如何高效地在 MMCV 中贡献算子](https://zhuanlan.zhihu.com/p/464492627) +- [OpenMMLab 支持 IPU 训练芯片](https://zhuanlan.zhihu.com/p/517527926) +- [基于 MMCV 走上开源大佬之路?](https://zhuanlan.zhihu.com/p/391144979) diff --git 
a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/get_started/build.md b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/get_started/build.md new file mode 100644 index 0000000000000000000000000000000000000000..95f611bc2e0e616f83de448567d404c2e420981a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/get_started/build.md @@ -0,0 +1,300 @@ +## 从源码编译 MMCV + +### 编译 mmcv + +在编译 mmcv 之前,请确保 PyTorch 已经成功安装在环境中,可以参考 [PyTorch 官方安装文档](https://pytorch.org/get-started/locally/#start-locally)。可使用以下命令验证 + +```bash +python -c 'import torch;print(torch.__version__)' +``` + +:::{note} + +- 如果克隆代码仓库的速度过慢,可以使用以下命令克隆(注意:gitee 的 mmcv 不一定和 github 的保持一致,因为每天只同步一次) + +```bash +git clone https://gitee.com/open-mmlab/mmcv.git +``` + +- 如果打算使用 `opencv-python-headless` 而不是 `opencv-python`,例如在一个很小的容器环境或者没有图形用户界面的服务器中,你可以先安装 `opencv-python-headless`,这样在安装 mmcv 依赖的过程中会跳过 `opencv-python`。 + +- 如果编译过程安装依赖库的时间过长,可以[设置 pypi 源](https://mirrors.tuna.tsinghua.edu.cn/help/pypi/) + +```bash +pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple +``` + +::: + +#### 在 Linux 上编译 mmcv + +| TODO: 视频教程 + +1. 克隆代码仓库 + + ```bash + git clone https://github.com/open-mmlab/mmcv.git + cd mmcv + ``` + +2. 安装 `ninja` 和 `psutil` 以加快编译速度 + + ```bash + pip install -r requirements/optional.txt + ``` + +3. 检查 nvcc 的版本(要求大于等于 9.2,如果没有 GPU,可以跳过) + + ```bash + nvcc --version + ``` + + 上述命令如果输出以下信息,表示 nvcc 的设置没有问题,否则需要设置 CUDA_HOME + + ``` + nvcc: NVIDIA (R) Cuda compiler driver + Copyright (c) 2005-2020 NVIDIA Corporation + Built on Mon_Nov_30_19:08:53_PST_2020 + Cuda compilation tools, release 11.2, V11.2.67 + Build cuda_11.2.r11.2/compiler.29373293_0 + ``` + + :::{note} + 如果想要支持 ROCm,可以参考 [AMD ROCm](https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html) 安装 ROCm。 + ::: + +4. 检查 gcc 的版本(要求大于等于**5.4**) + + ```bash + gcc --version + ``` + +5. 开始编译(预估耗时 10 分钟) + + ```bash + pip install -e . -v + ``` + +6. 
验证安装 + + ```bash + python .dev_scripts/check_installation.py + ``` + + 如果上述命令没有报错,说明安装成功。如有报错,请查看[问题解决页面](../faq.html)是否已经有解决方案。 + + 如果没有找到解决方案,欢迎提 [issue](https://github.com/open-mmlab/mmcv/issues)。 + +#### 在 macOS 上编译 mmcv + +| TODO: 视频教程 + +```{note} +如果你使用的是搭载 apple silicon 的 mac 设备,请安装 PyTorch 1.13+ 的版本,否则会遇到 [issues#2218](https://github.com/open-mmlab/mmcv/issues/2218) 中的问题。 +``` + +1. 克隆代码仓库 + + ```bash + git clone https://github.com/open-mmlab/mmcv.git + cd mmcv + ``` + +2. 安装 `ninja` 和 `psutil` 以加快编译速度 + + ```bash + pip install -r requirements/optional.txt + ``` + +3. 开始编译 + + ```bash + pip install -e . + ``` + +4. 验证安装 + + ```bash + python .dev_scripts/check_installation.py + ``` + + 如果上述命令没有报错,说明安装成功。如有报错,请查看[问题解决页面](../faq.md)是否已经有解决方案。 + + 如果没有找到解决方案,欢迎提 [issue](https://github.com/open-mmlab/mmcv/issues)。 + +#### 在 Windows 上编译 mmcv + +| TODO: 视频教程 + +在 Windows 上编译 mmcv 比 Linux 复杂,本节将一步步介绍如何在 Windows 上编译 mmcv。 + +##### 依赖项 + +请先安装以下的依赖项: + +- [Git](https://git-scm.com/download/win):安装期间,请选择 **add git to Path** +- [Visual Studio Community 2019](https://visualstudio.microsoft.com):用于编译 C++ 和 CUDA 代码 +- [Miniconda](https://docs.conda.io/en/latest/miniconda.html):包管理工具 +- [CUDA 10.2](https://developer.nvidia.com/cuda-10.2-download-archive):如果只需要 CPU 版本可以不安装 CUDA,安装 CUDA 时,可根据需要进行自定义安装。如果已经安装新版本的显卡驱动,建议取消驱动程序的安装 + +```{note} +如果不清楚如何安装以上依赖,请参考[Windows 环境从零安装 mmcv](https://zhuanlan.zhihu.com/p/434491590)。 +另外,你需要知道如何在 Windows 上设置变量环境,尤其是 "PATH" 的设置,以下安装过程都会用到。 +``` + +##### 通用步骤 + +1. 从 Windows 菜单启动 Anaconda 命令行 + + 如 Miniconda 安装程序建议,不要使用原始的 `cmd.exe` 或是 `powershell.exe`。命令行有两个版本,一个基于 PowerShell,一个基于传统的 `cmd.exe`。请注意以下说明都是使用的基于 PowerShell + +2. 创建一个新的 Conda 环境 + + ```powershell + (base) PS C:\Users\xxx> conda create --name mmcv python=3.7 + (base) PS C:\Users\xxx> conda activate mmcv # 确保做任何操作前先激活环境 + ``` + +3. 
安装 PyTorch 时,可以根据需要安装支持 CUDA 或不支持 CUDA 的版本 + + ```powershell + # CUDA version + (mmcv) PS C:\Users\xxx> conda install pytorch torchvision cudatoolkit=10.2 -c pytorch + # CPU version + (mmcv) PS C:\Users\xxx> conda install install pytorch torchvision cpuonly -c pytorch + ``` + +4. 克隆代码仓库 + + ```powershell + (mmcv) PS C:\Users\xxx> git clone https://github.com/open-mmlab/mmcv.git + (mmcv) PS C:\Users\xxx> cd mmcv + ``` + +5. 安装 `ninja` 和 `psutil` 以加快编译速度 + + ```powershell + (mmcv) PS C:\Users\xxx\mmcv> pip install -r requirements/optional.txt + ``` + +6. 设置 MSVC 编译器 + + 设置环境变量。添加 `C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\bin\Hostx86\x64` 到 `PATH`,则 `cl.exe` 可以在命令行中运行,如下所示。 + + ```powershell + (mmcv) PS C:\Users\xxx\mmcv> cl + Microsoft (R) C/C++ Optimizing Compiler Version 19.27.29111 for x64 + Copyright (C) Microsoft Corporation. All rights reserved. + + usage: cl [ option... ] filename... [ / link linkoption... ] + ``` + + 为了兼容性,我们使用 x86-hosted 以及 x64-targeted 版本,即路径中的 `Hostx86\x64` 。 + + 因为 PyTorch 将解析 `cl.exe` 的输出以检查其版本,只有 utf-8 将会被识别,你可能需要将系统语言更改为英语。控制面板 -> 地区-> 管理-> 非 Unicode 来进行语言转换。 + +##### 编译与安装 mmcv + +mmcv 有两个版本: + +- 只包含 CPU 算子的版本 + + 编译 CPU 算子,但只有 x86 将会被编译,并且编译版本只能在 CPU only 情况下运行 + +- 既包含 CPU 算子,又包含 CUDA 算子的版本 + + 同时编译 CPU 和 CUDA 算子,`ops` 模块的 x86 与 CUDA 的代码都可以被编译。同时编译的版本可以在 CUDA 上调用 GPU + +###### CPU 版本 + +编译安装 + +```powershell +(mmcv) PS C:\Users\xxx\mmcv> python setup.py build_ext # 如果成功, cl 将被启动用于编译算子 +(mmcv) PS C:\Users\xxx\mmcv> python setup.py develop # 安装 +``` + +###### GPU 版本 + +1. 
检查 `CUDA_PATH` 或者 `CUDA_HOME` 环境变量已经存在在 `envs` 之中 + + ```powershell + (mmcv) PS C:\Users\xxx\mmcv> ls env: + + Name Value + ---- ----- + CUDA_PATH C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2 + CUDA_PATH_V10_1 C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1 + CUDA_PATH_V10_2 C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2 + ``` + + 如果没有,你可以按照下面的步骤设置 + + ```powershell + (mmcv) PS C:\Users\xxx\mmcv> $env:CUDA_HOME = "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2" + # 或者 + (mmcv) PS C:\Users\xxx\mmcv> $env:CUDA_HOME = $env:CUDA_PATH_V10_2 # CUDA_PATH_V10_2 已经在环境变量中 + ``` + +2. 设置 CUDA 的目标架构 + + ```powershell + # 这里需要改成你的显卡对应的目标架构 + (mmcv) PS C:\Users\xxx\mmcv> $env:TORCH_CUDA_ARCH_LIST="7.5" + ``` + + :::{note} + 可以点击 [cuda-gpus](https://developer.nvidia.com/cuda-gpus) 查看 GPU 的计算能力,也可以通过 CUDA 目录下的 deviceQuery.exe 工具查看 + + ```powershell + (mmcv) PS C:\Users\xxx\mmcv> &"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\extras\demo_suite\deviceQuery.exe" + Device 0: "NVIDIA GeForce GTX 1660 SUPER" + CUDA Driver Version / Runtime Version 11.7 / 11.1 + CUDA Capability Major/Minor version number: 7.5 + ``` + + 上面的 7.5 表示目标架构。注意:需把上面命令的 v10.2 换成你的 CUDA 版本。 + ::: + +3. 编译安装 + + ```powershell + (mmcv) PS C:\Users\xxx\mmcv> python setup.py build_ext # 如果成功, cl 将被启动用于编译算子 + (mmcv) PS C:\Users\xxx\mmcv> python setup.py develop # 安装 + ``` + + ```{note} + 如果你的 PyTorch 版本是 1.6.0,你可能会遇到一些 [issue](https://github.com/pytorch/pytorch/issues/42467) 提到的错误,你可以参考这个 [pull request](https://github.com/pytorch/pytorch/pull/43380/files) 修改本地环境的 PyTorch 源代码 + ``` + +##### 验证安装 + +```powershell +(mmcv) PS C:\Users\xxx\mmcv> python .dev_scripts/check_installation.py +``` + +如果上述命令没有报错,说明安装成功。如有报错,请查看[问题解决页面](../faq.md)是否已经有解决方案。 +如果没有找到解决方案,欢迎提 [issue](https://github.com/open-mmlab/mmcv/issues)。 + +### 编译 mmcv-lite + +如果你需要使用和 PyTorch 相关的模块,请确保 PyTorch 已经成功安装在环境中,可以参考 [PyTorch 官方安装文档](https://pytorch.org/get-started/locally/#start-locally)。 + +1. 
克隆代码仓库 + + ```bash + git clone https://github.com/open-mmlab/mmcv.git + cd mmcv + ``` + +2. 开始编译 + + ```bash + MMCV_WITH_OPS=0 pip install -e . -v + ``` + +3. 验证安装 + + ```bash + python -c 'import mmcv;print(mmcv.__version__)' + ``` diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/get_started/installation.md b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/get_started/installation.md new file mode 100644 index 0000000000000000000000000000000000000000..54cdbd9f3ab9c2694e78013f5b3a5841730c54a5 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/get_started/installation.md @@ -0,0 +1,369 @@ +## 安装 MMCV + +MMCV 有两个版本: + +- **mmcv**: 完整版,包含所有的特性以及丰富的开箱即用的 CPU 和 CUDA 算子。注意,完整版本可能需要更长时间来编译。 +- **mmcv-lite**: 精简版,不包含 CPU 和 CUDA 算子但包含其余所有特性和功能,类似 MMCV 1.0 之前的版本。如果你不需要使用算子的话,精简版可以作为一个考虑选项。 + +```{warning} +请不要在同一个环境中安装两个版本,否则可能会遇到类似 `ModuleNotFound` 的错误。在安装一个版本之前,需要先卸载另一个。`如果 CUDA 可用,强烈推荐安装 mmcv`。 +``` + +### 安装 mmcv + +在安装 mmcv 之前,请确保 PyTorch 已经成功安装在环境中,可以参考 [PyTorch 官方安装文档](https://pytorch.org/get-started/locally/#start-locally)。可使用以下命令验证 + +```bash +python -c 'import torch;print(torch.__version__)' +``` + +如果输出版本信息,则表示 PyTorch 已安装。 + +#### 使用 mim 安装(推荐) + +[mim](https://github.com/open-mmlab/mim) 是 OpenMMLab 项目的包管理工具,使用它可以很方便地安装 mmcv。 + +```bash +pip install -U openmim +mim install "mmcv>=2.0.0rc1" +``` + +如果发现上述的安装命令没有使用预编译包(以 `.whl` 结尾)而是使用源码包(以 `.tar.gz` 结尾)安装,则有可能是我们没有提供和当前环境的 PyTorch 版本、CUDA 版本相匹配的 mmcv 预编译包,此时,你可以[源码安装 mmcv](build.md)。 + +
+使用预编译包的安装日志 + +Looking in links: https://download.openmmlab.com/mmcv/dist/cu102/torch1.8.0/index.html
+Collecting mmcv
+Downloading https://download.openmmlab.com/mmcv/dist/cu102/torch1.8.0/mmcv-2.0.0rc3-cp38-cp38-manylinux1_x86_64.whl + +
+ +
+使用源码包的安装日志 + +Looking in links: https://download.openmmlab.com/mmcv/dist/cu102/torch1.8.0/index.html
+Collecting mmcv==2.0.0rc3
+Downloading mmcv-2.0.0rc3.tar.gz + +
+ +如需安装指定版本的 mmcv,例如安装 2.0.0rc3 版本的 mmcv,可使用以下命令 + +```bash +mim install mmcv==2.0.0rc3 +``` + +:::{note} +如果你打算使用 `opencv-python-headless` 而不是 `opencv-python`,例如在一个很小的容器环境或者没有图形用户界面的服务器中,你可以先安装 `opencv-python-headless`,这样在安装 mmcv 依赖的过程中会跳过 `opencv-python`。 + +另外,如果安装依赖库的时间过长,可以指定 pypi 源 + +```bash +mim install "mmcv>=2.0.0rc1" -i https://pypi.tuna.tsinghua.edu.cn/simple +``` + +::: + +安装完成后可以运行 [check_installation.py](https://github.com/open-mmlab/mmcv/blob/2.x/.dev_scripts/check_installation.py) 脚本检查 mmcv 是否安装成功。 + +#### 使用 pip 安装 + +使用以下命令查看 CUDA 和 PyTorch 的版本 + +```bash +python -c 'import torch;print(torch.__version__);print(torch.version.cuda)' +``` + +根据系统的类型、CUDA 版本、PyTorch 版本以及 MMCV 版本选择相应的安装命令 + + + + +
+ + + + +
+

+
+
+
+
+如果在上面的下拉框中没有找到对应的版本,则可能是没有对应 PyTorch 或者 CUDA 或者 mmcv 版本的预编译包,此时,你可以[源码安装 mmcv](build.md)。
+
+:::{note}
+PyTorch 在 1.x.0 和 1.x.1 之间通常是兼容的,故 mmcv 只提供 1.x.0 的编译包。如果你
+的 PyTorch 版本是 1.x.1,你可以放心地安装在 1.x.0 版本编译的 mmcv。例如,如果你的
+PyTorch 版本是 1.8.1,你可以放心选择 1.8.x。
+:::
+
+:::{note}
+如果你打算使用 `opencv-python-headless` 而不是 `opencv-python`,例如在一个很小的容器环境或者没有图形用户界面的服务器中,你可以先安装 `opencv-python-headless`,这样在安装 mmcv 依赖的过程中会跳过 `opencv-python`。
+
+另外,如果安装依赖库的时间过长,可以指定 pypi 源
+
+```bash
+pip install "mmcv>=2.0.0rc1" -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html -i https://pypi.tuna.tsinghua.edu.cn/simple
+```
+
+:::
+
+安装完成后可以运行 [check_installation.py](https://github.com/open-mmlab/mmcv/blob/2.x/.dev_scripts/check_installation.py) 脚本检查 mmcv 是否安装成功。
+
+#### 使用 docker 镜像
+
+先将算法库克隆到本地再构建镜像
+
+```bash
+git clone https://github.com/open-mmlab/mmcv.git && cd mmcv
+docker build -t mmcv -f docker/release/Dockerfile .
+```
+
+也可以直接使用下面的命令构建镜像
+
+```bash
+docker build -t mmcv https://github.com/open-mmlab/mmcv.git#2.x:docker/release
+```
+
+[Dockerfile](release/Dockerfile) 默认安装最新的 mmcv,如果你想要指定版本,可以使用下面的命令
+
+```bash
+docker image build -t mmcv -f docker/release/Dockerfile --build-arg MMCV=2.0.0rc1 .
+```
+
+如果你想要使用其他版本的 PyTorch 和 CUDA,你可以在构建镜像时指定它们的版本。
+
+例如指定 PyTorch 的版本是 1.11,CUDA 的版本是 11.3
+
+```bash
+docker build -t mmcv -f docker/release/Dockerfile \
+    --build-arg PYTORCH=1.11.0 \
+    --build-arg CUDA=11.3 \
+    --build-arg CUDNN=8 \
+    --build-arg MMCV=2.0.0rc1 .
+```
+
+更多 PyTorch 和 CUDA 镜像可以点击 [dockerhub/pytorch](https://hub.docker.com/r/pytorch/pytorch/tags) 查看。
+
+### 安装 mmcv-lite
+
+如果你需要使用和 PyTorch 相关的模块,请确保 PyTorch 已经成功安装在环境中,可以参考 [PyTorch 官方安装文档](https://pytorch.org/get-started/locally/#start-locally)。
+
+```python
+pip install mmcv-lite
+```
diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/get_started/introduction.md b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/get_started/introduction.md
new file mode 100644
index 0000000000000000000000000000000000000000..4c735b94d3db71e484d04794fb5509cabbed68a9
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/get_started/introduction.md
@@ -0,0 +1,36 @@
+## 介绍 MMCV
+
+MMCV 是一个面向计算机视觉的基础库,它提供了以下功能:
+
+- [图像和视频处理](../understand_mmcv/data_process.md)
+- [图像和标注结果可视化](../understand_mmcv/visualization.md)
+- [图像变换](../understand_mmcv/data_transform.md)
+- [多种 CNN 网络结构](../understand_mmcv/cnn.md)
+- [高质量实现的常见 CUDA 算子](../understand_mmcv/ops.md)
+
+MMCV 支持多种平台,包括:
+
+- Linux
+- Windows
+- macOS
+
+它支持的 OpenMMLab 项目:
+
+- [MMClassification](https://github.com/open-mmlab/mmclassification): OpenMMLab 图像分类工具箱
+- [MMDetection](https://github.com/open-mmlab/mmdetection): OpenMMLab 目标检测工具箱
+- [MMDetection3D](https://github.com/open-mmlab/mmdetection3d): OpenMMLab 新一代通用 3D 目标检测平台
+- [MMRotate](https://github.com/open-mmlab/mmrotate): OpenMMLab 旋转框检测工具箱与测试基准
+- [MMYOLO](https://github.com/open-mmlab/mmyolo): OpenMMLab YOLO 系列工具箱与测试基准
+- [MMSegmentation](https://github.com/open-mmlab/mmsegmentation): OpenMMLab 语义分割工具箱
+- [MMOCR](https://github.com/open-mmlab/mmocr): OpenMMLab 全流程文字检测识别理解工具箱
+- [MMPose](https://github.com/open-mmlab/mmpose): OpenMMLab 姿态估计工具箱
+- [MMHuman3D](https://github.com/open-mmlab/mmhuman3d): OpenMMLab 人体参数化模型工具箱与测试基准
+- [MMSelfSup](https://github.com/open-mmlab/mmselfsup): OpenMMLab 自监督学习工具箱与测试基准
+- [MMRazor](https://github.com/open-mmlab/mmrazor): OpenMMLab 模型压缩工具箱与测试基准
+- [MMFewShot](https://github.com/open-mmlab/mmfewshot): OpenMMLab 少样本学习工具箱与测试基准
+- [MMAction2](https://github.com/open-mmlab/mmaction2): OpenMMLab 新一代视频理解工具箱
+- [MMTracking](https://github.com/open-mmlab/mmtracking): OpenMMLab 一体化视频目标感知平台
+- [MMFlow](https://github.com/open-mmlab/mmflow): OpenMMLab 光流估计工具箱与测试基准
+- [MMEditing](https://github.com/open-mmlab/mmediting): OpenMMLab 图像视频编辑工具箱
+- [MMGeneration](https://github.com/open-mmlab/mmgeneration): OpenMMLab 图片视频生成模型工具箱
+- [MMDeploy](https://github.com/open-mmlab/mmdeploy): OpenMMLab 模型部署框架
diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/get_started/previous_versions.md b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/get_started/previous_versions.md
new file mode 100644
index 0000000000000000000000000000000000000000..d543818752b51985169d4489bd46708725ce422d
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/get_started/previous_versions.md
@@ -0,0 +1,47 @@
+## 其他版本的 PyTorch
+
+我们不再提供在较低的 `PyTorch` 版本下编译的 `mmcv-full` 包,但为了您的方便,您可以在下面找到它们。
+
+### PyTorch 1.4
+
+| 1.0.0 \<= mmcv_version \<= 1.2.1
+
+#### CUDA 10.1
+
+```bash
+pip install mmcv-full=={mmcv_version} -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.4.0/index.html
+```
+
+#### CUDA 9.2
+
+```bash
+pip install mmcv-full=={mmcv_version} -f https://download.openmmlab.com/mmcv/dist/cu92/torch1.4.0/index.html
+```
+
+#### CPU
+
+```bash
+pip install mmcv-full=={mmcv_version} -f https://download.openmmlab.com/mmcv/dist/cpu/torch1.4.0/index.html
+```
+
+### PyTorch 1.3
+
+| 1.0.0 \<= mmcv_version \<= 1.3.16
+
+#### CUDA 10.1
+
+```bash
+pip install mmcv-full=={mmcv_version} -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.3.0/index.html
+```
+
+#### CUDA 9.2
+
+```bash
+pip install mmcv-full=={mmcv_version} -f https://download.openmmlab.com/mmcv/dist/cu92/torch1.3.0/index.html
+```
+
+#### CPU
+
+```bash
+pip install mmcv-full=={mmcv_version} -f https://download.openmmlab.com/mmcv/dist/cpu/torch1.3.0/index.html
+```
diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/index.rst b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/index.rst
new file mode 100644
index 0000000000000000000000000000000000000000..98cf08890618e699c7ac4731093818a07e862362
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/index.rst
@@ -0,0 +1,66 @@
+欢迎来到 MMCV 的中文文档!
+=============================
+
+您可以在页面左下角切换中英文文档。
+
+.. toctree::
+   :maxdepth: 2
+   :caption: 介绍与安装
+
+   get_started/introduction.md
+   get_started/installation.md
+   get_started/build.md
+   get_started/article.md
+
+.. toctree::
+   :maxdepth: 2
+   :caption: 深入理解 MMCV
+
+   understand_mmcv/data_process.md
+   understand_mmcv/data_transform.md
+   understand_mmcv/visualization.md
+   understand_mmcv/cnn.md
+   understand_mmcv/ops.md
+
+.. toctree::
+   :caption: 语言切换
+
+   switch_language.md
+
+.. toctree::
+   :maxdepth: 2
+   :caption: 兼容性
+
+   compatibility.md
+
+.. toctree::
+
+   faq.md
+
+.. toctree::
+   :maxdepth: 2
+   :caption: 社区
+
+   community/contributing.md
+   community/pr.md
+   community/code_style.md
+
+.. toctree::
+   :maxdepth: 1
+   :caption: API 文档
+
+   mmcv.image 
+   mmcv.video 
+   mmcv.visualization 
+   mmcv.cnn 
+   mmcv.ops 
+   mmcv.transforms 
+   mmcv.arraymisc 
+   mmcv.utils 
+
+
+Indices and tables
+==================
+
+* :ref:`genindex`
+* :ref:`search`
diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/make.bat b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/make.bat
new file mode 100644
index 0000000000000000000000000000000000000000..7893348a1b7dbb588983a48e6991282eae7e1b55
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/make.bat
@@ -0,0 +1,35 @@
+@ECHO OFF
+
+pushd %~dp0
+
+REM Command file for Sphinx documentation
+
+if "%SPHINXBUILD%" == "" (
+	set SPHINXBUILD=sphinx-build
+)
+set SOURCEDIR=.
+set BUILDDIR=_build
+
+if "%1" == "" goto help
+
+%SPHINXBUILD% >NUL 2>NUL
+if errorlevel 9009 (
+	echo.
+	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
+	echo.installed, then set the SPHINXBUILD environment variable to point
+	echo.to the full path of the 'sphinx-build' executable. Alternatively you
+	echo.may add the Sphinx directory to PATH.
+	echo.
+	echo.If you don't have Sphinx installed, grab it from
+	echo.http://sphinx-doc.org/
+	exit /b 1
+)
+
+%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
+goto end
+
+:help
+%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
+
+:end
+popd
diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/mmcv-logo.png b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/mmcv-logo.png
new file mode 120000
index 0000000000000000000000000000000000000000..7dcca035f61762b204842bfe5a12aa990c97e8eb
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/mmcv-logo.png
@@ -0,0 +1 @@
+../docs/mmcv-logo.png
\ No newline at end of file
diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/switch_language.md b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/switch_language.md
new file mode 100644
index 0000000000000000000000000000000000000000..e4ac4b229ad520f142243f3a918748c542e9989f
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/switch_language.md
@@ -0,0 +1,3 @@
+## English
+
+## 简体中文
diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/understand_mmcv/cnn.md b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/understand_mmcv/cnn.md
new file mode 100644
index 0000000000000000000000000000000000000000..1f910419b3c212faed2ec6926fa316600a846232
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/understand_mmcv/cnn.md
@@ -0,0 +1,114 @@
+## 卷积神经网络
+
+我们为卷积神经网络提供了一些构建模块,包括层构建、模块组件和权重初始化。
+
+### 网络层的构建
+
+在运行实验时,我们可能需要尝试同属一种类型但不同配置的层,但又不希望每次都修改代码。于是我们提供一些层构建方法,可以从字典构建层,字典可以在配置文件中配置,也可以通过命令行参数指定。
+
+#### 用法
+
+一个简单的例子:
+
+```python
+from mmcv.cnn import build_conv_layer
+
+cfg = dict(type='Conv3d')
+layer = build_conv_layer(cfg, in_channels=3, out_channels=8, kernel_size=3)
+```
+
+- `build_conv_layer`: 支持的类型包括 Conv1d、Conv2d、Conv3d、Conv (Conv是Conv2d的别名)
+- `build_norm_layer`: 支持的类型包括 BN1d、BN2d、BN3d、BN(BN是BN2d的别名)、SyncBN、GN、LN、IN1d、IN2d、IN3d、IN(IN是IN2d的别名)
+- `build_activation_layer`:支持的类型包括 ReLU、LeakyReLU、PReLU、RReLU、ReLU6、ELU、Sigmoid、Tanh、GELU
+- `build_upsample_layer`: 支持的类型包括 nearest、bilinear、deconv、pixel_shuffle
+- `build_padding_layer`: 支持的类型包括 zero、reflect、replicate
+
+#### 拓展
+
+我们还允许自定义层和算子来扩展构建方法。
+
+1. 编写和注册自己的模块:
+
+   ```python
+   from mmengine.registry import MODELS
+
+   @MODELS.register_module()
+   class MyUpsample:
+
+       def __init__(self, scale_factor):
+           pass
+
+       def forward(self, x):
+           pass
+   ```
+
+2. 在某处导入 `MyUpsample` (例如 `__init__.py` )然后使用它:
+
+   ```python
+   from mmcv.cnn import build_upsample_layer
+
+   cfg = dict(type='MyUpsample', scale_factor=2)
+   layer = build_upsample_layer(cfg)
+   ```
+
+### 模块组件
+
+我们还提供了常用的模块组件,以方便网络构建。
+卷积组件 `ConvModule` 由 convolution、normalization以及activation layers 组成,更多细节请参考 [ConvModule api](api.html#mmcv.cnn.ConvModule)。
+
+```python
+from mmcv.cnn import ConvModule
+
+# conv + bn + relu
+conv = ConvModule(3, 8, 2, norm_cfg=dict(type='BN'))
+# conv + gn + relu
+conv = ConvModule(3, 8, 2, norm_cfg=dict(type='GN', num_groups=2))
+# conv + relu
+conv = ConvModule(3, 8, 2)
+# conv
+conv = ConvModule(3, 8, 2, act_cfg=None)
+# conv + leaky relu
+conv = ConvModule(3, 8, 3, padding=1, act_cfg=dict(type='LeakyReLU'))
+# bn + conv + relu
+conv = ConvModule(
+    3, 8, 2, norm_cfg=dict(type='BN'), order=('norm', 'conv', 'act'))
+```
+
+### Model Zoo
+
+除了`torchvision`的预训练模型,我们还提供以下 CNN 的预训练模型:
+
+- VGG Caffe
+- ResNet Caffe
+- ResNeXt
+- ResNet with Group Normalization
+- ResNet with Group Normalization and Weight Standardization
+- HRNetV2
+- Res2Net
+- RegNet
+
+#### Model URLs in JSON
+
+MMCV中的Model Zoo Link 由 JSON 文件管理。 json 文件由模型名称及其url或path的键值对组成,一个json文件可能类似于:
+
+```json
+{
+    "model_a": "https://example.com/models/model_a_9e5bac.pth",
+    "model_b": "pretrain/model_b_ab3ef2c.pth"
+}
+```
+
+可以在[此处](https://github.com/open-mmlab/mmcv/blob/master/mmcv/model_zoo/open_mmlab.json)找到托管在 OpenMMLab AWS 上的预训练模型的默认链接。
+
+你可以通过将 `open-mmlab.json` 放在 `MMCV_HOME`下来覆盖默认链接,如果在环境中找不到`MMCV_HOME`,则默认使用 `~/.cache/mmcv`。当然你也可以使用命令 `export MMCV_HOME=/your/path`来设置自己的路径。
+
+外部的json文件将被合并为默认文件,如果相同的键出现在外部`json`和默认`json`中,则将使用外部`json`。
+
+#### Load Checkpoint
+
+`mmcv.load_checkpoint()`的参数`filename`支持以下类型:
+
+- filepath: `checkpoint`路径
+- `http://xxx` and `https://xxx`: 下载checkpoint的链接,文件名中必须包含`SHA256`后缀
+- `torchvision://xxx`: `torchvision.models`中的模型链接,更多细节参考 [torchvision](https://pytorch.org/docs/stable/torchvision/models.html)
+- `open-mmlab://xxx`: 默认和其他 json 文件中提供的模型链接或文件路径
diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/understand_mmcv/data_process.md b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/understand_mmcv/data_process.md
new file mode 100644
index 0000000000000000000000000000000000000000..7e0afd1e690b51d43d6e5b88cfa198dee32eb3d2
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/understand_mmcv/data_process.md
@@ -0,0 +1,275 @@
+## 数据处理
+
+### 图像
+
+图像模块提供了一些图像预处理的函数,该模块依赖 `opencv` 。
+
+#### 读取/保存/显示
+
+使用 `imread` 和 `imwrite` 函数可以读取和保存图像。
+
+```python
+import mmcv
+
+img = mmcv.imread('test.jpg')
+img = mmcv.imread('test.jpg', flag='grayscale')
+img_ = mmcv.imread(img)  # 相当于什么也没做
+mmcv.imwrite(img, 'out.jpg')
+```
+
+从二进制中读取图像
+
+```python
+with open('test.jpg', 'rb') as f:
+    data = f.read()
+img = mmcv.imfrombytes(data)
+```
+
+显示图像文件或已读取的图像
+
+```python
+mmcv.imshow('tests/data/color.jpg')
+
+for i in range(10):
+    img = np.random.randint(256, size=(100, 100, 3), dtype=np.uint8)
+    mmcv.imshow(img, win_name='test image', wait_time=200)
+```
+
+#### 色彩空间转换
+
+支持的转换函数:
+
+- bgr2gray
+- gray2bgr
+- bgr2rgb
+- rgb2bgr
+- bgr2hsv
+- hsv2bgr
+
+```python
+img = mmcv.imread('tests/data/color.jpg')
+img1 = mmcv.bgr2rgb(img)
+img2 = mmcv.rgb2gray(img1)
+img3 = mmcv.bgr2hsv(img)
+```
+
+#### 缩放
+
+有三种缩放图像的方法。所有以 `imresize_*` 开头的函数都有一个 `return_scale` 参数,如果
+该参数为 `False` ,函数的返回值只有调整之后的图像,否则是一个元组 `(resized_img, scale)` 。
+
+```python
+# 缩放图像至给定的尺寸
+mmcv.imresize(img, (1000, 600), return_scale=True)
+
+# 缩放图像至与给定的图像同样的尺寸
+mmcv.imresize_like(img, dst_img, return_scale=False)
+
+# 以一定的比例缩放图像
+mmcv.imrescale(img, 0.5)
+
+# 缩放图像至最长的边不大于1000、最短的边不大于800并且没有改变图像的长宽比
+mmcv.imrescale(img, (1000, 800))
+```
+
+#### 旋转
+
+我们可以使用 `imrotate` 旋转图像一定的角度。旋转的中心需要指定,默认值是原始图像的中心。有
+两种旋转的模式,一种保持图像的尺寸不变,因此旋转后原始图像中的某些部分会被裁剪,另一种是扩大
+图像的尺寸进而保留完整的原始图像。
+
+```python
+img = mmcv.imread('tests/data/color.jpg')
+
+# 顺时针旋转图像30度
+img_ = mmcv.imrotate(img, 30)
+
+# 逆时针旋转图像90度
+img_ = mmcv.imrotate(img, -90)
+
+# 顺时针旋转图像30度并且缩放图像为原始图像的1.5倍
+img_ = mmcv.imrotate(img, 30, scale=1.5)
+
+# 以坐标(100, 100)为中心顺时针旋转图像30度
+img_ = mmcv.imrotate(img, 30, center=(100, 100))
+
+# 顺时针旋转图像30度并扩大图像的尺寸
+img_ = mmcv.imrotate(img, 30, auto_bound=True)
+```
+
+#### 翻转
+
+我们可以使用 `imflip` 翻转图像。
+
+```python
+img = mmcv.imread('tests/data/color.jpg')
+
+# 水平翻转图像
+mmcv.imflip(img)
+
+# 垂直翻转图像
+mmcv.imflip(img, direction='vertical')
+```
+
+#### 裁剪
+
+`imcrop` 可以裁剪图像的一个或多个区域,每个区域用左上角和右下角坐标表示,形如(x1, y1, x2, y2)
+
+```python
+import mmcv
+import numpy as np
+
+img = mmcv.imread('tests/data/color.jpg')
+
+# 裁剪区域 (10, 10, 100, 120)
+bboxes = np.array([10, 10, 100, 120])
+patch = mmcv.imcrop(img, bboxes)
+
+# 裁剪两个区域,分别是 (10, 10, 100, 120) 和 (0, 0, 50, 50)
+bboxes = np.array([[10, 10, 100, 120], [0, 0, 50, 50]])
+patches = mmcv.imcrop(img, bboxes)
+
+# 裁剪两个区域并且缩放区域1.2倍
+patches = mmcv.imcrop(img, bboxes, scale=1.2)
+```
+
+#### 填充
+
+`impad` and `impad_to_multiple` 可以用给定的值将图像填充至给定的尺寸。
+
+```python
+img = mmcv.imread('tests/data/color.jpg')
+
+# 用给定值将图像填充至 (1000, 1200)
+img_ = mmcv.impad(img, shape=(1000, 1200), pad_val=0)
+
+# 用给定值分别填充图像的3个通道至 (1000, 1200)
+img_ = mmcv.impad(img, shape=(1000, 1200), pad_val=(100, 50, 200))
+
+# 用给定值填充图像的左、右、上、下四条边
+img_ = mmcv.impad(img, padding=(10, 20, 30, 40), pad_val=0)
+
+# 用3个值分别填充图像的左、右、上、下四条边的3个通道
+img_ = mmcv.impad(img, padding=(10, 20, 30, 40), pad_val=(100, 50, 200))
+
+# 将图像的四条边填充至能够被给定值整除
+img_ = mmcv.impad_to_multiple(img, 32)
+```
+
+### 视频
+
+视频模块提供了以下的功能:
+
+- 一个 `VideoReader` 类,具有友好的 API 接口可以读取和转换视频
+- 一些编辑视频的方法,包括 `cut` , `concat` , `resize`
+- 光流的读取/保存/变换
+
+#### VideoReader
+
+`VideoReader` 类提供了和序列一样的接口去获取视频帧。该类会缓存所有被访问过的帧。
+
+```python
+video = mmcv.VideoReader('test.mp4')
+
+# 获取基本的信息
+print(len(video))
+print(video.width, video.height, video.resolution, video.fps)
+
+# 遍历所有的帧
+for frame in video:
+    print(frame.shape)
+
+# 读取下一帧
+img = video.read()
+
+# 使用索引获取帧
+img = video[100]
+
+# 获取指定范围的帧
+img = video[5:10]
+```
+
+将视频切成帧并保存至给定目录或者从给定目录中生成视频。
+
+```python
+# 将视频切成帧并保存至目录
+video = mmcv.VideoReader('test.mp4')
+video.cvt2frames('out_dir')
+
+# 从给定目录中生成视频
+mmcv.frames2video('out_dir', 'test.avi')
+```
+
+#### 编辑函数
+
+有几个用于编辑视频的函数,这些函数是对 `ffmpeg` 的封装。
+
+```python
+# 裁剪视频
+mmcv.cut_video('test.mp4', 'clip1.mp4', start=3, end=10, vcodec='h264')
+
+# 将多个视频拼接成一个视频
+mmcv.concat_video(['clip1.mp4', 'clip2.mp4'], 'joined.mp4', log_level='quiet')
+
+# 将视频缩放至给定的尺寸
+mmcv.resize_video('test.mp4', 'resized1.mp4', (360, 240))
+
+# 将视频缩放至给定的倍率
+mmcv.resize_video('test.mp4', 'resized2.mp4', ratio=2)
+```
+
+#### 光流
+
+`mmcv` 提供了以下用于操作光流的函数:
+
+- 读取/保存
+- 可视化
+- 流变换
+
+我们提供了两种将光流dump到文件的方法,分别是非压缩和压缩的方法。非压缩的方法直接将浮点数值的光流
+保存至二进制文件,虽然光流无损但文件会比较大。而压缩的方法先量化光流至 0-255 整型数值再保存为
+jpeg图像。光流的x维度和y维度会被拼接到图像中。
+
+1. 读取/保存
+
+```python
+flow = np.random.rand(800, 600, 2).astype(np.float32)
+# 保存光流到flo文件 (~3.7M)
+mmcv.flowwrite(flow, 'uncompressed.flo')
+# 保存光流为jpeg图像 (~230K),图像的尺寸为 (800, 1200)
+mmcv.flowwrite(flow, 'compressed.jpg', quantize=True, concat_axis=1)
+
+# 读取光流文件,以下两种方式读取的光流尺寸均为 (800, 600, 2)
+flow = mmcv.flowread('uncompressed.flo')
+flow = mmcv.flowread('compressed.jpg', quantize=True, concat_axis=1)
+```
+
+2. 可视化
+
+使用 `mmcv.flowshow()` 可视化光流
+
+```python
+mmcv.flowshow(flow)
+```
+
+![progress](../../en/_static/flow_visualization.png)
+
+3. 流变换
+
+```python
+img1 = mmcv.imread('img1.jpg')
+flow = mmcv.flowread('flow.flo')
+warped_img2 = mmcv.flow_warp(img1, flow)
+```
+
+img1 (左) and img2 (右)
+
+![raw images](../../en/_static/flow_raw_images.png)
+
+光流 (img2 -> img1)
+
+![optical flow](../../en/_static/flow_img2toimg1.png)
+
+变换后的图像和真实图像的差异
+
+![warped image](../../en/_static/flow_warp_diff.png)
diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/understand_mmcv/data_transform.md b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/understand_mmcv/data_transform.md
new file mode 100644
index 0000000000000000000000000000000000000000..47d16e1b5279cdcdf8700876d3d94e152b3181a0
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/understand_mmcv/data_transform.md
@@ -0,0 +1,341 @@
+# 数据变换
+
+在 OpenMMLab 算法库中,数据集的构建和数据的准备是相互解耦的。通常,数据集的构建只对数据集进行解析,记录每个样本的基本信息;而数据的准备则是通过一系列的数据变换,根据样本的基本信息进行数据加载、预处理、格式化等操作。
+
+## 数据变换的设计
+
+在 MMCV 中,我们使用各种可调用的数据变换类来进行数据的操作。这些数据变换类可以接受若干配置参数进行实例化,之后通过调用的方式对输入的数据字典进行处理。同时,我们约定所有数据变换都接受一个字典作为输入,并将处理后的数据输出为一个字典。一个简单的例子如下:
+
+```python
+>>> import numpy as np
+>>> from mmcv.transforms import Resize
+>>>
+>>> transform = Resize(scale=(224, 224))
+>>> data_dict = {'img': np.random.rand(256, 256, 3)}
+>>> data_dict = transform(data_dict)
+>>> print(data_dict['img'].shape)
+(224, 224, 3)
+```
+
+数据变换类会读取输入字典的某些字段,并且可能添加、或者更新某些字段。这些字段的键大部分情况下是固定的,如 `Resize` 会固定地读取输入字典中的 `"img"` 等字段。我们可以在对应类的文档中了解对输入输出字段的约定。
+
+```{note}
+默认情况下,在需要图像尺寸作为**初始化参数**的数据变换 (如Resize, Pad) 中,图像尺寸的顺序均为 (width, height)。在数据变换**返回的字典**中,图像相关的尺寸, 如 `img_shape`、`ori_shape`、`pad_shape` 等,均为 (height, width)。
+```
+
+MMCV 为所有的数据变换类提供了一个统一的基类 (`BaseTransform`):
+
+```python
+class BaseTransform(metaclass=ABCMeta):
+
+    def __call__(self, results: dict) -> dict:
+
+        return self.transform(results)
+
+    @abstractmethod
+    def transform(self, results: dict) -> dict:
+        pass
+```
+
+所有的数据变换类都需要继承 `BaseTransform`,并实现 `transform` 方法。`transform` 方法的输入和输出均为一个字典。在**自定义数据变换类**一节中,我们会更详细地介绍如何实现一个数据变换类。
+
+## 数据流水线
+
+如上所述,所有数据变换的输入和输出都是一个字典,而且根据 OpenMMLab 中 [有关数据集的约定](TODO),数据集中每个样本的基本信息都是一个字典。这样一来,我们可以将所有的数据变换操作首尾相接,组合成为一条数据流水线(data pipeline),输入数据集中样本的信息字典,输出完成一系列处理后的信息字典。
+
+以分类任务为例,我们在下图展示了一个典型的数据流水线。对每个样本,数据集中保存的基本信息是一个如图中最左侧所示的字典,之后每经过一个由蓝色块代表的数据变换操作,数据字典中都会加入新的字段(标记为绿色)或更新现有的字段(标记为橙色)。
+
+
+ +
+ +在配置文件中,数据流水线是一个若干数据变换配置字典组成的列表,每个数据集都需要设置参数 `pipeline` 来定义该数据集需要进行的数据准备操作。如上数据流水线在配置文件中的配置如下: + +```python +pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='Resize', size=256, keep_ratio=True), + dict(type='CenterCrop', crop_size=224), + dict(type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375]), + dict(type='ClsFormatBundle') +] + +dataset = dict( + ... + pipeline=pipeline, + ... +) +``` + +## 常用的数据变换类 + +按照功能,常用的数据变换类可以大致分为数据加载、数据预处理与增强、数据格式化。在 MMCV 中,我们提供了一些常用的数据变换类如下: + +### 数据加载 + +为了支持大规模数据集的加载,通常在 `Dataset` 初始化时不加载数据,只加载相应的路径。因此需要在数据流水线中进行具体数据的加载。 + +| class | 功能 | +| :-------------------------: | :---------------------------------------: | +| [`LoadImageFromFile`](TODO) | 根据路径加载图像 | +| [`LoadAnnotations`](TODO) | 加载和组织标注信息,如 bbox、语义分割图等 | + +### 数据预处理及增强 + +数据预处理和增强通常是对图像本身进行变换,如裁剪、填充、缩放等。 + +| class | 功能 | +| :------------------------------: | :--------------------------------: | +| [`Pad`](TODO) | 填充图像边缘 | +| [`CenterCrop`](TODO) | 居中裁剪 | +| [`Normalize`](TODO) | 对图像进行归一化 | +| [`Resize`](TODO) | 按照指定尺寸或比例缩放图像 | +| [`RandomResize`](TODO) | 缩放图像至指定范围的随机尺寸 | +| [`RandomMultiscaleResize`](TODO) | 缩放图像至多个尺寸中的随机一个尺寸 | +| [`RandomGrayscale`](TODO) | 随机灰度化 | +| [`RandomFlip`](TODO) | 图像随机翻转 | +| [`MultiScaleFlipAug`](TODO) | 支持缩放和翻转的测试时数据增强 | + +### 数据格式化 + +数据格式化操作通常是对数据进行的类型转换。 + +| class | 功能 | +| :---------------------: | :-------------------------------: | +| [`ToTensor`](TODO) | 将指定的数据转换为 `torch.Tensor` | +| [`ImageToTensor`](TODO) | 将图像转换为 `torch.Tensor` | + +## 自定义数据变换类 + +要实现一个新的数据变换类,需要继承 `BaseTransform`,并实现 `transform` 方法。这里,我们使用一个简单的翻转变换(`MyFlip`)作为示例: + +```python +import random +import mmcv +from mmcv.transforms import BaseTransform, TRANSFORMS + +@TRANSFORMS.register_module() +class MyFlip(BaseTransform): + def __init__(self, direction: str): + super().__init__() + self.direction = direction + + def transform(self, results: dict) -> dict: + img = results['img'] + results['img'] = mmcv.imflip(img, 
direction=self.direction) + return results +``` + +从而,我们可以实例化一个 `MyFlip` 对象,并将之作为一个可调用对象,来处理我们的数据字典。 + +```python +import numpy as np + +transform = MyFlip(direction='horizontal') +data_dict = {'img': np.random.rand(224, 224, 3)} +data_dict = transform(data_dict) +processed_img = data_dict['img'] +``` + +又或者,在配置文件的 pipeline 中使用 `MyFlip` 变换 + +```python +pipeline = [ + ... + dict(type='MyFlip', direction='horizontal'), + ... +] +``` + +需要注意的是,如需在配置文件中使用,需要保证 `MyFlip` 类所在的文件在运行时能够被导入。 + +## 变换包装 + +变换包装是一种特殊的数据变换类,他们本身并不操作数据字典中的图像、标签等信息,而是对其中定义的数据变换的行为进行增强。 + +### 字段映射(KeyMapper) + +字段映射包装(`KeyMapper`)用于对数据字典中的字段进行映射。例如,一般的图像处理变换都从数据字典中的 `"img"` 字段获得值。但有些时候,我们希望这些变换处理数据字典中其他字段中的图像,比如 `"gt_img"` 字段。 + +如果配合注册器和配置文件使用的话,在配置文件中数据集的 `pipeline` 中如下例使用字段映射包装: + +```python +pipeline = [ + ... + dict(type='KeyMapper', + mapping={ + 'img': 'gt_img', # 将 "gt_img" 字段映射至 "img" 字段 + 'mask': ..., # 不使用原始数据中的 "mask" 字段。即对于被包装的数据变换,数据中不包含 "mask" 字段 + }, + auto_remap=True, # 在完成变换后,将 "img" 重映射回 "gt_img" 字段 + transforms=[ + # 在 `RandomFlip` 变换类中,我们只需要操作 "img" 字段即可 + dict(type='RandomFlip'), + ]) + ... +] +``` + +利用字段映射包装,我们在实现数据变换类时,不需要考虑在 `transform` 方法中考虑各种可能的输入字段名,只需要处理默认的字段即可。 + +### 随机选择(RandomChoice)和随机执行(RandomApply) + +随机选择包装(`RandomChoice`)用于从一系列数据变换组合中随机应用一个数据变换组合。利用这一包装,我们可以简单地实现一些数据增强功能,比如 AutoAugment。 + +如果配合注册器和配置文件使用的话,在配置文件中数据集的 `pipeline` 中如下例使用随机选择包装: + +```python +pipeline = [ + ... + dict(type='RandomChoice', + transforms=[ + [ + dict(type='Posterize', bits=4), + dict(type='Rotate', angle=30.) + ], # 第一种随机变化组合 + [ + dict(type='Equalize'), + dict(type='Rotate', angle=30) + ], # 第二种随机变换组合 + ], + prob=[0.4, 0.6] # 两种随机变换组合各自的选用概率 + ) + ... +] +``` + +随机执行包装(`RandomApply`)用于以指定概率随机执行数据变换组合。例如: + +```python +pipeline = [ + ... + dict(type='RandomApply', + transforms=[dict(type='Rotate', angle=30.)], + prob=0.3) # 以 0.3 的概率执行被包装的数据变换 + ... 
+] +``` + +### 多目标扩展(TransformBroadcaster) + +通常,一个数据变换类只会从一个固定的字段读取操作目标。虽然我们也可以使用 `KeyMapper` 来改变读取的字段,但无法将变换一次性应用于多个字段的数据。为了实现这一功能,我们需要借助多目标扩展包装(`TransformBroadcaster`)。 + +多目标扩展包装(`TransformBroadcaster`)有两个用法,一是将数据变换作用于指定的多个字段,二是将数据变换作用于某个字段下的一组目标中。 + +1. 应用于多个字段 + + 假设我们需要将数据变换应用于 `"lq"` (low-quality) 和 `"gt"` (ground-truth) 两个字段中的图像上。 + + ```python + pipeline = [ + dict(type='TransformBroadcaster', + # 分别应用于 "lq" 和 "gt" 两个字段,并将二者应设置 "img" 字段 + mapping={'img': ['lq', 'gt']}, + # 在完成变换后,将 "img" 字段重映射回原先的字段 + auto_remap=True, + # 是否在对各目标的变换中共享随机变量 + # 更多介绍参加后续章节(随机变量共享) + share_random_params=True, + transforms=[ + # 在 `RandomFlip` 变换类中,我们只需要操作 "img" 字段即可 + dict(type='RandomFlip'), + ]) + ] + ``` + + 在多目标扩展的 `mapping` 设置中,我们同样可以使用 `...` 来忽略指定的原始字段。如以下例子中,被包裹的 `RandomCrop` 会对字段 `"img"` 中的图像进行裁剪,并且在字段 `"img_shape"` 存在时更新剪裁后的图像大小。如果我们希望同时对两个图像字段 `"lq"` 和 `"gt"` 进行相同的随机裁剪,但只更新一次 `"img_shape"` 字段,可以通过例子中的方式实现: + + ```python + pipeline = [ + dict(type='TransformBroadcaster', + mapping={ + 'img': ['lq', 'gt'], + 'img_shape': ['img_shape', ...], + }, + # 在完成变换后,将 "img" 和 "img_shape" 字段重映射回原先的字段 + auto_remap=True, + # 是否在对各目标的变换中共享随机变量 + # 更多介绍参加后续章节(随机变量共享) + share_random_params=True, + transforms=[ + # `RandomCrop` 类中会操作 "img" 和 "img_shape" 字段。若 "img_shape" 空缺, + # 则只操作 "img" + dict(type='RandomCrop'), + ]) + ] + ``` + +2. 
应用于一个字段的一组目标 + + 假设我们需要将数据变换应用于 `"images"` 字段,该字段为一个图像组成的 list。 + + ```python + pipeline = [ + dict(type='TransformBroadcaster', + # 将 "images" 字段下的每张图片映射至 "img" 字段 + mapping={'img': 'images'}, + # 在完成变换后,将 "img" 字段下的图片重映射回 "images" 字段的列表中 + auto_remap=True, + # 是否在对各目标的变换中共享随机变量 + share_random_params=True, + transforms=[ + # 在 `RandomFlip` 变换类中,我们只需要操作 "img" 字段即可 + dict(type='RandomFlip'), + ]) + ] + ``` + +#### 装饰器 `cache_randomness` + +在 `TransformBroadcaster` 中,我们提供了 `share_random_params` 选项来支持在多次数据变换中共享随机状态。例如,在超分辨率任务中,我们希望将随机变换**同步**作用于低分辨率图像和原始图像。如果我们希望在自定义的数据变换类中使用这一功能,需要在类中标注哪些随机变量是支持共享的。这可以通过装饰器 `cache_randomness` 来实现。 + +以上文中的 `MyFlip` 为例,我们希望以一定的概率随机执行翻转: + +```python +from mmcv.transforms.utils import cache_randomness + +@TRANSFORMS.register_module() +class MyRandomFlip(BaseTransform): + def __init__(self, prob: float, direction: str): + super().__init__() + self.prob = prob + self.direction = direction + + @cache_randomness # 标注该方法的输出为可共享的随机变量 + def do_flip(self): + flip = True if random.random() > self.prob else False + return flip + + def transform(self, results: dict) -> dict: + img = results['img'] + if self.do_flip(): + results['img'] = mmcv.imflip(img, direction=self.direction) + return results +``` + +在上面的例子中,我们用`cache_randomness` 装饰 `do_flip`方法,即将该方法返回值 `flip` 标注为一个支持共享的随机变量。进而,在 `TransformBroadcaster` 对多个目标的变换中,这一变量的值都会保持一致。 + +#### 装饰器 `avoid_cache_randomness` + +在一些情况下,我们无法将数据变换中产生随机变量的过程单独放在类方法中。例如数据变换中使用的来自第三方库的模块,这些模块将随机变量相关的部分封装在了内部,导致无法将其抽出为数据变换的类方法。这样的数据变换无法通过装饰器 `cache_randomness` 标注支持共享的随机变量,进而无法在多目标扩展时共享随机变量。 + +为了避免在多目标扩展中误用此类数据变换,我们提供了另一个装饰器 `avoid_cache_randomness`,用来对此类数据变换进行标记: + +```python +from mmcv.transforms.utils import avoid_cache_randomness + +@TRANSFORMS.register_module() +@avoid_cache_randomness +class MyRandomTransform(BaseTransform): + + def transform(self, results: dict) -> dict: + ... 
+```
+
+用 `avoid_cache_randomness` 标记的数据变换类,当其实例被 `TransformBroadcaster` 包装且将参数 `share_random_params` 设置为 True 时,会抛出异常,以此提醒用户不能这样使用。
+
+在使用 `avoid_cache_randomness` 时需要注意以下几点:
+
+1. `avoid_cache_randomness` 只用于装饰数据变换类(BaseTransform 的子类),而不能用于装饰其他一般的类、类方法或函数
+2. 被 `avoid_cache_randomness` 修饰的数据变换作为基类时,其子类将**不会继承**这一特性。如果子类仍无法共享随机变量,则应再次使用 `avoid_cache_randomness` 修饰
+3. 只有当一个数据变换具有随机性,且无法共享随机参数时,才需要以 `avoid_cache_randomness` 修饰。无随机性的数据变换不需要修饰
diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/understand_mmcv/ops.md b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/understand_mmcv/ops.md
new file mode 100644
index 0000000000000000000000000000000000000000..fbc0f1338658d83b5e59510c1dd35aefbaf25c4d
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/understand_mmcv/ops.md
@@ -0,0 +1,63 @@
+## 算子
+
+MMCV 提供了检测、分割等任务中常用的算子
+
+| Device                       | CPU | CUDA | MLU | MPS | Ascend |
+| ---------------------------- | --- | ---- | --- | --- | ------ |
+| ActiveRotatedFilter          | √   | √    |     |     |        |
+| AssignScoreWithK             |     | √    |     |     |        |
+| BallQuery                    |     | √    |     |     |        |
+| BBoxOverlaps                 |     | √    | √   | √   |        |
+| BorderAlign                  |     | √    |     |     |        |
+| BoxIouRotated                | √   | √    |     |     |        |
+| BoxIouQuadri                 | √   | √    |     |     |        |
+| CARAFE                       |     | √    | √   |     |        |
+| ChamferDistance              |     | √    |     |     |        |
+| CrissCrossAttention          |     | √    |     |     |        |
+| ContourExpand                | √   |      |     |     |        |
+| ConvexIoU                    |     | √    |     |     |        |
+| CornerPool                   |     | √    |     |     |        |
+| Correlation                  |     | √    |     |     |        |
+| Deformable Convolution v1/v2 | √   | √    |     |     | √      |
+| Deformable RoIPool           |     | √    | √   |     | √      |
+| DiffIoURotated               |     | √    |     |     |        |
+| DynamicScatter               |     | √    |     |     |        |
+| FurthestPointSample          |     | √    |     |     |        |
+| FurthestPointSampleWithDist  |     | √    |     |     |        |
+| FusedBiasLeakyrelu           |     | √    |     |     | √      |
+| GatherPoints                 |     | √    |     |     |        |
+| GroupPoints                  |     | √    |     |     |        |
+| Iou3d                        |     | √    | √   |     |        |
+| KNN                          |     | √    |     |     |        |
+| MaskedConv                   |     | √    | √   |     | √      |
+| MergeCells                   |     | √    |     |     |        |
+| MinAreaPolygon               |     | √    |     |     |        |
+| ModulatedDeformConv2d        | √   | √    |     |     | √      |
+| MultiScaleDeformableAttn     |     | √    | √   |     |        |
+| NMS                          | √   | √    | √   |     | √      |
+| NMSRotated                   | √   | √    |
| | +| NMSQuadri | √ | √ | | | | +| PixelGroup | √ | | | | | +| PointsInBoxes | √ | √ | | | | +| PointsInPolygons | | √ | | | | +| PSAMask | √ | √ | √ | | √ | +| RotatedFeatureAlign | √ | √ | | | | +| RoIPointPool3d | | √ | √ | | | +| RoIPool | | √ | √ | | √ | +| RoIAlignRotated | √ | √ | √ | | | +| RiRoIAlignRotated | | √ | | | | +| RoIAlign | √ | √ | √ | | | +| RoIAwarePool3d | | √ | √ | | | +| SAConv2d | | √ | | | | +| SigmoidFocalLoss | | √ | √ | | √ | +| SoftmaxFocalLoss | | √ | | | √ | +| SoftNMS | | √ | | | | +| Sparse Convolution | | √ | | | | +| Synchronized BatchNorm | | √ | | | | +| ThreeInterpolate | | √ | | | | +| ThreeNN | | √ | √ | | | +| TINShift | | √ | √ | | | +| UpFirDn2d | | √ | | | | +| Voxelization | √ | √ | | | | +| PrRoIPool | | √ | | | | +| BezierAlign | √ | √ | | | | diff --git a/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/understand_mmcv/visualization.md b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/understand_mmcv/visualization.md new file mode 100644 index 0000000000000000000000000000000000000000..9ad26c6a822cae0c084c52e204baa07b88627b97 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/docs/zh_cn/understand_mmcv/visualization.md @@ -0,0 +1,24 @@ +## 可视化 + +`mmcv` 可以展示图像以及标注(目前只支持标注框) + +```python +# 展示图像文件 +mmcv.imshow('a.jpg') + +# 展示已加载的图像 +img = np.random.rand(100, 100, 3) +mmcv.imshow(img) + +# 展示带有标注框的图像 +img = np.random.rand(100, 100, 3) +bboxes = np.array([[0, 0, 50, 50], [20, 20, 60, 60]]) +mmcv.imshow_bboxes(img, bboxes) +``` + +`mmcv` 也可以展示特殊的图像,例如光流 + +```python +flow = mmcv.flowread('test.flo') +mmcv.flowshow(flow) +``` diff --git a/cv/distiller/CWD/pytorch/mmcv/install_mmcv.sh b/cv/distiller/CWD/pytorch/mmcv/install_mmcv.sh new file mode 100644 index 0000000000000000000000000000000000000000..21555368b96ee3013a926a7b04896cf52960e4b9 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/install_mmcv.sh @@ -0,0 +1,36 @@ + + + +#!/bin/bash + +TARGET_DIR=${TARGET_DIR:-} + +PYTHON_PATH=$(which python3) 
+PYTHON_DIST_PATH=${TARGET_DIR}/lib/python3/dist-packages + +PKG_DIR="build_pip" +PKG_NAME="mmcv" + +if [[ ! -d ${PKG_DIR} ]]; then + echo "ERROR: Package directory ${PKG_DIR} doesn't exist" + exit 1 +fi + +latest_pkg="$(ls -t ${PKG_DIR} | grep ${PKG_NAME} | head -1)" +if [[ "${latest_pkg}" == "" ]]; then + echo "ERROR: Cannot find latest ${PKG_NAME} package" + exit 1 +else + echo "INFO: Found latest package ${latest_pkg} in directory ${PKG_DIR}" +fi + +if [[ "${TARGET_DIR}" != "" ]]; then + ${PYTHON_PATH} -m pip install --upgrade --no-deps -t ${PYTHON_DIST_PATH} ${PKG_DIR}/${latest_pkg} || exit + echo "Mmcv installed in ${PYTHON_DIST_PATH}; please add it to your PYTHONPATH." +else + ${PYTHON_PATH} -m pip uninstall ${PKG_NAME} -y + ${PYTHON_PATH} -m pip install --no-deps ${PKG_DIR}/${latest_pkg} || exit +fi + +# Return 0 status if all finished +exit 0 diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/__init__.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..2410ea555e905acb450792a427596764e16f62d3 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/__init__.py @@ -0,0 +1,13 @@ +# Copyright (c) OpenMMLab. All rights reserved. +# flake8: noqa +from .arraymisc import * +from .image import * +from .transforms import * +from .version import * +from .video import * +from .visualization import * + +# The following modules are not imported to this level, so mmcv may be used +# without PyTorch. +# - op +# - utils diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/arraymisc/__init__.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/arraymisc/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..4b4700d6139ae3d604ff6e542468cce4200c020c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/arraymisc/__init__.py @@ -0,0 +1,4 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Union

import numpy as np


def quantize(arr: np.ndarray,
             min_val: Union[int, float],
             max_val: Union[int, float],
             levels: int,
             dtype=np.int64) -> np.ndarray:
    """Quantize an array of (-inf, inf) to [0, levels-1].

    Values are first clipped to ``[min_val, max_val]`` and then mapped
    linearly onto ``levels`` equal-width bins.

    Args:
        arr (ndarray): Input array.
        min_val (int or float): Minimum value to be clipped.
        max_val (int or float): Maximum value to be clipped.
        levels (int): Quantization levels; must be an integer > 1.
        dtype (np.dtype): The type of the quantized array. Defaults to
            ``np.int64``.

    Returns:
        ndarray: Quantized array with values in ``[0, levels - 1]``.

    Raises:
        ValueError: If ``levels`` is not an integer greater than 1, or if
            ``min_val >= max_val``.
    """
    if not (isinstance(levels, int) and levels > 1):
        raise ValueError(
            f'levels must be a positive integer, but got {levels}')
    if min_val >= max_val:
        raise ValueError(
            f'min_val ({min_val}) must be smaller than max_val ({max_val})')

    arr = np.clip(arr, min_val, max_val) - min_val
    # np.minimum folds arr == max_val into the last bin (levels - 1),
    # which floor() alone would place one bin past the end.
    quantized_arr = np.minimum(
        np.floor(levels * arr / (max_val - min_val)).astype(dtype), levels - 1)

    return quantized_arr


def dequantize(arr: np.ndarray,
               min_val: Union[int, float],
               max_val: Union[int, float],
               levels: int,
               dtype=np.float64) -> np.ndarray:
    """Dequantize an array.

    Each quantization level is mapped back to the center of its bin in
    ``[min_val, max_val]``.

    Args:
        arr (ndarray): Input (quantized) array.
        min_val (int or float): Minimum value of the original range.
        max_val (int or float): Maximum value of the original range.
        levels (int): Quantization levels; must be an integer > 1.
        dtype (np.dtype): The type of the dequantized array. Defaults to
            ``np.float64``.

    Returns:
        ndarray: Dequantized array.

    Raises:
        ValueError: If ``levels`` is not an integer greater than 1, or if
            ``min_val >= max_val``.
    """
    if not (isinstance(levels, int) and levels > 1):
        raise ValueError(
            f'levels must be a positive integer, but got {levels}')
    if min_val >= max_val:
        raise ValueError(
            f'min_val ({min_val}) must be smaller than max_val ({max_val})')

    # The +0.5 offset moves each integer level to the center of its bin.
    dequantized_arr = (arr + 0.5).astype(dtype) * (max_val -
                                                   min_val) / levels + min_val

    return dequantized_arr
# Copyright (c) OpenMMLab. All rights reserved.
import logging
from typing import Optional

import torch
import torch.nn as nn
from mmengine.runner import load_checkpoint


class AlexNet(nn.Module):
    """AlexNet backbone.

    The convolutional feature extractor is always built; the fully
    connected classifier head exists only when ``num_classes`` is
    positive.

    Args:
        num_classes (int): Number of classes for classification. A
            non-positive value (default: -1) disables the classifier head,
            in which case ``forward`` returns the raw feature map.
    """

    def __init__(self, num_classes: int = -1):
        super().__init__()
        self.num_classes = num_classes

        conv_layers = [
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        ]
        self.features = nn.Sequential(*conv_layers)

        if num_classes > 0:
            fc_layers = [
                nn.Dropout(),
                nn.Linear(256 * 6 * 6, 4096),
                nn.ReLU(inplace=True),
                nn.Dropout(),
                nn.Linear(4096, 4096),
                nn.ReLU(inplace=True),
                nn.Linear(4096, num_classes),
            ]
            self.classifier = nn.Sequential(*fc_layers)

    def init_weights(self, pretrained: Optional[str] = None) -> None:
        """Optionally load pretrained weights.

        Args:
            pretrained (str, optional): Checkpoint path. ``None`` keeps
                PyTorch's default initialization.

        Raises:
            TypeError: If ``pretrained`` is neither a str nor ``None``.
        """
        if pretrained is None:
            # Keep the default initializer.
            return
        if not isinstance(pretrained, str):
            raise TypeError('pretrained must be a str or None')
        load_checkpoint(
            self, pretrained, strict=False, logger=logging.getLogger())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Extract features and, when a head is configured, classify."""
        feats = self.features(x)
        if self.num_classes <= 0:
            return feats
        # Flatten the (N, 256, 6, 6) feature map for the FC head.
        flat = feats.view(feats.size(0), 256 * 6 * 6)
        return self.classifier(flat)
--git a/cv/distiller/CWD/pytorch/mmcv/mmcv/cnn/bricks/__init__.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/cnn/bricks/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..6c74986953bf1a23a246c92c51fd14e033b6d682 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/cnn/bricks/__init__.py @@ -0,0 +1,32 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .activation import build_activation_layer +from .context_block import ContextBlock +from .conv import build_conv_layer +from .conv2d_adaptive_padding import Conv2dAdaptivePadding +from .conv_module import ConvModule +from .conv_ws import ConvAWS2d, ConvWS2d, conv_ws_2d +from .depthwise_separable_conv_module import DepthwiseSeparableConvModule +from .drop import Dropout, DropPath +from .generalized_attention import GeneralizedAttention +from .hsigmoid import HSigmoid +from .hswish import HSwish +from .non_local import NonLocal1d, NonLocal2d, NonLocal3d +from .norm import build_norm_layer, is_norm +from .padding import build_padding_layer +from .plugin import build_plugin_layer +from .scale import LayerScale, Scale +from .swish import Swish +from .upsample import build_upsample_layer +from .wrappers import (Conv2d, Conv3d, ConvTranspose2d, ConvTranspose3d, + Linear, MaxPool2d, MaxPool3d) + +__all__ = [ + 'ConvModule', 'build_activation_layer', 'build_conv_layer', + 'build_norm_layer', 'build_padding_layer', 'build_upsample_layer', + 'build_plugin_layer', 'is_norm', 'HSigmoid', 'HSwish', 'NonLocal1d', + 'NonLocal2d', 'NonLocal3d', 'ContextBlock', 'GeneralizedAttention', + 'Scale', 'ConvAWS2d', 'ConvWS2d', 'conv_ws_2d', + 'DepthwiseSeparableConvModule', 'Swish', 'Linear', 'Conv2dAdaptivePadding', + 'Conv2d', 'ConvTranspose2d', 'MaxPool2d', 'ConvTranspose3d', 'MaxPool3d', + 'Conv3d', 'Dropout', 'DropPath', 'LayerScale' +] diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/cnn/bricks/activation.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/cnn/bricks/activation.py new file mode 100644 index 
# Copyright (c) OpenMMLab. All rights reserved.
# ---- mmcv/cnn/bricks/activation.py ----
from typing import Dict

import torch
import torch.nn as nn
import torch.nn.functional as F
from mmengine.registry import MODELS
from mmengine.utils import digit_version
from mmengine.utils.dl_utils import TORCH_VERSION

# Register the common built-in activations so they can be built from a
# config dict via ``build_activation_layer``.
for module in [
        nn.ReLU, nn.LeakyReLU, nn.PReLU, nn.RReLU, nn.ReLU6, nn.ELU,
        nn.Sigmoid, nn.Tanh
]:
    MODELS.register_module(module=module)

if digit_version(torch.__version__) >= digit_version('1.7.0'):
    # torch >= 1.7 ships a native nn.SiLU implementation.
    MODELS.register_module(module=nn.SiLU, name='SiLU')
else:

    class SiLU(nn.Module):
        """Sigmoid Weighted Linear Unit (fallback for torch < 1.7).

        Computes ``x * sigmoid(x)``; with ``inplace=True`` the input
        tensor is mutated.
        """

        def __init__(self, inplace=False):
            super().__init__()
            self.inplace = inplace

        def forward(self, inputs) -> torch.Tensor:
            if self.inplace:
                return inputs.mul_(torch.sigmoid(inputs))
            else:
                return inputs * torch.sigmoid(inputs)

    MODELS.register_module(module=SiLU, name='SiLU')


@MODELS.register_module(name='Clip')
@MODELS.register_module()
class Clamp(nn.Module):
    """Clamp activation layer.

    This activation function is to clamp the feature map value within
    :math:`[min, max]`. More details can be found in ``torch.clamp()``.

    Args:
        min (Number | optional): Lower-bound of the range to be clamped to.
            Default to -1.
        max (Number | optional): Upper-bound of the range to be clamped to.
            Default to 1.
    """

    def __init__(self, min: float = -1., max: float = 1.):
        super().__init__()
        self.min = min
        self.max = max

    def forward(self, x) -> torch.Tensor:
        """Forward function.

        Args:
            x (torch.Tensor): The input tensor.

        Returns:
            torch.Tensor: Clamped tensor.
        """
        return torch.clamp(x, min=self.min, max=self.max)


class GELU(nn.Module):
    r"""Applies the Gaussian Error Linear Units function:

    .. math::
        \text{GELU}(x) = x * \Phi(x)

    where :math:`\Phi(x)` is the Cumulative Distribution Function for
    Gaussian Distribution.

    Shape:
        - Input: :math:`(N, *)` where `*` means, any number of additional
          dimensions
        - Output: :math:`(N, *)`, same shape as the input

    .. image:: scripts/activation_images/GELU.png

    Examples::

        >>> m = nn.GELU()
        >>> input = torch.randn(2)
        >>> output = m(input)
    """

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        return F.gelu(input)


# Older torch (or parrots) lacks nn.GELU, so register the fallback above.
if (TORCH_VERSION == 'parrots'
        or digit_version(TORCH_VERSION) < digit_version('1.4')):
    MODELS.register_module(module=GELU)
else:
    MODELS.register_module(module=nn.GELU)


def build_activation_layer(cfg: Dict) -> nn.Module:
    """Build activation layer.

    Args:
        cfg (dict): The activation layer config, which should contain:

        - type (str): Layer type.
        - layer args: Args needed to instantiate an activation layer.

    Returns:
        nn.Module: Created activation layer.
    """
    return MODELS.build(cfg)


# ---- mmcv/cnn/bricks/context_block.py ----
from typing import Union

import torch
from mmengine.model import constant_init, kaiming_init
from mmengine.registry import MODELS
from torch import nn


def last_zero_init(m: Union[nn.Module, nn.Sequential]) -> None:
    """Zero-initialize ``m``, or its last layer when it is a Sequential."""
    if isinstance(m, nn.Sequential):
        constant_init(m[-1], val=0)
    else:
        constant_init(m, val=0)


@MODELS.register_module()
class ContextBlock(nn.Module):
    """ContextBlock module in GCNet.

    See 'GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond'
    (https://arxiv.org/abs/1904.11492) for details.

    Args:
        in_channels (int): Channels of the input feature map.
        ratio (float): Ratio of channels of transform bottleneck
        pooling_type (str): Pooling method for context modeling.
            Options are 'att' and 'avg', stand for attention pooling and
            average pooling respectively. Default: 'att'.
        fusion_types (Sequence[str]): Fusion method for feature fusion,
            Options are 'channel_add', 'channel_mul', stand for channelwise
            addition and multiplication respectively. Default: ('channel_add',)
    """

    _abbr_ = 'context_block'

    def __init__(self,
                 in_channels: int,
                 ratio: float,
                 pooling_type: str = 'att',
                 fusion_types: tuple = ('channel_add', )):
        super().__init__()
        assert pooling_type in ['avg', 'att']
        assert isinstance(fusion_types, (list, tuple))
        valid_fusion_types = ['channel_add', 'channel_mul']
        assert all([f in valid_fusion_types for f in fusion_types])
        assert len(fusion_types) > 0, 'at least one fusion should be used'
        self.in_channels = in_channels
        self.ratio = ratio
        # Bottleneck width of the channel transform branches.
        self.planes = int(in_channels * ratio)
        self.pooling_type = pooling_type
        self.fusion_types = fusion_types
        if pooling_type == 'att':
            # 1x1 conv producing a single spatial attention map.
            self.conv_mask = nn.Conv2d(in_channels, 1, kernel_size=1)
            self.softmax = nn.Softmax(dim=2)
        else:
            self.avg_pool = nn.AdaptiveAvgPool2d(1)
        if 'channel_add' in fusion_types:
            self.channel_add_conv = nn.Sequential(
                nn.Conv2d(self.in_channels, self.planes, kernel_size=1),
                nn.LayerNorm([self.planes, 1, 1]),
                nn.ReLU(inplace=True),  # yapf: disable
                nn.Conv2d(self.planes, self.in_channels, kernel_size=1))
        else:
            self.channel_add_conv = None
        if 'channel_mul' in fusion_types:
            self.channel_mul_conv = nn.Sequential(
                nn.Conv2d(self.in_channels, self.planes, kernel_size=1),
                nn.LayerNorm([self.planes, 1, 1]),
                nn.ReLU(inplace=True),  # yapf: disable
                nn.Conv2d(self.planes, self.in_channels, kernel_size=1))
        else:
            self.channel_mul_conv = None
        self.reset_parameters()

    def reset_parameters(self):
        """Initialize the attention mask conv and zero-init fusion tails."""
        if self.pooling_type == 'att':
            kaiming_init(self.conv_mask, mode='fan_in')
            self.conv_mask.inited = True

        # Zero-init so the block starts as an identity mapping.
        if self.channel_add_conv is not None:
            last_zero_init(self.channel_add_conv)
        if self.channel_mul_conv is not None:
            last_zero_init(self.channel_mul_conv)

    def spatial_pool(self, x: torch.Tensor) -> torch.Tensor:
        """Aggregate the spatial dims into a (N, C, 1, 1) context vector."""
        batch, channel, height, width = x.size()
        if self.pooling_type == 'att':
            input_x = x
            # [N, C, H * W]
            input_x = input_x.view(batch, channel, height * width)
            # [N, 1, C, H * W]
            input_x = input_x.unsqueeze(1)
            # [N, 1, H, W]
            context_mask = self.conv_mask(x)
            # [N, 1, H * W]
            context_mask = context_mask.view(batch, 1, height * width)
            # [N, 1, H * W]
            context_mask = self.softmax(context_mask)
            # [N, 1, H * W, 1]
            context_mask = context_mask.unsqueeze(-1)
            # [N, 1, C, 1]
            context = torch.matmul(input_x, context_mask)
            # [N, C, 1, 1]
            context = context.view(batch, channel, 1, 1)
        else:
            # [N, C, 1, 1]
            context = self.avg_pool(x)

        return context

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Fuse the pooled global context back into ``x``."""
        # [N, C, 1, 1]
        context = self.spatial_pool(x)

        out = x
        if self.channel_mul_conv is not None:
            # [N, C, 1, 1]
            channel_mul_term = torch.sigmoid(self.channel_mul_conv(context))
            out = out * channel_mul_term
        if self.channel_add_conv is not None:
            # [N, C, 1, 1]
            channel_add_term = self.channel_add_conv(context)
            out = out + channel_add_term

        return out
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Dict, Optional

from mmengine.registry import MODELS
from torch import nn

# Register the torch conv layers under config-friendly names; 'Conv' is an
# alias for the common 2D case.
MODELS.register_module('Conv1d', module=nn.Conv1d)
MODELS.register_module('Conv2d', module=nn.Conv2d)
MODELS.register_module('Conv3d', module=nn.Conv3d)
MODELS.register_module('Conv', module=nn.Conv2d)


def build_conv_layer(cfg: Optional[Dict], *args, **kwargs) -> nn.Module:
    """Build convolution layer.

    Args:
        cfg (None or dict): The conv layer config, which should contain:
            - type (str): Layer type.
            - layer args: Args needed to instantiate an conv layer.
        args (argument list): Arguments passed to the `__init__`
            method of the corresponding conv layer.
        kwargs (keyword arguments): Keyword arguments passed to the `__init__`
            method of the corresponding conv layer.

    Returns:
        nn.Module: Created conv layer.

    Raises:
        TypeError: If ``cfg`` is neither ``None`` nor a dict.
        KeyError: If ``cfg`` lacks a ``type`` key, or the requested type is
            not registered.
    """
    if cfg is None:
        # Default to a plain 2D convolution.
        cfg_ = dict(type='Conv2d')
    else:
        if not isinstance(cfg, dict):
            raise TypeError('cfg must be a dict')
        if 'type' not in cfg:
            raise KeyError('the cfg dict must contain the key "type"')
        cfg_ = cfg.copy()

    layer_type = cfg_.pop('type')

    # Switch registry to the target scope. If `conv_layer` cannot be found
    # in the registry, fallback to search `conv_layer` in the
    # mmengine.MODELS.
    with MODELS.switch_scope_and_registry(None) as registry:
        conv_layer = registry.get(layer_type)
        if conv_layer is None:
            # Bug fix: report the requested type name. ``conv_layer`` is
            # always None on this branch, so the old message interpolating
            # it read "Cannot find None in registry ...".
            raise KeyError(f'Cannot find {layer_type} in registry under '
                           f'scope name {registry.scope}')
        layer = conv_layer(*args, **kwargs, **cfg_)

    return layer
# Copyright (c) OpenMMLab. All rights reserved.
import math
from typing import Tuple, Union

import torch
from mmengine.registry import MODELS
from torch import nn
from torch.nn import functional as F


@MODELS.register_module()
class Conv2dAdaptivePadding(nn.Conv2d):
    """2D convolution with TensorFlow-style "same" padding.

    The input is padded on the fly (only when necessary) so that it is
    fully covered by the configured kernel and stride. With stride 1 the
    output keeps the input's spatial size; with stride 2 the output
    dimensions are halved, and so on.

    Args:
        in_channels (int): Number of channels in the input image.
        out_channels (int): Number of channels produced by the convolution.
        kernel_size (int or tuple): Size of the convolving kernel.
        stride (int or tuple, optional): Stride of the convolution.
            Default: 1.
        padding (int or tuple, optional): Zero-padding added to both sides
            of the input. Default: 0.
        dilation (int or tuple, optional): Spacing between kernel elements.
            Default: 1.
        groups (int, optional): Number of blocked connections from input
            channels to output channels. Default: 1.
        bias (bool, optional): If ``True``, adds a learnable bias to the
            output. Default: ``True``.
    """

    def __init__(self,
                 in_channels: int,
                 out_channels: int,
                 kernel_size: Union[int, Tuple[int, int]],
                 stride: Union[int, Tuple[int, int]] = 1,
                 padding: Union[int, Tuple[int, int]] = 0,
                 dilation: Union[int, Tuple[int, int]] = 1,
                 groups: int = 1,
                 bias: bool = True):
        # Padding is applied manually in ``forward``, so the parent conv
        # is always constructed with padding=0.
        super().__init__(in_channels, out_channels, kernel_size, stride, 0,
                         dilation, groups, bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Pad ``x`` so the kernel covers it fully, then convolve."""
        in_h, in_w = x.size()[-2:]
        kernel_h, kernel_w = self.weight.size()[-2:]
        stride_h, stride_w = self.stride
        dilation_h, dilation_w = self.dilation

        out_h = math.ceil(in_h / stride_h)
        out_w = math.ceil(in_w / stride_w)
        # Total padding required per axis, clamped at zero.
        pad_h = max(
            (out_h - 1) * stride_h + (kernel_h - 1) * dilation_h + 1 - in_h,
            0)
        pad_w = max(
            (out_w - 1) * stride_w + (kernel_w - 1) * dilation_w + 1 - in_w,
            0)
        if pad_h or pad_w:
            # Asymmetric split: the right/bottom side gets the extra pixel.
            x = F.pad(x, [
                pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2
            ])
        return F.conv2d(x, self.weight, self.bias, self.stride, self.padding,
                        self.dilation, self.groups)
import warnings
from typing import Dict, Optional, Tuple, Union

import torch
import torch.nn as nn
from mmengine.model import constant_init, kaiming_init
from mmengine.registry import MODELS
from mmengine.utils.dl_utils.parrots_wrapper import _BatchNorm, _InstanceNorm

from .activation import build_activation_layer
from .conv import build_conv_layer
from .norm import build_norm_layer
from .padding import build_padding_layer


@MODELS.register_module()
class ConvModule(nn.Module):
    """A conv block that bundles conv/norm/activation layers.

    This block simplifies the usage of convolution layers, which are commonly
    used with a norm layer (e.g., BatchNorm) and activation layer (e.g., ReLU).
    It is based upon three build methods: `build_conv_layer()`,
    `build_norm_layer()` and `build_activation_layer()`.

    Besides, we add some additional features in this module.
    1. Automatically set `bias` of the conv layer.
    2. Spectral norm is supported.
    3. More padding modes are supported. Before PyTorch 1.5, nn.Conv2d only
       supports zero and circular padding, and we add "reflect" padding mode.

    Args:
        in_channels (int): Number of channels in the input feature map.
            Same as that in ``nn._ConvNd``.
        out_channels (int): Number of channels produced by the convolution.
            Same as that in ``nn._ConvNd``.
        kernel_size (int | tuple[int]): Size of the convolving kernel.
            Same as that in ``nn._ConvNd``.
        stride (int | tuple[int]): Stride of the convolution.
            Same as that in ``nn._ConvNd``.
        padding (int | tuple[int]): Zero-padding added to both sides of
            the input. Same as that in ``nn._ConvNd``.
        dilation (int | tuple[int]): Spacing between kernel elements.
            Same as that in ``nn._ConvNd``.
        groups (int): Number of blocked connections from input channels to
            output channels. Same as that in ``nn._ConvNd``.
        bias (bool | str): If specified as `auto`, it will be decided by the
            norm_cfg. Bias will be set as True if `norm_cfg` is None, otherwise
            False. Default: "auto".
        conv_cfg (dict): Config dict for convolution layer. Default: None,
            which means using conv2d.
        norm_cfg (dict): Config dict for normalization layer. Default: None.
        act_cfg (dict): Config dict for activation layer.
            Default: dict(type='ReLU').
        inplace (bool): Whether to use inplace mode for activation.
            Default: True.
        with_spectral_norm (bool): Whether use spectral norm in conv module.
            Default: False.
        padding_mode (str): If the `padding_mode` has not been supported by
            current `Conv2d` in PyTorch, we will use our own padding layer
            instead. Currently, we support ['zeros', 'circular'] with official
            implementation and ['reflect'] with our own implementation.
            Default: 'zeros'.
        order (tuple[str]): The order of conv/norm/activation layers. It is a
            sequence of "conv", "norm" and "act". Common examples are
            ("conv", "norm", "act") and ("act", "conv", "norm").
            Default: ('conv', 'norm', 'act').
    """

    _abbr_ = 'conv_block'

    def __init__(self,
                 in_channels: int,
                 out_channels: int,
                 kernel_size: Union[int, Tuple[int, int]],
                 stride: Union[int, Tuple[int, int]] = 1,
                 padding: Union[int, Tuple[int, int]] = 0,
                 dilation: Union[int, Tuple[int, int]] = 1,
                 groups: int = 1,
                 bias: Union[bool, str] = 'auto',
                 conv_cfg: Optional[Dict] = None,
                 norm_cfg: Optional[Dict] = None,
                 act_cfg: Optional[Dict] = dict(type='ReLU'),
                 inplace: bool = True,
                 with_spectral_norm: bool = False,
                 padding_mode: str = 'zeros',
                 order: tuple = ('conv', 'norm', 'act')):
        super().__init__()
        assert conv_cfg is None or isinstance(conv_cfg, dict)
        assert norm_cfg is None or isinstance(norm_cfg, dict)
        assert act_cfg is None or isinstance(act_cfg, dict)
        # Padding modes natively handled by nn.Conv2d; anything else (e.g.
        # 'reflect') is realized by an explicit padding layer placed before
        # the conv, and the conv itself then runs with padding=0.
        official_padding_mode = ['zeros', 'circular']
        self.conv_cfg = conv_cfg
        self.norm_cfg = norm_cfg
        self.act_cfg = act_cfg
        self.inplace = inplace
        self.with_spectral_norm = with_spectral_norm
        self.with_explicit_padding = padding_mode not in official_padding_mode
        self.order = order
        assert isinstance(self.order, tuple) and len(self.order) == 3
        assert set(order) == {'conv', 'norm', 'act'}

        self.with_norm = norm_cfg is not None
        self.with_activation = act_cfg is not None
        # if the conv layer is before a norm layer, bias is unnecessary.
        if bias == 'auto':
            bias = not self.with_norm
        self.with_bias = bias

        if self.with_explicit_padding:
            pad_cfg = dict(type=padding_mode)
            self.padding_layer = build_padding_layer(pad_cfg, padding)

        # reset padding to 0 for conv module
        conv_padding = 0 if self.with_explicit_padding else padding
        # build convolution layer
        self.conv = build_conv_layer(
            conv_cfg,
            in_channels,
            out_channels,
            kernel_size,
            stride=stride,
            padding=conv_padding,
            dilation=dilation,
            groups=groups,
            bias=bias)
        # export the attributes of self.conv to a higher level for convenience
        self.in_channels = self.conv.in_channels
        self.out_channels = self.conv.out_channels
        self.kernel_size = self.conv.kernel_size
        self.stride = self.conv.stride
        # NOTE: self.padding keeps the user-requested padding, which may
        # differ from self.conv.padding when an explicit padding layer is used.
        self.padding = padding
        self.dilation = self.conv.dilation
        self.transposed = self.conv.transposed
        self.output_padding = self.conv.output_padding
        self.groups = self.conv.groups

        if self.with_spectral_norm:
            self.conv = nn.utils.spectral_norm(self.conv)

        # build normalization layers
        if self.with_norm:
            # norm layer is after conv layer
            if order.index('norm') > order.index('conv'):
                norm_channels = out_channels
            else:
                norm_channels = in_channels
            self.norm_name, norm = build_norm_layer(
                norm_cfg, norm_channels)  # type: ignore
            self.add_module(self.norm_name, norm)
            if self.with_bias:
                # A conv bias directly followed by BN/IN is redundant: the
                # norm's own affine shift absorbs it. Warn rather than fail.
                if isinstance(norm, (_BatchNorm, _InstanceNorm)):
                    warnings.warn(
                        'Unnecessary conv bias before batch/instance norm')
        else:
            self.norm_name = None  # type: ignore

        # build activation layer
        if self.with_activation:
            act_cfg_ = act_cfg.copy()  # type: ignore
            # nn.Tanh has no 'inplace' argument
            if act_cfg_['type'] not in [
                    'Tanh', 'PReLU', 'Sigmoid', 'HSigmoid', 'Swish', 'GELU'
            ]:
                act_cfg_.setdefault('inplace', inplace)
            self.activate = build_activation_layer(act_cfg_)

        # Use msra init by default
        self.init_weights()

    @property
    def norm(self):
        # Resolve the norm submodule (registered under a generated name such
        # as 'bn1') or None when the module was built without a norm layer.
        if self.norm_name:
            return getattr(self, self.norm_name)
        else:
            return None

    def init_weights(self):
        # 1. It is mainly for customized conv layers with their own
        #    initialization manners by calling their own ``init_weights()``,
        #    and we do not want ConvModule to override the initialization.
        # 2. For customized conv layers without their own initialization
        #    manners (that is, they don't have their own ``init_weights()``)
        #    and PyTorch's conv layers, they will be initialized by
        #    this method with default ``kaiming_init``.
        # Note: For PyTorch's conv layers, they will be overwritten by our
        #    initialization implementation using default ``kaiming_init``.
        if not hasattr(self.conv, 'init_weights'):
            if self.with_activation and self.act_cfg['type'] == 'LeakyReLU':
                nonlinearity = 'leaky_relu'
                a = self.act_cfg.get('negative_slope', 0.01)
            else:
                nonlinearity = 'relu'
                a = 0
            kaiming_init(self.conv, a=a, nonlinearity=nonlinearity)
        if self.with_norm:
            constant_init(self.norm, 1, bias=0)

    def forward(self,
                x: torch.Tensor,
                activate: bool = True,
                norm: bool = True) -> torch.Tensor:
        """Run conv/norm/act in ``self.order``; ``activate``/``norm`` flags
        allow callers to skip those stages for this call only."""
        for layer in self.order:
            if layer == 'conv':
                if self.with_explicit_padding:
                    x = self.padding_layer(x)
                x = self.conv(x)
            elif layer == 'norm' and norm and self.with_norm:
                x = self.norm(x)
            elif layer == 'act' and activate and self.with_activation:
                x = self.activate(x)
        return x


# ---------------------------------------------------------------------------
# (diff boundary) new file: mmcv/cnn/bricks/conv_ws.py
# ---------------------------------------------------------------------------
# Copyright (c) OpenMMLab. All rights reserved.
from collections import OrderedDict
from typing import Dict, List, Optional, Tuple, Union

import torch
import torch.nn as nn
import torch.nn.functional as F
from mmengine.registry import MODELS


def conv_ws_2d(input: torch.Tensor,
               weight: torch.Tensor,
               bias: Optional[torch.Tensor] = None,
               stride: Union[int, Tuple[int, int]] = 1,
               padding: Union[int, Tuple[int, int]] = 0,
               dilation: Union[int, Tuple[int, int]] = 1,
               groups: int = 1,
               eps: float = 1e-5) -> torch.Tensor:
    """2D convolution with Weight Standardization.

    The kernel of every output channel is standardized to zero mean and
    (approximately) unit std before the convolution; ``eps`` guards against
    division by zero for near-constant kernels.
    """
    c_in = weight.size(0)
    weight_flat = weight.view(c_in, -1)
    mean = weight_flat.mean(dim=1, keepdim=True).view(c_in, 1, 1, 1)
    std = weight_flat.std(dim=1, keepdim=True).view(c_in, 1, 1, 1)
    weight = (weight - mean) / (std + eps)
    return F.conv2d(input, weight, bias, stride, padding, dilation, groups)


@MODELS.register_module('ConvWS')
class ConvWS2d(nn.Conv2d):
    """``nn.Conv2d`` drop-in that standardizes its weight on every forward.

    Args are identical to ``nn.Conv2d`` plus ``eps`` (float): numerical
    stabilizer for the weight std. Default: 1e-5.
    """

    def __init__(self,
                 in_channels: int,
                 out_channels: int,
                 kernel_size: Union[int, Tuple[int, int]],
                 stride: Union[int, Tuple[int, int]] = 1,
                 padding: Union[int, Tuple[int, int]] = 0,
                 dilation: Union[int, Tuple[int, int]] = 1,
                 groups: int = 1,
                 bias: bool = True,
                 eps: float = 1e-5):
        super().__init__(
            in_channels,
            out_channels,
            kernel_size,
            stride=stride,
            padding=padding,
            dilation=dilation,
            groups=groups,
            bias=bias)
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return conv_ws_2d(x, self.weight, self.bias, self.stride, self.padding,
                          self.dilation, self.groups, self.eps)


@MODELS.register_module(name='ConvAWS')
class ConvAWS2d(nn.Conv2d):
    """AWS (Adaptive Weight Standardization)

    This is a variant of Weight Standardization
    (https://arxiv.org/pdf/1903.10520.pdf)
    It is used in DetectoRS to avoid NaN
    (https://arxiv.org/pdf/2006.02334.pdf)

    Args:
        in_channels (int): Number of channels in the input image
        out_channels (int): Number of channels produced by the convolution
        kernel_size (int or tuple): Size of the conv kernel
        stride (int or tuple, optional): Stride of the convolution. Default: 1
        padding (int or tuple, optional): Zero-padding added to both sides of
            the input. Default: 0
        dilation (int or tuple, optional): Spacing between kernel elements.
            Default: 1
        groups (int, optional): Number of blocked connections from input
            channels to output channels. Default: 1
        bias (bool, optional): If set True, adds a learnable bias to the
            output. Default: True
    """

    def __init__(self,
                 in_channels: int,
                 out_channels: int,
                 kernel_size: Union[int, Tuple[int, int]],
                 stride: Union[int, Tuple[int, int]] = 1,
                 padding: Union[int, Tuple[int, int]] = 0,
                 dilation: Union[int, Tuple[int, int]] = 1,
                 groups: int = 1,
                 bias: bool = True):
        super().__init__(
            in_channels,
            out_channels,
            kernel_size,
            stride=stride,
            padding=padding,
            dilation=dilation,
            groups=groups,
            bias=bias)
        # Learned-from-checkpoint affine terms applied after standardization;
        # buffers (not Parameters) so they are saved/loaded but not trained.
        self.register_buffer('weight_gamma',
                             torch.ones(self.out_channels, 1, 1, 1))
        self.register_buffer('weight_beta',
                             torch.zeros(self.out_channels, 1, 1, 1))

    def _get_weight(self, weight: torch.Tensor) -> torch.Tensor:
        # Standardize per output channel, then re-scale/shift with the
        # recovered gamma/beta (see _load_from_state_dict).
        weight_flat = weight.view(weight.size(0), -1)
        mean = weight_flat.mean(dim=1).view(-1, 1, 1, 1)
        std = torch.sqrt(weight_flat.var(dim=1) + 1e-5).view(-1, 1, 1, 1)
        weight = (weight - mean) / std
        weight = self.weight_gamma * weight + self.weight_beta
        return weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weight = self._get_weight(self.weight)
        return F.conv2d(x, weight, self.bias, self.stride, self.padding,
                        self.dilation, self.groups)

    def _load_from_state_dict(self, state_dict: OrderedDict, prefix: str,
                              local_metadata: Dict, strict: bool,
                              missing_keys: List[str],
                              unexpected_keys: List[str],
                              error_msgs: List[str]) -> None:
        """Override default load function.

        AWS overrides the function _load_from_state_dict to recover
        weight_gamma and weight_beta if they are missing. If weight_gamma and
        weight_beta are found in the checkpoint, this function will return
        after super()._load_from_state_dict. Otherwise, it will compute the
        mean and std of the pretrained weights and store them in weight_beta
        and weight_gamma.
        """

        # Sentinel trick: fill gamma with -1 before loading. If the
        # checkpoint contains weight_gamma, the load overwrites it and the
        # mean becomes positive; a mean still <= 0 means the key was absent.
        self.weight_gamma.data.fill_(-1)
        local_missing_keys: List = []
        super()._load_from_state_dict(state_dict, prefix, local_metadata,
                                      strict, local_missing_keys,
                                      unexpected_keys, error_msgs)
        if self.weight_gamma.data.mean() > 0:
            # Checkpoint already carried gamma/beta: forward local misses.
            for k in local_missing_keys:
                missing_keys.append(k)
            return
        # Legacy (non-AWS) checkpoint: derive gamma/beta from the loaded
        # raw weights so that _get_weight reproduces them exactly.
        weight = self.weight.data
        weight_flat = weight.view(weight.size(0), -1)
        mean = weight_flat.mean(dim=1).view(-1, 1, 1, 1)
        std = torch.sqrt(weight_flat.var(dim=1) + 1e-5).view(-1, 1, 1, 1)
        self.weight_beta.data.copy_(mean)
        self.weight_gamma.data.copy_(std)
        # gamma/beta are now initialized locally; don't report them missing.
        missing_gamma_beta = [
            k for k in local_missing_keys
            if k.endswith('weight_gamma') or k.endswith('weight_beta')
        ]
        for k in missing_gamma_beta:
            local_missing_keys.remove(k)
        for k in local_missing_keys:
            missing_keys.append(k)


# ---------------------------------------------------------------------------
# (diff boundary) new file: mmcv/cnn/bricks/depthwise_separable_conv_module.py
# ---------------------------------------------------------------------------
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Dict, Optional, Tuple, Union

import torch
import torch.nn as nn

from .conv_module import ConvModule


class DepthwiseSeparableConvModule(nn.Module):
    """Depthwise separable convolution module.

    See https://arxiv.org/pdf/1704.04861.pdf for details.

    This module can replace a ConvModule with the conv block replaced by two
    conv block: depthwise conv block and pointwise conv block. The depthwise
    conv block contains depthwise-conv/norm/activation layers. The pointwise
    conv block contains pointwise-conv/norm/activation layers. It should be
    noted that there will be norm/activation layer in the depthwise conv block
    if `norm_cfg` and `act_cfg` are specified.

    Args:
        in_channels (int): Number of channels in the input feature map.
            Same as that in ``nn._ConvNd``.
        out_channels (int): Number of channels produced by the convolution.
            Same as that in ``nn._ConvNd``.
        kernel_size (int | tuple[int]): Size of the convolving kernel.
            Same as that in ``nn._ConvNd``.
        stride (int | tuple[int]): Stride of the convolution.
            Same as that in ``nn._ConvNd``. Default: 1.
        padding (int | tuple[int]): Zero-padding added to both sides of
            the input. Same as that in ``nn._ConvNd``. Default: 0.
        dilation (int | tuple[int]): Spacing between kernel elements.
            Same as that in ``nn._ConvNd``. Default: 1.
        norm_cfg (dict): Default norm config for both depthwise ConvModule and
            pointwise ConvModule. Default: None.
        act_cfg (dict): Default activation config for both depthwise ConvModule
            and pointwise ConvModule. Default: dict(type='ReLU').
        dw_norm_cfg (dict): Norm config of depthwise ConvModule. If it is
            'default', it will be the same as `norm_cfg`. Default: 'default'.
        dw_act_cfg (dict): Activation config of depthwise ConvModule. If it is
            'default', it will be the same as `act_cfg`. Default: 'default'.
        pw_norm_cfg (dict): Norm config of pointwise ConvModule. If it is
            'default', it will be the same as `norm_cfg`. Default: 'default'.
        pw_act_cfg (dict): Activation config of pointwise ConvModule. If it is
            'default', it will be the same as `act_cfg`. Default: 'default'.
        kwargs (optional): Other shared arguments for depthwise and pointwise
            ConvModule. See ConvModule for ref.
    """

    def __init__(self,
                 in_channels: int,
                 out_channels: int,
                 kernel_size: Union[int, Tuple[int, int]],
                 stride: Union[int, Tuple[int, int]] = 1,
                 padding: Union[int, Tuple[int, int]] = 0,
                 dilation: Union[int, Tuple[int, int]] = 1,
                 norm_cfg: Optional[Dict] = None,
                 act_cfg: Dict = dict(type='ReLU'),
                 dw_norm_cfg: Union[Dict, str] = 'default',
                 dw_act_cfg: Union[Dict, str] = 'default',
                 pw_norm_cfg: Union[Dict, str] = 'default',
                 pw_act_cfg: Union[Dict, str] = 'default',
                 **kwargs):
        super().__init__()
        # 'groups' is fixed to in_channels for the depthwise conv below.
        assert 'groups' not in kwargs, 'groups should not be specified'

        # if norm/activation config of depthwise/pointwise ConvModule is not
        # specified, use default config.
        dw_norm_cfg = dw_norm_cfg if dw_norm_cfg != 'default' else norm_cfg  # type: ignore # noqa E501
        dw_act_cfg = dw_act_cfg if dw_act_cfg != 'default' else act_cfg
        pw_norm_cfg = pw_norm_cfg if pw_norm_cfg != 'default' else norm_cfg  # type: ignore # noqa E501
        pw_act_cfg = pw_act_cfg if pw_act_cfg != 'default' else act_cfg

        # depthwise convolution
        self.depthwise_conv = ConvModule(
            in_channels,
            in_channels,
            kernel_size,
            stride=stride,
            padding=padding,
            dilation=dilation,
            groups=in_channels,
            norm_cfg=dw_norm_cfg,  # type: ignore
            act_cfg=dw_act_cfg,  # type: ignore
            **kwargs)

        # 1x1 pointwise conv mixes channels after the per-channel conv.
        self.pointwise_conv = ConvModule(
            in_channels,
            out_channels,
            1,
            norm_cfg=pw_norm_cfg,  # type: ignore
            act_cfg=pw_act_cfg,  # type: ignore
            **kwargs)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.depthwise_conv(x)
        x = self.pointwise_conv(x)
        return x


# ---------------------------------------------------------------------------
# (diff boundary) new file: mmcv/cnn/bricks/drop.py
# ---------------------------------------------------------------------------
# Copyright (c) OpenMMLab. All rights reserved.
+from typing import Any, Dict, Optional + +import torch +import torch.nn as nn +from mmengine.registry import MODELS + + +def drop_path(x: torch.Tensor, + drop_prob: float = 0., + training: bool = False) -> torch.Tensor: + """Drop paths (Stochastic Depth) per sample (when applied in main path of + residual blocks). + + We follow the implementation + https://github.com/rwightman/pytorch-image-models/blob/a2727c1bf78ba0d7b5727f5f95e37fb7f8866b1f/timm/models/layers/drop.py # noqa: E501 + """ + if drop_prob == 0. or not training: + return x + keep_prob = 1 - drop_prob + # handle tensors with different dimensions, not just 4D tensors. + shape = (x.shape[0], ) + (1, ) * (x.ndim - 1) + random_tensor = keep_prob + torch.rand( + shape, dtype=x.dtype, device=x.device) + output = x.div(keep_prob) * random_tensor.floor() + return output + + +@MODELS.register_module() +class DropPath(nn.Module): + """Drop paths (Stochastic Depth) per sample (when applied in main path of + residual blocks). + + We follow the implementation + https://github.com/rwightman/pytorch-image-models/blob/a2727c1bf78ba0d7b5727f5f95e37fb7f8866b1f/timm/models/layers/drop.py # noqa: E501 + + Args: + drop_prob (float): Probability of the path to be zeroed. Default: 0.1 + """ + + def __init__(self, drop_prob: float = 0.1): + super().__init__() + self.drop_prob = drop_prob + + def forward(self, x: torch.Tensor) -> torch.Tensor: + return drop_path(x, self.drop_prob, self.training) + + +@MODELS.register_module() +class Dropout(nn.Dropout): + """A wrapper for ``torch.nn.Dropout``, We rename the ``p`` of + ``torch.nn.Dropout`` to ``drop_prob`` so as to be consistent with + ``DropPath`` + + Args: + drop_prob (float): Probability of the elements to be + zeroed. Default: 0.5. + inplace (bool): Do the operation inplace or not. Default: False. 
+ """ + + def __init__(self, drop_prob: float = 0.5, inplace: bool = False): + super().__init__(p=drop_prob, inplace=inplace) + + +def build_dropout(cfg: Dict, default_args: Optional[Dict] = None) -> Any: + """Builder for drop out layers.""" + return MODELS.build(cfg, default_args=default_args) diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/cnn/bricks/generalized_attention.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/cnn/bricks/generalized_attention.py new file mode 100644 index 0000000000000000000000000000000000000000..ea931c6154427d4ce5dec81e157aa6416e123815 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/cnn/bricks/generalized_attention.py @@ -0,0 +1,411 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import math + +import numpy as np +import torch +import torch.nn as nn +import torch.nn.functional as F +from mmengine.model import kaiming_init +from mmengine.registry import MODELS + + +@MODELS.register_module() +class GeneralizedAttention(nn.Module): + """GeneralizedAttention module. + + See 'An Empirical Study of Spatial Attention Mechanisms in Deep Networks' + (https://arxiv.org/abs/1711.07971) for details. + + Args: + in_channels (int): Channels of the input feature map. + spatial_range (int): The spatial range. -1 indicates no spatial range + constraint. Default: -1. + num_heads (int): The head number of empirical_attention module. + Default: 9. + position_embedding_dim (int): The position embedding dimension. + Default: -1. + position_magnitude (int): A multiplier acting on coord difference. + Default: 1. + kv_stride (int): The feature stride acting on key/value feature map. + Default: 2. + q_stride (int): The feature stride acting on query feature map. + Default: 1. + attention_type (str): A binary indicator string for indicating which + items in generalized empirical_attention module are used. + Default: '1111'. 

            - '1000' indicates 'query and key content' (appr - appr) item,
            - '0100' indicates 'query content and relative position'
              (appr - position) item,
            - '0010' indicates 'key content only' (bias - appr) item,
            - '0001' indicates 'relative position only' (bias - position) item.
    """

    _abbr_ = 'gen_attention_block'

    def __init__(self,
                 in_channels: int,
                 spatial_range: int = -1,
                 num_heads: int = 9,
                 position_embedding_dim: int = -1,
                 position_magnitude: int = 1,
                 kv_stride: int = 2,
                 q_stride: int = 1,
                 attention_type: str = '1111'):

        super().__init__()

        # hard range means local range for non-local operation
        self.position_embedding_dim = (
            position_embedding_dim
            if position_embedding_dim > 0 else in_channels)

        self.position_magnitude = position_magnitude
        self.num_heads = num_heads
        self.in_channels = in_channels
        self.spatial_range = spatial_range
        self.kv_stride = kv_stride
        self.q_stride = q_stride
        # e.g. '1010' -> [True, False, True, False]
        self.attention_type = [bool(int(_)) for _ in attention_type]
        self.qk_embed_dim = in_channels // num_heads
        out_c = self.qk_embed_dim * num_heads

        # Query/key projections are only built when an attention item that
        # needs them is enabled; value projection is always needed.
        if self.attention_type[0] or self.attention_type[1]:
            self.query_conv = nn.Conv2d(
                in_channels=in_channels,
                out_channels=out_c,
                kernel_size=1,
                bias=False)
            self.query_conv.kaiming_init = True

        if self.attention_type[0] or self.attention_type[2]:
            self.key_conv = nn.Conv2d(
                in_channels=in_channels,
                out_channels=out_c,
                kernel_size=1,
                bias=False)
            self.key_conv.kaiming_init = True

        self.v_dim = in_channels // num_heads
        self.value_conv = nn.Conv2d(
            in_channels=in_channels,
            out_channels=self.v_dim * num_heads,
            kernel_size=1,
            bias=False)
        self.value_conv.kaiming_init = True

        if self.attention_type[1] or self.attention_type[3]:
            # Separate x/y projections of the sinusoidal position embedding.
            self.appr_geom_fc_x = nn.Linear(
                self.position_embedding_dim // 2, out_c, bias=False)
            self.appr_geom_fc_x.kaiming_init = True

            self.appr_geom_fc_y = nn.Linear(
                self.position_embedding_dim // 2, out_c, bias=False)
            self.appr_geom_fc_y.kaiming_init = True

        if self.attention_type[2]:
            # Uniform init in [-stdv, stdv).
            stdv = 1.0 / math.sqrt(self.qk_embed_dim * 2)
            appr_bias_value = -2 * stdv * torch.rand(out_c) + stdv
            self.appr_bias = nn.Parameter(appr_bias_value)

        if self.attention_type[3]:
            stdv = 1.0 / math.sqrt(self.qk_embed_dim * 2)
            geom_bias_value = -2 * stdv * torch.rand(out_c) + stdv
            self.geom_bias = nn.Parameter(geom_bias_value)

        self.proj_conv = nn.Conv2d(
            in_channels=self.v_dim * num_heads,
            out_channels=in_channels,
            kernel_size=1,
            bias=True)
        self.proj_conv.kaiming_init = True
        # Learned residual gate, starts at 0 so the block is initially
        # an identity mapping.
        self.gamma = nn.Parameter(torch.zeros(1))

        if self.spatial_range >= 0:
            # only works when non local is after 3*3 conv
            # NOTE(review): max_len is only assigned for in_channels 256/512;
            # any other channel count with spatial_range >= 0 raises
            # NameError below — confirm intended input restriction.
            if in_channels == 256:
                max_len = 84
            elif in_channels == 512:
                max_len = 42

            max_len_kv = int((max_len - 1.0) / self.kv_stride + 1)
            # 1 marks positions outside the local window (to be masked).
            local_constraint_map = np.ones(
                (max_len, max_len, max_len_kv, max_len_kv), dtype=int)
            for iy in range(max_len):
                for ix in range(max_len):
                    local_constraint_map[
                        iy, ix,
                        max((iy - self.spatial_range) //
                            self.kv_stride, 0):min((iy + self.spatial_range +
                                                    1) // self.kv_stride +
                                                   1, max_len),
                        max((ix - self.spatial_range) //
                            self.kv_stride, 0):min((ix + self.spatial_range +
                                                    1) // self.kv_stride +
                                                   1, max_len)] = 0

            self.local_constraint_map = nn.Parameter(
                torch.from_numpy(local_constraint_map).byte(),
                requires_grad=False)

        if self.q_stride > 1:
            self.q_downsample = nn.AvgPool2d(
                kernel_size=1, stride=self.q_stride)
        else:
            self.q_downsample = None

        if self.kv_stride > 1:
            self.kv_downsample = nn.AvgPool2d(
                kernel_size=1, stride=self.kv_stride)
        else:
            self.kv_downsample = None

        self.init_weights()

    def get_position_embedding(self,
                               h,
                               w,
                               h_kv,
                               w_kv,
                               q_stride,
                               kv_stride,
                               device,
                               dtype,
                               feat_dim,
                               wave_length=1000):
        """Sinusoidal embeddings of the (scaled) query/key coordinate
        differences, returned separately for the x and y axes."""
        # the default type of Tensor is float32, leading to type mismatch
        # in fp16 mode. Cast it to support fp16 mode.
        h_idxs = torch.linspace(0, h - 1, h).to(device=device, dtype=dtype)
        h_idxs = h_idxs.view((h, 1)) * q_stride

        w_idxs = torch.linspace(0, w - 1, w).to(device=device, dtype=dtype)
        w_idxs = w_idxs.view((w, 1)) * q_stride

        h_kv_idxs = torch.linspace(0, h_kv - 1, h_kv).to(
            device=device, dtype=dtype)
        h_kv_idxs = h_kv_idxs.view((h_kv, 1)) * kv_stride

        w_kv_idxs = torch.linspace(0, w_kv - 1, w_kv).to(
            device=device, dtype=dtype)
        w_kv_idxs = w_kv_idxs.view((w_kv, 1)) * kv_stride

        # (h, h_kv, 1)
        h_diff = h_idxs.unsqueeze(1) - h_kv_idxs.unsqueeze(0)
        h_diff *= self.position_magnitude

        # (w, w_kv, 1)
        w_diff = w_idxs.unsqueeze(1) - w_kv_idxs.unsqueeze(0)
        w_diff *= self.position_magnitude

        feat_range = torch.arange(0, feat_dim / 4).to(
            device=device, dtype=dtype)

        dim_mat = torch.Tensor([wave_length]).to(device=device, dtype=dtype)
        dim_mat = dim_mat**((4. / feat_dim) * feat_range)
        dim_mat = dim_mat.view((1, 1, -1))

        embedding_x = torch.cat(
            ((w_diff / dim_mat).sin(), (w_diff / dim_mat).cos()), dim=2)

        embedding_y = torch.cat(
            ((h_diff / dim_mat).sin(), (h_diff / dim_mat).cos()), dim=2)

        return embedding_x, embedding_y

    def forward(self, x_input: torch.Tensor) -> torch.Tensor:
        num_heads = self.num_heads

        # use empirical_attention
        if self.q_downsample is not None:
            x_q = self.q_downsample(x_input)
        else:
            x_q = x_input
        n, _, h, w = x_q.shape

        if self.kv_downsample is not None:
            x_kv = self.kv_downsample(x_input)
        else:
            x_kv = x_input
        _, _, h_kv, w_kv = x_kv.shape

        if self.attention_type[0] or self.attention_type[1]:
            proj_query = self.query_conv(x_q).view(
                (n, num_heads, self.qk_embed_dim, h * w))
            proj_query = proj_query.permute(0, 1, 3, 2)

        if self.attention_type[0] or self.attention_type[2]:
            proj_key = self.key_conv(x_kv).view(
                (n, num_heads, self.qk_embed_dim, h_kv * w_kv))

        if self.attention_type[1] or self.attention_type[3]:
            position_embed_x, position_embed_y = self.get_position_embedding(
                h, w, h_kv, w_kv, self.q_stride, self.kv_stride,
                x_input.device, x_input.dtype, self.position_embedding_dim)
            # (n, num_heads, w, w_kv, dim)
            position_feat_x = self.appr_geom_fc_x(position_embed_x).\
                view(1, w, w_kv, num_heads, self.qk_embed_dim).\
                permute(0, 3, 1, 2, 4).\
                repeat(n, 1, 1, 1, 1)

            # (n, num_heads, h, h_kv, dim)
            position_feat_y = self.appr_geom_fc_y(position_embed_y).\
                view(1, h, h_kv, num_heads, self.qk_embed_dim).\
                permute(0, 3, 1, 2, 4).\
                repeat(n, 1, 1, 1, 1)

            position_feat_x /= math.sqrt(2)
            position_feat_y /= math.sqrt(2)

        # accelerate for saliency only
        if (np.sum(self.attention_type) == 1) and self.attention_type[2]:
            appr_bias = self.appr_bias.\
                view(1, num_heads, 1, self.qk_embed_dim).\
                repeat(n, 1, 1, 1)

            energy = torch.matmul(appr_bias, proj_key).\
                view(n, num_heads, 1, h_kv * w_kv)

            # In the bias-only case the energy is independent of the query
            # position, so the query grid collapses to a single cell.
            h = 1
            w = 1
        else:
            # (n, num_heads, h*w, h_kv*w_kv), query before key, 540mb for
            if not self.attention_type[0]:
                energy = torch.zeros(
                    n,
                    num_heads,
                    h,
                    w,
                    h_kv,
                    w_kv,
                    dtype=x_input.dtype,
                    device=x_input.device)

            # attention_type[0]: appr - appr
            # attention_type[1]: appr - position
            # attention_type[2]: bias - appr
            # attention_type[3]: bias - position
            if self.attention_type[0] or self.attention_type[2]:
                if self.attention_type[0] and self.attention_type[2]:
                    # Fold the bias into the query so both items cost one
                    # matmul instead of two.
                    appr_bias = self.appr_bias.\
                        view(1, num_heads, 1, self.qk_embed_dim)
                    energy = torch.matmul(proj_query + appr_bias, proj_key).\
                        view(n, num_heads, h, w, h_kv, w_kv)

                elif self.attention_type[0]:
                    energy = torch.matmul(proj_query, proj_key).\
                        view(n, num_heads, h, w, h_kv, w_kv)

                elif self.attention_type[2]:
                    appr_bias = self.appr_bias.\
                        view(1, num_heads, 1, self.qk_embed_dim).\
                        repeat(n, 1, 1, 1)

                    energy += torch.matmul(appr_bias, proj_key).\
                        view(n, num_heads, 1, 1, h_kv, w_kv)

            if self.attention_type[1] or self.attention_type[3]:
                if self.attention_type[1] and self.attention_type[3]:
                    geom_bias = self.geom_bias.\
                        view(1, num_heads, 1, self.qk_embed_dim)

                    proj_query_reshape = (proj_query + geom_bias).\
                        view(n, num_heads, h, w, self.qk_embed_dim)

                    # x and y position energies are computed separably and
                    # broadcast-added over the full (h, w, h_kv, w_kv) grid.
                    energy_x = torch.matmul(
                        proj_query_reshape.permute(0, 1, 3, 2, 4),
                        position_feat_x.permute(0, 1, 2, 4, 3))
                    energy_x = energy_x.\
                        permute(0, 1, 3, 2, 4).unsqueeze(4)

                    energy_y = torch.matmul(
                        proj_query_reshape,
                        position_feat_y.permute(0, 1, 2, 4, 3))
                    energy_y = energy_y.unsqueeze(5)

                    energy += energy_x + energy_y

                elif self.attention_type[1]:
                    proj_query_reshape = proj_query.\
                        view(n, num_heads, h, w, self.qk_embed_dim)
                    proj_query_reshape = proj_query_reshape.\
                        permute(0, 1, 3, 2, 4)
                    position_feat_x_reshape = position_feat_x.\
                        permute(0, 1, 2, 4, 3)
                    position_feat_y_reshape = position_feat_y.\
                        permute(0, 1, 2, 4, 3)

                    energy_x = torch.matmul(proj_query_reshape,
                                            position_feat_x_reshape)
                    energy_x = energy_x.permute(0, 1, 3, 2, 4).unsqueeze(4)

                    energy_y = torch.matmul(proj_query_reshape,
                                            position_feat_y_reshape)
                    energy_y = energy_y.unsqueeze(5)

                    energy += energy_x + energy_y

                elif self.attention_type[3]:
                    geom_bias = self.geom_bias.\
                        view(1, num_heads, self.qk_embed_dim, 1).\
                        repeat(n, 1, 1, 1)

                    position_feat_x_reshape = position_feat_x.\
                        view(n, num_heads, w * w_kv, self.qk_embed_dim)

                    position_feat_y_reshape = position_feat_y.\
                        view(n, num_heads, h * h_kv, self.qk_embed_dim)

                    energy_x = torch.matmul(position_feat_x_reshape, geom_bias)
                    energy_x = energy_x.view(n, num_heads, 1, w, 1, w_kv)

                    energy_y = torch.matmul(position_feat_y_reshape, geom_bias)
                    energy_y = energy_y.view(n, num_heads, h, 1, h_kv, 1)

                    energy += energy_x + energy_y

            energy = energy.view(n, num_heads, h * w, h_kv * w_kv)

        if self.spatial_range >= 0:
            # Mask out key positions outside the local window before softmax.
            cur_local_constraint_map = \
                self.local_constraint_map[:h, :w, :h_kv, :w_kv].\
                contiguous().\
                view(1, 1, h*w, h_kv*w_kv)

            energy = energy.masked_fill_(cur_local_constraint_map,
                                         float('-inf'))

        attention = F.softmax(energy, 3)

        proj_value = self.value_conv(x_kv)
        proj_value_reshape = proj_value.\
            view((n, num_heads, self.v_dim, h_kv * w_kv)).\
            permute(0, 1, 3, 2)

        out = torch.matmul(attention, proj_value_reshape).\
            permute(0, 1, 3, 2).\
            contiguous().\
            view(n, self.v_dim * self.num_heads, h, w)

        out = self.proj_conv(out)

        # output is downsampled, upsample back to input size
        if self.q_downsample is not None:
            out = F.interpolate(
                out,
                size=x_input.shape[2:],
                mode='bilinear',
                align_corners=False)

        out = self.gamma * out + x_input
        return out

    def init_weights(self):
        # Only layers explicitly tagged with ``kaiming_init = True`` in
        # __init__ are (re-)initialized here.
        for m in self.modules():
            if hasattr(m, 'kaiming_init') and m.kaiming_init:
                kaiming_init(
                    m,
                    mode='fan_in',
                    nonlinearity='leaky_relu',
                    bias=0,
                    distribution='uniform',
                    a=1)


# ---------------------------------------------------------------------------
# (diff boundary) new file: mmcv/cnn/bricks/hsigmoid.py
# ---------------------------------------------------------------------------
# Copyright (c) OpenMMLab. All rights reserved.
import warnings

import torch
import torch.nn as nn
from mmengine.registry import MODELS


@MODELS.register_module()
class HSigmoid(nn.Module):
    """Hard Sigmoid Module. Apply the hard sigmoid function:
    Hsigmoid(x) = min(max((x + bias) / divisor, min_value), max_value)
    Default: Hsigmoid(x) = min(max((x + 3) / 6, 0), 1)

    Note:
        In MMCV v1.4.4, we modified the default value of args to align with
        PyTorch official.

    Args:
        bias (float): Bias of the input feature map. Default: 3.0.
        divisor (float): Divisor of the input feature map. Default: 6.0.
        min_value (float): Lower bound value. Default: 0.0.
        max_value (float): Upper bound value. Default: 1.0.

    Returns:
        Tensor: The output tensor.
+ """ + + def __init__(self, + bias: float = 3.0, + divisor: float = 6.0, + min_value: float = 0.0, + max_value: float = 1.0): + super().__init__() + warnings.warn( + 'In MMCV v1.4.4, we modified the default value of args to align ' + 'with PyTorch official. Previous Implementation: ' + 'Hsigmoid(x) = min(max((x + 1) / 2, 0), 1). ' + 'Current Implementation: ' + 'Hsigmoid(x) = min(max((x + 3) / 6, 0), 1).') + self.bias = bias + self.divisor = divisor + assert self.divisor != 0 + self.min_value = min_value + self.max_value = max_value + + def forward(self, x: torch.Tensor) -> torch.Tensor: + x = (x + self.bias) / self.divisor + + return x.clamp_(self.min_value, self.max_value) diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/cnn/bricks/hswish.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/cnn/bricks/hswish.py new file mode 100644 index 0000000000000000000000000000000000000000..6b6dd006d424bd39a3f99ceefda816408309d71c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/cnn/bricks/hswish.py @@ -0,0 +1,39 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch +import torch.nn as nn +from mmengine.registry import MODELS +from mmengine.utils import digit_version +from mmengine.utils.dl_utils import TORCH_VERSION + + +class HSwish(nn.Module): + """Hard Swish Module. + + This module applies the hard swish function: + + .. math:: + Hswish(x) = x * ReLU6(x + 3) / 6 + + Args: + inplace (bool): can optionally do the operation in-place. + Default: False. + + Returns: + Tensor: The output tensor. + """ + + def __init__(self, inplace: bool = False): + super().__init__() + self.act = nn.ReLU6(inplace) + + def forward(self, x: torch.Tensor) -> torch.Tensor: + return x * self.act(x + 3) / 6 + + +if (TORCH_VERSION == 'parrots' + or digit_version(TORCH_VERSION) < digit_version('1.7')): + # Hardswish is not supported when PyTorch version < 1.6. + # And Hardswish in PyTorch 1.6 does not support inplace. 
# Copyright (c) OpenMMLab. All rights reserved.
from abc import ABCMeta
from typing import Dict, Optional

import torch
import torch.nn as nn
from mmengine.model import constant_init, normal_init
from mmengine.registry import MODELS

from .conv_module import ConvModule


class _NonLocalNd(nn.Module, metaclass=ABCMeta):
    """Basic Non-local module.

    This module is proposed in
    "Non-local Neural Networks"
    Paper reference: https://arxiv.org/abs/1711.07971
    Code reference: https://github.com/AlexHex7/Non-local_pytorch

    Args:
        in_channels (int): Channels of the input feature map.
        reduction (int): Channel reduction ratio. Default: 2.
        use_scale (bool): Whether to scale pairwise_weight by
            `1/sqrt(inter_channels)` when the mode is `embedded_gaussian`.
            Default: True.
        conv_cfg (None | dict): The config dict for convolution layers.
            If not specified, it will use `nn.Conv2d` for convolution layers.
            Default: None.
        norm_cfg (None | dict): The config dict for normalization layers.
            Default: None. (This parameter is only applicable to conv_out.)
        mode (str): Options are `gaussian`, `concatenation`,
            `embedded_gaussian` and `dot_product`. Default: embedded_gaussian.
    """

    def __init__(self,
                 in_channels: int,
                 reduction: int = 2,
                 use_scale: bool = True,
                 conv_cfg: Optional[Dict] = None,
                 norm_cfg: Optional[Dict] = None,
                 mode: str = 'embedded_gaussian',
                 **kwargs):
        super().__init__()
        self.in_channels = in_channels
        self.reduction = reduction
        self.use_scale = use_scale
        # Keep at least one intermediate channel even for large reductions.
        self.inter_channels = max(in_channels // reduction, 1)
        self.mode = mode

        if mode not in [
                'gaussian', 'embedded_gaussian', 'dot_product', 'concatenation'
        ]:
            raise ValueError("Mode should be in 'gaussian', 'concatenation', "
                             f"'embedded_gaussian' or 'dot_product', but got "
                             f'{mode} instead.')

        # g, theta, phi are defaulted as `nn.ConvNd`.
        # Here we use ConvModule for potential usage.
        self.g = ConvModule(
            self.in_channels,
            self.inter_channels,
            kernel_size=1,
            conv_cfg=conv_cfg,
            act_cfg=None)  # type: ignore
        self.conv_out = ConvModule(
            self.inter_channels,
            self.in_channels,
            kernel_size=1,
            conv_cfg=conv_cfg,
            norm_cfg=norm_cfg,
            act_cfg=None)

        # The `gaussian` mode computes similarity on the raw input, so it
        # needs no theta/phi embeddings.
        if self.mode != 'gaussian':
            self.theta = ConvModule(
                self.in_channels,
                self.inter_channels,
                kernel_size=1,
                conv_cfg=conv_cfg,
                act_cfg=None)
            self.phi = ConvModule(
                self.in_channels,
                self.inter_channels,
                kernel_size=1,
                conv_cfg=conv_cfg,
                act_cfg=None)

        if self.mode == 'concatenation':
            # Projects the concatenated theta/phi pair to a scalar weight.
            self.concat_project = ConvModule(
                self.inter_channels * 2,
                1,
                kernel_size=1,
                stride=1,
                padding=0,
                bias=False,
                act_cfg=dict(type='ReLU'))

        self.init_weights(**kwargs)

    def init_weights(self, std: float = 0.01, zeros_init: bool = True) -> None:
        # Initialize embedding convs with small normal noise.
        if self.mode != 'gaussian':
            for m in [self.g, self.theta, self.phi]:
                normal_init(m.conv, std=std)
        else:
            normal_init(self.g.conv, std=std)
        # Zero-initializing conv_out makes the residual branch start as an
        # identity mapping (output == input at the beginning of training).
        if zeros_init:
            if self.conv_out.norm_cfg is None:
                constant_init(self.conv_out.conv, 0)
            else:
                constant_init(self.conv_out.norm, 0)
        else:
            if self.conv_out.norm_cfg is None:
                normal_init(self.conv_out.conv, std=std)
            else:
                normal_init(self.conv_out.norm, std=std)

    def gaussian(self, theta_x: torch.Tensor,
                 phi_x: torch.Tensor) -> torch.Tensor:
        """Similarity as softmax over raw-feature dot products."""
        # NonLocal1d pairwise_weight: [N, H, H]
        # NonLocal2d pairwise_weight: [N, HxW, HxW]
        # NonLocal3d pairwise_weight: [N, TxHxW, TxHxW]
        pairwise_weight = torch.matmul(theta_x, phi_x)
        pairwise_weight = pairwise_weight.softmax(dim=-1)
        return pairwise_weight

    def embedded_gaussian(self, theta_x: torch.Tensor,
                          phi_x: torch.Tensor) -> torch.Tensor:
        """Similarity as softmax over embedded dot products, optionally
        scaled by 1/sqrt(inter_channels)."""
        # NonLocal1d pairwise_weight: [N, H, H]
        # NonLocal2d pairwise_weight: [N, HxW, HxW]
        # NonLocal3d pairwise_weight: [N, TxHxW, TxHxW]
        pairwise_weight = torch.matmul(theta_x, phi_x)
        if self.use_scale:
            # theta_x.shape[-1] is `self.inter_channels`
            pairwise_weight /= theta_x.shape[-1]**0.5
        pairwise_weight = pairwise_weight.softmax(dim=-1)
        return pairwise_weight

    def dot_product(self, theta_x: torch.Tensor,
                    phi_x: torch.Tensor) -> torch.Tensor:
        """Similarity as dot products normalized by the number of
        positions (mean instead of softmax)."""
        # NonLocal1d pairwise_weight: [N, H, H]
        # NonLocal2d pairwise_weight: [N, HxW, HxW]
        # NonLocal3d pairwise_weight: [N, TxHxW, TxHxW]
        pairwise_weight = torch.matmul(theta_x, phi_x)
        pairwise_weight /= pairwise_weight.shape[-1]
        return pairwise_weight

    def concatenation(self, theta_x: torch.Tensor,
                      phi_x: torch.Tensor) -> torch.Tensor:
        """Similarity via a learned projection of concatenated embeddings."""
        # NonLocal1d pairwise_weight: [N, H, H]
        # NonLocal2d pairwise_weight: [N, HxW, HxW]
        # NonLocal3d pairwise_weight: [N, TxHxW, TxHxW]
        h = theta_x.size(2)
        w = phi_x.size(3)
        # Broadcast theta over columns and phi over rows so every pair of
        # positions is concatenated once.
        theta_x = theta_x.repeat(1, 1, 1, w)
        phi_x = phi_x.repeat(1, 1, h, 1)

        concat_feature = torch.cat([theta_x, phi_x], dim=1)
        pairwise_weight = self.concat_project(concat_feature)
        n, _, h, w = pairwise_weight.size()
        pairwise_weight = pairwise_weight.view(n, h, w)
        pairwise_weight /= pairwise_weight.shape[-1]

        return pairwise_weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assume `reduction = 1`, then `inter_channels = C`
        # or `inter_channels = C` when `mode="gaussian"`

        # NonLocal1d x: [N, C, H]
        # NonLocal2d x: [N, C, H, W]
        # NonLocal3d x: [N, C, T, H, W]
        n = x.size(0)

        # NonLocal1d g_x: [N, H, C]
        # NonLocal2d g_x: [N, HxW, C]
        # NonLocal3d g_x: [N, TxHxW, C]
        g_x = self.g(x).view(n, self.inter_channels, -1)
        g_x = g_x.permute(0, 2, 1)

        # NonLocal1d theta_x: [N, H, C], phi_x: [N, C, H]
        # NonLocal2d theta_x: [N, HxW, C], phi_x: [N, C, HxW]
        # NonLocal3d theta_x: [N, TxHxW, C], phi_x: [N, C, TxHxW]
        if self.mode == 'gaussian':
            theta_x = x.view(n, self.in_channels, -1)
            theta_x = theta_x.permute(0, 2, 1)
            # NOTE(review): `self.sub_sample` is assigned by the NonLocal1d/
            # 2d/3d subclasses, not by this base class; `gaussian` mode relies
            # on the subclass having set it.
            if self.sub_sample:
                phi_x = self.phi(x).view(n, self.in_channels, -1)
            else:
                phi_x = x.view(n, self.in_channels, -1)
        elif self.mode == 'concatenation':
            theta_x = self.theta(x).view(n, self.inter_channels, -1, 1)
            phi_x = self.phi(x).view(n, self.inter_channels, 1, -1)
        else:
            theta_x = self.theta(x).view(n, self.inter_channels, -1)
            theta_x = theta_x.permute(0, 2, 1)
            phi_x = self.phi(x).view(n, self.inter_channels, -1)

        # Dispatch to the similarity function named by `self.mode`.
        pairwise_func = getattr(self, self.mode)
        # NonLocal1d pairwise_weight: [N, H, H]
        # NonLocal2d pairwise_weight: [N, HxW, HxW]
        # NonLocal3d pairwise_weight: [N, TxHxW, TxHxW]
        pairwise_weight = pairwise_func(theta_x, phi_x)

        # NonLocal1d y: [N, H, C]
        # NonLocal2d y: [N, HxW, C]
        # NonLocal3d y: [N, TxHxW, C]
        y = torch.matmul(pairwise_weight, g_x)
        # NonLocal1d y: [N, C, H]
        # NonLocal2d y: [N, C, H, W]
        # NonLocal3d y: [N, C, T, H, W]
        y = y.permute(0, 2, 1).contiguous().reshape(n, self.inter_channels,
                                                    *x.size()[2:])

        # Residual connection: the non-local branch refines, not replaces, x.
        output = x + self.conv_out(y)

        return output


class NonLocal1d(_NonLocalNd):
    """1D Non-local module.

    Args:
        in_channels (int): Same as `NonLocalND`.
        sub_sample (bool): Whether to apply max pooling after pairwise
            function (Note that the `sub_sample` is applied on spatial only).
            Default: False.
        conv_cfg (None | dict): Same as `NonLocalND`.
            Default: dict(type='Conv1d').
    """

    def __init__(self,
                 in_channels: int,
                 sub_sample: bool = False,
                 conv_cfg: Dict = dict(type='Conv1d'),
                 **kwargs):
        super().__init__(in_channels, conv_cfg=conv_cfg, **kwargs)

        self.sub_sample = sub_sample

        if sub_sample:
            # Subsample keys/values (g and phi) to cut the cost of the
            # pairwise attention; in `gaussian` mode phi is just the pool.
            max_pool_layer = nn.MaxPool1d(kernel_size=2)
            self.g = nn.Sequential(self.g, max_pool_layer)
            if self.mode != 'gaussian':
                self.phi = nn.Sequential(self.phi, max_pool_layer)
            else:
                self.phi = max_pool_layer


@MODELS.register_module()
class NonLocal2d(_NonLocalNd):
    """2D Non-local module.

    Args:
        in_channels (int): Same as `NonLocalND`.
        sub_sample (bool): Whether to apply max pooling after pairwise
            function (Note that the `sub_sample` is applied on spatial only).
            Default: False.
        conv_cfg (None | dict): Same as `NonLocalND`.
            Default: dict(type='Conv2d').
    """

    _abbr_ = 'nonlocal_block'

    def __init__(self,
                 in_channels: int,
                 sub_sample: bool = False,
                 conv_cfg: Dict = dict(type='Conv2d'),
                 **kwargs):
        super().__init__(in_channels, conv_cfg=conv_cfg, **kwargs)

        self.sub_sample = sub_sample

        if sub_sample:
            # Subsample keys/values spatially by 2x in H and W.
            max_pool_layer = nn.MaxPool2d(kernel_size=(2, 2))
            self.g = nn.Sequential(self.g, max_pool_layer)
            if self.mode != 'gaussian':
                self.phi = nn.Sequential(self.phi, max_pool_layer)
            else:
                self.phi = max_pool_layer


class NonLocal3d(_NonLocalNd):
    """3D Non-local module.

    Args:
        in_channels (int): Same as `NonLocalND`.
        sub_sample (bool): Whether to apply max pooling after pairwise
            function (Note that the `sub_sample` is applied on spatial only).
            Default: False.
        conv_cfg (None | dict): Same as `NonLocalND`.
            Default: dict(type='Conv3d').
    """

    def __init__(self,
                 in_channels: int,
                 sub_sample: bool = False,
                 conv_cfg: Dict = dict(type='Conv3d'),
                 **kwargs):
        super().__init__(in_channels, conv_cfg=conv_cfg, **kwargs)
        self.sub_sample = sub_sample

        if sub_sample:
            # Pool only H and W (kernel (1, 2, 2)); the temporal axis is kept.
            max_pool_layer = nn.MaxPool3d(kernel_size=(1, 2, 2))
            self.g = nn.Sequential(self.g, max_pool_layer)
            if self.mode != 'gaussian':
                self.phi = nn.Sequential(self.phi, max_pool_layer)
            else:
                self.phi = max_pool_layer
def build_norm_layer(cfg: Dict,
                     num_features: int,
                     postfix: Union[int, str] = '') -> Tuple[str, nn.Module]:
    """Build normalization layer.

    Args:
        cfg (dict): The norm layer config, which should contain:

            - type (str): Layer type.
            - layer args: Args needed to instantiate a norm layer.
            - requires_grad (bool, optional): Whether stop gradient updates.
        num_features (int): Number of input channels.
        postfix (int | str): The postfix to be appended into norm abbreviation
            to create named layer.

    Returns:
        tuple[str, nn.Module]: The first element is the layer name consisting
        of abbreviation and postfix, e.g., bn1, gn. The second element is the
        created norm layer.

    Raises:
        TypeError: If ``cfg`` is not a dict.
        KeyError: If ``cfg`` has no ``type`` key, or the requested type is
            not found in the registry.
    """
    if not isinstance(cfg, dict):
        raise TypeError('cfg must be a dict')
    if 'type' not in cfg:
        raise KeyError('the cfg dict must contain the key "type"')
    cfg_ = cfg.copy()

    layer_type = cfg_.pop('type')

    # Switch registry to the target scope. If `norm_layer` cannot be found
    # in the registry, fallback to search `norm_layer` in the
    # mmengine.MODELS.
    with MODELS.switch_scope_and_registry(None) as registry:
        norm_layer = registry.get(layer_type)
    if norm_layer is None:
        # Fix: report the requested `layer_type`; the previous message
        # interpolated `norm_layer`, which is always None on this path.
        raise KeyError(f'Cannot find {layer_type} in registry under scope '
                       f'name {registry.scope}')
    abbr = infer_abbr(norm_layer)

    assert isinstance(postfix, (int, str))
    name = abbr + str(postfix)

    requires_grad = cfg_.pop('requires_grad', True)
    cfg_.setdefault('eps', 1e-5)
    if layer_type != 'GN':
        layer = norm_layer(num_features, **cfg_)
        # Legacy SyncBN implementations need the per-GPU sample count hint.
        if layer_type == 'SyncBN' and hasattr(layer, '_specify_ddp_gpu_num'):
            layer._specify_ddp_gpu_num(1)
    else:
        # GroupNorm takes `num_channels` (plus a mandatory `num_groups`)
        # instead of a positional feature count.
        assert 'num_groups' in cfg_
        layer = norm_layer(num_channels=num_features, **cfg_)

    # Optionally freeze the layer's parameters.
    for param in layer.parameters():
        param.requires_grad = requires_grad

    return name, layer
def build_padding_layer(cfg: Dict, *args, **kwargs) -> nn.Module:
    """Build padding layer.

    Args:
        cfg (dict): The padding layer config, which should contain:
            - type (str): Layer type.
            - layer args: Args needed to instantiate a padding layer.
        *args: Positional args forwarded to the padding layer constructor.
        **kwargs: Keyword args forwarded to the padding layer constructor.

    Returns:
        nn.Module: Created padding layer.

    Raises:
        TypeError: If ``cfg`` is not a dict.
        KeyError: If ``cfg`` has no ``type`` key, or the requested type is
            not found in the registry.
    """
    if not isinstance(cfg, dict):
        raise TypeError('cfg must be a dict')
    if 'type' not in cfg:
        raise KeyError('the cfg dict must contain the key "type"')

    cfg_ = cfg.copy()
    padding_type = cfg_.pop('type')

    # Switch registry to the target scope. If `padding_layer` cannot be found
    # in the registry, fallback to search `padding_layer` in the
    # mmengine.MODELS.
    with MODELS.switch_scope_and_registry(None) as registry:
        padding_layer = registry.get(padding_type)
    if padding_layer is None:
        # Fix: report the requested `padding_type`; the previous message
        # interpolated `padding_layer`, which is always None on this path.
        raise KeyError(f'Cannot find {padding_type} in registry under scope '
                       f'name {registry.scope}')
    layer = padding_layer(*args, **kwargs, **cfg_)

    return layer
+ + Example:: + + >>> camel2snack("FancyBlock") + 'fancy_block' + """ + + word = re.sub(r'([A-Z]+)([A-Z][a-z])', r'\1_\2', word) + word = re.sub(r'([a-z\d])([A-Z])', r'\1_\2', word) + word = word.replace('-', '_') + return word.lower() + + if not inspect.isclass(class_type): + raise TypeError( + f'class_type must be a type, but got {type(class_type)}') + if hasattr(class_type, '_abbr_'): + return class_type._abbr_ # type: ignore + else: + return camel2snack(class_type.__name__) + + +def build_plugin_layer(cfg: Dict, + postfix: Union[int, str] = '', + **kwargs) -> Tuple[str, nn.Module]: + """Build plugin layer. + + Args: + cfg (dict): cfg should contain: + + - type (str): identify plugin layer type. + - layer args: args needed to instantiate a plugin layer. + postfix (int, str): appended into norm abbreviation to + create named layer. Default: ''. + + Returns: + tuple[str, nn.Module]: The first one is the concatenation of + abbreviation and postfix. The second is the created plugin layer. + """ + if not isinstance(cfg, dict): + raise TypeError('cfg must be a dict') + if 'type' not in cfg: + raise KeyError('the cfg dict must contain the key "type"') + cfg_ = cfg.copy() + + layer_type = cfg_.pop('type') + + # Switch registry to the target scope. If `plugin_layer` cannot be found + # in the registry, fallback to search `plugin_layer` in the + # mmengine.MODELS. 
class Scale(nn.Module):
    """Multiply the input by a single learnable scalar.

    The factor is stored as a one-element (0-dim) learnable parameter and
    broadcast against inputs of any shape.

    Args:
        scale (float): Initial value of the scale factor. Default: 1.0
    """

    def __init__(self, scale: float = 1.0):
        super().__init__()
        # 0-dim float parameter; broadcasting handles any input shape.
        self.scale = nn.Parameter(torch.tensor(scale, dtype=torch.float))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.scale * x
@MODELS.register_module()
class Swish(nn.Module):
    """Swish activation module.

    Applies the swish function element-wise:

    .. math::
        Swish(x) = x * Sigmoid(x)

    Returns:
        Tensor: The output tensor.
    """

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Element-wise gate; sigmoid broadcasts over any input shape.
        return x * torch.sigmoid(x)
+import copy +import math +import warnings +from typing import Sequence + +import torch +import torch.nn as nn +import torch.nn.functional as F +from mmengine.config import ConfigDict +from mmengine.model import BaseModule, ModuleList, Sequential +from mmengine.registry import MODELS +from mmengine.utils import deprecated_api_warning, to_2tuple + +from mmcv.cnn import (Linear, build_activation_layer, build_conv_layer, + build_norm_layer) +from .drop import build_dropout +from .scale import LayerScale + +# Avoid BC-breaking of importing MultiScaleDeformableAttention from this file +try: + from mmcv.ops.multi_scale_deform_attn import \ + MultiScaleDeformableAttention # noqa F401 + warnings.warn( + ImportWarning( + '``MultiScaleDeformableAttention`` has been moved to ' + '``mmcv.ops.multi_scale_deform_attn``, please change original path ' # noqa E501 + '``from mmcv.cnn.bricks.transformer import MultiScaleDeformableAttention`` ' # noqa E501 + 'to ``from mmcv.ops.multi_scale_deform_attn import MultiScaleDeformableAttention`` ' # noqa E501 + )) + +except ImportError: + warnings.warn('Fail to import ``MultiScaleDeformableAttention`` from ' + '``mmcv.ops.multi_scale_deform_attn``, ' + 'You should install ``mmcv`` rather than ``mmcv-lite`` ' + 'if you need this module. 
class AdaptivePadding(nn.Module):
    """Pad the input so it is fully covered by the configured filter.

    Two modes are supported. "same" behaves like TensorFlow's SAME padding
    and pads zeros evenly around the input; "corner" pads zeros only to the
    bottom-right.

    Args:
        kernel_size (int | tuple): Size of the kernel. Default: 1.
        stride (int | tuple): Stride of the filter. Default: 1.
        dilation (int | tuple): Spacing between kernel elements. Default: 1.
        padding (str): Either "same" or "corner". Default: "corner".

    Example:
        >>> kernel_size = 16
        >>> stride = 16
        >>> dilation = 1
        >>> input = torch.rand(1, 1, 15, 17)
        >>> adap_pad = AdaptivePadding(
        >>>     kernel_size=kernel_size,
        >>>     stride=stride,
        >>>     dilation=dilation,
        >>>     padding="corner")
        >>> out = adap_pad(input)
        >>> assert (out.shape[2], out.shape[3]) == (16, 32)
        >>> input = torch.rand(1, 1, 16, 17)
        >>> out = adap_pad(input)
        >>> assert (out.shape[2], out.shape[3]) == (16, 32)
    """

    def __init__(self, kernel_size=1, stride=1, dilation=1, padding='corner'):
        super().__init__()
        assert padding in ('same', 'corner')

        self.padding = padding
        # Normalize every geometry argument to an (h, w) pair.
        self.kernel_size = to_2tuple(kernel_size)
        self.stride = to_2tuple(stride)
        self.dilation = to_2tuple(dilation)

    def get_pad_shape(self, input_shape):
        """Calculate the padding size of input.

        Args:
            input_shape (:obj:`torch.Size`): arrange as (H, W).

        Returns:
            Tuple[int]: The padding size along the original H and W
            directions.
        """
        in_h, in_w = input_shape
        kernel_h, kernel_w = self.kernel_size
        stride_h, stride_w = self.stride
        # Number of filter positions needed to cover each axis.
        n_h = math.ceil(in_h / stride_h)
        n_w = math.ceil(in_w / stride_w)
        # Extra pixels required so the last (dilated) window fits entirely.
        pad_h = max(
            (n_h - 1) * stride_h + (kernel_h - 1) * self.dilation[0] + 1 -
            in_h, 0)
        pad_w = max(
            (n_w - 1) * stride_w + (kernel_w - 1) * self.dilation[1] + 1 -
            in_w, 0)
        return pad_h, pad_w

    def forward(self, x):
        """Pad `x` with zeros according to the configured mode.

        Args:
            x (Tensor): Input tensor has shape (B, C, H, W).

        Returns:
            Tensor: The tensor with adaptive padding.
        """
        pad_h, pad_w = self.get_pad_shape(x.size()[-2:])
        if pad_h <= 0 and pad_w <= 0:
            return x
        if self.padding == 'corner':
            # All padding goes to the right / bottom edges.
            return F.pad(x, [0, pad_w, 0, pad_h])
        # 'same': split padding as evenly as possible on both sides.
        return F.pad(x, [
            pad_w // 2, pad_w - pad_w // 2,
            pad_h // 2, pad_h - pad_h // 2
        ])
+ + We use a conv layer to implement PatchEmbed. + + Args: + in_channels (int): The num of input channels. Default: 3 + embed_dims (int): The dimensions of embedding. Default: 768 + conv_type (str): The type of convolution + to generate patch embedding. Default: "Conv2d". + kernel_size (int): The kernel_size of embedding conv. Default: 16. + stride (int): The slide stride of embedding conv. + Default: 16. + padding (int | tuple | string): The padding length of + embedding conv. When it is a string, it means the mode + of adaptive padding, support "same" and "corner" now. + Default: "corner". + dilation (int): The dilation rate of embedding conv. Default: 1. + bias (bool): Bias of embed conv. Default: True. + norm_cfg (dict, optional): Config dict for normalization layer. + Default: None. + input_size (int | tuple | None): The size of input, which will be + used to calculate the out size. Only works when `dynamic_size` + is False. Default: None. + init_cfg (`mmcv.ConfigDict`, optional): The Config for initialization. + Default: None. 
+ """ + + def __init__(self, + in_channels=3, + embed_dims=768, + conv_type='Conv2d', + kernel_size=16, + stride=16, + padding='corner', + dilation=1, + bias=True, + norm_cfg=None, + input_size=None, + init_cfg=None): + super().__init__(init_cfg=init_cfg) + + self.embed_dims = embed_dims + if stride is None: + stride = kernel_size + + kernel_size = to_2tuple(kernel_size) + stride = to_2tuple(stride) + dilation = to_2tuple(dilation) + + if isinstance(padding, str): + self.adaptive_padding = AdaptivePadding( + kernel_size=kernel_size, + stride=stride, + dilation=dilation, + padding=padding) + # disable the padding of conv + padding = 0 + else: + self.adaptive_padding = None + padding = to_2tuple(padding) + + self.projection = build_conv_layer( + dict(type=conv_type), + in_channels=in_channels, + out_channels=embed_dims, + kernel_size=kernel_size, + stride=stride, + padding=padding, + dilation=dilation, + bias=bias) + + if norm_cfg is not None: + self.norm = build_norm_layer(norm_cfg, embed_dims)[1] + else: + self.norm = None + + if input_size: + input_size = to_2tuple(input_size) + # `init_out_size` would be used outside to + # calculate the num_patches + # e.g. when `use_abs_pos_embed` outside + self.init_input_size = input_size + if self.adaptive_padding: + pad_h, pad_w = self.adaptive_padding.get_pad_shape(input_size) + input_h, input_w = input_size + input_h = input_h + pad_h + input_w = input_w + pad_w + input_size = (input_h, input_w) + + # https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html + h_out = (input_size[0] + 2 * padding[0] - dilation[0] * + (kernel_size[0] - 1) - 1) // stride[0] + 1 + w_out = (input_size[1] + 2 * padding[1] - dilation[1] * + (kernel_size[1] - 1) - 1) // stride[1] + 1 + self.init_out_size = (h_out, w_out) + else: + self.init_input_size = None + self.init_out_size = None + + def forward(self, x): + """ + Args: + x (Tensor): Has shape (B, C, H, W). In most case, C is 3. 
class PatchMerging(BaseModule):
    """Merge patch feature map.

    This layer groups feature map by kernel_size, and applies norm and linear
    layers to the grouped feature map ((used in Swin Transformer)).
    Our implementation uses `nn.Unfold` to
    merge patches, which is about 25% faster than the original
    implementation. However, we need to modify pretrained
    models for compatibility.

    Args:
        in_channels (int): The num of input channels.
        out_channels (int): The num of output channels.
        kernel_size (int | tuple, optional): the kernel size in the unfold
            layer. Defaults to 2.
        stride (int | tuple, optional): the stride of the sliding blocks in
            the unfold layer. Default: None. (Would be set as `kernel_size`)
        padding (int | tuple | string ): The padding length of
            embedding conv. When it is a string, it means the mode
            of adaptive padding, support "same" and "corner" now.
            Default: "corner".
        dilation (int | tuple, optional): dilation parameter in the unfold
            layer. Default: 1.
        bias (bool, optional): Whether to add bias in linear layer or not.
            Defaults: False.
        norm_cfg (dict, optional): Config dict for normalization layer.
            Default: dict(type='LN').
        init_cfg (dict, optional): The extra config for initialization.
            Default: None.
    """

    def __init__(self,
                 in_channels,
                 out_channels,
                 kernel_size=2,
                 stride=None,
                 padding='corner',
                 dilation=1,
                 bias=False,
                 norm_cfg=dict(type='LN'),
                 init_cfg=None):
        super().__init__(init_cfg=init_cfg)
        self.in_channels = in_channels
        self.out_channels = out_channels
        # Documented contract: ``None`` stride falls back to ``kernel_size``.
        # (Replaces the original no-op ``if stride: stride = stride`` branch.)
        if stride is None:
            stride = kernel_size

        kernel_size = to_2tuple(kernel_size)
        stride = to_2tuple(stride)
        dilation = to_2tuple(dilation)

        if isinstance(padding, str):
            # "same"/"corner" padding is handled by AdaptivePadding before
            # unfolding, so the unfold op itself must not pad.
            self.adaptive_padding = AdaptivePadding(
                kernel_size=kernel_size,
                stride=stride,
                dilation=dilation,
                padding=padding)
            # disable the padding of unfold
            padding = 0
        else:
            self.adaptive_padding = None

        padding = to_2tuple(padding)
        self.sampler = nn.Unfold(
            kernel_size=kernel_size,
            dilation=dilation,
            padding=padding,
            stride=stride)

        # Each unfolded column stacks kernel_size[0] * kernel_size[1]
        # spatial positions of every input channel.
        sample_dim = kernel_size[0] * kernel_size[1] * in_channels

        if norm_cfg is not None:
            self.norm = build_norm_layer(norm_cfg, sample_dim)[1]
        else:
            self.norm = None

        self.reduction = nn.Linear(sample_dim, out_channels, bias=bias)

    def forward(self, x, input_size):
        """
        Args:
            x (Tensor): Has shape (B, H*W, C_in).
            input_size (tuple[int]): The spatial shape of x, arrange as
                (H, W). Default: None.

        Returns:
            tuple: Contains merged results and its spatial shape.

            - x (Tensor): Has shape (B, Merged_H * Merged_W, C_out)
            - out_size (tuple[int]): Spatial shape of x, arrange as
              (Merged_H, Merged_W).
        """
        B, L, C = x.shape
        assert isinstance(input_size, Sequence), f'Expect ' \
            f'input_size is ' \
            f'`Sequence` ' \
            f'but get {input_size}'

        H, W = input_size
        assert L == H * W, 'input feature has wrong size'

        x = x.view(B, H, W, C).permute([0, 3, 1, 2])  # B, C, H, W

        if self.adaptive_padding:
            x = self.adaptive_padding(x)
            # Adaptive padding may have enlarged the spatial extent.
            H, W = x.shape[-2:]

        # Use nn.Unfold to merge patch. About 25% faster than original method,
        # but need to modify pretrained model for compatibility
        # if kernel_size=2 and stride=2, x should has shape (B, 4*C, H/2*W/2)
        x = self.sampler(x)

        out_h = (H + 2 * self.sampler.padding[0] - self.sampler.dilation[0] *
                 (self.sampler.kernel_size[0] - 1) -
                 1) // self.sampler.stride[0] + 1
        out_w = (W + 2 * self.sampler.padding[1] - self.sampler.dilation[1] *
                 (self.sampler.kernel_size[1] - 1) -
                 1) // self.sampler.stride[1] + 1

        output_size = (out_h, out_w)
        x = x.transpose(1, 2)  # B, H/2*W/2, 4*C
        x = self.norm(x) if self.norm else x
        x = self.reduction(x)
        return x, output_size
@MODELS.register_module()
class MultiheadAttention(BaseModule):
    """A wrapper for ``torch.nn.MultiheadAttention``.

    This module implements MultiheadAttention with identity connection,
    and positional encoding is also passed as input.

    Args:
        embed_dims (int): The embedding dimension.
        num_heads (int): Parallel attention heads.
        attn_drop (float): A Dropout layer on attn_output_weights.
            Default: 0.0.
        proj_drop (float): A Dropout layer after `nn.MultiheadAttention`.
            Default: 0.0.
        dropout_layer (obj:`ConfigDict`): The dropout_layer used
            when adding the shortcut.
        init_cfg (obj:`mmcv.ConfigDict`): The Config for initialization.
            Default: None.
        batch_first (bool): When it is True, Key, Query and Value are shape of
            (batch, n, embed_dim), otherwise (n, batch, embed_dim).
            Default to False.
    """

    def __init__(self,
                 embed_dims,
                 num_heads,
                 attn_drop=0.,
                 proj_drop=0.,
                 dropout_layer=dict(type='Dropout', drop_prob=0.),
                 init_cfg=None,
                 batch_first=False,
                 **kwargs):
        super().__init__(init_cfg)
        if 'dropout' in kwargs:
            warnings.warn(
                'The arguments `dropout` in MultiheadAttention '
                'has been deprecated, now you can separately '
                'set `attn_drop`(float), proj_drop(float), '
                'and `dropout_layer`(dict) ', DeprecationWarning)
            # Legacy behavior: the deprecated value drives both the attention
            # dropout and the shortcut dropout layer.
            attn_drop = kwargs['dropout']
            dropout_layer['drop_prob'] = kwargs.pop('dropout')

        self.embed_dims = embed_dims
        self.num_heads = num_heads
        self.batch_first = batch_first

        self.attn = nn.MultiheadAttention(embed_dims, num_heads, attn_drop,
                                          **kwargs)

        self.proj_drop = nn.Dropout(proj_drop)
        self.dropout_layer = build_dropout(
            dropout_layer) if dropout_layer else nn.Identity()

    @deprecated_api_warning({'residual': 'identity'},
                            cls_name='MultiheadAttention')
    def forward(self,
                query,
                key=None,
                value=None,
                identity=None,
                query_pos=None,
                key_pos=None,
                attn_mask=None,
                key_padding_mask=None,
                **kwargs):
        """Forward function for `MultiheadAttention`.

        **kwargs allow passing a more general data flow when combining
        with other operations in `transformerlayer`.

        Args:
            query (Tensor): The input query with shape [num_queries, bs,
                embed_dims] if self.batch_first is False, else
                [bs, num_queries embed_dims].
            key (Tensor): The key tensor with shape [num_keys, bs,
                embed_dims] if self.batch_first is False, else
                [bs, num_keys, embed_dims] .
                If None, the ``query`` will be used. Defaults to None.
            value (Tensor): The value tensor with same shape as `key`.
                Same in `nn.MultiheadAttention.forward`. Defaults to None.
                If None, the `key` will be used.
            identity (Tensor): This tensor, with the same shape as x,
                will be used for the identity link.
                If None, `x` will be used. Defaults to None.
            query_pos (Tensor): The positional encoding for query, with
                the same shape as `x`. If not None, it will
                be added to `x` before forward function. Defaults to None.
            key_pos (Tensor): The positional encoding for `key`, with the
                same shape as `key`. Defaults to None. If not None, it will
                be added to `key` before forward function. If None, and
                `query_pos` has the same shape as `key`, then `query_pos`
                will be used for `key_pos`. Defaults to None.
            attn_mask (Tensor): ByteTensor mask with shape [num_queries,
                num_keys]. Same in `nn.MultiheadAttention.forward`.
                Defaults to None.
            key_padding_mask (Tensor): ByteTensor with shape [bs, num_keys].
                Defaults to None.

        Returns:
            Tensor: forwarded results with shape
            [num_queries, bs, embed_dims]
            if self.batch_first is False, else
            [bs, num_queries embed_dims].
        """

        if key is None:
            key = query
        if value is None:
            value = key
        if identity is None:
            identity = query
        if key_pos is None:
            if query_pos is not None:
                # use query_pos if key_pos is not available
                if query_pos.shape == key.shape:
                    key_pos = query_pos
                else:
                    # BUG FIX: add the missing trailing space so the warning
                    # no longer reads "key ismissing".
                    warnings.warn(f'position encoding of key is '
                                  f'missing in {self.__class__.__name__}.')
        if query_pos is not None:
            query = query + query_pos
        if key_pos is not None:
            key = key + key_pos

        # Because the dataflow('key', 'query', 'value') of
        # ``torch.nn.MultiheadAttention`` is (num_query, batch,
        # embed_dims), We should adjust the shape of dataflow from
        # batch_first (batch, num_query, embed_dims) to num_query_first
        # (num_query ,batch, embed_dims), and recover ``attn_output``
        # from num_query_first to batch_first.
        if self.batch_first:
            query = query.transpose(0, 1)
            key = key.transpose(0, 1)
            value = value.transpose(0, 1)

        out = self.attn(
            query=query,
            key=key,
            value=value,
            attn_mask=attn_mask,
            key_padding_mask=key_padding_mask)[0]

        if self.batch_first:
            out = out.transpose(0, 1)

        return identity + self.dropout_layer(self.proj_drop(out))
@MODELS.register_module()
class FFN(BaseModule):
    """Feed-forward network (FFN) with an identity shortcut.

    Args:
        embed_dims (int): The feature dimension. Same as
            `MultiheadAttention`. Defaults: 256.
        feedforward_channels (int): The hidden dimension of FFNs.
            Defaults: 1024.
        num_fcs (int, optional): The number of fully-connected layers in
            FFNs. Default: 2.
        act_cfg (dict, optional): The activation config for FFNs.
            Default: dict(type='ReLU')
        ffn_drop (float, optional): Probability of an element to be
            zeroed in FFN. Default 0.0.
        add_identity (bool, optional): Whether to add the
            identity connection. Default: `True`.
        dropout_layer (obj:`ConfigDict`): The dropout_layer used
            when adding the shortcut.
        init_cfg (obj:`mmcv.ConfigDict`): The Config for initialization.
            Default: None.
        layer_scale_init_value (float): Initial value of scale factor in
            LayerScale. Default: 1.0
    """

    @deprecated_api_warning(
        {
            'dropout': 'ffn_drop',
            'add_residual': 'add_identity'
        },
        cls_name='FFN')
    def __init__(self,
                 embed_dims=256,
                 feedforward_channels=1024,
                 num_fcs=2,
                 act_cfg=dict(type='ReLU', inplace=True),
                 ffn_drop=0.,
                 dropout_layer=None,
                 add_identity=True,
                 init_cfg=None,
                 layer_scale_init_value=0.):
        super().__init__(init_cfg)
        assert num_fcs >= 2, 'num_fcs should be no less ' \
            f'than 2. got {num_fcs}.'
        self.embed_dims = embed_dims
        self.feedforward_channels = feedforward_channels
        self.num_fcs = num_fcs

        # (num_fcs - 1) hidden blocks of Linear -> activation -> dropout,
        # followed by a final projection back to ``embed_dims``.
        modules = []
        dim = embed_dims
        for _ in range(num_fcs - 1):
            modules.append(
                Sequential(
                    Linear(dim, feedforward_channels),
                    build_activation_layer(act_cfg), nn.Dropout(ffn_drop)))
            dim = feedforward_channels
        modules.append(Linear(feedforward_channels, embed_dims))
        modules.append(nn.Dropout(ffn_drop))
        self.layers = Sequential(*modules)

        if dropout_layer:
            self.dropout_layer = build_dropout(dropout_layer)
        else:
            self.dropout_layer = torch.nn.Identity()
        self.add_identity = add_identity

        if layer_scale_init_value > 0:
            self.gamma2 = LayerScale(embed_dims, scale=layer_scale_init_value)
        else:
            self.gamma2 = nn.Identity()

    @deprecated_api_warning({'residual': 'identity'}, cls_name='FFN')
    def forward(self, x, identity=None):
        """Forward function for `FFN`.

        Adds ``identity`` (``x`` when it is None) to the FFN output unless
        ``add_identity`` is disabled.
        """
        out = self.gamma2(self.layers(x))
        if not self.add_identity:
            return self.dropout_layer(out)
        shortcut = x if identity is None else identity
        return shortcut + self.dropout_layer(out)
@MODELS.register_module()
class BaseTransformerLayer(BaseModule):
    """Base `TransformerLayer` for vision transformer.

    It can be built from `mmcv.ConfigDict` and support more flexible
    customization, for example, using any number of `FFN or LN ` and
    use different kinds of `attention` by specifying a list of `ConfigDict`
    named `attn_cfgs`. It is worth mentioning that it supports `prenorm`
    when you specifying `norm` as the first element of `operation_order`.
    More details about the `prenorm`: `On Layer Normalization in the
    Transformer Architecture `_ .

    Args:
        attn_cfgs (list[`mmcv.ConfigDict`] | obj:`mmcv.ConfigDict` | None )):
            Configs for `self_attention` or `cross_attention` modules,
            The order of the configs in the list should be consistent with
            corresponding attentions in operation_order.
            If it is a dict, all of the attention modules in operation_order
            will be built with this config. Default: None.
        ffn_cfgs (list[`mmcv.ConfigDict`] | obj:`mmcv.ConfigDict` | None )):
            Configs for FFN, The order of the configs in the list should be
            consistent with corresponding ffn in operation_order.
            If it is a dict, all of the attention modules in operation_order
            will be built with this config.
        operation_order (tuple[str]): The execution order of operation
            in transformer. Such as ('self_attn', 'norm', 'ffn', 'norm').
            Support `prenorm` when you specifying first element as `norm`.
            Default:None.
        norm_cfg (dict): Config dict for normalization layer.
            Default: dict(type='LN').
        init_cfg (obj:`mmcv.ConfigDict`): The Config for initialization.
            Default: None.
        batch_first (bool): Key, Query and Value are shape
            of (batch, n, embed_dim)
            or (n, batch, embed_dim). Default to False.
    """

    def __init__(self,
                 attn_cfgs=None,
                 ffn_cfgs=dict(
                     type='FFN',
                     embed_dims=256,
                     feedforward_channels=1024,
                     num_fcs=2,
                     ffn_drop=0.,
                     act_cfg=dict(type='ReLU', inplace=True),
                 ),
                 operation_order=None,
                 norm_cfg=dict(type='LN'),
                 init_cfg=None,
                 batch_first=False,
                 **kwargs):

        deprecated_args = dict(
            feedforward_channels='feedforward_channels',
            ffn_dropout='ffn_drop',
            ffn_num_fcs='num_fcs')
        for ori_name, new_name in deprecated_args.items():
            if ori_name in kwargs:
                warnings.warn(
                    f'The arguments `{ori_name}` in BaseTransformerLayer '
                    f'has been deprecated, now you should set `{new_name}` '
                    f'and other FFN related arguments '
                    f'to a dict named `ffn_cfgs`. ', DeprecationWarning)
                # BUG FIX: copy before mutating. ``ffn_cfgs`` may still be
                # the shared mutable default dict; writing into it would leak
                # the deprecated value into every later instance.
                ffn_cfgs = copy.deepcopy(ffn_cfgs)
                ffn_cfgs[new_name] = kwargs[ori_name]

        super().__init__(init_cfg)

        self.batch_first = batch_first

        # NOTE(review): this is a subset check — every entry must be one of
        # the four supported ops; it does not require all four to be present,
        # despite the message wording.
        assert set(operation_order) & {
            'self_attn', 'norm', 'ffn', 'cross_attn'} == \
            set(operation_order), f'The operation_order of' \
            f' {self.__class__.__name__} should ' \
            f'contains all four operation type ' \
            f"{['self_attn', 'norm', 'ffn', 'cross_attn']}"

        num_attn = operation_order.count('self_attn') + operation_order.count(
            'cross_attn')
        if isinstance(attn_cfgs, dict):
            # One shared config: replicate it per attention op.
            attn_cfgs = [copy.deepcopy(attn_cfgs) for _ in range(num_attn)]
        else:
            assert num_attn == len(attn_cfgs), f'The length ' \
                f'of attn_cfg {num_attn} is ' \
                f'not consistent with the number of attention' \
                f'in operation_order {operation_order}.'

        self.num_attn = num_attn
        self.operation_order = operation_order
        self.norm_cfg = norm_cfg
        # Pre-norm layout is signalled by 'norm' leading the operation order.
        self.pre_norm = operation_order[0] == 'norm'
        self.attentions = ModuleList()

        index = 0
        for operation_name in operation_order:
            if operation_name in ['self_attn', 'cross_attn']:
                if 'batch_first' in attn_cfgs[index]:
                    assert self.batch_first == attn_cfgs[index]['batch_first']
                else:
                    attn_cfgs[index]['batch_first'] = self.batch_first
                attention = build_attention(attn_cfgs[index])
                # Some custom attentions used as `self_attn`
                # or `cross_attn` can have different behavior.
                attention.operation_name = operation_name
                self.attentions.append(attention)
                index += 1

        self.embed_dims = self.attentions[0].embed_dims

        self.ffns = ModuleList()
        num_ffns = operation_order.count('ffn')
        if isinstance(ffn_cfgs, dict):
            ffn_cfgs = ConfigDict(ffn_cfgs)
        if isinstance(ffn_cfgs, dict):
            ffn_cfgs = [copy.deepcopy(ffn_cfgs) for _ in range(num_ffns)]
        assert len(ffn_cfgs) == num_ffns
        for ffn_index in range(num_ffns):
            if 'embed_dims' not in ffn_cfgs[ffn_index]:
                ffn_cfgs[ffn_index]['embed_dims'] = self.embed_dims
            else:
                assert ffn_cfgs[ffn_index]['embed_dims'] == self.embed_dims
            self.ffns.append(
                build_feedforward_network(ffn_cfgs[ffn_index],
                                          dict(type='FFN')))

        self.norms = ModuleList()
        num_norms = operation_order.count('norm')
        for _ in range(num_norms):
            self.norms.append(build_norm_layer(norm_cfg, self.embed_dims)[1])

    def forward(self,
                query,
                key=None,
                value=None,
                query_pos=None,
                key_pos=None,
                attn_masks=None,
                query_key_padding_mask=None,
                key_padding_mask=None,
                **kwargs):
        """Forward function for `TransformerDecoderLayer`.

        **kwargs contains some specific arguments of attentions.

        Args:
            query (Tensor): The input query with shape
                [num_queries, bs, embed_dims] if
                self.batch_first is False, else
                [bs, num_queries embed_dims].
            key (Tensor): The key tensor with shape [num_keys, bs,
                embed_dims] if self.batch_first is False, else
                [bs, num_keys, embed_dims] .
            value (Tensor): The value tensor with same shape as `key`.
            query_pos (Tensor): The positional encoding for `query`.
                Default: None.
            key_pos (Tensor): The positional encoding for `key`.
                Default: None.
            attn_masks (List[Tensor] | None): 2D Tensor used in
                calculation of corresponding attention. The length of
                it should equal to the number of `attention` in
                `operation_order`. Default: None.
            query_key_padding_mask (Tensor): ByteTensor for `query`, with
                shape [bs, num_queries]. Only used in `self_attn` layer.
                Defaults to None.
            key_padding_mask (Tensor): ByteTensor for `query`, with
                shape [bs, num_keys]. Default: None.

        Returns:
            Tensor: forwarded results with shape [num_queries, bs, embed_dims].
        """

        norm_index = 0
        attn_index = 0
        ffn_index = 0
        identity = query
        if attn_masks is None:
            attn_masks = [None for _ in range(self.num_attn)]
        elif isinstance(attn_masks, torch.Tensor):
            # Broadcast a single mask to every attention op.
            attn_masks = [
                copy.deepcopy(attn_masks) for _ in range(self.num_attn)
            ]
            warnings.warn(f'Use same attn_mask in all attentions in '
                          f'{self.__class__.__name__} ')
        else:
            assert len(attn_masks) == self.num_attn, f'The length of ' \
                f'attn_masks {len(attn_masks)} must be equal ' \
                f'to the number of attention in ' \
                f'operation_order {self.num_attn}'

        for layer in self.operation_order:
            if layer == 'self_attn':
                temp_key = temp_value = query
                query = self.attentions[attn_index](
                    query,
                    temp_key,
                    temp_value,
                    # In pre-norm layout the shortcut must come from the
                    # un-normalized input; otherwise the attention adds its
                    # own identity.
                    identity if self.pre_norm else None,
                    query_pos=query_pos,
                    key_pos=query_pos,
                    attn_mask=attn_masks[attn_index],
                    key_padding_mask=query_key_padding_mask,
                    **kwargs)
                attn_index += 1
                identity = query

            elif layer == 'norm':
                query = self.norms[norm_index](query)
                norm_index += 1

            elif layer == 'cross_attn':
                query = self.attentions[attn_index](
                    query,
                    key,
                    value,
                    identity if self.pre_norm else None,
                    query_pos=query_pos,
                    key_pos=key_pos,
                    attn_mask=attn_masks[attn_index],
                    key_padding_mask=key_padding_mask,
                    **kwargs)
                attn_index += 1
                identity = query

            elif layer == 'ffn':
                query = self.ffns[ffn_index](
                    query, identity if self.pre_norm else None)
                ffn_index += 1

        return query
@MODELS.register_module()
class TransformerLayerSequence(BaseModule):
    """Base class for TransformerEncoder and TransformerDecoder in vision
    transformer.

    As base-class of Encoder and Decoder in vision transformer.
    Support customization such as specifying different kind
    of `transformer_layer` in `transformer_coder`.

    Args:
        transformerlayers (list[obj:`mmcv.ConfigDict`] |
            obj:`mmcv.ConfigDict`): Config of transformerlayer
            in TransformerCoder. If it is obj:`mmcv.ConfigDict`,
            it would be repeated `num_layer` times to a
            list[`mmcv.ConfigDict`]. Default: None.
        num_layers (int): The number of `TransformerLayer`. Default: None.
        init_cfg (obj:`mmcv.ConfigDict`): The Config for initialization.
            Default: None.
    """

    def __init__(self, transformerlayers=None, num_layers=None, init_cfg=None):
        super().__init__(init_cfg)
        if isinstance(transformerlayers, dict):
            # A single config is replicated once per layer.
            transformerlayers = [
                copy.deepcopy(transformerlayers) for _ in range(num_layers)
            ]
        else:
            assert isinstance(transformerlayers, list) and \
                len(transformerlayers) == num_layers
        self.num_layers = num_layers
        self.layers = ModuleList(
            build_transformer_layer(layer_cfg)
            for layer_cfg in transformerlayers)
        # Mirror the attributes of the first layer for outside consumers.
        self.embed_dims = self.layers[0].embed_dims
        self.pre_norm = self.layers[0].pre_norm

    def forward(self,
                query,
                key,
                value,
                query_pos=None,
                key_pos=None,
                attn_masks=None,
                query_key_padding_mask=None,
                key_padding_mask=None,
                **kwargs):
        """Forward function for `TransformerCoder`.

        Args:
            query (Tensor): Input query with shape
                `(num_queries, bs, embed_dims)`.
            key (Tensor): The key tensor with shape
                `(num_keys, bs, embed_dims)`.
            value (Tensor): The value tensor with shape
                `(num_keys, bs, embed_dims)`.
            query_pos (Tensor): The positional encoding for `query`.
                Default: None.
            key_pos (Tensor): The positional encoding for `key`.
                Default: None.
            attn_masks (List[Tensor], optional): Each element is 2D Tensor
                which is used in calculation of corresponding attention in
                operation_order. Default: None.
            query_key_padding_mask (Tensor): ByteTensor for `query`, with
                shape [bs, num_queries]. Only used in self-attention
                Default: None.
            key_padding_mask (Tensor): ByteTensor for `query`, with
                shape [bs, num_keys]. Default: None.

        Returns:
            Tensor: results with shape [num_queries, bs, embed_dims].
        """
        # Thread the query through every layer; key/value stay fixed.
        for transformer_layer in self.layers:
            query = transformer_layer(
                query,
                key,
                value,
                query_pos=query_pos,
                key_pos=key_pos,
                attn_masks=attn_masks,
                query_key_padding_mask=query_key_padding_mask,
                key_padding_mask=key_padding_mask,
                **kwargs)
        return query
MODELS.register_module('nearest', module=nn.Upsample)
MODELS.register_module('bilinear', module=nn.Upsample)


@MODELS.register_module(name='pixel_shuffle')
class PixelShufflePack(nn.Module):
    """Pixel Shuffle upsample layer.

    Packs a ``nn.Conv2d`` that expands the channels by
    ``scale_factor ** 2`` together with ``F.pixel_shuffle`` to achieve a
    simple learned upsampling.

    Args:
        in_channels (int): Number of input channels.
        out_channels (int): Number of output channels.
        scale_factor (int): Upsample ratio.
        upsample_kernel (int): Kernel size of the conv layer to expand the
            channels.
    """

    def __init__(self, in_channels: int, out_channels: int, scale_factor: int,
                 upsample_kernel: int):
        super().__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.scale_factor = scale_factor
        self.upsample_kernel = upsample_kernel
        # "same"-style padding so the conv preserves spatial resolution.
        pad = (self.upsample_kernel - 1) // 2
        self.upsample_conv = nn.Conv2d(
            self.in_channels,
            self.out_channels * scale_factor * scale_factor,
            self.upsample_kernel,
            padding=pad)
        self.init_weights()

    def init_weights(self):
        """Initialize the expanding conv with uniform Xavier weights."""
        xavier_init(self.upsample_conv, distribution='uniform')

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Expand channels, then shuffle them into spatial resolution."""
        return F.pixel_shuffle(self.upsample_conv(x), self.scale_factor)
def build_upsample_layer(cfg: Dict, *args, **kwargs) -> nn.Module:
    """Build upsample layer.

    Args:
        cfg (dict): The upsample layer config, which should contain:

            - type (str): Layer type.
            - scale_factor (int): Upsample ratio, which is not applicable to
              deconv.
            - layer args: Args needed to instantiate a upsample layer.
        args (argument list): Arguments passed to the ``__init__``
            method of the corresponding conv layer.
        kwargs (keyword arguments): Keyword arguments passed to the
            ``__init__`` method of the corresponding conv layer.

    Returns:
        nn.Module: Created upsample layer.

    Raises:
        TypeError: If ``cfg`` is not a dict.
        KeyError: If ``cfg`` has no ``type`` key, or the type is not found
            in the registry.
    """
    if not isinstance(cfg, dict):
        raise TypeError(f'cfg must be a dict, but got {type(cfg)}')
    if 'type' not in cfg:
        raise KeyError(
            f'the cfg dict must contain the key "type", but got {cfg}')
    cfg_ = cfg.copy()

    layer_type = cfg_.pop('type')

    # Switch registry to the target scope. If `upsample` cannot be found
    # in the registry, fallback to search `upsample` in the
    # mmengine.MODELS.
    with MODELS.switch_scope_and_registry(None) as registry:
        upsample = registry.get(layer_type)
    if upsample is None:
        # BUG FIX: report the requested ``layer_type``; the original
        # interpolated ``upsample`` (always None here), producing
        # 'Cannot find None in registry ...'.
        raise KeyError(f'Cannot find {layer_type} in registry under scope '
                       f'name {registry.scope}')
    if upsample is nn.Upsample:
        # 'nearest'/'bilinear' both resolve to nn.Upsample; the registered
        # name doubles as the interpolation mode.
        cfg_['mode'] = layer_type
    layer = upsample(*args, **kwargs, **cfg_)
    return layer
+ with MODELS.switch_scope_and_registry(None) as registry: + upsample = registry.get(layer_type) + if upsample is None: + raise KeyError(f'Cannot find {upsample} in registry under scope ' + f'name {registry.scope}') + if upsample is nn.Upsample: + cfg_['mode'] = layer_type + layer = upsample(*args, **kwargs, **cfg_) + return layer diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/cnn/bricks/wrappers.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/cnn/bricks/wrappers.py new file mode 100644 index 0000000000000000000000000000000000000000..07eb04ee324c713291f834f5020c2943a48d9358 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/cnn/bricks/wrappers.py @@ -0,0 +1,177 @@ +# Copyright (c) OpenMMLab. All rights reserved. +r"""Modified from https://github.com/facebookresearch/detectron2/blob/master/detectron2/layers/wrappers.py # noqa: E501 + +Wrap some nn modules to support empty tensor input. Currently, these wrappers +are mainly used in mask heads like fcn_mask_head and maskiou_heads since mask +heads are trained on only positive RoIs. 
"""
import math

import torch
import torch.nn as nn
from mmengine.registry import MODELS
from torch.nn.modules.utils import _pair, _triple

if torch.__version__ == 'parrots':
    TORCH_VERSION = torch.__version__
else:
    # torch.__version__ could be 1.3.1+cu92, we only need the first two
    # for comparison
    TORCH_VERSION = tuple(int(x) for x in torch.__version__.split('.')[:2])


def obsolete_torch_version(torch_version, version_threshold) -> bool:
    # True for parrots builds, or for any torch whose (major, minor) is at
    # or below ``version_threshold`` -- i.e. versions that may lack native
    # empty-tensor support for the wrapped op.
    return torch_version == 'parrots' or torch_version <= version_threshold


class NewEmptyTensorOp(torch.autograd.Function):
    """Autograd-aware creation of an empty tensor of an arbitrary shape.

    Forward emits ``x.new_empty(new_shape)``; backward returns an empty
    gradient with the original input shape, keeping the graph intact.
    """

    @staticmethod
    def forward(ctx, x: torch.Tensor, new_shape: tuple) -> torch.Tensor:
        ctx.shape = x.shape
        return x.new_empty(new_shape)

    @staticmethod
    def backward(ctx, grad: torch.Tensor) -> tuple:
        shape = ctx.shape
        # Second slot corresponds to the non-differentiable ``new_shape``.
        return NewEmptyTensorOp.apply(grad, shape), None


@MODELS.register_module('Conv', force=True)
class Conv2d(nn.Conv2d):
    """``nn.Conv2d`` that tolerates empty (zero-element) inputs on old torch."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if x.numel() == 0 and obsolete_torch_version(TORCH_VERSION, (1, 4)):
            # Compute the output spatial size with the standard conv formula
            # so the empty output still carries the correct shape.
            out_shape = [x.shape[0], self.out_channels]
            for i, k, p, s, d in zip(x.shape[-2:], self.kernel_size,
                                     self.padding, self.stride, self.dilation):
                o = (i + 2 * p - (d * (k - 1) + 1)) // s + 1
                out_shape.append(o)
            empty = NewEmptyTensorOp.apply(x, out_shape)
            if self.training:
                # produce dummy gradient to avoid DDP warning.
                # (zero-valued sum touches every parameter without changing
                # the output)
                dummy = sum(x.view(-1)[0] for x in self.parameters()) * 0.0
                return empty + dummy
            else:
                return empty

        return super().forward(x)
@MODELS.register_module('Conv3d', force=True)
class Conv3d(nn.Conv3d):
    """``nn.Conv3d`` that also accepts empty inputs on torch <= 1.4."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if x.numel() == 0 and obsolete_torch_version(TORCH_VERSION, (1, 4)):
            # Standard conv output-size arithmetic over the three trailing
            # spatial dims.
            spatial = [(size + 2 * pad - (dil * (k - 1) + 1)) // s + 1
                       for size, k, pad, s, dil in zip(
                           x.shape[-3:], self.kernel_size, self.padding,
                           self.stride, self.dilation)]
            empty = NewEmptyTensorOp.apply(
                x, [x.shape[0], self.out_channels] + spatial)
            if not self.training:
                return empty
            # produce dummy gradient to avoid DDP warning.
            dummy = sum(p.view(-1)[0] for p in self.parameters()) * 0.0
            return empty + dummy

        return super().forward(x)


@MODELS.register_module()
@MODELS.register_module('deconv')
class ConvTranspose2d(nn.ConvTranspose2d):
    """``nn.ConvTranspose2d`` that also accepts empty inputs on torch <= 1.4."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if x.numel() == 0 and obsolete_torch_version(TORCH_VERSION, (1, 4)):
            # Transposed-conv output-size arithmetic per spatial dim.
            spatial = [(size - 1) * s - 2 * pad + (dil * (k - 1) + 1) + extra
                       for size, k, pad, s, dil, extra in zip(
                           x.shape[-2:], self.kernel_size, self.padding,
                           self.stride, self.dilation, self.output_padding)]
            empty = NewEmptyTensorOp.apply(
                x, [x.shape[0], self.out_channels] + spatial)
            if not self.training:
                return empty
            # produce dummy gradient to avoid DDP warning.
            dummy = sum(p.view(-1)[0] for p in self.parameters()) * 0.0
            return empty + dummy

        return super().forward(x)
@MODELS.register_module()
@MODELS.register_module('deconv3d')
class ConvTranspose3d(nn.ConvTranspose3d):
    """``nn.ConvTranspose3d`` that also accepts empty inputs on torch <= 1.4."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if x.numel() == 0 and obsolete_torch_version(TORCH_VERSION, (1, 4)):
            # Transposed-conv output-size arithmetic per spatial dim.
            spatial = [(size - 1) * s - 2 * pad + (dil * (k - 1) + 1) + extra
                       for size, k, pad, s, dil, extra in zip(
                           x.shape[-3:], self.kernel_size, self.padding,
                           self.stride, self.dilation, self.output_padding)]
            empty = NewEmptyTensorOp.apply(
                x, [x.shape[0], self.out_channels] + spatial)
            if not self.training:
                return empty
            # produce dummy gradient to avoid DDP warning.
            dummy = sum(p.view(-1)[0] for p in self.parameters()) * 0.0
            return empty + dummy

        return super().forward(x)


class MaxPool2d(nn.MaxPool2d):
    """``nn.MaxPool2d`` that also accepts empty inputs on torch <= 1.9."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # PyTorch 1.9 does not support empty tensor inference yet
        if x.numel() == 0 and obsolete_torch_version(TORCH_VERSION, (1, 9)):
            rounding = math.ceil if self.ceil_mode else math.floor
            spatial = [
                rounding((size + 2 * pad - (dil * (k - 1) + 1)) / s + 1)
                for size, k, pad, s, dil in zip(
                    x.shape[-2:], _pair(self.kernel_size),
                    _pair(self.padding), _pair(self.stride),
                    _pair(self.dilation))
            ]
            return NewEmptyTensorOp.apply(x, list(x.shape[:2]) + spatial)

        return super().forward(x)


class MaxPool3d(nn.MaxPool3d):
    """``nn.MaxPool3d`` that also accepts empty inputs on torch <= 1.9."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # PyTorch 1.9 does not support empty tensor inference yet
        if x.numel() == 0 and obsolete_torch_version(TORCH_VERSION, (1, 9)):
            rounding = math.ceil if self.ceil_mode else math.floor
            spatial = [
                rounding((size + 2 * pad - (dil * (k - 1) + 1)) / s + 1)
                for size, k, pad, s, dil in zip(
                    x.shape[-3:], _triple(self.kernel_size),
                    _triple(self.padding), _triple(self.stride),
                    _triple(self.dilation))
            ]
            return NewEmptyTensorOp.apply(x, list(x.shape[:2]) + spatial)

        return super().forward(x)
class Linear(torch.nn.Linear):
    """``nn.Linear`` that also accepts empty inputs on torch <= 1.5."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # empty tensor forward of Linear layer is supported in Pytorch 1.6
        if x.numel() == 0 and obsolete_torch_version(TORCH_VERSION, (1, 5)):
            empty = NewEmptyTensorOp.apply(x, [x.shape[0], self.out_features])
            if not self.training:
                return empty
            # produce dummy gradient to avoid DDP warning.
            dummy = sum(p.view(-1)[0] for p in self.parameters()) * 0.0
            return empty + dummy

        return super().forward(x)
# Copyright (c) OpenMMLab. All rights reserved.
import logging
from typing import Optional, Sequence, Tuple, Union

import torch.nn as nn
import torch.utils.checkpoint as cp
from mmengine.model import constant_init, kaiming_init
from mmengine.runner import load_checkpoint
from torch import Tensor


def conv3x3(in_planes: int,
            out_planes: int,
            stride: int = 1,
            dilation: int = 1):
    """3x3 convolution with padding (no bias)."""
    return nn.Conv2d(
        in_planes,
        out_planes,
        kernel_size=3,
        stride=stride,
        padding=dilation,
        dilation=dilation,
        bias=False)


class BasicBlock(nn.Module):
    """Basic residual block: two 3x3 convs plus an identity shortcut."""

    expansion = 1

    def __init__(self,
                 inplanes: int,
                 planes: int,
                 stride: int = 1,
                 dilation: int = 1,
                 downsample: Optional[nn.Module] = None,
                 style: str = 'pytorch',
                 with_cp: bool = False):
        super().__init__()
        assert style in ['pytorch', 'caffe']
        # Gradient checkpointing is not supported by the basic block.
        assert not with_cp
        self.conv1 = conv3x3(inplanes, planes, stride, dilation)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes)
        self.downsample = downsample
        self.stride = stride
        self.dilation = dilation

    def forward(self, x: Tensor) -> Tensor:
        """Run conv-bn-relu-conv-bn, add the shortcut, apply final ReLU."""
        shortcut = x if self.downsample is None else self.downsample(x)

        y = self.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))

        return self.relu(y + shortcut)


class Bottleneck(nn.Module):
    """Bottleneck residual block: 1x1 -> 3x3 -> 1x1 convs, expansion 4.

    If ``style`` is "pytorch", the stride-two layer is the 3x3 conv layer;
    if it is "caffe", the stride-two layer is the first 1x1 conv layer.
    """

    expansion = 4

    def __init__(self,
                 inplanes: int,
                 planes: int,
                 stride: int = 1,
                 dilation: int = 1,
                 downsample: Optional[nn.Module] = None,
                 style: str = 'pytorch',
                 with_cp: bool = False):
        super().__init__()
        assert style in ['pytorch', 'caffe']
        conv1_stride, conv2_stride = (1, stride) if style == 'pytorch' \
            else (stride, 1)
        self.conv1 = nn.Conv2d(
            inplanes, planes, kernel_size=1, stride=conv1_stride, bias=False)
        self.conv2 = nn.Conv2d(
            planes,
            planes,
            kernel_size=3,
            stride=conv2_stride,
            padding=dilation,
            dilation=dilation,
            bias=False)

        self.bn1 = nn.BatchNorm2d(planes)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(
            planes, planes * self.expansion, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride
        self.dilation = dilation
        self.with_cp = with_cp

    def forward(self, x: Tensor) -> Tensor:
        """Apply the bottleneck transform, optionally under checkpointing."""

        def _residual(inp):
            # Pre-activation chain followed by the (possibly projected)
            # shortcut addition; the final ReLU is applied outside so that
            # checkpointing covers only the residual branch.
            y = self.relu(self.bn1(self.conv1(inp)))
            y = self.relu(self.bn2(self.conv2(y)))
            y = self.bn3(self.conv3(y))
            shortcut = inp if self.downsample is None \
                else self.downsample(inp)
            return y + shortcut

        if self.with_cp and x.requires_grad:
            out = cp.checkpoint(_residual, x)
        else:
            out = _residual(x)
        return self.relu(out)
def make_res_layer(block: nn.Module,
                   inplanes: int,
                   planes: int,
                   blocks: int,
                   stride: int = 1,
                   dilation: int = 1,
                   style: str = 'pytorch',
                   with_cp: bool = False) -> nn.Module:
    """Stack ``blocks`` residual blocks into one sequential ResNet stage.

    The first block applies ``stride`` and, when the input/output channel
    counts or the stride differ, a 1x1 conv + BN projection shortcut; all
    remaining blocks run at stride 1 with an identity shortcut.
    """
    downsample = None
    needs_projection = stride != 1 or inplanes != planes * block.expansion
    if needs_projection:
        downsample = nn.Sequential(
            nn.Conv2d(
                inplanes,
                planes * block.expansion,
                kernel_size=1,
                stride=stride,
                bias=False),
            nn.BatchNorm2d(planes * block.expansion),
        )

    stage = [
        block(
            inplanes,
            planes,
            stride,
            dilation,
            downsample,
            style=style,
            with_cp=with_cp)
    ]
    inplanes = planes * block.expansion
    stage.extend(
        block(inplanes, planes, 1, dilation, style=style, with_cp=with_cp)
        for _ in range(1, blocks))

    return nn.Sequential(*stage)
planes, 1, dilation, style=style, with_cp=with_cp)) + + return nn.Sequential(*layers) + + +class ResNet(nn.Module): + """ResNet backbone. + + Args: + depth (int): Depth of resnet, from {18, 34, 50, 101, 152}. + num_stages (int): Resnet stages, normally 4. + strides (Sequence[int]): Strides of the first block of each stage. + dilations (Sequence[int]): Dilation of each stage. + out_indices (Sequence[int]): Output from which stages. + style (str): `pytorch` or `caffe`. If set to "pytorch", the stride-two + layer is the 3x3 conv layer, otherwise the stride-two layer is + the first 1x1 conv layer. + frozen_stages (int): Stages to be frozen (all param fixed). -1 means + not freezing any parameters. + bn_eval (bool): Whether to set BN layers as eval mode, namely, freeze + running stats (mean and var). + bn_frozen (bool): Whether to freeze weight and bias of BN layers. + with_cp (bool): Use checkpoint or not. Using checkpoint will save some + memory while slowing down the training speed. + """ + + arch_settings = { + 18: (BasicBlock, (2, 2, 2, 2)), + 34: (BasicBlock, (3, 4, 6, 3)), + 50: (Bottleneck, (3, 4, 6, 3)), + 101: (Bottleneck, (3, 4, 23, 3)), + 152: (Bottleneck, (3, 8, 36, 3)) + } + + def __init__(self, + depth: int, + num_stages: int = 4, + strides: Sequence[int] = (1, 2, 2, 2), + dilations: Sequence[int] = (1, 1, 1, 1), + out_indices: Sequence[int] = (0, 1, 2, 3), + style: str = 'pytorch', + frozen_stages: int = -1, + bn_eval: bool = True, + bn_frozen: bool = False, + with_cp: bool = False): + super().__init__() + if depth not in self.arch_settings: + raise KeyError(f'invalid depth {depth} for resnet') + assert num_stages >= 1 and num_stages <= 4 + block, stage_blocks = self.arch_settings[depth] + stage_blocks = stage_blocks[:num_stages] # type: ignore + assert len(strides) == len(dilations) == num_stages + assert max(out_indices) < num_stages + + self.out_indices = out_indices + self.style = style + self.frozen_stages = frozen_stages + self.bn_eval = bn_eval + 
self.bn_frozen = bn_frozen + self.with_cp = with_cp + + self.inplanes: int = 64 + self.conv1 = nn.Conv2d( + 3, 64, kernel_size=7, stride=2, padding=3, bias=False) + self.bn1 = nn.BatchNorm2d(64) + self.relu = nn.ReLU(inplace=True) + self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) + + self.res_layers = [] + for i, num_blocks in enumerate(stage_blocks): + stride = strides[i] + dilation = dilations[i] + planes = 64 * 2**i + res_layer = make_res_layer( + block, + self.inplanes, + planes, + num_blocks, + stride=stride, + dilation=dilation, + style=self.style, + with_cp=with_cp) + self.inplanes = planes * block.expansion # type: ignore + layer_name = f'layer{i + 1}' + self.add_module(layer_name, res_layer) + self.res_layers.append(layer_name) + + self.feat_dim = block.expansion * 64 * 2**( # type: ignore + len(stage_blocks) - 1) + + def init_weights(self, pretrained: Optional[str] = None) -> None: + if isinstance(pretrained, str): + logger = logging.getLogger() + load_checkpoint(self, pretrained, strict=False, logger=logger) + elif pretrained is None: + for m in self.modules(): + if isinstance(m, nn.Conv2d): + kaiming_init(m) + elif isinstance(m, nn.BatchNorm2d): + constant_init(m, 1) + else: + raise TypeError('pretrained must be a str or None') + + def forward(self, x: Tensor) -> Union[Tensor, Tuple[Tensor]]: + x = self.conv1(x) + x = self.bn1(x) + x = self.relu(x) + x = self.maxpool(x) + outs = [] + for i, layer_name in enumerate(self.res_layers): + res_layer = getattr(self, layer_name) + x = res_layer(x) + if i in self.out_indices: + outs.append(x) + if len(outs) == 1: + return outs[0] + else: + return tuple(outs) + + def train(self, mode: bool = True) -> None: + super().train(mode) + if self.bn_eval: + for m in self.modules(): + if isinstance(m, nn.BatchNorm2d): + m.eval() + if self.bn_frozen: + for params in m.parameters(): + params.requires_grad = False + if mode and self.frozen_stages >= 0: + for param in self.conv1.parameters(): + 
param.requires_grad = False + for param in self.bn1.parameters(): + param.requires_grad = False + self.bn1.eval() + self.bn1.weight.requires_grad = False + self.bn1.bias.requires_grad = False + for i in range(1, self.frozen_stages + 1): + mod = getattr(self, f'layer{i}') + mod.eval() + for param in mod.parameters(): + param.requires_grad = False diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/cnn/utils/__init__.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/cnn/utils/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..cdec9399f6544a90de6ac4238a60b05b8888c907 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/cnn/utils/__init__.py @@ -0,0 +1,5 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .flops_counter import get_model_complexity_info +from .fuse_conv_bn import fuse_conv_bn + +__all__ = ['get_model_complexity_info', 'fuse_conv_bn'] diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/cnn/utils/flops_counter.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/cnn/utils/flops_counter.py new file mode 100644 index 0000000000000000000000000000000000000000..b09edbcdff063c5a8276bafdd8d69b440539108e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/cnn/utils/flops_counter.py @@ -0,0 +1,604 @@ +# Modified from flops-counter.pytorch by Vladislav Sovrasov +# original repo: https://github.com/sovrasov/flops-counter.pytorch + +# MIT License + +# Copyright (c) 2018 Vladislav Sovrasov + +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: + +# The above copyright notice and this permission notice shall be included in +# all copies or substantial portions of the 
Software. + +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. + +import sys +import warnings +from functools import partial +from typing import Any, Callable, Dict, Optional, TextIO, Tuple + +import numpy as np +import torch +import torch.nn as nn + +from mmcv.cnn.bricks import (Conv2d, Conv3d, ConvTranspose2d, Linear, + MaxPool2d, MaxPool3d) + + +def get_model_complexity_info(model: nn.Module, + input_shape: tuple, + print_per_layer_stat: bool = True, + as_strings: bool = True, + input_constructor: Optional[Callable] = None, + flush: bool = False, + ost: TextIO = sys.stdout) -> tuple: + """Get complexity information of a model. + + This method can calculate FLOPs and parameter counts of a model with + corresponding input shape. It can also print complexity information for + each layer in a model. + + Supported layers are listed as below: + - Convolutions: ``nn.Conv1d``, ``nn.Conv2d``, ``nn.Conv3d``. + - Activations: ``nn.ReLU``, ``nn.PReLU``, ``nn.ELU``, + ``nn.LeakyReLU``, ``nn.ReLU6``. + - Poolings: ``nn.MaxPool1d``, ``nn.MaxPool2d``, ``nn.MaxPool3d``, + ``nn.AvgPool1d``, ``nn.AvgPool2d``, ``nn.AvgPool3d``, + ``nn.AdaptiveMaxPool1d``, ``nn.AdaptiveMaxPool2d``, + ``nn.AdaptiveMaxPool3d``, ``nn.AdaptiveAvgPool1d``, + ``nn.AdaptiveAvgPool2d``, ``nn.AdaptiveAvgPool3d``. + - BatchNorms: ``nn.BatchNorm1d``, ``nn.BatchNorm2d``, + ``nn.BatchNorm3d``, ``nn.GroupNorm``, ``nn.InstanceNorm1d``, + ``InstanceNorm2d``, ``InstanceNorm3d``, ``nn.LayerNorm``. + - Linear: ``nn.Linear``. + - Deconvolution: ``nn.ConvTranspose2d``. 
+ - Upsample: ``nn.Upsample``. + + Args: + model (nn.Module): The model for complexity calculation. + input_shape (tuple): Input shape used for calculation. + print_per_layer_stat (bool): Whether to print complexity information + for each layer in a model. Default: True. + as_strings (bool): Output FLOPs and params counts in a string form. + Default: True. + input_constructor (None | callable): If specified, it takes a callable + method that generates input. otherwise, it will generate a random + tensor with input shape to calculate FLOPs. Default: None. + flush (bool): same as that in :func:`print`. Default: False. + ost (stream): same as ``file`` param in :func:`print`. + Default: sys.stdout. + + Returns: + tuple[float | str]: If ``as_strings`` is set to True, it will return + FLOPs and parameter counts in a string format. otherwise, it will + return those in a float number format. + """ + assert type(input_shape) is tuple + assert len(input_shape) >= 1 + assert isinstance(model, nn.Module) + flops_model = add_flops_counting_methods(model) + flops_model.eval() + flops_model.start_flops_count() + if input_constructor: + input = input_constructor(input_shape) + _ = flops_model(**input) + else: + try: + batch = torch.ones(()).new_empty( + (1, *input_shape), + dtype=next(flops_model.parameters()).dtype, + device=next(flops_model.parameters()).device) + except StopIteration: + # Avoid StopIteration for models which have no parameters, + # like `nn.Relu()`, `nn.AvgPool2d`, etc. 
def flops_to_string(flops: float,
                    units: Optional[str] = 'GFLOPs',
                    precision: int = 2) -> str:
    """Convert a FLOPs count into a human readable string.

    Note that one multiply-add is counted as a single FLOP here.

    Args:
        flops (float): FLOPs number to be converted.
        units (str | None): Converted FLOPs units. Options are None, 'GFLOPs',
            'MFLOPs', 'KFLOPs', 'FLOPs'. If set to None, it will automatically
            choose the most suitable unit for FLOPs. Default: 'GFLOPs'.
        precision (int): Digit number after the decimal point. Default: 2.

    Returns:
        str: The converted FLOPs number with units.

    Examples:
        >>> flops_to_string(1e9)
        '1.0 GFLOPs'
        >>> flops_to_string(2e5, 'MFLOPs')
        '0.2 MFLOPs'
        >>> flops_to_string(3e-9, None)
        '3e-09 FLOPs'
    """
    # Ordered largest-first so auto-selection picks the biggest fitting unit.
    scales = {'GFLOPs': 10.**9, 'MFLOPs': 10.**6, 'KFLOPs': 10.**3}
    if units is None:
        for unit_name, scale in scales.items():
            if flops // scale > 0:
                return f'{round(flops / scale, precision)} {unit_name}'
        return f'{flops} FLOPs'
    if units in scales:
        return f'{round(flops / scales[units], precision)} {units}'
    return f'{flops} FLOPs'


def params_to_string(num_params: float,
                     units: Optional[str] = None,
                     precision: int = 2) -> str:
    """Convert a parameter count into a human readable string.

    Args:
        num_params (float): Parameter number to be converted.
        units (str | None): Converted units. Options are None, 'M', 'K'
            and ''. If set to None, it will automatically choose the most
            suitable unit for the parameter number. Default: None.
        precision (int): Digit number after the decimal point. Default: 2.

    Returns:
        str: The converted parameter number with units.

    Examples:
        >>> params_to_string(1e9)
        '1000.0 M'
        >>> params_to_string(2e5)
        '200.0 k'
        >>> params_to_string(3e-9)
        '3e-09'
    """
    if units is None:
        if num_params // 10**6 > 0:
            return f'{round(num_params / 10**6, precision)} M'
        if num_params // 10**3:
            return f'{round(num_params / 10**3, precision)} k'
        return str(num_params)
    if units == 'M':
        return f'{round(num_params / 10.**6, precision)} {units}'
    if units == 'K':
        return f'{round(num_params / 10.**3, precision)} {units}'
    return str(num_params)
+ + Example: + >>> class ExampleModel(nn.Module): + + >>> def __init__(self): + >>> super().__init__() + >>> self.conv1 = nn.Conv2d(3, 8, 3) + >>> self.conv2 = nn.Conv2d(8, 256, 3) + >>> self.conv3 = nn.Conv2d(256, 8, 3) + >>> self.avg_pool = nn.AdaptiveAvgPool2d((1, 1)) + >>> self.flatten = nn.Flatten() + >>> self.fc = nn.Linear(8, 1) + + >>> def forward(self, x): + >>> x = self.conv1(x) + >>> x = self.conv2(x) + >>> x = self.conv3(x) + >>> x = self.avg_pool(x) + >>> x = self.flatten(x) + >>> x = self.fc(x) + >>> return x + + >>> model = ExampleModel() + >>> x = (3, 16, 16) + to print the complexity information state for each layer, you can use + >>> get_model_complexity_info(model, x) + or directly use + >>> print_model_with_flops(model, 4579784.0, 37361) + ExampleModel( + 0.037 M, 100.000% Params, 0.005 GFLOPs, 100.000% FLOPs, + (conv1): Conv2d(0.0 M, 0.600% Params, 0.0 GFLOPs, 0.959% FLOPs, 3, 8, kernel_size=(3, 3), stride=(1, 1)) # noqa: E501 + (conv2): Conv2d(0.019 M, 50.020% Params, 0.003 GFLOPs, 58.760% FLOPs, 8, 256, kernel_size=(3, 3), stride=(1, 1)) + (conv3): Conv2d(0.018 M, 49.356% Params, 0.002 GFLOPs, 40.264% FLOPs, 256, 8, kernel_size=(3, 3), stride=(1, 1)) + (avg_pool): AdaptiveAvgPool2d(0.0 M, 0.000% Params, 0.0 GFLOPs, 0.017% FLOPs, output_size=(1, 1)) + (flatten): Flatten(0.0 M, 0.000% Params, 0.0 GFLOPs, 0.000% FLOPs, ) + (fc): Linear(0.0 M, 0.024% Params, 0.0 GFLOPs, 0.000% FLOPs, in_features=8, out_features=1, bias=True) + ) + """ + + def accumulate_params(self): + if is_supported_instance(self): + return self.__params__ + else: + sum = 0 + for m in self.children(): + sum += m.accumulate_params() + return sum + + def accumulate_flops(self): + if is_supported_instance(self): + return self.__flops__ / model.__batch_counter__ + else: + sum = 0 + for m in self.children(): + sum += m.accumulate_flops() + return sum + + def flops_repr(self): + accumulated_num_params = self.accumulate_params() + accumulated_flops_cost = self.accumulate_flops() + 
return ', '.join([ + params_to_string( + accumulated_num_params, units='M', precision=precision), + f'{accumulated_num_params / total_params:.3%} Params', + flops_to_string( + accumulated_flops_cost, units=units, precision=precision), + f'{accumulated_flops_cost / total_flops:.3%} FLOPs', + self.original_extra_repr() + ]) + + def add_extra_repr(m): + m.accumulate_flops = accumulate_flops.__get__(m) + m.accumulate_params = accumulate_params.__get__(m) + flops_extra_repr = flops_repr.__get__(m) + if m.extra_repr != flops_extra_repr: + m.original_extra_repr = m.extra_repr + m.extra_repr = flops_extra_repr + assert m.extra_repr != m.original_extra_repr + + def del_extra_repr(m): + if hasattr(m, 'original_extra_repr'): + m.extra_repr = m.original_extra_repr + del m.original_extra_repr + if hasattr(m, 'accumulate_flops'): + del m.accumulate_flops + + model.apply(add_extra_repr) + print(model, file=ost, flush=flush) + model.apply(del_extra_repr) + + +def get_model_parameters_number(model: nn.Module) -> float: + """Calculate parameter number of a model. + + Args: + model (nn.module): The model for parameter number calculation. + + Returns: + float: Parameter number of the model. 
+ """ + num_params = sum(p.numel() for p in model.parameters() if p.requires_grad) + return num_params + + +def add_flops_counting_methods(net_main_module: nn.Module) -> nn.Module: + # adding additional methods to the existing module object, + # this is done this way so that each function has access to self object + net_main_module.start_flops_count = start_flops_count.__get__( # type: ignore # noqa E501 + net_main_module) + net_main_module.stop_flops_count = stop_flops_count.__get__( # type: ignore # noqa E501 + net_main_module) + net_main_module.reset_flops_count = reset_flops_count.__get__( # type: ignore # noqa E501 + net_main_module) + net_main_module.compute_average_flops_cost = compute_average_flops_cost.__get__( # type: ignore # noqa E501 + net_main_module) + + net_main_module.reset_flops_count() + + return net_main_module + + +def compute_average_flops_cost(self) -> Tuple[float, float]: + """Compute average FLOPs cost. + + A method to compute average FLOPs cost, which will be available after + `add_flops_counting_methods()` is called on a desired net object. + + Returns: + float: Current mean flops consumption per image. + """ + batches_count = self.__batch_counter__ + flops_sum = 0 + for module in self.modules(): + if is_supported_instance(module): + flops_sum += module.__flops__ + params_sum = get_model_parameters_number(self) + return flops_sum / batches_count, params_sum + + +def start_flops_count(self) -> None: + """Activate the computation of mean flops consumption per image. + + A method to activate the computation of mean flops consumption per image. + which will be available after ``add_flops_counting_methods()`` is called on + a desired net object. It should be called before running the network. 
+ """ + add_batch_counter_hook_function(self) + + def add_flops_counter_hook_function(module: nn.Module) -> None: + if is_supported_instance(module): + if hasattr(module, '__flops_handle__'): + return + + else: + handle = module.register_forward_hook( + get_modules_mapping()[type(module)]) + + module.__flops_handle__ = handle + + self.apply(partial(add_flops_counter_hook_function)) + + +def stop_flops_count(self) -> None: + """Stop computing the mean flops consumption per image. + + A method to stop computing the mean flops consumption per image, which will + be available after ``add_flops_counting_methods()`` is called on a desired + net object. It can be called to pause the computation whenever. + """ + remove_batch_counter_hook_function(self) + self.apply(remove_flops_counter_hook_function) + + +def reset_flops_count(self) -> None: + """Reset statistics computed so far. + + A method to Reset computed statistics, which will be available after + `add_flops_counting_methods()` is called on a desired net object. 
+ """ + add_batch_counter_variables_or_reset(self) + self.apply(add_flops_counter_variable_or_reset) + + +# ---- Internal functions +def empty_flops_counter_hook(module: nn.Module, input: tuple, + output: Any) -> None: + module.__flops__ += 0 + + +def upsample_flops_counter_hook(module: nn.Module, input: tuple, + output: torch.Tensor) -> None: + output_size = output[0] + batch_size = output_size.shape[0] + output_elements_count = batch_size + for val in output_size.shape[1:]: + output_elements_count *= val + module.__flops__ += int(output_elements_count) + + +def relu_flops_counter_hook(module: nn.Module, input: tuple, + output: torch.Tensor) -> None: + active_elements_count = output.numel() + module.__flops__ += int(active_elements_count) + + +def linear_flops_counter_hook(module: nn.Module, input: tuple, + output: torch.Tensor) -> None: + output_last_dim = output.shape[ + -1] # pytorch checks dimensions, so here we don't care much + module.__flops__ += int(np.prod(input[0].shape) * output_last_dim) + + +def pool_flops_counter_hook(module: nn.Module, input: tuple, + output: torch.Tensor) -> None: + module.__flops__ += int(np.prod(input[0].shape)) + + +def norm_flops_counter_hook(module: nn.Module, input: tuple, + output: torch.Tensor) -> None: + batch_flops = np.prod(input[0].shape) + if (getattr(module, 'affine', False) + or getattr(module, 'elementwise_affine', False)): + batch_flops *= 2 + module.__flops__ += int(batch_flops) + + +def deconv_flops_counter_hook(conv_module: nn.Module, input: tuple, + output: torch.Tensor) -> None: + # Can have multiple inputs, getting the first one + batch_size = input[0].shape[0] + input_height, input_width = input[0].shape[2:] + + kernel_height, kernel_width = conv_module.kernel_size + in_channels = conv_module.in_channels + out_channels = conv_module.out_channels + groups = conv_module.groups + + filters_per_channel = out_channels // groups + conv_per_position_flops = ( + kernel_height * kernel_width * in_channels * 
filters_per_channel) + + active_elements_count = batch_size * input_height * input_width + overall_conv_flops = conv_per_position_flops * active_elements_count + bias_flops = 0 + if conv_module.bias is not None: + output_height, output_width = output.shape[2:] + bias_flops = out_channels * batch_size * output_height * output_width + overall_flops = overall_conv_flops + bias_flops + + conv_module.__flops__ += int(overall_flops) + + +def conv_flops_counter_hook(conv_module: nn.Module, input: tuple, + output: torch.Tensor) -> None: + # Can have multiple inputs, getting the first one + batch_size = input[0].shape[0] + output_dims = list(output.shape[2:]) + + kernel_dims = list(conv_module.kernel_size) + in_channels = conv_module.in_channels + out_channels = conv_module.out_channels + groups = conv_module.groups + + filters_per_channel = out_channels // groups + conv_per_position_flops = int( + np.prod(kernel_dims)) * in_channels * filters_per_channel + + active_elements_count = batch_size * int(np.prod(output_dims)) + + overall_conv_flops = conv_per_position_flops * active_elements_count + + bias_flops = 0 + + if conv_module.bias is not None: + + bias_flops = out_channels * active_elements_count + + overall_flops = overall_conv_flops + bias_flops + + conv_module.__flops__ += int(overall_flops) + + +def batch_counter_hook(module: nn.Module, input: tuple, output: Any) -> None: + batch_size = 1 + if len(input) > 0: + # Can have multiple inputs, getting the first one + batch_size = len(input[0]) + else: + warnings.warn('No positional inputs found for a module, ' + 'assuming batch size is 1.') + module.__batch_counter__ += batch_size + + +def add_batch_counter_variables_or_reset(module: nn.Module) -> None: + + module.__batch_counter__ = 0 + + +def add_batch_counter_hook_function(module: nn.Module) -> None: + if hasattr(module, '__batch_counter_handle__'): + return + + handle = module.register_forward_hook(batch_counter_hook) + module.__batch_counter_handle__ = handle + + 
+def remove_batch_counter_hook_function(module: nn.Module) -> None: + if hasattr(module, '__batch_counter_handle__'): + module.__batch_counter_handle__.remove() + del module.__batch_counter_handle__ + + +def add_flops_counter_variable_or_reset(module: nn.Module) -> None: + if is_supported_instance(module): + if hasattr(module, '__flops__') or hasattr(module, '__params__'): + warnings.warn('variables __flops__ or __params__ are already ' + 'defined for the module' + type(module).__name__ + + ' ptflops can affect your code!') + module.__flops__ = 0 + module.__params__ = get_model_parameters_number(module) + + +def is_supported_instance(module: nn.Module) -> bool: + if type(module) in get_modules_mapping(): + return True + return False + + +def remove_flops_counter_hook_function(module: nn.Module) -> None: + if is_supported_instance(module): + if hasattr(module, '__flops_handle__'): + module.__flops_handle__.remove() + del module.__flops_handle__ + + +def get_modules_mapping() -> Dict: + return { + # convolutions + nn.Conv1d: conv_flops_counter_hook, + nn.Conv2d: conv_flops_counter_hook, + Conv2d: conv_flops_counter_hook, + nn.Conv3d: conv_flops_counter_hook, + Conv3d: conv_flops_counter_hook, + # activations + nn.ReLU: relu_flops_counter_hook, + nn.PReLU: relu_flops_counter_hook, + nn.ELU: relu_flops_counter_hook, + nn.LeakyReLU: relu_flops_counter_hook, + nn.ReLU6: relu_flops_counter_hook, + # poolings + nn.MaxPool1d: pool_flops_counter_hook, + nn.AvgPool1d: pool_flops_counter_hook, + nn.AvgPool2d: pool_flops_counter_hook, + nn.MaxPool2d: pool_flops_counter_hook, + MaxPool2d: pool_flops_counter_hook, + nn.MaxPool3d: pool_flops_counter_hook, + MaxPool3d: pool_flops_counter_hook, + nn.AvgPool3d: pool_flops_counter_hook, + nn.AdaptiveMaxPool1d: pool_flops_counter_hook, + nn.AdaptiveAvgPool1d: pool_flops_counter_hook, + nn.AdaptiveMaxPool2d: pool_flops_counter_hook, + nn.AdaptiveAvgPool2d: pool_flops_counter_hook, + nn.AdaptiveMaxPool3d: pool_flops_counter_hook, + 
def _fuse_conv_bn(conv: nn.Module, bn: nn.Module) -> nn.Module:
    """Fold the affine transform of ``bn`` into ``conv`` in place.

    Args:
        conv (nn.Module): Conv layer to absorb the BN parameters.
        bn (nn.Module): BN layer whose running stats and affine parameters
            are folded into ``conv``.

    Returns:
        nn.Module: The (mutated) conv layer.
    """
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    bias = conv.bias if conv.bias is not None else torch.zeros_like(
        bn.running_mean)

    conv.weight = nn.Parameter(
        conv.weight * scale.reshape([conv.out_channels, 1, 1, 1]))
    conv.bias = nn.Parameter((bias - bn.running_mean) * scale + bn.bias)
    return conv


def fuse_conv_bn(module: nn.Module) -> nn.Module:
    """Recursively fuse conv and bn pairs in a module.

    During inference a BN layer only applies a fixed per-channel affine
    transform (from its running mean/var), so it can be folded into the
    preceding conv layer to save computation and simplify the network.

    Args:
        module (nn.Module): Module to be fused in place.

    Returns:
        nn.Module: The fused module.
    """
    pending_conv = None
    pending_name = None

    for name, child in module.named_children():
        if isinstance(child,
                      (nn.modules.batchnorm._BatchNorm, nn.SyncBatchNorm)):
            if pending_conv is None:
                # Only fuse a BN that directly follows a conv.
                continue
            module._modules[pending_name] = _fuse_conv_bn(pending_conv, child)
            # Keep the module tree shape: the BN becomes a no-op instead of
            # being deleted.
            module._modules[name] = nn.Identity()
            pending_conv = None
        elif isinstance(child, nn.Conv2d):
            pending_conv = child
            pending_name = name
        else:
            fuse_conv_bn(child)
    return module
def conv3x3(in_planes: int, out_planes: int, dilation: int = 1) -> nn.Module:
    """3x3 convolution whose padding equals its dilation (keeps H and W)."""
    return nn.Conv2d(
        in_planes,
        out_planes,
        kernel_size=3,
        padding=dilation,
        dilation=dilation)


def make_vgg_layer(inplanes: int,
                   planes: int,
                   num_blocks: int,
                   dilation: int = 1,
                   with_bn: bool = False,
                   ceil_mode: bool = False) -> List[nn.Module]:
    """Build one VGG stage: ``num_blocks`` conv(+BN)+ReLU units then a pool."""
    layers: List[nn.Module] = []
    in_channels = inplanes
    for _ in range(num_blocks):
        layers.append(conv3x3(in_channels, planes, dilation))
        if with_bn:
            layers.append(nn.BatchNorm2d(planes))
        layers.append(nn.ReLU(inplace=True))
        in_channels = planes
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=ceil_mode))
    return layers


class VGG(nn.Module):
    """VGG backbone.

    Args:
        depth (int): Depth of vgg, from {11, 13, 16, 19}.
        with_bn (bool): Use BatchNorm or not.
        num_classes (int): Number of classes for classification; <= 0
            disables the classifier head.
        num_stages (int): VGG stages, normally 5.
        dilations (Sequence[int]): Dilation of each stage.
        out_indices (Sequence[int]): Output from which stages.
        frozen_stages (int): Stages to be frozen (all param fixed). -1 means
            not freezing any parameters.
        bn_eval (bool): Keep BN layers in eval mode, i.e. freeze running
            mean/var statistics.
        bn_frozen (bool): Additionally freeze BN weight and bias.
        ceil_mode (bool): Use ceil_mode in the pooling layers.
        with_last_pool (bool): Keep the pooling layer of the last stage.
    """

    arch_settings = {
        11: (1, 1, 2, 2, 2),
        13: (2, 2, 2, 2, 2),
        16: (2, 2, 3, 3, 3),
        19: (2, 2, 4, 4, 4)
    }

    def __init__(self,
                 depth: int,
                 with_bn: bool = False,
                 num_classes: int = -1,
                 num_stages: int = 5,
                 dilations: Sequence[int] = (1, 1, 1, 1, 1),
                 out_indices: Sequence[int] = (0, 1, 2, 3, 4),
                 frozen_stages: int = -1,
                 bn_eval: bool = True,
                 bn_frozen: bool = False,
                 ceil_mode: bool = False,
                 with_last_pool: bool = True):
        super().__init__()
        if depth not in self.arch_settings:
            raise KeyError(f'invalid depth {depth} for vgg')
        assert 1 <= num_stages <= 5
        stage_blocks = self.arch_settings[depth]
        self.stage_blocks = stage_blocks[:num_stages]
        assert len(dilations) == num_stages
        # NOTE(review): ResNet uses ``max(out_indices) < num_stages``; the
        # ``<=`` here is kept as-is for compatibility — an index equal to
        # num_stages is simply never produced by forward().
        assert max(out_indices) <= num_stages

        self.num_classes = num_classes
        self.out_indices = out_indices
        self.frozen_stages = frozen_stages
        self.bn_eval = bn_eval
        self.bn_frozen = bn_frozen

        self.inplanes = 3
        modules_per_block = 3 if with_bn else 2  # conv (+ BN) + ReLU
        vgg_layers = []
        # Half-open [start, end) index ranges of each stage inside `features`.
        self.range_sub_modules = []
        start_idx = 0
        for i, num_blocks in enumerate(self.stage_blocks):
            end_idx = start_idx + num_blocks * modules_per_block + 1
            planes = 64 * 2**i if i < 4 else 512  # channel width caps at 512
            vgg_layers.extend(
                make_vgg_layer(
                    self.inplanes,
                    planes,
                    num_blocks,
                    dilation=dilations[i],
                    with_bn=with_bn,
                    ceil_mode=ceil_mode))
            self.inplanes = planes
            self.range_sub_modules.append([start_idx, end_idx])
            start_idx = end_idx
        if not with_last_pool:
            vgg_layers.pop(-1)
            self.range_sub_modules[-1][1] -= 1
        self.module_name = 'features'
        self.add_module(self.module_name, nn.Sequential(*vgg_layers))

        if self.num_classes > 0:
            self.classifier = nn.Sequential(
                nn.Linear(512 * 7 * 7, 4096),
                nn.ReLU(True),
                nn.Dropout(),
                nn.Linear(4096, 4096),
                nn.ReLU(True),
                nn.Dropout(),
                nn.Linear(4096, num_classes),
            )

    def init_weights(self, pretrained: Optional[str] = None) -> None:
        """Load ``pretrained`` weights, or initialize from scratch.

        Raises:
            TypeError: If ``pretrained`` is neither a path string nor None.
        """
        if isinstance(pretrained, str):
            load_checkpoint(
                self, pretrained, strict=False, logger=logging.getLogger())
        elif pretrained is None:
            for m in self.modules():
                if isinstance(m, nn.Conv2d):
                    kaiming_init(m)
                elif isinstance(m, nn.BatchNorm2d):
                    constant_init(m, 1)
                elif isinstance(m, nn.Linear):
                    normal_init(m, std=0.01)
        else:
            raise TypeError('pretrained must be a str or None')

    def forward(self, x: Tensor) -> Union[Tensor, Tuple[Tensor, ...]]:
        """Return per-stage features and, if enabled, classifier logits."""
        outs = []
        features = getattr(self, self.module_name)
        for i, (begin, end) in enumerate(self.range_sub_modules):
            for j in range(begin, end):
                x = features[j](x)
            if i in self.out_indices:
                outs.append(x)
        if self.num_classes > 0:
            x = x.view(x.size(0), -1)
            x = self.classifier(x)
            outs.append(x)
        return outs[0] if len(outs) == 1 else tuple(outs)

    def train(self, mode: bool = True) -> None:
        """Switch train/eval mode, honoring bn_eval/bn_frozen/frozen_stages."""
        super().train(mode)
        if self.bn_eval:
            for m in self.modules():
                if isinstance(m, nn.BatchNorm2d):
                    m.eval()  # keep running stats fixed
                    if self.bn_frozen:
                        for p in m.parameters():
                            p.requires_grad = False
        if mode and self.frozen_stages >= 0:
            features = getattr(self, self.module_name)
            for i in range(self.frozen_stages):
                for j in range(*self.range_sub_modules[i]):
                    sub = features[j]
                    sub.eval()
                    for p in sub.parameters():
                        p.requires_grad = False
+from .colorspace import (bgr2gray, bgr2hls, bgr2hsv, bgr2rgb, bgr2ycbcr, + gray2bgr, gray2rgb, hls2bgr, hsv2bgr, imconvert, + rgb2bgr, rgb2gray, rgb2ycbcr, ycbcr2bgr, ycbcr2rgb) +from .geometric import (cutout, imcrop, imflip, imflip_, impad, + impad_to_multiple, imrescale, imresize, imresize_like, + imresize_to_multiple, imrotate, imshear, imtranslate, + rescale_size) +from .io import imfrombytes, imread, imwrite, supported_backends, use_backend +from .misc import tensor2imgs +from .photometric import (adjust_brightness, adjust_color, adjust_contrast, + adjust_hue, adjust_lighting, adjust_sharpness, + auto_contrast, clahe, imdenormalize, imequalize, + iminvert, imnormalize, imnormalize_, lut_transform, + posterize, solarize) + +__all__ = [ + 'bgr2gray', 'bgr2hls', 'bgr2hsv', 'bgr2rgb', 'gray2bgr', 'gray2rgb', + 'hls2bgr', 'hsv2bgr', 'imconvert', 'rgb2bgr', 'rgb2gray', 'imrescale', + 'imresize', 'imresize_like', 'imresize_to_multiple', 'rescale_size', + 'imcrop', 'imflip', 'imflip_', 'impad', 'impad_to_multiple', 'imrotate', + 'imfrombytes', 'imread', 'imwrite', 'supported_backends', 'use_backend', + 'imdenormalize', 'imnormalize', 'imnormalize_', 'iminvert', 'posterize', + 'solarize', 'rgb2ycbcr', 'bgr2ycbcr', 'ycbcr2rgb', 'ycbcr2bgr', + 'tensor2imgs', 'imshear', 'imtranslate', 'adjust_color', 'imequalize', + 'adjust_brightness', 'adjust_contrast', 'lut_transform', 'clahe', + 'adjust_sharpness', 'auto_contrast', 'cutout', 'adjust_lighting', + 'adjust_hue' +] diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/image/colorspace.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/image/colorspace.py new file mode 100644 index 0000000000000000000000000000000000000000..08f9952408c8e0bb38b17c10e2089e900ed418c2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/image/colorspace.py @@ -0,0 +1,309 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
from typing import Callable, Union

import cv2
import numpy as np


def imconvert(img: np.ndarray, src: str, dst: str) -> np.ndarray:
    """Convert an image from the src colorspace to dst colorspace.

    Args:
        img (ndarray): The input image.
        src (str): The source colorspace, e.g., 'rgb', 'hsv'.
        dst (str): The destination colorspace, e.g., 'rgb', 'hsv'.

    Returns:
        ndarray: The converted image.
    """
    # Resolve the OpenCV conversion flag by name, e.g. cv2.COLOR_RGB2HSV.
    code = getattr(cv2, f'COLOR_{src.upper()}2{dst.upper()}')
    out_img = cv2.cvtColor(img, code)
    return out_img


def bgr2gray(img: np.ndarray, keepdim: bool = False) -> np.ndarray:
    """Convert a BGR image to grayscale image.

    Args:
        img (ndarray): The input image.
        keepdim (bool): If False (by default), then return the grayscale image
            with 2 dims, otherwise 3 dims.

    Returns:
        ndarray: The converted grayscale image.
    """
    out_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    if keepdim:
        # Re-append a trailing channel axis of size 1.
        out_img = out_img[..., None]
    return out_img


def rgb2gray(img: np.ndarray, keepdim: bool = False) -> np.ndarray:
    """Convert a RGB image to grayscale image.

    Args:
        img (ndarray): The input image.
        keepdim (bool): If False (by default), then return the grayscale image
            with 2 dims, otherwise 3 dims.

    Returns:
        ndarray: The converted grayscale image.
    """
    out_img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    if keepdim:
        out_img = out_img[..., None]
    return out_img


def gray2bgr(img: np.ndarray) -> np.ndarray:
    """Convert a grayscale image to BGR image.

    Args:
        img (ndarray): The input image.

    Returns:
        ndarray: The converted BGR image.
    """
    # cvtColor requires an explicit channel axis for single-channel input.
    img = img[..., None] if img.ndim == 2 else img
    out_img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
    return out_img


def gray2rgb(img: np.ndarray) -> np.ndarray:
    """Convert a grayscale image to RGB image.

    Args:
        img (ndarray): The input image.

    Returns:
        ndarray: The converted RGB image.
    """
    img = img[..., None] if img.ndim == 2 else img
    out_img = cv2.cvtColor(img, cv2.COLOR_GRAY2RGB)
    return out_img


def _convert_input_type_range(img: np.ndarray) -> np.ndarray:
    """Convert the type and range of the input image.

    It converts the input image to np.float32 type and range of [0, 1].
    It is mainly used for pre-processing the input image in colorspace
    conversion functions such as rgb2ycbcr and ycbcr2rgb.

    Args:
        img (ndarray): The input image. It accepts:
            1. np.uint8 type with range [0, 255];
            2. np.float32 type with range [0, 1].

    Returns:
        (ndarray): The converted image with type of np.float32 and range of
            [0, 1].
    """
    img_type = img.dtype
    img = img.astype(np.float32)
    if img_type == np.float32:
        pass
    elif img_type == np.uint8:
        # uint8 input is assumed to be in [0, 255]; normalize to [0, 1].
        img /= 255.
    else:
        raise TypeError('The img type should be np.float32 or np.uint8, '
                        f'but got {img_type}')
    return img


def _convert_output_type_range(
        img: np.ndarray, dst_type: Union[np.uint8, np.float32]) -> np.ndarray:
    """Convert the type and range of the image according to dst_type.

    It converts the image to desired type and range. If `dst_type` is np.uint8,
    images will be converted to np.uint8 type with range [0, 255]. If
    `dst_type` is np.float32, it converts the image to np.float32 type with
    range [0, 1].
    It is mainly used for post-processing images in colorspace conversion
    functions such as rgb2ycbcr and ycbcr2rgb.

    Args:
        img (ndarray): The image to be converted with np.float32 type and
            range [0, 255].
        dst_type (np.uint8 | np.float32): If dst_type is np.uint8, it
            converts the image to np.uint8 type with range [0, 255]. If
            dst_type is np.float32, it converts the image to np.float32 type
            with range [0, 1].

    Returns:
        (ndarray): The converted image with desired type and range.
    """
    if dst_type not in (np.uint8, np.float32):
        raise TypeError('The dst_type should be np.float32 or np.uint8, '
                        f'but got {dst_type}')
    if dst_type == np.uint8:
        # Round before the uint8 cast; the cast itself truncates.
        img = img.round()
    else:
        img /= 255.
    return img.astype(dst_type)


def rgb2ycbcr(img: np.ndarray, y_only: bool = False) -> np.ndarray:
    """Convert a RGB image to YCbCr image.

    This function produces the same results as Matlab's `rgb2ycbcr` function.
    It implements the ITU-R BT.601 conversion for standard-definition
    television. See more details in
    https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion.

    It differs from a similar function in cv2.cvtColor: `RGB <-> YCrCb`.
    In OpenCV, it implements a JPEG conversion. See more details in
    https://en.wikipedia.org/wiki/YCbCr#JPEG_conversion.

    Args:
        img (ndarray): The input image. It accepts:
            1. np.uint8 type with range [0, 255];
            2. np.float32 type with range [0, 1].
        y_only (bool): Whether to only return Y channel. Default: False.

    Returns:
        ndarray: The converted YCbCr image. The output image has the same type
            and range as input image.
    """
    img_type = img.dtype
    img = _convert_input_type_range(img)
    # Coefficients are the BT.601 full-swing matrix scaled by 255 so the
    # intermediate result lives in [0, 255] regardless of input type.
    if y_only:
        out_img = np.dot(img, [65.481, 128.553, 24.966]) + 16.0
    else:
        out_img = np.matmul(
            img, [[65.481, -37.797, 112.0], [128.553, -74.203, -93.786],
                  [24.966, 112.0, -18.214]]) + [16, 128, 128]
    out_img = _convert_output_type_range(out_img, img_type)
    return out_img


def bgr2ycbcr(img: np.ndarray, y_only: bool = False) -> np.ndarray:
    """Convert a BGR image to YCbCr image.

    The bgr version of rgb2ycbcr.
    It implements the ITU-R BT.601 conversion for standard-definition
    television. See more details in
    https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion.

    It differs from a similar function in cv2.cvtColor: `BGR <-> YCrCb`.
    In OpenCV, it implements a JPEG conversion. See more details in
    https://en.wikipedia.org/wiki/YCbCr#JPEG_conversion.

    Args:
        img (ndarray): The input image. It accepts:
            1. np.uint8 type with range [0, 255];
            2. np.float32 type with range [0, 1].
        y_only (bool): Whether to only return Y channel. Default: False.

    Returns:
        ndarray: The converted YCbCr image. The output image has the same type
            and range as input image.
    """
    img_type = img.dtype
    img = _convert_input_type_range(img)
    # Same matrix as rgb2ycbcr with the rows reordered for B,G,R input.
    if y_only:
        out_img = np.dot(img, [24.966, 128.553, 65.481]) + 16.0
    else:
        out_img = np.matmul(
            img, [[24.966, 112.0, -18.214], [128.553, -74.203, -93.786],
                  [65.481, -37.797, 112.0]]) + [16, 128, 128]
    out_img = _convert_output_type_range(out_img, img_type)
    return out_img


def ycbcr2rgb(img: np.ndarray) -> np.ndarray:
    """Convert a YCbCr image to RGB image.

    This function produces the same results as Matlab's ycbcr2rgb function.
    It implements the ITU-R BT.601 conversion for standard-definition
    television. See more details in
    https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion.

    It differs from a similar function in cv2.cvtColor: `YCrCb <-> RGB`.
    In OpenCV, it implements a JPEG conversion. See more details in
    https://en.wikipedia.org/wiki/YCbCr#JPEG_conversion.

    Args:
        img (ndarray): The input image. It accepts:
            1. np.uint8 type with range [0, 255];
            2. np.float32 type with range [0, 1].

    Returns:
        ndarray: The converted RGB image. The output image has the same type
            and range as input image.
    """
    img_type = img.dtype
    # Bring the image to [0, 255] before applying the inverse transform.
    img = _convert_input_type_range(img) * 255
    # Inverse of the rgb2ycbcr matrix with the [16, 128, 128] offsets
    # folded into the constant vector.
    out_img = np.matmul(img, [[0.00456621, 0.00456621, 0.00456621],
                              [0, -0.00153632, 0.00791071],
                              [0.00625893, -0.00318811, 0]]) * 255.0 + [
                                  -222.921, 135.576, -276.836
                              ]
    out_img = _convert_output_type_range(out_img, img_type)
    return out_img


def ycbcr2bgr(img: np.ndarray) -> np.ndarray:
    """Convert a YCbCr image to BGR image.

    The bgr version of ycbcr2rgb.
    It implements the ITU-R BT.601 conversion for standard-definition
    television. See more details in
    https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion.

    It differs from a similar function in cv2.cvtColor: `YCrCb <-> BGR`.
    In OpenCV, it implements a JPEG conversion. See more details in
    https://en.wikipedia.org/wiki/YCbCr#JPEG_conversion.

    Args:
        img (ndarray): The input image. It accepts:
            1. np.uint8 type with range [0, 255];
            2. np.float32 type with range [0, 1].

    Returns:
        ndarray: The converted BGR image. The output image has the same type
            and range as input image.
    """
    img_type = img.dtype
    img = _convert_input_type_range(img) * 255
    # ycbcr2rgb's matrix/offsets with output columns swapped to B,G,R.
    out_img = np.matmul(img, [[0.00456621, 0.00456621, 0.00456621],
                              [0.00791071, -0.00153632, 0],
                              [0, -0.00318811, 0.00625893]]) * 255.0 + [
                                  -276.836, 135.576, -222.921
                              ]
    out_img = _convert_output_type_range(out_img, img_type)
    return out_img


def convert_color_factory(src: str, dst: str) -> Callable:
    """Build a ``src``->``dst`` color conversion function around cv2."""

    code = getattr(cv2, f'COLOR_{src.upper()}2{dst.upper()}')

    def convert_color(img: np.ndarray) -> np.ndarray:
        out_img = cv2.cvtColor(img, code)
        return out_img

    convert_color.__doc__ = f"""Convert a {src.upper()} image to {dst.upper()}
        image.

    Args:
        img (ndarray or str): The input image.

    Returns:
        ndarray: The converted {dst.upper()} image.
    """

    return convert_color


bgr2rgb = convert_color_factory('bgr', 'rgb')

rgb2bgr = convert_color_factory('rgb', 'bgr')

bgr2hsv = convert_color_factory('bgr', 'hsv')

hsv2bgr = convert_color_factory('hsv', 'bgr')

bgr2hls = convert_color_factory('bgr', 'hls')

hls2bgr = convert_color_factory('hls', 'bgr')
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/image/geometric.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/image/geometric.py
new file mode 100644
index 0000000000000000000000000000000000000000..88fb63693a95ad04bd745f9e935af1e48e731f4e
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/image/geometric.py
@@ -0,0 +1,786 @@
+# Copyright (c) OpenMMLab. All rights reserved.
import numbers
from typing import List, Optional, Tuple, Union, no_type_check

import cv2
import numpy as np
from mmengine.utils import to_2tuple

from .io import imread_backend

# Pillow is optional; code below checks `Image is not None` before use.
try:
    from PIL import Image
except ImportError:
    Image = None


def _scale_size(
    size: Tuple[int, int],
    scale: Union[float, int, tuple],
) -> Tuple[int, int]:
    """Rescale a size by a ratio.

    Args:
        size (tuple[int]): (w, h).
        scale (float | tuple(float)): Scaling factor.

    Returns:
        tuple[int]: scaled size.
    """
    if isinstance(scale, (float, int)):
        # A scalar scale applies to both width and height.
        scale = (scale, scale)
    w, h = size
    # `+ 0.5` rounds half-up before int() truncation.
    return int(w * float(scale[0]) + 0.5), int(h * float(scale[1]) + 0.5)


# Interpolation-name -> OpenCV flag used throughout this module.
cv2_interp_codes = {
    'nearest': cv2.INTER_NEAREST,
    'bilinear': cv2.INTER_LINEAR,
    'bicubic': cv2.INTER_CUBIC,
    'area': cv2.INTER_AREA,
    'lanczos': cv2.INTER_LANCZOS4
}

# Border-mode name -> OpenCV flag for warp/pad operations.
cv2_border_modes = {
    'constant': cv2.BORDER_CONSTANT,
    'replicate': cv2.BORDER_REPLICATE,
    'reflect': cv2.BORDER_REFLECT,
    'wrap': cv2.BORDER_WRAP,
    'reflect_101': cv2.BORDER_REFLECT_101,
    'transparent': cv2.BORDER_TRANSPARENT,
    'isolated': cv2.BORDER_ISOLATED
}

# Pillow >=v9.1.0 use a slightly different naming scheme for filters.
# Set pillow_interp_codes according to the naming scheme used.
if Image is not None:
    if hasattr(Image, 'Resampling'):
        # Pillow >= 9.1 namespaces the filters under Image.Resampling.
        pillow_interp_codes = {
            'nearest': Image.Resampling.NEAREST,
            'bilinear': Image.Resampling.BILINEAR,
            'bicubic': Image.Resampling.BICUBIC,
            'box': Image.Resampling.BOX,
            'lanczos': Image.Resampling.LANCZOS,
            'hamming': Image.Resampling.HAMMING
        }
    else:
        pillow_interp_codes = {
            'nearest': Image.NEAREST,
            'bilinear': Image.BILINEAR,
            'bicubic': Image.BICUBIC,
            'box': Image.BOX,
            'lanczos': Image.LANCZOS,
            'hamming': Image.HAMMING
        }


def imresize(
    img: np.ndarray,
    size: Tuple[int, int],
    return_scale: bool = False,
    interpolation: str = 'bilinear',
    out: Optional[np.ndarray] = None,
    backend: Optional[str] = None
) -> Union[Tuple[np.ndarray, float, float], np.ndarray]:
    """Resize image to a given size.

    Args:
        img (ndarray): The input image.
        size (tuple[int]): Target size (w, h).
        return_scale (bool): Whether to return `w_scale` and `h_scale`.
        interpolation (str): Interpolation method, accepted values are
            "nearest", "bilinear", "bicubic", "area", "lanczos" for 'cv2'
            backend, "nearest", "bilinear" for 'pillow' backend.
        out (ndarray): The output destination.
        backend (str | None): The image resize backend type. Options are `cv2`,
            `pillow`, `None`. If backend is None, the global imread_backend
            specified by ``mmcv.use_backend()`` will be used. Default: None.

    Returns:
        tuple | ndarray: (`resized_img`, `w_scale`, `h_scale`) or
        `resized_img`.
    """
    h, w = img.shape[:2]
    if backend is None:
        backend = imread_backend
    if backend not in ['cv2', 'pillow']:
        raise ValueError(f'backend: {backend} is not supported for resize.'
                         f"Supported backends are 'cv2', 'pillow'")

    if backend == 'pillow':
        assert img.dtype == np.uint8, 'Pillow backend only support uint8 type'
        pil_image = Image.fromarray(img)
        pil_image = pil_image.resize(size, pillow_interp_codes[interpolation])
        resized_img = np.array(pil_image)
    else:
        resized_img = cv2.resize(
            img, size, dst=out, interpolation=cv2_interp_codes[interpolation])
    if not return_scale:
        return resized_img
    else:
        # Scales are computed from the original image, not rounded back.
        w_scale = size[0] / w
        h_scale = size[1] / h
        return resized_img, w_scale, h_scale


@no_type_check
def imresize_to_multiple(
    img: np.ndarray,
    divisor: Union[int, Tuple[int, int]],
    size: Union[int, Tuple[int, int], None] = None,
    scale_factor: Union[float, Tuple[float, float], None] = None,
    keep_ratio: bool = False,
    return_scale: bool = False,
    interpolation: str = 'bilinear',
    out: Optional[np.ndarray] = None,
    backend: Optional[str] = None
) -> Union[Tuple[np.ndarray, float, float], np.ndarray]:
    """Resize image according to a given size or scale factor and then rounds
    up the the resized or rescaled image size to the nearest value that can be
    divided by the divisor.

    Args:
        img (ndarray): The input image.
        divisor (int | tuple): Resized image size will be a multiple of
            divisor. If divisor is a tuple, divisor should be
            (w_divisor, h_divisor).
        size (None | int | tuple[int]): Target size (w, h). Default: None.
        scale_factor (None | float | tuple[float]): Multiplier for spatial
            size. Should match input size if it is a tuple and the 2D style is
            (w_scale_factor, h_scale_factor). Default: None.
        keep_ratio (bool): Whether to keep the aspect ratio when resizing the
            image. Default: False.
        return_scale (bool): Whether to return `w_scale` and `h_scale`.
        interpolation (str): Interpolation method, accepted values are
            "nearest", "bilinear", "bicubic", "area", "lanczos" for 'cv2'
            backend, "nearest", "bilinear" for 'pillow' backend.
        out (ndarray): The output destination.
        backend (str | None): The image resize backend type. Options are `cv2`,
            `pillow`, `None`. If backend is None, the global imread_backend
            specified by ``mmcv.use_backend()`` will be used. Default: None.

    Returns:
        tuple | ndarray: (`resized_img`, `w_scale`, `h_scale`) or
        `resized_img`.
    """
    h, w = img.shape[:2]
    # `size` and `scale_factor` are mutually exclusive; exactly one required.
    if size is not None and scale_factor is not None:
        raise ValueError('only one of size or scale_factor should be defined')
    elif size is None and scale_factor is None:
        raise ValueError('one of size or scale_factor should be defined')
    elif size is not None:
        size = to_2tuple(size)
        if keep_ratio:
            size = rescale_size((w, h), size, return_scale=False)
    else:
        size = _scale_size((w, h), scale_factor)

    divisor = to_2tuple(divisor)
    # Round each dimension up to the next multiple of its divisor.
    size = tuple(int(np.ceil(s / d)) * d for s, d in zip(size, divisor))
    resized_img, w_scale, h_scale = imresize(
        img,
        size,
        return_scale=True,
        interpolation=interpolation,
        out=out,
        backend=backend)
    if return_scale:
        return resized_img, w_scale, h_scale
    else:
        return resized_img


def imresize_like(
    img: np.ndarray,
    dst_img: np.ndarray,
    return_scale: bool = False,
    interpolation: str = 'bilinear',
    backend: Optional[str] = None
) -> Union[Tuple[np.ndarray, float, float], np.ndarray]:
    """Resize image to the same size of a given image.

    Args:
        img (ndarray): The input image.
        dst_img (ndarray): The target image.
        return_scale (bool): Whether to return `w_scale` and `h_scale`.
        interpolation (str): Same as :func:`resize`.
        backend (str | None): Same as :func:`resize`.

    Returns:
        tuple or ndarray: (`resized_img`, `w_scale`, `h_scale`) or
        `resized_img`.
    """
    h, w = dst_img.shape[:2]
    return imresize(img, (w, h), return_scale, interpolation, backend=backend)


def rescale_size(old_size: tuple,
                 scale: Union[float, int, tuple],
                 return_scale: bool = False) -> tuple:
    """Calculate the new size to be rescaled to.

    Args:
        old_size (tuple[int]): The old size (w, h) of image.
        scale (float | tuple[int]): The scaling factor or maximum size.
            If it is a float number, then the image will be rescaled by this
            factor, else if it is a tuple of 2 integers, then the image will
            be rescaled as large as possible within the scale.
        return_scale (bool): Whether to return the scaling factor besides the
            rescaled image size.

    Returns:
        tuple[int]: The new rescaled image size.
    """
    w, h = old_size
    if isinstance(scale, (float, int)):
        if scale <= 0:
            raise ValueError(f'Invalid scale {scale}, must be positive.')
        scale_factor = scale
    elif isinstance(scale, tuple):
        # Tuple scale = bounding box: fit the long edge within max(scale)
        # and the short edge within min(scale), whichever is tighter.
        max_long_edge = max(scale)
        max_short_edge = min(scale)
        scale_factor = min(max_long_edge / max(h, w),
                           max_short_edge / min(h, w))
    else:
        raise TypeError(
            f'Scale must be a number or tuple of int, but got {type(scale)}')

    new_size = _scale_size((w, h), scale_factor)

    if return_scale:
        return new_size, scale_factor
    else:
        return new_size


def imrescale(
    img: np.ndarray,
    scale: Union[float, Tuple[int, int]],
    return_scale: bool = False,
    interpolation: str = 'bilinear',
    backend: Optional[str] = None
) -> Union[np.ndarray, Tuple[np.ndarray, float]]:
    """Resize image while keeping the aspect ratio.

    Args:
        img (ndarray): The input image.
        scale (float | tuple[int]): The scaling factor or maximum size.
            If it is a float number, then the image will be rescaled by this
            factor, else if it is a tuple of 2 integers, then the image will
            be rescaled as large as possible within the scale.
        return_scale (bool): Whether to return the scaling factor besides the
            rescaled image.
        interpolation (str): Same as :func:`resize`.
        backend (str | None): Same as :func:`resize`.

    Returns:
        ndarray: The rescaled image.
    """
    h, w = img.shape[:2]
    new_size, scale_factor = rescale_size((w, h), scale, return_scale=True)
    rescaled_img = imresize(
        img, new_size, interpolation=interpolation, backend=backend)
    if return_scale:
        return rescaled_img, scale_factor
    else:
        return rescaled_img


def imflip(img: np.ndarray, direction: str = 'horizontal') -> np.ndarray:
    """Flip an image horizontally or vertically.

    Args:
        img (ndarray): Image to be flipped.
        direction (str): The flip direction, either "horizontal" or
            "vertical" or "diagonal".

    Returns:
        ndarray: The flipped image.
    """
    assert direction in ['horizontal', 'vertical', 'diagonal']
    if direction == 'horizontal':
        return np.flip(img, axis=1)
    elif direction == 'vertical':
        return np.flip(img, axis=0)
    else:
        # Diagonal = flip both axes.
        return np.flip(img, axis=(0, 1))


def imflip_(img: np.ndarray, direction: str = 'horizontal') -> np.ndarray:
    """Inplace flip an image horizontally or vertically.

    Args:
        img (ndarray): Image to be flipped.
        direction (str): The flip direction, either "horizontal" or
            "vertical" or "diagonal".

    Returns:
        ndarray: The flipped image (inplace).
    """
    assert direction in ['horizontal', 'vertical', 'diagonal']
    # cv2.flip with dst=img writes the result back into the input buffer.
    if direction == 'horizontal':
        return cv2.flip(img, 1, img)
    elif direction == 'vertical':
        return cv2.flip(img, 0, img)
    else:
        return cv2.flip(img, -1, img)


def imrotate(img: np.ndarray,
             angle: float,
             center: Optional[Tuple[float, float]] = None,
             scale: float = 1.0,
             border_value: int = 0,
             interpolation: str = 'bilinear',
             auto_bound: bool = False,
             border_mode: str = 'constant') -> np.ndarray:
    """Rotate an image.

    Args:
        img (np.ndarray): Image to be rotated.
        angle (float): Rotation angle in degrees, positive values mean
            clockwise rotation.
        center (tuple[float], optional): Center point (w, h) of the rotation in
            the source image. If not specified, the center of the image will be
            used.
        scale (float): Isotropic scale factor.
        border_value (int): Border value used in case of a constant border.
            Defaults to 0.
        interpolation (str): Same as :func:`resize`.
        auto_bound (bool): Whether to adjust the image size to cover the whole
            rotated image.
        border_mode (str): Pixel extrapolation method. Defaults to 'constant'.

    Returns:
        np.ndarray: The rotated image.
    """
    if center is not None and auto_bound:
        raise ValueError('`auto_bound` conflicts with `center`')
    h, w = img.shape[:2]
    if center is None:
        center = ((w - 1) * 0.5, (h - 1) * 0.5)
    assert isinstance(center, tuple)

    # Negate the angle: OpenCV treats positive angles as counter-clockwise,
    # while this API documents positive = clockwise.
    matrix = cv2.getRotationMatrix2D(center, -angle, scale)
    if auto_bound:
        # Grow the canvas to the rotated bounding box and shift the
        # transform so the result stays centered.
        cos = np.abs(matrix[0, 0])
        sin = np.abs(matrix[0, 1])
        new_w = h * sin + w * cos
        new_h = h * cos + w * sin
        matrix[0, 2] += (new_w - w) * 0.5
        matrix[1, 2] += (new_h - h) * 0.5
        w = int(np.round(new_w))
        h = int(np.round(new_h))
    rotated = cv2.warpAffine(
        img,
        matrix, (w, h),
        flags=cv2_interp_codes[interpolation],
        borderMode=cv2_border_modes[border_mode],
        borderValue=border_value)
    return rotated


def bbox_clip(bboxes: np.ndarray, img_shape: Tuple[int, int]) -> np.ndarray:
    """Clip bboxes to fit the image shape.

    Args:
        bboxes (ndarray): Shape (..., 4*k)
        img_shape (tuple[int]): (height, width) of the image.

    Returns:
        ndarray: Clipped bboxes.
    """
    assert bboxes.shape[-1] % 4 == 0
    # Per-coordinate upper bounds: x coords clip to w-1, y coords to h-1.
    cmin = np.empty(bboxes.shape[-1], dtype=bboxes.dtype)
    cmin[0::2] = img_shape[1] - 1
    cmin[1::2] = img_shape[0] - 1
    clipped_bboxes = np.maximum(np.minimum(bboxes, cmin), 0)
    return clipped_bboxes


def bbox_scaling(bboxes: np.ndarray,
                 scale: float,
                 clip_shape: Optional[Tuple[int, int]] = None) -> np.ndarray:
    """Scaling bboxes w.r.t the box center.

    Args:
        bboxes (ndarray): Shape(..., 4).
        scale (float): Scaling factor.
        clip_shape (tuple[int], optional): If specified, bboxes that exceed the
            boundary will be clipped according to the given shape (h, w).

    Returns:
        ndarray: Scaled bboxes.
    """
    if float(scale) == 1.0:
        scaled_bboxes = bboxes.copy()
    else:
        # Widths/heights use inclusive pixel coordinates, hence the +1.
        w = bboxes[..., 2] - bboxes[..., 0] + 1
        h = bboxes[..., 3] - bboxes[..., 1] + 1
        dw = (w * (scale - 1)) * 0.5
        dh = (h * (scale - 1)) * 0.5
        scaled_bboxes = bboxes + np.stack((-dw, -dh, dw, dh), axis=-1)
    if clip_shape is not None:
        return bbox_clip(scaled_bboxes, clip_shape)
    else:
        return scaled_bboxes


def imcrop(
    img: np.ndarray,
    bboxes: np.ndarray,
    scale: float = 1.0,
    pad_fill: Union[float, list, None] = None
) -> Union[np.ndarray, List[np.ndarray]]:
    """Crop image patches.

    3 steps: scale the bboxes -> clip bboxes -> crop and pad.

    Args:
        img (ndarray): Image to be cropped.
        bboxes (ndarray): Shape (k, 4) or (4, ), location of cropped bboxes.
        scale (float, optional): Scale ratio of bboxes, the default value
            1.0 means no scaling.
        pad_fill (Number | list[Number]): Value to be filled for padding.
            Default: None, which means no padding.

    Returns:
        list[ndarray] | ndarray: The cropped image patches.
    """
    chn = 1 if img.ndim == 2 else img.shape[2]
    if pad_fill is not None:
        if isinstance(pad_fill, (int, float)):
            pad_fill = [pad_fill for _ in range(chn)]
        assert len(pad_fill) == chn

    # Normalize a single (4,) box to shape (1, 4) for uniform handling.
    _bboxes = bboxes[None, ...] if bboxes.ndim == 1 else bboxes
    scaled_bboxes = bbox_scaling(_bboxes, scale).astype(np.int32)
    clipped_bbox = bbox_clip(scaled_bboxes, img.shape)

    patches = []
    for i in range(clipped_bbox.shape[0]):
        x1, y1, x2, y2 = tuple(clipped_bbox[i, :])
        if pad_fill is None:
            patch = img[y1:y2 + 1, x1:x2 + 1, ...]
        else:
            # Build a fill-valued canvas of the unclipped box size, then
            # paste the in-bounds region of the image into it.
            _x1, _y1, _x2, _y2 = tuple(scaled_bboxes[i, :])
            patch_h = _y2 - _y1 + 1
            patch_w = _x2 - _x1 + 1
            if chn == 1:
                patch_shape = (patch_h, patch_w)
            else:
                patch_shape = (patch_h, patch_w, chn)  # type: ignore
            patch = np.array(
                pad_fill, dtype=img.dtype) * np.ones(
                    patch_shape, dtype=img.dtype)
            x_start = 0 if _x1 >= 0 else -_x1
            y_start = 0 if _y1 >= 0 else -_y1
            w = x2 - x1 + 1
            h = y2 - y1 + 1
            patch[y_start:y_start + h, x_start:x_start + w,
                  ...] = img[y1:y1 + h, x1:x1 + w, ...]
        patches.append(patch)

    if bboxes.ndim == 1:
        return patches[0]
    else:
        return patches


def impad(img: np.ndarray,
          *,
          shape: Optional[Tuple[int, int]] = None,
          padding: Union[int, tuple, None] = None,
          pad_val: Union[float, List] = 0,
          padding_mode: str = 'constant') -> np.ndarray:
    """Pad the given image to a certain shape or pad on all sides with
    specified padding mode and padding value.

    Args:
        img (ndarray): Image to be padded.
        shape (tuple[int]): Expected padding shape (h, w). Default: None.
        padding (int or tuple[int]): Padding on each border. If a single int is
            provided this is used to pad all borders. If tuple of length 2 is
            provided this is the padding on left/right and top/bottom
            respectively. If a tuple of length 4 is provided this is the
            padding for the left, top, right and bottom borders respectively.
            Default: None. Note that `shape` and `padding` can not be both
            set.
        pad_val (Number | Sequence[Number]): Values to be filled in padding
            areas when padding_mode is 'constant'. Default: 0.
        padding_mode (str): Type of padding. Should be: constant, edge,
            reflect or symmetric. Default: constant.

            - constant: pads with a constant value, this value is specified
              with pad_val.
            - edge: pads with the last value at the edge of the image.
            - reflect: pads with reflection of image without repeating the last
              value on the edge. For example, padding [1, 2, 3, 4] with 2
              elements on both sides in reflect mode will result in
              [3, 2, 1, 2, 3, 4, 3, 2].
            - symmetric: pads with reflection of image repeating the last value
              on the edge. For example, padding [1, 2, 3, 4] with 2 elements on
              both sides in symmetric mode will result in
              [2, 1, 1, 2, 3, 4, 4, 3]

    Returns:
        ndarray: The padded image.
    """

    # Exactly one of `shape` / `padding` must be given.
    assert (shape is not None) ^ (padding is not None)
    if shape is not None:
        # Convert target shape into right/bottom padding amounts; never
        # shrinks the image (max with 0).
        width = max(shape[1] - img.shape[1], 0)
        height = max(shape[0] - img.shape[0], 0)
        padding = (0, 0, width, height)

    # check pad_val
    if isinstance(pad_val, tuple):
        assert len(pad_val) == img.shape[-1]
    elif not isinstance(pad_val, numbers.Number):
        raise TypeError('pad_val must be a int or a tuple. '
                        f'But received {type(pad_val)}')

    # check padding
    if isinstance(padding, tuple) and len(padding) in [2, 4]:
        if len(padding) == 2:
            # (left/right, top/bottom) -> (left, top, right, bottom).
            padding = (padding[0], padding[1], padding[0], padding[1])
    elif isinstance(padding, numbers.Number):
        padding = (padding, padding, padding, padding)
    else:
        raise ValueError('Padding must be a int or a 2, or 4 element tuple.'
                         f'But received {padding}')

    # check padding mode
    assert padding_mode in ['constant', 'edge', 'reflect', 'symmetric']

    border_type = {
        'constant': cv2.BORDER_CONSTANT,
        'edge': cv2.BORDER_REPLICATE,
        'reflect': cv2.BORDER_REFLECT_101,
        'symmetric': cv2.BORDER_REFLECT
    }
    img = cv2.copyMakeBorder(
        img,
        padding[1],
        padding[3],
        padding[0],
        padding[2],
        border_type[padding_mode],
        value=pad_val)

    return img


def impad_to_multiple(img: np.ndarray,
                      divisor: int,
                      pad_val: Union[float, List] = 0) -> np.ndarray:
    """Pad an image to ensure each edge to be multiple to some number.

    Args:
        img (ndarray): Image to be padded.
        divisor (int): Padded image edges will be multiple to divisor.
        pad_val (Number | Sequence[Number]): Same as :func:`impad`.

    Returns:
        ndarray: The padded image.
    """
    pad_h = int(np.ceil(img.shape[0] / divisor)) * divisor
    pad_w = int(np.ceil(img.shape[1] / divisor)) * divisor
    return impad(img, shape=(pad_h, pad_w), pad_val=pad_val)


def cutout(img: np.ndarray,
           shape: Union[int, Tuple[int, int]],
           pad_val: Union[int, float, tuple] = 0) -> np.ndarray:
    """Randomly cut out a rectangle from the original img.

    Args:
        img (ndarray): Image to be cutout.
        shape (int | tuple[int]): Expected cutout shape (h, w). If given as a
            int, the value will be used for both h and w.
        pad_val (int | float | tuple[int | float]): Values to be filled in the
            cut area. Defaults to 0.

    Returns:
        ndarray: The cutout image.
    """

    channels = 1 if img.ndim == 2 else img.shape[2]
    if isinstance(shape, int):
        cut_h, cut_w = shape, shape
    else:
        assert isinstance(shape, tuple) and len(shape) == 2, \
            f'shape must be a int or a tuple with length 2, but got type ' \
            f'{type(shape)} instead.'
        cut_h, cut_w = shape
    if isinstance(pad_val, (int, float)):
        pad_val = tuple([pad_val] * channels)
    elif isinstance(pad_val, tuple):
        assert len(pad_val) == channels, \
            'Expected the num of elements in tuple equals the channels' \
            'of input image. Found {} vs {}'.format(
                len(pad_val), channels)
    else:
        raise TypeError(f'Invalid type {type(pad_val)} for `pad_val`')

    img_h, img_w = img.shape[:2]
    # Random center of the cut region (uses global numpy RNG state).
    y0 = np.random.uniform(img_h)
    x0 = np.random.uniform(img_w)

    # Clamp the cut rectangle to the image bounds.
    y1 = int(max(0, y0 - cut_h / 2.))
    x1 = int(max(0, x0 - cut_w / 2.))
    y2 = min(img_h, y1 + cut_h)
    x2 = min(img_w, x1 + cut_w)

    if img.ndim == 2:
        patch_shape = (y2 - y1, x2 - x1)
    else:
        patch_shape = (y2 - y1, x2 - x1, channels)  # type: ignore

    # Operate on a copy so the input image is left untouched.
    img_cutout = img.copy()
    patch = np.array(
        pad_val, dtype=img.dtype) * np.ones(
            patch_shape, dtype=img.dtype)
    img_cutout[y1:y2, x1:x2, ...] = patch

    return img_cutout


def _get_shear_matrix(magnitude: Union[int, float],
                      direction: str = 'horizontal') -> np.ndarray:
    """Generate the shear matrix for transformation.

    Args:
        magnitude (int | float): The magnitude used for shear.
        direction (str): The flip direction, either "horizontal"
            or "vertical".

    Returns:
        ndarray: The shear matrix with dtype float32.
    """
    # Callers validate `direction`; anything else would fall through.
    if direction == 'horizontal':
        shear_matrix = np.float32([[1, magnitude, 0], [0, 1, 0]])
    elif direction == 'vertical':
        shear_matrix = np.float32([[1, 0, 0], [magnitude, 1, 0]])
    return shear_matrix


def imshear(img: np.ndarray,
            magnitude: Union[int, float],
            direction: str = 'horizontal',
            border_value: Union[int, Tuple[int, int]] = 0,
            interpolation: str = 'bilinear') -> np.ndarray:
    """Shear an image.

    Args:
        img (ndarray): Image to be sheared with format (h, w)
            or (h, w, c).
        magnitude (int | float): The magnitude used for shear.
        direction (str): The flip direction, either "horizontal"
            or "vertical".
        border_value (int | tuple[int]): Value used in case of a
            constant border.
        interpolation (str): Same as :func:`resize`.

    Returns:
        ndarray: The sheared image.
    """
    assert direction in ['horizontal',
                         'vertical'], f'Invalid direction: {direction}'
    height, width = img.shape[:2]
    if img.ndim == 2:
        channels = 1
    elif img.ndim == 3:
        channels = img.shape[-1]
    if isinstance(border_value, int):
        border_value = tuple([border_value] * channels)  # type: ignore
    elif isinstance(border_value, tuple):
        assert len(border_value) == channels, \
            'Expected the num of elements in tuple equals the channels' \
            'of input image. Found {} vs {}'.format(
                len(border_value), channels)
    else:
        raise ValueError(
            f'Invalid type {type(border_value)} for `border_value`')
    shear_matrix = _get_shear_matrix(magnitude, direction)
    sheared = cv2.warpAffine(
        img,
        shear_matrix,
        (width, height),
        # Note case when the number elements in `border_value`
        # greater than 3 (e.g. shearing masks whose channels large
        # than 3) will raise TypeError in `cv2.warpAffine`.
        # Here simply slice the first 3 values in `border_value`.
        borderValue=border_value[:3],  # type: ignore
        flags=cv2_interp_codes[interpolation])
    return sheared


def _get_translate_matrix(offset: Union[int, float],
                          direction: str = 'horizontal') -> np.ndarray:
    """Generate the translate matrix.

    Args:
        offset (int | float): The offset used for translate.
        direction (str): The translate direction, either
            "horizontal" or "vertical".

    Returns:
        ndarray: The translate matrix with dtype float32.
    """
    # Callers validate `direction`; anything else would fall through.
    if direction == 'horizontal':
        translate_matrix = np.float32([[1, 0, offset], [0, 1, 0]])
    elif direction == 'vertical':
        translate_matrix = np.float32([[1, 0, 0], [0, 1, offset]])
    return translate_matrix


def imtranslate(img: np.ndarray,
                offset: Union[int, float],
                direction: str = 'horizontal',
                border_value: Union[int, tuple] = 0,
                interpolation: str = 'bilinear') -> np.ndarray:
    """Translate an image.

    Args:
        img (ndarray): Image to be translated with format
            (h, w) or (h, w, c).
        offset (int | float): The offset used for translate.
        direction (str): The translate direction, either "horizontal"
            or "vertical".
        border_value (int | tuple[int]): Value used in case of a
            constant border.
        interpolation (str): Same as :func:`resize`.

    Returns:
        ndarray: The translated image.
    """
    assert direction in ['horizontal',
                         'vertical'], f'Invalid direction: {direction}'
    height, width = img.shape[:2]
    if img.ndim == 2:
        channels = 1
    elif img.ndim == 3:
        channels = img.shape[-1]
    if isinstance(border_value, int):
        border_value = tuple([border_value] * channels)
    elif isinstance(border_value, tuple):
        assert len(border_value) == channels, \
            'Expected the num of elements in tuple equals the channels' \
            'of input image. Found {} vs {}'.format(
                len(border_value), channels)
    else:
        raise ValueError(
            f'Invalid type {type(border_value)} for `border_value`.')
    translate_matrix = _get_translate_matrix(offset, direction)
    translated = cv2.warpAffine(
        img,
        translate_matrix,
        (width, height),
        # Note case when the number elements in `border_value`
        # greater than 3 (e.g. translating masks whose channels
        # large than 3) will raise TypeError in `cv2.warpAffine`.
        # Here simply slice the first 3 values in `border_value`.
        borderValue=border_value[:3],
        flags=cv2_interp_codes[interpolation])
    return translated
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/image/io.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/image/io.py
new file mode 100644
index 0000000000000000000000000000000000000000..e10d443da6554865afc98cb2441a0cc8eddf0e16
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/image/io.py
@@ -0,0 +1,364 @@
+# Copyright (c) OpenMMLab. All rights reserved.
# Copyright (c) OpenMMLab. All rights reserved.
import io
import os.path as osp
import warnings
from pathlib import Path
from typing import Optional, Union

import cv2
import mmengine.fileio as fileio
import numpy as np
from cv2 import (IMREAD_COLOR, IMREAD_GRAYSCALE, IMREAD_IGNORE_ORIENTATION,
                 IMREAD_UNCHANGED)
from mmengine.utils import is_filepath, is_str

try:
    from turbojpeg import TJCS_RGB, TJPF_BGR, TJPF_GRAY, TurboJPEG
except ImportError:
    TJCS_RGB = TJPF_GRAY = TJPF_BGR = TurboJPEG = None

try:
    from PIL import Image, ImageOps
except ImportError:
    Image = None

try:
    import tifffile
except ImportError:
    tifffile = None

# Lazily constructed TurboJPEG handle, created on first selection of the
# 'turbojpeg' backend via `use_backend`.
jpeg = None
supported_backends = ['cv2', 'turbojpeg', 'pillow', 'tifffile']

# Human-readable flag name -> OpenCV imread flag constant.
imread_flags = {
    'color': IMREAD_COLOR,
    'grayscale': IMREAD_GRAYSCALE,
    'unchanged': IMREAD_UNCHANGED,
    'color_ignore_orientation': IMREAD_IGNORE_ORIENTATION | IMREAD_COLOR,
    'grayscale_ignore_orientation':
    IMREAD_IGNORE_ORIENTATION | IMREAD_GRAYSCALE
}

# Global default decoding backend; changed via `use_backend`.
imread_backend = 'cv2'


def use_backend(backend: str) -> None:
    """Select the global backend used for image decoding.

    Args:
        backend (str): The image decoding backend type. Options are `cv2`,
            `pillow`, `turbojpeg` (see https://github.com/lilohuang/PyTurboJPEG)
            and `tifffile`. `turbojpeg` is faster but it only supports `.jpeg`
            file format.
    """
    assert backend in supported_backends
    global imread_backend
    imread_backend = backend
    if backend == 'turbojpeg':
        if TurboJPEG is None:
            raise ImportError('`PyTurboJPEG` is not installed')
        global jpeg
        if jpeg is None:
            jpeg = TurboJPEG()
    elif backend == 'pillow' and Image is None:
        raise ImportError('`Pillow` is not installed')
    elif backend == 'tifffile' and tifffile is None:
        raise ImportError('`tifffile` is not installed')


def _jpegflag(flag: str = 'color', channel_order: str = 'bgr'):
    """Map a (flag, channel_order) pair to the matching TurboJPEG constant."""
    channel_order = channel_order.lower()
    if channel_order not in ('rgb', 'bgr'):
        raise ValueError('channel order must be either "rgb" or "bgr"')
    if flag == 'grayscale':
        return TJPF_GRAY
    if flag == 'color':
        return TJCS_RGB if channel_order == 'rgb' else TJPF_BGR
    raise ValueError('flag must be "color" or "grayscale"')


def _pillow2array(img,
                  flag: str = 'color',
                  channel_order: str = 'bgr') -> np.ndarray:
    """Convert a pillow image to numpy array.

    Args:
        img (:obj:`PIL.Image.Image`): The image loaded using PIL.
        flag (str): Flags specifying the color type of a loaded image,
            candidates are 'color', 'grayscale' and 'unchanged'.
            Default to 'color'.
        channel_order (str): The channel order of the output image array,
            candidates are 'bgr' and 'rgb'. Default to 'bgr'.

    Returns:
        np.ndarray: The converted numpy array.
    """
    channel_order = channel_order.lower()
    if channel_order not in ('rgb', 'bgr'):
        raise ValueError('channel order must be either "rgb" or "bgr"')

    if flag == 'unchanged':
        out = np.array(img)
        if out.ndim >= 3 and out.shape[2] >= 3:  # color image
            out[:, :, :3] = out[:, :, (2, 1, 0)]  # RGB to BGR
        return out

    # Apply the EXIF orientation tag unless the caller asked to ignore it.
    if flag in ('color', 'grayscale'):
        img = ImageOps.exif_transpose(img)
    if img.mode != 'RGB':
        if img.mode == 'LA':
            # The default LA->RGB conversion fills the canvas with black,
            # which sometimes shadows black objects in the foreground.
            # Paste onto a fixed-color (124, 117, 104) canvas instead.
            rgba = img.convert('RGBA')
            img = Image.new('RGB', rgba.size, (124, 117, 104))
            img.paste(rgba, mask=rgba.split()[3])  # 3 is the alpha band
        else:
            # Most formats except 'LA' can be directly converted to RGB.
            img = img.convert('RGB')
    if flag in ('color', 'color_ignore_orientation'):
        out = np.array(img)
        if channel_order != 'rgb':
            out = out[:, :, ::-1]  # RGB to BGR
        return out
    if flag in ('grayscale', 'grayscale_ignore_orientation'):
        return np.array(img.convert('L'))
    raise ValueError(
        'flag must be "color", "grayscale", "unchanged", '
        f'"color_ignore_orientation" or "grayscale_ignore_orientation"'
        f' but got {flag}')
def imread(img_or_path: Union[np.ndarray, str, Path],
           flag: str = 'color',
           channel_order: str = 'bgr',
           backend: Optional[str] = None,
           file_client_args: Optional[dict] = None,
           *,
           backend_args: Optional[dict] = None) -> np.ndarray:
    """Read an image.

    An already-decoded numpy array is returned untouched; a str or
    ``pathlib.Path`` is fetched through ``mmengine.fileio`` and decoded by
    :func:`imfrombytes`.

    Args:
        img_or_path (ndarray or str or Path): Either a numpy array or str or
            pathlib.Path. If it is a numpy array (loaded image), then
            it will be returned as is.
        flag (str): Flags specifying the color type of a loaded image,
            candidates are `color`, `grayscale`, `unchanged`,
            `color_ignore_orientation` and `grayscale_ignore_orientation`.
        channel_order (str): Order of channel, candidates are `bgr` and `rgb`.
        backend (str | None): The image decoding backend type. Options are
            `cv2`, `pillow`, `turbojpeg`, `tifffile`, `None`. If None, the
            global backend set by ``mmcv.use_backend()`` is used.
        file_client_args (dict, optional): Deprecated in version 2.0.0rc4;
            use ``backend_args`` instead.
        backend_args (dict, optional): Instantiates the corresponding file
            backend. New in version 2.0.0rc4.

    Returns:
        ndarray: Loaded image array.

    Raises:
        TypeError: If ``img_or_path`` is none of ndarray, str, Path.
        ValueError: If both ``file_client_args`` and ``backend_args`` are set.
    """
    if file_client_args is not None:
        warnings.warn(
            '"file_client_args" will be deprecated in future. '
            'Please use "backend_args" instead', DeprecationWarning)
        if backend_args is not None:
            raise ValueError(
                '"file_client_args" and "backend_args" cannot be set at the '
                'same time.')

    if isinstance(img_or_path, Path):
        img_or_path = str(img_or_path)

    if isinstance(img_or_path, np.ndarray):
        # Already decoded: pass through without touching any file backend.
        return img_or_path
    if is_str(img_or_path):
        if file_client_args is not None:
            client = fileio.FileClient.infer_client(file_client_args,
                                                    img_or_path)
            img_bytes = client.get(img_or_path)
        else:
            img_bytes = fileio.get(img_or_path, backend_args=backend_args)
        return imfrombytes(img_bytes, flag, channel_order, backend)
    raise TypeError('"img" must be a numpy array or a str or '
                    'a pathlib.Path object')
def imfrombytes(content: bytes,
                flag: str = 'color',
                channel_order: str = 'bgr',
                backend: Optional[str] = None) -> np.ndarray:
    """Read an image from bytes.

    Args:
        content (bytes): Image bytes got from files or other streams.
        flag (str): Same as :func:`imread`.
        channel_order (str): The channel order of the output, candidates
            are 'bgr' and 'rgb'. Default to 'bgr'.
        backend (str | None): The image decoding backend type. Options are
            `cv2`, `pillow`, `turbojpeg`, `tifffile`, `None`. If None, the
            global backend set by ``mmcv.use_backend()`` is used.

    Returns:
        ndarray: Loaded image array.

    Raises:
        ValueError: If ``backend`` is not one of the supported backends.
    """
    if backend is None:
        backend = imread_backend
    if backend not in supported_backends:
        raise ValueError(
            f'backend: {backend} is not supported. Supported '
            "backends are 'cv2', 'turbojpeg', 'pillow', 'tifffile'")
    if backend == 'turbojpeg':
        decoded = jpeg.decode(  # type: ignore
            content, _jpegflag(flag, channel_order))
        if decoded.shape[-1] == 1:
            decoded = decoded[:, :, 0]
        return decoded
    if backend == 'pillow':
        with io.BytesIO(content) as buff:
            pil_img = Image.open(buff)
            return _pillow2array(pil_img, flag, channel_order)
    if backend == 'tifffile':
        with io.BytesIO(content) as buff:
            return tifffile.imread(buff)
    # Default: cv2.
    img_np = np.frombuffer(content, np.uint8)
    cv_flag = imread_flags[flag] if is_str(flag) else flag
    img = cv2.imdecode(img_np, cv_flag)
    if cv_flag == IMREAD_COLOR and channel_order == 'rgb':
        cv2.cvtColor(img, cv2.COLOR_BGR2RGB, img)  # in-place channel swap
    return img


def imwrite(img: np.ndarray,
            file_path: str,
            params: Optional[list] = None,
            auto_mkdir: Optional[bool] = None,
            file_client_args: Optional[dict] = None,
            *,
            backend_args: Optional[dict] = None) -> bool:
    """Write image to file.

    Warning:
        The parameter `auto_mkdir` will be deprecated in the future and every
        file client will make directories automatically.

    Args:
        img (ndarray): Image array to be written.
        file_path (str): Image file path.
        params (None or list): Same as opencv :func:`imwrite` interface.
        auto_mkdir (bool): Deprecated; kept only to emit a warning.
        file_client_args (dict, optional): Deprecated in version 2.0.0rc4;
            use ``backend_args`` instead.
        backend_args (dict, optional): Instantiates the corresponding file
            backend. New in version 2.0.0rc4.

    Returns:
        bool: Successful or not (the flag returned by ``cv2.imencode``).
    """
    if file_client_args is not None:
        warnings.warn(
            '"file_client_args" will be deprecated in future. '
            'Please use "backend_args" instead', DeprecationWarning)
        if backend_args is not None:
            raise ValueError(
                '"file_client_args" and "backend_args" cannot be set at the '
                'same time.')

    assert is_filepath(file_path)
    file_path = str(file_path)
    if auto_mkdir is not None:
        warnings.warn(
            'The parameter `auto_mkdir` will be deprecated in the future and '
            'every file clients will make directory automatically.')

    # The encode format is taken from the suffix, e.g. '.jpg' for
    # '/path/your/img.jpg'.
    img_ext = osp.splitext(file_path)[-1]
    flag, img_buff = cv2.imencode(img_ext, img, params)

    if file_client_args is not None:
        file_client = fileio.FileClient.infer_client(file_client_args,
                                                     file_path)
        file_client.put(img_buff.tobytes(), file_path)
    else:
        fileio.put(img_buff.tobytes(), file_path, backend_args=backend_args)

    return flag
+ """ + + if torch is None: + raise RuntimeError('pytorch is not installed') + assert torch.is_tensor(tensor) and tensor.ndim == 4 + channels = tensor.size(1) + assert channels in [1, 3] + if mean is None: + mean = (0, ) * channels + if std is None: + std = (1, ) * channels + assert (channels == len(mean) == len(std) == 3) or \ + (channels == len(mean) == len(std) == 1 and not to_rgb) + + num_imgs = tensor.size(0) + mean = np.array(mean, dtype=np.float32) + std = np.array(std, dtype=np.float32) + imgs = [] + for img_id in range(num_imgs): + img = tensor[img_id, ...].cpu().numpy().transpose(1, 2, 0) + img = mmcv.imdenormalize( + img, mean, std, to_bgr=to_rgb).astype(np.uint8) + imgs.append(np.ascontiguousarray(img)) + return imgs diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/image/photometric.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/image/photometric.py new file mode 100644 index 0000000000000000000000000000000000000000..12cbb90822564bf14cd5176cc3c5532220db40da --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/image/photometric.py @@ -0,0 +1,561 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import warnings +from typing import Optional + +import cv2 +import numpy as np +from mmengine.utils import is_tuple_of +from PIL import Image, ImageEnhance + +from .colorspace import bgr2gray, gray2bgr +from .io import imread_backend + + +def imnormalize(img, mean, std, to_rgb=True): + """Normalize an image with mean and std. + + Args: + img (ndarray): Image to be normalized. + mean (ndarray): The mean to be used for normalize. + std (ndarray): The std to be used for normalize. + to_rgb (bool): Whether to convert to rgb. + + Returns: + ndarray: The normalized image. + """ + img = img.copy().astype(np.float32) + return imnormalize_(img, mean, std, to_rgb) + + +def imnormalize_(img, mean, std, to_rgb=True): + """Inplace normalize an image with mean and std. + + Args: + img (ndarray): Image to be normalized. + mean (ndarray): The mean to be used for normalize. 
+ std (ndarray): The std to be used for normalize. + to_rgb (bool): Whether to convert to rgb. + + Returns: + ndarray: The normalized image. + """ + # cv2 inplace normalization does not accept uint8 + assert img.dtype != np.uint8 + mean = np.float64(mean.reshape(1, -1)) + stdinv = 1 / np.float64(std.reshape(1, -1)) + if to_rgb: + cv2.cvtColor(img, cv2.COLOR_BGR2RGB, img) # inplace + cv2.subtract(img, mean, img) # inplace + cv2.multiply(img, stdinv, img) # inplace + return img + + +def imdenormalize(img, mean, std, to_bgr=True): + assert img.dtype != np.uint8 + mean = mean.reshape(1, -1).astype(np.float64) + std = std.reshape(1, -1).astype(np.float64) + img = cv2.multiply(img, std) # make a copy + cv2.add(img, mean, img) # inplace + if to_bgr: + cv2.cvtColor(img, cv2.COLOR_RGB2BGR, img) # inplace + return img + + +def iminvert(img): + """Invert (negate) an image. + + Args: + img (ndarray): Image to be inverted. + + Returns: + ndarray: The inverted image. + """ + return np.full_like(img, 255) - img + + +def solarize(img, thr=128): + """Solarize an image (invert all pixel values above a threshold) + + Args: + img (ndarray): Image to be solarized. + thr (int): Threshold for solarizing (0 - 255). + + Returns: + ndarray: The solarized image. + """ + img = np.where(img < thr, img, 255 - img) + return img + + +def posterize(img, bits): + """Posterize an image (reduce the number of bits for each color channel) + + Args: + img (ndarray): Image to be posterized. + bits (int): Number of bits (1 to 8) to use for posterizing. + + Returns: + ndarray: The posterized image. + """ + shift = 8 - bits + img = np.left_shift(np.right_shift(img, shift), shift) + return img + + +def adjust_color(img, alpha=1, beta=None, gamma=0, backend=None): + r"""It blends the source image and its gray image: + + .. math:: + output = img * alpha + gray\_img * beta + gamma + + Args: + img (ndarray): The input source image. + alpha (int | float): Weight for the source image. Default 1. 
+ beta (int | float): Weight for the converted gray image. + If None, it's assigned the value (1 - `alpha`). + gamma (int | float): Scalar added to each sum. + Same as :func:`cv2.addWeighted`. Default 0. + backend (str | None): The image processing backend type. Options are + `cv2`, `pillow`, `None`. If backend is None, the global + ``imread_backend`` specified by ``mmcv.use_backend()`` will be + used. Defaults to None. + + Returns: + ndarray: Colored image which has the same size and dtype as input. + """ + if backend is None: + backend = imread_backend + if backend not in ['cv2', 'pillow']: + raise ValueError(f'backend: {backend} is not supported.' + f"Supported backends are 'cv2', 'pillow'") + + if backend == 'pillow': + assert img.dtype == np.uint8, 'Pillow backend only support uint8 type' + warnings.warn("Only use 'alpha' for pillow backend.") + # Image.fromarray defaultly supports RGB, not BGR. + pil_image = Image.fromarray(img[..., ::-1], mode='RGB') + enhancer = ImageEnhance.Color(pil_image) + pil_image = enhancer.enhance(alpha) + return np.array(pil_image, dtype=img.dtype)[..., ::-1] + else: + gray_img = bgr2gray(img) + gray_img = np.tile(gray_img[..., None], [1, 1, 3]) + if beta is None: + beta = 1 - alpha + colored_img = cv2.addWeighted(img, alpha, gray_img, beta, gamma) + if not colored_img.dtype == np.uint8: + # Note when the dtype of `img` is not the default `np.uint8` + # (e.g. np.float32), the value in `colored_img` got from cv2 + # is not guaranteed to be in range [0, 255], so here clip + # is needed. + colored_img = np.clip(colored_img, 0, 255) + return colored_img.astype(img.dtype) + + +def imequalize(img): + """Equalize the image histogram. + + This function applies a non-linear mapping to the input image, + in order to create a uniform distribution of grayscale values + in the output image. + + Args: + img (ndarray): Image to be equalized. + + Returns: + ndarray: The equalized image. 
+ """ + + def _scale_channel(im, c): + """Scale the data in the corresponding channel.""" + im = im[:, :, c] + # Compute the histogram of the image channel. + histo = np.histogram(im, 256, (0, 255))[0] + # For computing the step, filter out the nonzeros. + nonzero_histo = histo[histo > 0] + step = (np.sum(nonzero_histo) - nonzero_histo[-1]) // 255 + if not step: + lut = np.array(range(256)) + else: + # Compute the cumulative sum, shifted by step // 2 + # and then normalized by step. + lut = (np.cumsum(histo) + (step // 2)) // step + # Shift lut, prepending with 0. + lut = np.concatenate([[0], lut[:-1]], 0) + # handle potential integer overflow + lut[lut > 255] = 255 + # If step is zero, return the original image. + # Otherwise, index from lut. + return np.where(np.equal(step, 0), im, lut[im]) + + # Scales each channel independently and then stacks + # the result. + s1 = _scale_channel(img, 0) + s2 = _scale_channel(img, 1) + s3 = _scale_channel(img, 2) + equalized_img = np.stack([s1, s2, s3], axis=-1) + return equalized_img.astype(img.dtype) + + +def adjust_brightness(img, factor=1., backend=None): + """Adjust image brightness. + + This function controls the brightness of an image. An + enhancement factor of 0.0 gives a black image. + A factor of 1.0 gives the original image. This function + blends the source image and the degenerated black image: + + .. math:: + output = img * factor + degenerated * (1 - factor) + + Args: + img (ndarray): Image to be brightened. + factor (float): A value controls the enhancement. + Factor 1.0 returns the original image, lower + factors mean less color (brightness, contrast, + etc), and higher values more. Default 1. + backend (str | None): The image processing backend type. Options are + `cv2`, `pillow`, `None`. If backend is None, the global + ``imread_backend`` specified by ``mmcv.use_backend()`` will be + used. Defaults to None. + + Returns: + ndarray: The brightened image. 
+ """ + if backend is None: + backend = imread_backend + if backend not in ['cv2', 'pillow']: + raise ValueError(f'backend: {backend} is not supported.' + f"Supported backends are 'cv2', 'pillow'") + + if backend == 'pillow': + assert img.dtype == np.uint8, 'Pillow backend only support uint8 type' + # Image.fromarray defaultly supports RGB, not BGR. + pil_image = Image.fromarray(img[..., ::-1], mode='RGB') + enhancer = ImageEnhance.Brightness(pil_image) + pil_image = enhancer.enhance(factor) + return np.array(pil_image, dtype=img.dtype)[..., ::-1] + else: + degenerated = np.zeros_like(img) + # Note manually convert the dtype to np.float32, to + # achieve as close results as PIL.ImageEnhance.Brightness. + # Set beta=1-factor, and gamma=0 + brightened_img = cv2.addWeighted( + img.astype(np.float32), factor, degenerated.astype(np.float32), + 1 - factor, 0) + brightened_img = np.clip(brightened_img, 0, 255) + return brightened_img.astype(img.dtype) + + +def adjust_contrast(img, factor=1., backend=None): + """Adjust image contrast. + + This function controls the contrast of an image. An + enhancement factor of 0.0 gives a solid grey + image. A factor of 1.0 gives the original image. It + blends the source image and the degenerated mean image: + + .. math:: + output = img * factor + degenerated * (1 - factor) + + Args: + img (ndarray): Image to be contrasted. BGR order. + factor (float): Same as :func:`mmcv.adjust_brightness`. + backend (str | None): The image processing backend type. Options are + `cv2`, `pillow`, `None`. If backend is None, the global + ``imread_backend`` specified by ``mmcv.use_backend()`` will be + used. Defaults to None. + + Returns: + ndarray: The contrasted image. + """ + if backend is None: + backend = imread_backend + if backend not in ['cv2', 'pillow']: + raise ValueError(f'backend: {backend} is not supported.' 
+ f"Supported backends are 'cv2', 'pillow'") + + if backend == 'pillow': + assert img.dtype == np.uint8, 'Pillow backend only support uint8 type' + # Image.fromarray defaultly supports RGB, not BGR. + pil_image = Image.fromarray(img[..., ::-1], mode='RGB') + enhancer = ImageEnhance.Contrast(pil_image) + pil_image = enhancer.enhance(factor) + return np.array(pil_image, dtype=img.dtype)[..., ::-1] + else: + gray_img = bgr2gray(img) + hist = np.histogram(gray_img, 256, (0, 255))[0] + mean = round(np.sum(gray_img) / np.sum(hist)) + degenerated = (np.ones_like(img[..., 0]) * mean).astype(img.dtype) + degenerated = gray2bgr(degenerated) + contrasted_img = cv2.addWeighted( + img.astype(np.float32), factor, degenerated.astype(np.float32), + 1 - factor, 0) + contrasted_img = np.clip(contrasted_img, 0, 255) + return contrasted_img.astype(img.dtype) + + +def auto_contrast(img, cutoff=0): + """Auto adjust image contrast. + + This function maximize (normalize) image contrast by first removing cutoff + percent of the lightest and darkest pixels from the histogram and remapping + the image so that the darkest pixel becomes black (0), and the lightest + becomes white (255). + + Args: + img (ndarray): Image to be contrasted. BGR order. + cutoff (int | float | tuple): The cutoff percent of the lightest and + darkest pixels to be removed. If given as tuple, it shall be + (low, high). Otherwise, the single value will be used for both. + Defaults to 0. + + Returns: + ndarray: The contrasted image. + """ + + def _auto_contrast_channel(im, c, cutoff): + im = im[:, :, c] + # Compute the histogram of the image channel. 
+ histo = np.histogram(im, 256, (0, 255))[0] + # Remove cut-off percent pixels from histo + histo_sum = np.cumsum(histo) + cut_low = histo_sum[-1] * cutoff[0] // 100 + cut_high = histo_sum[-1] - histo_sum[-1] * cutoff[1] // 100 + histo_sum = np.clip(histo_sum, cut_low, cut_high) - cut_low + histo = np.concatenate([[histo_sum[0]], np.diff(histo_sum)], 0) + + # Compute mapping + low, high = np.nonzero(histo)[0][0], np.nonzero(histo)[0][-1] + # If all the values have been cut off, return the origin img + if low >= high: + return im + scale = 255.0 / (high - low) + offset = -low * scale + lut = np.array(range(256)) + lut = lut * scale + offset + lut = np.clip(lut, 0, 255) + return lut[im] + + if isinstance(cutoff, (int, float)): + cutoff = (cutoff, cutoff) + else: + assert isinstance(cutoff, tuple), 'cutoff must be of type int, ' \ + f'float or tuple, but got {type(cutoff)} instead.' + # Auto adjusts contrast for each channel independently and then stacks + # the result. + s1 = _auto_contrast_channel(img, 0, cutoff) + s2 = _auto_contrast_channel(img, 1, cutoff) + s3 = _auto_contrast_channel(img, 2, cutoff) + contrasted_img = np.stack([s1, s2, s3], axis=-1) + return contrasted_img.astype(img.dtype) + + +def adjust_sharpness(img, factor=1., kernel=None): + """Adjust image sharpness. + + This function controls the sharpness of an image. An + enhancement factor of 0.0 gives a blurred image. A + factor of 1.0 gives the original image. And a factor + of 2.0 gives a sharpened image. It blends the source + image and the degenerated mean image: + + .. math:: + output = img * factor + degenerated * (1 - factor) + + Args: + img (ndarray): Image to be sharpened. BGR order. + factor (float): Same as :func:`mmcv.adjust_brightness`. + kernel (np.ndarray, optional): Filter kernel to be applied on the img + to obtain the degenerated img. Defaults to None. + + Note: + No value sanity check is enforced on the kernel set by users. 
So with + an inappropriate kernel, the ``adjust_sharpness`` may fail to perform + the function its name indicates but end up performing whatever + transform determined by the kernel. + + Returns: + ndarray: The sharpened image. + """ + + if kernel is None: + # adopted from PIL.ImageFilter.SMOOTH + kernel = np.array([[1., 1., 1.], [1., 5., 1.], [1., 1., 1.]]) / 13 + assert isinstance(kernel, np.ndarray), \ + f'kernel must be of type np.ndarray, but got {type(kernel)} instead.' + assert kernel.ndim == 2, \ + f'kernel must have a dimension of 2, but got {kernel.ndim} instead.' + + degenerated = cv2.filter2D(img, -1, kernel) + sharpened_img = cv2.addWeighted( + img.astype(np.float32), factor, degenerated.astype(np.float32), + 1 - factor, 0) + sharpened_img = np.clip(sharpened_img, 0, 255) + return sharpened_img.astype(img.dtype) + + +def adjust_lighting(img, eigval, eigvec, alphastd=0.1, to_rgb=True): + """AlexNet-style PCA jitter. + + This data augmentation is proposed in `ImageNet Classification with Deep + Convolutional Neural Networks + `_. + + Args: + img (ndarray): Image to be adjusted lighting. BGR order. + eigval (ndarray): the eigenvalue of the convariance matrix of pixel + values, respectively. + eigvec (ndarray): the eigenvector of the convariance matrix of pixel + values, respectively. + alphastd (float): The standard deviation for distribution of alpha. + Defaults to 0.1 + to_rgb (bool): Whether to convert img to rgb. + + Returns: + ndarray: The adjusted image. + """ + assert isinstance(eigval, np.ndarray) and isinstance(eigvec, np.ndarray), \ + f'eigval and eigvec should both be of type np.ndarray, got ' \ + f'{type(eigval)} and {type(eigvec)} instead.' + + assert eigval.ndim == 1 and eigvec.ndim == 2 + assert eigvec.shape == (3, eigval.shape[0]) + n_eigval = eigval.shape[0] + assert isinstance(alphastd, float), 'alphastd should be of type float, ' \ + f'got {type(alphastd)} instead.' 
def lut_transform(img, lut_table):
    """Map image pixel values through a 256-entry look-up table.

    Each output pixel is ``lut_table[pixel]``: the input values are used
    as indices into the table (backed by ``cv2.LUT``). In case of a
    multi-channel input array, the table should either have a single
    channel (the same table is then used for all channels) or the same
    number of channels as the input.

    Args:
        img (ndarray): Image to be transformed; values must lie in
            [0, 255] since they index the table.
        lut_table (ndarray): Look-up table of 256 elements.

    Returns:
        ndarray: The transformed image.
    """
    assert isinstance(img, np.ndarray)
    assert 0 <= np.min(img) and np.max(img) <= 255
    assert isinstance(lut_table, np.ndarray)
    assert lut_table.shape == (256, )

    indices = np.array(img, dtype=np.uint8)
    return cv2.LUT(indices, lut_table)
+ """ + assert isinstance(img, np.ndarray) + assert img.ndim == 2 + assert isinstance(clip_limit, (float, int)) + assert is_tuple_of(tile_grid_size, int) + assert len(tile_grid_size) == 2 + + clahe = cv2.createCLAHE(clip_limit, tile_grid_size) + return clahe.apply(np.array(img, dtype=np.uint8)) + + +def adjust_hue(img: np.ndarray, + hue_factor: float, + backend: Optional[str] = None) -> np.ndarray: + """Adjust hue of an image. + + The image hue is adjusted by converting the image to HSV and cyclically + shifting the intensities in the hue channel (H). The image is then + converted back to original image mode. + + `hue_factor` is the amount of shift in H channel and must be in the + interval `[-0.5, 0.5]`. + + Modified from + https://github.com/pytorch/vision/blob/main/torchvision/ + transforms/functional.py + + Args: + img (ndarray): Image to be adjusted. + hue_factor (float): How much to shift the hue channel. Should be in + [-0.5, 0.5]. 0.5 and -0.5 give complete reversal of hue channel in + HSV space in positive and negative direction respectively. + 0 means no shift. Therefore, both -0.5 and 0.5 will give an image + with complementary colors while 0 gives the original image. + backend (str | None): The image processing backend type. Options are + `cv2`, `pillow`, `None`. If backend is None, the global + ``imread_backend`` specified by ``mmcv.use_backend()`` will be + used. Defaults to None. + + Returns: + ndarray: Hue adjusted image. + """ + if backend is None: + backend = imread_backend + if backend not in ['cv2', 'pillow']: + raise ValueError(f'backend: {backend} is not supported.' 
+ f"Supported backends are 'cv2', 'pillow'") + + if not (-0.5 <= hue_factor <= 0.5): + raise ValueError(f'hue_factor:{hue_factor} is not in [-0.5, 0.5].') + if not (isinstance(img, np.ndarray) and (img.ndim in {2, 3})): + raise TypeError('img should be ndarray with dim=[2 or 3].') + + if backend == 'pillow': + assert img.dtype == np.uint8, 'Pillow backend only support uint8 type' + # Image.fromarray defaultly supports RGB, not BGR. + pil_image = Image.fromarray(img[..., ::-1], mode='RGB') + input_mode = pil_image.mode + if input_mode in {'L', '1', 'I', 'F'}: + return pil_image + + h, s, v = pil_image.convert('HSV').split() + + np_h = np.array(h, dtype=np.uint8) + # uint8 addition take cares of rotation across boundaries + with np.errstate(over='ignore'): + np_h += np.uint8(hue_factor * 255) + h = Image.fromarray(np_h, 'L') + + pil_image = Image.merge('HSV', (h, s, v)).convert(input_mode) + return np.array(pil_image, dtype=img.dtype)[..., ::-1] + else: + dtype = img.dtype + img = img.astype(np.uint8) + hsv_img = cv2.cvtColor(img, cv2.COLOR_BGR2HSV_FULL) + h, s, v = cv2.split(hsv_img) + h = h.astype(np.uint8) + # uint8 addition take cares of rotation across boundaries + with np.errstate(over='ignore'): + h += np.uint8(hue_factor * 255) + hsv_img = cv2.merge([h, s, v]) + return cv2.cvtColor(hsv_img, cv2.COLOR_HSV2BGR_FULL).astype(dtype) diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/__init__.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..b4081ed332108d7dd0fe8713e2cf9800ffca9469 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/__init__.py @@ -0,0 +1,103 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from .active_rotated_filter import active_rotated_filter +from .assign_score_withk import assign_score_withk +from .ball_query import ball_query +from .bbox import bbox_overlaps +from .bezier_align import BezierAlign, bezier_align +from .border_align import BorderAlign, border_align +from .box_iou_quadri import box_iou_quadri +from .box_iou_rotated import box_iou_rotated +from .carafe import CARAFE, CARAFENaive, CARAFEPack, carafe, carafe_naive +from .cc_attention import CrissCrossAttention +from .chamfer_distance import chamfer_distance +from .contour_expand import contour_expand +from .convex_iou import convex_giou, convex_iou +from .corner_pool import CornerPool +from .correlation import Correlation +from .deform_conv import DeformConv2d, DeformConv2dPack, deform_conv2d +from .deform_roi_pool import (DeformRoIPool, DeformRoIPoolPack, + ModulatedDeformRoIPoolPack, deform_roi_pool) +from .deprecated_wrappers import Conv2d_deprecated as Conv2d +from .deprecated_wrappers import ConvTranspose2d_deprecated as ConvTranspose2d +from .deprecated_wrappers import Linear_deprecated as Linear +from .deprecated_wrappers import MaxPool2d_deprecated as MaxPool2d +from .diff_iou_rotated import diff_iou_rotated_2d, diff_iou_rotated_3d +from .focal_loss import (SigmoidFocalLoss, SoftmaxFocalLoss, + sigmoid_focal_loss, softmax_focal_loss) +from .furthest_point_sample import (furthest_point_sample, + furthest_point_sample_with_dist) +from .fused_bias_leakyrelu import FusedBiasLeakyReLU, fused_bias_leakyrelu +from .gather_points import gather_points +from .group_points import GroupAll, QueryAndGroup, grouping_operation +from .info import get_compiler_version, get_compiling_cuda_version +from .iou3d import (boxes_iou3d, boxes_iou_bev, boxes_overlap_bev, nms3d, + nms3d_normal, nms_bev, nms_normal_bev) +from .knn import knn +from .masked_conv import MaskedConv2d, masked_conv2d +from .min_area_polygons import min_area_polygons +from .modulated_deform_conv import (ModulatedDeformConv2d, 
+ ModulatedDeformConv2dPack, + modulated_deform_conv2d) +from .multi_scale_deform_attn import MultiScaleDeformableAttention +from .nms import batched_nms, nms, nms_match, nms_quadri, nms_rotated, soft_nms +from .pixel_group import pixel_group +from .point_sample import (SimpleRoIAlign, point_sample, + rel_roi_point_to_rel_img_point) +from .points_in_boxes import (points_in_boxes_all, points_in_boxes_cpu, + points_in_boxes_part) +from .points_in_polygons import points_in_polygons +from .points_sampler import PointsSampler +from .prroi_pool import PrRoIPool, prroi_pool +from .psa_mask import PSAMask +from .riroi_align_rotated import RiRoIAlignRotated, riroi_align_rotated +from .roi_align import RoIAlign, roi_align +from .roi_align_rotated import RoIAlignRotated, roi_align_rotated +from .roi_pool import RoIPool, roi_pool +from .roiaware_pool3d import RoIAwarePool3d +from .roipoint_pool3d import RoIPointPool3d +from .rotated_feature_align import rotated_feature_align +from .saconv import SAConv2d +from .scatter_points import DynamicScatter, dynamic_scatter +# from .sparse_conv import (SparseConv2d, SparseConv3d, SparseConvTranspose2d, +# SparseConvTranspose3d, SparseInverseConv2d, +# SparseInverseConv3d, SubMConv2d, SubMConv3d) +# from .sparse_modules import SparseModule, SparseSequential +# from .sparse_pool import SparseMaxPool2d, SparseMaxPool3d +# from .sparse_structure import SparseConvTensor, scatter_nd +from .sync_bn import SyncBatchNorm +from .three_interpolate import three_interpolate +from .three_nn import three_nn +from .tin_shift import TINShift, tin_shift +from .upfirdn2d import upfirdn2d +from .voxelize import Voxelization, voxelization + +__all__ = [ + 'bbox_overlaps', 'CARAFE', 'CARAFENaive', 'CARAFEPack', 'carafe', + 'carafe_naive', 'CornerPool', 'DeformConv2d', 'DeformConv2dPack', + 'deform_conv2d', 'DeformRoIPool', 'DeformRoIPoolPack', + 'ModulatedDeformRoIPoolPack', 'deform_roi_pool', 'SigmoidFocalLoss', + 'SoftmaxFocalLoss', 'sigmoid_focal_loss', 
'softmax_focal_loss', + 'get_compiler_version', 'get_compiling_cuda_version', 'MaskedConv2d', + 'masked_conv2d', 'ModulatedDeformConv2d', 'ModulatedDeformConv2dPack', + 'modulated_deform_conv2d', 'batched_nms', 'nms', 'soft_nms', 'nms_match', + 'RoIAlign', 'roi_align', 'RoIPool', 'roi_pool', 'SyncBatchNorm', 'Conv2d', + 'ConvTranspose2d', 'Linear', 'MaxPool2d', 'CrissCrossAttention', 'PSAMask', + 'point_sample', 'rel_roi_point_to_rel_img_point', 'SimpleRoIAlign', + 'SAConv2d', 'TINShift', 'tin_shift', 'assign_score_withk', + 'box_iou_rotated', 'box_iou_quadri', 'RoIPointPool3d', 'nms_rotated', + 'knn', 'ball_query', 'upfirdn2d', 'FusedBiasLeakyReLU', + 'fused_bias_leakyrelu', 'rotated_feature_align', 'RiRoIAlignRotated', + 'riroi_align_rotated', 'RoIAlignRotated', 'roi_align_rotated', + 'pixel_group', 'QueryAndGroup', 'GroupAll', 'grouping_operation', + 'contour_expand', 'three_nn', 'three_interpolate', + 'MultiScaleDeformableAttention', 'BorderAlign', 'border_align', + 'gather_points', 'furthest_point_sample', 'nms_quadri', + 'furthest_point_sample_with_dist', 'PointsSampler', 'Correlation', + 'boxes_iou3d', 'boxes_iou_bev', 'boxes_overlap_bev', 'nms_bev', + 'nms_normal_bev', 'nms3d', 'nms3d_normal', 'Voxelization', 'voxelization', + 'dynamic_scatter', 'DynamicScatter', 'RoIAwarePool3d', 'points_in_boxes_part', + 'points_in_boxes_cpu', 'points_in_boxes_all', 'points_in_polygons', + 'min_area_polygons', 'active_rotated_filter', 'convex_iou', 'convex_giou', + 'diff_iou_rotated_2d', 'diff_iou_rotated_3d', 'chamfer_distance', + 'PrRoIPool', 'prroi_pool', 'BezierAlign', 'bezier_align' +] diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/active_rotated_filter.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/active_rotated_filter.py new file mode 100644 index 0000000000000000000000000000000000000000..b8ba43dd41cca14e0d74b4ba7dd8316da2ba4abe --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/active_rotated_filter.py @@ -0,0 +1,64 @@ +# Copyright (c) OpenMMLab. 
class ActiveRotatedFilterFunction(Function):
    """Encode orientation information and generate orientation-sensitive
    features by rotating filter weights.

    Described in the paper `Align Deep Features for Oriented Object
    Detection <https://arxiv.org/abs/2008.09397>`_.
    """

    @staticmethod
    def forward(ctx, input: torch.Tensor,
                indices: torch.Tensor) -> torch.Tensor:
        """
        Args:
            input (torch.Tensor): Input features with shape
                [num_output_planes, num_input_planes, num_orientations, H,
                W].
            indices (torch.Tensor): Indices with shape
                [num_orientations, H, W, num_rotations].

        Returns:
            torch.Tensor: Refined features with shape [num_output_planes *
            num_rotations, num_input_planes * num_orientations, H, W].
        """
        ctx.save_for_backward(input, indices)
        num_out, num_in, _, _, _ = input.size()
        # The spatial size of the output follows the indices tensor.
        num_orient, feat_h, feat_w, num_rot = indices.size()
        output = input.new_zeros(
            (num_out * num_rot, num_in * num_orient, feat_h, feat_w))
        ext_module.active_rotated_filter_forward(input, indices, output)
        return output

    @staticmethod
    @once_differentiable
    def backward(ctx, grad_out: torch.Tensor) -> Tuple[torch.Tensor, None]:
        """
        Args:
            grad_out (torch.Tensor): Gradient of the output features with
                shape [num_output_planes * num_rotations, num_input_planes
                * num_orientations, H, W].

        Returns:
            tuple: Gradient of the input features (shape
            [num_output_planes, num_input_planes, num_orientations, H, W])
            and ``None`` for the non-differentiable ``indices`` argument.
        """
        input, indices = ctx.saved_tensors
        grad_in = torch.zeros_like(input)
        ext_module.active_rotated_filter_backward(grad_out, indices, grad_in)
        return grad_in, None


active_rotated_filter = ActiveRotatedFilterFunction.apply
+ point_features (torch.Tensor): (B, N, M, out_dim) + Pre-computed point features to be aggregated. + center_features (torch.Tensor): (B, N, M, out_dim) + Pre-computed center features to be aggregated. + knn_idx (torch.Tensor): (B, npoint, K), index of sampled kNN. + We assume the first idx in each row is the idx of the center. + aggregate (str, optional): Aggregation method. + Can be 'sum', 'avg' or 'max'. Defaults: 'sum'. + + Returns: + torch.Tensor: (B, out_dim, npoint, K), the aggregated features. + """ + agg = {'sum': 0, 'avg': 1, 'max': 2} + + B, N, M, out_dim = point_features.size() + _, npoint, K, _ = scores.size() + + output = point_features.new_zeros((B, out_dim, npoint, K)) + ext_module.assign_score_withk_forward( + point_features.contiguous(), + center_features.contiguous(), + scores.contiguous(), + knn_idx.contiguous(), + output, + B=B, + N0=N, + N1=npoint, + M=M, + K=K, + O=out_dim, + aggregate=agg[aggregate]) + + ctx.save_for_backward(output, point_features, center_features, scores, + knn_idx) + ctx.agg = agg[aggregate] + + return output + + @staticmethod + def backward( + ctx, grad_out: torch.Tensor + ) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, None, None]: + """ + Args: + grad_out (torch.Tensor): (B, out_dim, npoint, K) + + Returns: + tuple[torch.Tensor]: A tuple contains five elements. The first one + is the gradient of ``scores`` whose shape is (B, npoint, K, M). The + second is the gradient of ``point_features`` whose shape is + (B, N, M, out_dim). The third is the gradient of + ``center_features`` with the shape of (B, N, M, out_dim). The last + two are ``None``. 
+ """ + _, point_features, center_features, scores, knn_idx = ctx.saved_tensors + + agg = ctx.agg + + B, N, M, out_dim = point_features.size() + _, npoint, K, _ = scores.size() + + grad_point_features = point_features.new_zeros(point_features.shape) + grad_center_features = center_features.new_zeros(center_features.shape) + grad_scores = scores.new_zeros(scores.shape) + + ext_module.assign_score_withk_backward( + grad_out.contiguous(), + point_features.contiguous(), + center_features.contiguous(), + scores.contiguous(), + knn_idx.contiguous(), + grad_point_features, + grad_center_features, + grad_scores, + B=B, + N0=N, + N1=npoint, + M=M, + K=K, + O=out_dim, + aggregate=agg) + + return grad_scores, grad_point_features, \ + grad_center_features, None, None + + +assign_score_withk = AssignScoreWithK.apply diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/ball_query.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/ball_query.py new file mode 100644 index 0000000000000000000000000000000000000000..a89b36b52b1cce8ab90274418a4d1346796d971c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/ball_query.py @@ -0,0 +1,87 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import Optional, Tuple + +import torch +from torch.autograd import Function + +from ..utils import ext_loader + +ext_module = ext_loader.load_ext( + '_ext', ['ball_query_forward', 'stack_ball_query_forward']) + + +class BallQuery(Function): + """Find nearby points in spherical space.""" + + @staticmethod + def forward( + ctx, + min_radius: float, + max_radius: float, + sample_num: int, + xyz: torch.Tensor, + center_xyz: torch.Tensor, + xyz_batch_cnt: Optional[torch.Tensor] = None, + center_xyz_batch_cnt: Optional[torch.Tensor] = None + ) -> torch.Tensor: + """ + Args: + min_radius (float): minimum radius of the balls. + max_radius (float): maximum radius of the balls. + sample_num (int): maximum number of features in the balls. 
+ xyz (torch.Tensor): (B, N, 3) xyz coordinates of the features, + or staked input (N1 + N2 ..., 3). + center_xyz (torch.Tensor): (B, npoint, 3) centers of the ball + query, or staked input (M1 + M2 ..., 3). + xyz_batch_cnt: (batch_size): Stacked input xyz coordinates nums in + each batch, just like (N1, N2, ...). Defaults to None. + New in version 1.7.0. + center_xyz_batch_cnt: (batch_size): Stacked centers coordinates + nums in each batch, just line (M1, M2, ...). Defaults to None. + New in version 1.7.0. + + Returns: + torch.Tensor: (B, npoint, nsample) tensor with the indices of the + features that form the query balls. + """ + assert center_xyz.is_contiguous() + assert xyz.is_contiguous() + assert min_radius < max_radius + if xyz_batch_cnt is not None and center_xyz_batch_cnt is not None: + assert xyz_batch_cnt.dtype == torch.int + assert center_xyz_batch_cnt.dtype == torch.int + idx = center_xyz.new_zeros((center_xyz.shape[0], sample_num), + dtype=torch.int32) + ext_module.stack_ball_query_forward( + center_xyz, + center_xyz_batch_cnt, + xyz, + xyz_batch_cnt, + idx, + max_radius=max_radius, + nsample=sample_num, + ) + else: + B, N, _ = xyz.size() + npoint = center_xyz.size(1) + idx = xyz.new_zeros(B, npoint, sample_num, dtype=torch.int32) + ext_module.ball_query_forward( + center_xyz, + xyz, + idx, + b=B, + n=N, + m=npoint, + min_radius=min_radius, + max_radius=max_radius, + nsample=sample_num) + if torch.__version__ != 'parrots': + ctx.mark_non_differentiable(idx) + return idx + + @staticmethod + def backward(ctx, a=None) -> Tuple[None, None, None, None]: + return None, None, None, None + + +ball_query = BallQuery.apply diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/bbox.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/bbox.py new file mode 100644 index 0000000000000000000000000000000000000000..bf6bd43bbb0adcb4b6d104a815f73ed2e5912069 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/bbox.py @@ -0,0 +1,130 @@ +# Copyright (c) OpenMMLab. 
def _bbox_overlaps_cpu(bboxes1: torch.Tensor,
                       bboxes2: torch.Tensor,
                       mode: str = 'iou',
                       aligned: bool = False,
                       offset: int = 0) -> torch.Tensor:
    """Pure-PyTorch overlap computation used for CPU tensors.

    Args:
        bboxes1 (torch.Tensor): shape (m, 4) in <x1, y1, x2, y2> format.
        bboxes2 (torch.Tensor): shape (n, 4) in <x1, y1, x2, y2> format.
        mode (str): "iou" (intersection over union) or "iof" (intersection
            over foreground, i.e. normalized by the area of ``bboxes1``).
        aligned (bool): If True, compute overlaps of aligned pairs
            (requires m == n); otherwise compute all m x n pairs.
        offset (int): 0 or 1, added to width/height when computing areas.

    Returns:
        torch.Tensor: shape (m,) if ``aligned`` else (m, n).
    """
    assert mode in ['iou', 'iof']

    if aligned:
        lt = torch.max(bboxes1[:, :2], bboxes2[:, :2])  # [rows, 2]
        rb = torch.min(bboxes1[:, 2:], bboxes2[:, 2:])  # [rows, 2]

        wh = (rb - lt + offset).clamp(min=0)  # [rows, 2]
        overlap = wh[:, 0] * wh[:, 1]
        area1 = (bboxes1[:, 2] - bboxes1[:, 0] + offset) * (
            bboxes1[:, 3] - bboxes1[:, 1] + offset)

        if mode == 'iou':
            area2 = (bboxes2[:, 2] - bboxes2[:, 0] + offset) * (
                bboxes2[:, 3] - bboxes2[:, 1] + offset)
            ious = overlap / (area1 + area2 - overlap)
        else:
            ious = overlap / area1
    else:
        lt = torch.max(bboxes1[:, None, :2], bboxes2[:, :2])  # [rows, cols, 2]
        rb = torch.min(bboxes1[:, None, 2:], bboxes2[:, 2:])  # [rows, cols, 2]

        wh = (rb - lt + offset).clamp(min=0)  # [rows, cols, 2]
        overlap = wh[:, :, 0] * wh[:, :, 1]
        area1 = (bboxes1[:, 2] - bboxes1[:, 0] + offset) * (
            bboxes1[:, 3] - bboxes1[:, 1] + offset)

        if mode == 'iou':
            area2 = (bboxes2[:, 2] - bboxes2[:, 0] + offset) * (
                bboxes2[:, 3] - bboxes2[:, 1] + offset)
            ious = overlap / (area1[:, None] + area2 - overlap)
        else:
            ious = overlap / (area1[:, None])

    return ious


def bbox_overlaps(bboxes1: torch.Tensor,
                  bboxes2: torch.Tensor,
                  mode: str = 'iou',
                  aligned: bool = False,
                  offset: int = 0) -> torch.Tensor:
    """Calculate overlap between two set of bboxes.

    If ``aligned`` is ``False``, then calculate the ious between each bbox
    of bboxes1 and bboxes2, otherwise the ious between each aligned pair of
    bboxes1 and bboxes2.

    Args:
        bboxes1 (torch.Tensor): shape (m, 4) in <x1, y1, x2, y2> format or
            empty.
        bboxes2 (torch.Tensor): shape (n, 4) in <x1, y1, x2, y2> format or
            empty. If aligned is ``True``, then m and n must be equal.
        mode (str): "iou" (intersection over union) or "iof" (intersection
            over foreground).
        aligned (bool): Whether to compute only the overlaps of aligned
            pairs. Defaults to False.
        offset (int): 0 or 1, added to width/height when computing areas.
            Defaults to 0.

    Returns:
        torch.Tensor: Return the ious betweens boxes. If ``aligned`` is
        ``False``, the shape of ious is (m, n) else (m, 1).

    Example:
        >>> bboxes1 = torch.FloatTensor([
        >>>     [0, 0, 10, 10],
        >>>     [10, 10, 20, 20],
        >>>     [32, 32, 38, 42],
        >>> ])
        >>> bboxes2 = torch.FloatTensor([
        >>>     [0, 0, 10, 20],
        >>>     [0, 10, 10, 19],
        >>>     [10, 10, 20, 20],
        >>> ])
        >>> bbox_overlaps(bboxes1, bboxes2)
        tensor([[0.5000, 0.0000, 0.0000],
                [0.0000, 0.0000, 1.0000],
                [0.0000, 0.0000, 0.0000]])

    Example:
        >>> empty = torch.FloatTensor([])
        >>> nonempty = torch.FloatTensor([
        >>>     [0, 0, 10, 9],
        >>> ])
        >>> assert tuple(bbox_overlaps(empty, nonempty).shape) == (0, 1)
        >>> assert tuple(bbox_overlaps(nonempty, empty).shape) == (1, 0)
        >>> assert tuple(bbox_overlaps(empty, empty).shape) == (0, 0)
    """
    mode_dict = {'iou': 0, 'iof': 1}
    assert mode in mode_dict
    mode_flag = mode_dict[mode]
    # Either the boxes are empty or the length of boxes' last dimension is 4
    assert (bboxes1.size(-1) == 4 or bboxes1.size(0) == 0)
    assert (bboxes2.size(-1) == 4 or bboxes2.size(0) == 0)
    assert offset == 1 or offset == 0

    rows = bboxes1.size(0)
    cols = bboxes2.size(0)
    if aligned:
        assert rows == cols

    if rows * cols == 0:
        # Result has zero elements either way, so the uninitialized values
        # of ``.new()`` are never observed; only the shape matters here.
        return bboxes1.new(rows, 1) if aligned else bboxes1.new(rows, cols)

    if bboxes1.device.type == 'cpu':
        return _bbox_overlaps_cpu(
            bboxes1, bboxes2, mode=mode, aligned=aligned, offset=offset)
    else:
        if aligned:
            ious = bboxes1.new_zeros(rows)
        else:
            ious = bboxes1.new_zeros((rows, cols))
        ext_module.bbox_overlaps(
            bboxes1,
            bboxes2,
            ious,
            mode=mode_flag,
            aligned=aligned,
            offset=offset)
        return ious
class BezierAlignFunction(Function):
    """Autograd function backing :class:`BezierAlign`."""

    @staticmethod
    def forward(ctx,
                input: torch.Tensor,
                beziers: torch.Tensor,
                output_size: Union[int, Tuple[int, int]],
                spatial_scale: Union[int, float] = 1.0,
                sampling_ratio: int = 0,
                aligned: bool = True) -> torch.Tensor:
        """Pool features from ``input`` for every bezier region.

        Args:
            input (torch.Tensor): Input feature map.
            beziers (torch.Tensor): Bezier regions with 17 values per row
                (batch index plus 8 control points).
            output_size (int | tuple): Output (h, w).
            spatial_scale (int | float): Scale applied to the regions.
            sampling_ratio (int): Samples per output bin; 0 means dense.
            aligned (bool): Use the pixel-aligned sampling variant.

        Returns:
            torch.Tensor: Pooled features of shape (n, C, h, w).
        """
        ctx.output_size = _pair(output_size)
        ctx.spatial_scale = spatial_scale
        ctx.input_shape = input.size()
        ctx.sampling_ratio = sampling_ratio
        ctx.aligned = aligned

        assert beziers.size(1) == 17
        out_h, out_w = ctx.output_size
        output = input.new_zeros(
            (beziers.size(0), input.size(1), out_h, out_w))
        ext_module.bezier_align_forward(
            input,
            beziers,
            output,
            aligned_height=out_h,
            aligned_width=out_w,
            spatial_scale=ctx.spatial_scale,
            sampling_ratio=ctx.sampling_ratio,
            aligned=ctx.aligned)

        ctx.save_for_backward(beziers)
        return output

    @staticmethod
    @once_differentiable
    def backward(ctx, grad_output: torch.Tensor):
        """Propagate pooled gradients back to the input feature map."""
        beziers, = ctx.saved_tensors
        grad_input = grad_output.new_zeros(ctx.input_shape)
        # Complex head architectures may yield non-contiguous gradients.
        grad_output = grad_output.contiguous()
        ext_module.bezier_align_backward(
            grad_output,
            beziers,
            grad_input,
            aligned_height=ctx.output_size[0],
            aligned_width=ctx.output_size[1],
            spatial_scale=ctx.spatial_scale,
            sampling_ratio=ctx.sampling_ratio,
            aligned=ctx.aligned)
        return grad_input, None, None, None, None, None


bezier_align = BezierAlignFunction.apply
class BezierAlign(nn.Module):
    """Bezier align pooling layer.

    Args:
        output_size (tuple): h, w.
        spatial_scale (float): Scale the input boxes by this number.
        sampling_ratio (int): Number of input samples to take for each
            output sample. 0 to take samples densely for current models.
        aligned (bool): If False, use the legacy implementation in
            MMDetection. If True, align the results more perfectly.

    Note:
        The implementation of BezierAlign is modified from
        https://github.com/aim-uofa/AdelaiDet

        The meaning of aligned=True:

        Given a continuous coordinate c, its two neighboring pixel
        indices (in our pixel model) are computed by floor(c - 0.5) and
        ceil(c - 0.5). For example, c=1.3 has pixel neighbors with discrete
        indices [0] and [1] (which are sampled from the underlying signal
        at continuous coordinates 0.5 and 1.5). But the original roi_align
        (aligned=False) does not subtract the 0.5 when computing
        neighboring pixel indices and therefore it uses pixels with a
        slightly incorrect alignment (relative to our pixel model) when
        performing bilinear interpolation.

        With `aligned=True`,
        we first appropriately scale the ROI and then shift it by -0.5
        prior to calling roi_align. This produces the correct neighbors;

        The difference does not make a difference to the model's
        performance if ROIAlign is used together with conv layers.
    """

    def __init__(
        self,
        output_size: Tuple,
        spatial_scale: Union[int, float],
        sampling_ratio: int,
        aligned: bool = True,
    ) -> None:
        super().__init__()

        self.output_size = _pair(output_size)
        self.spatial_scale = float(spatial_scale)
        self.sampling_ratio = int(sampling_ratio)
        self.aligned = aligned

    def forward(self, input: torch.Tensor,
                beziers: torch.Tensor) -> torch.Tensor:
        """BezierAlign forward.

        Args:
            input (Tensor): Input features.
            beziers (Tensor): Bezier regions to align, 17 values per row.

        Returns:
            Tensor: Pooled features.
        """
        return bezier_align(input, beziers, self.output_size,
                            self.spatial_scale, self.sampling_ratio,
                            self.aligned)

    def __repr__(self):
        # Bug fix: the original concatenated the fields without separators
        # and closed a parenthesis after every field, producing e.g.
        # 'BezierAlign(output_size=(7, 7), spatial_scale=0.5)
        # sampling_ratio=2)aligned=True)'.
        s = self.__class__.__name__
        s += f'(output_size={self.output_size}, '
        s += f'spatial_scale={self.spatial_scale}, '
        s += f'sampling_ratio={self.sampling_ratio}, '
        s += f'aligned={self.aligned})'
        return s
class BorderAlign(nn.Module):
    r"""Border align pooling layer.

    Applies border_align over the input feature based on predicted bboxes,
    as described in `BorderDet: Border Feature for Dense Object Detection
    <https://arxiv.org/abs/2007.11056>`_.

    For each border line (top, left, bottom or right) of each box,
    border_align:

    1. uniformly samples ``pool_size`` + 1 positions on the line,
       including the start and end points;
    2. computes the features at these positions by bilinear interpolation;
    3. max-pools over all ``pool_size`` + 1 positions to obtain the pooled
       feature.

    Args:
        pool_size (int): Number of positions sampled over the boxes'
            borders (top, bottom, left, right).
    """

    def __init__(self, pool_size: int):
        super().__init__()
        self.pool_size = pool_size

    def forward(self, input: torch.Tensor,
                boxes: torch.Tensor) -> torch.Tensor:
        """
        Args:
            input: Features with shape [N,4C,H,W]. Channels ranged in
                [0,C), [C,2C), [2C,3C), [3C,4C) represent the top, left,
                bottom, right features respectively.
            boxes: Boxes with shape [N,H*W,4]. Coordinate format
                (x1,y1,x2,y2).

        Returns:
            torch.Tensor: Pooled features with shape [N,C,H*W,4]. The
            order is (top,left,bottom,right) for the last dimension.
        """
        return border_align(input, boxes, self.pool_size)

    def __repr__(self):
        return f'{self.__class__.__name__}(pool_size={self.pool_size})'
+ """ + assert mode in ['iou', 'iof'] + mode_dict = {'iou': 0, 'iof': 1} + mode_flag = mode_dict[mode] + rows = bboxes1.size(0) + cols = bboxes2.size(0) + if aligned: + ious = bboxes1.new_zeros(rows) + else: + ious = bboxes1.new_zeros(rows * cols) + bboxes1 = bboxes1.contiguous() + bboxes2 = bboxes2.contiguous() + ext_module.box_iou_quadri( + bboxes1, bboxes2, ious, mode_flag=mode_flag, aligned=aligned) + if not aligned: + ious = ious.view(rows, cols) + return ious diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/box_iou_rotated.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/box_iou_rotated.py new file mode 100644 index 0000000000000000000000000000000000000000..2443af27c92146ed4328e8f94b1415c7e72c542b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/box_iou_rotated.py @@ -0,0 +1,148 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch + +from ..utils import ext_loader + +ext_module = ext_loader.load_ext('_ext', ['box_iou_rotated']) + + +def box_iou_rotated(bboxes1: torch.Tensor, + bboxes2: torch.Tensor, + mode: str = 'iou', + aligned: bool = False, + clockwise: bool = True) -> torch.Tensor: + """Return intersection-over-union (Jaccard index) of boxes. + + Both sets of boxes are expected to be in + (x_center, y_center, width, height, angle) format. + + If ``aligned`` is ``False``, then calculate the ious between each bbox + of bboxes1 and bboxes2, otherwise the ious between each aligned pair of + bboxes1 and bboxes2. + + .. note:: + The operator assumes: + + 1) The positive direction along x axis is left -> right. + + 2) The positive direction along y axis is top -> down. + + 3) The w border is in parallel with x axis when angle = 0. + + However, there are 2 opposite definitions of the positive angular + direction, clockwise (CW) and counter-clockwise (CCW). MMCV supports + both definitions and uses CW by default. + + Please set ``clockwise=False`` if you are using the CCW definition. 
+ + The coordinate system when ``clockwise`` is ``True`` (default) + + .. code-block:: none + + 0-------------------> x (0 rad) + | A-------------B + | | | + | | box h + | | angle=0 | + | D------w------C + v + y (pi/2 rad) + + In such coordination system the rotation matrix is + + .. math:: + \\begin{pmatrix} + \\cos\\alpha & -\\sin\\alpha \\\\ + \\sin\\alpha & \\cos\\alpha + \\end{pmatrix} + + The coordinates of the corner point A can be calculated as: + + .. math:: + P_A= + \\begin{pmatrix} x_A \\\\ y_A\\end{pmatrix} + = + \\begin{pmatrix} x_{center} \\\\ y_{center}\\end{pmatrix} + + \\begin{pmatrix}\\cos\\alpha & -\\sin\\alpha \\\\ + \\sin\\alpha & \\cos\\alpha\\end{pmatrix} + \\begin{pmatrix} -0.5w \\\\ -0.5h\\end{pmatrix} \\\\ + = + \\begin{pmatrix} x_{center}-0.5w\\cos\\alpha+0.5h\\sin\\alpha + \\\\ + y_{center}-0.5w\\sin\\alpha-0.5h\\cos\\alpha\\end{pmatrix} + + + The coordinate system when ``clockwise`` is ``False`` + + .. code-block:: none + + 0-------------------> x (0 rad) + | A-------------B + | | | + | | box h + | | angle=0 | + | D------w------C + v + y (-pi/2 rad) + + In such coordination system the rotation matrix is + + .. math:: + \\begin{pmatrix} + \\cos\\alpha & \\sin\\alpha \\\\ + -\\sin\\alpha & \\cos\\alpha + \\end{pmatrix} + + The coordinates of the corner point A can be calculated as: + + .. math:: + P_A= + \\begin{pmatrix} x_A \\\\ y_A\\end{pmatrix} + = + \\begin{pmatrix} x_{center} \\\\ y_{center}\\end{pmatrix} + + \\begin{pmatrix}\\cos\\alpha & \\sin\\alpha \\\\ + -\\sin\\alpha & \\cos\\alpha\\end{pmatrix} + \\begin{pmatrix} -0.5w \\\\ -0.5h\\end{pmatrix} \\\\ + = + \\begin{pmatrix} x_{center}-0.5w\\cos\\alpha-0.5h\\sin\\alpha + \\\\ + y_{center}+0.5w\\sin\\alpha-0.5h\\cos\\alpha\\end{pmatrix} + + Args: + boxes1 (torch.Tensor): rotated bboxes 1. It has shape (N, 5), + indicating (x, y, w, h, theta) for each row. Note that theta is in + radian. + boxes2 (torch.Tensor): rotated bboxes 2. 
It has shape (M, 5), + indicating (x, y, w, h, theta) for each row. Note that theta is in + radian. + mode (str): "iou" (intersection over union) or iof (intersection over + foreground). + clockwise (bool): flag indicating whether the positive angular + orientation is clockwise. default True. + `New in version 1.4.3.` + + Returns: + torch.Tensor: Return the ious betweens boxes. If ``aligned`` is + ``False``, the shape of ious is (N, M) else (N,). + """ + assert mode in ['iou', 'iof'] + mode_dict = {'iou': 0, 'iof': 1} + mode_flag = mode_dict[mode] + rows = bboxes1.size(0) + cols = bboxes2.size(0) + if aligned: + ious = bboxes1.new_zeros(rows) + else: + ious = bboxes1.new_zeros(rows * cols) + if not clockwise: + flip_mat = bboxes1.new_ones(bboxes1.shape[-1]) + flip_mat[-1] = -1 + bboxes1 = bboxes1 * flip_mat + bboxes2 = bboxes2 * flip_mat + bboxes1 = bboxes1.contiguous() + bboxes2 = bboxes2.contiguous() + ext_module.box_iou_rotated( + bboxes1, bboxes2, ious, mode_flag=mode_flag, aligned=aligned) + if not aligned: + ious = ious.view(rows, cols) + return ious diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/carafe.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/carafe.py new file mode 100644 index 0000000000000000000000000000000000000000..f7e79c275e2bea62ce7e08fb6e6e4629c7565600 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/carafe.py @@ -0,0 +1,300 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from typing import Tuple + +import torch +import torch.nn as nn +import torch.nn.functional as F +from mmengine.model import normal_init, xavier_init +from mmengine.registry import MODELS +from torch import Tensor +from torch.autograd import Function +from torch.nn.modules.module import Module + +from ..utils import ext_loader + +ext_module = ext_loader.load_ext('_ext', [ + 'carafe_naive_forward', 'carafe_naive_backward', 'carafe_forward', + 'carafe_backward' +]) + + +class CARAFENaiveFunction(Function): + + @staticmethod + def symbolic(g, features: Tensor, masks: Tensor, kernel_size: int, + group_size: int, scale_factor: int) -> Tensor: + return g.op( + 'mmcv::MMCVCARAFENaive', + features, + masks, + kernel_size_i=kernel_size, + group_size_i=group_size, + scale_factor_f=scale_factor) + + @staticmethod + def forward(ctx, features: Tensor, masks: Tensor, kernel_size: int, + group_size: int, scale_factor: int) -> Tensor: + assert scale_factor >= 1 + assert masks.size(1) == kernel_size * kernel_size * group_size + assert masks.size(-1) == features.size(-1) * scale_factor + assert masks.size(-2) == features.size(-2) * scale_factor + assert features.size(1) % group_size == 0 + assert (kernel_size - 1) % 2 == 0 and kernel_size >= 1 + ctx.kernel_size = kernel_size + ctx.group_size = group_size + ctx.scale_factor = scale_factor + ctx.feature_size = features.size() + ctx.mask_size = masks.size() + + n, c, h, w = features.size() + output = features.new_zeros((n, c, h * scale_factor, w * scale_factor)) + ext_module.carafe_naive_forward( + features, + masks, + output, + kernel_size=kernel_size, + group_size=group_size, + scale_factor=scale_factor) + + if features.requires_grad or masks.requires_grad or \ + torch.__version__ == 'parrots': + ctx.save_for_backward(features, masks) + return output + + @staticmethod + def backward( + ctx, + grad_output: Tensor) -> Tuple[Tensor, Tensor, None, None, None]: + assert grad_output.is_cuda + + features, masks = ctx.saved_tensors + 
kernel_size = ctx.kernel_size + group_size = ctx.group_size + scale_factor = ctx.scale_factor + + grad_input = torch.zeros_like(features) + grad_masks = torch.zeros_like(masks) + ext_module.carafe_naive_backward( + grad_output.contiguous(), + features, + masks, + grad_input, + grad_masks, + kernel_size=kernel_size, + group_size=group_size, + scale_factor=scale_factor) + + return grad_input, grad_masks, None, None, None + + +carafe_naive = CARAFENaiveFunction.apply + + +class CARAFENaive(Module): + + def __init__(self, kernel_size: int, group_size: int, scale_factor: int): + super().__init__() + + assert isinstance(kernel_size, int) and isinstance( + group_size, int) and isinstance(scale_factor, int) + self.kernel_size = kernel_size + self.group_size = group_size + self.scale_factor = scale_factor + + def forward(self, features: Tensor, masks: Tensor) -> Tensor: + return carafe_naive(features, masks, self.kernel_size, self.group_size, + self.scale_factor) + + +class CARAFEFunction(Function): + + @staticmethod + def symbolic(g, features: Tensor, masks: Tensor, kernel_size: int, + group_size: int, scale_factor: int) -> Tensor: + return g.op( + 'mmcv::MMCVCARAFE', + features, + masks, + kernel_size_i=kernel_size, + group_size_i=group_size, + scale_factor_f=scale_factor) + + @staticmethod + def forward(ctx, features: Tensor, masks: Tensor, kernel_size: int, + group_size: int, scale_factor: int) -> Tensor: + assert scale_factor >= 1 + assert masks.size(1) == kernel_size * kernel_size * group_size + assert masks.size(-1) == features.size(-1) * scale_factor + assert masks.size(-2) == features.size(-2) * scale_factor + assert features.size(1) % group_size == 0 + assert (kernel_size - 1) % 2 == 0 and kernel_size >= 1 + ctx.kernel_size = kernel_size + ctx.group_size = group_size + ctx.scale_factor = scale_factor + ctx.feature_size = features.size() + ctx.mask_size = masks.size() + + n, c, h, w = features.size() + output = features.new_zeros((n, c, h * scale_factor, w * 
scale_factor)) + routput = features.new_zeros(output.size(), requires_grad=False) + rfeatures = features.new_zeros(features.size(), requires_grad=False) + rmasks = masks.new_zeros(masks.size(), requires_grad=False) + ext_module.carafe_forward( + features, + masks, + rfeatures, + routput, + rmasks, + output, + kernel_size=kernel_size, + group_size=group_size, + scale_factor=scale_factor) + + if features.requires_grad or masks.requires_grad or \ + torch.__version__ == 'parrots': + ctx.save_for_backward(features, masks, rfeatures) + return output + + @staticmethod + def backward( + ctx, + grad_output: Tensor) -> Tuple[Tensor, Tensor, None, None, None]: + features, masks, rfeatures = ctx.saved_tensors + kernel_size = ctx.kernel_size + group_size = ctx.group_size + scale_factor = ctx.scale_factor + + rgrad_output = torch.zeros_like(grad_output, requires_grad=False) + rgrad_input_hs = torch.zeros_like(grad_output, requires_grad=False) + rgrad_input = torch.zeros_like(features, requires_grad=False) + rgrad_masks = torch.zeros_like(masks, requires_grad=False) + grad_input = torch.zeros_like(features, requires_grad=False) + grad_masks = torch.zeros_like(masks, requires_grad=False) + ext_module.carafe_backward( + grad_output.contiguous(), + rfeatures, + masks, + rgrad_output, + rgrad_input_hs, + rgrad_input, + rgrad_masks, + grad_input, + grad_masks, + kernel_size=kernel_size, + group_size=group_size, + scale_factor=scale_factor) + return grad_input, grad_masks, None, None, None + + +carafe = CARAFEFunction.apply + + +class CARAFE(Module): + """ CARAFE: Content-Aware ReAssembly of FEatures + + Please refer to `CARAFE: Content-Aware ReAssembly of FEatures + `_ for more details. 
+ + Args: + kernel_size (int): reassemble kernel size + group_size (int): reassemble group size + scale_factor (int): upsample ratio + + Returns: + upsampled feature map + """ + + def __init__(self, kernel_size: int, group_size: int, scale_factor: int): + super().__init__() + + assert isinstance(kernel_size, int) and isinstance( + group_size, int) and isinstance(scale_factor, int) + self.kernel_size = kernel_size + self.group_size = group_size + self.scale_factor = scale_factor + + def forward(self, features: Tensor, masks: Tensor) -> Tensor: + return carafe(features, masks, self.kernel_size, self.group_size, + self.scale_factor) + + +@MODELS.register_module(name='carafe') +class CARAFEPack(nn.Module): + """A unified package of CARAFE upsampler that contains: 1) channel + compressor 2) content encoder 3) CARAFE op. + + Official implementation of ICCV 2019 paper + `CARAFE: Content-Aware ReAssembly of FEatures + `_. + + Args: + channels (int): input feature channels + scale_factor (int): upsample ratio + up_kernel (int): kernel size of CARAFE op + up_group (int): group size of CARAFE op + encoder_kernel (int): kernel size of content encoder + encoder_dilation (int): dilation of content encoder + compressed_channels (int): output channels of channels compressor + + Returns: + upsampled feature map + """ + + def __init__(self, + channels: int, + scale_factor: int, + up_kernel: int = 5, + up_group: int = 1, + encoder_kernel: int = 3, + encoder_dilation: int = 1, + compressed_channels: int = 64): + super().__init__() + self.channels = channels + self.scale_factor = scale_factor + self.up_kernel = up_kernel + self.up_group = up_group + self.encoder_kernel = encoder_kernel + self.encoder_dilation = encoder_dilation + self.compressed_channels = compressed_channels + self.channel_compressor = nn.Conv2d(channels, self.compressed_channels, + 1) + self.content_encoder = nn.Conv2d( + self.compressed_channels, + self.up_kernel * self.up_kernel * self.up_group * + 
self.scale_factor * self.scale_factor, + self.encoder_kernel, + padding=int((self.encoder_kernel - 1) * self.encoder_dilation / 2), + dilation=self.encoder_dilation, + groups=1) + self.init_weights() + + def init_weights(self): + for m in self.modules(): + if isinstance(m, nn.Conv2d): + xavier_init(m, distribution='uniform') + normal_init(self.content_encoder, std=0.001) + + def kernel_normalizer(self, mask: Tensor) -> Tensor: + mask = F.pixel_shuffle(mask, self.scale_factor) + n, mask_c, h, w = mask.size() + # use float division explicitly, + # to void inconsistency while exporting to onnx + mask_channel = int(mask_c / float(self.up_kernel**2)) + mask = mask.view(n, mask_channel, -1, h, w) + + mask = F.softmax(mask, dim=2, dtype=mask.dtype) + mask = mask.view(n, mask_c, h, w).contiguous() + + return mask + + def feature_reassemble(self, x: Tensor, mask: Tensor) -> Tensor: + x = carafe(x, mask, self.up_kernel, self.up_group, self.scale_factor) + return x + + def forward(self, x: Tensor) -> Tensor: + compressed_x = self.channel_compressor(x) + mask = self.content_encoder(compressed_x) + mask = self.kernel_normalizer(mask) + + x = self.feature_reassemble(x, mask) + return x diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/cc_attention.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/cc_attention.py new file mode 100644 index 0000000000000000000000000000000000000000..efde7b703c8c50ecf5aa604e756422f0be488759 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/cc_attention.py @@ -0,0 +1,85 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch +import torch.nn as nn +import torch.nn.functional as F +from mmengine.registry import MODELS + +from mmcv.cnn import Scale + + +def NEG_INF_DIAG(n: int, device: torch.device) -> torch.Tensor: + """Returns a diagonal matrix of size [n, n]. + + The diagonal are all "-inf". This is for avoiding calculating the + overlapped element in the Criss-Cross twice. 
+ """ + return torch.diag(torch.tensor(float('-inf')).to(device).repeat(n), 0) + + +@MODELS.register_module() +class CrissCrossAttention(nn.Module): + """Criss-Cross Attention Module. + + .. note:: + Before v1.3.13, we use a CUDA op. Since v1.3.13, we switch + to a pure PyTorch and equivalent implementation. For more + details, please refer to https://github.com/open-mmlab/mmcv/pull/1201. + + Speed comparison for one forward pass + + - Input size: [2,512,97,97] + - Device: 1 NVIDIA GeForce RTX 2080 Ti + + +-----------------------+---------------+------------+---------------+ + | |PyTorch version|CUDA version|Relative speed | + +=======================+===============+============+===============+ + |with torch.no_grad() |0.00554402 s |0.0299619 s |5.4x | + +-----------------------+---------------+------------+---------------+ + |no with torch.no_grad()|0.00562803 s |0.0301349 s |5.4x | + +-----------------------+---------------+------------+---------------+ + + Args: + in_channels (int): Channels of the input feature map. + """ + + def __init__(self, in_channels: int) -> None: + super().__init__() + self.query_conv = nn.Conv2d(in_channels, in_channels // 8, 1) + self.key_conv = nn.Conv2d(in_channels, in_channels // 8, 1) + self.value_conv = nn.Conv2d(in_channels, in_channels, 1) + self.gamma = Scale(0.) + self.in_channels = in_channels + + def forward(self, x: torch.Tensor) -> torch.Tensor: + """forward function of Criss-Cross Attention. + + Args: + x (torch.Tensor): Input feature with the shape of + (batch_size, in_channels, height, width). 
+ + Returns: + torch.Tensor: Output of the layer, with the shape of + (batch_size, in_channels, height, width) + """ + B, C, H, W = x.size() + query = self.query_conv(x) + key = self.key_conv(x) + value = self.value_conv(x) + energy_H = torch.einsum('bchw,bciw->bwhi', query, key) + NEG_INF_DIAG( + H, query.device) + energy_H = energy_H.transpose(1, 2) + energy_W = torch.einsum('bchw,bchj->bhwj', query, key) + attn = F.softmax( + torch.cat([energy_H, energy_W], dim=-1), dim=-1) # [B,H,W,(H+W)] + out = torch.einsum('bciw,bhwi->bchw', value, attn[..., :H]) + out += torch.einsum('bchj,bhwj->bchw', value, attn[..., H:]) + + out = self.gamma(out) + x + out = out.contiguous() + + return out + + def __repr__(self) -> str: + s = self.__class__.__name__ + s += f'(in_channels={self.in_channels})' + return s diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/chamfer_distance.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/chamfer_distance.py new file mode 100644 index 0000000000000000000000000000000000000000..1f908a5bbc2655de6233cd6ddfa140ee783079ba --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/chamfer_distance.py @@ -0,0 +1,93 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import Sequence, Tuple + +import torch +from torch import Tensor +from torch.autograd import Function +from torch.autograd.function import once_differentiable + +from ..utils import ext_loader + +ext_module = ext_loader.load_ext( + '_ext', ['chamfer_distance_forward', 'chamfer_distance_backward']) + + +class ChamferDistanceFunction(Function): + """This is an implementation of the 2D Chamfer Distance. + + It has been used in the paper `Oriented RepPoints for Aerial Object + Detection (CVPR 2022) _`. + """ + + @staticmethod + def forward(ctx, xyz1: Tensor, xyz2: Tensor) -> Sequence[Tensor]: + """ + Args: + xyz1 (Tensor): Point set with shape (B, N, 2). + xyz2 (Tensor): Point set with shape (B, N, 2). 
+ + Returns: + Sequence[Tensor]: + + - dist1 (Tensor): Chamfer distance (xyz1 to xyz2) with + shape (B, N). + - dist2 (Tensor): Chamfer distance (xyz2 to xyz1) with + shape (B, N). + - idx1 (Tensor): Index of chamfer distance (xyz1 to xyz2) + with shape (B, N), which be used in compute gradient. + - idx2 (Tensor): Index of chamfer distance (xyz2 to xyz2) + with shape (B, N), which be used in compute gradient. + """ + batch_size, n, _ = xyz1.size() + _, m, _ = xyz2.size() + device = xyz1.device + xyz1 = xyz1.contiguous() + xyz2 = xyz2.contiguous() + + dist1 = torch.zeros(batch_size, n).to(device) + dist2 = torch.zeros(batch_size, m).to(device) + idx1 = torch.zeros(batch_size, n).type(torch.IntTensor).to(device) + idx2 = torch.zeros(batch_size, m).type(torch.IntTensor).to(device) + + ext_module.chamfer_distance_forward(xyz1, xyz2, dist1, dist2, idx1, + idx2) + ctx.save_for_backward(xyz1, xyz2, idx1, idx2) + return dist1, dist2, idx1, idx2 + + @staticmethod + @once_differentiable + def backward(ctx, + grad_dist1: Tensor, + grad_dist2: Tensor, + grad_idx1=None, + grad_idx2=None) -> Tuple[Tensor, Tensor]: + """ + + Args: + grad_dist1 (Tensor): Gradient of chamfer distance + (xyz1 to xyz2) with shape (B, N). + grad_dist2 (Tensor): Gradient of chamfer distance + (xyz2 to xyz1) with shape (B, N). + + Returns: + Tuple[Tensor, Tensor]: + + - grad_xyz1 (Tensor): Gradient of the point set with shape \ + (B, N, 2). + - grad_xyz2 (Tensor):Gradient of the point set with shape \ + (B, N, 2). 
+ """ + xyz1, xyz2, idx1, idx2 = ctx.saved_tensors + device = grad_dist1.device + grad_dist1 = grad_dist1.contiguous() + grad_dist2 = grad_dist2.contiguous() + grad_xyz1 = torch.zeros(xyz1.size()).to(device) + grad_xyz2 = torch.zeros(xyz2.size()).to(device) + + ext_module.chamfer_distance_backward(xyz1, xyz2, idx1, idx2, + grad_dist1, grad_dist2, grad_xyz1, + grad_xyz2) + return grad_xyz1, grad_xyz2 + + +chamfer_distance = ChamferDistanceFunction.apply diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/contour_expand.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/contour_expand.py new file mode 100644 index 0000000000000000000000000000000000000000..7184609ad9b64d421c17fdfe4a1a0dbeb62d64c8 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/contour_expand.py @@ -0,0 +1,52 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import Union + +import numpy as np +import torch + +from ..utils import ext_loader + +ext_module = ext_loader.load_ext('_ext', ['contour_expand']) + + +def contour_expand(kernel_mask: Union[np.array, torch.Tensor], + internal_kernel_label: Union[np.array, torch.Tensor], + min_kernel_area: int, kernel_num: int) -> list: + """Expand kernel contours so that foreground pixels are assigned into + instances. + + Args: + kernel_mask (np.array or torch.Tensor): The instance kernel mask with + size hxw. + internal_kernel_label (np.array or torch.Tensor): The instance internal + kernel label with size hxw. + min_kernel_area (int): The minimum kernel area. + kernel_num (int): The instance kernel number. + + Returns: + list: The instance index map with size hxw. 
+ """ + assert isinstance(kernel_mask, (torch.Tensor, np.ndarray)) + assert isinstance(internal_kernel_label, (torch.Tensor, np.ndarray)) + assert isinstance(min_kernel_area, int) + assert isinstance(kernel_num, int) + + if isinstance(kernel_mask, np.ndarray): + kernel_mask = torch.from_numpy(kernel_mask) + if isinstance(internal_kernel_label, np.ndarray): + internal_kernel_label = torch.from_numpy(internal_kernel_label) + + if torch.__version__ == 'parrots': + if kernel_mask.shape[0] == 0 or internal_kernel_label.shape[0] == 0: + label = [] + else: + label = ext_module.contour_expand( + kernel_mask, + internal_kernel_label, + min_kernel_area=min_kernel_area, + kernel_num=kernel_num) + label = label.tolist() # type: ignore + else: + label = ext_module.contour_expand(kernel_mask, internal_kernel_label, + min_kernel_area, kernel_num) + return label diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/convex_iou.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/convex_iou.py new file mode 100644 index 0000000000000000000000000000000000000000..50050363ac5b08cfa8f86dd186ab7087fac6f48a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/convex_iou.py @@ -0,0 +1,52 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import Tuple + +import torch + +from ..utils import ext_loader + +ext_module = ext_loader.load_ext('_ext', ['convex_iou', 'convex_giou']) + + +def convex_giou(pointsets: torch.Tensor, + polygons: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]: + """Return generalized intersection-over-union (Jaccard index) between point + sets and polygons. + + Args: + pointsets (torch.Tensor): It has shape (N, 18), + indicating (x1, y1, x2, y2, ..., x9, y9) for each row. + polygons (torch.Tensor): It has shape (N, 8), + indicating (x1, y1, x2, y2, x3, y3, x4, y4) for each row. + + Returns: + tuple[torch.Tensor, torch.Tensor]: The first element is the gious + between point sets and polygons with the shape (N,). 
The second + element is the gradient of point sets with the shape (N, 18). + """ + output = pointsets.new_zeros((pointsets.size(0), 19)) + ext_module.convex_giou(pointsets, polygons, output) + convex_giou = output[:, -1] + points_grad = output[:, 0:-1] + return convex_giou, points_grad + + +def convex_iou(pointsets: torch.Tensor, + polygons: torch.Tensor) -> torch.Tensor: + """Return intersection-over-union (Jaccard index) between point sets and + polygons. + + Args: + pointsets (torch.Tensor): It has shape (N, 18), + indicating (x1, y1, x2, y2, ..., x9, y9) for each row. + polygons (torch.Tensor): It has shape (K, 8), + indicating (x1, y1, x2, y2, x3, y3, x4, y4) for each row. + + Returns: + torch.Tensor: Return the ious between point sets and polygons with the + shape (N, K). + """ + N, K = pointsets.size(0), polygons.size(0) + ious = pointsets.new_zeros((N, K)) + ext_module.convex_iou(pointsets, polygons, ious) + return ious diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/corner_pool.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/corner_pool.py new file mode 100644 index 0000000000000000000000000000000000000000..89a7c485ce8d5ced215eac36e922cfe110ff5318 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/corner_pool.py @@ -0,0 +1,83 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch +from torch import Tensor, nn + +_mode_dict = {'top': 0, 'bottom': 1, 'left': 2, 'right': 3} + + +def _corner_pool(x: Tensor, dim: int, flip: bool) -> Tensor: + size = x.size(dim) + output = x.clone() + + ind = 1 + while ind < size: + if flip: + cur_start = 0 + cur_len = size - ind + next_start = ind + next_len = size - ind + else: + cur_start = ind + cur_len = size - ind + next_start = 0 + next_len = size - ind + + # max_temp should be cloned for backward computation + max_temp = output.narrow(dim, cur_start, cur_len).clone() + cur_temp = output.narrow(dim, cur_start, cur_len) + next_temp = output.narrow(dim, next_start, next_len) + + cur_temp[...] 
= torch.where(max_temp > next_temp, max_temp, next_temp) + + ind = ind << 1 + + return output + + +class CornerPool(nn.Module): + """Corner Pooling. + + Corner Pooling is a new type of pooling layer that helps a + convolutional network better localize corners of bounding boxes. + + Please refer to `CornerNet: Detecting Objects as Paired Keypoints + `_ for more details. + + Code is modified from https://github.com/princeton-vl/CornerNet-Lite. + + Args: + mode (str): Pooling orientation for the pooling layer + + - 'bottom': Bottom Pooling + - 'left': Left Pooling + - 'right': Right Pooling + - 'top': Top Pooling + + Returns: + Feature map after pooling. + """ + + cummax_dim_flip = { + 'bottom': (2, False), + 'left': (3, True), + 'right': (3, False), + 'top': (2, True), + } + + def __init__(self, mode: str): + super().__init__() + assert mode in self.cummax_dim_flip + self.mode = mode + + def forward(self, x: Tensor) -> Tensor: + if torch.__version__ != 'parrots' and torch.__version__ >= '1.5.0': + dim, flip = self.cummax_dim_flip[self.mode] + if flip: + x = x.flip(dim) + pool_tensor, _ = torch.cummax(x, dim=dim) + if flip: + pool_tensor = pool_tensor.flip(dim) + return pool_tensor + else: + dim, flip = self.cummax_dim_flip[self.mode] + return _corner_pool(x, dim, flip) diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/correlation.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/correlation.py new file mode 100644 index 0000000000000000000000000000000000000000..319b7646782637e9ebaac4ef07b82d1f460031b5 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/correlation.py @@ -0,0 +1,200 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from typing import Tuple + +import torch +from torch import Tensor, nn +from torch.autograd import Function +from torch.autograd.function import once_differentiable +from torch.nn.modules.utils import _pair + +from ..utils import ext_loader + +ext_module = ext_loader.load_ext( + '_ext', ['correlation_forward', 'correlation_backward']) + + +class CorrelationFunction(Function): + + @staticmethod + def forward(ctx, + input1: Tensor, + input2: Tensor, + kernel_size: int = 1, + max_displacement: int = 1, + stride: int = 1, + padding: int = 1, + dilation: int = 1, + dilation_patch: int = 1) -> Tensor: + + ctx.save_for_backward(input1, input2) + + kH, kW = ctx.kernel_size = _pair(kernel_size) + patch_size = max_displacement * 2 + 1 + ctx.patch_size = patch_size + dH, dW = ctx.stride = _pair(stride) + padH, padW = ctx.padding = _pair(padding) + dilationH, dilationW = ctx.dilation = _pair(dilation) + dilation_patchH, dilation_patchW = ctx.dilation_patch = _pair( + dilation_patch) + + output_size = CorrelationFunction._output_size(ctx, input1) + + output = input1.new_zeros(output_size) + + ext_module.correlation_forward( + input1, + input2, + output, + kH=kH, + kW=kW, + patchH=patch_size, + patchW=patch_size, + padH=padH, + padW=padW, + dilationH=dilationH, + dilationW=dilationW, + dilation_patchH=dilation_patchH, + dilation_patchW=dilation_patchW, + dH=dH, + dW=dW) + + return output + + @staticmethod + @once_differentiable + def backward( + ctx, grad_output: Tensor + ) -> Tuple[Tensor, Tensor, None, None, None, None, None, None]: + input1, input2 = ctx.saved_tensors + + kH, kW = ctx.kernel_size + patch_size = ctx.patch_size + padH, padW = ctx.padding + dilationH, dilationW = ctx.dilation + dilation_patchH, dilation_patchW = ctx.dilation_patch + dH, dW = ctx.stride + grad_input1 = torch.zeros_like(input1) + grad_input2 = torch.zeros_like(input2) + + ext_module.correlation_backward( + grad_output, + input1, + input2, + grad_input1, + grad_input2, + kH=kH, + kW=kW, + 
patchH=patch_size, + patchW=patch_size, + padH=padH, + padW=padW, + dilationH=dilationH, + dilationW=dilationW, + dilation_patchH=dilation_patchH, + dilation_patchW=dilation_patchW, + dH=dH, + dW=dW) + return grad_input1, grad_input2, None, None, None, None, None, None + + @staticmethod + def _output_size(ctx, input1): + iH, iW = input1.size(2), input1.size(3) + batch_size = input1.size(0) + kH, kW = ctx.kernel_size + patch_size = ctx.patch_size + dH, dW = ctx.stride + padH, padW = ctx.padding + dilationH, dilationW = ctx.dilation + dilatedKH = (kH - 1) * dilationH + 1 + dilatedKW = (kW - 1) * dilationW + 1 + + oH = int((iH + 2 * padH - dilatedKH) / dH + 1) + oW = int((iW + 2 * padW - dilatedKW) / dW + 1) + + output_size = (batch_size, patch_size, patch_size, oH, oW) + return output_size + + +class Correlation(nn.Module): + r"""Correlation operator + + This correlation operator works for optical flow correlation computation. + + There are two batched tensors with shape :math:`(N, C, H, W)`, + and the correlation output's shape is :math:`(N, max\_displacement \times + 2 + 1, max\_displacement * 2 + 1, H_{out}, W_{out})` + + where + + .. math:: + H_{out} = \left\lfloor\frac{H_{in} + 2 \times padding - + dilation \times (kernel\_size - 1) - 1} + {stride} + 1\right\rfloor + + .. math:: + W_{out} = \left\lfloor\frac{W_{in} + 2 \times padding - dilation + \times (kernel\_size - 1) - 1} + {stride} + 1\right\rfloor + + the correlation item :math:`(N_i, dy, dx)` is formed by taking the sliding + window convolution between input1 and shifted input2, + + .. 
math:: + Corr(N_i, dx, dy) = + \sum_{c=0}^{C-1} + input1(N_i, c) \star + \mathcal{S}(input2(N_i, c), dy, dx) + + where :math:`\star` is the valid 2d sliding window convolution operator, + and :math:`\mathcal{S}` means shifting the input features (auto-complete + zero marginal), and :math:`dx, dy` are shifting distance, :math:`dx, dy \in + [-max\_displacement \times dilation\_patch, max\_displacement \times + dilation\_patch]`. + + Args: + kernel_size (int): The size of sliding window i.e. local neighborhood + representing the center points and involved in correlation + computation. Defaults to 1. + max_displacement (int): The radius for computing correlation volume, + but the actual working space can be dilated by dilation_patch. + Defaults to 1. + stride (int): The stride of the sliding blocks in the input spatial + dimensions. Defaults to 1. + padding (int): Zero padding added to all four sides of the input1. + Defaults to 0. + dilation (int): The spacing of local neighborhood that will involved + in correlation. Defaults to 1. + dilation_patch (int): The spacing between position need to compute + correlation. Defaults to 1. 
+ """ + + def __init__(self, + kernel_size: int = 1, + max_displacement: int = 1, + stride: int = 1, + padding: int = 0, + dilation: int = 1, + dilation_patch: int = 1) -> None: + super().__init__() + self.kernel_size = kernel_size + self.max_displacement = max_displacement + self.stride = stride + self.padding = padding + self.dilation = dilation + self.dilation_patch = dilation_patch + + def forward(self, input1: Tensor, input2: Tensor) -> Tensor: + return CorrelationFunction.apply(input1, input2, self.kernel_size, + self.max_displacement, self.stride, + self.padding, self.dilation, + self.dilation_patch) + + def __repr__(self) -> str: + s = self.__class__.__name__ + s += f'(kernel_size={self.kernel_size}, ' + s += f'max_displacement={self.max_displacement}, ' + s += f'stride={self.stride}, ' + s += f'padding={self.padding}, ' + s += f'dilation={self.dilation}, ' + s += f'dilation_patch={self.dilation_patch})' + return s diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/README.md b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/README.md new file mode 100644 index 0000000000000000000000000000000000000000..8fcc6eb1a3260148aa7448470967684f8c9f0365 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/README.md @@ -0,0 +1,162 @@ +# Code Structure of CUDA operators + +This folder contains all non-python code for MMCV custom ops. Please follow the same architecture if you want to add new ops. + +## Directories Tree + +```folder +. +├── common +│ ├── box_iou_rotated_utils.hpp +│ ├── parrots_cpp_helper.hpp +│ ├── parrots_cuda_helper.hpp +│ ├── pytorch_cpp_helper.hpp +│ ├── pytorch_cuda_helper.hpp +│ ├── pytorch_device_registry.hpp +│   ├── cuda +│   │ ├── common_cuda_helper.hpp +│   │ ├── parrots_cudawarpfunction.cuh +│   │ ├── ... +│   │ └── ops_cuda_kernel.cuh +|   ├── mps +│   │ ├── MPSLibrary.h +│   │ ├── ... +│   │ └── MPSUtils.h +|   ├── mlu +│   │ └── ... +|   └── utils +│   │ └── ... +├── parrots +│   ├── ... 
+│   ├── ops.cpp +│   ├── ops_parrots.cpp +│   └── ops_pytorch.h +└── pytorch +    ├── info.cpp +    ├── pybind.cpp +    ├── ... +    ├── ops.cpp +    ├── cuda +    │   ├── ... +    │   └── ops_cuda.cu +    ├── cpu +    │   ├── ... +    │   └── ops.cpp +    ├── mps +    │   ├── ... +    |   └── op_mps.mm +    └── mlu +       ├── ... +       └── op_mlu.cpp +``` + +## Components + +- `common`: This directory contains all tools and shared codes. + - `cuda`: The cuda kernels which can be shared by all backends. **HIP** kernel is also here since they have similar syntax. + - `mps`: The tools used to support MPS ops. **NOTE** that MPS support is **experimental**. + - `mlu`: The MLU kernels used to support [Cambricon](https://www.cambricon.com/) device. + - `utils`: The kernels and utils of spconv. +- `parrots`: **Parrots** is a deep learning frame for model training and inference. Parrots custom ops are placed in this directory. +- `pytorch`: **PyTorch** custom ops are supported by binding C++ to Python with **pybind11**. The ops implementation and binding codes are placed in this directory. + - `cuda`: This directory contains cuda kernel launchers, which feed memory pointers of tensor to the cuda kernel in `common/cuda`. The launchers provide c++ interface of cuda implementation of corresponding custom ops. + - `cpu`: This directory contain cpu implementations of corresponding custom ops. + - `mlu`: This directory contain launchers of each MLU kernels. + - `mps`: MPS ops implementation and launchers. + +## How to add new PyTorch ops? + +1. (Optional) Add shared kernel in `common` to support special hardware platform. + + ```c++ + // src/common/cuda/new_ops_cuda_kernel.cuh + + template + __global__ void new_ops_forward_cuda_kernel(const T* input, T* output, ...) { + // forward here + } + + ``` + + Add cuda kernel launcher in `pytorch/cuda`. 
+ + ```c++ + // src/pytorch/cuda + #include + + void NewOpsForwardCUDAKernelLauncher(Tensor input, Tensor output, ...){ + // initialize + at::cuda::CUDAGuard device_guard(input.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + ... + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + input.scalar_type(), "new_ops_forward_cuda_kernel", ([&] { + new_ops_forward_cuda_kernel + <<>>( + input.data_ptr(), output.data_ptr(),...); + })); + AT_CUDA_CHECK(cudaGetLastError()); + } + ``` + +2. Register implementation for different devices. + + ```c++ + // src/pytorch/cuda/cudabind.cpp + ... + + Tensor new_ops_forward_cuda(Tensor input, Tensor output, ...){ + // implement cuda forward here + // use `NewOpsForwardCUDAKernelLauncher` here + } + // declare interface here. + Tensor new_ops_forward_impl(Tensor input, Tensor output, ...); + // register the implementation for given device (CUDA here). + REGISTER_DEVICE_IMPL(new_ops_forward_impl, CUDA, new_ops_forward_cuda); + ``` + +3. Add ops implementation in `pytorch` directory. Select different implementations according to device type. + + ```c++ + // src/pytorch/new_ops.cpp + Tensor new_ops_forward_impl(Tensor input, Tensor output, ...){ + // dispatch the implementation according to the device type of input. + DISPATCH_DEVICE_IMPL(new_ops_forward_impl, input, output, ...); + } + ... + + Tensor new_ops_forward(Tensor input, Tensor output, ...){ + return new_ops_forward_impl(input, output, ...); + } + ``` + +4. Binding the implementation in `pytorch/pybind.cpp` + + ```c++ + // src/pytorch/pybind.cpp + + ... + + Tensor new_ops_forward(Tensor input, Tensor output, ...); + + ... + + // bind with pybind11 + m.def("new_ops_forward", &new_ops_forward, "new_ops_forward", + py::arg("input"), py::arg("output"), ...); + + ... + + ``` + +5. Build MMCV again. Enjoy new ops in python + + ```python + from ..utils import ext_loader + ext_module = ext_loader.load_ext('_ext', ['new_ops_forward']) + + ... 
+ + ext_module.new_ops_forward(input, output, ...) + + ``` diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/box_iou_rotated_utils.hpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/box_iou_rotated_utils.hpp new file mode 100644 index 0000000000000000000000000000000000000000..e92ff5d686e8548b8f9cd55332ca6528e3e86e98 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/box_iou_rotated_utils.hpp @@ -0,0 +1,431 @@ +// Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved +// modified from +// https://github.com/facebookresearch/detectron2/blob/master/detectron2/layers/csrc/box_iou_rotated/box_iou_rotated_utils.h +#pragma once +#include +#include + +#ifdef __CUDACC__ +// Designates functions callable from the host (CPU) and the device (GPU) +#define HOST_DEVICE __host__ __device__ +#define HOST_DEVICE_INLINE HOST_DEVICE __forceinline__ +#else +#include +#define HOST_DEVICE +#define HOST_DEVICE_INLINE HOST_DEVICE inline +#endif + +namespace { + +template +struct RotatedBox { + T x_ctr, y_ctr, w, h, a; +}; + +template +struct Point { + T x, y; + HOST_DEVICE_INLINE Point(const T& px = 0, const T& py = 0) : x(px), y(py) {} + HOST_DEVICE_INLINE Point operator+(const Point& p) const { + return Point(x + p.x, y + p.y); + } + HOST_DEVICE_INLINE Point& operator+=(const Point& p) { + x += p.x; + y += p.y; + return *this; + } + HOST_DEVICE_INLINE Point operator-(const Point& p) const { + return Point(x - p.x, y - p.y); + } + HOST_DEVICE_INLINE Point operator*(const T coeff) const { + return Point(x * coeff, y * coeff); + } +}; + +template +HOST_DEVICE_INLINE T dot_2d(const Point& A, const Point& B) { + return A.x * B.x + A.y * B.y; +} + +template +HOST_DEVICE_INLINE T cross_2d(const Point& A, const Point& B) { + return A.x * B.y - B.x * A.y; +} + +template +HOST_DEVICE_INLINE void get_rotated_vertices(const RotatedBox& box, + Point (&pts)[4]) { + // M_PI / 180. 
== 0.01745329251 + // float theta = box.a * 0.01745329251; + // MODIFIED + // float theta = box.a; +#if defined(__ILUVATAR__) + float theta = box.a; +#else + double theta = box.a; +#endif + T cosTheta2 = (T)cos(theta) * 0.5f; + T sinTheta2 = (T)sin(theta) * 0.5f; + + // y: top --> down; x: left --> right + pts[0].x = box.x_ctr - sinTheta2 * box.h - cosTheta2 * box.w; + pts[0].y = box.y_ctr + cosTheta2 * box.h - sinTheta2 * box.w; + pts[1].x = box.x_ctr + sinTheta2 * box.h - cosTheta2 * box.w; + pts[1].y = box.y_ctr - cosTheta2 * box.h - sinTheta2 * box.w; + pts[2].x = 2 * box.x_ctr - pts[0].x; + pts[2].y = 2 * box.y_ctr - pts[0].y; + pts[3].x = 2 * box.x_ctr - pts[1].x; + pts[3].y = 2 * box.y_ctr - pts[1].y; +} + +template +HOST_DEVICE_INLINE int get_intersection_points(const Point (&pts1)[4], + const Point (&pts2)[4], + Point (&intersections)[24]) { + // Line vector + // A line from p1 to p2 is: p1 + (p2-p1)*t, t=[0,1] + Point vec1[4], vec2[4]; + for (int i = 0; i < 4; i++) { + vec1[i] = pts1[(i + 1) % 4] - pts1[i]; + vec2[i] = pts2[(i + 1) % 4] - pts2[i]; + } + + // Line test - test all line combos for intersection + int num = 0; // number of intersections + for (int i = 0; i < 4; i++) { + for (int j = 0; j < 4; j++) { + // Solve for 2x2 Ax=b + T det = cross_2d(vec2[j], vec1[i]); + + // This takes care of parallel lines + if (fabs(det) <= 1e-14) { + continue; + } + + auto vec12 = pts2[j] - pts1[i]; + + T t1 = cross_2d(vec2[j], vec12) / det; + T t2 = cross_2d(vec1[i], vec12) / det; + + if (t1 >= 0.0f && t1 <= 1.0f && t2 >= 0.0f && t2 <= 1.0f) { + intersections[num++] = pts1[i] + vec1[i] * t1; + } + } + } + + // Check for vertices of rect1 inside rect2 + { + const auto& AB = vec2[0]; + const auto& DA = vec2[3]; + auto ABdotAB = dot_2d(AB, AB); + auto ADdotAD = dot_2d(DA, DA); + for (int i = 0; i < 4; i++) { + // assume ABCD is the rectangle, and P is the point to be judged + // P is inside ABCD iff. 
P's projection on AB lies within AB + // and P's projection on AD lies within AD + + auto AP = pts1[i] - pts2[0]; + + auto APdotAB = dot_2d(AP, AB); + auto APdotAD = -dot_2d(AP, DA); + + if ((APdotAB >= 0) && (APdotAD >= 0) && (APdotAB <= ABdotAB) && + (APdotAD <= ADdotAD)) { + intersections[num++] = pts1[i]; + } + } + } + + // Reverse the check - check for vertices of rect2 inside rect1 + { + const auto& AB = vec1[0]; + const auto& DA = vec1[3]; + auto ABdotAB = dot_2d(AB, AB); + auto ADdotAD = dot_2d(DA, DA); + for (int i = 0; i < 4; i++) { + auto AP = pts2[i] - pts1[0]; + + auto APdotAB = dot_2d(AP, AB); + auto APdotAD = -dot_2d(AP, DA); + + if ((APdotAB >= 0) && (APdotAD >= 0) && (APdotAB <= ABdotAB) && + (APdotAD <= ADdotAD)) { + intersections[num++] = pts2[i]; + } + } + } + + return num; +} + +template +HOST_DEVICE_INLINE int convex_hull_graham(const Point (&p)[24], + const int& num_in, Point (&q)[24], + bool shift_to_zero = false) { + assert(num_in >= 2); + + // Step 1: + // Find point with minimum y + // if more than 1 points have the same minimum y, + // pick the one with the minimum x. 
+ int t = 0; + for (int i = 1; i < num_in; i++) { + if (p[i].y < p[t].y || (p[i].y == p[t].y && p[i].x < p[t].x)) { + t = i; + } + } + auto& start = p[t]; // starting point + + // Step 2: + // Subtract starting point from every points (for sorting in the next step) + for (int i = 0; i < num_in; i++) { + q[i] = p[i] - start; + } + + // Swap the starting point to position 0 + auto tmp = q[0]; + q[0] = q[t]; + q[t] = tmp; + + // Step 3: + // Sort point 1 ~ num_in according to their relative cross-product values + // (essentially sorting according to angles) + // If the angles are the same, sort according to their distance to origin + T dist[24]; + for (int i = 0; i < num_in; i++) { + dist[i] = dot_2d(q[i], q[i]); + } + +#ifdef __CUDACC__ + // CUDA version + // In the future, we can potentially use thrust + // for sorting here to improve speed (though not guaranteed) + for (int i = 1; i < num_in - 1; i++) { + for (int j = i + 1; j < num_in; j++) { + T crossProduct = cross_2d(q[i], q[j]); + if ((crossProduct < -1e-6) || + (fabs(crossProduct) < 1e-6 && dist[i] > dist[j])) { + auto q_tmp = q[i]; + q[i] = q[j]; + q[j] = q_tmp; + auto dist_tmp = dist[i]; + dist[i] = dist[j]; + dist[j] = dist_tmp; + } + } + } +#else + // CPU version + std::sort(q + 1, q + num_in, + [](const Point& A, const Point& B) -> bool { + T temp = cross_2d(A, B); + if (fabs(temp) < 1e-6) { + return dot_2d(A, A) < dot_2d(B, B); + } else { + return temp > 0; + } + }); + // compute distance to origin after sort, since the points are now different. 
+ for (int i = 0; i < num_in; i++) { + dist[i] = dot_2d(q[i], q[i]); + } +#endif + + // Step 4: + // Make sure there are at least 2 points (that don't overlap with each other) + // in the stack + int k; // index of the non-overlapped second point + for (k = 1; k < num_in; k++) { + if (dist[k] > 1e-8) { + break; + } + } + if (k == num_in) { + // We reach the end, which means the convex hull is just one point + q[0] = p[t]; + return 1; + } + q[1] = q[k]; + int m = 2; // 2 points in the stack + // Step 5: + // Finally we can start the scanning process. + // When a non-convex relationship between the 3 points is found + // (either concave shape or duplicated points), + // we pop the previous point from the stack + // until the 3-point relationship is convex again, or + // until the stack only contains two points + for (int i = k + 1; i < num_in; i++) { + while (m > 1 && cross_2d(q[i] - q[m - 2], q[m - 1] - q[m - 2]) >= 0) { + m--; + } + q[m++] = q[i]; + } + + // Step 6 (Optional): + // In general sense we need the original coordinates, so we + // need to shift the points back (reverting Step 2) + // But if we're only interested in getting the area/perimeter of the shape + // We can simply return. 
+ if (!shift_to_zero) { + for (int i = 0; i < m; i++) { + q[i] += start; + } + } + + return m; +} + +template +HOST_DEVICE_INLINE T quadri_box_area(const Point (&q)[4]) { + T area = 0; +#pragma unroll + for (int i = 1; i < 3; i++) { + area += fabs(cross_2d(q[i] - q[0], q[i + 1] - q[0])); + } + + return area / 2.0; +} + +template +HOST_DEVICE_INLINE T polygon_area(const Point (&q)[24], const int& m) { + if (m <= 2) { + return 0; + } + + T area = 0; + for (int i = 1; i < m - 1; i++) { + area += fabs(cross_2d(q[i] - q[0], q[i + 1] - q[0])); + } + + return area / 2.0; +} + +template +HOST_DEVICE_INLINE T rotated_boxes_intersection(const RotatedBox& box1, + const RotatedBox& box2) { + // There are up to 4 x 4 + 4 + 4 = 24 intersections (including dups) returned + // from rotated_rect_intersection_pts + Point intersectPts[24], orderedPts[24]; + + Point pts1[4]; + Point pts2[4]; + get_rotated_vertices(box1, pts1); + get_rotated_vertices(box2, pts2); + + int num = get_intersection_points(pts1, pts2, intersectPts); + + if (num <= 2) { + return 0.0; + } + + // Convex Hull to order the intersection points in clockwise order and find + // the contour area. + int num_convex = convex_hull_graham(intersectPts, num, orderedPts, true); + return polygon_area(orderedPts, num_convex); +} + +template +HOST_DEVICE_INLINE T quadri_boxes_intersection(const Point (&pts1)[4], + const Point (&pts2)[4]) { + // There are up to 4 x 4 + 4 + 4 = 24 intersections (including dups) returned + // from rotated_rect_intersection_pts + Point intersectPts[24], orderedPts[24]; + + int num = get_intersection_points(pts1, pts2, intersectPts); + + if (num <= 2) { + return 0.0; + } + + // Convex Hull to order the intersection points in clockwise order and find + // the contour area. 
+ int num_convex = convex_hull_graham(intersectPts, num, orderedPts, true); + return polygon_area(orderedPts, num_convex); +} + +} // namespace + +template +HOST_DEVICE_INLINE T single_box_iou_rotated(T const* const box1_raw, + T const* const box2_raw, + const int mode_flag) { + // shift center to the middle point to achieve higher precision in result + RotatedBox box1, box2; + auto center_shift_x = (box1_raw[0] + box2_raw[0]) / 2.0; + auto center_shift_y = (box1_raw[1] + box2_raw[1]) / 2.0; + box1.x_ctr = box1_raw[0] - center_shift_x; + box1.y_ctr = box1_raw[1] - center_shift_y; + box1.w = box1_raw[2]; + box1.h = box1_raw[3]; + box1.a = box1_raw[4]; + box2.x_ctr = box2_raw[0] - center_shift_x; + box2.y_ctr = box2_raw[1] - center_shift_y; + box2.w = box2_raw[2]; + box2.h = box2_raw[3]; + box2.a = box2_raw[4]; + + const T area1 = box1.w * box1.h; + const T area2 = box2.w * box2.h; + if (area1 < 1e-14 || area2 < 1e-14) { + return 0.f; + } + + const T intersection = rotated_boxes_intersection(box1, box2); + T baseS = 1.0; + if (mode_flag == 0) { + baseS = (area1 + area2 - intersection); + } else if (mode_flag == 1) { + baseS = area1; + } + const T iou = intersection / baseS; + return iou; +} + +template +HOST_DEVICE_INLINE T single_box_iou_quadri(T const* const pts1_raw, + T const* const pts2_raw, + const int mode_flag) { + // shift center to the middle point to achieve higher precision in result + Point pts1[4], pts2[4]; + + auto center_shift_x = + (pts1_raw[0] + pts2_raw[0] + pts1_raw[2] + pts2_raw[2] + pts1_raw[4] + + pts2_raw[4] + pts1_raw[6] + pts2_raw[6]) / + 8.0; + auto center_shift_y = + (pts1_raw[1] + pts2_raw[1] + pts1_raw[3] + pts2_raw[3] + pts1_raw[5] + + pts2_raw[5] + pts1_raw[7] + pts2_raw[7]) / + 8.0; + pts1[0].x = pts1_raw[0] - center_shift_x; + pts1[0].y = pts1_raw[1] - center_shift_y; + pts1[1].x = pts1_raw[2] - center_shift_x; + pts1[1].y = pts1_raw[3] - center_shift_y; + pts1[2].x = pts1_raw[4] - center_shift_x; + pts1[2].y = pts1_raw[5] - 
center_shift_y; + pts1[3].x = pts1_raw[6] - center_shift_x; + pts1[3].y = pts1_raw[7] - center_shift_y; + pts2[0].x = pts2_raw[0] - center_shift_x; + pts2[0].y = pts2_raw[1] - center_shift_y; + pts2[1].x = pts2_raw[2] - center_shift_x; + pts2[1].y = pts2_raw[3] - center_shift_y; + pts2[2].x = pts2_raw[4] - center_shift_x; + pts2[2].y = pts2_raw[5] - center_shift_y; + pts2[3].x = pts2_raw[6] - center_shift_x; + pts2[3].y = pts2_raw[7] - center_shift_y; + + const T area1 = quadri_box_area(pts1); + const T area2 = quadri_box_area(pts2); + if (area1 < 1e-14 || area2 < 1e-14) { + return 0.f; + } + + const T intersection = quadri_boxes_intersection(pts1, pts2); + T baseS = 1.0; + if (mode_flag == 0) { + baseS = (area1 + area2 - intersection); + } else if (mode_flag == 1) { + baseS = area1; + } + const T iou = intersection / baseS; + return iou; +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/active_rotated_filter_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/active_rotated_filter_cuda_kernel.cuh new file mode 100644 index 0000000000000000000000000000000000000000..36e41107ebd52d3cf5e9a71cffe6eddeed4f0765 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/active_rotated_filter_cuda_kernel.cuh @@ -0,0 +1,59 @@ +// Copyright (c) OpenMMLab. All rights reserved. 
+// Modified from +// https://github.com/csuhan/s2anet/blob/master/mmdet/ops/orn/src/cuda/ActiveRotatingFilter_cuda.cu +#ifndef ACTIVE_ROTATED_FILTER_CUDA_KERNEL_CUH +#define ACTIVE_ROTATED_FILTER_CUDA_KERNEL_CUH + +#ifdef MMCV_USE_PARROTS +#include "parrots_cuda_helper.hpp" +#else +#include "pytorch_cuda_helper.hpp" +#endif + +template +__global__ void active_rotated_filter_forward_cuda_kernel( + const int nthreads, const scalar_t* weight_data, const int* indices_data, + const int num_input_planes, const int num_output_planes, + const int num_orientations, const int num_rotations, const int nEntry, + scalar_t* output_data) { + CUDA_1D_KERNEL_LOOP(index, nthreads) { + int l = index % nEntry; + int j = (index / nEntry) % num_input_planes; + int i = index / nEntry / num_input_planes; + int k; + scalar_t val = *(weight_data + index); + for (k = 0; k < num_rotations; k++) { + int idx = (int)(*(indices_data + l * num_rotations + k)) - 1; + scalar_t* target = output_data + + i * (num_rotations * num_input_planes * nEntry) + + k * (num_input_planes * nEntry) + j * (nEntry) + idx; + *target = val; + } + } +} + +template +__global__ void active_rotated_filter_backward_cuda_kernel( + const int nthreads, const scalar_t* gradWeight_data, + const int* indices_data, const int num_input_planes, + const int num_output_planes, const int num_orientations, + const int num_rotations, const int nEntry, scalar_t* weight_data) { + CUDA_1D_KERNEL_LOOP(index, nthreads) { + int l = index % nEntry; + int j = (index / nEntry) % num_input_planes; + int i = index / nEntry / num_input_planes; + int k; + scalar_t* val = weight_data + index; + *val = 0; + scalar_t tmp = 0; + for (k = 0; k < num_rotations; k++) { + int idx = (int)(*(indices_data + l * num_rotations + k)) - 1; + scalar_t target = + *(gradWeight_data + i * (num_rotations * num_input_planes * nEntry) + + k * (num_input_planes * nEntry) + j * (nEntry) + idx); + tmp = tmp + target; + } + *val = tmp; + } +} +#endif // 
ACTIVE_ROTATED_FILTER_CUDA_KERNEL_CUH diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/assign_score_withk_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/assign_score_withk_cuda_kernel.cuh new file mode 100644 index 0000000000000000000000000000000000000000..9f9250844b9ceeca0df0377640c3d28e3f61cecc --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/assign_score_withk_cuda_kernel.cuh @@ -0,0 +1,116 @@ +// Copyright (c) OpenMMLab. All rights reserved +#ifndef ASSIGN_SCORE_WITHK_CUDA_KERNEL_CUH +#define ASSIGN_SCORE_WITHK_CUDA_KERNEL_CUH + +#ifdef MMCV_USE_PARROTS +#include "parrots_cuda_helper.hpp" +#else +#include "pytorch_cuda_helper.hpp" +#endif + +// input: points(B,N0,M,O), centers(B,N0,M,O), scores(B,N1,K,M), knn_idx(B,N1,K) +// output: fout(B,O,N) +// algo: fout(b,i,k,j) = s(b,i,k,m)*p(b,c(i),k,m,j) = s(b,i,k,m)*p(b,i(k),m,j) +// i(k) = idx(b,i,k) +// sum: fout(b,i,j) = fout(b,i,j) + s(b,i,k,m)*p(b,i,k,m,j) +// avg: fout(b,i,j) = sum(fout(b,i,k,j)) / k +// max: fout(b,i,j) = max(fout(b,i,k,j), sum(s(b,i,k,m)*p(b,i,k,m,j))) + +template +__global__ void assign_score_withk_forward_cuda_kernel( + const int B, const int N0, const int N1, const int M, const int K, + const int O, const int aggregate, const T* points, const T* centers, + const T* scores, const int64_t* knn_idx, T* output) { + // ----- parallel loop for B, N1, K and O --------- + CUDA_1D_KERNEL_LOOP(i, B * O * N1 * K) { + // ------- loop for M ---------- + const int b = (int)(i / (O * N1 * K)); + const int o = (int)(i % (O * N1 * K) / (N1 * K)); + const int n = (int)(i % (N1 * K) / K); + const int k = (int)(i % K); + const int cn = (int)knn_idx[b * K * N1 + n * K + + 0]; // The first neighbor is the center point + const int kn = (int)knn_idx[b * K * N1 + n * K + k]; + if (kn >= N0 || + kn < 0) { // if index overflows, it is out of the neighborhood range + return; + } + assert(b < B); + assert(kn < N0); + assert(cn < N0); + assert(o < 
O); + assert(n < N1); + const int out_idx = b * N1 * O * K + o * N1 * K + n * K + k; + T val = output[out_idx]; + for (int m = 0; m < M; m++) { + val += points[b * N0 * M * O + kn * M * O + m * O + o] * + scores[b * N1 * K * M + n * K * M + k * M + m] - + centers[b * N0 * M * O + cn * M * O + m * O + o] * + scores[b * N1 * K * M + n * K * M + k * M + m]; + } + output[out_idx] = val; + } +} + +template +__global__ void assign_score_withk_points_backward_cuda_kernel( + const int B, const int N0, const int N, const int M, const int K, + const int O, const int aggregate, const T* grad_out, const T* scores, + const int64_t* knn_idx, T* grad_points, T* grad_centers) { + // ----- parallel loop for B, M, O --------- + CUDA_1D_KERNEL_LOOP(i, B * M * O) { + int b = (int)(i / (M * O)); + int m = (int)(i % (M * O) / O); + int o = (int)(i % O); + + // ----- loop for N,K --------- + for (int n = 0; n < N; n++) { + for (int k = 0; k < K; k++) { + int kn = knn_idx[b * N * K + n * K + k]; + int cn = knn_idx[b * N * K + n * K + 0]; + if (kn >= N0 || kn < 0) { // if index overflows, it is out of the + // neighborhood range + continue; + } + atomicAdd(grad_points + b * N0 * M * O + kn * M * O + m * O + o, + scores[b * N * K * M + n * K * M + k * M + m] * + grad_out[b * O * N * K + o * N * K + n * K + k]); + atomicAdd(grad_centers + b * N0 * M * O + cn * M * O + m * O + o, + -scores[b * N * K * M + n * K * M + k * M + m] * + grad_out[b * O * N * K + o * N * K + n * K + k]); + } + } + } +} + +template +__global__ void assign_score_withk_scores_backward_cuda_kernel( + const int B, const int N0, const int N, const int M, const int K, + const int O, const int aggregate, const T* grad_out, const T* points, + const T* centers, const int64_t* knn_idx, T* grad_scores) { + // ----- parallel loop for B, N, K, M --------- + CUDA_1D_KERNEL_LOOP(i, B * N * K * M) { + const int b = (int)(i / (N * M * K)); + const int n = (int)(i % (N * M * K) / M / K); + const int k = (int)(i % (M * K) / M); + const 
int m = (int)(i % M); + const int cn = knn_idx[b * N * K + n * K + 0]; + const int kn = knn_idx[b * N * K + n * K + k]; + if (kn >= N0 || + kn < 0) { // if index overflows, it is out of the neighborhood range + return; + } + + // -------------- loop for O ------------------------ + const int out_idx = b * N * K * M + n * K * M + k * M + m; + T val = grad_scores[out_idx]; + for (int o = 0; o < O; o++) { + val += (points[b * N0 * M * O + kn * M * O + m * O + o] - + centers[b * N0 * M * O + cn * M * O + m * O + o]) * + grad_out[b * O * N * K + o * N * K + n * K + k]; + } + grad_scores[out_idx] = val; + } +} + +#endif // ASSIGN_SCORE_WITHK_CUDA_KERNEL_CUH diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/ball_query_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/ball_query_cuda_kernel.cuh new file mode 100644 index 0000000000000000000000000000000000000000..632b5c4940b33a9d8d839fa3f3b92e7b6a2bd29e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/ball_query_cuda_kernel.cuh @@ -0,0 +1,58 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +// Modified from +// https://github.com/sshaoshuai/Pointnet2.PyTorch/tree/master/pointnet2/src/ball_query_gpu.cu +#ifndef BALL_QUERY_CUDA_KERNEL_CUH +#define BALL_QUERY_CUDA_KERNEL_CUH + +#ifdef MMCV_USE_PARROTS +#include "parrots_cuda_helper.hpp" +#else +#include "pytorch_cuda_helper.hpp" +#endif + +template +__global__ void ball_query_forward_cuda_kernel(int b, int n, int m, + float min_radius, + float max_radius, int nsample, + const T* new_xyz, const T* xyz, + int* idx) { + // new_xyz: (B, M, 3) + // xyz: (B, N, 3) + // output: + // idx: (B, M, nsample) + int bs_idx = blockIdx.y; + CUDA_1D_KERNEL_LOOP(pt_idx, m) { + if (bs_idx >= b) return; + + new_xyz += bs_idx * m * 3 + pt_idx * 3; + xyz += bs_idx * n * 3; + idx += bs_idx * m * nsample + pt_idx * nsample; + + float max_radius2 = max_radius * max_radius; + float min_radius2 = min_radius * min_radius; + T new_x = new_xyz[0]; + T new_y = new_xyz[1]; + T new_z = new_xyz[2]; + + int cnt = 0; + for (int k = 0; k < n; ++k) { + T x = xyz[k * 3 + 0]; + T y = xyz[k * 3 + 1]; + T z = xyz[k * 3 + 2]; + T d2 = (new_x - x) * (new_x - x) + (new_y - y) * (new_y - y) + + (new_z - z) * (new_z - z); + if (d2 == 0 || (d2 >= min_radius2 && d2 < max_radius2)) { + if (cnt == 0) { + for (int l = 0; l < nsample; ++l) { + idx[l] = k; + } + } + idx[cnt] = k; + ++cnt; + if (cnt >= nsample) break; + } + } + } +} + +#endif // BALL_QUERY_CUDA_KERNEL_CUH diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/bbox_overlaps_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/bbox_overlaps_cuda_kernel.cuh new file mode 100644 index 0000000000000000000000000000000000000000..15bd91eca629895d3a99dde3fe6614036ca31dc9 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/bbox_overlaps_cuda_kernel.cuh @@ -0,0 +1,147 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#ifndef BBOX_OVERLAPS_CUDA_KERNEL_CUH +#define BBOX_OVERLAPS_CUDA_KERNEL_CUH + +#ifdef MMCV_USE_PARROTS +#include "parrots_cuda_helper.hpp" +#else +#include "pytorch_cuda_helper.hpp" +#endif + +template +__device__ __forceinline__ void load_bbox(const T* bbox, const int base, T& x1, + T& y1, T& x2, T& y2) { + x1 = bbox[base]; + y1 = bbox[base + 1]; + x2 = bbox[base + 2]; + y2 = bbox[base + 3]; +} + +template <> +__device__ __forceinline__ void load_bbox(const float* bbox, + const int base, float& x1, + float& y1, float& x2, + float& y2) { + const float4 bbox_offset = reinterpret_cast(bbox + base)[0]; + x1 = bbox_offset.x; + y1 = bbox_offset.y; + x2 = bbox_offset.z; + y2 = bbox_offset.w; +} + +template +__global__ void bbox_overlaps_cuda_kernel(const T* bbox1, const T* bbox2, + T* ious, const int num_bbox1, + const int num_bbox2, const int mode, + const bool aligned, + const int offset) { + if (aligned) { + CUDA_1D_KERNEL_LOOP(index, num_bbox1) { + const int b1 = index; + const int b2 = index; + + const int base1 = b1 << 2; // b1 * 4 + T b1_x1, b1_y1, b1_x2, b1_y2; + load_bbox(bbox1, base1, b1_x1, b1_y1, b1_x2, b1_y2); + const T b1_area = (b1_x2 - b1_x1 + offset) * (b1_y2 - b1_y1 + offset); + + const int base2 = b2 << 2; // b2 * 4 + T b2_x1, b2_y1, b2_x2, b2_y2; + load_bbox(bbox2, base2, b2_x1, b2_y1, b2_x2, b2_y2); + const T b2_area = (b2_x2 - b2_x1 + offset) * (b2_y2 - b2_y1 + offset); + + const T left = fmaxf(b1_x1, b2_x1), right = fminf(b1_x2, b2_x2); + const T top = fmaxf(b1_y1, b2_y1), bottom = fminf(b1_y2, b2_y2); + const T width = fmaxf(right - left + offset, 0.f); + const T height = fmaxf(bottom - top + offset, 0.f); + const T interS = width * height; + + const T baseS = + fmaxf(mode == 0 ? 
b1_area + b2_area - interS : b1_area, T(offset)); + ious[index] = interS / baseS; + } + } else { + CUDA_1D_KERNEL_LOOP(index, num_bbox1 * num_bbox2) { + const int b1 = index / num_bbox2; + const int b2 = index % num_bbox2; + + const int base1 = b1 << 2; // b1 * 4 + T b1_x1, b1_y1, b1_x2, b1_y2; + load_bbox(bbox1, base1, b1_x1, b1_y1, b1_x2, b1_y2); + const T b1_area = (b1_x2 - b1_x1 + offset) * (b1_y2 - b1_y1 + offset); + + const int base2 = b2 << 2; // b2 * 4 + T b2_x1, b2_y1, b2_x2, b2_y2; + load_bbox(bbox2, base2, b2_x1, b2_y1, b2_x2, b2_y2); + const T b2_area = (b2_x2 - b2_x1 + offset) * (b2_y2 - b2_y1 + offset); + + const T left = fmaxf(b1_x1, b2_x1), right = fminf(b1_x2, b2_x2); + const T top = fmaxf(b1_y1, b2_y1), bottom = fminf(b1_y2, b2_y2); + const T width = fmaxf(right - left + offset, 0.f); + const T height = fmaxf(bottom - top + offset, 0.f); + const T interS = width * height; + + const T baseS = + fmaxf(mode == 0 ? b1_area + b2_area - interS : b1_area, T(offset)); + ious[index] = interS / baseS; + } + } +} + +#if __CUDA_ARCH__ >= 530 +__device__ __forceinline__ __half __half_area(const __half x1, const __half y1, + const __half x2, const __half y2, + const __half offset) { + const __half half_w = __hadd(__hsub(x2, x1), offset); + const __half half_h = __hadd(__hsub(y2, y1), offset); + return __hmul(half_w, half_h); +} + +__device__ __forceinline__ __half __half_max(const __half a, const __half b) { + return __hge(a, b) ? a : b; +} + +__device__ __forceinline__ __half __half_min(const __half a, const __half b) { + return __hle(a, b) ? a : b; +} + +// fp16 won't provide much increase when aligned==true. It is useful when +// aligned==false, which would give you ~40% bonus. +__device__ void bbox_overlaps_cuda_kernel_half( + const __half* bbox1, const __half* bbox2, __half* ious, const int num_bbox1, + const int num_bbox2, const int mode, const bool aligned, const int offset) { + const int num_output = aligned ? 
num_bbox1 : num_bbox1 * num_bbox2; + const __half h_offset = __int2half_rn(offset); + CUDA_1D_KERNEL_LOOP(index, num_output) { + const int b1 = aligned ? index : index / num_bbox2; + const int b2 = aligned ? index : index % num_bbox2; + + const int base1 = b1 << 2; + __half b1_x1, b1_y1, b1_x2, b1_y2; + load_bbox<__half>(bbox1, base1, b1_x1, b1_y1, b1_x2, b1_y2); + const __half b1_area = __half_area(b1_x1, b1_y1, b1_x2, b1_y2, h_offset); + + const int base2 = b2 << 2; + __half b2_x1, b2_y1, b2_x2, b2_y2; + load_bbox<__half>(bbox2, base2, b2_x1, b2_y1, b2_x2, b2_y2); + const __half b2_area = __half_area(b2_x1, b2_y1, b2_x2, b2_y2, h_offset); + + const __half left = __half_max(b1_x1, b2_x1), + right = __half_min(b1_x2, b2_x2); + const __half top = __half_max(b1_y1, b2_y1), + bottom = __half_min(b1_y2, b2_y2); + const __half width = + __half_max(__hadd(__hsub(right, left), h_offset), __float2half(0.f)); + const __half height = + __half_max(__hadd(__hsub(bottom, top), h_offset), __float2half(0.f)); + const __half interS = __hmul(width, height); + + const __half baseS = __half_max( + mode == 0 ? __hsub(__hadd(b1_area, b2_area), interS) : b1_area, + h_offset); + ious[index] = __hdiv(interS, baseS); + } +} +#endif // __CUDA_ARCH__ >= 530 + +#endif // BBOX_OVERLAPS_CUDA_KERNEL_CUH diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/bezier_align_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/bezier_align_cuda_kernel.cuh new file mode 100644 index 0000000000000000000000000000000000000000..537610416e16aae8979d0843972e090d127b0d43 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/bezier_align_cuda_kernel.cuh @@ -0,0 +1,230 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +// Modified from +// https://github.com/aim-uofa/AdelaiDet/blob/master/adet/layers/csrc/BezierAlign/BezierAlign_cuda.cu +#ifndef BEZIER_ALIGN_CUDA_KERNEL_CUH +#define BEZIER_ALIGN_CUDA_KERNEL_CUH + +#include +#ifdef MMCV_WITH_TRT +#include "common_cuda_helper.hpp" +#else // MMCV_WITH_TRT +#ifdef MMCV_USE_PARROTS +#include "parrots_cuda_helper.hpp" +#else // MMCV_USE_PARROTS +#include "pytorch_cuda_helper.hpp" +#endif // MMCV_USE_PARROTS +#endif // MMCV_WITH_TRT + +template +__device__ T bezier_curve(const T p0, const T p1, const T p2, const T p3, + const T u) { + return ((1. - u) * (1. - u) * (1. - u) * p0 + + 3. * u * (1. - u) * (1. - u) * p1 + 3. * u * u * (1. - u) * p2 + + u * u * u * p3); +} + +template +__global__ void bezier_align_forward_cuda_kernel( + const int nthreads, + const T *bottom_data, // inputs + const T *bottom_rois, // bottom rois contains the bezier curve + T *top_data, // outputs + const int pooled_height, const int pooled_width, const T spatial_scale, + const int sampling_ratio, bool aligned, const int channels, + const int height, const int width) { + CUDA_1D_KERNEL_LOOP(index, nthreads) { + // (n, c, ph, pw) is an element in the pooled output + int pw = index % pooled_width; + int ph = (index / pooled_width) % pooled_height; + int c = (index / pooled_width / pooled_height) % channels; + int n = index / pooled_width / pooled_height / channels; + + // beziers have size Nx(1+8*2) = Nx17 + const T *offset_bottom_rois = bottom_rois + n * 17; + int roi_batch_ind = offset_bottom_rois[0]; + + // Do not use rounding; this implementation detail is critical + T offset = aligned ? 
(T)0.5 : (T)0.0; + + // TODO: avoid this by using parallel annotation, for good + T p0_x = offset_bottom_rois[1] * spatial_scale; + T p0_y = offset_bottom_rois[2] * spatial_scale; + T p1_x = offset_bottom_rois[3] * spatial_scale; + T p1_y = offset_bottom_rois[4] * spatial_scale; + T p2_x = offset_bottom_rois[5] * spatial_scale; + T p2_y = offset_bottom_rois[6] * spatial_scale; + T p3_x = offset_bottom_rois[7] * spatial_scale; + T p3_y = offset_bottom_rois[8] * spatial_scale; + T p4_x = offset_bottom_rois[15] * spatial_scale; + T p4_y = offset_bottom_rois[16] * spatial_scale; + T p5_x = offset_bottom_rois[13] * spatial_scale; + T p5_y = offset_bottom_rois[14] * spatial_scale; + T p6_x = offset_bottom_rois[11] * spatial_scale; + T p6_y = offset_bottom_rois[12] * spatial_scale; + T p7_x = offset_bottom_rois[9] * spatial_scale; + T p7_y = offset_bottom_rois[10] * spatial_scale; + + // compute the coords + const T u = pw / static_cast(pooled_width); + const T v = ph / static_cast(pooled_height); + const T x0 = bezier_curve(p0_x, p1_x, p2_x, p3_x, u); + const T y0 = bezier_curve(p0_y, p1_y, p2_y, p3_y, u); + const T x1 = bezier_curve(p4_x, p5_x, p6_x, p7_x, u); + const T y1 = bezier_curve(p4_y, p5_y, p6_y, p7_y, u); + const T x_center = x1 * v + x0 * (1. - v) - offset; + const T y_center = y1 * v + y0 * (1. - v) - offset; + + T roi_width = max(abs(p0_x - p3_x), abs(p4_x - p7_x)); + T roi_height = max(abs(p0_y - p3_y), abs(p4_y - p7_y)); + if (!aligned) { // for backward-compatibility only + roi_width = max(roi_width, (T)1.); + roi_height = max(roi_height, (T)1.); + } + T bin_size_h = static_cast(roi_height) / static_cast(pooled_height); + T bin_size_w = static_cast(roi_width) / static_cast(pooled_width); + + const T *offset_bottom_data = + bottom_data + (roi_batch_ind * channels + c) * height * width; + + // We use roi_bin_grid to sample the grid and mimic integral + int roi_bin_grid_h = (sampling_ratio > 0) + ? 
sampling_ratio + : ceil(roi_height / pooled_height); // e.g., = 2 + int roi_bin_grid_w = + (sampling_ratio > 0) ? sampling_ratio : ceil(roi_width / pooled_width); + + // We do average (integral) pooling inside a bin + // When the grid is empty, output zeros == 0/1, instead of NaN. + const T count = max(roi_bin_grid_h * roi_bin_grid_w, 1); // e.g. = 4 + + T output_val = 0.; + for (int iy = 0; iy < roi_bin_grid_h; iy++) // e.g., iy = 0, 1 + { + const T y = y_center - (T)0.5 * bin_size_h + + static_cast(iy + .5f) * bin_size_h / + static_cast(roi_bin_grid_h); // e.g., 0.5, 1.5 + for (int ix = 0; ix < roi_bin_grid_w; ix++) { + const T x = x_center - (T)0.5 * bin_size_w + + static_cast(ix + .5f) * bin_size_w / + static_cast(roi_bin_grid_w); + + T val = bilinear_interpolate(offset_bottom_data, height, width, y, x, + index); + output_val += val; + } + } + output_val /= count; + + top_data[index] = output_val; + } +} + +template +__global__ void bezier_align_backward_cuda_kernel( + const int nthreads, const T *top_diff, const T *bottom_rois, T *bottom_diff, + const int pooled_height, const int pooled_width, const T spatial_scale, + const int sampling_ratio, bool aligned, const int channels, + const int height, const int width) { + CUDA_1D_KERNEL_LOOP(index, nthreads) { + // (n, c, ph, pw) is an element in the pooled output + int pw = index % pooled_width; + int ph = (index / pooled_width) % pooled_height; + int c = (index / pooled_width / pooled_height) % channels; + int n = index / pooled_width / pooled_height / channels; + + // beziers have size Nx(1+8*2) = Nx17 + const T *offset_bottom_rois = bottom_rois + n * 17; + int roi_batch_ind = offset_bottom_rois[0]; + + // Do not use rounding; this implementation detail is critical + T offset = aligned ? 
(T)0.5 : (T)0.0; + T p0_x = offset_bottom_rois[1] * spatial_scale; + T p0_y = offset_bottom_rois[2] * spatial_scale; + T p1_x = offset_bottom_rois[3] * spatial_scale; + T p1_y = offset_bottom_rois[4] * spatial_scale; + T p2_x = offset_bottom_rois[5] * spatial_scale; + T p2_y = offset_bottom_rois[6] * spatial_scale; + T p3_x = offset_bottom_rois[7] * spatial_scale; + T p3_y = offset_bottom_rois[8] * spatial_scale; + T p4_x = offset_bottom_rois[15] * spatial_scale; + T p4_y = offset_bottom_rois[16] * spatial_scale; + T p5_x = offset_bottom_rois[13] * spatial_scale; + T p5_y = offset_bottom_rois[14] * spatial_scale; + T p6_x = offset_bottom_rois[11] * spatial_scale; + T p6_y = offset_bottom_rois[12] * spatial_scale; + T p7_x = offset_bottom_rois[9] * spatial_scale; + T p7_y = offset_bottom_rois[10] * spatial_scale; + + // compute the coords + const T u = pw / static_cast(pooled_width); + const T v = ph / static_cast(pooled_height); + const T x0 = bezier_curve(p0_x, p1_x, p2_x, p3_x, u); + const T y0 = bezier_curve(p0_y, p1_y, p2_y, p3_y, u); + const T x1 = bezier_curve(p4_x, p5_x, p6_x, p7_x, u); + const T y1 = bezier_curve(p4_y, p5_y, p6_y, p7_y, u); + const T x_center = x1 * v + x0 * (1. - v) - offset; + const T y_center = y1 * v + y0 * (1. 
- v) - offset; + + T roi_width = max(abs(p0_x - p3_x), abs(p4_x - p7_x)); + T roi_height = max(abs(p0_y - p3_y), abs(p4_y - p7_y)); + if (!aligned) { // for backward-compatibility only + roi_width = max(roi_width, (T)1.); + roi_height = max(roi_height, (T)1.); + } + T bin_size_h = static_cast(roi_height) / static_cast(pooled_height); + T bin_size_w = static_cast(roi_width) / static_cast(pooled_width); + + T *offset_bottom_diff = + bottom_diff + (roi_batch_ind * channels + c) * height * width; + + int top_offset = (n * channels + c) * pooled_height * pooled_width; + const T *offset_top_diff = top_diff + top_offset; + const T top_diff_this_bin = offset_top_diff[ph * pooled_width + pw]; + + // We use roi_bin_grid to sample the grid and mimic integral + int roi_bin_grid_h = (sampling_ratio > 0) + ? sampling_ratio + : ceil(roi_height / pooled_height); // e.g., = 2 + int roi_bin_grid_w = + (sampling_ratio > 0) ? sampling_ratio : ceil(roi_width / pooled_width); + + // We do average (integral) pooling inside a bin + const T count = roi_bin_grid_h * roi_bin_grid_w; // e.g. 
= 4 + + for (int iy = 0; iy < roi_bin_grid_h; iy++) // e.g., iy = 0, 1 + { + const T y = y_center - (T)0.5 * bin_size_h + + static_cast(iy + .5f) * bin_size_h / + static_cast(roi_bin_grid_h); // e.g., 0.5, 1.5 + for (int ix = 0; ix < roi_bin_grid_w; ix++) { + const T x = x_center - (T)0.5 * bin_size_w + + static_cast(ix + .5f) * bin_size_w / + static_cast(roi_bin_grid_w); + + T w1, w2, w3, w4; + int x_low, x_high, y_low, y_high; + + bilinear_interpolate_gradient(height, width, y, x, w1, w2, w3, w4, + x_low, x_high, y_low, y_high, index); + + T g1 = top_diff_this_bin * w1 / count; + T g2 = top_diff_this_bin * w2 / count; + T g3 = top_diff_this_bin * w3 / count; + T g4 = top_diff_this_bin * w4 / count; + + if (x_low >= 0 && x_high >= 0 && y_low >= 0 && y_high >= 0) { + atomicAdd(offset_bottom_diff + y_low * width + x_low, + static_cast(g1)); + atomicAdd(offset_bottom_diff + y_low * width + x_high, + static_cast(g2)); + atomicAdd(offset_bottom_diff + y_high * width + x_low, + static_cast(g3)); + atomicAdd(offset_bottom_diff + y_high * width + x_high, + static_cast(g4)); + } // if + } // ix + } // iy + } // CUDA_1D_KERNEL_LOOP +} // BezierAlignBackward + +#endif // BEZIER_ALIGN_CUDA_KERNEL_CUH diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/border_align_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/border_align_cuda_kernel.cuh new file mode 100644 index 0000000000000000000000000000000000000000..1d2a2197b45ef5c82412c4b75d7819a7e27674f6 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/border_align_cuda_kernel.cuh @@ -0,0 +1,200 @@ +// Copyright (c) OpenMMLab. All rights reserved +// modified from +// https://github.com/Megvii-BaseDetection/cvpods/blob/master/cvpods/layers/csrc/border_align/border_align_kernel.cu. +// the main difference: (1) use `argmax_idx` for fast computing of gradient +// during the backward. 
(2) `wh` is directly computed by `boxes`, rather than +// passing it as argument to forward or backward functions. + +#ifndef BORDER_ALIGN_CUDA_KERNEL_CUH +#define BORDER_ALIGN_CUDA_KERNEL_CUH + +#include +#ifdef MMCV_WITH_TRT +#include "common_cuda_helper.hpp" +#else // MMCV_WITH_TRT +#ifdef MMCV_USE_PARROTS +#include "parrots_cuda_helper.hpp" +#else // MMCV_USE_PARROTS +#include "pytorch_cuda_helper.hpp" +#endif // MMCV_USE_PARROTS +#endif // MMCV_WITH_TRT + +enum BorderMode { Top = 0, Left = 1, Bottom = 2, Right = 3 }; + +/*** Forward ***/ +template +__global__ void border_align_forward_cuda_kernel( + const int nthreads, const T* input, const T* boxes, T* output, + int* argmax_idx, const int channels, const int box_size, const int height, + const int width, const int pool_size) { + CUDA_1D_KERNEL_LOOP(index, nthreads) { + // (batch_idx, c_idx, box_idx) is an element paralleled for computing + // output, and `extreme_idx` is in range [0,3] + int batch_idx, c_idx, box_idx, extreme_idx, maxidx, *offset_argmax_idx; + const T *offset_box, *offset_input, *offset_box_x; + T *offset_output, box_width, box_height, stride, x_stride, y_stride, x, y, + val, maxval; + + extreme_idx = threadIdx.y; + // shape (N, C, box_size, 4) for output + batch_idx = index / channels / box_size; + // shape (N, box_size, 4) for boxes + box_idx = index % box_size + batch_idx * box_size; + c_idx = (index / box_size) % channels; + + offset_box = boxes + box_idx * 4; + box_width = *(offset_box + 2) - *offset_box; + box_height = *(offset_box + 3) - *(offset_box + 1); + offset_output = output + index * 4 + extreme_idx; + offset_argmax_idx = argmax_idx + index * 4 + extreme_idx; + // shape (N, 4C, h, w) for input. 
+ // [0,C) for top feature, [C,2C) for left feature, + // [2C,3C) for bottom feature, [3C,4C) for right feature + offset_input = + input + (batch_idx * channels * 4 + extreme_idx * channels + c_idx) * + height * width; + + // extreme_idx in [0,1] -> offset_box_x indexed at x1 + // extreme_idx in [2,3] -> offset_box_x indexed at x2 + offset_box_x = offset_box + extreme_idx / 2 * 2; + + // (x1,y1) or (x2,y2) for (x,y) + x = *offset_box_x; + y = *(offset_box_x + 1); + + switch (extreme_idx) { + // top + case BorderMode::Top: + stride = box_width / pool_size; + x_stride = stride; + y_stride = 0; + break; + // left + case BorderMode::Left: + stride = box_height / pool_size; + x_stride = 0; + y_stride = stride; + break; + // bottom + case BorderMode::Bottom: + stride = box_width / pool_size; + x_stride = -stride; + y_stride = 0; + break; + // right + case BorderMode::Right: + stride = box_height / pool_size; + x_stride = 0; + y_stride = -stride; + break; + } + + // initialize maxval and maxidx with the start position (e.g. 
(x1,y1) or + // (x2,y2)) + maxval = bilinear_interpolate(offset_input, height, width, y, x, index); + maxidx = 0; + + // do max_pool along the border + for (int i = 1; i <= pool_size; i++) { + x += x_stride; + y += y_stride; + val = bilinear_interpolate(offset_input, height, width, y, x, index); + if (val > maxval) { + maxval = val; + maxidx = i; + } + } + + // update output and argmax_idx + *offset_output = maxval; + *offset_argmax_idx = maxidx; + } +} + +/*** Backward ***/ +template +__global__ void border_align_backward_cuda_kernel( + const int nthreads, const T* grad_output, const T* boxes, + const int* argmax_idx, T* grad_input, const int channels, + const int box_size, const int height, const int width, + const int pool_size) { + CUDA_1D_KERNEL_LOOP(index, nthreads) { + // (batch_idx, c_idx, box_idx) is an element paralleled for computing + // output, and `extreme_idx` is in range [0,3] + int batch_idx, c_idx, box_idx, extreme_idx; + const int* offset_argmax_idx; + const T *offset_grad_output, *offset_box, *offset_box_x; + T *offset_grad_input, box_width, box_height, stride, x_stride, y_stride, x, + y; + + extreme_idx = threadIdx.y; + batch_idx = index / channels / box_size; + box_idx = index % box_size + batch_idx * box_size; + c_idx = (index / box_size) % channels; + + offset_box = boxes + box_idx * 4; + box_width = *(offset_box + 2) - *offset_box; + box_height = *(offset_box + 3) - *(offset_box + 1); + offset_grad_output = grad_output + index * 4 + extreme_idx; + offset_argmax_idx = argmax_idx + index * 4 + extreme_idx; + // [0,C) for top feature grad, [C,2C) for left feature grad, + // [2C,3C) for bottom feature grad, [3C,4C) for right feature grad + offset_grad_input = grad_input + (batch_idx * channels * 4 + + extreme_idx * channels + c_idx) * + height * width; + + // extreme_idx in [0,1] -> offset_box_x indexed at x1 + // extreme_idx in [2,3] -> offset_box_x indexed at x2 + offset_box_x = offset_box + extreme_idx / 2 * 2; + + switch (extreme_idx) { + 
// top + case BorderMode::Top: + stride = box_width / pool_size; + x_stride = stride; + y_stride = 0; + break; + // left + case BorderMode::Left: + stride = box_height / pool_size; + x_stride = 0; + y_stride = stride; + break; + // bottom + case BorderMode::Bottom: + stride = box_width / pool_size; + x_stride = -stride; + y_stride = 0; + break; + // right + case BorderMode::Right: + stride = box_height / pool_size; + x_stride = 0; + y_stride = -stride; + break; + } + + // get position (x,y) which has maximum value during forward + x = *offset_box_x; + y = *(offset_box_x + 1); + x += x_stride * (T)(*offset_argmax_idx); + y += y_stride * (T)(*offset_argmax_idx); + + T w1, w2, w3, w4; + int x_low, x_high, y_low, y_high; + bilinear_interpolate_gradient(height, width, y, x, w1, w2, w3, w4, x_low, + x_high, y_low, y_high, index); + + // update grad_output + atomicAdd(offset_grad_input + y_low * width + x_low, + *offset_grad_output * w1); + atomicAdd(offset_grad_input + y_low * width + x_high, + *offset_grad_output * w2); + atomicAdd(offset_grad_input + y_high * width + x_low, + *offset_grad_output * w3); + atomicAdd(offset_grad_input + y_high * width + x_high, + *offset_grad_output * w4); + } +} + +#endif // BORDER_ALIGN_CUDA_KERNEL_CUH diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/box_iou_quadri_cuda.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/box_iou_quadri_cuda.cuh new file mode 100644 index 0000000000000000000000000000000000000000..cf8ad5e1a324de3a11c8fc8af28a8d559a661ed6 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/box_iou_quadri_cuda.cuh @@ -0,0 +1,91 @@ +// Copyright (c) Facebook, Inc. and its affiliates. 
All Rights Reserved +#ifndef BOX_IOU_QUADRI_CUDA_CUH +#define BOX_IOU_QUADRI_CUDA_CUH + +#ifdef MMCV_USE_PARROTS +#include "parrots_cuda_helper.hpp" +#else +#include "pytorch_cuda_helper.hpp" +#endif +#include "box_iou_rotated_utils.hpp" + +// 2D block with 32 * 16 = 512 threads per block +const int BLOCK_DIM_X = 32; +const int BLOCK_DIM_Y = 16; + +inline int divideUP(const int x, const int y) { return (((x) + (y)-1) / (y)); } + +template +__global__ void box_iou_quadri_cuda_kernel( + const int n_boxes1, const int n_boxes2, const T* dev_boxes1, + const T* dev_boxes2, T* dev_ious, const int mode_flag, const bool aligned) { + if (aligned) { + CUDA_1D_KERNEL_LOOP(index, n_boxes1) { + int b1 = index; + int b2 = index; + + int base1 = b1 * 8; + + float block_boxes1[8]; + float block_boxes2[8]; + + block_boxes1[0] = dev_boxes1[base1 + 0]; + block_boxes1[1] = dev_boxes1[base1 + 1]; + block_boxes1[2] = dev_boxes1[base1 + 2]; + block_boxes1[3] = dev_boxes1[base1 + 3]; + block_boxes1[4] = dev_boxes1[base1 + 4]; + block_boxes1[5] = dev_boxes1[base1 + 5]; + block_boxes1[6] = dev_boxes1[base1 + 6]; + block_boxes1[7] = dev_boxes1[base1 + 7]; + + int base2 = b2 * 8; + + block_boxes2[0] = dev_boxes2[base2 + 0]; + block_boxes2[1] = dev_boxes2[base2 + 1]; + block_boxes2[2] = dev_boxes2[base2 + 2]; + block_boxes2[3] = dev_boxes2[base2 + 3]; + block_boxes2[4] = dev_boxes2[base2 + 4]; + block_boxes2[5] = dev_boxes2[base2 + 5]; + block_boxes2[6] = dev_boxes2[base2 + 6]; + block_boxes2[7] = dev_boxes2[base2 + 7]; + + dev_ious[index] = + single_box_iou_quadri(block_boxes1, block_boxes2, mode_flag); + } + } else { + CUDA_1D_KERNEL_LOOP(index, n_boxes1 * n_boxes2) { + int b1 = index / n_boxes2; + int b2 = index % n_boxes2; + + int base1 = b1 * 8; + + float block_boxes1[8]; + float block_boxes2[8]; + + block_boxes1[0] = dev_boxes1[base1 + 0]; + block_boxes1[1] = dev_boxes1[base1 + 1]; + block_boxes1[2] = dev_boxes1[base1 + 2]; + block_boxes1[3] = dev_boxes1[base1 + 3]; + block_boxes1[4] = 
dev_boxes1[base1 + 4]; + block_boxes1[5] = dev_boxes1[base1 + 5]; + block_boxes1[6] = dev_boxes1[base1 + 6]; + block_boxes1[7] = dev_boxes1[base1 + 7]; + + int base2 = b2 * 8; + + block_boxes2[0] = dev_boxes2[base2 + 0]; + block_boxes2[1] = dev_boxes2[base2 + 1]; + block_boxes2[2] = dev_boxes2[base2 + 2]; + block_boxes2[3] = dev_boxes2[base2 + 3]; + block_boxes2[4] = dev_boxes2[base2 + 4]; + block_boxes2[5] = dev_boxes2[base2 + 5]; + block_boxes2[6] = dev_boxes2[base2 + 6]; + block_boxes2[7] = dev_boxes2[base2 + 7]; + + dev_ious[index] = + single_box_iou_quadri(block_boxes1, block_boxes2, mode_flag); + } + } +} + +#endif diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/box_iou_rotated_cuda.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/box_iou_rotated_cuda.cuh new file mode 100644 index 0000000000000000000000000000000000000000..abd47cd85437804310886de057b5a839a49481b2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/box_iou_rotated_cuda.cuh @@ -0,0 +1,81 @@ +// Copyright (c) Facebook, Inc. and its affiliates. 
All Rights Reserved +// modified from +// https://github.com/facebookresearch/detectron2/blob/master/detectron2/layers/csrc/box_iou_rotated/box_iou_rotated_cuda.cu +#ifndef BOX_IOU_ROTATED_CUDA_CUH +#define BOX_IOU_ROTATED_CUDA_CUH + +#ifdef MMCV_USE_PARROTS +#include "parrots_cuda_helper.hpp" +#else +#include "pytorch_cuda_helper.hpp" +#endif +#include "box_iou_rotated_utils.hpp" + +// 2D block with 32 * 16 = 512 threads per block +const int BLOCK_DIM_X = 32; +const int BLOCK_DIM_Y = 16; + +inline int divideUP(const int x, const int y) { return (((x) + (y)-1) / (y)); } + +template +__global__ void box_iou_rotated_cuda_kernel( + const int n_boxes1, const int n_boxes2, const T* dev_boxes1, + const T* dev_boxes2, T* dev_ious, const int mode_flag, const bool aligned) { + if (aligned) { + CUDA_1D_KERNEL_LOOP(index, n_boxes1) { + int b1 = index; + int b2 = index; + + int base1 = b1 * 5; + + float block_boxes1[5]; + float block_boxes2[5]; + + block_boxes1[0] = dev_boxes1[base1 + 0]; + block_boxes1[1] = dev_boxes1[base1 + 1]; + block_boxes1[2] = dev_boxes1[base1 + 2]; + block_boxes1[3] = dev_boxes1[base1 + 3]; + block_boxes1[4] = dev_boxes1[base1 + 4]; + + int base2 = b2 * 5; + + block_boxes2[0] = dev_boxes2[base2 + 0]; + block_boxes2[1] = dev_boxes2[base2 + 1]; + block_boxes2[2] = dev_boxes2[base2 + 2]; + block_boxes2[3] = dev_boxes2[base2 + 3]; + block_boxes2[4] = dev_boxes2[base2 + 4]; + + dev_ious[index] = + single_box_iou_rotated(block_boxes1, block_boxes2, mode_flag); + } + } else { + CUDA_1D_KERNEL_LOOP(index, n_boxes1 * n_boxes2) { + int b1 = index / n_boxes2; + int b2 = index % n_boxes2; + + int base1 = b1 * 5; + + float block_boxes1[5]; + float block_boxes2[5]; + + block_boxes1[0] = dev_boxes1[base1 + 0]; + block_boxes1[1] = dev_boxes1[base1 + 1]; + block_boxes1[2] = dev_boxes1[base1 + 2]; + block_boxes1[3] = dev_boxes1[base1 + 3]; + block_boxes1[4] = dev_boxes1[base1 + 4]; + + int base2 = b2 * 5; + + block_boxes2[0] = dev_boxes2[base2 + 0]; + block_boxes2[1] = 
dev_boxes2[base2 + 1]; + block_boxes2[2] = dev_boxes2[base2 + 2]; + block_boxes2[3] = dev_boxes2[base2 + 3]; + block_boxes2[4] = dev_boxes2[base2 + 4]; + + dev_ious[index] = + single_box_iou_rotated(block_boxes1, block_boxes2, mode_flag); + } + } +} + +#endif diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/carafe_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/carafe_cuda_kernel.cuh new file mode 100644 index 0000000000000000000000000000000000000000..88f70f5ed8aa68d4c08643d1948f0d9a5e339b9b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/carafe_cuda_kernel.cuh @@ -0,0 +1,334 @@ +// Copyright (c) OpenMMLab. All rights reserved +#ifndef CARAFE_CUDA_KERNEL_CUH +#define CARAFE_CUDA_KERNEL_CUH + +#ifdef MMCV_USE_PARROTS +#include "parrots_cuda_helper.hpp" +#else +#include "pytorch_cuda_helper.hpp" +#endif + +// #ifdef MMCV_WITH_HIP +#if defined(MMCV_WITH_HIP) || defined(__ILUVATAR__) +#define WARP_SIZE 64 +#else +#define WARP_SIZE 32 +#endif +#define THREADS_PER_PIXEL 32 +#define MAX_SHARED_MEMORY 49152 +#define MAX_SHARED_SCALAR_T 6144 // 49152 / 8 = 6144 +#define MAXIMIZE_KERNEL_SIZE true +#define kTileDim 32 +#define kBlockRows 8 +#define FULL_MASK 0xffffffff + +inline int divideUP(const int x, const int y) { return (((x) + (y)-1) / (y)); } + +__device__ inline int Loc2Index(const int n, const int c, const int h, + const int w, const int channel_num, + const int height, const int width) { + int index = w + (h + (c + n * channel_num) * height) * width; + return index; +} +#ifndef MMCV_WITH_HIP +/* TODO: move this to a common place */ +template +__device__ inline scalar_t min(scalar_t a, scalar_t b) { + return a < b ? a : b; +} + +template +__device__ inline scalar_t max(scalar_t a, scalar_t b) { + return a > b ? 
a : b;
}
#endif
// NOTE(review): bare "template" lines below lost their parameter lists to
// angle-bracket stripping during extraction; restored as
// <typename scalar_t>, which the bodies' use of scalar_t requires.
template <typename scalar_t>
__device__ __forceinline__ scalar_t warpReduceSum(scalar_t val) {
  // Butterfly reduction: each halving step adds the value held by the lane
  // `offset` positions above, so lane 0 ends up holding the warp-wide sum.
  for (int offset = WARP_SIZE / 2; offset > 0; offset /= 2)
#ifdef MMCV_WITH_HIP
    val += __shfl_down(val, offset);
#else
    val += __shfl_down_sync(FULL_MASK, val, offset);
#endif
  return val;
}

// Specialization for the phalf wrapper type: shuffle the raw __half payload.
template <>
__device__ __forceinline__ phalf warpReduceSum(phalf val) {
  for (int offset = WARP_SIZE / 2; offset > 0; offset /= 2)
#ifdef MMCV_WITH_HIP
    __PHALF(val) += __shfl_down(val, offset);
#else
    __PHALF(val) +=
        __shfl_down_sync(FULL_MASK, static_cast<__half>(__PHALF(val)), offset);
#endif
  return val;
}

// Splits the original matrix into submatrices with size 32 * 32.
// Each block transposes one submatrix by loading it into shared memory.
// Reference https://devblogs.nvidia.com/efficient-matrix-transpose-cuda-cc/
template <typename scalar_t>
__global__ void BatchTranspose2DCUDAKernel(const int N, const int H,
                                           const int W, const int dh,
                                           const int dw,
                                           const scalar_t *__restrict__ X,
                                           scalar_t *__restrict__ Y) {
  // kTileDim + 1 padding on the inner dimension avoids shared-memory bank
  // conflicts on the transposed read.
  __shared__ scalar_t tile[kTileDim][kTileDim + 1];
  const int n = blockIdx.x / (dh * dw);
  const int k = blockIdx.x % (dh * dw);
  const int r = k / dw;
  const int c = k % dw;
  const int offset = n * H * W;
  int x = c * kTileDim + threadIdx.x;
  int y = r * kTileDim + threadIdx.y;
  if (x < W) {
    for (int i = 0; threadIdx.y + i < kTileDim && y + i < H; i += kBlockRows) {
      tile[threadIdx.y + i][threadIdx.x] = X[offset + (y + i) * W + x];
    }
  }
  __syncthreads();
  x = r * kTileDim + threadIdx.x;
  y = c * kTileDim + threadIdx.y;
  if (x < H) {
    for (int i = 0; threadIdx.y + i < kTileDim && y + i < W; i += kBlockRows) {
      Y[offset + (y + i) * H + x] = tile[threadIdx.x][threadIdx.y + i];
    }
  }
}

// CARAFE forward: every output pixel (n, ph, pw) gathers a
// kernel_size x kernel_size window of the downsampled feature map, weighted
// by its per-location mask (first staged into shared memory). THREADS_PER_PIXEL
// threads cooperate on one pixel, striding over channels.
template <typename scalar_t>
__global__ void CARAFEForward(
    const int num_kernels, const scalar_t *__restrict__ bottom_data,
    const scalar_t *__restrict__ bottom_masks, const int kernel_size,
    const int group_size, const int scale_factor, const int channels,
    const int down_height, const int down_width, const int height,
    const int width, const int mask_channels, scalar_t *__restrict__ top_data) {
#if MAXIMIZE_KERNEL_SIZE
  __shared__ float shared_mask[MAX_SHARED_SCALAR_T * 2];
#else
  __shared__ scalar_t shared_mask[MAX_SHARED_SCALAR_T];
#endif

  int index = threadIdx.x + blockIdx.x * blockDim.x;
  if (index > num_kernels - 1) {
    return;
  }
  const int pixel_id = threadIdx.x / THREADS_PER_PIXEL;
  const int split_id = threadIdx.x % THREADS_PER_PIXEL;
  index = index / THREADS_PER_PIXEL;
  const int pw = index % width;
  const int ph = (index / width) % height;
  const int n = index / width / height;

  const int down_pw = pw / scale_factor;
  const int down_ph = ph / scale_factor;

  const int start_w = down_pw - (kernel_size - 1) / 2;
  const int end_w = down_pw + (kernel_size - 1) / 2 + 1;
  const int start_h = down_ph - (kernel_size - 1) / 2;
  const int end_h = down_ph + (kernel_size - 1) / 2 + 1;
  // Stage this pixel's mask values into shared memory, one slot per pixel_id.
  for (int c = split_id; c < mask_channels; c += THREADS_PER_PIXEL) {
    int mask_index = Loc2Index(n, ph, pw, c, height, width, mask_channels);
    shared_mask[c * WARP_SIZE + pixel_id] = bottom_masks[mask_index];
  }
  __syncthreads();

  const int channels_per_group = ceilf(channels / (float)group_size);
#pragma unroll
  for (int c = split_id; c < channels; c += THREADS_PER_PIXEL) {
    int mask_group = c / channels_per_group;
    scalar_t output_val = 0;
#pragma unroll
    for (int iy = start_h; iy < end_h; iy++) {
#pragma unroll
      for (int ix = start_w; ix < end_w; ix++) {
        // Skip taps that fall outside the downsampled feature map.
        if (iy < 0 || iy > down_height - 1 || ix < 0 || ix > down_width - 1) {
          continue;
        }
        int mask_iy = iy - down_ph + (kernel_size - 1) / 2;
        int mask_ix = ix - down_pw + (kernel_size - 1) / 2;
        int mask_c =
            (mask_group * kernel_size + mask_iy) * kernel_size + mask_ix;
        int feat_index =
            Loc2Index(n, iy, ix, c, down_height, down_width, channels);

        output_val += bottom_data[feat_index] *
                      shared_mask[mask_c * WARP_SIZE + pixel_id];
      }
    }

    int top_index = Loc2Index(n, ph, pw, c, height, width, channels);
    top_data[top_index] = output_val;
  }
}

// CARAFE backward w.r.t. the input features: each downsampled pixel sums the
// contributions of the upsampled output positions whose windows cover it,
// using the spatially mirrored mask weights.
template <typename scalar_t>
__global__ void CARAFEBackward_Feature(
    const int num_kernels, const scalar_t *__restrict__ top_diff,
    const scalar_t *__restrict__ bottom_masks, const int kernel_size,
    const int group_size, const int scale_factor, const int channels,
    const int down_height, const int down_width, const int height,
    const int width, const int mask_channels,
    scalar_t *__restrict__ bottom_diff) {
#if MAXIMIZE_KERNEL_SIZE
  __shared__ float shared_mask[MAX_SHARED_SCALAR_T * 2];
#else
  __shared__ scalar_t shared_mask[MAX_SHARED_SCALAR_T];
#endif

  int index = threadIdx.x + blockIdx.x * blockDim.x;
  if (index > num_kernels - 1) {
    return;
  }

  const int pixel_id = threadIdx.x / THREADS_PER_PIXEL;
  const int split_id = threadIdx.x % THREADS_PER_PIXEL;
  // (n, c, ph, pw) is an element in the bottom_data
  index = index / THREADS_PER_PIXEL;
  const int pw = index % width;
  const int ph = (index / width) % height;
  const int n = index / width / height;

  const int start_w = pw - (kernel_size - 1) * scale_factor / 2;
  const int end_w = pw + (kernel_size - 1) * scale_factor / 2 + 1;
  const int start_h = ph - (kernel_size - 1) * scale_factor / 2;
  const int end_h = ph + (kernel_size - 1) * scale_factor / 2 + 1;
  for (int c = split_id; c < mask_channels; c += THREADS_PER_PIXEL) {
    const int mask_w = (c % kernel_size) * scale_factor;
    const int mask_h = (c / kernel_size % kernel_size) * scale_factor;
    const int mask_x = start_w + mask_w;
    const int mask_y = start_h + mask_h;
    if (mask_y < 0 || mask_y > height - 1 || mask_x < 0 || mask_x > width - 1) {
      shared_mask[c * WARP_SIZE + pixel_id] = 0;
      continue;
    }
    const int mask_group = c / (kernel_size * kernel_size);
    // Index of the 180-degree-rotated mask tap (correlation <-> convolution).
    const int mask_c = (2 * mask_group + 1) * kernel_size * kernel_size - c - 1;
    int mask_index =
        Loc2Index(n, mask_c, mask_y, mask_x, mask_channels, height, width);
    shared_mask[c * WARP_SIZE + pixel_id] = bottom_masks[mask_index];
  }
  __syncthreads();
  const int channels_per_group = ceilf(channels / (float)group_size);
#pragma unroll
  for (int c = split_id; c < channels; c += THREADS_PER_PIXEL) {
    int mask_group = c / channels_per_group;
    int top_index = Loc2Index(n, ph, pw, c, height, width, channels);
    scalar_t output_val = 0;
#pragma unroll
    for (int iy = start_h; iy < end_h; iy += scale_factor) {
#pragma unroll
      for (int ix = start_w; ix < end_w; ix += scale_factor) {
        if (iy < 0 || iy > height - 1 || ix < 0 || ix > width - 1) {
          continue;
        }
        int mask_iy =
            (iy - ph + (kernel_size - 1) * scale_factor / 2) / scale_factor;
        int mask_ix =
            (ix - pw + (kernel_size - 1) * scale_factor / 2) / scale_factor;
        int mask_c =
            (mask_group * kernel_size + mask_iy) * kernel_size + mask_ix;
        int feat_index = Loc2Index(n, iy, ix, c, height, width, channels);
        output_val +=
            shared_mask[mask_c * WARP_SIZE + pixel_id] * top_diff[feat_index];
      }
    }
    bottom_diff[top_index] = output_val;
  }
}

// Sums each scale_factor x scale_factor patch of the input into one output
// pixel (per batch item and channel).
template <typename scalar_t>
__global__ void FeatureSum(const int num_kernels,
                           const scalar_t *__restrict__ input_data,
                           const int scale_factor, const int channels,
                           const int height, const int width,
                           scalar_t *__restrict__ output_data) {
  int index = threadIdx.x + blockIdx.x * blockDim.x;
  if (index > num_kernels - 1) {
    return;
  }
  const int split_id = threadIdx.x % THREADS_PER_PIXEL;
  index = index / THREADS_PER_PIXEL;
  const int pw = index % width;
  const int ph = (index / width) % height;
  const int n = index / width / height;
  for (int c = split_id; c < channels; c += THREADS_PER_PIXEL) {
    scalar_t output_val = 0;
    for (int iy = ph * scale_factor; iy < (ph + 1) * scale_factor; iy++) {
      for (int ix = pw * scale_factor; ix < (pw + 1) * scale_factor; ix++) {
        int input_id = Loc2Index(n, iy, ix, c, height * scale_factor,
                                 width * scale_factor, channels);
        output_val += input_data[input_id];
      }
    }
    const int output_id = Loc2Index(n, ph, pw, c, height, width, channels);
    output_data[output_id] = output_val;
  }
}

// CARAFE backward w.r.t. the mask: one warp per mask element; lanes stride
// over the channels of the element's group and the partial products are
// warp-reduced below (continues past this chunk boundary).
template <typename scalar_t>
__global__ void CARAFEBackward_Mask(const int num_kernels,
                                    const scalar_t *__restrict__ top_diff,
                                    const scalar_t *__restrict__ bottom_data,
                                    const int kernel_size, const int group_size,
                                    const int scale_factor, const int channels,
                                    const int down_height, const int down_width,
                                    const int height, const int width,
                                    const int mask_channels,
                                    scalar_t *__restrict__ mask_diff) {
  int index = threadIdx.x + blockIdx.x * blockDim.x;
  if (index > num_kernels - 1) {
    return;
  }

  const int lane_id = index % WARP_SIZE;
  index = index / WARP_SIZE;
  const int mask_c = index % mask_channels;
  // (n, c, ph, pw) is an element in the bottom_data
  index = index / mask_channels;
  const int pw = index % width;
  const int ph = (index / width) % height;
  const int n = index / width / height;

  const int down_pw = pw / scale_factor;
  const int down_ph = ph / scale_factor;

  const int mask_group = mask_c / (kernel_size * kernel_size);
  const int mask_loc = mask_c % (kernel_size * kernel_size);

  const int offset_x = mask_loc % kernel_size - (kernel_size - 1) / 2;
  const int offset_y =
      mask_loc / kernel_size % kernel_size - (kernel_size - 1) / 2;

  const int down_x = down_pw + offset_x;
  const int down_y = down_ph + offset_y;

  scalar_t output_val = 0;

  if (down_y >= 0 && down_y <= down_height - 1 && down_x >= 0 &&
      down_x <= down_width - 1) {
    const int channels_per_mask = ceilf(channels / (float)group_size);
    const int start = channels_per_mask * mask_group;
    const int end = min(channels_per_mask * (mask_group + 1), channels);
    for (int c = start + lane_id; c < end; c += WARP_SIZE) {
      int bottom_id =
          Loc2Index(n, down_y, down_x, c, down_height, down_width, channels);
      int top_id = Loc2Index(n, ph, pw, c, height, width, channels);
      output_val += top_diff[top_id] * bottom_data[bottom_id];
    }
  }
// #ifdef MMCV_WITH_HIP
#if defined(MMCV_WITH_HIP) || defined(__ILUVATAR__)
  __syncthreads();
#else
  __syncwarp();
#endif
  // Lane 0 receives the warp-wide sum and writes the single mask gradient.
  output_val = warpReduceSum(output_val);
  if (lane_id == 0) {
    const int mask_id =
        Loc2Index(n, ph, pw, mask_c, height, width, mask_channels);
    mask_diff[mask_id] = output_val;
  }
}

#endif  // CARAFE_CUDA_KERNEL_CUH
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/carafe_naive_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/carafe_naive_cuda_kernel.cuh
new file mode 100644
index 0000000000000000000000000000000000000000..48230c632f223b736aa72a9d5fd682c97b3aa93a
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/carafe_naive_cuda_kernel.cuh
@@ -0,0 +1,111 @@
// Copyright (c) OpenMMLab. All rights reserved
#ifndef CARAFE_NAIVE_CUDA_KERNEL_CUH
#define CARAFE_NAIVE_CUDA_KERNEL_CUH

#ifdef MMCV_USE_PARROTS
#include "parrots_cuda_helper.hpp"
#else
#include "pytorch_cuda_helper.hpp"
#endif

// Flattens an NCHW coordinate into a linear index.
__device__ inline int Loc2Index(const int n, const int c, const int h,
                                const int w, const int channel_num,
                                const int height, const int width) {
  int index = w + (h + (c + n * channel_num) * height) * width;
  return index;
}

// NOTE(review): bare "template" lines lost their parameter lists to
// angle-bracket stripping during extraction; restored as <typename scalar_t>.
// Naive CARAFE forward: one thread per output element, no shared memory.
template <typename scalar_t>
__global__ void carafe_naive_forward_cuda_kernel(
    const int nthreads, const scalar_t *bottom_data,
    const scalar_t *bottom_masks, scalar_t *top_data, const int kernel_size,
    const int group_size, const int scale_factor, const int channels,
    const int height, const int width) {
  CUDA_1D_KERNEL_LOOP(index, nthreads) {
    // (n, c, ph, pw) is an element in the bottom_data
    int pw = index % width;
    int ph = (index / width) % height;
    int c = (index / width / height) % channels;
    int n = index / width / height / channels;

    int mask_channels = kernel_size * kernel_size * group_size;
    int mask_group = c / (channels / group_size);

    int down_pw = pw / scale_factor;
    int down_ph = ph / scale_factor;
    int down_width = width / scale_factor;
    int down_height = height / scale_factor;
    int start_w = down_pw - (kernel_size - 1) / 2;
    int end_w = down_pw + (kernel_size - 1) / 2 + 1;
    int start_h = down_ph - (kernel_size - 1) / 2;
    int end_h = down_ph + (kernel_size - 1) / 2 + 1;

    scalar_t output_val = 0;
    for (int iy = start_h; iy < end_h; iy++) {
      for (int ix = start_w; ix < end_w; ix++) {
        // Skip taps outside the downsampled feature map.
        if (iy < 0 || iy > down_height - 1 || ix < 0 || ix > down_width - 1) {
          continue;
        }
        int mask_iy = iy - down_ph + (kernel_size - 1) / 2;
        int mask_ix = ix - down_pw + (kernel_size - 1) / 2;
        int mask_c =
            (mask_group * kernel_size + mask_iy) * kernel_size + mask_ix;
        int feat_index =
            Loc2Index(n, c, iy, ix, channels, down_height, down_width);
        int mask_index =
            Loc2Index(n, mask_c, ph, pw, mask_channels, height, width);
        output_val += bottom_data[feat_index] * bottom_masks[mask_index];
      }
    }
    top_data[index] = output_val;
  }
}

// Naive CARAFE backward: scatters gradients to both the features and the
// masks with atomicAdd (multiple output pixels touch the same input).
template <typename scalar_t>
__global__ void carafe_naive_backward_cuda_kernel(
    const int nthreads, const scalar_t *top_diff, const scalar_t *bottom_data,
    const scalar_t *bottom_masks, scalar_t *bottom_diff, scalar_t *mask_diff,
    const int kernel_size, const int group_size, const int scale_factor,
    const int channels, const int height, const int width) {
  CUDA_1D_KERNEL_LOOP(index, nthreads) {
    // (n, c, ph, pw) is an element in the bottom_data
    int pw = index % width;
    int ph = (index / width) % height;
    int c = (index / width / height) % channels;
    int n = index / width / height / channels;

    int mask_channels = kernel_size * kernel_size * group_size;
    int mask_group = c / (channels / group_size);

    int down_pw = pw / scale_factor;
    int down_ph = ph / scale_factor;
    int down_width = width / scale_factor;
    int down_height = height / scale_factor;
    int start_w = down_pw - (kernel_size - 1) / 2;
    int end_w = down_pw + (kernel_size - 1) / 2 + 1;
    int start_h = down_ph - (kernel_size - 1) / 2;
    int end_h = down_ph + (kernel_size - 1) / 2 + 1;

    for (int iy = start_h; iy < end_h; iy++) {
      for (int ix = start_w; ix < end_w; ix++) {
        if (iy < 0 || iy > down_height - 1 || ix < 0 || ix > down_width - 1) {
          continue;
        }
        int mask_iy = iy - down_ph + (kernel_size - 1) / 2;
        int mask_ix = ix - down_pw + (kernel_size - 1) / 2;
        int mask_c =
            (mask_group * kernel_size + mask_iy) * kernel_size + mask_ix;
        int feat_index =
            Loc2Index(n, c, iy, ix, channels, down_height, down_width);
        int mask_index =
            Loc2Index(n, mask_c, ph, pw, mask_channels, height, width);
        atomicAdd(bottom_diff + feat_index,
                  bottom_masks[mask_index] * top_diff[index]);
        atomicAdd(mask_diff + mask_index,
                  bottom_data[feat_index] * top_diff[index]);
      }
    }
  }
}

#endif  // CARAFE_NAIVE_CUDA_KERNEL_CUH
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/chamfer_distance_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/chamfer_distance_cuda_kernel.cuh
new file mode 100644
index 0000000000000000000000000000000000000000..89feea4a546a5093967f26393ca6be3b9fe6ae05
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/chamfer_distance_cuda_kernel.cuh
@@ -0,0 +1,101 @@
// Copyright (c) OpenMMLab. All rights reserved.
// Modified from
// https://github.com/chrdiller/pyTorchChamferDistance/blob/master/chamfer_distance/chamfer_distance.cu
#ifndef CHAMFER_DISTANCE_CUDA_KERNEL_CUH
#define CHAMFER_DISTANCE_CUDA_KERNEL_CUH

#ifdef MMCV_USE_PARROTS
#include "parrots_cuda_helper.hpp"
#else
#include "pytorch_cuda_helper.hpp"
#endif

#define MAX_SHARED_SCALAR_T 6144  // 49152 / 8 = 6144

// Forward chamfer distance (2-D points): for each point of xyz, find the
// squared distance to (and index of) its nearest neighbour in xyz2.
// xyz2 is processed in THREADS_PER_BLOCK-sized tiles staged in shared memory.
// NOTE(review): the bare "template" lines lost their parameter lists to
// angle-bracket stripping; restored as <typename scalar_t>.
template <typename scalar_t>
__global__ void chamfer_distance_forward_cuda_kernel(int b, int n,
                                                     const scalar_t* xyz, int m,
                                                     const scalar_t* xyz2,
                                                     scalar_t* result,
                                                     int* result_i) {
  __shared__ scalar_t buf[MAX_SHARED_SCALAR_T];
  for (int i = blockIdx.x; i < b; i += gridDim.x) {
    for (int k2 = 0; k2 < m; k2 += THREADS_PER_BLOCK) {
      int end_k = min(m, k2 + THREADS_PER_BLOCK) - k2;
      // Stage 2*end_k coordinates (x,y interleaved) of this xyz2 tile.
      for (int j = threadIdx.x; j < end_k * 2; j += blockDim.x) {
        buf[j] = xyz2[(i * m + k2) * 2 + j];
      }
      __syncthreads();
      for (int j = threadIdx.x; j < n; j += blockDim.x * gridDim.y) {
        scalar_t x1 = xyz[(i * n + j) * 2 + 0];
        scalar_t y1 = xyz[(i * n + j) * 2 + 1];
        int best_i = 0;
        scalar_t best = 1e10;
        // BUGFIX: was `end_k & (~2)`, which rounds down to a multiple of 2,
        // not 4 — the 4-way unrolled loop below could then read buf entries
        // past end_k (uninitialized) and pick a bogus nearest neighbour.
        // `~3` rounds down to a multiple of 4 as the unroll requires.
        int end_ka = end_k & (~3);
        if (end_ka == THREADS_PER_BLOCK) {
          for (int k = 0; k < THREADS_PER_BLOCK; k += 4) {
#pragma unroll
            // NOTE: inner `j` intentionally shadows the outer point index.
            for (int j = 0; j < 4; ++j) {
              scalar_t x2 = buf[(k + j) * 2] - x1;
              scalar_t y2 = buf[(k + j) * 2 + 1] - y1;
              scalar_t d = x2 * x2 + y2 * y2;
              if (d < best) {
                best = d;
                best_i = k + k2 + j;
              }
            }
          }
        } else {
          for (int k = 0; k < end_ka; k += 4) {
#pragma unroll
            for (int j = 0; j < 4; ++j) {
              scalar_t x2 = buf[(k + j) * 2] - x1;
              scalar_t y2 = buf[(k + j) * 2 + 1] - y1;
              scalar_t d = x2 * x2 + y2 * y2;
              if (d < best) {
                best = d;
                best_i = k + k2 + j;
              }
            }
          }
        }
        // Remainder taps that did not fit the 4-way unroll.
        for (int k = end_ka; k < end_k; k++) {
          scalar_t x2 = buf[k * 2 + 0] - x1;
          scalar_t y2 = buf[k * 2 + 1] - y1;
          scalar_t d = x2 * x2 + y2 * y2;
          if (k == 0 || d < best) {
            best = d;
            best_i = k + k2;
          }
        }
        if (k2 == 0 || result[(i * n + j)] > best) {
          result[(i * n + j)] = best;
          result_i[(i * n + j)] = best_i;
        }
      }
      __syncthreads();
    }
  }
}

// Backward pass: route grad_dist1 to both point sets via the stored
// nearest-neighbour indices; atomicAdd because several points of xyz1 may
// share the same nearest neighbour in xyz2.
template <typename scalar_t>
__global__ void chamfer_distance_backward_cuda_kernel(
    int b, int n, const scalar_t* xyz1, int m, const scalar_t* xyz2,
    const scalar_t* grad_dist1, const int* idx1, scalar_t* grad_xyz1,
    scalar_t* grad_xyz2) {
  for (int i = blockIdx.x; i < b; i += gridDim.x) {
    for (int j = threadIdx.x; j < n; j += blockDim.x * gridDim.y) {
      scalar_t x1 = xyz1[(i * n + j) * 2 + 0];
      scalar_t y1 = xyz1[(i * n + j) * 2 + 1];
      int j2 = idx1[i * n + j];
      scalar_t x2 = xyz2[(i * m + j2) * 2 + 0];
      scalar_t y2 = xyz2[(i * m + j2) * 2 + 1];
      scalar_t g = grad_dist1[i * n + j] * 2;  // d(d^2)/dx = 2*(x1-x2)
      atomicAdd(&(grad_xyz1[(i * n + j) * 2 + 0]), g * (x1 - x2));
      atomicAdd(&(grad_xyz1[(i * n + j) * 2 + 1]), g * (y1 - y2));
      atomicAdd(&(grad_xyz2[(i * m + j2) * 2 + 0]), -(g * (x1 - x2)));
      atomicAdd(&(grad_xyz2[(i * m + j2) * 2 + 1]), -(g * (y1 - y2)));
    }
  }
}
#endif  // CHAMFER_DISTANCE_CUDA_KERNEL_CUH
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/common_cuda_helper.hpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/common_cuda_helper.hpp
new file mode 100644
index 0000000000000000000000000000000000000000..9e544c79b4563f612242121fd9361df0bb9e23fe
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/common_cuda_helper.hpp
@@ -0,0 +1,122 @@
#ifndef COMMON_CUDA_HELPER
#define COMMON_CUDA_HELPER

// NOTE(review): the two #include targets below were stripped by extraction;
// restored as <cuda.h> and <algorithm> (std::min is used by GET_BLOCKS) —
// confirm against the upstream header.
#include <cuda.h>
#include <algorithm>
using namespace std;

#define CUDA_1D_KERNEL_LOOP(i, n)                              \
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < (n); \
       i += blockDim.x * gridDim.x)

#define CUDA_2D_KERNEL_LOOP(i, n, j, m)                             \
  for (size_t i = blockIdx.x * blockDim.x + threadIdx.x; i < (n);   \
       i += blockDim.x * gridDim.x)                                 \
    for (size_t j = blockIdx.y * blockDim.y + threadIdx.y; j < (m); \
         j += blockDim.y * gridDim.y)

#define CUDA_2D_KERNEL_BLOCK_LOOP(i, n, j, m)          \
  for (size_t i = blockIdx.x; i < (n); i += gridDim.x) \
    for (size_t j = blockIdx.y; j < \
(m); j += gridDim.y)

#define THREADS_PER_BLOCK 512

// Grid size for N elements, capped at 4096 blocks.
inline int GET_BLOCKS(const int N, const int num_threads = THREADS_PER_BLOCK) {
  int optimal_block_num = (N + num_threads - 1) / num_threads;
  int max_block_num = 4096;
  return std::min(optimal_block_num, max_block_num);
}

// NOTE(review): bare "template" lines below lost their parameter lists to
// angle-bracket stripping during extraction; restored as <typename T>.
template <typename T>
__device__ T bilinear_interpolate(const T* input, const int height,
                                  const int width, T y, T x,
                                  const int index /* index for debug only*/) {
  // deal with cases that inverse elements are out of feature map boundary
  if (y < -1.0 || y > height || x < -1.0 || x > width) return 0;

  if (y <= 0) y = 0;
  if (x <= 0) x = 0;

  int y_low = (int)y;
  int x_low = (int)x;
  int y_high;
  int x_high;

  // Clamp the 2x2 sample square to the feature-map border.
  if (y_low >= height - 1) {
    y_high = y_low = height - 1;
    y = (T)y_low;
  } else {
    y_high = y_low + 1;
  }

  if (x_low >= width - 1) {
    x_high = x_low = width - 1;
    x = (T)x_low;
  } else {
    x_high = x_low + 1;
  }

  T ly = y - y_low;
  T lx = x - x_low;
  T hy = 1. - ly, hx = 1. - lx;
  // do bilinear interpolation
  T v1 = input[y_low * width + x_low];
  T v2 = input[y_low * width + x_high];
  T v3 = input[y_high * width + x_low];
  T v4 = input[y_high * width + x_high];
  T w1 = hy * hx, w2 = hy * lx, w3 = ly * hx, w4 = ly * lx;

  T val = (w1 * v1 + w2 * v2 + w3 * v3 + w4 * v4);

  return val;
}

// Returns (via out-params) the four corner weights and indices needed to
// backpropagate through bilinear_interpolate at (y, x).
template <typename T>
__device__ void bilinear_interpolate_gradient(
    const int height, const int width, T y, T x, T& w1, T& w2, T& w3, T& w4,
    int& x_low, int& x_high, int& y_low, int& y_high,
    const int index /* index for debug only*/) {
  // deal with cases that inverse elements are out of feature map boundary
  if (y < -1.0 || y > height || x < -1.0 || x > width) {
    // empty
    w1 = w2 = w3 = w4 = 0.;
    x_low = x_high = y_low = y_high = -1;
    return;
  }

  if (y <= 0) y = 0;
  if (x <= 0) x = 0;

  y_low = (int)y;
  x_low = (int)x;

  if (y_low >= height - 1) {
    y_high = y_low = height - 1;
    y = (T)y_low;
  } else {
    y_high = y_low + 1;
  }

  if (x_low >= width - 1) {
    x_high = x_low = width - 1;
    x = (T)x_low;
  } else {
    x_high = x_low + 1;
  }

  T ly = y - y_low;
  T lx = x - x_low;
  T hy = 1. - ly, hx = 1. - lx;

  // reference in forward
  // T v1 = input[y_low * width + x_low];
  // T v2 = input[y_low * width + x_high];
  // T v3 = input[y_high * width + x_low];
  // T v4 = input[y_high * width + x_high];
  // T val = (w1 * v1 + w2 * v2 + w3 * v3 + w4 * v4);

  w1 = hy * hx, w2 = hy * lx, w3 = ly * hx, w4 = ly * lx;

  return;
}
#endif  // COMMON_CUDA_HELPER
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/convex_iou_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/convex_iou_cuda_kernel.cuh
new file mode 100644
index 0000000000000000000000000000000000000000..9dc42bad6fa627068f6cddec9e61b8a6a58ca7d9
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/convex_iou_cuda_kernel.cuh
@@ -0,0 +1,831 @@
// Copyright (c) OpenMMLab. All rights reserved
#ifndef CONVEX_IOU_CUDA_KERNEL_CUH
#define CONVEX_IOU_CUDA_KERNEL_CUH

#ifdef MMCV_USE_PARROTS
#include "parrots_cuda_helper.hpp"
#else
#include "pytorch_cuda_helper.hpp"
#endif

#define MAXN 100
#define NMAX 512
__device__ const float EPS = 1E-8;

// Sign of d with an EPS dead-band: +1, 0, or -1.
__device__ inline int sig(float d) { return (d > EPS) - (d < -EPS); }

struct Point {
  float x, y;
  __device__ Point() {}
  __device__ Point(float x, float y) : x(x), y(y) {}
};

// Coordinate-wise equality up to EPS.
__device__ inline bool point_same(Point& a, Point& b) {
  return sig(a.x - b.x) == 0 && sig(a.y - b.y) == 0;
}

__device__ inline void swap1(Point* a, Point* b) {
  Point temp;
  temp.x = a->x;
  temp.y = a->y;

  a->x = b->x;
  a->y = b->y;

  b->x = temp.x;
  b->y = temp.y;
}

// In-place reversal of the first n points.
__device__ inline void reverse1(Point* a, const int n) {
  for (int i = 0; i < (n - 1) / 2.0; i++) {
    Point* j = &(a[i]);
    Point* k = &(a[n - 1 - i]);
    swap1(j, k);
  }
}

// 2-D cross product of (a - o) x (b - o).
__device__ inline float cross(Point o, Point a, Point b) {
  return (a.x - o.x) * (b.y - o.y)
- (b.x - o.x) * (a.y - o.y);
}

// Squared Euclidean distance between a and b.
__device__ inline float dis(Point a, Point b) {
  return (a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y);
}
// Signed polygon area via the shoelace formula (writes ps[n] = ps[0]).
__device__ inline float area(Point* ps, int n) {
  ps[n] = ps[0];
  float res = 0;
  for (int i = 0; i < n; i++) {
    res += ps[i].x * ps[i + 1].y - ps[i].y * ps[i + 1].x;
  }
  return res / 2.0;
}
// Signed area of ps plus its gradient w.r.t. the prediction vertices mapped
// by polygon_to_pred_index (first n_pred entries: polygon index; next n_pred:
// corresponding prediction index). Gradients land in grad_C.
__device__ inline float polygon_area_grad(Point* ps, int n,
                                          int* polygon_to_pred_index,
                                          int n_pred, float* grad_C) {
  ps[n] = ps[0];
  float partion_grad[4 * 30 + 2];
  float res = 0;
  for (int i = 0; i < n; i++) {
    res += ps[i].x * ps[i + 1].y - ps[i].y * ps[i + 1].x;
    partion_grad[i * 4 + 2] = ps[i + 1].y;
    partion_grad[i * 4 + 3] = -ps[i + 1].x;
    if (i != n - 1) {
      partion_grad[i * 4 + 4] = -ps[i].y;
      partion_grad[i * 4 + 5] = ps[i].x;
    } else {
      partion_grad[0] = -ps[i].y;
      partion_grad[1] = ps[i].x;
    }
  }
  for (int i = 0; i < n; i++) {
    for (int j = 0; j < n_pred; j++) {
      if (i == polygon_to_pred_index[j]) {
        grad_C[2 * polygon_to_pred_index[j + n_pred]] =
            (partion_grad[i * 4] + partion_grad[i * 4 + 2]) / 2;
        break;
      }
    }
    for (int j = 0; j < n_pred; j++) {
      if (i == polygon_to_pred_index[j]) {
        grad_C[2 * polygon_to_pred_index[j + n_pred] + 1] =
            (partion_grad[i * 4 + 1] + partion_grad[i * 4 + 1 + 2]) / 2;
        break;
      }
    }
  }

  return res / 2.0;
}

// Intersection point p of segment cd with line ab, plus the analytic
// derivatives of p w.r.t. c and d, stored into cut_grad.
// Returns 2 if cd lies on ab, 0 if cd is parallel to ab, 1 otherwise.
__device__ inline int lineCross(Point a, Point b, Point c, Point d, Point& p,
                                float* cut_grad, int m, int n, int i) {
  float s1, s2;
  float s2_s1_2;
  float ds1_dxc, ds1_dyc, ds2_dxd, ds2_dyd;
  float dxp_dxc, dxp_dyc, dxp_dxd, dxp_dyd, dyp_dxc, dyp_dyc, dyp_dxd, dyp_dyd;
  s1 = cross(a, b, c);
  s2 = cross(a, b, d);

  ds1_dxc = -(b.y - a.y);
  ds1_dyc = b.x - a.x;
  ds2_dxd = ds1_dxc;
  ds2_dyd = ds1_dyc;
  s2_s1_2 = (s2 - s1) * (s2 - s1);

  if (sig(s1) == 0 && sig(s2) == 0) return 2;
  if (sig(s2 - s1) == 0) return 0;

  // Quotient-rule derivatives of p = (c*s2 - d*s1) / (s2 - s1).
  dxp_dxc =
      ((s2 - d.x * ds1_dxc) * (s2 - s1) - (c.x * s2 - d.x * s1) * (-ds1_dxc)) /
      (s2_s1_2);
  dxp_dyc =
      ((0 - d.x * ds1_dyc) * (s2 - s1) - (c.x * s2 - d.x * s1) * (-ds1_dyc)) /
      (s2_s1_2);
  dxp_dxd =
      ((c.x * ds2_dxd - s1) * (s2 - s1) - (c.x * s2 - d.x * s1) * (ds2_dxd)) /
      (s2_s1_2);
  dxp_dyd =
      ((c.x * ds2_dyd - 0) * (s2 - s1) - (c.x * s2 - d.x * s1) * (ds2_dyd)) /
      (s2_s1_2);

  dyp_dxc =
      ((0 - d.y * ds1_dxc) * (s2 - s1) - (c.y * s2 - d.y * s1) * (-ds1_dxc)) /
      (s2_s1_2);
  dyp_dyc =
      ((s2 - d.y * ds1_dyc) * (s2 - s1) - (c.y * s2 - d.y * s1) * (-ds1_dyc)) /
      (s2_s1_2);
  dyp_dxd =
      ((c.y * ds2_dxd - 0) * (s2 - s1) - (c.y * s2 - d.y * s1) * (ds2_dxd)) /
      (s2_s1_2);
  dyp_dyd =
      ((c.y * ds2_dyd - s1) * (s2 - s1) - (c.y * s2 - d.y * s1) * (ds2_dyd)) /
      (s2_s1_2);

  p.x = (c.x * s2 - d.x * s1) / (s2 - s1);
  p.y = (c.y * s2 - d.y * s1) / (s2 - s1);
  if (i == n - 1) {
    cut_grad[4 * n * m + 4 * i] = dxp_dxc;  // + dyp_dxc;
    cut_grad[4 * n * m + 4 * i + 1] = dyp_dxc;
    cut_grad[4 * n * m + 4 * i + 2] = dxp_dyc;  // + dyp_dyc;
    cut_grad[4 * n * m + 4 * i + 3] = dyp_dyc;
    cut_grad[4 * n * m + 0] = dxp_dxd;  // + dyp_dxd;
    cut_grad[4 * n * m + 1] = dyp_dxd;
    cut_grad[4 * n * m + 2] = dxp_dyd;  // + dyp_dyd;
    cut_grad[4 * n * m + 3] = dyp_dyd;
  } else {
    cut_grad[4 * n * m + 4 * i] = dxp_dxc;  // + dyp_dxc;
    cut_grad[4 * n * m + 4 * i + 1] = dyp_dxc;
    cut_grad[4 * n * m + 4 * i + 2] = dxp_dyc;  // + dyp_dyc;
    cut_grad[4 * n * m + 4 * i + 3] = dyp_dyc;
    cut_grad[4 * n * m + 4 * (i + 1)] = dxp_dxd;  // + dyp_dxd;
    cut_grad[4 * n * m + 4 * (i + 1) + 1] = dyp_dxd;
    cut_grad[4 * n * m + 4 * (i + 1) + 2] = dxp_dyd;  // + dyp_dyd;
    cut_grad[4 * n * m + 4 * (i + 1) + 3] = dyp_dyd;
  }

  return 1;
}
// Clips polygon p (n vertices, updated in place) by the half-plane on the
// positive side of directed line ab; cut_grad receives the Jacobian of the
// surviving vertices w.r.t. the originals.
__device__ inline void polygon_cut(Point* p, int& n, Point a, Point b,
                                   float* cut_grad) {
  Point pp[MAXN];
  float ccur_grad[MAXN] = {};
  int m = 0;
  p[n] = p[0];
  int k = n;
  for (int i = 0; i < n; i++) {
    if (sig(cross(a, b, p[i])) > 0) {
      // Vertex kept unchanged -> identity entries in the Jacobian.
      pp[m] = p[i];
      ccur_grad[4 * n * m + 4 * i] = 1.0;
      ccur_grad[4 * n * m + 4 * i + 3] = 1.0;
      m++;
    }
    if (sig(cross(a, b, p[i])) != sig(cross(a, b, p[i + 1]))) {
      // Edge crosses the clipping line: insert the intersection point.
      lineCross(a, b, p[i], p[i + 1], pp[m], ccur_grad, m, n, i);
      m++;
    }
  }

  // Deduplicate consecutive identical vertices while copying back.
  n = 0;
  for (int i = 0; i < m; i++) {
    if (!i || !(point_same(pp[i], pp[i - 1]))) {
      p[n] = pp[i];
      for (int j = 0; j < 4 * k; j++) {
        cut_grad[4 * k * n + j] = ccur_grad[4 * k * i + j];
      }
      n++;
    }
  }

  while (n > 1 && point_same(p[n - 1], p[0])) n--;
}

// Signed area of triangle(o,a,b) ∩ triangle(o,c,d) (o = origin), plus its
// gradient w.r.t. a and b accumulated into grad_AB. Built by successively
// clipping triangle(o,a,b) by lines oc, cd and do; the per-cut Jacobians are
// chain-multiplied to backpropagate through the clipping.
__device__ inline float intersectArea(Point a, Point b, Point c, Point d,
                                      float* grad_AB, int order,
                                      int convex_n) {
  Point o(0, 0);
  int res_flag = 0;
  int s1 = sig(cross(o, a, b));
  int s2 = sig(cross(o, c, d));
  if (s1 == 0 || s2 == 0) return 0.0;
  // Orient both triangles counter-clockwise; remember if (a,b) was flipped.
  if (s1 == -1) {
    Point* i = &a;
    Point* j = &b;
    swap1(i, j);
    res_flag = 1;
  }
  if (s2 == -1) {
    Point* i = &c;
    Point* j = &d;
    swap1(i, j);
  }
  Point p[10] = {o, a, b};
  int n = 3, n0 = 3, n1, n2, n3;
  float cut_grad1[MAXN] = {};
  float cut_grad2[MAXN] = {};
  float cut_grad3[MAXN] = {};
  float p1_p_grad[10][10] = {};
  float p2_p1_grad[10][10] = {};
  float p3_p2_grad[10][10] = {};

  float p3_p1_grad[10][10] = {};
  float p3_p_grad[10][10] = {};

  // 1
  polygon_cut(p, n, o, c, cut_grad1);
  n1 = n;
  for (int i = 0; i < n; i++) {
    for (int j = 0; j < 4 * n0; j++) {
      if (!(j % 2)) {
        p1_p_grad[2 * i][j / 2] = cut_grad1[4 * n0 * i + j];
      } else {
        p1_p_grad[2 * i + 1][j / 2] = cut_grad1[4 * n0 * i + j];
      }
    }
  }

  // 2
  polygon_cut(p, n, c, d, cut_grad2);
  n2 = n;
  for (int i = 0; i < n; i++) {
    for (int j = 0; j < 4 * n1; j++) {
      if (!(j % 2)) {
        p2_p1_grad[2 * i][j / 2] = cut_grad2[4 * n1 * i + j];
      } else {
        p2_p1_grad[2 * i + 1][j / 2] = cut_grad2[4 * n1 * i + j];
      }
    }
  }
  // 3
  polygon_cut(p, n, d, o, cut_grad3);
  n3 = n;
  for (int i = 0; i < n; i++) {
    for (int j = 0; j < 4 * n2; j++) {
      if (!(j % 2)) {
        p3_p2_grad[2 * i][j / 2] = cut_grad3[4 * n2 * i + j];
      } else {
        p3_p2_grad[2 * i + 1][j / 2] = cut_grad3[4 * n2 * i + j];
      }
    }
  }
+ + // mul + // p3_p2(n3 * n2) * p2_p1(n2 * n1) = p3_p1 (n3 * n1) + for (int i = 0; i < 2 * n3; i++) { + for (int j = 0; j < 2 * n1; j++) { + float sum = 0.0; + for (int m = 0; m < 2 * n2; m++) { + sum = sum + p3_p2_grad[i][m] * p2_p1_grad[m][j]; + } + p3_p1_grad[i][j] = sum; + } + } + + // p3_p1 (n3 * n1) * p1_p (n1 * n0) = p3_p (n3 * n0) + for (int i = 0; i < 2 * n3; i++) { + for (int j = 0; j < 2 * n0; j++) { + float sum = 0.0; + for (int m = 0; m < 2 * n1; m++) { + sum = sum + p3_p1_grad[i][m] * p1_p_grad[m][j]; + } + p3_p_grad[i][j] = sum; + } + } + + // calculate S_grad + int polygon_index_box_index[20]; + float grad_polygon[20]; + float S_grad[6]; + + for (int i = 0; i < n3; i++) { + polygon_index_box_index[i] = i; + polygon_index_box_index[i + n3] = i; + } + + float res = + polygon_area_grad(p, n3, polygon_index_box_index, n3, grad_polygon); + + if (s1 * s2 == -1) { + for (int j = 0; j < 2 * 3; j++) { + float sum = 0.0; + for (int m = 0; m < 2 * n3; m++) { + sum = sum - grad_polygon[m] * p3_p_grad[m][j]; + } + S_grad[j] = sum; + } + + if (order != convex_n - 1) { + if (res_flag) { + grad_AB[2 * order] += S_grad[4]; + grad_AB[2 * order + 1] += S_grad[5]; + grad_AB[2 * order + 2] += S_grad[2]; + grad_AB[2 * order + 3] += S_grad[3]; + + } else { + grad_AB[2 * order] += S_grad[2]; + grad_AB[2 * order + 1] += S_grad[3]; + grad_AB[2 * order + 2] += S_grad[4]; + grad_AB[2 * order + 3] += S_grad[5]; + } + } else { + if (res_flag) { + grad_AB[2 * order] += S_grad[4]; + grad_AB[2 * order + 1] += S_grad[5]; + grad_AB[0] += S_grad[2]; + grad_AB[1] += S_grad[3]; + + } else { + grad_AB[2 * order] += S_grad[2]; + grad_AB[2 * order + 1] += S_grad[3]; + grad_AB[0] += S_grad[4]; + grad_AB[1] += S_grad[5]; + } + } + res = -res; + } else { + for (int j = 0; j < 2 * 3; j++) { + float sum = 0.0; + for (int m = 0; m < 2 * n3; m++) { + sum = sum + grad_polygon[m] * p3_p_grad[m][j]; + } + S_grad[j] = sum; + } + + if (order != convex_n - 1) { + if (res_flag) { + grad_AB[2 * order] 
+= S_grad[4];
      grad_AB[2 * order + 1] += S_grad[5];
      grad_AB[2 * order + 2] += S_grad[2];
      grad_AB[2 * order + 3] += S_grad[3];
    } else {
      grad_AB[2 * order] += S_grad[2];
      grad_AB[2 * order + 1] += S_grad[3];
      grad_AB[2 * order + 2] += S_grad[4];
      grad_AB[2 * order + 3] += S_grad[5];
    }
  } else {
    if (res_flag) {
      grad_AB[2 * order] += S_grad[4];
      grad_AB[2 * order + 1] += S_grad[5];
      grad_AB[0] += S_grad[2];
      grad_AB[1] += S_grad[3];
    } else {
      grad_AB[2 * order] += S_grad[2];
      grad_AB[2 * order + 1] += S_grad[3];
      grad_AB[0] += S_grad[4];
      grad_AB[1] += S_grad[5];
    }
  }
  return res;
}

// Sum of pairwise triangle-decomposition intersection areas between convex
// polygons ps1[0..n1) and ps2[0..n2), accumulating d(area)/d(ps1) into
// grad_AB.  Both polygons are forced counter-clockwise first, and the arrays
// are closed in place (ps*[n*] = ps*[0]), so callers must leave room for one
// extra vertex.
__device__ inline float intersectAreaO(Point* ps1, int n1, Point* ps2, int n2,
                                       float* grad_AB) {
  if (area(ps1, n1) < 0) reverse1(ps1, n1);
  if (area(ps2, n2) < 0) reverse1(ps2, n2);
  ps1[n1] = ps1[0];
  ps2[n2] = ps2[0];
  float res = 0;
  for (int i = 0; i < n1; i++) {
    for (int j = 0; j < n2; j++) {
      res +=
          intersectArea(ps1[i], ps1[i + 1], ps2[j], ps2[j + 1], grad_AB, i, n1);
    }
  }
  return res;
}

// Jarvis march (gift wrapping): replaces in_poly[0..n_poly) with its convex
// hull, in place, and updates n_poly to the hull size.
// NOTE(review): right_point/left_point are fixed at 10 entries while Stack is
// NMAX — assumes each half-chain never exceeds 10 vertices; confirm callers.
__device__ inline void Jarvis(Point* in_poly, int& n_poly) {
  Point p_max, p_k;
  int max_index, k_index;
  int Stack[NMAX] = {}, top1, top2;
  float sign;
  Point right_point[10], left_point[10];

  // Move the lowest (then leftmost) point to slot 0 and find the highest
  // (then rightmost) point p_max; these are the two chain endpoints.
  for (int i = 0; i < n_poly; i++) {
    if (in_poly[i].y < in_poly[0].y ||
        in_poly[i].y == in_poly[0].y && in_poly[i].x < in_poly[0].x) {
      Point* j = &(in_poly[0]);
      Point* k = &(in_poly[i]);
      swap1(j, k);
    }
    if (i == 0) {
      p_max = in_poly[0];
      max_index = 0;
    }
    if (in_poly[i].y > p_max.y ||
        in_poly[i].y == p_max.y && in_poly[i].x > p_max.x) {
      p_max = in_poly[i];
      max_index = i;
    }
  }

  if (max_index == 0) {
    max_index = 1;
    p_max = in_poly[max_index];
  }

  // Wrap the right chain from the bottom point up to p_max; on collinear
  // points the farther one wins.
  k_index = 0, Stack[0] = 0, top1 = 0;
  while (k_index != max_index) {
    p_k = p_max;
    k_index = max_index;
    for (int i = 1; i < n_poly; i++) {
      sign = cross(in_poly[Stack[top1]], in_poly[i], p_k);
      if ((sign > 0) || ((sign == 0) && (dis(in_poly[Stack[top1]], in_poly[i]) >
                                         dis(in_poly[Stack[top1]], p_k)))) {
        p_k = in_poly[i];
        k_index = i;
      }
    }
    top1++;
    Stack[top1] = k_index;
  }
  for (int i = 0; i <= top1; i++) right_point[i] = in_poly[Stack[i]];

  // Wrap the left chain the same way, turning the opposite direction.
  k_index = 0, Stack[0] = 0, top2 = 0;

  while (k_index != max_index) {
    p_k = p_max;
    k_index = max_index;
    for (int i = 1; i < n_poly; i++) {
      sign = cross(in_poly[Stack[top2]], in_poly[i], p_k);
      if ((sign < 0) || (sign == 0) && (dis(in_poly[Stack[top2]], in_poly[i]) >
                                        dis(in_poly[Stack[top2]], p_k))) {
        p_k = in_poly[i];
        k_index = i;
      }
    }
    top2++;
    Stack[top2] = k_index;
  }
  for (int i = top2 - 1; i >= 0; i--) left_point[i] = in_poly[Stack[i]];

  // Concatenate the two chains back into in_poly.
  for (int i = 0; i < top1 + top2; i++) {
    if (i <= top1) {
      in_poly[i] = right_point[i];
    } else {
      in_poly[i] = left_point[top2 - (i - top1)];
    }
  }
  n_poly = top1 + top2;
}

// Area of the convex hull enclosing both ps1 and ps2 (the polygon C used by
// GIoU), with its gradient w.r.t. the predicted points written to grad_C
// (18 floats = 9 (x, y) pairs).
__device__ inline float intersectAreaPoly(Point* ps1, int n1, Point* ps2,
                                          int n2, float* grad_C) {
  Point polygon[MAXN];
  int n = n1 + n2, n_poly = 0;
  // Drop ps2 vertices that duplicate a ps1 vertex before merging.
  for (int i = 0; i < n1; i++) {
    for (int j = 0; j < n - n1; j++) {
      if (point_same(ps1[i], ps2[j])) {
        for (int k = j; k < n - n1 - 1; k++) {
          ps2[k] = ps2[k + 1];
        }
        n2--;
        break;
      }
    }
  }
  n_poly = n1 + n2;
  for (int i = 0; i < n_poly; i++) {
    if (i < n1) {
      polygon[i] = ps1[i];
    } else {
      polygon[i] = ps2[i - n1];
    }
  }

  Jarvis(polygon, n_poly);

  // Map hull vertices back to prediction indices so polygon_area_grad can
  // scatter area gradients onto the right predicted points.
  int polygon_to_pred_index[18] = {-1, -1, -1, -1, -1, -1, -1, -1, -1,
                                   -1, -1, -1, -1, -1, -1, -1, -1, -1};
  int n_pred = 0;
  for (int i = 0; i < n_poly; i++) {
    for (int j = 0; j < n1; j++) {
      if (polygon[i].x == ps1[j].x && polygon[i].y == ps1[j].y) {
        polygon_to_pred_index[n_pred] = i;
        polygon_to_pred_index[n_pred + n1] = j;
        n_pred += 1;
        break;
      }
    }
  }
  if (n_pred == 0) {
    // No predicted vertex lies on the hull: the hull area does not depend on
    // the prediction, so the gradient is zero.
    float polygon_area = fabs(area(polygon, n_poly));
    for (int i = 0; i < 18; i++) {
      grad_C[i] = 0.0;
    }
    return polygon_area;
  } else {
    float polygon_area =
        polygon_area_grad(polygon, n_poly, polygon_to_pred_index, n1, grad_C);
    if (polygon_area < 0) {
      // Signed area came out negative: flip gradients to match fabs() below.
      for (int i = 0; i < 18; i++) {
        grad_C[i] = -grad_C[i];
      }
    }
    return fabs(polygon_area);
  }
}

// convex_find and get the polygon_index_box_index
// Same gift wrapping as Jarvis(), but additionally records, for each hull
// vertex, the index of the matching point in the original input ordering
// (points_to_convex_ind).
__device__ inline void Jarvis_and_index(Point* in_poly, int& n_poly,
                                        int* points_to_convex_ind) {
  int n_input = n_poly;
  Point input_poly[20];
  for (int i = 0; i < n_input; i++) {
    input_poly[i].x = in_poly[i].x;
    input_poly[i].y = in_poly[i].y;
  }
  Point p_max, p_k;
  int max_index, k_index;
  int Stack[20], top1, top2;
  float sign;
  Point right_point[10], left_point[10];

  for (int i = 0; i < n_poly; i++) {
    if (in_poly[i].y < in_poly[0].y ||
        in_poly[i].y == in_poly[0].y && in_poly[i].x < in_poly[0].x) {
      Point* j = &(in_poly[0]);
      Point* k = &(in_poly[i]);
      swap1(j, k);
    }
    if (i == 0) {
      p_max = in_poly[0];
      max_index = 0;
    }
    if (in_poly[i].y > p_max.y ||
        in_poly[i].y == p_max.y && in_poly[i].x > p_max.x) {
      p_max = in_poly[i];
      max_index = i;
    }
  }
  if (max_index == 0) {
    max_index = 1;
    p_max = in_poly[max_index];
  }

  k_index = 0, Stack[0] = 0, top1 = 0;
  while (k_index != max_index) {
    p_k = p_max;
    k_index = max_index;
    for (int i = 1; i < n_poly; i++) {
      sign = cross(in_poly[Stack[top1]], in_poly[i], p_k);
      if ((sign > 0) || ((sign == 0) && (dis(in_poly[Stack[top1]], in_poly[i]) >
                                         dis(in_poly[Stack[top1]], p_k)))) {
        p_k = in_poly[i];
        k_index = i;
      }
    }
    top1++;
    Stack[top1] = k_index;
  }
  for (int i = 0; i <= top1; i++) {
    right_point[i] = in_poly[Stack[i]];
  }

  k_index = 0, Stack[0] = 0, top2 = 0;

  while (k_index != max_index) {
    p_k = p_max;
    k_index = max_index;
    for (int i = 1; i < n_poly; i++) {
      sign = cross(in_poly[Stack[top2]], in_poly[i], p_k);
      if ((sign < 0) || (sign == 0) && (dis(in_poly[Stack[top2]], in_poly[i]) >
                                        dis(in_poly[Stack[top2]], p_k))) {
        p_k = in_poly[i];
        k_index = i;
      }
    }
    top2++;
    Stack[top2] = k_index;
  }

  for (int i = top2 - 1; i >= 0; i--) {
    left_point[i] = in_poly[Stack[i]];
  }

  for (int i = 0; i < top1 + top2; i++) {
    if (i <= top1) {
      in_poly[i] = right_point[i];
    } else {
      in_poly[i] = left_point[top2 - (i - top1)];
    }
  }
  n_poly = top1 + top2;
  // Recover each hull vertex's index in the original input ordering.
  for (int i = 0; i < n_poly; i++) {
    for (int j = 0; j < n_input; j++) {
      if (point_same(in_poly[i], input_poly[j])) {
        points_to_convex_ind[i] = j;
        break;
      }
    }
  }
}

// Rotated GIoU between one predicted 9-point set p (18 floats) and one
// ground-truth quadrilateral q (8 floats).  Writes d(GIoU)/d(p) into
// point_grad (18 floats) and returns the GIoU value.
// NOTE(review): `template <typename T>` restored — angle-bracket contents were
// stripped from this patch during extraction.
template <typename T>
__device__ inline float devrIoU(T const* const p, T const* const q,
                                T* point_grad, const int idx) {
  Point ps1[MAXN], ps2[MAXN];

  Point convex[MAXN];
  for (int i = 0; i < 9; i++) {
    convex[i].x = (float)p[i * 2];
    convex[i].y = (float)p[i * 2 + 1];
  }
  int n_convex = 9;
  int points_to_convex_ind[9] = {-1, -1, -1, -1, -1, -1, -1, -1, -1};
  Jarvis_and_index(convex, n_convex, points_to_convex_ind);

  int n1 = n_convex;
  int n2 = 4;

  for (int i = 0; i < n1; i++) {
    ps1[i].x = (float)convex[i].x;
    ps1[i].y = (float)convex[i].y;
  }

  for (int i = 0; i < n2; i++) {
    ps2[i].x = (float)q[i * 2];
    ps2[i].y = (float)q[i * 2 + 1];
  }

  // Hull vertices map to themselves for the prediction-area gradient.
  int polygon_index_box_index[18];
  for (int i = 0; i < n1; i++) {
    polygon_index_box_index[i] = i;
    polygon_index_box_index[i + n1] = i;
  }

  float grad_A[18] = {};   // d(pred area)/d(hull points)
  float grad_AB[18] = {};  // d(intersection area)/d(hull points)
  float grad_C[18] = {};   // d(enclosing hull area)/d(hull points)

  float inter_area = intersectAreaO(ps1, n1, ps2, n2, grad_AB);
  float S_pred =
      polygon_area_grad(ps1, n1, polygon_index_box_index, n1, grad_A);
  if (S_pred < 0) {
    for (int i = 0; i < n_convex * 2; i++) {
      grad_A[i] = -grad_A[i];
    }
  }
  float union_area = fabs(S_pred) + fabs(area(ps2, n2)) - inter_area;

  float iou = inter_area / union_area;
  float polygon_area = intersectAreaPoly(ps1, n1, ps2, n2, grad_C);

  // printf("%d:live\n", idx);
  // GIoU = IoU - (C - union) / C, where C is the enclosing hull area.
  float rot_giou = iou - (polygon_area - union_area) / polygon_area;

  float grad_point_temp[18] = {};

  // Chain rule of rot_giou through inter/union/hull areas, scattered from
  // hull-vertex order back to original point order.
  for (int i = 0; i < n_convex; i++) {
    int grad_point = points_to_convex_ind[i];
    grad_point_temp[2 * grad_point] =
        (float)((union_area + inter_area) / (union_area * union_area) *
                    grad_AB[2 * i] -
                iou / union_area * grad_A[2 * i] -
                1 / polygon_area * (grad_AB[2 * i] - grad_A[2 * i]) -
                (union_area) / polygon_area / polygon_area * grad_C[2 * i]);
    grad_point_temp[2 * grad_point + 1] =
        (float)((union_area + inter_area) / (union_area * union_area) *
                    grad_AB[2 * i + 1] -
                iou / union_area * grad_A[2 * i + 1] -
                1 / polygon_area * (grad_AB[2 * i + 1] - grad_A[2 * i + 1]) -
                (union_area) / polygon_area / polygon_area * grad_C[2 * i + 1]);
  }

  for (int i = 0; i < 9; i++) {
    point_grad[2 * i] = grad_point_temp[2 * i];
    point_grad[2 * i + 1] = grad_point_temp[2 * i + 1];
  }
  return (float)rot_giou;
}

// One thread per predicted box: layout is 18 floats of input points, 8 floats
// of ground-truth box, and 19 floats of output (18 gradients + the GIoU).
template <typename T>
__global__ void convex_giou_cuda_kernel(const int ex_n_boxes,
                                        const int gt_n_boxes, const T* ex_boxes,
                                        const T* gt_boxes, T* point_grad) {
  CUDA_1D_KERNEL_LOOP(index, ex_n_boxes) {
    const T* cur_box = ex_boxes + index * 18;
    const T* cur_gt_box = gt_boxes + index * 8;
    T* cur_grad = point_grad + index * 19;
    T giou = devrIoU(cur_box, cur_gt_box, cur_grad, threadIdx.x);
    cur_grad[18] = giou;
  }
}

// Segment intersection of ab and cd.  Returns 2 if collinear, 0 if parallel,
// 1 with the intersection written to p otherwise.
__device__ inline int lineCross(Point a, Point b, Point c, Point d, Point& p) {
  float s1, s2;
  s1 = cross(a, b, c);
  s2 = cross(a, b, d);
  if (sig(s1) == 0 && sig(s2) == 0) return 2;
  if (sig(s2 - s1) == 0) return 0;
  p.x = (c.x * s2 - d.x * s1) / (s2 - s1);
  p.y = (c.y * s2 - d.y * s1) / (s2 - s1);
  return 1;
}

// Clip polygon p[0..n) against the half-plane left of line ab, in place
// (non-gradient variant used by the plain-IoU path).
__device__ inline void polygon_cut(Point* p, int& n, Point a, Point b) {
  Point pp[MAXN];
  int m = 0;
  p[n] = p[0];
  for (int i = 0; i < n; i++) {
    if (sig(cross(a, b, p[i])) > 0) {
      pp[m] = p[i];
      m++;
    }
    if (sig(cross(a, b, p[i])) != sig(cross(a, b, p[i + 1]))) {
      lineCross(a, b, p[i], p[i + 1], pp[m]);
      m++;
    }
  }
  n = 0;
  // Deduplicate consecutive identical vertices, including wrap-around.
  for (int i = 0; i < m; i++) {
    if (!i || !(point_same(pp[i], pp[i - 1]))) {
      p[n] = pp[i];
      n++;
    }
  }

  while (n > 1 && point_same(p[n - 1], p[0])) n--;
}

// Signed intersection area of triangles (o, a, b) and (o, c, d) with o at the
// origin (non-gradient variant).
__device__ inline float intersectArea(Point a, Point b, Point c, Point d) {
  Point o(0, 0);
  int s1 = sig(cross(o, a, b));
  int s2 = sig(cross(o, c, d));
  if (s1 == 0 || s2 == 0) return 0.0;
  // Orient both triangles counter-clockwise before clipping.
  if (s1 == -1) {
    Point* i = &a;
    Point* j = &b;
    swap1(i, j);
  }
  if (s2 == -1) {
    Point* i = &c;
    Point* j = &d;
    swap1(i, j);
  }
  Point p[10] = {o, a, b};
  int n = 3;

  polygon_cut(p, n, o, c);
  polygon_cut(p, n, c, d);
  polygon_cut(p, n, d, o);
  float res = area(p, n);
  if (s1 * s2 == -1) res = -res;
  return res;
}
// Intersection area of convex polygons ps1 and ps2 (non-gradient variant);
// both arrays are closed in place and need one spare slot.
__device__ inline float intersectAreaO(Point* ps1, int n1, Point* ps2,
                                       int n2) {
  if (area(ps1, n1) < 0) reverse1(ps1, n1);
  if (area(ps2, n2) < 0) reverse1(ps2, n2);
  ps1[n1] = ps1[0];
  ps2[n2] = ps2[0];
  float res = 0;
  for (int i = 0; i < n1; i++) {
    for (int j = 0; j < n2; j++) {
      res += intersectArea(ps1[i], ps1[i + 1], ps2[j], ps2[j + 1]);
    }
  }
  return res;
}

// Plain IoU between a predicted 9-point set p (18 floats) and a ground-truth
// quadrilateral q (8 floats); no gradients.
template <typename T>
__device__ inline float devrIoU(T const* const p, T const* const q) {
  Point ps1[MAXN], ps2[MAXN];
  Point convex[MAXN];
  for (int i = 0; i < 9; i++) {
    convex[i].x = (float)p[i * 2];
    convex[i].y = (float)p[i * 2 + 1];
  }
  int n_convex = 9;
  int points_to_convex_ind[9] = {-1, -1, -1, -1, -1, -1, -1, -1, -1};
  Jarvis_and_index(convex, n_convex, points_to_convex_ind);
  int n1 = n_convex;
  for (int i = 0; i < n1; i++) {
    ps1[i].x = (float)convex[i].x;
    ps1[i].y = (float)convex[i].y;
  }
  int n2 = 4;
  for (int i = 0; i < n2; i++) {
    ps2[i].x = (float)q[i * 2];
    ps2[i].y = (float)q[i * 2 + 1];
  }
  float inter_area = intersectAreaO(ps1, n1, ps2, n2);
  float S_pred = area(ps1, n1);
  float union_area = fabs(S_pred) + fabs(area(ps2, n2)) - inter_area;
  float iou = inter_area / union_area;
  return (float)iou;
}

// One thread per predicted box; computes IoU against every ground-truth box.
template <typename T>
__global__ void convex_iou_cuda_kernel(const int ex_n_boxes,
                                       const int gt_n_boxes, const T* ex_boxes,
                                       const T* gt_boxes, T* iou) {
  CUDA_1D_KERNEL_LOOP(index, ex_n_boxes) {
    const T* cur_box = ex_boxes + index * 18;
    for (int i = 0; i < gt_n_boxes; i++) {
      iou[index * gt_n_boxes + i] = devrIoU(cur_box, gt_boxes + i * 8);
    }
  }
}
#endif  // CONVEX_IOU_CUDA_KERNEL_CUH
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/correlation_cuda.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/correlation_cuda.cuh
new file mode 100644
index 0000000000000000000000000000000000000000..f910561ec309cd50fd6d4da131ab36cdf3ca963a
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/correlation_cuda.cuh
@@ -0,0 +1,231 @@
// Copyright (c) OpenMMLab. All rights reserved.
// Modified from
// https://github.com/ClementPinard/Pytorch-Correlation-extension/blob/master/Correlation_Module/correlation_cuda_kernel.cu
// Original licence: Under MIT License

#ifndef CORRELATION_CUDA
#define CORRELATION_CUDA

#ifdef MMCV_USE_PARROTS
#include "parrots_cuda_helper.hpp"
#else
#include "pytorch_cuda_helper.hpp"
#endif

#include <cuda.h>
#include <cuda_runtime.h>
// Using <torch/extension.h> is recommended in the official documentation in
// https://pytorch.org/tutorials/advanced/cpp_extension.html#writing-the-c-op.
// However, we use <torch/types.h> for compatibility with CUDA 9.0
// Read https://github.com/pytorch/extension-cpp/issues/35 for more details.
#include <torch/types.h>

#include <iostream>
#include <vector>

using namespace torch;

// NOTE(review): the PackedTensorAccessor32 template arguments and include
// targets were stripped from this patch during extraction; restored here.
#define TensorAcc4R \
  PackedTensorAccessor32<scalar_t, 4, RestrictPtrTraits, int32_t>
#define TensorAcc5R \
  PackedTensorAccessor32<scalar_t, 5, RestrictPtrTraits, int32_t>
#define WITHIN_BOUNDS(x, y, H, W) (x >= 0 && x < H && y >= 0 && y < W)

#define WARP_SIZE 32
#define FULL_MASK 0xffffffff

// Correlation forward: one block per (batch n, output h, output w); the
// warp's 32 lanes (threadIdx.x) split the channel dimension and the partial
// products are combined with a warp shuffle reduction.
// Inputs are channels-last (N, H, W, C); output is (N, patchH, patchW, oH, oW).
template <typename scalar_t>
__global__ void correlation_forward_cuda_kernel(
    const TensorAcc4R rInput1, const TensorAcc4R rInput2, TensorAcc5R output,
    int kH, int kW, int patchH, int patchW, int padH, int padW, int dilationH,
    int dilationW, int dilation_patchH, int dilation_patchW, int dH, int dW,
    int oH, int oW) {
  const int iH = rInput1.size(1);
  const int iW = rInput1.size(2);
  const int C = rInput1.size(3);

  const int n = blockIdx.x;
  const int h = blockIdx.y * blockDim.y + threadIdx.y;
  const int w = blockIdx.z * blockDim.z + threadIdx.z;

  if (h >= oH || w >= oW) return;

  const int thread = threadIdx.x;

  const int start_i = -padH + h * dH;
  const int start_j = -padW + w * dW;

  const int patchRadH = dilation_patchH * (patchH - 1) / 2;
  const int patchRadW = dilation_patchW * (patchW - 1) / 2;

  for (int ph = 0; ph < patchH; ++ph) {
    int ph_dilated = ph * dilation_patchH - patchRadH;
    for (int pw = 0; pw < patchW; ++pw) {
      int pw_dilated = pw * dilation_patchW - patchRadW;
      scalar_t prod_sum = 0.0f;
      for (int i = 0; i < kH; ++i) {
        int i1 = start_i + i * dilationH;
        int i2 = i1 + ph_dilated;
        if (WITHIN_BOUNDS(i1, i2, iH, iH)) {
          for (int j = 0; j < kW; ++j) {
            int j1 = start_j + j * dilationW;
            int j2 = j1 + pw_dilated;
            if (WITHIN_BOUNDS(j1, j2, iW, iW)) {
              // Each lane handles channels thread, thread+32, ...
              for (int c = thread; c < C; c += WARP_SIZE) {
                scalar_t v1 = rInput1[n][i1][j1][c];
                scalar_t v2 = rInput2[n][i2][j2][c];
                prod_sum += v1 * v2;
              }
            }
          }
        }
      }
      // accumulate: warp-level tree reduction of the 32 lane sums.
      for (int offset = 16; offset > 0; offset /= 2)
#ifdef MMCV_WITH_HIP
        prod_sum += __shfl_down(float(prod_sum), offset);
#else
        prod_sum += __shfl_down_sync(FULL_MASK, float(prod_sum), offset);
#endif
      if (thread == 0) {
        output[n][ph][pw][h][w] = prod_sum;
      }
    }
  }
}

// Correlation backward w.r.t. input1: one block per (n, h, w).  Each
// displacement's accumulated output-gradient is first cached in dynamic
// shared memory, then multiplied against input2 per channel.
template <typename scalar_t>
__global__ void correlation_backward_cuda_kernel_input1(
    const TensorAcc5R grad_output, const TensorAcc4R input2,
    TensorAcc4R grad_input1, const int kH, const int kW, const int patchH,
    const int patchW, const int padH, const int padW, const int dilationH,
    const int dilationW, const int dilation_patchH, const int dilation_patchW,
    const int dH, const int dW) {
  const int iH = input2.size(1);
  const int iW = input2.size(2);
  const int C = input2.size(3);

  const int H = grad_output.size(3);
  const int W = grad_output.size(4);

  const int patchRadH = (patchH - 1) / 2;
  const int patchRadW = (patchW - 1) / 2;

  const int n = blockIdx.x;
  const int h = blockIdx.y;
  const int w = blockIdx.z;

  const int h_2 = h + padH;
  const int w_2 = w + padW;
  const int min_h = h_2 - kH * dilationH;
  const int min_w = w_2 - kW * dilationW;

  // Dynamic shared memory: patchH * patchW scalars.
  extern __shared__ __align__(sizeof(4)) unsigned char grad_cache_char[];
  scalar_t *grad_cache = reinterpret_cast<scalar_t *>(grad_cache_char);
  for (int i = threadIdx.x; i < patchH * patchW; i += blockDim.x) {
    const int ph = i / patchW;
    const int pw = i % patchW;
    int i1 = h + dilation_patchH * (ph - patchRadH);
    int j1 = w + dilation_patchW * (pw - patchRadW);

    if (WITHIN_BOUNDS(i1, j1, iH, iW)) {
      scalar_t grad_val = 0.0f;
      // Walk back over output positions whose kernel window covered (h, w);
      // only stride-aligned positions contribute.
      for (int h_3 = h_2; h_3 > min_h; h_3 -= dilationH) {
        int i2 = (h_3) / dH;
        if (i2 * dH != h_3) continue;
        for (int w_3 = w_2; w_3 > min_w; w_3 -= dilationW) {
          int j2 = (w_3) / dW;
          if (j2 * dW != w_3) continue;
          if (WITHIN_BOUNDS(i2, j2, H, W)) {
            grad_val += grad_output[n][ph][pw][i2][j2];
          }
        }
      }
      grad_cache[i] = grad_val;
    }
  }
  __syncthreads();

  for (int c = threadIdx.x; c < C; c += blockDim.x) {
    scalar_t grad_input_val = 0.0f;
    for (int ph = 0; ph < patchH; ++ph) {
      int i1 = h + dilation_patchH * (ph - patchRadH);
      for (int pw = 0; pw < patchW; ++pw) {
        int j1 = w + dilation_patchW * (pw - patchRadW);
        if (WITHIN_BOUNDS(i1, j1, iH, iW)) {
          grad_input_val += input2[n][i1][j1][c] * grad_cache[ph * patchW + pw];
        }
      }
    }
    grad_input1[n][c][h][w] = grad_input_val;
  }
}

// Correlation backward w.r.t. input2: mirror of the input1 kernel with the
// displacement applied in the opposite direction.
template <typename scalar_t>
__global__ void correlation_backward_cuda_kernel_input2(
    const TensorAcc5R grad_output, const TensorAcc4R input1,
    TensorAcc4R grad_input2, int kH, int kW, int patchH, int patchW, int padH,
    int padW, int dilationH, int dilationW, int dilation_patchH,
    int dilation_patchW, int dH, int dW) {
  const int iH = input1.size(1);
  const int iW = input1.size(2);
  const int C = input1.size(3);

  const int patchRadH = (patchH - 1) / 2;
  const int patchRadW = (patchW - 1) / 2;

  const int H = grad_output.size(3);
  const int W = grad_output.size(4);

  const int dilatedKH = kH * dilationH;
  const int dilatedKW = kW * dilationW;

  const int n = blockIdx.x;
  const int h = blockIdx.y;
  const int w = blockIdx.z;

  extern __shared__ __align__(sizeof(4)) unsigned char grad_cache_char[];
  scalar_t *grad_cache = reinterpret_cast<scalar_t *>(grad_cache_char);
  for (int i = threadIdx.x; i < patchH * patchW; i += blockDim.x) {
    const int ph = i / patchW;
    const int pw = i % patchW;
    // Negative displacement: input2 position (h, w) was compared against
    // input1 position (i1, j1).
    int i1 = h - dilation_patchH * (ph - patchRadH);
    int j1 = w - dilation_patchW * (pw - patchRadW);

    if (WITHIN_BOUNDS(i1, j1, iH, iW)) {
      scalar_t grad_val = 0.0f;

      const int h_2 = i1 + padH;
      const int w_2 = j1 + padW;
      const int min_h = h_2 - dilatedKH;
      const int min_w = w_2 - dilatedKW;

      for (int h_3 = h_2; h_3 > min_h; h_3 -= dilationH) {
        int i2 = (h_3) / dH;
        if (i2 * dH != h_3) continue;
        for (int w_3 = w_2; w_3 > min_w; w_3 -= dilationW) {
          int j2 = (w_3) / dW;
          if (j2 * dW != w_3) continue;
          if (WITHIN_BOUNDS(i2, j2, H, W)) {
            grad_val += grad_output[n][ph][pw][i2][j2];
          }
        }
      }
      grad_cache[i] = grad_val;
    }
  }
  __syncthreads();

  for (int c = threadIdx.x; c < C; c += blockDim.x) {
    scalar_t grad_input_val = 0.0f;
    for (int ph = 0; ph < patchH; ++ph) {
      int i1 = h - dilation_patchH * (ph - patchRadH);
      for (int pw = 0; pw < patchW; ++pw) {
        int j1 = w - dilation_patchW * (pw - patchRadW);
        if (WITHIN_BOUNDS(i1, j1, iH, iW)) {
          grad_input_val += input1[n][i1][j1][c] * grad_cache[ph * patchW + pw];
        }
      }
    }
    grad_input2[n][c][h][w] = grad_input_val;
  }
}
#endif
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/deform_conv_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/deform_conv_cuda_kernel.cuh
new file mode 100644
index 0000000000000000000000000000000000000000..6b4d1bbd85bad1b87ee5d6b8a3cd3b29e3cbc411
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/deform_conv_cuda_kernel.cuh
@@ -0,0 +1,367 @@
/*!
 ******************* BEGIN Caffe Copyright Notice and Disclaimer
 *****************
 *
 * COPYRIGHT
 *
 * All contributions by the University of California:
 * Copyright (c) 2014-2017 The Regents of the University of California (Regents)
 * All rights reserved.
 *
 * All other contributions:
 * Copyright (c) 2014-2017, the respective contributors
 * All rights reserved.
 *
 * Caffe uses a shared copyright model: each contributor holds copyright over
 * their contributions to Caffe. The project versioning records all such
 * contribution and copyright details. If a contributor wants to further mark
 * their specific copyright on a particular contribution, they should indicate
 * their copyright solely in the commit message of the change when it is
 * committed.
 *
 * LICENSE
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *
 * 1. Redistributions of source code must retain the above copyright notice,
 *    this list of conditions and the following disclaimer.
 * 2.
Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + *AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + *IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE + *FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + *DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR + *SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER + *CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + *OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + *OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + * + * CONTRIBUTION AGREEMENT + * + * By contributing to the BVLC/caffe repository through pull-request, comment, + * or otherwise, the contributor releases their content to the + * license and copyright terms herein. + * + ***************** END Caffe Copyright Notice and Disclaimer + ********************* + * + * Copyright (c) 2018 Microsoft + * Licensed under The MIT License [see LICENSE for details] + * \file modulated_deformable_im2col.cuh + * \brief Function definitions of converting an image to + * column matrix based on kernel, padding, dilation, and offset. + * These functions are mainly used in deformable convolution operators. 
+ * \ref: https://arxiv.org/abs/1703.06211 + * \author Yuwen Xiong, Haozhi Qi, Jifeng Dai, Xizhou Zhu, Han Hu, Dazhi Cheng + */ + +// modified from +// https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/blob/mmdetection/mmdet/ops/dcn/src/deform_conv_cuda_kernel.cu + +#ifndef DEFORM_CONV_CUDA_KERNEL_CUH +#define DEFORM_CONV_CUDA_KERNEL_CUH + +#include +#ifdef MMCV_WITH_TRT +#include "common_cuda_helper.hpp" +#else // MMCV_WITH_TRT +#ifdef MMCV_USE_PARROTS +#include "parrots_cuda_helper.hpp" +#else // MMCV_USE_PARROTS +#include "pytorch_cuda_helper.hpp" +#endif // MMCV_USE_PARROTS +#endif // MMCV_WITH_TRT + +template +__device__ T deformable_im2col_bilinear(const T *input, const int data_width, + const int height, const int width, T h, + T w) { + if (h <= -1 || height <= h || w <= -1 || width <= w) { + return 0; + } + + int h_low = floorf(h); + int w_low = floorf(w); + int h_high = h_low + 1; + int w_high = w_low + 1; + + T lh = h - h_low; + T lw = w - w_low; + T hh = 1 - lh, hw = 1 - lw; + + T v1 = 0; + if (h_low >= 0 && w_low >= 0) v1 = input[h_low * data_width + w_low]; + T v2 = 0; + if (h_low >= 0 && w_high <= width - 1) + v2 = input[h_low * data_width + w_high]; + T v3 = 0; + if (h_high <= height - 1 && w_low >= 0) + v3 = input[h_high * data_width + w_low]; + T v4 = 0; + if (h_high <= height - 1 && w_high <= width - 1) + v4 = input[h_high * data_width + w_high]; + + T w1 = hh * hw, w2 = hh * lw, w3 = lh * hw, w4 = lh * lw; + + T val = (w1 * v1 + w2 * v2 + w3 * v3 + w4 * v4); + return val; +} + +template +__device__ T get_gradient_weight(T argmax_h, T argmax_w, const int h, + const int w, const int height, + const int width) { + if (argmax_h <= -1 || argmax_h >= height || argmax_w <= -1 || + argmax_w >= width) { + // empty + return 0; + } + + int argmax_h_low = floorf(argmax_h); + int argmax_w_low = floorf(argmax_w); + int argmax_h_high = argmax_h_low + 1; + int argmax_w_high = argmax_w_low + 1; + + T weight = 0; + if (h == argmax_h_low && w == 
argmax_w_low) + weight = (h + 1 - argmax_h) * (w + 1 - argmax_w); + if (h == argmax_h_low && w == argmax_w_high) + weight = (h + 1 - argmax_h) * (argmax_w + 1 - w); + if (h == argmax_h_high && w == argmax_w_low) + weight = (argmax_h + 1 - h) * (w + 1 - argmax_w); + if (h == argmax_h_high && w == argmax_w_high) + weight = (argmax_h + 1 - h) * (argmax_w + 1 - w); + return weight; +} + +template +__device__ T get_coordinate_weight(T argmax_h, T argmax_w, const int height, + const int width, const T *im_data, + const int data_width, const int bp_dir) { + if (argmax_h <= -1 || argmax_h >= height || argmax_w <= -1 || + argmax_w >= width) { + // empty + return 0; + } + + int argmax_h_low = floorf(argmax_h); + int argmax_w_low = floorf(argmax_w); + int argmax_h_high = argmax_h_low + 1; + int argmax_w_high = argmax_w_low + 1; + + T weight = 0; + + if (bp_dir == 0) { + if (argmax_h_low >= 0 && argmax_w_low >= 0) + weight += -1 * (argmax_w_low + 1 - argmax_w) * + im_data[argmax_h_low * data_width + argmax_w_low]; + if (argmax_h_low >= 0 && argmax_w_high <= width - 1) + weight += -1 * (argmax_w - argmax_w_low) * + im_data[argmax_h_low * data_width + argmax_w_high]; + if (argmax_h_high <= height - 1 && argmax_w_low >= 0) + weight += (argmax_w_low + 1 - argmax_w) * + im_data[argmax_h_high * data_width + argmax_w_low]; + if (argmax_h_high <= height - 1 && argmax_w_high <= width - 1) + weight += (argmax_w - argmax_w_low) * + im_data[argmax_h_high * data_width + argmax_w_high]; + } else if (bp_dir == 1) { + if (argmax_h_low >= 0 && argmax_w_low >= 0) + weight += -1 * (argmax_h_low + 1 - argmax_h) * + im_data[argmax_h_low * data_width + argmax_w_low]; + if (argmax_h_low >= 0 && argmax_w_high <= width - 1) + weight += (argmax_h_low + 1 - argmax_h) * + im_data[argmax_h_low * data_width + argmax_w_high]; + if (argmax_h_high <= height - 1 && argmax_w_low >= 0) + weight += -1 * (argmax_h - argmax_h_low) * + im_data[argmax_h_high * data_width + argmax_w_low]; + if (argmax_h_high <= height 
- 1 && argmax_w_high <= width - 1) + weight += (argmax_h - argmax_h_low) * + im_data[argmax_h_high * data_width + argmax_w_high]; + } + + return weight; +} + +template +__global__ void deformable_im2col_gpu_kernel( + const int n, const T *data_im, const T *data_offset, const int height, + const int width, const int kernel_h, const int kernel_w, const int pad_h, + const int pad_w, const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, + const int channel_per_deformable_group, const int batch_size, + const int num_channels, const int deformable_group, const int height_col, + const int width_col, T *data_col) { + CUDA_1D_KERNEL_LOOP(index, n) { + // index index of output matrix + const int w_col = index % width_col; + const int h_col = (index / width_col) % height_col; + const int b_col = (index / width_col / height_col) % batch_size; + const int c_im = (index / width_col / height_col) / batch_size; + const int c_col = c_im * kernel_h * kernel_w; + + // compute deformable group index + const int deformable_group_index = c_im / channel_per_deformable_group; + + const int h_in = h_col * stride_h - pad_h; + const int w_in = w_col * stride_w - pad_w; + T *data_col_ptr = + data_col + + ((c_col * batch_size + b_col) * height_col + h_col) * width_col + w_col; + const T *data_im_ptr = + data_im + (b_col * num_channels + c_im) * height * width; + const T *data_offset_ptr = + data_offset + (b_col * deformable_group + deformable_group_index) * 2 * + kernel_h * kernel_w * height_col * width_col; + + for (int i = 0; i < kernel_h; ++i) { + for (int j = 0; j < kernel_w; ++j) { + const int data_offset_h_ptr = + ((2 * (i * kernel_w + j)) * height_col + h_col) * width_col + w_col; + const int data_offset_w_ptr = + ((2 * (i * kernel_w + j) + 1) * height_col + h_col) * width_col + + w_col; + const T offset_h = data_offset_ptr[data_offset_h_ptr]; + const T offset_w = data_offset_ptr[data_offset_w_ptr]; + T val = static_cast(0); + const T h_im = h_in + i * 
dilation_h + offset_h; + const T w_im = w_in + j * dilation_w + offset_w; + if (h_im > -1 && w_im > -1 && h_im < height && w_im < width) + val = deformable_im2col_bilinear(data_im_ptr, width, height, width, + h_im, w_im); + *data_col_ptr = val; + data_col_ptr += batch_size * height_col * width_col; + } + } + } +} + +template +__global__ void deformable_col2im_gpu_kernel( + const int n, const T *data_col, const T *data_offset, const int channels, + const int height, const int width, const int kernel_h, const int kernel_w, + const int pad_h, const int pad_w, const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, + const int channel_per_deformable_group, const int batch_size, + const int deformable_group, const int height_col, const int width_col, + T *grad_im) { + CUDA_1D_KERNEL_LOOP(index, n) { + const int j = (index / width_col / height_col / batch_size) % kernel_w; + const int i = + (index / width_col / height_col / batch_size / kernel_w) % kernel_h; + const int c = + index / width_col / height_col / batch_size / kernel_w / kernel_h; + // compute the start and end of the output + + const int deformable_group_index = c / channel_per_deformable_group; + + int w_out = index % width_col; + int h_out = (index / width_col) % height_col; + int b = (index / width_col / height_col) % batch_size; + int w_in = w_out * stride_w - pad_w; + int h_in = h_out * stride_h - pad_h; + + const T *data_offset_ptr = + data_offset + (b * deformable_group + deformable_group_index) * 2 * + kernel_h * kernel_w * height_col * width_col; + const int data_offset_h_ptr = + ((2 * (i * kernel_w + j)) * height_col + h_out) * width_col + w_out; + const int data_offset_w_ptr = + ((2 * (i * kernel_w + j) + 1) * height_col + h_out) * width_col + w_out; + const T offset_h = data_offset_ptr[data_offset_h_ptr]; + const T offset_w = data_offset_ptr[data_offset_w_ptr]; + const T cur_inv_h_data = h_in + i * dilation_h + offset_h; + const T cur_inv_w_data = w_in + j * 
dilation_w + offset_w; + + const T cur_top_grad = data_col[index]; + const int cur_h = (int)cur_inv_h_data; + const int cur_w = (int)cur_inv_w_data; + for (int dy = -2; dy <= 2; dy++) { + for (int dx = -2; dx <= 2; dx++) { + if (cur_h + dy >= 0 && cur_h + dy < height && cur_w + dx >= 0 && + cur_w + dx < width && abs(cur_inv_h_data - (cur_h + dy)) < 1 && + abs(cur_inv_w_data - (cur_w + dx)) < 1) { + int cur_bottom_grad_pos = + ((b * channels + c) * height + cur_h + dy) * width + cur_w + dx; + T weight = get_gradient_weight(cur_inv_h_data, cur_inv_w_data, + cur_h + dy, cur_w + dx, height, width); + atomicAdd(grad_im + cur_bottom_grad_pos, weight * cur_top_grad); + } + } + } + } +} + +template +__global__ void deformable_col2im_coord_gpu_kernel( + const int n, const T *data_col, const T *data_im, const T *data_offset, + const int channels, const int height, const int width, const int kernel_h, + const int kernel_w, const int pad_h, const int pad_w, const int stride_h, + const int stride_w, const int dilation_h, const int dilation_w, + const int channel_per_deformable_group, const int batch_size, + const int offset_channels, const int deformable_group, const int height_col, + const int width_col, T *grad_offset) { + CUDA_1D_KERNEL_LOOP(index, n) { + T val = 0; + int w = index % width_col; + int h = (index / width_col) % height_col; + int c = (index / width_col / height_col) % offset_channels; + int b = (index / width_col / height_col) / offset_channels; + // compute the start and end of the output + + const int deformable_group_index = c / (2 * kernel_h * kernel_w); + const int col_step = kernel_h * kernel_w; + int cnt = 0; + const T *data_col_ptr = data_col + deformable_group_index * + channel_per_deformable_group * + batch_size * width_col * height_col; + const T *data_im_ptr = + data_im + (b * deformable_group + deformable_group_index) * + channel_per_deformable_group / kernel_h / kernel_w * + height * width; + const T *data_offset_ptr = + data_offset + (b * 
deformable_group + deformable_group_index) * 2 * + kernel_h * kernel_w * height_col * width_col; + + const int offset_c = c - deformable_group_index * 2 * kernel_h * kernel_w; + + for (int col_c = (offset_c / 2); col_c < channel_per_deformable_group; + col_c += col_step) { + const int col_pos = + (((col_c * batch_size + b) * height_col) + h) * width_col + w; + const int bp_dir = offset_c % 2; + + int j = (col_pos / width_col / height_col / batch_size) % kernel_w; + int i = + (col_pos / width_col / height_col / batch_size / kernel_w) % kernel_h; + int w_out = col_pos % width_col; + int h_out = (col_pos / width_col) % height_col; + int w_in = w_out * stride_w - pad_w; + int h_in = h_out * stride_h - pad_h; + const int data_offset_h_ptr = + (((2 * (i * kernel_w + j)) * height_col + h_out) * width_col + w_out); + const int data_offset_w_ptr = + (((2 * (i * kernel_w + j) + 1) * height_col + h_out) * width_col + + w_out); + const T offset_h = data_offset_ptr[data_offset_h_ptr]; + const T offset_w = data_offset_ptr[data_offset_w_ptr]; + T inv_h = h_in + i * dilation_h + offset_h; + T inv_w = w_in + j * dilation_w + offset_w; + if (inv_h <= -1 || inv_w <= -1 || inv_h >= height || inv_w >= width) + inv_h = inv_w = -2; + const T weight = get_coordinate_weight(inv_h, inv_w, height, width, + data_im_ptr + cnt * height * width, + width, bp_dir); + val += weight * data_col_ptr[col_pos]; + cnt += 1; + } + + grad_offset[index] = val; + } +} + +#endif // DEFORM_CONV_CUDA_KERNEL_CUH diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/deform_roi_pool_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/deform_roi_pool_cuda_kernel.cuh new file mode 100644 index 0000000000000000000000000000000000000000..86c4bc66dd2fb289340a4fb1714edb5db1e798c4 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/deform_roi_pool_cuda_kernel.cuh @@ -0,0 +1,186 @@ +// Copyright (c) OpenMMLab. 
// All rights reserved
#ifndef DEFORM_ROI_POOL_CUDA_KERNEL_CUH
#define DEFORM_ROI_POOL_CUDA_KERNEL_CUH

#ifdef MMCV_USE_PARROTS
#include "parrots_cuda_helper.hpp"
#else
#include "pytorch_cuda_helper.hpp"
#endif

// Deformable RoI pooling forward: one thread per pooled output element.
// rois are (batch_idx, x1, y1, x2, y2); offset, when non-NULL, holds learned
// per-bin (dw, dh) displacements scaled by gamma.
// NOTE(review): `template <typename T>` / `static_cast<T>` restored — the
// angle-bracket contents were stripped from this patch during extraction.
template <typename T>
__global__ void deform_roi_pool_forward_cuda_kernel(
    const int nthreads, const T* input, const T* rois, const T* offset,
    T* output, const int pooled_height, const int pooled_width,
    const T spatial_scale, const int sampling_ratio, const T gamma,
    const int channels, const int height, const int width) {
  CUDA_1D_KERNEL_LOOP(index, nthreads) {
    // (n, c, ph, pw) is an element in the pooled output
    int pw = index % pooled_width;
    int ph = (index / pooled_width) % pooled_height;
    int c = (index / pooled_width / pooled_height) % channels;
    int n = index / pooled_width / pooled_height / channels;

    const T* offset_rois = rois + n * 5;
    int roi_batch_ind = offset_rois[0];

    // Do not using rounding; this implementation detail is critical
    T roi_start_w = offset_rois[1] * spatial_scale - 0.5;
    T roi_start_h = offset_rois[2] * spatial_scale - 0.5;
    T roi_end_w = offset_rois[3] * spatial_scale - 0.5;
    T roi_end_h = offset_rois[4] * spatial_scale - 0.5;

    T roi_width = roi_end_w - roi_start_w;
    T roi_height = roi_end_h - roi_start_h;

    T bin_size_h = static_cast<T>(roi_height) / static_cast<T>(pooled_height);
    T bin_size_w = static_cast<T>(roi_width) / static_cast<T>(pooled_width);

    const T* offset_input =
        input + (roi_batch_ind * channels + c) * height * width;

    // We use roi_bin_grid to sample the grid and mimic integral
    int roi_bin_grid_h =
        (sampling_ratio > 0)
            ? sampling_ratio
            : static_cast<int>(ceilf(roi_height / pooled_height));
    int roi_bin_grid_w =
        (sampling_ratio > 0)
            ? sampling_ratio
            : static_cast<int>(ceilf(roi_width / pooled_width));

    // Compute roi offset
    if (offset != NULL) {
      const T* offset_cur_w = offset + n * pooled_width * pooled_height * 2 +
                              ph * pooled_width + pw;
      T offset_roi_w = gamma * roi_width * offset_cur_w[0];
      T offset_roi_h =
          gamma * roi_height * offset_cur_w[pooled_width * pooled_height];
      roi_start_w += offset_roi_w;
      roi_start_h += offset_roi_h;
    }

    // We do average pooling inside a bin
    const T count = max(roi_bin_grid_h * roi_bin_grid_w, 1);
    T output_val = 0.;
    for (int iy = 0; iy < roi_bin_grid_h; iy++) {
      const T y = roi_start_h + ph * bin_size_h +
                  static_cast<T>(iy + .5f) * bin_size_h /
                      static_cast<T>(roi_bin_grid_h);
      for (int ix = 0; ix < roi_bin_grid_w; ix++) {
        const T x = roi_start_w + pw * bin_size_w +
                    static_cast<T>(ix + .5f) * bin_size_w /
                        static_cast<T>(roi_bin_grid_w);
        T val = bilinear_interpolate(offset_input, height, width, y, x, index);
        output_val += val;
      }
    }
    output[index] = output_val / count;
  }
}

// Deformable RoI pooling backward: mirrors the forward sampling pattern to
// route grad_output into grad_input and grad_offset.
// (Body continues past this view.)
template <typename T>
__global__ void deform_roi_pool_backward_cuda_kernel(
    const int nthreads, const T* grad_output, const T* input, const T* rois,
    const T* offset, T* grad_input, T* grad_offset, const int pooled_height,
    const int pooled_width, const T spatial_scale, const int sampling_ratio,
    const T gamma, const int channels, const int height, const int width) {
  CUDA_1D_KERNEL_LOOP(index, nthreads) {
    // (n, c, ph, pw) is an element in the pooled output
    int pw = index % pooled_width;
    int ph = (index / pooled_width) % pooled_height;
    int c = (index / pooled_width / pooled_height) % channels;
    int n = index / pooled_width / pooled_height / channels;

    const T* offset_rois = rois + n * 5;
    int roi_batch_ind = offset_rois[0];
    const T* offset_input =
        input + ((roi_batch_ind * channels + c) * height * width);
    T* offset_grad_input =
        grad_input + ((roi_batch_ind * channels + c) * height * width);

    // Do not using rounding; this implementation detail is critical
    T roi_start_w = offset_rois[1] * spatial_scale - 0.5;
    T roi_start_h = offset_rois[2] * spatial_scale - 0.5;
    T roi_end_w = offset_rois[3] * spatial_scale - 0.5;
    T roi_end_h = offset_rois[4] * spatial_scale - 0.5;

    T roi_width = roi_end_w - roi_start_w;
    T roi_height = roi_end_h - roi_start_h;

    T bin_size_h = static_cast<T>(roi_height) / static_cast<T>(pooled_height);
    T bin_size_w = static_cast<T>(roi_width) / static_cast<T>(pooled_width);

    // We use roi_bin_grid to sample the grid and mimic integral
    int roi_bin_grid_h =
        (sampling_ratio > 0)
            ? sampling_ratio
            : static_cast<int>(ceilf(roi_height / pooled_height));
    int roi_bin_grid_w =
        (sampling_ratio > 0)
            ? sampling_ratio
            : static_cast<int>(ceilf(roi_width / pooled_width));

    // Compute roi offset
    if (offset != NULL) {
      const T* offset_cur_w = offset + n * pooled_width * pooled_height * 2 +
                              ph * pooled_width + pw;
      T offset_roi_w = gamma * roi_width * offset_cur_w[0];
      T offset_roi_h =
          gamma * roi_height * offset_cur_w[pooled_width * pooled_height];
      roi_start_w += offset_roi_w;
      roi_start_h += offset_roi_h;
    }

    // We do average (integral) pooling inside a bin
    const T count = roi_bin_grid_h * roi_bin_grid_w;  // e.g.
= 4 + const T grad_output_this_bin = grad_output[index] / count; + + for (int iy = 0; iy < roi_bin_grid_h; iy++) { + const T y = roi_start_h + ph * bin_size_h + + static_cast(iy + .5f) * bin_size_h / + static_cast(roi_bin_grid_h); + for (int ix = 0; ix < roi_bin_grid_w; ix++) { + const T x = roi_start_w + pw * bin_size_w + + static_cast(ix + .5f) * bin_size_w / + static_cast(roi_bin_grid_w); + + T w1, w2, w3, w4; + int x_low, x_high, y_low, y_high; + bilinear_interpolate_gradient(height, width, y, x, w1, w2, w3, w4, + x_low, x_high, y_low, y_high, index); + + if (x_low >= 0 && x_high >= 0 && y_low >= 0 && y_high >= 0) { + atomicAdd(offset_grad_input + y_low * width + x_low, + grad_output_this_bin * w1); + atomicAdd(offset_grad_input + y_low * width + x_high, + grad_output_this_bin * w2); + atomicAdd(offset_grad_input + y_high * width + x_low, + grad_output_this_bin * w3); + atomicAdd(offset_grad_input + y_high * width + x_high, + grad_output_this_bin * w4); + if (offset != NULL) { + T input_00 = offset_input[y_low * width + x_low]; + T input_10 = offset_input[y_low * width + x_high]; + T input_01 = offset_input[y_high * width + x_low]; + T input_11 = offset_input[y_high * width + x_high]; + T ogx = gamma * roi_width * grad_output_this_bin * + (input_11 * (y - y_low) + input_10 * (y_high - y) + + input_01 * (y_low - y) + input_00 * (y - y_high)); + T ogy = gamma * roi_height * grad_output_this_bin * + (input_11 * (x - x_low) + input_01 * (x_high - x) + + input_10 * (x_low - x) + input_00 * (x - x_high)); + atomicAdd(grad_offset + n * pooled_width * pooled_height * 2 + + ph * pooled_width + pw, + ogx); + atomicAdd(grad_offset + n * pooled_width * pooled_height * 2 + + pooled_width * pooled_height + ph * pooled_width + pw, + ogy); + } + } + } + } + } +} + +#endif // DEFORM_ROI_POOL_CUDA_KERNEL_CUH diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/diff_iou_rotated_cuda_kernel.cuh 
b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/diff_iou_rotated_cuda_kernel.cuh new file mode 100644 index 0000000000000000000000000000000000000000..d30a1a73459965f4701f0a7f78fadfd8a99c2b73 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/diff_iou_rotated_cuda_kernel.cuh @@ -0,0 +1,137 @@ +// Copyright (c) OpenMMLab. All rights reserved +// Adapted from +// https://github.com/lilanxiao/Rotated_IoU/cuda_op/sort_vert_kernel.cu # noqa +#ifdef MMCV_USE_PARROTS +#include "parrots_cuda_helper.hpp" +#else +#include "pytorch_cuda_helper.hpp" +#endif + +#define MAX_NUM_VERT_IDX 9 +#define INTERSECTION_OFFSET 8 +#define EPSILON 1e-8 + +inline int opt_n_thread(int work_size) { + const int pow_2 = std::log(static_cast(work_size)) / std::log(2.0); + return std::max(std::min(1 << pow_2, THREADS_PER_BLOCK), 1); +} + +/* +compare normalized vertices (vertices around (0,0)) +if vertex1 < vertex2 return true. +order: minimum at x-aixs, become larger in anti-clockwise direction +*/ +__device__ bool compare_vertices(float x1, float y1, float x2, float y2) { + if (fabs(x1 - x2) < EPSILON && fabs(y2 - y1) < EPSILON) + return false; // if equal, return false + + if (y1 > 0 && y2 < 0) return true; + if (y1 < 0 && y2 > 0) return false; + + float n1 = x1 * x1 + y1 * y1 + EPSILON; + float n2 = x2 * x2 + y2 * y2 + EPSILON; + float diff = fabs(x1) * x1 / n1 - fabs(x2) * x2 / n2; + + if (y1 > 0 && y2 > 0) { + if (diff > EPSILON) + return true; + else + return false; + } + if (y1 < 0 && y2 < 0) { + if (diff < EPSILON) + return true; + else + return false; + } + return false; +} + +__global__ void diff_iou_rotated_sort_vertices_forward_cuda_kernel( + int b, int n, int m, const float *__restrict__ vertices, + const bool *__restrict__ mask, const int *__restrict__ num_valid, + int *__restrict__ idx) { + int batch_idx = blockIdx.x; + vertices += batch_idx * n * m * 2; + mask += batch_idx * n * m; + num_valid += batch_idx * n; + idx += batch_idx * n * 
MAX_NUM_VERT_IDX; + + int index = threadIdx.x; // index of polygon + int stride = blockDim.x; + for (int i = index; i < n; i += stride) { + int pad; // index of arbitrary invalid intersection point (not box corner!) + for (int j = INTERSECTION_OFFSET; j < m; ++j) { + if (!mask[i * m + j]) { + pad = j; + break; + } + } + if (num_valid[i] < 3) { + // not enough vertices, take an invalid intersection point + // (zero padding) + for (int j = 0; j < MAX_NUM_VERT_IDX; ++j) { + idx[i * MAX_NUM_VERT_IDX + j] = pad; + } + } else { + // sort the valid vertices + // note the number of valid vertices is known + // note: check that num_valid[i] < MAX_NUM_VERT_IDX + for (int j = 0; j < num_valid[i]; ++j) { + // initialize with a "big" value + float x_min = 1; + float y_min = -EPSILON; + int i_take = 0; + int i2; + float x2, y2; + if (j != 0) { + i2 = idx[i * MAX_NUM_VERT_IDX + j - 1]; + x2 = vertices[i * m * 2 + i2 * 2 + 0]; + y2 = vertices[i * m * 2 + i2 * 2 + 1]; + } + for (int k = 0; k < m; ++k) { + float x = vertices[i * m * 2 + k * 2 + 0]; + float y = vertices[i * m * 2 + k * 2 + 1]; + if (mask[i * m + k] && compare_vertices(x, y, x_min, y_min)) { + if ((j == 0) || (j != 0 && compare_vertices(x2, y2, x, y))) { + x_min = x; + y_min = y; + i_take = k; + } + } + } + idx[i * MAX_NUM_VERT_IDX + j] = i_take; + } + // duplicate the first idx + idx[i * MAX_NUM_VERT_IDX + num_valid[i]] = idx[i * MAX_NUM_VERT_IDX + 0]; + + // pad zeros + for (int j = num_valid[i] + 1; j < MAX_NUM_VERT_IDX; ++j) { + idx[i * MAX_NUM_VERT_IDX + j] = pad; + } + + // for corner case: the two boxes are exactly the same. 
+ // in this case, idx would have duplicate elements, which makes the + // shoelace formula broken because of the definition, the duplicate + // elements only appear in the first 8 positions (they are "corners in + // box", not "intersection of edges") + if (num_valid[i] == 8) { + int counter = 0; + for (int j = 0; j < 4; ++j) { + int check = idx[i * MAX_NUM_VERT_IDX + j]; + for (int k = 4; k < INTERSECTION_OFFSET; ++k) { + if (idx[i * MAX_NUM_VERT_IDX + k] == check) counter++; + } + } + if (counter == 4) { + idx[i * MAX_NUM_VERT_IDX + 4] = idx[i * MAX_NUM_VERT_IDX + 0]; + for (int j = 5; j < MAX_NUM_VERT_IDX; ++j) { + idx[i * MAX_NUM_VERT_IDX + j] = pad; + } + } + } + + // TODO: still might need to cover some other corner cases :( + } + } +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/furthest_point_sample_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/furthest_point_sample_cuda_kernel.cuh new file mode 100644 index 0000000000000000000000000000000000000000..d3801a02c1c8f44874fb84fa884cc23bee25c331 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/furthest_point_sample_cuda_kernel.cuh @@ -0,0 +1,152 @@ +// Copyright (c) OpenMMLab. All rights reserved +#ifndef FURTHEST_POINT_SAMPLE_CUDA_KERNEL_CUH +#define FURTHEST_POINT_SAMPLE_CUDA_KERNEL_CUH + +#ifdef MMCV_USE_PARROTS +#include "parrots_cuda_helper.hpp" +#else +#include "pytorch_cuda_helper.hpp" +#endif + +__device__ void __update(float *__restrict__ dists, int *__restrict__ dists_i, + int idx1, int idx2) { + const float v1 = dists[idx1], v2 = dists[idx2]; + const int i1 = dists_i[idx1], i2 = dists_i[idx2]; + dists[idx1] = max(v1, v2); + dists_i[idx1] = v2 > v1 ? 
i2 : i1; +} + +template +__global__ void furthest_point_sampling_forward_cuda_kernel( + int b, int n, int m, const float *__restrict__ dataset, + float *__restrict__ temp, int *__restrict__ idxs) { + // dataset: (B, N, 3) + // tmp: (B, N) + // output: + // idx: (B, M) + + if (m <= 0) return; + __shared__ float dists[block_size]; + __shared__ int dists_i[block_size]; + + int batch_index = blockIdx.x; + dataset += batch_index * n * 3; + temp += batch_index * n; + idxs += batch_index * m; + + int tid = threadIdx.x; + const int stride = block_size; + + int old = 0; + if (threadIdx.x == 0) idxs[0] = old; + + __syncthreads(); + for (int j = 1; j < m; j++) { + int besti = 0; + float best = -1; + float x1 = dataset[old * 3 + 0]; + float y1 = dataset[old * 3 + 1]; + float z1 = dataset[old * 3 + 2]; + for (int k = tid; k < n; k += stride) { + float x2, y2, z2; + x2 = dataset[k * 3 + 0]; + y2 = dataset[k * 3 + 1]; + z2 = dataset[k * 3 + 2]; + // float mag = (x2 * x2) + (y2 * y2) + (z2 * z2); + // if (mag <= 1e-3) + // continue; + + float d = + (x2 - x1) * (x2 - x1) + (y2 - y1) * (y2 - y1) + (z2 - z1) * (z2 - z1); + float d2 = min(d, temp[k]); + temp[k] = d2; + besti = d2 > best ? k : besti; + best = d2 > best ? 
d2 : best; + } + dists[tid] = best; + dists_i[tid] = besti; + __syncthreads(); + +#pragma unroll + for (int block_size_thres = 1024; block_size_thres >= 2; + block_size_thres >>= 1) { + const int tid_thres = block_size_thres / 2; + if (block_size >= block_size_thres && tid < tid_thres) { + __update(dists, dists_i, tid, tid + tid_thres); + } + __syncthreads(); + } + + old = dists_i[0]; + if (tid == 0) idxs[j] = old; + } +} + +// Modified from +// https://github.com/qiqihaer/3DSSD-pytorch/blob/master/lib/pointnet2/src/sampling_gpu.cu +template +__global__ void furthest_point_sampling_with_dist_forward_cuda_kernel( + int b, int n, int m, const float *__restrict__ dataset, + float *__restrict__ temp, int *__restrict__ idxs) { + // dataset: (B, N, N) + // tmp: (B, N) + // output: + // idx: (B, M) + + if (m <= 0) return; + __shared__ float dists[block_size]; + __shared__ int dists_i[block_size]; + + int batch_index = blockIdx.x; + dataset += batch_index * n * n; + temp += batch_index * n; + idxs += batch_index * m; + + int tid = threadIdx.x; + const int stride = block_size; + + int old = 0; + if (threadIdx.x == 0) idxs[0] = old; + + __syncthreads(); + for (int j = 1; j < m; j++) { + int besti = 0; + float best = -1; + // float x1 = dataset[old * 3 + 0]; + // float y1 = dataset[old * 3 + 1]; + // float z1 = dataset[old * 3 + 2]; + for (int k = tid; k < n; k += stride) { + // float x2, y2, z2; + // x2 = dataset[k * 3 + 0]; + // y2 = dataset[k * 3 + 1]; + // z2 = dataset[k * 3 + 2]; + + // float d = (x2 - x1) * (x2 - x1) + (y2 - y1) * (y2 - y1) + (z2 - z1) * + // (z2 - z1); + float d = dataset[old * n + k]; + + float d2 = min(d, temp[k]); + temp[k] = d2; + besti = d2 > best ? k : besti; + best = d2 > best ? 
d2 : best; + } + dists[tid] = best; + dists_i[tid] = besti; + __syncthreads(); + +#pragma unroll + for (int block_size_thres = 1024; block_size_thres >= 2; + block_size_thres >>= 1) { + const int tid_thres = block_size_thres / 2; + if (block_size >= block_size_thres && tid < tid_thres) { + __update(dists, dists_i, tid, tid + tid_thres); + } + __syncthreads(); + } + + old = dists_i[0]; + if (tid == 0) idxs[j] = old; + } +} + +#endif // FURTHEST_POINT_SAMPLE_CUDA_KERNEL_CUH diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/gather_points_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/gather_points_cuda_kernel.cuh new file mode 100644 index 0000000000000000000000000000000000000000..6d932434cba245833e661b8c7e140601940bc35b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/gather_points_cuda_kernel.cuh @@ -0,0 +1,58 @@ +// Copyright (c) OpenMMLab. All rights reserved +#ifndef GATHER_POINTS_CUDA_KERNEL_CUH +#define GATHER_POINTS_CUDA_KERNEL_CUH + +#ifdef MMCV_USE_PARROTS +#include "parrots_cuda_helper.hpp" +#else +#include "pytorch_cuda_helper.hpp" +#endif + +#define TOTAL_THREADS 1024 + +template +__global__ void gather_points_forward_cuda_kernel(int b, int c, int n, int m, + const T *points, + const int *__restrict__ idx, + T *out) { + // points: (B, C, N) + // idx: (B, M) + // output: + // out: (B, C, M) + + int bs_idx = blockIdx.z; + int c_idx = blockIdx.y; + CUDA_1D_KERNEL_LOOP(pt_idx, m) { + if (bs_idx >= b || c_idx >= c) return; + + out += bs_idx * c * m + c_idx * m + pt_idx; + idx += bs_idx * m + pt_idx; + points += bs_idx * c * n + c_idx * n; + out[0] = points[idx[0]]; + } +} + +template +__global__ void gather_points_backward_cuda_kernel(int b, int c, int n, int m, + const T *grad_out, + const int *__restrict__ idx, + T *grad_points) { + // grad_out: (B, C, M) + // idx: (B, M) + // output: + // grad_points: (B, C, N) + + int bs_idx = blockIdx.z; + int c_idx = blockIdx.y; + 
CUDA_1D_KERNEL_LOOP(pt_idx, m) { + if (bs_idx >= b || c_idx >= c) return; + + grad_out += bs_idx * c * m + c_idx * m + pt_idx; + idx += bs_idx * m + pt_idx; + grad_points += bs_idx * c * n + c_idx * n; + + atomicAdd(grad_points + idx[0], grad_out[0]); + } +} + +#endif // GATHER_POINTS_CUDA_KERNEL_CUH diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/group_points_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/group_points_cuda_kernel.cuh new file mode 100644 index 0000000000000000000000000000000000000000..dfad66fc16d8759f614d7f36fa961673976b1d95 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/group_points_cuda_kernel.cuh @@ -0,0 +1,65 @@ +// Copyright (c) OpenMMLab. All rights reserved. +// Modified from +// https://github.com/sshaoshuai/Pointnet2.PyTorch/tree/master/pointnet2/src/group_points_gpu.cu +#ifndef GROUP_POINTS_CUDA_KERNEL_CUH +#define GROUP_POINTS_CUDA_KERNEL_CUH + +#ifdef MMCV_USE_PARROTS +#include "parrots_cuda_helper.hpp" +#else +#include "pytorch_cuda_helper.hpp" +#endif + +template +__global__ void group_points_forward_cuda_kernel(int b, int c, int n, + int npoints, int nsample, + const T *points, + const int *__restrict__ idx, + T *out) { + // points: (B, C, N) + // idx: (B, npoints, nsample) + // output: + // out: (B, C, npoints, nsample) + int bs_idx = blockIdx.z; + int c_idx = blockIdx.y; + CUDA_1D_KERNEL_LOOP(index, npoints * nsample) { + if (bs_idx >= b || c_idx >= c) return; + + int pt_idx = index / nsample; + int sample_idx = index % nsample; + + idx += bs_idx * npoints * nsample + pt_idx * nsample + sample_idx; + int in_idx = bs_idx * c * n + c_idx * n + idx[0]; + int out_idx = bs_idx * c * npoints * nsample + c_idx * npoints * nsample + + pt_idx * nsample + sample_idx; + + out[out_idx] = points[in_idx]; + } +} + +template +__global__ void group_points_backward_cuda_kernel(int b, int c, int n, + int npoints, int nsample, + const T *grad_out, + const int *__restrict__ 
idx, + T *grad_points) { + // grad_out: (B, C, npoints, nsample) + // idx: (B, npoints, nsample) + // output: + // grad_points: (B, C, N) + int bs_idx = blockIdx.z; + int c_idx = blockIdx.y; + CUDA_1D_KERNEL_LOOP(index, npoints * nsample) { + int pt_idx = index / nsample; + if (bs_idx >= b || c_idx >= c) return; + + int sample_idx = index % nsample; + grad_out += bs_idx * c * npoints * nsample + c_idx * npoints * nsample + + pt_idx * nsample + sample_idx; + idx += bs_idx * npoints * nsample + pt_idx * nsample + sample_idx; + + atomicAdd(grad_points + bs_idx * c * n + c_idx * n + idx[0], grad_out[0]); + } +} + +#endif // GROUP_POINTS_CUDA_KERNEL_CUH diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/iou3d_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/iou3d_cuda_kernel.cuh new file mode 100644 index 0000000000000000000000000000000000000000..46e7c7d0aa4ecdbbb4ae73c624e92178908c6348 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/iou3d_cuda_kernel.cuh @@ -0,0 +1,367 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#ifndef IOU3D_CUDA_KERNEL_CUH +#define IOU3D_CUDA_KERNEL_CUH + +#ifdef MMCV_USE_PARROTS +#include "parrots_cuda_helper.hpp" +#else +#include "pytorch_cuda_helper.hpp" +#endif + +const int THREADS_PER_BLOCK_IOU3D = 16; +const int THREADS_PER_BLOCK_NMS = sizeof(unsigned long long) * 8; +__device__ const float EPS = 1e-8; + +struct Point { + float x, y; + __device__ Point() {} + __device__ Point(float _x, float _y) { x = _x, y = _y; } + + __device__ void set(float _x, float _y) { + x = _x; + y = _y; + } + + __device__ Point operator+(const Point &b) const { + return Point(x + b.x, y + b.y); + } + + __device__ Point operator-(const Point &b) const { + return Point(x - b.x, y - b.y); + } +}; + +__device__ inline float cross(const Point &a, const Point &b) { + return a.x * b.y - a.y * b.x; +} + +__device__ inline float cross(const Point &p1, const Point &p2, + const Point &p0) { + return (p1.x - p0.x) * (p2.y - p0.y) - (p2.x - p0.x) * (p1.y - p0.y); +} + +__device__ int check_rect_cross(const Point &p1, const Point &p2, + const Point &q1, const Point &q2) { + int ret = min(p1.x, p2.x) <= max(q1.x, q2.x) && + min(q1.x, q2.x) <= max(p1.x, p2.x) && + min(p1.y, p2.y) <= max(q1.y, q2.y) && + min(q1.y, q2.y) <= max(p1.y, p2.y); + return ret; +} + +__device__ inline int check_in_box2d(const float *box, const Point &p) { + // params: box (7) [x, y, z, dx, dy, dz, heading] + const float MARGIN = 1e-2; + + float center_x = box[0], center_y = box[1]; + // rotate the point in the opposite direction of box + float angle_cos = cos(-box[6]), angle_sin = sin(-box[6]); + float rot_x = (p.x - center_x) * angle_cos + (p.y - center_y) * (-angle_sin); + float rot_y = (p.x - center_x) * angle_sin + (p.y - center_y) * angle_cos; + + return (fabs(rot_x) < box[3] / 2 + MARGIN && + fabs(rot_y) < box[4] / 2 + MARGIN); +} + +__device__ inline int intersection(const Point &p1, const Point &p0, + const Point &q1, const Point &q0, + Point &ans_point) { + // fast exclusion + if 
(check_rect_cross(p0, p1, q0, q1) == 0) return 0; + + // check cross standing + float s1 = cross(q0, p1, p0); + float s2 = cross(p1, q1, p0); + float s3 = cross(p0, q1, q0); + float s4 = cross(q1, p1, q0); + + if (!(s1 * s2 > 0 && s3 * s4 > 0)) return 0; + + // calculate intersection of two lines + float s5 = cross(q1, p1, p0); + if (fabs(s5 - s1) > EPS) { + ans_point.x = (s5 * q0.x - s1 * q1.x) / (s5 - s1); + ans_point.y = (s5 * q0.y - s1 * q1.y) / (s5 - s1); + + } else { + float a0 = p0.y - p1.y, b0 = p1.x - p0.x, c0 = p0.x * p1.y - p1.x * p0.y; + float a1 = q0.y - q1.y, b1 = q1.x - q0.x, c1 = q0.x * q1.y - q1.x * q0.y; + float D = a0 * b1 - a1 * b0; + + ans_point.x = (b0 * c1 - b1 * c0) / D; + ans_point.y = (a1 * c0 - a0 * c1) / D; + } + + return 1; +} + +__device__ inline void rotate_around_center(const Point ¢er, + const float angle_cos, + const float angle_sin, Point &p) { + float new_x = + (p.x - center.x) * angle_cos - (p.y - center.y) * angle_sin + center.x; + float new_y = + (p.x - center.x) * angle_sin + (p.y - center.y) * angle_cos + center.y; + p.set(new_x, new_y); +} + +__device__ inline int point_cmp(const Point &a, const Point &b, + const Point ¢er) { + return atan2(a.y - center.y, a.x - center.x) > + atan2(b.y - center.y, b.x - center.x); +} + +__device__ inline float box_overlap(const float *box_a, const float *box_b) { + // params box_a: [x, y, z, dx, dy, dz, heading] + // params box_b: [x, y, z, dx, dy, dz, heading] + + float a_angle = box_a[6], b_angle = box_b[6]; + float a_dx_half = box_a[3] / 2, b_dx_half = box_b[3] / 2, + a_dy_half = box_a[4] / 2, b_dy_half = box_b[4] / 2; + float a_x1 = box_a[0] - a_dx_half, a_y1 = box_a[1] - a_dy_half; + float a_x2 = box_a[0] + a_dx_half, a_y2 = box_a[1] + a_dy_half; + float b_x1 = box_b[0] - b_dx_half, b_y1 = box_b[1] - b_dy_half; + float b_x2 = box_b[0] + b_dx_half, b_y2 = box_b[1] + b_dy_half; + + Point center_a(box_a[0], box_a[1]); + Point center_b(box_b[0], box_b[1]); + + Point box_a_corners[5]; + 
box_a_corners[0].set(a_x1, a_y1); + box_a_corners[1].set(a_x2, a_y1); + box_a_corners[2].set(a_x2, a_y2); + box_a_corners[3].set(a_x1, a_y2); + + Point box_b_corners[5]; + box_b_corners[0].set(b_x1, b_y1); + box_b_corners[1].set(b_x2, b_y1); + box_b_corners[2].set(b_x2, b_y2); + box_b_corners[3].set(b_x1, b_y2); + + // get oriented corners + float a_angle_cos = cos(a_angle), a_angle_sin = sin(a_angle); + float b_angle_cos = cos(b_angle), b_angle_sin = sin(b_angle); + + for (int k = 0; k < 4; k++) { + rotate_around_center(center_a, a_angle_cos, a_angle_sin, box_a_corners[k]); + rotate_around_center(center_b, b_angle_cos, b_angle_sin, box_b_corners[k]); + } + + box_a_corners[4] = box_a_corners[0]; + box_b_corners[4] = box_b_corners[0]; + + // get intersection of lines + Point cross_points[16]; + Point poly_center; + int cnt = 0, flag = 0; + + poly_center.set(0, 0); + for (int i = 0; i < 4; i++) { + for (int j = 0; j < 4; j++) { + flag = intersection(box_a_corners[i + 1], box_a_corners[i], + box_b_corners[j + 1], box_b_corners[j], + cross_points[cnt]); + if (flag) { + poly_center = poly_center + cross_points[cnt]; + cnt++; + } + } + } + + // check corners + for (int k = 0; k < 4; k++) { + if (check_in_box2d(box_a, box_b_corners[k])) { + poly_center = poly_center + box_b_corners[k]; + cross_points[cnt] = box_b_corners[k]; + cnt++; + } + if (check_in_box2d(box_b, box_a_corners[k])) { + poly_center = poly_center + box_a_corners[k]; + cross_points[cnt] = box_a_corners[k]; + cnt++; + } + } + + poly_center.x /= cnt; + poly_center.y /= cnt; + + // sort the points of polygon + Point temp; + for (int j = 0; j < cnt - 1; j++) { + for (int i = 0; i < cnt - j - 1; i++) { + if (point_cmp(cross_points[i], cross_points[i + 1], poly_center)) { + temp = cross_points[i]; + cross_points[i] = cross_points[i + 1]; + cross_points[i + 1] = temp; + } + } + } + + // get the overlap areas + float area = 0; + for (int k = 0; k < cnt - 1; k++) { + area += cross(cross_points[k] - cross_points[0], 
+ cross_points[k + 1] - cross_points[0]); + } + + return fabs(area) / 2.0; +} + +__device__ inline float iou_bev(const float *box_a, const float *box_b) { + // params box_a: [x, y, z, dx, dy, dz, heading] + // params box_b: [x, y, z, dx, dy, dz, heading] + float sa = box_a[3] * box_a[4]; + float sb = box_b[3] * box_b[4]; + float s_overlap = box_overlap(box_a, box_b); + return s_overlap / fmaxf(sa + sb - s_overlap, EPS); +} + +__global__ void iou3d_boxes_overlap_bev_forward_cuda_kernel( + const int num_a, const float *boxes_a, const int num_b, + const float *boxes_b, float *ans_overlap) { + // params boxes_a: (N, 7) [x, y, z, dx, dy, dz, heading] + // params boxes_b: (M, 7) [x, y, z, dx, dy, dz, heading] + CUDA_2D_KERNEL_LOOP(b_idx, num_b, a_idx, num_a) { + if (a_idx >= num_a || b_idx >= num_b) { + return; + } + + const float *cur_box_a = boxes_a + a_idx * 7; + const float *cur_box_b = boxes_b + b_idx * 7; + float cur_overlap = box_overlap(cur_box_a, cur_box_b); + ans_overlap[a_idx * num_b + b_idx] = cur_overlap; + } +} + +__global__ void iou3d_nms3d_forward_cuda_kernel(const int boxes_num, + const float nms_overlap_thresh, + const float *boxes, + unsigned long long *mask) { + // params: boxes (N, 7) [x, y, z, dx, dy, dz, heading] + // params: mask (N, N/THREADS_PER_BLOCK_NMS) + const int blocks = + (boxes_num + THREADS_PER_BLOCK_NMS - 1) / THREADS_PER_BLOCK_NMS; + CUDA_2D_KERNEL_BLOCK_LOOP(col_start, blocks, row_start, blocks) { + // if (row_start > col_start) return; + + const int row_size = fminf(boxes_num - row_start * THREADS_PER_BLOCK_NMS, + THREADS_PER_BLOCK_NMS); + const int col_size = fminf(boxes_num - col_start * THREADS_PER_BLOCK_NMS, + THREADS_PER_BLOCK_NMS); + + __shared__ float block_boxes[THREADS_PER_BLOCK_NMS * 7]; + + if (threadIdx.x < col_size) { + block_boxes[threadIdx.x * 7 + 0] = + boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 7 + 0]; + block_boxes[threadIdx.x * 7 + 1] = + boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 7 + 
1];
      block_boxes[threadIdx.x * 7 + 2] =
          boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 7 + 2];
      block_boxes[threadIdx.x * 7 + 3] =
          boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 7 + 3];
      block_boxes[threadIdx.x * 7 + 4] =
          boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 7 + 4];
      block_boxes[threadIdx.x * 7 + 5] =
          boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 7 + 5];
      block_boxes[threadIdx.x * 7 + 6] =
          boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 7 + 6];
    }
    // Wait until the whole tile of candidate boxes is staged in shared memory.
    __syncthreads();

    if (threadIdx.x < row_size) {
      const int cur_box_idx = THREADS_PER_BLOCK_NMS * row_start + threadIdx.x;
      const float *cur_box = boxes + cur_box_idx * 7;

      int i = 0;
      unsigned long long t = 0;
      int start = 0;
      // On a diagonal tile, only compare against later boxes so a box never
      // suppresses itself and each pair is tested once.
      if (row_start == col_start) {
        start = threadIdx.x + 1;
      }
      for (i = start; i < col_size; i++) {
        if (iou_bev(cur_box, block_boxes + i * 7) > nms_overlap_thresh) {
          t |= 1ULL << i;  // bit i: box i of this column tile overlaps cur_box
        }
      }
      const int col_blocks =
          (boxes_num + THREADS_PER_BLOCK_NMS - 1) / THREADS_PER_BLOCK_NMS;
      mask[cur_box_idx * col_blocks + col_start] = t;
    }
  }
}

// Axis-aligned IoU of two boxes in the x/y plane: only x, y, dx, dy are read;
// z extent and heading are ignored.
__device__ inline float iou_normal(float const *const a, float const *const b) {
  // params: a: [x, y, z, dx, dy, dz, heading]
  // params: b: [x, y, z, dx, dy, dz, heading]

  float left = fmaxf(a[0] - a[3] / 2, b[0] - b[3] / 2),
        right = fminf(a[0] + a[3] / 2, b[0] + b[3] / 2);
  float top = fmaxf(a[1] - a[4] / 2, b[1] - b[4] / 2),
        bottom = fminf(a[1] + a[4] / 2, b[1] + b[4] / 2);
  float width = fmaxf(right - left, 0.f), height = fmaxf(bottom - top, 0.f);
  float interS = width * height;
  float Sa = a[3] * a[4];
  float Sb = b[3] * b[4];
  // EPS (defined earlier in this header) guards division by zero for
  // degenerate boxes.
  return interS / fmaxf(Sa + Sb - interS, EPS);
}

// Tiled NMS using the axis-aligned overlap above. Each block handles one
// (row tile, col tile) pair; each thread builds the suppression bitmask of
// one row box against the THREADS_PER_BLOCK_NMS boxes of the column tile.
__global__ void iou3d_nms3d_normal_forward_cuda_kernel(
    const int boxes_num, const float nms_overlap_thresh, const float *boxes,
    unsigned long long *mask) {
  // params: boxes (N, 7) [x, y, z, dx, dy, dz, heading]
  // params: mask (N, N/THREADS_PER_BLOCK_NMS)

  const int blocks =
      (boxes_num + THREADS_PER_BLOCK_NMS - 1) / THREADS_PER_BLOCK_NMS;
  CUDA_2D_KERNEL_BLOCK_LOOP(col_start, blocks, row_start, blocks) {
    // if (row_start > col_start) return;

    // Sizes of the (possibly ragged) last tiles.
    const int row_size = fminf(boxes_num - row_start * THREADS_PER_BLOCK_NMS,
                               THREADS_PER_BLOCK_NMS);
    const int col_size = fminf(boxes_num - col_start * THREADS_PER_BLOCK_NMS,
                               THREADS_PER_BLOCK_NMS);

    __shared__ float block_boxes[THREADS_PER_BLOCK_NMS * 7];

    // Stage the column tile (7 floats per box) into shared memory.
    if (threadIdx.x < col_size) {
      block_boxes[threadIdx.x * 7 + 0] =
          boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 7 + 0];
      block_boxes[threadIdx.x * 7 + 1] =
          boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 7 + 1];
      block_boxes[threadIdx.x * 7 + 2] =
          boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 7 + 2];
      block_boxes[threadIdx.x * 7 + 3] =
          boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 7 + 3];
      block_boxes[threadIdx.x * 7 + 4] =
          boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 7 + 4];
      block_boxes[threadIdx.x * 7 + 5] =
          boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 7 + 5];
      block_boxes[threadIdx.x * 7 + 6] =
          boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 7 + 6];
    }
    __syncthreads();

    if (threadIdx.x < row_size) {
      const int cur_box_idx = THREADS_PER_BLOCK_NMS * row_start + threadIdx.x;
      const float *cur_box = boxes + cur_box_idx * 7;

      int i = 0;
      unsigned long long t = 0;
      int start = 0;
      if (row_start == col_start) {
        start = threadIdx.x + 1;
      }
      for (i = start; i < col_size; i++) {
        if (iou_normal(cur_box, block_boxes + i * 7) > nms_overlap_thresh) {
          t |= 1ULL << i;
        }
      }
      const int col_blocks =
          (boxes_num + THREADS_PER_BLOCK_NMS - 1) / THREADS_PER_BLOCK_NMS;
      mask[cur_box_idx * col_blocks + col_start] = t;
    }
  }
}

#endif // IOU3D_CUDA_KERNEL_CUH
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/knn_cuda_kernel.cuh
b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/knn_cuda_kernel.cuh new file mode 100644 index 0000000000000000000000000000000000000000..3cf52bb90eb27d02b28c52069c760c8a38f83f08 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/knn_cuda_kernel.cuh @@ -0,0 +1,92 @@
// Copyright (c) OpenMMLab. All rights reserved
// Modified from
// https://github.com/CVMI-Lab/PAConv/tree/main/scene_seg/lib/pointops/src/knnquery_heap
#ifndef KNN_CUDA_KERNEL_CUH
#define KNN_CUDA_KERNEL_CUH

#ifdef MMCV_USE_PARROTS
#include "parrots_cuda_helper.hpp"
#else
#include "pytorch_cuda_helper.hpp"
#endif

// Swap two floats through pointers (companion of swap_int for the heap).
inline __device__ void swap_float(float *x, float *y) {
  float tmp = *x;
  *x = *y;
  *y = tmp;
}

// Swap two ints through pointers.
inline __device__ void swap_int(int *x, int *y) {
  int tmp = *x;
  *x = *y;
  *y = tmp;
}

// Sift the root of a size-k max-heap down to restore the heap property.
// dist holds the keys, idx the matching point indices (moved in lockstep).
__device__ void reheap(float *dist, int *idx, int k) {
  int root = 0;
  int child = root * 2 + 1;
  while (child < k) {
    if (child + 1 < k && dist[child + 1] > dist[child]) child++;
    if (dist[root] > dist[child]) return;
    swap_float(&dist[root], &dist[child]);
    swap_int(&idx[root], &idx[child]);
    root = child;
    child = root * 2 + 1;
  }
}

// In-place heapsort over a max-heap: on return dist (and idx) are in
// ascending distance order.
__device__ void heap_sort(float *dist, int *idx, int k) {
  int i;
  for (i = k - 1; i > 0; i--) {
    swap_float(&dist[0], &dist[i]);
    swap_int(&idx[0], &idx[i]);
    reheap(dist, idx, i);
  }
}

// For each query point in new_xyz, find its nsample nearest neighbors in xyz
// by squared Euclidean distance, using a bounded max-heap of the best
// candidates seen so far.
// input: xyz (b, n, 3) new_xyz (b, m, 3)
// output: idx (b, m, nsample) dist2 (b, m, nsample)
// NOTE: best_dist/best_idx are fixed at 100 entries, so nsample must be
// <= 100; larger values overrun these local buffers.
template <typename T>
__global__ void knn_forward_cuda_kernel(int b, int n, int m, int nsample,
                                        const T *xyz, const T *new_xyz,
                                        int *__restrict__ idx, T *dist2) {
  int bs_idx = blockIdx.y;  // one grid row per batch element
  CUDA_1D_KERNEL_LOOP(pt_idx, m) {
    if (bs_idx >= b) return;

    // NOTE(review): these pointer offsets accumulate if the grid-stride loop
    // runs more than one iteration per thread; kept as in the original,
    // which assumes one query point per thread.
    new_xyz += bs_idx * m * 3 + pt_idx * 3;
    xyz += bs_idx * n * 3;
    idx += bs_idx * m * nsample + pt_idx * nsample;
    dist2 += bs_idx * m * nsample + pt_idx * nsample;

    T new_x = new_xyz[0];
    T new_y = new_xyz[1];
    T new_z = new_xyz[2];

    float best_dist[100];
    int best_idx[100];
    for (int i = 0; i < nsample; i++) {
      best_dist[i] = 1e10;
      best_idx[i] = 0;
    }
    for (int i = 0; i < n; i++) {
      T x = xyz[i * 3 + 0];
      T y = xyz[i * 3 + 1];
      T z = xyz[i * 3 + 2];
      T d2 = (new_x - x) * (new_x - x) + (new_y - y) * (new_y - y) +
             (new_z - z) * (new_z - z);
      // Heap root holds the current worst of the best nsample distances;
      // replace it when a closer point is found.
      if (d2 < best_dist[0]) {
        best_dist[0] = d2;
        best_idx[0] = i;
        reheap(best_dist, best_idx, nsample);
      }
    }
    // Emit neighbors sorted by ascending distance.
    heap_sort(best_dist, best_idx, nsample);
    for (int i = 0; i < nsample; i++) {
      idx[i] = best_idx[i];
      dist2[i] = best_dist[i];
    }
  }
}

#endif // KNN_CUDA_KERNEL_CUH
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/masked_conv2d_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/masked_conv2d_cuda_kernel.cuh new file mode 100644 index 0000000000000000000000000000000000000000..1a0bd040e823eaaa79f96e525f961a8b8fbeafb5 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/masked_conv2d_cuda_kernel.cuh @@ -0,0 +1,62 @@
// Copyright (c) OpenMMLab.
All rights reserved +#ifndef MASKED_CONV2D_CUDA_KERNEL_CUH +#define MASKED_CONV2D_CUDA_KERNEL_CUH + +#ifdef MMCV_USE_PARROTS +#include "parrots_cuda_helper.hpp" +#else +#include "pytorch_cuda_helper.hpp" +#endif + +template +__global__ void MaskedIm2colForward(const int n, const scalar_t *data_im, + const int height, const int width, + const int kernel_h, const int kernel_w, + const int pad_h, const int pad_w, + const int64_t *mask_h_idx, + const int64_t *mask_w_idx, + const int mask_cnt, scalar_t *data_col) { + // mask_cnt * channels + CUDA_1D_KERNEL_LOOP(index, n) { + const int m_index = index % mask_cnt; + const int h_col = mask_h_idx[m_index]; + const int w_col = mask_w_idx[m_index]; + const int c_im = index / mask_cnt; + const int c_col = c_im * kernel_h * kernel_w; + const int h_offset = h_col - pad_h; + const int w_offset = w_col - pad_w; + scalar_t *data_col_ptr = data_col + c_col * mask_cnt + m_index; + for (int i = 0; i < kernel_h; ++i) { + int h_im = h_offset + i; + for (int j = 0; j < kernel_w; ++j) { + int w_im = w_offset + j; + if (h_im >= 0 && w_im >= 0 && h_im < height && w_im < width) { + *data_col_ptr = + (scalar_t)data_im[(c_im * height + h_im) * width + w_im]; + } else { + *data_col_ptr = 0.0; + } + data_col_ptr += mask_cnt; + } + } + } +} + +template +__global__ void MaskedCol2imForward(const int n, const scalar_t *data_col, + const int height, const int width, + const int channels, + const int64_t *mask_h_idx, + const int64_t *mask_w_idx, + const int mask_cnt, scalar_t *data_im) { + CUDA_1D_KERNEL_LOOP(index, n) { + const int m_index = index % mask_cnt; + const int h_im = mask_h_idx[m_index]; + const int w_im = mask_w_idx[m_index]; + const int c_im = index / mask_cnt; + // compute the start and end of the output + data_im[(c_im * height + h_im) * width + w_im] = data_col[index]; + } +} + +#endif // MASKED_CONV2D_CUDA_KERNEL_CUH diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/min_area_polygons_cuda.cuh 
b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/min_area_polygons_cuda.cuh new file mode 100644 index 0000000000000000000000000000000000000000..b8e3b426d00af99fd6a76e8bb2df4388d882f6d9 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/min_area_polygons_cuda.cuh @@ -0,0 +1,300 @@
// Copyright (c) OpenMMLab. All rights reserved
#ifndef MIN_AREA_POLYGONS_CUDA_KERNEL_CUH
#define MIN_AREA_POLYGONS_CUDA_KERNEL_CUH

#ifdef MMCV_USE_PARROTS
#include "parrots_cuda_helper.hpp"
#else
#include "pytorch_cuda_helper.hpp"
#endif

#define MAXN 20
// NOTE(review): single-precision PI with only 7 significant digits; adequate
// for the float math below, but do not reuse for double-precision work.
__device__ const float PI = 3.1415926;

// Minimal 2D point; the default constructor leaves x/y uninitialized.
struct Point {
  float x, y;
  __device__ Point() {}
  __device__ Point(float x, float y) : x(x), y(y) {}
};

// Swap the coordinates of two points in place.
__device__ inline void swap1(Point *a, Point *b) {
  Point temp;
  temp.x = a->x;
  temp.y = a->y;

  a->x = b->x;
  a->y = b->y;

  b->x = temp.x;
  b->y = temp.y;
}
// 2D cross product of (a - o) x (b - o); positive when o->a->b turns
// counter-clockwise.
__device__ inline float cross(Point o, Point a, Point b) {
  return (a.x - o.x) * (b.y - o.y) - (b.x - o.x) * (a.y - o.y);
}

// Squared Euclidean distance between a and b (no sqrt; used only for
// comparisons).
__device__ inline float dis(Point a, Point b) {
  return (a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y);
}
// Exhaustive search over candidate edge orientations for the minimum-area
// enclosing rectangle of the polygon ps. n_edges = n_points - 1 assumes the
// last vertex repeats the first (closed ring) — TODO confirm with caller.
// minbox receives {angle, xmin, ymin, xmax, ymax} in the rotated frame.
__device__ inline void minBoundingRect(Point *ps, int n_points, float *minbox) {
  float convex_points[2][MAXN];
  for (int j = 0; j < n_points; j++) {
    convex_points[0][j] = ps[j].x;
  }
  for (int j = 0; j < n_points; j++) {
    convex_points[1][j] = ps[j].y;
  }

  Point edges[MAXN];
  float edges_angles[MAXN];
  float unique_angles[MAXN];
  int n_edges = n_points - 1;
  int n_unique = 0;
  int unique_flag = 0;

  // Edge vectors of the polygon.
  for (int i = 0; i < n_edges; i++) {
    edges[i].x = ps[i + 1].x - ps[i].x;
    edges[i].y = ps[i + 1].y - ps[i].y;
  }
  // Edge orientations folded into [0, PI/2): a rectangle is invariant under
  // 90-degree rotation, so only this range of angles must be tried.
  for (int i = 0; i < n_edges; i++) {
    edges_angles[i] = atan2((float)edges[i].y, (float)edges[i].x);
    if (edges_angles[i] >= 0) {
      edges_angles[i] = fmod((float)edges_angles[i], (float)PI / 2);
    } else {
      edges_angles[i] =
          edges_angles[i] - (int)(edges_angles[i] / (PI / 2) - 1) * (PI / 2);
    }
  }
  // Deduplicate the angles. NOTE(review): exact float equality — nearly
  // equal angles are kept, which only costs extra candidate rotations.
  unique_angles[0] = edges_angles[0];
  n_unique += 1;
  for (int i = 1; i < n_edges; i++) {
    for (int j = 0; j < n_unique; j++) {
      if (edges_angles[i] == unique_angles[j]) {
        unique_flag += 1;
      }
    }
    if (unique_flag == 0) {
      unique_angles[n_unique] = edges_angles[i];
      n_unique += 1;
      unique_flag = 0;
    } else {
      unique_flag = 0;
    }
  }

  float minarea = 1e12;
  // For each candidate rotation, measure the axis-aligned bounding box of
  // the rotated points and keep the smallest-area one.
  for (int i = 0; i < n_unique; i++) {
    float R[2][2];
    float rot_points[2][MAXN];
    R[0][0] = cos(unique_angles[i]);
    R[0][1] = sin(unique_angles[i]);
    R[1][0] = -sin(unique_angles[i]);
    R[1][1] = cos(unique_angles[i]);
    // R x Points
    for (int m = 0; m < 2; m++) {
      for (int n = 0; n < n_points; n++) {
        float sum = 0.0;
        for (int k = 0; k < 2; k++) {
          sum = sum + R[m][k] * convex_points[k][n];
        }
        rot_points[m][n] = sum;
      }
    }

    // xmin;  (non-finite coordinates are skipped in all four scans)
    float xmin, ymin, xmax, ymax;
    xmin = 1e12;
    for (int j = 0; j < n_points; j++) {
      if (isinf(rot_points[0][j]) || isnan(rot_points[0][j])) {
        continue;
      } else {
        if (rot_points[0][j] < xmin) {
          xmin = rot_points[0][j];
        }
      }
    }
    // ymin
    ymin = 1e12;
    for (int j = 0; j < n_points; j++) {
      if (isinf(rot_points[1][j]) || isnan(rot_points[1][j])) {
        continue;
      } else {
        if (rot_points[1][j] < ymin) {
          ymin = rot_points[1][j];
        }
      }
    }
    // xmax
    xmax = -1e12;
    for (int j = 0; j < n_points; j++) {
      if (isinf(rot_points[0][j]) || isnan(rot_points[0][j])) {
        continue;
      } else {
        if (rot_points[0][j] > xmax) {
          xmax = rot_points[0][j];
        }
      }
    }
    // ymax
    ymax = -1e12;
    for (int j = 0; j < n_points; j++) {
      if (isinf(rot_points[1][j]) || isnan(rot_points[1][j])) {
        continue;
      } else {
        if (rot_points[1][j] > ymax) {
          ymax = rot_points[1][j];
        }
      }
    }
    float area = (xmax - xmin) * (ymax - ymin);
    if (area < minarea) {
      minarea = area;
      minbox[0] = unique_angles[i];
      minbox[1] = xmin;
      minbox[2] = ymin;
      minbox[3] = xmax;
      minbox[4] = ymax;
    }
  }
}

// convex_find
__device__ inline void
Jarvis(Point *in_poly, int &n_poly) { + int n_input = n_poly; + Point input_poly[20]; + for (int i = 0; i < n_input; i++) { + input_poly[i].x = in_poly[i].x; + input_poly[i].y = in_poly[i].y; + } + Point p_max, p_k; + int max_index, k_index; + int Stack[20], top1, top2; + // float sign; + float sign; + Point right_point[10], left_point[10]; + + for (int i = 0; i < n_poly; i++) { + if (in_poly[i].y < in_poly[0].y || + in_poly[i].y == in_poly[0].y && in_poly[i].x < in_poly[0].x) { + Point *j = &(in_poly[0]); + Point *k = &(in_poly[i]); + swap1(j, k); + } + if (i == 0) { + p_max = in_poly[0]; + max_index = 0; + } + if (in_poly[i].y > p_max.y || + in_poly[i].y == p_max.y && in_poly[i].x > p_max.x) { + p_max = in_poly[i]; + max_index = i; + } + } + if (max_index == 0) { + max_index = 1; + p_max = in_poly[max_index]; + } + + k_index = 0, Stack[0] = 0, top1 = 0; + while (k_index != max_index) { + p_k = p_max; + k_index = max_index; + for (int i = 1; i < n_poly; i++) { + sign = cross(in_poly[Stack[top1]], in_poly[i], p_k); + if ((sign > 0) || ((sign == 0) && (dis(in_poly[Stack[top1]], in_poly[i]) > + dis(in_poly[Stack[top1]], p_k)))) { + p_k = in_poly[i]; + k_index = i; + } + } + top1++; + Stack[top1] = k_index; + } + + for (int i = 0; i <= top1; i++) { + right_point[i] = in_poly[Stack[i]]; + } + + k_index = 0, Stack[0] = 0, top2 = 0; + + while (k_index != max_index) { + p_k = p_max; + k_index = max_index; + for (int i = 1; i < n_poly; i++) { + sign = cross(in_poly[Stack[top2]], in_poly[i], p_k); + if ((sign < 0) || (sign == 0) && (dis(in_poly[Stack[top2]], in_poly[i]) > + dis(in_poly[Stack[top2]], p_k))) { + p_k = in_poly[i]; + k_index = i; + } + } + top2++; + Stack[top2] = k_index; + } + + for (int i = top2 - 1; i >= 0; i--) { + left_point[i] = in_poly[Stack[i]]; + } + + for (int i = 0; i < top1 + top2; i++) { + if (i <= top1) { + in_poly[i] = right_point[i]; + } else { + in_poly[i] = left_point[top2 - (i - top1)]; + } + } + n_poly = top1 + top2; +} + +template 
+__device__ inline void Findminbox(T const *const p, T *minpoints) { + Point ps1[MAXN]; + Point convex[MAXN]; + for (int i = 0; i < 9; i++) { + convex[i].x = p[i * 2]; + convex[i].y = p[i * 2 + 1]; + } + int n_convex = 9; + Jarvis(convex, n_convex); + int n1 = n_convex; + for (int i = 0; i < n1; i++) { + ps1[i].x = convex[i].x; + ps1[i].y = convex[i].y; + } + ps1[n1].x = convex[0].x; + ps1[n1].y = convex[0].y; + + float minbbox[5] = {0}; + minBoundingRect(ps1, n1 + 1, minbbox); + float angle = minbbox[0]; + float xmin = minbbox[1]; + float ymin = minbbox[2]; + float xmax = minbbox[3]; + float ymax = minbbox[4]; + float R[2][2]; + + R[0][0] = cos(angle); + R[0][1] = sin(angle); + R[1][0] = -sin(angle); + R[1][1] = cos(angle); + + minpoints[0] = xmax * R[0][0] + ymin * R[1][0]; + minpoints[1] = xmax * R[0][1] + ymin * R[1][1]; + minpoints[2] = xmin * R[0][0] + ymin * R[1][0]; + minpoints[3] = xmin * R[0][1] + ymin * R[1][1]; + minpoints[4] = xmin * R[0][0] + ymax * R[1][0]; + minpoints[5] = xmin * R[0][1] + ymax * R[1][1]; + minpoints[6] = xmax * R[0][0] + ymax * R[1][0]; + minpoints[7] = xmax * R[0][1] + ymax * R[1][1]; +} + +template +__global__ void min_area_polygons_cuda_kernel(const int ex_n_boxes, + const T *ex_boxes, T *minbox) { + CUDA_1D_KERNEL_LOOP(index, ex_n_boxes) { + const T *cur_box = ex_boxes + index * 18; + T *cur_min_box = minbox + index * 8; + Findminbox(cur_box, cur_min_box); + } +} + +#endif // MIN_AREA_POLYGONS_CUDA_KERNEL_CUH diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/modulated_deform_conv_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/modulated_deform_conv_cuda_kernel.cuh new file mode 100644 index 0000000000000000000000000000000000000000..ca0e91a25246569bb7de04649ab4f5afe233670c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/modulated_deform_conv_cuda_kernel.cuh @@ -0,0 +1,399 @@ +/*! 
+ ******************* BEGIN Caffe Copyright Notice and Disclaimer + ***************** + * + * COPYRIGHT + * + * All contributions by the University of California: + * Copyright (c) 2014-2017 The Regents of the University of California (Regents) + * All rights reserved. + * + * All other contributions: + * Copyright (c) 2014-2017, the respective contributors + * All rights reserved. + * + * Caffe uses a shared copyright model: each contributor holds copyright over + * their contributions to Caffe. The project versioning records all such + * contribution and copyright details. If a contributor wants to further mark + * their specific copyright on a particular contribution, they should indicate + * their copyright solely in the commit message of the change when it is + * committed. + * + * LICENSE + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * 1. Redistributions of source code must retain the above copyright notice, + *this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + *AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + *IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE + *FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + *DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR + *SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER + *CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + *OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + *OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + * + * CONTRIBUTION AGREEMENT + * + * By contributing to the BVLC/caffe repository through pull-request, comment, + * or otherwise, the contributor releases their content to the + * license and copyright terms herein. + * + ***************** END Caffe Copyright Notice and Disclaimer + ********************* + * + * Copyright (c) 2018 Microsoft + * Licensed under The MIT License [see LICENSE for details] + * \file modulated_deformable_im2col.cuh + * \brief Function definitions of converting an image to + * column matrix based on kernel, padding, dilation, and offset. + * These functions are mainly used in deformable convolution operators. 
+ * \ref: https://arxiv.org/abs/1703.06211 + * \author Yuwen Xiong, Haozhi Qi, Jifeng Dai, Xizhou Zhu, Han Hu, Dazhi Cheng + */ + +// modified from +// https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/blob/mmdetection/mmdet/ops/dcn/src/deform_conv_cuda_kernel.cu + +#ifndef MODULATED_DEFORM_CONV_CUDA_KERNEL_CUH +#define MODULATED_DEFORM_CONV_CUDA_KERNEL_CUH + +#include +#ifdef MMCV_WITH_TRT +#include "common_cuda_helper.hpp" +#else // MMCV_WITH_TRT +#ifdef MMCV_USE_PARROTS +#include "parrots_cuda_helper.hpp" +#else // MMCV_USE_PARROTS +#include "pytorch_cuda_helper.hpp" +#endif // MMCV_USE_PARROTS +#endif // MMCV_WITH_TRT + +template +__device__ T dmcn_im2col_bilinear(const T *input, const int data_width, + const int height, const int width, T h, T w) { + int h_low = floorf(h); + int w_low = floorf(w); + int h_high = h_low + 1; + int w_high = w_low + 1; + + T lh = h - h_low; + T lw = w - w_low; + T hh = 1 - lh, hw = 1 - lw; + + T v1 = 0; + if (h_low >= 0 && w_low >= 0) v1 = input[h_low * data_width + w_low]; + T v2 = 0; + if (h_low >= 0 && w_high <= width - 1) + v2 = input[h_low * data_width + w_high]; + T v3 = 0; + if (h_high <= height - 1 && w_low >= 0) + v3 = input[h_high * data_width + w_low]; + T v4 = 0; + if (h_high <= height - 1 && w_high <= width - 1) + v4 = input[h_high * data_width + w_high]; + + T w1 = hh * hw, w2 = hh * lw, w3 = lh * hw, w4 = lh * lw; + + T val = (w1 * v1 + w2 * v2 + w3 * v3 + w4 * v4); + return val; +} + +template +__device__ T dmcn_get_gradient_weight(T argmax_h, T argmax_w, const int h, + const int w, const int height, + const int width) { + if (argmax_h <= -1 || argmax_h >= height || argmax_w <= -1 || + argmax_w >= width) { + // empty + return 0; + } + + int argmax_h_low = floorf(argmax_h); + int argmax_w_low = floorf(argmax_w); + int argmax_h_high = argmax_h_low + 1; + int argmax_w_high = argmax_w_low + 1; + + T weight = 0; + if (h == argmax_h_low && w == argmax_w_low) + weight = (h + 1 - argmax_h) * (w + 1 - 
argmax_w); + if (h == argmax_h_low && w == argmax_w_high) + weight = (h + 1 - argmax_h) * (argmax_w + 1 - w); + if (h == argmax_h_high && w == argmax_w_low) + weight = (argmax_h + 1 - h) * (w + 1 - argmax_w); + if (h == argmax_h_high && w == argmax_w_high) + weight = (argmax_h + 1 - h) * (argmax_w + 1 - w); + return weight; +} + +template +__device__ T dmcn_get_coordinate_weight(T argmax_h, T argmax_w, + const int height, const int width, + const T *im_data, const int data_width, + const int bp_dir) { + if (argmax_h <= -1 || argmax_h >= height || argmax_w <= -1 || + argmax_w >= width) { + // empty + return 0; + } + + int argmax_h_low = floorf(argmax_h); + int argmax_w_low = floorf(argmax_w); + int argmax_h_high = argmax_h_low + 1; + int argmax_w_high = argmax_w_low + 1; + + T weight = 0; + + if (bp_dir == 0) { + if (argmax_h_low >= 0 && argmax_w_low >= 0) + weight += -1 * (argmax_w_low + 1 - argmax_w) * + im_data[argmax_h_low * data_width + argmax_w_low]; + if (argmax_h_low >= 0 && argmax_w_high <= width - 1) + weight += -1 * (argmax_w - argmax_w_low) * + im_data[argmax_h_low * data_width + argmax_w_high]; + if (argmax_h_high <= height - 1 && argmax_w_low >= 0) + weight += (argmax_w_low + 1 - argmax_w) * + im_data[argmax_h_high * data_width + argmax_w_low]; + if (argmax_h_high <= height - 1 && argmax_w_high <= width - 1) + weight += (argmax_w - argmax_w_low) * + im_data[argmax_h_high * data_width + argmax_w_high]; + } else if (bp_dir == 1) { + if (argmax_h_low >= 0 && argmax_w_low >= 0) + weight += -1 * (argmax_h_low + 1 - argmax_h) * + im_data[argmax_h_low * data_width + argmax_w_low]; + if (argmax_h_low >= 0 && argmax_w_high <= width - 1) + weight += (argmax_h_low + 1 - argmax_h) * + im_data[argmax_h_low * data_width + argmax_w_high]; + if (argmax_h_high <= height - 1 && argmax_w_low >= 0) + weight += -1 * (argmax_h - argmax_h_low) * + im_data[argmax_h_high * data_width + argmax_w_low]; + if (argmax_h_high <= height - 1 && argmax_w_high <= width - 1) + weight += 
(argmax_h - argmax_h_low) * + im_data[argmax_h_high * data_width + argmax_w_high]; + } + + return weight; +} + +template +__global__ void modulated_deformable_im2col_gpu_kernel( + const int n, const T *data_im, const T *data_offset, const T *data_mask, + const int height, const int width, const int kernel_h, const int kernel_w, + const int pad_h, const int pad_w, const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, + const int channel_per_deformable_group, const int batch_size, + const int num_channels, const int deformable_group, const int height_col, + const int width_col, T *data_col) { + CUDA_1D_KERNEL_LOOP(index, n) { + // index index of output matrix + const int w_col = index % width_col; + const int h_col = (index / width_col) % height_col; + const int b_col = (index / width_col / height_col) % batch_size; + const int c_im = (index / width_col / height_col) / batch_size; + const int c_col = c_im * kernel_h * kernel_w; + + // compute deformable group index + const int deformable_group_index = c_im / channel_per_deformable_group; + + const int h_in = h_col * stride_h - pad_h; + const int w_in = w_col * stride_w - pad_w; + + T *data_col_ptr = + data_col + + ((c_col * batch_size + b_col) * height_col + h_col) * width_col + w_col; + const T *data_im_ptr = + data_im + (b_col * num_channels + c_im) * height * width; + const T *data_offset_ptr = + data_offset + (b_col * deformable_group + deformable_group_index) * 2 * + kernel_h * kernel_w * height_col * width_col; + + const T *data_mask_ptr = + data_mask + (b_col * deformable_group + deformable_group_index) * + kernel_h * kernel_w * height_col * width_col; + + for (int i = 0; i < kernel_h; ++i) { + for (int j = 0; j < kernel_w; ++j) { + const int data_offset_h_ptr = + ((2 * (i * kernel_w + j)) * height_col + h_col) * width_col + w_col; + const int data_offset_w_ptr = + ((2 * (i * kernel_w + j) + 1) * height_col + h_col) * width_col + + w_col; + const int data_mask_hw_ptr = + ((i * 
kernel_w + j) * height_col + h_col) * width_col + w_col; + const T offset_h = data_offset_ptr[data_offset_h_ptr]; + const T offset_w = data_offset_ptr[data_offset_w_ptr]; + const T mask = data_mask_ptr[data_mask_hw_ptr]; + T val = static_cast(0); + const T h_im = h_in + i * dilation_h + offset_h; + const T w_im = w_in + j * dilation_w + offset_w; + if (h_im > -1 && w_im > -1 && h_im < height && w_im < width) + val = dmcn_im2col_bilinear(data_im_ptr, width, height, width, h_im, + w_im); + *data_col_ptr = val * mask; + data_col_ptr += batch_size * height_col * width_col; + } + } + } +} + +template +__global__ void modulated_deformable_col2im_gpu_kernel( + const int n, const T *data_col, const T *data_offset, const T *data_mask, + const int channels, const int height, const int width, const int kernel_h, + const int kernel_w, const int pad_h, const int pad_w, const int stride_h, + const int stride_w, const int dilation_h, const int dilation_w, + const int channel_per_deformable_group, const int batch_size, + const int deformable_group, const int height_col, const int width_col, + T *grad_im) { + CUDA_1D_KERNEL_LOOP(index, n) { + const int j = (index / width_col / height_col / batch_size) % kernel_w; + const int i = + (index / width_col / height_col / batch_size / kernel_w) % kernel_h; + const int c = + index / width_col / height_col / batch_size / kernel_w / kernel_h; + // compute the start and end of the output + + const int deformable_group_index = c / channel_per_deformable_group; + + int w_out = index % width_col; + int h_out = (index / width_col) % height_col; + int b = (index / width_col / height_col) % batch_size; + int w_in = w_out * stride_w - pad_w; + int h_in = h_out * stride_h - pad_h; + + const T *data_offset_ptr = + data_offset + (b * deformable_group + deformable_group_index) * 2 * + kernel_h * kernel_w * height_col * width_col; + const T *data_mask_ptr = + data_mask + (b * deformable_group + deformable_group_index) * kernel_h * + kernel_w * height_col 
* width_col; + const int data_offset_h_ptr = + ((2 * (i * kernel_w + j)) * height_col + h_out) * width_col + w_out; + const int data_offset_w_ptr = + ((2 * (i * kernel_w + j) + 1) * height_col + h_out) * width_col + w_out; + const int data_mask_hw_ptr = + ((i * kernel_w + j) * height_col + h_out) * width_col + w_out; + const T offset_h = data_offset_ptr[data_offset_h_ptr]; + const T offset_w = data_offset_ptr[data_offset_w_ptr]; + const T mask = data_mask_ptr[data_mask_hw_ptr]; + const T cur_inv_h_data = h_in + i * dilation_h + offset_h; + const T cur_inv_w_data = w_in + j * dilation_w + offset_w; + + const T cur_top_grad = data_col[index] * mask; + const int cur_h = (int)cur_inv_h_data; + const int cur_w = (int)cur_inv_w_data; + for (int dy = -2; dy <= 2; dy++) { + for (int dx = -2; dx <= 2; dx++) { + if (cur_h + dy >= 0 && cur_h + dy < height && cur_w + dx >= 0 && + cur_w + dx < width && abs(cur_inv_h_data - (cur_h + dy)) < 1 && + abs(cur_inv_w_data - (cur_w + dx)) < 1) { + int cur_bottom_grad_pos = + ((b * channels + c) * height + cur_h + dy) * width + cur_w + dx; + T weight = + dmcn_get_gradient_weight(cur_inv_h_data, cur_inv_w_data, + cur_h + dy, cur_w + dx, height, width); + atomicAdd(grad_im + cur_bottom_grad_pos, weight * cur_top_grad); + } + } + } + } +} + +template +__global__ void modulated_deformable_col2im_coord_gpu_kernel( + const int n, const T *data_col, const T *data_im, const T *data_offset, + const T *data_mask, const int channels, const int height, const int width, + const int kernel_h, const int kernel_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, const int dilation_h, + const int dilation_w, const int channel_per_deformable_group, + const int batch_size, const int offset_channels, const int deformable_group, + const int height_col, const int width_col, T *grad_offset, T *grad_mask) { + CUDA_1D_KERNEL_LOOP(index, n) { + T val = 0, mval = 0; + int w = index % width_col; + int h = (index / width_col) % height_col; 
+ int c = (index / width_col / height_col) % offset_channels; + int b = (index / width_col / height_col) / offset_channels; + // compute the start and end of the output + + const int deformable_group_index = c / (2 * kernel_h * kernel_w); + const int col_step = kernel_h * kernel_w; + int cnt = 0; + const T *data_col_ptr = data_col + deformable_group_index * + channel_per_deformable_group * + batch_size * width_col * height_col; + const T *data_im_ptr = + data_im + (b * deformable_group + deformable_group_index) * + channel_per_deformable_group / kernel_h / kernel_w * + height * width; + const T *data_offset_ptr = + data_offset + (b * deformable_group + deformable_group_index) * 2 * + kernel_h * kernel_w * height_col * width_col; + const T *data_mask_ptr = + data_mask + (b * deformable_group + deformable_group_index) * kernel_h * + kernel_w * height_col * width_col; + + const int offset_c = c - deformable_group_index * 2 * kernel_h * kernel_w; + + for (int col_c = (offset_c / 2); col_c < channel_per_deformable_group; + col_c += col_step) { + const int col_pos = + (((col_c * batch_size + b) * height_col) + h) * width_col + w; + const int bp_dir = offset_c % 2; + + int j = (col_pos / width_col / height_col / batch_size) % kernel_w; + int i = + (col_pos / width_col / height_col / batch_size / kernel_w) % kernel_h; + int w_out = col_pos % width_col; + int h_out = (col_pos / width_col) % height_col; + int w_in = w_out * stride_w - pad_w; + int h_in = h_out * stride_h - pad_h; + const int data_offset_h_ptr = + (((2 * (i * kernel_w + j)) * height_col + h_out) * width_col + w_out); + const int data_offset_w_ptr = + (((2 * (i * kernel_w + j) + 1) * height_col + h_out) * width_col + + w_out); + const int data_mask_hw_ptr = + (((i * kernel_w + j) * height_col + h_out) * width_col + w_out); + const T offset_h = data_offset_ptr[data_offset_h_ptr]; + const T offset_w = data_offset_ptr[data_offset_w_ptr]; + const T mask = data_mask_ptr[data_mask_hw_ptr]; + T inv_h = h_in + i * 
dilation_h + offset_h; + T inv_w = w_in + j * dilation_w + offset_w; + if (inv_h <= -1 || inv_w <= -1 || inv_h >= height || inv_w >= width) + inv_h = inv_w = -2; + else + mval += data_col_ptr[col_pos] * + dmcn_im2col_bilinear(data_im_ptr + cnt * height * width, width, + height, width, inv_h, inv_w); + const T weight = dmcn_get_coordinate_weight( + inv_h, inv_w, height, width, data_im_ptr + cnt * height * width, + width, bp_dir); + val += weight * data_col_ptr[col_pos] * mask; + cnt += 1; + } + // KERNEL_ASSIGN(grad_offset[index], offset_req, val); + grad_offset[index] = val; + if (offset_c % 2 == 0) + // KERNEL_ASSIGN(grad_mask[(((b * deformable_group + + // deformable_group_index) * kernel_h * kernel_w + offset_c / 2) * + // height_col + h) * width_col + w], mask_req, mval); + grad_mask[(((b * deformable_group + deformable_group_index) * kernel_h * + kernel_w + + offset_c / 2) * + height_col + + h) * + width_col + + w] = mval; + } +} + +#endif // MODULATED_DEFORM_CONV_CUDA_KERNEL_CUH diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/ms_deform_attn_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/ms_deform_attn_cuda_kernel.cuh new file mode 100644 index 0000000000000000000000000000000000000000..12225ffdb3b1691ad9edabcd1663109f67ef1a6f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/ms_deform_attn_cuda_kernel.cuh @@ -0,0 +1,801 @@ +/*! +************************************************************************************************** +* Deformable DETR +* Copyright (c) 2020 SenseTime. All Rights Reserved. 
+* Licensed under the Apache License, Version 2.0 [see LICENSE for details] +************************************************************************************************** +* Modified from +*https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/tree/pytorch_1.0.0 +************************************************************************************************** +*/ +#ifndef DEFORM_ATTN_CUDA_KERNEL +#define DEFORM_ATTN_CUDA_KERNEL + +#include "common_cuda_helper.hpp" +#include "pytorch_cuda_helper.hpp" + +template +__device__ scalar_t ms_deform_attn_im2col_bilinear( + const scalar_t *&bottom_data, const int &height, const int &width, + const int &nheads, const int &channels, const scalar_t &h, + const scalar_t &w, const int &m, const int &c) { + const int h_low = floorf(h); + const int w_low = floorf(w); + const int h_high = h_low + 1; + const int w_high = w_low + 1; + + const scalar_t lh = h - h_low; + const scalar_t lw = w - w_low; + const scalar_t hh = 1 - lh, hw = 1 - lw; + + const int w_stride = nheads * channels; + const int h_stride = width * w_stride; + const int h_low_ptr_offset = h_low * h_stride; + const int h_high_ptr_offset = h_low_ptr_offset + h_stride; + const int w_low_ptr_offset = w_low * w_stride; + const int w_high_ptr_offset = w_low_ptr_offset + w_stride; + const int base_ptr = m * channels + c; + + scalar_t v1 = 0; + if (h_low >= 0 && w_low >= 0) { + const int ptr1 = h_low_ptr_offset + w_low_ptr_offset + base_ptr; + v1 = bottom_data[ptr1]; + } + scalar_t v2 = 0; + if (h_low >= 0 && w_high <= width - 1) { + const int ptr2 = h_low_ptr_offset + w_high_ptr_offset + base_ptr; + v2 = bottom_data[ptr2]; + } + scalar_t v3 = 0; + if (h_high <= height - 1 && w_low >= 0) { + const int ptr3 = h_high_ptr_offset + w_low_ptr_offset + base_ptr; + v3 = bottom_data[ptr3]; + } + scalar_t v4 = 0; + if (h_high <= height - 1 && w_high <= width - 1) { + const int ptr4 = h_high_ptr_offset + w_high_ptr_offset + base_ptr; + v4 = bottom_data[ptr4]; + } + + 
const scalar_t w1 = hh * hw, w2 = hh * lw, w3 = lh * hw, w4 = lh * lw; + + const scalar_t val = (w1 * v1 + w2 * v2 + w3 * v3 + w4 * v4); + return val; +} + +template +__device__ void ms_deform_attn_col2im_bilinear( + const scalar_t *&bottom_data, const int &height, const int &width, + const int &nheads, const int &channels, const scalar_t &h, + const scalar_t &w, const int &m, const int &c, const scalar_t &top_grad, + const scalar_t &attn_weight, scalar_t *&grad_value, + scalar_t *grad_sampling_loc, scalar_t *grad_attn_weight) { + const int h_low = floorf(h); + const int w_low = floorf(w); + const int h_high = h_low + 1; + const int w_high = w_low + 1; + + const scalar_t lh = h - h_low; + const scalar_t lw = w - w_low; + const scalar_t hh = 1 - lh, hw = 1 - lw; + + const int w_stride = nheads * channels; + const int h_stride = width * w_stride; + const int h_low_ptr_offset = h_low * h_stride; + const int h_high_ptr_offset = h_low_ptr_offset + h_stride; + const int w_low_ptr_offset = w_low * w_stride; + const int w_high_ptr_offset = w_low_ptr_offset + w_stride; + const int base_ptr = m * channels + c; + + const scalar_t w1 = hh * hw, w2 = hh * lw, w3 = lh * hw, w4 = lh * lw; + const scalar_t top_grad_value = top_grad * attn_weight; + scalar_t grad_h_weight = 0, grad_w_weight = 0; + + scalar_t v1 = 0; + if (h_low >= 0 && w_low >= 0) { + const int ptr1 = h_low_ptr_offset + w_low_ptr_offset + base_ptr; + v1 = bottom_data[ptr1]; + grad_h_weight -= hw * v1; + grad_w_weight -= hh * v1; + atomicAdd(grad_value + ptr1, w1 * top_grad_value); + } + scalar_t v2 = 0; + if (h_low >= 0 && w_high <= width - 1) { + const int ptr2 = h_low_ptr_offset + w_high_ptr_offset + base_ptr; + v2 = bottom_data[ptr2]; + grad_h_weight -= lw * v2; + grad_w_weight += hh * v2; + atomicAdd(grad_value + ptr2, w2 * top_grad_value); + } + scalar_t v3 = 0; + if (h_high <= height - 1 && w_low >= 0) { + const int ptr3 = h_high_ptr_offset + w_low_ptr_offset + base_ptr; + v3 = bottom_data[ptr3]; + 
grad_h_weight += hw * v3; + grad_w_weight -= lh * v3; + atomicAdd(grad_value + ptr3, w3 * top_grad_value); + } + scalar_t v4 = 0; + if (h_high <= height - 1 && w_high <= width - 1) { + const int ptr4 = h_high_ptr_offset + w_high_ptr_offset + base_ptr; + v4 = bottom_data[ptr4]; + grad_h_weight += lw * v4; + grad_w_weight += lh * v4; + atomicAdd(grad_value + ptr4, w4 * top_grad_value); + } + + const scalar_t val = (w1 * v1 + w2 * v2 + w3 * v3 + w4 * v4); + *grad_attn_weight = top_grad * val; + *grad_sampling_loc = width * grad_w_weight * top_grad_value; + *(grad_sampling_loc + 1) = height * grad_h_weight * top_grad_value; +} + +template +__device__ void ms_deform_attn_col2im_bilinear_gm( + const scalar_t *&bottom_data, const int &height, const int &width, + const int &nheads, const int &channels, const scalar_t &h, + const scalar_t &w, const int &m, const int &c, const scalar_t &top_grad, + const scalar_t &attn_weight, scalar_t *&grad_value, + scalar_t *grad_sampling_loc, scalar_t *grad_attn_weight) { + const int h_low = floorf(h); + const int w_low = floorf(w); + const int h_high = h_low + 1; + const int w_high = w_low + 1; + + const scalar_t lh = h - h_low; + const scalar_t lw = w - w_low; + const scalar_t hh = 1 - lh, hw = 1 - lw; + + const int w_stride = nheads * channels; + const int h_stride = width * w_stride; + const int h_low_ptr_offset = h_low * h_stride; + const int h_high_ptr_offset = h_low_ptr_offset + h_stride; + const int w_low_ptr_offset = w_low * w_stride; + const int w_high_ptr_offset = w_low_ptr_offset + w_stride; + const int base_ptr = m * channels + c; + + const scalar_t w1 = hh * hw, w2 = hh * lw, w3 = lh * hw, w4 = lh * lw; + const scalar_t top_grad_value = top_grad * attn_weight; + scalar_t grad_h_weight = 0, grad_w_weight = 0; + + scalar_t v1 = 0; + if (h_low >= 0 && w_low >= 0) { + const int ptr1 = h_low_ptr_offset + w_low_ptr_offset + base_ptr; + v1 = bottom_data[ptr1]; + grad_h_weight -= hw * v1; + grad_w_weight -= hh * v1; + 
atomicAdd(grad_value + ptr1, w1 * top_grad_value); + } + scalar_t v2 = 0; + if (h_low >= 0 && w_high <= width - 1) { + const int ptr2 = h_low_ptr_offset + w_high_ptr_offset + base_ptr; + v2 = bottom_data[ptr2]; + grad_h_weight -= lw * v2; + grad_w_weight += hh * v2; + atomicAdd(grad_value + ptr2, w2 * top_grad_value); + } + scalar_t v3 = 0; + if (h_high <= height - 1 && w_low >= 0) { + const int ptr3 = h_high_ptr_offset + w_low_ptr_offset + base_ptr; + v3 = bottom_data[ptr3]; + grad_h_weight += hw * v3; + grad_w_weight -= lh * v3; + atomicAdd(grad_value + ptr3, w3 * top_grad_value); + } + scalar_t v4 = 0; + if (h_high <= height - 1 && w_high <= width - 1) { + const int ptr4 = h_high_ptr_offset + w_high_ptr_offset + base_ptr; + v4 = bottom_data[ptr4]; + grad_h_weight += lw * v4; + grad_w_weight += lh * v4; + atomicAdd(grad_value + ptr4, w4 * top_grad_value); + } + + const scalar_t val = (w1 * v1 + w2 * v2 + w3 * v3 + w4 * v4); + atomicAdd(grad_attn_weight, top_grad * val); + atomicAdd(grad_sampling_loc, width * grad_w_weight * top_grad_value); + atomicAdd(grad_sampling_loc + 1, height * grad_h_weight * top_grad_value); +} + +template +__global__ void ms_deformable_im2col_gpu_kernel( + const int n, const scalar_t *data_value, const int64_t *data_spatial_shapes, + const int64_t *data_level_start_index, const scalar_t *data_sampling_loc, + const scalar_t *data_attn_weight, const int batch_size, + const int spatial_size, const int num_heads, const int channels, + const int num_levels, const int num_query, const int num_point, + scalar_t *data_col) { + CUDA_1D_KERNEL_LOOP(index, n) { + int _temp = index; + const int c_col = _temp % channels; + _temp /= channels; + const int sampling_index = _temp; + const int m_col = _temp % num_heads; + _temp /= num_heads; + _temp /= num_query; + const int b_col = _temp; + + scalar_t *data_col_ptr = data_col + index; + int data_weight_ptr = sampling_index * num_levels * num_point; + int data_loc_w_ptr = data_weight_ptr << 1; + const int 
qid_stride = num_heads * channels; + const int data_value_ptr_init_offset = b_col * spatial_size * qid_stride; + scalar_t col = 0; + + for (int l_col = 0; l_col < num_levels; ++l_col) { + const int level_start_id = data_level_start_index[l_col]; + const int spatial_h_ptr = l_col << 1; + const int spatial_h = data_spatial_shapes[spatial_h_ptr]; + const int spatial_w = data_spatial_shapes[spatial_h_ptr + 1]; + const scalar_t *data_value_ptr = + data_value + + (data_value_ptr_init_offset + level_start_id * qid_stride); + for (int p_col = 0; p_col < num_point; ++p_col) { + const scalar_t loc_w = data_sampling_loc[data_loc_w_ptr]; + const scalar_t loc_h = data_sampling_loc[data_loc_w_ptr + 1]; + const scalar_t weight = data_attn_weight[data_weight_ptr]; + + const scalar_t h_im = loc_h * spatial_h - 0.5; + const scalar_t w_im = loc_w * spatial_w - 0.5; + + if (h_im > -1 && w_im > -1 && h_im < spatial_h && w_im < spatial_w) { + col += ms_deform_attn_im2col_bilinear(data_value_ptr, spatial_h, + spatial_w, num_heads, channels, + h_im, w_im, m_col, c_col) * + weight; + } + + data_weight_ptr += 1; + data_loc_w_ptr += 2; + } + } + *data_col_ptr = col; + } +} + +template +__global__ void ms_deformable_col2im_gpu_kernel_shm_blocksize_aware_reduce_v1( + const int n, const scalar_t *grad_col, const scalar_t *data_value, + const int64_t *data_spatial_shapes, const int64_t *data_level_start_index, + const scalar_t *data_sampling_loc, const scalar_t *data_attn_weight, + const int batch_size, const int spatial_size, const int num_heads, + const int channels, const int num_levels, const int num_query, + const int num_point, scalar_t *grad_value, scalar_t *grad_sampling_loc, + scalar_t *grad_attn_weight) { + __shared__ scalar_t cache_grad_sampling_loc[blockSize * 2]; + __shared__ scalar_t cache_grad_attn_weight[blockSize]; + unsigned int tid = threadIdx.x; + const int qid_stride = num_heads * channels; + CUDA_1D_KERNEL_LOOP(index, n) { + int _temp = index; + const int c_col = _temp % 
channels; + _temp /= channels; + const int sampling_index = _temp; + const int m_col = _temp % num_heads; + _temp /= num_heads; + _temp /= num_query; + const int b_col = _temp; + + const scalar_t top_grad = grad_col[index]; + + int data_weight_ptr = sampling_index * num_levels * num_point; + int data_loc_w_ptr = data_weight_ptr << 1; + const int grad_sampling_ptr = data_weight_ptr; + scalar_t *grad_sampling_loc_out = + grad_sampling_loc + (grad_sampling_ptr << 1); + scalar_t *grad_attn_weight_out = grad_attn_weight + grad_sampling_ptr; + const int grad_weight_stride = 1; + const int grad_loc_stride = 2; + const int data_value_ptr_init_offset = b_col * spatial_size * qid_stride; + + for (int l_col = 0; l_col < num_levels; ++l_col) { + const int level_start_id = data_level_start_index[l_col]; + const int spatial_h_ptr = l_col << 1; + const int spatial_h = data_spatial_shapes[spatial_h_ptr]; + const int spatial_w = data_spatial_shapes[spatial_h_ptr + 1]; + const int value_ptr_offset = + data_value_ptr_init_offset + level_start_id * qid_stride; + const scalar_t *data_value_ptr = data_value + value_ptr_offset; + scalar_t *grad_value_ptr = grad_value + value_ptr_offset; + + for (int p_col = 0; p_col < num_point; ++p_col) { + const scalar_t loc_w = data_sampling_loc[data_loc_w_ptr]; + const scalar_t loc_h = data_sampling_loc[data_loc_w_ptr + 1]; + const scalar_t weight = data_attn_weight[data_weight_ptr]; + + const scalar_t h_im = loc_h * spatial_h - 0.5; + const scalar_t w_im = loc_w * spatial_w - 0.5; + *(cache_grad_sampling_loc + (threadIdx.x << 1)) = 0; + *(cache_grad_sampling_loc + ((threadIdx.x << 1) + 1)) = 0; + *(cache_grad_attn_weight + threadIdx.x) = 0; + if (h_im > -1 && w_im > -1 && h_im < spatial_h && w_im < spatial_w) { + ms_deform_attn_col2im_bilinear( + data_value_ptr, spatial_h, spatial_w, num_heads, channels, h_im, + w_im, m_col, c_col, top_grad, weight, grad_value_ptr, + cache_grad_sampling_loc + (threadIdx.x << 1), + cache_grad_attn_weight + 
threadIdx.x); + } + + __syncthreads(); + if (tid == 0) { + scalar_t _grad_w = cache_grad_sampling_loc[0], + _grad_h = cache_grad_sampling_loc[1], + _grad_a = cache_grad_attn_weight[0]; + int sid = 2; + for (unsigned int _tid = 1; _tid < blockSize; ++_tid) { + _grad_w += cache_grad_sampling_loc[sid]; + _grad_h += cache_grad_sampling_loc[sid + 1]; + _grad_a += cache_grad_attn_weight[_tid]; + sid += 2; + } + + *grad_sampling_loc_out = _grad_w; + *(grad_sampling_loc_out + 1) = _grad_h; + *grad_attn_weight_out = _grad_a; + } + __syncthreads(); + + data_weight_ptr += 1; + data_loc_w_ptr += 2; + grad_attn_weight_out += grad_weight_stride; + grad_sampling_loc_out += grad_loc_stride; + } + } + } +} + +template +__global__ void ms_deformable_col2im_gpu_kernel_shm_blocksize_aware_reduce_v2( + const int n, const scalar_t *grad_col, const scalar_t *data_value, + const int64_t *data_spatial_shapes, const int64_t *data_level_start_index, + const scalar_t *data_sampling_loc, const scalar_t *data_attn_weight, + const int batch_size, const int spatial_size, const int num_heads, + const int channels, const int num_levels, const int num_query, + const int num_point, scalar_t *grad_value, scalar_t *grad_sampling_loc, + scalar_t *grad_attn_weight) { + __shared__ scalar_t cache_grad_sampling_loc[blockSize * 2]; + __shared__ scalar_t cache_grad_attn_weight[blockSize]; + unsigned int tid = threadIdx.x; + CUDA_1D_KERNEL_LOOP(index, n) { + int _temp = index; + const int c_col = _temp % channels; + _temp /= channels; + const int sampling_index = _temp; + const int m_col = _temp % num_heads; + _temp /= num_heads; + _temp /= num_query; + const int b_col = _temp; + + const scalar_t top_grad = grad_col[index]; + + int data_weight_ptr = sampling_index * num_levels * num_point; + int data_loc_w_ptr = data_weight_ptr << 1; + const int grad_sampling_ptr = data_weight_ptr; + scalar_t *grad_sampling_loc_out = + grad_sampling_loc + (grad_sampling_ptr << 1); + scalar_t *grad_attn_weight_out = 
grad_attn_weight + grad_sampling_ptr; + const int grad_weight_stride = 1; + const int grad_loc_stride = 2; + const int qid_stride = num_heads * channels; + const int data_value_ptr_init_offset = b_col * spatial_size * qid_stride; + + for (int l_col = 0; l_col < num_levels; ++l_col) { + const int level_start_id = data_level_start_index[l_col]; + const int spatial_h_ptr = l_col << 1; + const int spatial_h = data_spatial_shapes[spatial_h_ptr]; + const int spatial_w = data_spatial_shapes[spatial_h_ptr + 1]; + const int value_ptr_offset = + data_value_ptr_init_offset + level_start_id * qid_stride; + const scalar_t *data_value_ptr = data_value + value_ptr_offset; + scalar_t *grad_value_ptr = grad_value + value_ptr_offset; + + for (int p_col = 0; p_col < num_point; ++p_col) { + const scalar_t loc_w = data_sampling_loc[data_loc_w_ptr]; + const scalar_t loc_h = data_sampling_loc[data_loc_w_ptr + 1]; + const scalar_t weight = data_attn_weight[data_weight_ptr]; + + const scalar_t h_im = loc_h * spatial_h - 0.5; + const scalar_t w_im = loc_w * spatial_w - 0.5; + *(cache_grad_sampling_loc + (threadIdx.x << 1)) = 0; + *(cache_grad_sampling_loc + ((threadIdx.x << 1) + 1)) = 0; + *(cache_grad_attn_weight + threadIdx.x) = 0; + if (h_im > -1 && w_im > -1 && h_im < spatial_h && w_im < spatial_w) { + ms_deform_attn_col2im_bilinear( + data_value_ptr, spatial_h, spatial_w, num_heads, channels, h_im, + w_im, m_col, c_col, top_grad, weight, grad_value_ptr, + cache_grad_sampling_loc + (threadIdx.x << 1), + cache_grad_attn_weight + threadIdx.x); + } + + __syncthreads(); + + for (unsigned int s = blockSize / 2; s > 0; s >>= 1) { + if (tid < s) { + const unsigned int xid1 = tid << 1; + const unsigned int xid2 = (tid + s) << 1; + cache_grad_attn_weight[tid] += cache_grad_attn_weight[tid + s]; + cache_grad_sampling_loc[xid1] += cache_grad_sampling_loc[xid2]; + cache_grad_sampling_loc[xid1 + 1] += + cache_grad_sampling_loc[xid2 + 1]; + } + __syncthreads(); + } + + if (tid == 0) { + 
*grad_sampling_loc_out = cache_grad_sampling_loc[0]; + *(grad_sampling_loc_out + 1) = cache_grad_sampling_loc[1]; + *grad_attn_weight_out = cache_grad_attn_weight[0]; + } + __syncthreads(); + + data_weight_ptr += 1; + data_loc_w_ptr += 2; + grad_attn_weight_out += grad_weight_stride; + grad_sampling_loc_out += grad_loc_stride; + } + } + } +} + +template +__global__ void ms_deformable_col2im_gpu_kernel_shm_reduce_v1( + const int n, const scalar_t *grad_col, const scalar_t *data_value, + const int64_t *data_spatial_shapes, const int64_t *data_level_start_index, + const scalar_t *data_sampling_loc, const scalar_t *data_attn_weight, + const int batch_size, const int spatial_size, const int num_heads, + const int channels, const int num_levels, const int num_query, + const int num_point, scalar_t *grad_value, scalar_t *grad_sampling_loc, + scalar_t *grad_attn_weight) { + extern __shared__ int _s[]; + scalar_t *cache_grad_sampling_loc = reinterpret_cast(_s); + scalar_t *cache_grad_attn_weight = cache_grad_sampling_loc + 2 * blockDim.x; + unsigned int tid = threadIdx.x; + CUDA_1D_KERNEL_LOOP(index, n) { + int _temp = index; + const int c_col = _temp % channels; + _temp /= channels; + const int sampling_index = _temp; + const int m_col = _temp % num_heads; + _temp /= num_heads; + _temp /= num_query; + const int b_col = _temp; + + const scalar_t top_grad = grad_col[index]; + + int data_weight_ptr = sampling_index * num_levels * num_point; + int data_loc_w_ptr = data_weight_ptr << 1; + const int grad_sampling_ptr = data_weight_ptr; + scalar_t *grad_sampling_loc_out = + grad_sampling_loc + (grad_sampling_ptr << 1); + scalar_t *grad_attn_weight_out = grad_attn_weight + grad_sampling_ptr; + const int grad_weight_stride = 1; + const int grad_loc_stride = 2; + const int qid_stride = num_heads * channels; + const int data_value_ptr_init_offset = b_col * spatial_size * qid_stride; + + for (int l_col = 0; l_col < num_levels; ++l_col) { + const int level_start_id = 
data_level_start_index[l_col]; + const int spatial_h_ptr = l_col << 1; + const int spatial_h = data_spatial_shapes[spatial_h_ptr]; + const int spatial_w = data_spatial_shapes[spatial_h_ptr + 1]; + const int value_ptr_offset = + data_value_ptr_init_offset + level_start_id * qid_stride; + const scalar_t *data_value_ptr = data_value + value_ptr_offset; + scalar_t *grad_value_ptr = grad_value + value_ptr_offset; + + for (int p_col = 0; p_col < num_point; ++p_col) { + const scalar_t loc_w = data_sampling_loc[data_loc_w_ptr]; + const scalar_t loc_h = data_sampling_loc[data_loc_w_ptr + 1]; + const scalar_t weight = data_attn_weight[data_weight_ptr]; + + const scalar_t h_im = loc_h * spatial_h - 0.5; + const scalar_t w_im = loc_w * spatial_w - 0.5; + *(cache_grad_sampling_loc + (threadIdx.x << 1)) = 0; + *(cache_grad_sampling_loc + ((threadIdx.x << 1) + 1)) = 0; + *(cache_grad_attn_weight + threadIdx.x) = 0; + if (h_im > -1 && w_im > -1 && h_im < spatial_h && w_im < spatial_w) { + ms_deform_attn_col2im_bilinear( + data_value_ptr, spatial_h, spatial_w, num_heads, channels, h_im, + w_im, m_col, c_col, top_grad, weight, grad_value_ptr, + cache_grad_sampling_loc + (threadIdx.x << 1), + cache_grad_attn_weight + threadIdx.x); + } + + __syncthreads(); + if (tid == 0) { + scalar_t _grad_w = cache_grad_sampling_loc[0], + _grad_h = cache_grad_sampling_loc[1], + _grad_a = cache_grad_attn_weight[0]; + int sid = 2; + for (unsigned int _tid = 1; _tid < blockDim.x; ++_tid) { + _grad_w += cache_grad_sampling_loc[sid]; + _grad_h += cache_grad_sampling_loc[sid + 1]; + _grad_a += cache_grad_attn_weight[_tid]; + sid += 2; + } + + *grad_sampling_loc_out = _grad_w; + *(grad_sampling_loc_out + 1) = _grad_h; + *grad_attn_weight_out = _grad_a; + } + __syncthreads(); + + data_weight_ptr += 1; + data_loc_w_ptr += 2; + grad_attn_weight_out += grad_weight_stride; + grad_sampling_loc_out += grad_loc_stride; + } + } + } +} + +template +__global__ void ms_deformable_col2im_gpu_kernel_shm_reduce_v2( + 
const int n, const scalar_t *grad_col, const scalar_t *data_value, + const int64_t *data_spatial_shapes, const int64_t *data_level_start_index, + const scalar_t *data_sampling_loc, const scalar_t *data_attn_weight, + const int batch_size, const int spatial_size, const int num_heads, + const int channels, const int num_levels, const int num_query, + const int num_point, scalar_t *grad_value, scalar_t *grad_sampling_loc, + scalar_t *grad_attn_weight) { + extern __shared__ int _s[]; + scalar_t *cache_grad_sampling_loc = reinterpret_cast(_s); + scalar_t *cache_grad_attn_weight = cache_grad_sampling_loc + 2 * blockDim.x; + unsigned int tid = threadIdx.x; + CUDA_1D_KERNEL_LOOP(index, n) { + int _temp = index; + const int c_col = _temp % channels; + _temp /= channels; + const int sampling_index = _temp; + const int m_col = _temp % num_heads; + _temp /= num_heads; + _temp /= num_query; + const int b_col = _temp; + + const scalar_t top_grad = grad_col[index]; + + int data_weight_ptr = sampling_index * num_levels * num_point; + int data_loc_w_ptr = data_weight_ptr << 1; + const int grad_sampling_ptr = data_weight_ptr; + scalar_t *grad_sampling_loc_out = + grad_sampling_loc + (grad_sampling_ptr << 1); + scalar_t *grad_attn_weight_out = grad_attn_weight + grad_sampling_ptr; + const int grad_weight_stride = 1; + const int grad_loc_stride = 2; + const int qid_stride = num_heads * channels; + const int data_value_ptr_init_offset = b_col * spatial_size * qid_stride; + + for (int l_col = 0; l_col < num_levels; ++l_col) { + const int level_start_id = data_level_start_index[l_col]; + const int spatial_h_ptr = l_col << 1; + const int spatial_h = data_spatial_shapes[spatial_h_ptr]; + const int spatial_w = data_spatial_shapes[spatial_h_ptr + 1]; + const int value_ptr_offset = + data_value_ptr_init_offset + level_start_id * qid_stride; + const scalar_t *data_value_ptr = data_value + value_ptr_offset; + scalar_t *grad_value_ptr = grad_value + value_ptr_offset; + + for (int p_col = 0; 
p_col < num_point; ++p_col) { + const scalar_t loc_w = data_sampling_loc[data_loc_w_ptr]; + const scalar_t loc_h = data_sampling_loc[data_loc_w_ptr + 1]; + const scalar_t weight = data_attn_weight[data_weight_ptr]; + + const scalar_t h_im = loc_h * spatial_h - 0.5; + const scalar_t w_im = loc_w * spatial_w - 0.5; + *(cache_grad_sampling_loc + (threadIdx.x << 1)) = 0; + *(cache_grad_sampling_loc + ((threadIdx.x << 1) + 1)) = 0; + *(cache_grad_attn_weight + threadIdx.x) = 0; + if (h_im > -1 && w_im > -1 && h_im < spatial_h && w_im < spatial_w) { + ms_deform_attn_col2im_bilinear( + data_value_ptr, spatial_h, spatial_w, num_heads, channels, h_im, + w_im, m_col, c_col, top_grad, weight, grad_value_ptr, + cache_grad_sampling_loc + (threadIdx.x << 1), + cache_grad_attn_weight + threadIdx.x); + } + + __syncthreads(); + + for (unsigned int s = blockDim.x / 2, spre = blockDim.x; s > 0; + s >>= 1, spre >>= 1) { + if (tid < s) { + const unsigned int xid1 = tid << 1; + const unsigned int xid2 = (tid + s) << 1; + cache_grad_attn_weight[tid] += cache_grad_attn_weight[tid + s]; + cache_grad_sampling_loc[xid1] += cache_grad_sampling_loc[xid2]; + cache_grad_sampling_loc[xid1 + 1] += + cache_grad_sampling_loc[xid2 + 1]; + if (tid + (s << 1) < spre) { + cache_grad_attn_weight[tid] += + cache_grad_attn_weight[tid + (s << 1)]; + cache_grad_sampling_loc[xid1] += + cache_grad_sampling_loc[xid2 + (s << 1)]; + cache_grad_sampling_loc[xid1 + 1] += + cache_grad_sampling_loc[xid2 + 1 + (s << 1)]; + } + } + __syncthreads(); + } + + if (tid == 0) { + *grad_sampling_loc_out = cache_grad_sampling_loc[0]; + *(grad_sampling_loc_out + 1) = cache_grad_sampling_loc[1]; + *grad_attn_weight_out = cache_grad_attn_weight[0]; + } + __syncthreads(); + + data_weight_ptr += 1; + data_loc_w_ptr += 2; + grad_attn_weight_out += grad_weight_stride; + grad_sampling_loc_out += grad_loc_stride; + } + } + } +} + +template +__global__ void ms_deformable_col2im_gpu_kernel_shm_reduce_v2_multi_blocks( + const int n, const 
scalar_t *grad_col, const scalar_t *data_value, + const int64_t *data_spatial_shapes, const int64_t *data_level_start_index, + const scalar_t *data_sampling_loc, const scalar_t *data_attn_weight, + const int batch_size, const int spatial_size, const int num_heads, + const int channels, const int num_levels, const int num_query, + const int num_point, scalar_t *grad_value, scalar_t *grad_sampling_loc, + scalar_t *grad_attn_weight) { + extern __shared__ int _s[]; + scalar_t *cache_grad_sampling_loc = reinterpret_cast(_s); + scalar_t *cache_grad_attn_weight = cache_grad_sampling_loc + 2 * blockDim.x; + unsigned int tid = threadIdx.x; + CUDA_1D_KERNEL_LOOP(index, n) { + int _temp = index; + const int c_col = _temp % channels; + _temp /= channels; + const int sampling_index = _temp; + const int m_col = _temp % num_heads; + _temp /= num_heads; + _temp /= num_query; + const int b_col = _temp; + + const scalar_t top_grad = grad_col[index]; + + int data_weight_ptr = sampling_index * num_levels * num_point; + int data_loc_w_ptr = data_weight_ptr << 1; + const int grad_sampling_ptr = data_weight_ptr; + scalar_t *grad_sampling_loc_out = + grad_sampling_loc + (grad_sampling_ptr << 1); + scalar_t *grad_attn_weight_out = grad_attn_weight + grad_sampling_ptr; + const int grad_weight_stride = 1; + const int grad_loc_stride = 2; + const int qid_stride = num_heads * channels; + const int data_value_ptr_init_offset = b_col * spatial_size * qid_stride; + + for (int l_col = 0; l_col < num_levels; ++l_col) { + const int level_start_id = data_level_start_index[l_col]; + const int spatial_h_ptr = l_col << 1; + const int spatial_h = data_spatial_shapes[spatial_h_ptr]; + const int spatial_w = data_spatial_shapes[spatial_h_ptr + 1]; + const int value_ptr_offset = + data_value_ptr_init_offset + level_start_id * qid_stride; + const scalar_t *data_value_ptr = data_value + value_ptr_offset; + scalar_t *grad_value_ptr = grad_value + value_ptr_offset; + + for (int p_col = 0; p_col < num_point; 
++p_col) { + const scalar_t loc_w = data_sampling_loc[data_loc_w_ptr]; + const scalar_t loc_h = data_sampling_loc[data_loc_w_ptr + 1]; + const scalar_t weight = data_attn_weight[data_weight_ptr]; + + const scalar_t h_im = loc_h * spatial_h - 0.5; + const scalar_t w_im = loc_w * spatial_w - 0.5; + *(cache_grad_sampling_loc + (threadIdx.x << 1)) = 0; + *(cache_grad_sampling_loc + ((threadIdx.x << 1) + 1)) = 0; + *(cache_grad_attn_weight + threadIdx.x) = 0; + if (h_im > -1 && w_im > -1 && h_im < spatial_h && w_im < spatial_w) { + ms_deform_attn_col2im_bilinear( + data_value_ptr, spatial_h, spatial_w, num_heads, channels, h_im, + w_im, m_col, c_col, top_grad, weight, grad_value_ptr, + cache_grad_sampling_loc + (threadIdx.x << 1), + cache_grad_attn_weight + threadIdx.x); + } + + __syncthreads(); + + for (unsigned int s = blockDim.x / 2, spre = blockDim.x; s > 0; + s >>= 1, spre >>= 1) { + if (tid < s) { + const unsigned int xid1 = tid << 1; + const unsigned int xid2 = (tid + s) << 1; + cache_grad_attn_weight[tid] += cache_grad_attn_weight[tid + s]; + cache_grad_sampling_loc[xid1] += cache_grad_sampling_loc[xid2]; + cache_grad_sampling_loc[xid1 + 1] += + cache_grad_sampling_loc[xid2 + 1]; + if (tid + (s << 1) < spre) { + cache_grad_attn_weight[tid] += + cache_grad_attn_weight[tid + (s << 1)]; + cache_grad_sampling_loc[xid1] += + cache_grad_sampling_loc[xid2 + (s << 1)]; + cache_grad_sampling_loc[xid1 + 1] += + cache_grad_sampling_loc[xid2 + 1 + (s << 1)]; + } + } + __syncthreads(); + } + + if (tid == 0) { + atomicAdd(grad_sampling_loc_out, cache_grad_sampling_loc[0]); + atomicAdd(grad_sampling_loc_out + 1, cache_grad_sampling_loc[1]); + atomicAdd(grad_attn_weight_out, cache_grad_attn_weight[0]); + } + __syncthreads(); + + data_weight_ptr += 1; + data_loc_w_ptr += 2; + grad_attn_weight_out += grad_weight_stride; + grad_sampling_loc_out += grad_loc_stride; + } + } + } +} + +template +__global__ void ms_deformable_col2im_gpu_kernel_gm( + const int n, const scalar_t 
*grad_col, const scalar_t *data_value, + const int64_t *data_spatial_shapes, const int64_t *data_level_start_index, + const scalar_t *data_sampling_loc, const scalar_t *data_attn_weight, + const int batch_size, const int spatial_size, const int num_heads, + const int channels, const int num_levels, const int num_query, + const int num_point, scalar_t *grad_value, scalar_t *grad_sampling_loc, + scalar_t *grad_attn_weight) { + CUDA_1D_KERNEL_LOOP(index, n) { + int _temp = index; + const int c_col = _temp % channels; + _temp /= channels; + const int sampling_index = _temp; + const int m_col = _temp % num_heads; + _temp /= num_heads; + _temp /= num_query; + const int b_col = _temp; + + const scalar_t top_grad = grad_col[index]; + + int data_weight_ptr = sampling_index * num_levels * num_point; + int data_loc_w_ptr = data_weight_ptr << 1; + const int grad_sampling_ptr = data_weight_ptr; + scalar_t *grad_sampling_loc_out = + grad_sampling_loc + (grad_sampling_ptr << 1); + scalar_t *grad_attn_weight_out = grad_attn_weight + grad_sampling_ptr; + const int grad_weight_stride = 1; + const int grad_loc_stride = 2; + const int qid_stride = num_heads * channels; + const int data_value_ptr_init_offset = b_col * spatial_size * qid_stride; + + for (int l_col = 0; l_col < num_levels; ++l_col) { + const int level_start_id = data_level_start_index[l_col]; + const int spatial_h_ptr = l_col << 1; + const int spatial_h = data_spatial_shapes[spatial_h_ptr]; + const int spatial_w = data_spatial_shapes[spatial_h_ptr + 1]; + const int value_ptr_offset = + data_value_ptr_init_offset + level_start_id * qid_stride; + const scalar_t *data_value_ptr = data_value + value_ptr_offset; + scalar_t *grad_value_ptr = grad_value + value_ptr_offset; + + for (int p_col = 0; p_col < num_point; ++p_col) { + const scalar_t loc_w = data_sampling_loc[data_loc_w_ptr]; + const scalar_t loc_h = data_sampling_loc[data_loc_w_ptr + 1]; + const scalar_t weight = data_attn_weight[data_weight_ptr]; + + const scalar_t 
h_im = loc_h * spatial_h - 0.5;
        const scalar_t w_im = loc_w * spatial_w - 0.5;
        if (h_im > -1 && w_im > -1 && h_im < spatial_h && w_im < spatial_w) {
          ms_deform_attn_col2im_bilinear_gm(
              data_value_ptr, spatial_h, spatial_w, num_heads, channels, h_im,
              w_im, m_col, c_col, top_grad, weight, grad_value_ptr,
              grad_sampling_loc_out, grad_attn_weight_out);
        }
        data_weight_ptr += 1;
        data_loc_w_ptr += 2;
        grad_attn_weight_out += grad_weight_stride;
        grad_sampling_loc_out += grad_loc_stride;
      }
    }
  }
}
#endif  // DEFORM_ATTN_CUDA_KERNEL
// ---- file: mmcv/ops/csrc/common/cuda/nms_cuda_kernel.cuh (diff metadata elided) ----
// Copyright (c) OpenMMLab. All rights reserved
#ifndef NMS_CUDA_KERNEL_CUH
#define NMS_CUDA_KERNEL_CUH

// NOTE(review): the include target was stripped in transcription; restored
// as <float.h> per upstream mmcv -- confirm against the original header.
#include <float.h>
#ifdef MMCV_WITH_TRT
#include "common_cuda_helper.hpp"
#else  // MMCV_WITH_TRT
#ifdef MMCV_USE_PARROTS
#include "parrots_cuda_helper.hpp"
#else  // MMCV_USE_PARROTS
#include "pytorch_cuda_helper.hpp"
#endif  // MMCV_USE_PARROTS
#endif  // MMCV_WITH_TRT

// 64 boxes per tile: one bit of an unsigned long long per box.
int const threadsPerBlock = sizeof(unsigned long long int) * 8;

// IoU test for two axis-aligned boxes a, b = (x1, y1, x2, y2).
// `offset` (0 or 1) is added to each extent before measuring width/height;
// returns true when IoU > threshold.
__device__ inline bool devIoU(float const *const a, float const *const b,
                              const int offset, const float threshold) {
  float left = fmaxf(a[0], b[0]), right = fminf(a[2], b[2]);
  float top = fmaxf(a[1], b[1]), bottom = fminf(a[3], b[3]);
  float width = fmaxf(right - left + offset, 0.f),
        height = fmaxf(bottom - top + offset, 0.f);
  float interS = width * height;
  float Sa = (a[2] - a[0] + offset) * (a[3] - a[1] + offset);
  float Sb = (b[2] - b[0] + offset) * (b[3] - b[1] + offset);
  return interS > threshold * (Sa + Sb - interS);
}

// For every (row tile, column tile) pair with row <= col, compute a 64-bit
// suppression mask per row box over the column tile and store it in
// dev_mask[box * gridDim.y + col].
__global__ static void nms_cuda(const int n_boxes, const float iou_threshold,
                                const int offset, const float *dev_boxes,
                                unsigned long long *dev_mask) {
  int blocks = (n_boxes + threadsPerBlock - 1) / threadsPerBlock;
  CUDA_2D_KERNEL_BLOCK_LOOP(col_start, blocks, row_start, blocks) {
    const int tid = threadIdx.x;

    // Only the upper triangle of tile pairs is needed.
    if (row_start > col_start) return;

    const int row_size =
        fminf(n_boxes - row_start * threadsPerBlock, threadsPerBlock);
    const int col_size =
        fminf(n_boxes - col_start * threadsPerBlock, threadsPerBlock);

    // Stage the column tile's boxes in shared memory.
    __shared__ float block_boxes[threadsPerBlock * 4];
    if (tid < col_size) {
      block_boxes[tid * 4 + 0] =
          dev_boxes[(threadsPerBlock * col_start + tid) * 4 + 0];
      block_boxes[tid * 4 + 1] =
          dev_boxes[(threadsPerBlock * col_start + tid) * 4 + 1];
      block_boxes[tid * 4 + 2] =
          dev_boxes[(threadsPerBlock * col_start + tid) * 4 + 2];
      block_boxes[tid * 4 + 3] =
          dev_boxes[(threadsPerBlock * col_start + tid) * 4 + 3];
    }
    __syncthreads();

    if (tid < row_size) {
      const int cur_box_idx = threadsPerBlock * row_start + tid;
      const float *cur_box = dev_boxes + cur_box_idx * 4;
      int i = 0;
      unsigned long long int t = 0;
      int start = 0;
      if (row_start == col_start) {
        // Diagonal tile: compare each box only against later boxes.
        start = tid + 1;
      }
      for (i = start; i < col_size; i++) {
        if (devIoU(cur_box, block_boxes + i * 4, offset, iou_threshold)) {
          t |= 1ULL << i;
        }
      }
      dev_mask[cur_box_idx * gridDim.y + col_start] = t;
    }
  }
}

// Single-block sequential scan of the suppression masks: a box whose bit is
// not yet set in `removed` is kept (thread 0 writes keep[i]) and its mask row
// is OR-ed into `removed` cooperatively by all threads.
// `removed` is dynamic shared memory; presumably the launcher sizes it to
// col_blocks words -- confirm at the call site.
__global__ static void gather_keep_from_mask(bool *keep,
                                             const unsigned long long *dev_mask,
                                             const int n_boxes) {
  const int col_blocks = (n_boxes + threadsPerBlock - 1) / threadsPerBlock;
  const int tid = threadIdx.x;

  // mark the bboxes which have been removed.
  extern __shared__ unsigned long long removed[];

  // initialize removed.
  for (int i = tid; i < col_blocks; i += blockDim.x) {
    removed[i] = 0;
  }
  __syncthreads();

  for (int nblock = 0; nblock < col_blocks; ++nblock) {
    auto removed_val = removed[nblock];
    __syncthreads();
    const int i_offset = nblock * threadsPerBlock;
#pragma unroll
    for (int inblock = 0; inblock < threadsPerBlock; ++inblock) {
      const int i = i_offset + inblock;
      if (i >= n_boxes) break;
      // select a candidate, check if it should kept.
      if (!(removed_val & (1ULL << inblock))) {
        if (tid == 0) {
          // mark the output.
          keep[i] = true;
        }
        auto p = dev_mask + i * col_blocks;
        // remove all bboxes which overlap the candidate.
        for (int j = tid; j < col_blocks; j += blockDim.x) {
          if (j >= nblock) removed[j] |= p[j];
        }
        __syncthreads();
        // Re-read: the current word may have gained bits from this candidate.
        removed_val = removed[nblock];
      }
    }
  }
}

#endif  // NMS_CUDA_KERNEL_CUH
// ---- file: mmcv/ops/csrc/common/cuda/nms_quadri_cuda.cuh (diff metadata elided) ----
// Copyright (c) Facebook, Inc. and its affiliates.
// All Rights Reserved
#ifndef NMS_QUADRI_CUDA_CUH
#define NMS_QUADRI_CUDA_CUH

#ifdef MMCV_USE_PARROTS
#include "parrots_cuda_helper.hpp"
#else
#include "pytorch_cuda_helper.hpp"
#endif
#include "box_iou_rotated_utils.hpp"

// Integer ceiling division.
__host__ __device__ inline int divideUP(const int x, const int y) {
  return (((x) + (y)-1) / (y));
}

namespace {
int const threadsPerBlock = sizeof(unsigned long long) * 8;
}

// One block compares a 64-box row tile against a 64-box column tile and
// records, per row box, a 64-bit suppression mask over the column tile in
// dev_mask[box * col_blocks + col].
// Quadrilateral boxes carry 8 geometry values (x1, y1, ..., x4, y4); when
// multi_label == 1 each record has one extra trailing value (stride 9) that
// does not participate in the IoU computation.
// NOTE(review): the template parameter list and explicit <T> template
// argument were stripped in transcription and have been reconstructed.
template <typename T>
__global__ void nms_quadri_cuda_kernel(const int n_boxes,
                                       const float iou_threshold,
                                       const T* dev_boxes,
                                       unsigned long long* dev_mask,
                                       const int multi_label) {
  // Only the record stride differed between the two formerly duplicated
  // branches; the suppression logic is identical, so they are unified here.
  const int box_stride = (multi_label == 1) ? 9 : 8;

  const int row_start = blockIdx.y;
  const int col_start = blockIdx.x;

  const int row_size =
      min(n_boxes - row_start * threadsPerBlock, threadsPerBlock);
  const int col_size =
      min(n_boxes - col_start * threadsPerBlock, threadsPerBlock);

  // Stage the column tile's 8 geometry values per box in shared memory.
  __shared__ T block_boxes[threadsPerBlock * 8];
  if (threadIdx.x < col_size) {
    const T* src =
        dev_boxes + (threadsPerBlock * col_start + threadIdx.x) * box_stride;
#pragma unroll
    for (int k = 0; k < 8; k++) {
      block_boxes[threadIdx.x * 8 + k] = src[k];
    }
  }
  __syncthreads();

  if (threadIdx.x < row_size) {
    const int cur_box_idx = threadsPerBlock * row_start + threadIdx.x;
    const T* cur_box = dev_boxes + cur_box_idx * box_stride;
    unsigned long long t = 0;
    int start = 0;
    if (row_start == col_start) {
      // Diagonal tile: compare each box only against later boxes.
      start = threadIdx.x + 1;
    }
    for (int i = start; i < col_size; i++) {
      // Instead of devIoU used by the original horizontal nms, use the
      // single_box_iou_quadri function from box_iou_rotated_utils.hpp.
      if (single_box_iou_quadri<T>(cur_box, block_boxes + i * 8, 0) >
          iou_threshold) {
        t |= 1ULL << i;
      }
    }
    const int col_blocks = divideUP(n_boxes, threadsPerBlock);
    dev_mask[cur_box_idx * col_blocks + col_start] = t;
  }
}

#endif
// ---- file: mmcv/ops/csrc/common/cuda/nms_rotated_cuda.cuh (diff metadata elided) ----
// Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
// modified from
// https://github.com/facebookresearch/detectron2/blob/master/detectron2/layers/csrc/nms_rotated/nms_rotated_cuda.cu
#ifndef NMS_ROTATED_CUDA_CUH
#define NMS_ROTATED_CUDA_CUH

#ifdef MMCV_USE_PARROTS
#include "parrots_cuda_helper.hpp"
#else
#include "pytorch_cuda_helper.hpp"
#endif
#include "box_iou_rotated_utils.hpp"

// Integer ceiling division.
__host__ __device__ inline int divideUP(const int x, const int y) {
  return (((x) + (y)-1) / (y));
}

namespace {
int const threadsPerBlock = sizeof(unsigned long long) * 8;
}

// nms_rotated_cuda_kernel is modified from torchvision's nms_cuda_kernel.
// Rotated boxes carry 5 geometry values (x_center, y_center, width, height,
// angle_degrees); when multi_label == 1 each record has one extra trailing
// value (stride 6) that does not participate in the IoU computation.
// NOTE(review): the template parameter list and explicit <T> template
// argument were stripped in transcription and have been reconstructed.
template <typename T>
__global__ void nms_rotated_cuda_kernel(const int n_boxes,
                                        const float iou_threshold,
                                        const T* dev_boxes,
                                        unsigned long long* dev_mask,
                                        const int multi_label) {
  // Only the record stride differed between the two formerly duplicated
  // branches; the suppression logic is identical, so they are unified here.
  const int box_stride = (multi_label == 1) ? 6 : 5;

  const int row_start = blockIdx.y;
  const int col_start = blockIdx.x;

  const int row_size =
      min(n_boxes - row_start * threadsPerBlock, threadsPerBlock);
  const int col_size =
      min(n_boxes - col_start * threadsPerBlock, threadsPerBlock);

  // Stage the column tile's 5 geometry values per box in shared memory.
  __shared__ T block_boxes[threadsPerBlock * 5];
  if (threadIdx.x < col_size) {
    const T* src =
        dev_boxes + (threadsPerBlock * col_start + threadIdx.x) * box_stride;
#pragma unroll
    for (int k = 0; k < 5; k++) {
      block_boxes[threadIdx.x * 5 + k] = src[k];
    }
  }
  __syncthreads();

  if (threadIdx.x < row_size) {
    const int cur_box_idx = threadsPerBlock * row_start + threadIdx.x;
    const T* cur_box = dev_boxes + cur_box_idx * box_stride;
    unsigned long long t = 0;
    int start = 0;
    if (row_start == col_start) {
      // Diagonal tile: compare each box only against later boxes.
      start = threadIdx.x + 1;
    }
    for (int i = start; i < col_size; i++) {
      // Instead of devIoU used by the original horizontal nms, use the
      // single_box_iou_rotated function from box_iou_rotated_utils.hpp.
      if (single_box_iou_rotated<T>(cur_box, block_boxes + i * 5, 0) >
          iou_threshold) {
        t |= 1ULL << i;
      }
    }
    const int col_blocks = divideUP(n_boxes, threadsPerBlock);
    dev_mask[cur_box_idx * col_blocks + col_start] = t;
  }
}

#endif
// ---- file: mmcv/ops/csrc/common/cuda/parrots_cudawarpfunction.cuh (diff metadata elided) ----
/*
 * Copyright (c) 2019, SenseTime.
+ */ + +#ifndef INCLUDE_PARROTS_DARRAY_CUDAWARPFUNCTION_CUH_ +#define INCLUDE_PARROTS_DARRAY_CUDAWARPFUNCTION_CUH_ + +#ifndef __CUDACC__ +#error cudawarpfunction.cuh should only be included by .cu files +#endif +#include + +#include + +#ifdef PARROTS_USE_HALF +#include +#endif +#ifdef __CUDA_ARCH__ +#define CUDA_INTRINSIC_FUNC(Expr) Expr +#else +#define CUDA_INTRINSIC_FUNC(Expr) +#endif + +#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 300 + +#ifdef PARROTS_USE_HALF + +#if CUDA_VERSION < 9000 + +__device__ inline float16 __shfl(float16 var, int srcLane, int width) { + CUDA_INTRINSIC_FUNC(return __shfl(var.y, srcLane, width);); +} + +__device__ inline float16 __shfl_up(float16 var, unsigned delta, int width) { + CUDA_INTRINSIC_FUNC(return __shfl_up(var.y, delta, width);); +} + +__device__ inline float16 __shfl_down(float16 var, unsigned delta, int width) { + CUDA_INTRINSIC_FUNC(return __shfl_down(var.y, delta, width);); +} + +__device__ inline float16 __shfl_xor(float16 var, int laneMask, int width) { + CUDA_INTRINSIC_FUNC(return __shfl_xor(var.y, laneMask, width);); +} + +#else // CUDA_VERSION >= 9000 + +__device__ inline float16 __shfl_sync(unsigned mask, float16 var, int srcLane, + int width = warpSize) { + CUDA_INTRINSIC_FUNC(float16 r; r.y = __shfl_sync(mask, var.y, srcLane, width); + return r;); +} + +__device__ inline float16 __shfl_up_sync(unsigned mask, float16 var, + unsigned delta, int width = warpSize) { + CUDA_INTRINSIC_FUNC( + float16 r; r.y = __shfl_up_sync(mask, var.y, delta, width); return r;); +} + +__device__ inline float16 __shfl_down_sync(unsigned mask, float16 var, + unsigned delta, + int width = warpSize) { + CUDA_INTRINSIC_FUNC( + float16 r; r.y = __shfl_down_sync(mask, var.y, delta, width); return r;); +} + +__device__ inline float16 __shfl_xor_sync(unsigned mask, float16 var, + int laneMask, int width) { + CUDA_INTRINSIC_FUNC(float16 r; + r.y = __shfl_xor_sync(mask, var.y, laneMask, width); + return r;); +} + +#endif // CUDA_VERSION < 9000 
+ +#endif // PARROTS_USE_HALF + +// warp shuffle interface with a dummy mask +#if CUDA_VERSION < 9000 + +template +__device__ inline T __shfl_sync(unsigned mask, T var, int srcLane, + int width = warpSize) { + CUDA_INTRINSIC_FUNC(return __shfl(var, srcLane, width);); +} + +template +__device__ inline T __shfl_up_sync(unsigned mask, T var, unsigned delta, + int width = warpSize) { + CUDA_INTRINSIC_FUNC(return __shfl_up(var, delta, width);); +} + +template +__device__ inline T __shfl_down_sync(unsigned mask, T var, unsigned delta, + int width = warpSize) { + CUDA_INTRINSIC_FUNC(return __shfl_down(var, delta, width);); +} + +template +__device__ inline T __shfl_xor_sync(unsigned mask, T var, int laneMask, + int width = warpSize) { + CUDA_INTRINSIC_FUNC(return __shfl_xor(var, laneMask, width);); +} + +#endif // CUDA_VERSION < 9000 + +#endif // !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 300 + +#endif // INCLUDE_PARROTS_DARRAY_CUDAWARPFUNCTION_CUH_ diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/points_in_boxes_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/points_in_boxes_cuda_kernel.cuh new file mode 100644 index 0000000000000000000000000000000000000000..342362079a5ce3dde6d19532b3014872f4373330 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/points_in_boxes_cuda_kernel.cuh @@ -0,0 +1,95 @@ +// Copyright (c) OpenMMLab. 
// All rights reserved
#ifndef POINT_IN_BOXES_CUDA_KERNEL_CUH
#define POINT_IN_BOXES_CUDA_KERNEL_CUH

#ifdef MMCV_USE_PARROTS
#include "parrots_cuda_helper.hpp"
#else
#include "pytorch_cuda_helper.hpp"
#endif

// Rotate (shift_x, shift_y) by -rz into the box's local frame.
// NOTE(review): template parameter lists in this header were stripped in
// transcription and have been reconstructed as template <typename T>.
template <typename T>
__device__ inline void lidar_to_local_coords(T shift_x, T shift_y, T rz,
                                             T &local_x, T &local_y) {
  T cosa = cos(-rz), sina = sin(-rz);
  local_x = shift_x * cosa + shift_y * (-sina);
  local_y = shift_x * sina + shift_y * cosa;
}

// Return 1 if pt lies inside box3d, else 0; also outputs the point's
// box-local planar coordinates through local_x / local_y.
// param pt: (x, y, z)
// param box3d: (cx, cy, cz, x_size, y_size, z_size, rz) in LiDAR coordinate,
// cz in the bottom center
template <typename T>
__device__ inline int check_pt_in_box3d(const T *pt, const T *box3d, T &local_x,
                                        T &local_y) {
  T x = pt[0], y = pt[1], z = pt[2];
  T cx = box3d[0], cy = box3d[1], cz = box3d[2];
  T x_size = box3d[3], y_size = box3d[4], z_size = box3d[5], rz = box3d[6];
  cz += z_size /
        2.0;  // shift to the center since cz in box3d is the bottom center

  if (fabsf(z - cz) > z_size / 2.0) return 0;
  lidar_to_local_coords(x - cx, y - cy, rz, local_x, local_y);
  // Previously computed as a bitwise '&' of bools stored in a float; a plain
  // boolean expression yields the identical 0/1 result without the detour.
  return (local_x > -x_size / 2.0) && (local_x < x_size / 2.0) &&
                 (local_y > -y_size / 2.0) && (local_y < y_size / 2.0)
             ? 1
             : 0;
}

// For each point, record the index of the first box containing it (or leave
// the caller-provided default).
// params boxes: (B, N, 7) [x, y, z, x_size, y_size, z_size, rz] in LiDAR
// coordinate, z is the bottom center, each box DO NOT overlaps
// params pts: (B, npoints, 3) [x, y, z] in LiDAR coordinate
// params boxes_idx_of_points: (B, npoints), default -1
template <typename T>
__global__ void points_in_boxes_part_forward_cuda_kernel(
    int batch_size, int boxes_num, int pts_num, const T *boxes, const T *pts,
    int *box_idx_of_points) {
  int bs_idx = blockIdx.y;
  CUDA_1D_KERNEL_LOOP(pt_idx, pts_num) {
    if (bs_idx >= batch_size) return;

    // Advance the base pointers to this batch element / point.
    boxes += bs_idx * boxes_num * 7;
    pts += bs_idx * pts_num * 3 + pt_idx * 3;
    box_idx_of_points += bs_idx * pts_num + pt_idx;

    T local_x = 0, local_y = 0;
    int cur_in_flag = 0;
    for (int k = 0; k < boxes_num; k++) {
      cur_in_flag = check_pt_in_box3d(pts, boxes + k * 7, local_x, local_y);
      if (cur_in_flag) {
        box_idx_of_points[0] = k;
        break;  // boxes do not overlap, so the first hit is the only hit
      }
    }
  }
}

// For each point, mark every box containing it with a 1 flag.
// params boxes: (B, N, 7) [x, y, z, x_size, y_size, z_size, rz] in LiDAR
// coordinate, z is the bottom center, each box DO NOT overlaps
// params pts: (B, npoints, 3) [x, y, z] in LiDAR coordinate
// params boxes_idx_of_points: (B, npoints), default -1
template <typename T>
__global__ void points_in_boxes_all_forward_cuda_kernel(
    int batch_size, int boxes_num, int pts_num, const T *boxes, const T *pts,
    int *box_idx_of_points) {
  int bs_idx = blockIdx.y;
  CUDA_1D_KERNEL_LOOP(pt_idx, pts_num) {
    if (bs_idx >= batch_size) return;

    // Advance the base pointers to this batch element / point.
    boxes += bs_idx * boxes_num * 7;
    pts += bs_idx * pts_num * 3 + pt_idx * 3;
    box_idx_of_points += bs_idx * pts_num * boxes_num + pt_idx * boxes_num;

    T local_x = 0, local_y = 0;
    for (int k = 0; k < boxes_num; k++) {
      const int cur_in_flag =
          check_pt_in_box3d(pts, boxes + k * 7, local_x, local_y);
      if (cur_in_flag) {
        box_idx_of_points[k] = 1;
      }
    }
  }
}

#endif  // POINT_IN_BOXES_CUDA_KERNEL_CUH
// ---- file: mmcv/ops/csrc/common/cuda/points_in_polygons_cuda_kernel.cuh (diff metadata elided) ----
// Copyright (c) OpenMMLab.
// All rights reserved
#ifndef POINTS_IN_POLYGONS_CUDA_KERNEL_CUH
#define POINTS_IN_POLYGONS_CUDA_KERNEL_CUH

#ifdef MMCV_USE_PARROTS
#include "parrots_cuda_helper.hpp"
#else
#include "pytorch_cuda_helper.hpp"
#endif

struct point {
  float x, y;
};

// For each (point, polygon) pair, set inside_flag to 1.0 when the point lies
// inside the quadrilateral, using a ray-casting parity test; edge/vertex hits
// (the `break` cases) leave nCross odd or even as-is and fall through.
// vertex1: rows points as (x, y); vertex2: cols quadrilaterals as 8 values.
// NOTE(review): the template parameter list was stripped in transcription and
// has been reconstructed.
// BUGFIX: the original ended each iteration of CUDA_1D_KERNEL_LOOP with
// `return;`, which aborts the grid-stride loop after a single index per
// thread and leaves later indices unwritten whenever nthreads exceeds the
// launched thread count; the loop must fall through to its next index.
template <typename scalar_t>
__global__ void points_in_polygons_forward_cuda_kernel(
    const int nthreads, const scalar_t *vertex1, const scalar_t *vertex2,
    const int rows, const int cols, scalar_t *inside_flag) {
  CUDA_1D_KERNEL_LOOP(index, nthreads) {
    int row = index / cols;
    int col = index % cols;

    const scalar_t *offset_vertex1 = vertex1 + row * 2;
    const scalar_t *offset_vertex2 = vertex2 + col * 8;

    point point_[1];
    point polygon[4];

    point_[0].x = offset_vertex1[0];
    point_[0].y = offset_vertex1[1];

    polygon[0].x = offset_vertex2[0];
    polygon[0].y = offset_vertex2[1];
    polygon[1].x = offset_vertex2[2];
    polygon[1].y = offset_vertex2[3];
    polygon[2].x = offset_vertex2[4];
    polygon[2].y = offset_vertex2[5];
    polygon[3].x = offset_vertex2[6];
    polygon[3].y = offset_vertex2[7];

    int nCross = 0;
    int i, j;
    float sx, sy, tx, ty, px, py, x;
    for (i = 0, j = 3; i < 4; j = i, i++) {
      sx = polygon[i].x;
      sy = polygon[i].y;
      tx = polygon[j].x;
      ty = polygon[j].y;

      px = point_[0].x;
      py = point_[0].y;

      // Horizontal ray from the point: skip edges entirely above or below it.
      if (py < min(sy, ty)) continue;
      if (py > max(sy, ty)) continue;

      if ((sx == px && sy == py) || (tx == px && ty == py)) {
        break;  // the point coincides with a vertex
      } else {
        if ((sy < py && ty >= py) || (sy >= py && ty < py)) {
          x = sx + (py - sy) * (tx - sx) / (ty - sy);
          if (x == px) {
            break;  // the point lies exactly on this edge
          }
          if (x > px) {
            nCross++;
          }
        }
      }
    }
    if (nCross % 2 == 1) {
      inside_flag[index] = 1.0;
    } else {
      inside_flag[index] = 0.0;
    }
  }
}

#endif  // POINTS_IN_POLYGONS_CUDA_KERNEL_CUH
// ---- file: mmcv/ops/csrc/common/cuda/prroi_pool_cuda_kernel.cuh (diff metadata follows) ----
// ---- file: mmcv/ops/csrc/common/cuda/prroi_pool_cuda_kernel.cuh (diff metadata elided) ----
// Copyright (c) OpenMMLab. All rights reserved
// Modified from
// https://github.com/vacancy/PreciseRoIPooling/blob/master/src/prroi_pooling_gpu_impl.cu
// Distributed under terms of the MIT license.
#ifndef PRROI_POOL_CUDA_KERNEL_CUH
#define PRROI_POOL_CUDA_KERNEL_CUH

#ifdef MMCV_USE_PARROTS
#include "parrots_cuda_helper.hpp"
#else
#include "pytorch_cuda_helper.hpp"
#endif

// NOTE(review): template parameter lists and static_cast target types in this
// header were stripped in transcription and reconstructed as <typename T> /
// static_cast<T>.

// Zero-padded read of data[h][w] from a (height, width) map.
template <typename T>
__device__ static __forceinline__ T PrRoIPoolingGetData(const T *data,
                                                        const int h,
                                                        const int w,
                                                        const int height,
                                                        const int width) {
  bool overflow = (h < 0) || (w < 0) || (h >= height) || (w >= width);
  T retVal = overflow ? 0.0f : data[h * width + w];
  return retVal;
}

// Bilinear weight of a sample at fractional offset (dh, dw) from a grid node.
template <typename T>
__device__ static __forceinline__ T PrRoIPoolingGetCoeff(T dh, T dw) {
  return (1.0f - abs(dh)) * (1.0f - abs(dw));
}

// Closed-form integral of a linear interpolant from value c1 (at s) to c2 (at t).
template <typename T>
__device__ static __forceinline__ T PrRoIPoolingSingleCoorIntegral(T s, T t,
                                                                   T c1, T c2) {
  return 0.5 * (t * t - s * s) * (c2 - c1) + (t - s) * c1;
}

// Bilinear interpolation of data at fractional coordinates (h, w).
template <typename T>
__device__ static T PrRoIPoolingInterpolation(const T *data, const T h,
                                              const T w, const int height,
                                              const int width) {
  T retVal = 0.0f;
  int h1 = floorf(h);
  int w1 = floorf(w);
  retVal += PrRoIPoolingGetData(data, h1, w1, height, width) *
            PrRoIPoolingGetCoeff(h - T(h1), w - T(w1));
  h1 = floorf(h) + 1;
  w1 = floorf(w);
  retVal += PrRoIPoolingGetData(data, h1, w1, height, width) *
            PrRoIPoolingGetCoeff(h - T(h1), w - T(w1));
  h1 = floorf(h);
  w1 = floorf(w) + 1;
  retVal += PrRoIPoolingGetData(data, h1, w1, height, width) *
            PrRoIPoolingGetCoeff(h - T(h1), w - T(w1));
  h1 = floorf(h) + 1;
  w1 = floorf(w) + 1;
  retVal += PrRoIPoolingGetData(data, h1, w1, height, width) *
            PrRoIPoolingGetCoeff(h - T(h1), w - T(w1));
  return retVal;
}

// Exact integral of the bilinear interpolant over the rectangle
// [y0, y1] x [x0, x1] inside the unit cell with corners (s_h, s_w)-(e_h, e_w)
// of an (h0, w0) map.
template <typename T>
__device__ static T PrRoIPoolingMatCalculation(const T *this_data,
                                               const int s_h, const int s_w,
                                               const int e_h, const int e_w,
                                               const T y0, const T x0,
                                               const T y1, const T x1,
                                               const int h0, const int w0) {
  T alpha, beta, lim_alpha, lim_beta, tmp;
  T sum_out = 0;

  alpha = x0 - T(s_w);
  beta = y0 - T(s_h);
  lim_alpha = x1 - T(s_w);
  lim_beta = y1 - T(s_h);
  tmp = (lim_alpha - 0.5f * lim_alpha * lim_alpha - alpha +
         0.5f * alpha * alpha) *
        (lim_beta - 0.5f * lim_beta * lim_beta - beta + 0.5f * beta * beta);
  sum_out += PrRoIPoolingGetData(this_data, s_h, s_w, h0, w0) * tmp;

  alpha = T(e_w) - x1;
  lim_alpha = T(e_w) - x0;
  tmp = (lim_alpha - 0.5f * lim_alpha * lim_alpha - alpha +
         0.5f * alpha * alpha) *
        (lim_beta - 0.5f * lim_beta * lim_beta - beta + 0.5f * beta * beta);
  sum_out += PrRoIPoolingGetData(this_data, s_h, e_w, h0, w0) * tmp;

  alpha = x0 - T(s_w);
  beta = T(e_h) - y1;
  lim_alpha = x1 - T(s_w);
  lim_beta = T(e_h) - y0;
  tmp = (lim_alpha - 0.5f * lim_alpha * lim_alpha - alpha +
         0.5f * alpha * alpha) *
        (lim_beta - 0.5f * lim_beta * lim_beta - beta + 0.5f * beta * beta);
  sum_out += PrRoIPoolingGetData(this_data, e_h, s_w, h0, w0) * tmp;

  alpha = T(e_w) - x1;
  lim_alpha = T(e_w) - x0;
  tmp = (lim_alpha - 0.5f * lim_alpha * lim_alpha - alpha +
         0.5f * alpha * alpha) *
        (lim_beta - 0.5f * lim_beta * lim_beta - beta + 0.5f * beta * beta);
  sum_out += PrRoIPoolingGetData(this_data, e_h, e_w, h0, w0) * tmp;

  return sum_out;
}

// Bounds-checked atomic accumulation of top_diff * coeff into diff[h][w].
template <typename T>
__device__ static void PrRoIPoolingDistributeDiff(T *diff, const T top_diff,
                                                  const int h, const int w,
                                                  const int height,
                                                  const int width,
                                                  const T coeff) {
  bool overflow = (h < 0) || (w < 0) || (h >= height) || (w >= width);
  if (!overflow) atomicAdd(diff + h * width + w, top_diff * coeff);
}

// Gradient counterpart of PrRoIPoolingMatCalculation: distributes top_diff to
// the four cell corners with the same integral weights.
template <typename T>
__device__ static void PrRoIPoolingMatDistributeDiff(
    T *diff, const T top_diff, const int s_h, const int s_w, const int e_h,
    const int e_w, const T y0, const T x0, const T y1, const T x1, const int h0,
    const int w0) {
  T alpha, beta, lim_alpha, lim_beta, tmp;

  alpha = x0 - T(s_w);
  beta = y0 - T(s_h);
  lim_alpha = x1 - T(s_w);
  lim_beta = y1 - T(s_h);
  tmp = (lim_alpha - 0.5f * lim_alpha * lim_alpha - alpha +
         0.5f * alpha * alpha) *
        (lim_beta - 0.5f * lim_beta * lim_beta - beta + 0.5f * beta * beta);
  PrRoIPoolingDistributeDiff(diff, top_diff, s_h, s_w, h0, w0, tmp);

  alpha = T(e_w) - x1;
  lim_alpha = T(e_w) - x0;
  tmp = (lim_alpha - 0.5f * lim_alpha * lim_alpha - alpha +
         0.5f * alpha * alpha) *
        (lim_beta - 0.5f * lim_beta * lim_beta - beta + 0.5f * beta * beta);
  PrRoIPoolingDistributeDiff(diff, top_diff, s_h, e_w, h0, w0, tmp);

  alpha = x0 - T(s_w);
  beta = T(e_h) - y1;
  lim_alpha = x1 - T(s_w);
  lim_beta = T(e_h) - y0;
  tmp = (lim_alpha - 0.5f * lim_alpha * lim_alpha - alpha +
         0.5f * alpha * alpha) *
        (lim_beta - 0.5f * lim_beta * lim_beta - beta + 0.5f * beta * beta);
  PrRoIPoolingDistributeDiff(diff, top_diff, e_h, s_w, h0, w0, tmp);

  alpha = T(e_w) - x1;
  lim_alpha = T(e_w) - x0;
  tmp = (lim_alpha - 0.5f * lim_alpha * lim_alpha - alpha +
         0.5f * alpha * alpha) *
        (lim_beta - 0.5f * lim_beta * lim_beta - beta + 0.5f * beta * beta);
  PrRoIPoolingDistributeDiff(diff, top_diff, e_h, e_w, h0, w0, tmp);
}

// Forward pass: each output element integrates the bilinear interpolant over
// its ROI bin exactly (no point sampling) and normalizes by the bin area.
template <typename T>
__global__ void prroi_pool_forward_cuda_kernel(
    const int nthreads, const T *input, const T *rois, T *output,
    const int pooled_height, const int pooled_width, const T spatial_scale,
    const int channels, const int height, const int width) {
  CUDA_1D_KERNEL_LOOP(index, nthreads) {
    // (n, c, ph, pw) is an element in the pooled output
    int pw = index % pooled_width;
    int ph = (index / pooled_width) % pooled_height;
    int c = (index / pooled_width / pooled_height) % channels;
    int n = index / pooled_width / pooled_height / channels;

    // ROI record: (batch_index, x1, y1, x2, y2).
    const T *offset_rois = rois + n * 5;
    int roi_batch_ind = offset_rois[0];

    T roi_x1 = offset_rois[1] * spatial_scale;
    T roi_y1 = offset_rois[2] * spatial_scale;
    T roi_x2 = offset_rois[3] * spatial_scale;
    T roi_y2 = offset_rois[4] * spatial_scale;

    T roi_width = max(roi_x2 - roi_x1, ((T)0.0));
    T roi_height = max(roi_y2 - roi_y1, ((T)0.0));
    T bin_size_h = roi_height / static_cast<T>(pooled_height);
    T bin_size_w = roi_width / static_cast<T>(pooled_width);

    const T *this_data =
        input + (roi_batch_ind * channels + c) * height * width;
    T *this_out = output + index;

    T bin_x1 = roi_x1 + bin_size_w * pw;
    T bin_y1 = roi_y1 + bin_size_h * ph;
    T bin_x2 = bin_x1 + bin_size_w;
    T bin_y2 = bin_y1 + bin_size_h;

    T bin_size = max(T(0.0), bin_size_w * bin_size_h);
    if (bin_size == 0) {
      *this_out = 0;
      continue;  // degenerate ROI: emit zero and move to the next index
    }

    T sum_out = 0;

    int start_x, start_y, end_x, end_y;

    start_x = floorf(bin_x1);
    end_x = ceilf(bin_x2);
    start_y = floorf(bin_y1);
    end_y = ceilf(bin_y2);

    // Accumulate the exact per-cell integrals over the bin's footprint.
    // Consistency fix: the x-axis upper clamp previously read
    // T(bin_x + 1.0f); it is normalized to T(bin_x) + 1.0f to match the
    // y-axis form (identical value for in-range coordinates).
    for (int bin_x = start_x; bin_x < end_x; ++bin_x)
      for (int bin_y = start_y; bin_y < end_y; ++bin_y)
        sum_out += PrRoIPoolingMatCalculation(
            this_data, bin_y, bin_x, bin_y + 1, bin_x + 1,
            max(bin_y1, T(bin_y)), max(bin_x1, T(bin_x)),
            min(bin_y2, T(bin_y) + 1.0f), min(bin_x2, T(bin_x) + 1.0f), height,
            width);
    *this_out = sum_out / bin_size;
  }
}

// Backward pass: scatter each output gradient back over its bin.
// (This definition continues beyond this chunk of the file.)
template <typename T>
__global__ void prroi_pool_backward_cuda_kernel(
    const int nthreads, const T *grad_output, const T *rois, T *grad_input,
    const int pooled_height, const int pooled_width, const T spatial_scale,
    const int channels, const int height, const int width) {
  CUDA_1D_KERNEL_LOOP(index, nthreads) {
    // (n, c, ph, pw) is an element in the pooled output
    int pw = index % pooled_width;
    int ph = (index / pooled_width) % pooled_height;
    int c = (index / pooled_width / pooled_height) % channels;
    int n = index / pooled_width / pooled_height / channels;
    auto rois_cur = rois + n * 5;

    int roi_batch_ind = rois_cur[0];
    T roi_x1 = rois_cur[1] * spatial_scale;
    T roi_y1 = rois_cur[2] * spatial_scale;
    T roi_x2 = rois_cur[3] *
spatial_scale; + T roi_y2 = rois_cur[4] * spatial_scale; + + T roi_width = max(roi_x2 - roi_x1, (T)0); + T roi_height = max(roi_y2 - roi_y1, (T)0); + T bin_size_h = roi_height / static_cast(pooled_height); + T bin_size_w = roi_width / static_cast(pooled_width); + + const T *this_out_grad = grad_output + index; + T *this_data_grad = + grad_input + (roi_batch_ind * channels + c) * height * width; + + T bin_x1 = roi_x1 + bin_size_w * pw; + T bin_y1 = roi_y1 + bin_size_h * ph; + T bin_x2 = bin_x1 + bin_size_w; + T bin_y2 = bin_y1 + bin_size_h; + + T bin_size = max(T(0.0), bin_size_w * bin_size_h); + + T sum_out = bin_size == T(0) ? T(0) : *this_out_grad / bin_size; + + int start_x, start_y, end_x, end_y; + + start_x = floorf(bin_x1); + end_x = ceilf(bin_x2); + start_y = floorf(bin_y1); + end_y = ceilf(bin_y2); + + for (int bin_x = start_x; bin_x < end_x; ++bin_x) + for (int bin_y = start_y; bin_y < end_y; ++bin_y) + PrRoIPoolingMatDistributeDiff( + this_data_grad, sum_out, bin_y, bin_x, bin_y + 1, bin_x + 1, + max(bin_y1, T(bin_y)), max(bin_x1, T(bin_x)), + min(bin_y2, T(bin_y) + 1.0f), min(bin_x2, T(bin_x + 1.0f)), height, + width); + } +} + +template +__global__ void prroi_pool_coor_backward_cuda_kernel( + const int nthreads, const T *output, const T *grad_output, const T *input, + const T *rois, T *grad_rois, const int pooled_height, + const int pooled_width, const T spatial_scale, const int channels, + const int height, const int width) { + CUDA_1D_KERNEL_LOOP(index, nthreads) { + // (n, c, ph, pw) is an element in the pooled output + int pw = index % pooled_width; + int ph = (index / pooled_width) % pooled_height; + int c = (index / pooled_width / pooled_height) % channels; + int n = index / pooled_width / pooled_height / channels; + auto rois_cur = rois + n * 5; + + int roi_batch_ind = rois_cur[0]; + T roi_x1 = rois_cur[1] * spatial_scale; + T roi_y1 = rois_cur[2] * spatial_scale; + T roi_x2 = rois_cur[3] * spatial_scale; + T roi_y2 = rois_cur[4] * spatial_scale; 
+ + T roi_width = max(roi_x2 - roi_x1, (T)0); + T roi_height = max(roi_y2 - roi_y1, (T)0); + T bin_size_h = roi_height / static_cast(pooled_height); + T bin_size_w = roi_width / static_cast(pooled_width); + + const T output_grad_val = grad_output[index]; + const T *this_input_data = + input + (roi_batch_ind * channels + c) * height * width; + const T output_val = output[index]; + T *this_rois_grad = grad_rois + n * 5; + + T bin_x1 = roi_x1 + bin_size_w * pw; + T bin_y1 = roi_y1 + bin_size_h * ph; + T bin_x2 = bin_x1 + bin_size_w; + T bin_y2 = bin_y1 + bin_size_h; + + T bin_size = max(T(0.0), bin_size_w * bin_size_h); + + T sum_out = bin_size == T(0) ? T(0) : output_grad_val / bin_size; + + // WARNING: to be discussed + if (sum_out == 0) continue; + + int start_x, start_y, end_x, end_y; + + start_x = floorf(bin_x1); + end_x = ceilf(bin_x2); + start_y = floorf(bin_y1); + end_y = ceilf(bin_y2); + + T grad_x1_y = 0, grad_x2_y = 0, grad_x_y1 = 0, grad_x_y2 = 0; + for (int bin_y = start_y; bin_y < end_y; ++bin_y) { + grad_x1_y += PrRoIPoolingSingleCoorIntegral( + max(bin_y1, T(bin_y)) - bin_y, min(bin_y2, T(bin_y + 1)) - bin_y, + PrRoIPoolingInterpolation(this_input_data, float(bin_y), bin_x1, + height, width), + PrRoIPoolingInterpolation(this_input_data, float(bin_y + 1), bin_x1, + height, width)); + + grad_x2_y += PrRoIPoolingSingleCoorIntegral( + max(bin_y1, T(bin_y)) - bin_y, min(bin_y2, T(bin_y + 1)) - bin_y, + PrRoIPoolingInterpolation(this_input_data, float(bin_y), bin_x2, + height, width), + PrRoIPoolingInterpolation(this_input_data, float(bin_y + 1), bin_x2, + height, width)); + } + + for (int bin_x = start_x; bin_x < end_x; ++bin_x) { + grad_x_y1 += PrRoIPoolingSingleCoorIntegral( + max(bin_x1, T(bin_x)) - bin_x, min(bin_x2, T(bin_x + 1)) - bin_x, + PrRoIPoolingInterpolation(this_input_data, bin_y1, float(bin_x), + height, width), + PrRoIPoolingInterpolation(this_input_data, bin_y1, float(bin_x + 1), + height, width)); + + grad_x_y2 += 
PrRoIPoolingSingleCoorIntegral( + max(bin_x1, T(bin_x)) - bin_x, min(bin_x2, T(bin_x + 1)) - bin_x, + PrRoIPoolingInterpolation(this_input_data, bin_y2, float(bin_x), + height, width), + PrRoIPoolingInterpolation(this_input_data, bin_y2, float(bin_x + 1), + height, width)); + } + + T partial_x1 = -grad_x1_y + (bin_y2 - bin_y1) * output_val; + T partial_y1 = -grad_x_y1 + (bin_x2 - bin_x1) * output_val; + T partial_x2 = grad_x2_y - (bin_y2 - bin_y1) * output_val; + T partial_y2 = grad_x_y2 - (bin_x2 - bin_x1) * output_val; + + partial_x1 = partial_x1 / bin_size * spatial_scale; + partial_x2 = partial_x2 / bin_size * spatial_scale; + partial_y1 = partial_y1 / bin_size * spatial_scale; + partial_y2 = partial_y2 / bin_size * spatial_scale; + + // (index, x1, y1, x2, y2) + this_rois_grad[0] = 0; + atomicAdd(this_rois_grad + 1, + (partial_x1 * (1.0f - T(pw) / pooled_width) + + partial_x2 * (1.0f - T(pw + 1) / pooled_width)) * + output_grad_val); + atomicAdd(this_rois_grad + 2, + (partial_y1 * (1.0f - T(ph) / pooled_height) + + partial_y2 * (1.0f - T(ph + 1) / pooled_height)) * + output_grad_val); + atomicAdd(this_rois_grad + 3, (partial_x2 * T(pw + 1) / pooled_width + + partial_x1 * T(pw) / pooled_width) * + output_grad_val); + atomicAdd(this_rois_grad + 4, (partial_y2 * T(ph + 1) / pooled_height + + partial_y1 * T(ph) / pooled_height) * + output_grad_val); + } +} + +#endif // ROI_POOL_CUDA_KERNEL_CUH diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/psamask_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/psamask_cuda_kernel.cuh new file mode 100644 index 0000000000000000000000000000000000000000..5d946686bdd5fdfbf8a27f6d040e15861202f471 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/psamask_cuda_kernel.cuh @@ -0,0 +1,141 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#ifndef PSAMASK_CUDA_KERNEL_CUH +#define PSAMASK_CUDA_KERNEL_CUH + +#ifdef MMCV_USE_PARROTS +#include "parrots_cuda_helper.hpp" +#else +#include "pytorch_cuda_helper.hpp" +#endif + +// CUDA: grid stride looping +#ifndef CUDA_KERNEL_LOOP +#define CUDA_KERNEL_LOOP(i, n) \ + for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < (n); \ + i += blockDim.x * gridDim.x) +#endif + +template +__global__ void psamask_collect_forward_cuda( + const int nthreads, const int h_feature, const int w_feature, + const int h_mask, const int w_mask, const int half_h_mask, + const int half_w_mask, const T* mask_data, T* buffer_data) { + CUDA_KERNEL_LOOP(index, nthreads) { + const int w = index % w_feature; + const int h = (index / w_feature) % h_feature; + const int n = index / w_feature / h_feature; + // effective mask region : [hstart, hend) x [wstart, wend) with mask-indexed + const int hstart = max(0, half_h_mask - h); + const int hend = min(h_mask, h_feature + half_h_mask - h); + const int wstart = max(0, half_w_mask - w); + const int wend = min(w_mask, w_feature + half_w_mask - w); + // (hidx, widx ) with mask-indexed + // (hidx + h - half_h_mask, widx + w - half_w_mask) with feature-indexed + for (int hidx = hstart; hidx < hend; hidx++) { + for (int widx = wstart; widx < wend; widx++) { + buffer_data[(n * h_feature * w_feature + + (hidx + h - half_h_mask) * w_feature + + (widx + w - half_w_mask)) * + h_feature * w_feature + + h * w_feature + w] = mask_data + [((n * h_mask * w_mask + hidx * w_mask + widx) * h_feature + h) * + w_feature + + w]; + } + } + } +} + +template +__global__ void psamask_distribute_forward_cuda( + const int nthreads, const int h_feature, const int w_feature, + const int h_mask, const int w_mask, const int half_h_mask, + const int half_w_mask, const T* mask_data, T* buffer_data) { + CUDA_KERNEL_LOOP(index, nthreads) { + const int w = index % w_feature; + const int h = (index / w_feature) % h_feature; + const int n = index / w_feature / 
h_feature; + // effective mask region : [hstart, hend) x [wstart, wend) with mask-indexed + const int hstart = max(0, half_h_mask - h); + const int hend = min(h_mask, h_feature + half_h_mask - h); + const int wstart = max(0, half_w_mask - w); + const int wend = min(w_mask, w_feature + half_w_mask - w); + // (hidx, widx ) with mask-indexed + // (hidx + h - half_h_mask, widx + w - half_w_mask) with feature-indexed + for (int hidx = hstart; hidx < hend; hidx++) { + for (int widx = wstart; widx < wend; widx++) { + buffer_data[(n * h_feature * w_feature + h * w_feature + w) * + h_feature * w_feature + + (hidx + h - half_h_mask) * w_feature + + (widx + w - half_w_mask)] = mask_data + [((n * h_mask * w_mask + hidx * w_mask + widx) * h_feature + h) * + w_feature + + w]; + } + } + } +} + +template +__global__ void psamask_collect_backward_cuda( + const int nthreads, const int h_feature, const int w_feature, + const int h_mask, const int w_mask, const int half_h_mask, + const int half_w_mask, const T* buffer_diff, T* mask_diff) { + CUDA_KERNEL_LOOP(index, nthreads) { + const int w = index % w_feature; + const int h = (index / w_feature) % h_feature; + const int n = index / w_feature / h_feature; + // effective mask region : [hstart, hend) x [wstart, wend) with mask-indexed + const int hstart = max(0, half_h_mask - h); + const int hend = min(h_mask, h_feature + half_h_mask - h); + const int wstart = max(0, half_w_mask - w); + const int wend = min(w_mask, w_feature + half_w_mask - w); + // (hidx, widx ) with mask-indexed + // (hidx + h - half_h_mask, widx + w - half_w_mask) with feature-indexed + for (int hidx = hstart; hidx < hend; hidx++) { + for (int widx = wstart; widx < wend; widx++) { + mask_diff[((n * h_mask * w_mask + hidx * w_mask + widx) * h_feature + + h) * + w_feature + + w] = buffer_diff[(n * h_feature * w_feature + + (hidx + h - half_h_mask) * w_feature + + (widx + w - half_w_mask)) * + h_feature * w_feature + + h * w_feature + w]; + } + } + } +} + +template 
+__global__ void psamask_distribute_backward_cuda( + const int nthreads, const int h_feature, const int w_feature, + const int h_mask, const int w_mask, const int half_h_mask, + const int half_w_mask, const T* buffer_diff, T* mask_diff) { + CUDA_KERNEL_LOOP(index, nthreads) { + const int w = index % w_feature; + const int h = (index / w_feature) % h_feature; + const int n = index / w_feature / h_feature; + // effective mask region : [hstart, hend) x [wstart, wend) with mask-indexed + const int hstart = max(0, half_h_mask - h); + const int hend = min(h_mask, h_feature + half_h_mask - h); + const int wstart = max(0, half_w_mask - w); + const int wend = min(w_mask, w_feature + half_w_mask - w); + // (hidx, widx ) with mask-indexed + // (hidx + h - half_h_mask, widx + w - half_w_mask) with feature-indexed + for (int hidx = hstart; hidx < hend; hidx++) { + for (int widx = wstart; widx < wend; widx++) { + mask_diff[((n * h_mask * w_mask + hidx * w_mask + widx) * h_feature + + h) * + w_feature + + w] = + buffer_diff[(n * h_feature * w_feature + h * w_feature + w) * + h_feature * w_feature + + (hidx + h - half_h_mask) * w_feature + + (widx + w - half_w_mask)]; + } + } + } +} + +#endif // PSAMASK_CUDA_KERNEL_CUH diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/riroi_align_rotated_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/riroi_align_rotated_cuda_kernel.cuh new file mode 100644 index 0000000000000000000000000000000000000000..4383d9e82cce97362f53cf799b8dfa30c7b4cd02 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/riroi_align_rotated_cuda_kernel.cuh @@ -0,0 +1,242 @@ +// Modified from +// https://github.com/csuhan/ReDet/blob/master/mmdet/ops/riroi_align/src/riroi_align_kernel.cu +#ifndef RIROI_ALIGN_ROTATED_CUDA_KERNEL_CUH +#define RIROI_ALIGN_ROTATED_CUDA_KERNEL_CUH + +#include +#ifdef MMCV_USE_PARROTS +#include "parrots_cuda_helper.hpp" +#else // MMCV_USE_PARROTS +#include 
"pytorch_cuda_helper.hpp" +#endif // MMCV_USE_PARROTS + +/*** Forward ***/ +template +__global__ void riroi_align_rotated_forward_cuda_kernel( + const int nthreads, const scalar_t *bottom_data, + const scalar_t *bottom_rois, const scalar_t spatial_scale, + const int num_samples, const bool clockwise, const int channels, + const int height, const int width, const int pooled_height, + const int pooled_width, const int num_orientations, scalar_t *top_data) { + CUDA_1D_KERNEL_LOOP(index, nthreads) { + // (n, c, ph, pw) is an element in the pooled output + int pw = index % pooled_width; + int ph = (index / pooled_width) % pooled_height; + int o = (index / pooled_width / pooled_height) % num_orientations; + int c = + (index / pooled_width / pooled_height / num_orientations) % channels; + int n = index / pooled_width / pooled_height / num_orientations / channels; + + const scalar_t *offset_bottom_rois = bottom_rois + n * 6; + int roi_batch_ind = offset_bottom_rois[0]; + + // Do not using rounding; this implementation detail is critical + scalar_t roi_center_w = offset_bottom_rois[1] * spatial_scale; + scalar_t roi_center_h = offset_bottom_rois[2] * spatial_scale; + scalar_t roi_width = offset_bottom_rois[3] * spatial_scale; + scalar_t roi_height = offset_bottom_rois[4] * spatial_scale; + // scalar_t theta = offset_bottom_rois[5] * M_PI / 180.0; + scalar_t theta = offset_bottom_rois[5]; + // Force malformed ROIs to be 1x1 + roi_width = max(roi_width, (scalar_t)1.); + roi_height = max(roi_height, (scalar_t)1.); + scalar_t bin_size_h = static_cast(roi_height) / + static_cast(pooled_height); + scalar_t bin_size_w = + static_cast(roi_width) / static_cast(pooled_width); + + // find aligned index + scalar_t ind_float = theta * num_orientations / (2 * M_PI); + int ind = floorf(ind_float); + scalar_t l_var = ind_float - (scalar_t)ind; + scalar_t r_var = 1.0 - l_var; + // correct start channel + ind = (ind + num_orientations) % num_orientations; + // rotated channel + int ind_rot = 
(o - ind + num_orientations) % num_orientations; + int ind_rot_plus = (ind_rot + 1 + num_orientations) % num_orientations; + const scalar_t *offset_bottom_data = + bottom_data + (roi_batch_ind * channels * num_orientations + + c * num_orientations + ind_rot) * + height * width; + + const scalar_t *offset_bottom_data_plus = + bottom_data + (roi_batch_ind * channels * num_orientations + + c * num_orientations + ind_rot_plus) * + height * width; + // We use roi_bin_grid to sample the grid and mimic integral + int roi_bin_grid_h = (num_samples > 0) + ? num_samples + : ceilf(roi_height / pooled_height); // e.g., = 2 + int roi_bin_grid_w = + (num_samples > 0) ? num_samples : ceilf(roi_width / pooled_width); + + // roi_start_h and roi_start_w are computed wrt the center of RoI (x, y). + // Appropriate translation needs to be applied after. + if (clockwise) { + theta = -theta; // If clockwise, the angle needs to be reversed. + } + scalar_t roi_start_h = -roi_height / 2.0; + scalar_t roi_start_w = -roi_width / 2.0; + scalar_t cosscalar_theta = cos(theta); + scalar_t sinscalar_theta = sin(theta); + + // We do average (integral) pooling inside a bin + const scalar_t count = max(roi_bin_grid_h * roi_bin_grid_w, 1); // e.g. 
= 4 + + scalar_t output_val = 0.; + for (int iy = 0; iy < roi_bin_grid_h; iy++) { // e.g., iy = 0, 1 + const scalar_t yy = + roi_start_h + ph * bin_size_h + + static_cast(iy + .5f) * bin_size_h / + static_cast(roi_bin_grid_h); // e.g., 0.5, 1.5 + for (int ix = 0; ix < roi_bin_grid_w; ix++) { + const scalar_t xx = roi_start_w + pw * bin_size_w + + static_cast(ix + .5f) * bin_size_w / + static_cast(roi_bin_grid_w); + + // Rotate by theta (counterclockwise) around the center and translate + scalar_t y = yy * cosscalar_theta - xx * sinscalar_theta + roi_center_h; + scalar_t x = yy * sinscalar_theta + xx * cosscalar_theta + roi_center_w; + + scalar_t val = bilinear_interpolate( + offset_bottom_data, height, width, y, x, index); + scalar_t val_plus = bilinear_interpolate( + offset_bottom_data_plus, height, width, y, x, index); + output_val += r_var * val + l_var * val_plus; + } + } + output_val /= count; + + top_data[index] = output_val; + } +} + +/*** Backward ***/ +template +__global__ void riroi_align_rotated_backward_cuda_kernel( + const int nthreads, const scalar_t *top_diff, const scalar_t *bottom_rois, + const scalar_t spatial_scale, const int num_samples, const bool clockwise, + const int channels, const int height, const int width, + const int pooled_height, const int pooled_width, const int num_orientations, + scalar_t *bottom_diff) { + CUDA_1D_KERNEL_LOOP(index, nthreads) { + // (n, c, ph, pw) is an element in the pooled output + int pw = index % pooled_width; + int ph = (index / pooled_width) % pooled_height; + int o = (index / pooled_width / pooled_height) % num_orientations; + int c = + (index / pooled_width / pooled_height / num_orientations) % channels; + int n = index / pooled_width / pooled_height / num_orientations / channels; + + const scalar_t *offset_bottom_rois = bottom_rois + n * 6; + int roi_batch_ind = offset_bottom_rois[0]; + + // Do not round + scalar_t roi_center_w = offset_bottom_rois[1] * spatial_scale; + scalar_t roi_center_h = 
offset_bottom_rois[2] * spatial_scale; + scalar_t roi_width = offset_bottom_rois[3] * spatial_scale; + scalar_t roi_height = offset_bottom_rois[4] * spatial_scale; + // scalar_t theta = offset_bottom_rois[5] * M_PI / 180.0; + scalar_t theta = offset_bottom_rois[5]; + // Force malformed ROIs to be 1x1 + roi_width = max(roi_width, (scalar_t)1.); + roi_height = max(roi_height, (scalar_t)1.); + + scalar_t bin_size_h = static_cast(roi_height) / + static_cast(pooled_height); + scalar_t bin_size_w = + static_cast(roi_width) / static_cast(pooled_width); + + // find aligned index + scalar_t ind_float = theta * num_orientations / (2 * M_PI); + int ind = floorf(ind_float); + scalar_t l_var = ind_float - (scalar_t)ind; + scalar_t r_var = 1.0 - l_var; + // correct start channel + ind = (ind + num_orientations) % num_orientations; + // rotated channel + int ind_rot = (o - ind + num_orientations) % num_orientations; + int ind_rot_plus = (ind_rot + 1 + num_orientations) % num_orientations; + scalar_t *offset_bottom_diff = + bottom_diff + (roi_batch_ind * channels * num_orientations + + c * num_orientations + ind_rot) * + height * width; + scalar_t *offset_bottom_diff_plus = + bottom_diff + (roi_batch_ind * channels * num_orientations + + c * num_orientations + ind_rot_plus) * + height * width; + int top_offset = + (n * channels * num_orientations + c * num_orientations + o) * + pooled_height * pooled_width; + const scalar_t *offset_top_diff = top_diff + top_offset; + const scalar_t top_diff_this_bin = offset_top_diff[ph * pooled_width + pw]; + + // We use roi_bin_grid to sample the grid and mimic integral + int roi_bin_grid_h = (num_samples > 0) + ? num_samples + : ceilf(roi_height / pooled_height); // e.g., = 2 + int roi_bin_grid_w = + (num_samples > 0) ? num_samples : ceilf(roi_width / pooled_width); + + // roi_start_h and roi_start_w are computed wrt the center of RoI (x, y). + // Appropriate translation needs to be applied after. 
+ if (clockwise) { + theta = -theta; // If clockwise, the angle needs to be reversed. + } + scalar_t roi_start_h = -roi_height / 2.0; + scalar_t roi_start_w = -roi_width / 2.0; + scalar_t cosTheta = cos(theta); + scalar_t sinTheta = sin(theta); + + // We do average (integral) pooling inside a bin + const scalar_t count = roi_bin_grid_h * roi_bin_grid_w; // e.g. = 4 + + for (int iy = 0; iy < roi_bin_grid_h; iy++) { // e.g., iy = 0, 1 + const scalar_t yy = + roi_start_h + ph * bin_size_h + + static_cast(iy + .5f) * bin_size_h / + static_cast(roi_bin_grid_h); // e.g., 0.5, 1.5 + for (int ix = 0; ix < roi_bin_grid_w; ix++) { + const scalar_t xx = roi_start_w + pw * bin_size_w + + static_cast(ix + .5f) * bin_size_w / + static_cast(roi_bin_grid_w); + + // Rotate by theta around the center and translate + scalar_t y = yy * cosTheta - xx * sinTheta + roi_center_h; + scalar_t x = yy * sinTheta + xx * cosTheta + roi_center_w; + + scalar_t w1, w2, w3, w4; + int x_low, x_high, y_low, y_high; + + bilinear_interpolate_gradient(height, width, y, x, w1, w2, w3, + w4, x_low, x_high, y_low, + y_high, index); + + scalar_t g1 = top_diff_this_bin * w1 / count; + scalar_t g2 = top_diff_this_bin * w2 / count; + scalar_t g3 = top_diff_this_bin * w3 / count; + scalar_t g4 = top_diff_this_bin * w4 / count; + + if (x_low >= 0 && x_high >= 0 && y_low >= 0 && y_high >= 0) { + atomicAdd(offset_bottom_diff + y_low * width + x_low, g1 * r_var); + atomicAdd(offset_bottom_diff + y_low * width + x_high, g2 * r_var); + atomicAdd(offset_bottom_diff + y_high * width + x_low, g3 * r_var); + atomicAdd(offset_bottom_diff + y_high * width + x_high, g4 * r_var); + + atomicAdd(offset_bottom_diff_plus + y_low * width + x_low, + g1 * l_var); + atomicAdd(offset_bottom_diff_plus + y_low * width + x_high, + g2 * l_var); + atomicAdd(offset_bottom_diff_plus + y_high * width + x_low, + g3 * l_var); + atomicAdd(offset_bottom_diff_plus + y_high * width + x_high, + g4 * l_var); + + } // if + } // ix + } // iy + } // 
CUDA_1D_KERNEL_LOOP +} // RiRoIAlignBackward + +#endif // RIROI_ALIGN_ROTATED_CUDA_KERNEL_CUH diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/roi_align_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/roi_align_cuda_kernel.cuh new file mode 100644 index 0000000000000000000000000000000000000000..4541462afd6bd77ee794badd7d84bdd6c91b2c43 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/roi_align_cuda_kernel.cuh @@ -0,0 +1,212 @@ +// Copyright (c) OpenMMLab. All rights reserved +#ifndef ROI_ALIGN_CUDA_KERNEL_CUH +#define ROI_ALIGN_CUDA_KERNEL_CUH + +#include +#ifdef MMCV_WITH_TRT +#include "common_cuda_helper.hpp" +#else // MMCV_WITH_TRT +#ifdef MMCV_USE_PARROTS +#include "parrots_cuda_helper.hpp" +#else // MMCV_USE_PARROTS +#include "pytorch_cuda_helper.hpp" +#endif // MMCV_USE_PARROTS +#endif // MMCV_WITH_TRT + +/*** Forward ***/ +template +__global__ void roi_align_forward_cuda_kernel( + const int nthreads, const T* input, const T* rois, T* output, T* argmax_y, + T* argmax_x, const int pooled_height, const int pooled_width, + const T spatial_scale, const int sampling_ratio, + const int pool_mode, // 0 - max pool, 1 - avg pool + const bool aligned, const int channels, const int height, const int width) { + CUDA_1D_KERNEL_LOOP(index, nthreads) { + // (n, c, ph, pw) is an element in the pooled output + int pw = index % pooled_width; + int ph = (index / pooled_width) % pooled_height; + int c = (index / pooled_width / pooled_height) % channels; + int n = index / pooled_width / pooled_height / channels; + + const T* offset_rois = rois + n * 5; + int roi_batch_ind = offset_rois[0]; + + // Do not using rounding; this implementation detail is critical + T offset = aligned ? 
(T)0.5 : (T)0.0; + T roi_start_w = offset_rois[1] * spatial_scale - offset; + T roi_start_h = offset_rois[2] * spatial_scale - offset; + T roi_end_w = offset_rois[3] * spatial_scale - offset; + T roi_end_h = offset_rois[4] * spatial_scale - offset; + + T roi_width = roi_end_w - roi_start_w; + T roi_height = roi_end_h - roi_start_h; + if (!aligned) { // for backward-compatibility only + roi_width = max(roi_width, (T)1.); + roi_height = max(roi_height, (T)1.); + } + + T bin_size_h = static_cast(roi_height) / static_cast(pooled_height); + T bin_size_w = static_cast(roi_width) / static_cast(pooled_width); + + const T* offset_input = + input + (roi_batch_ind * channels + c) * height * width; + + // We use roi_bin_grid to sample the grid and mimic integral + int roi_bin_grid_h = + (sampling_ratio > 0) + ? sampling_ratio + : static_cast(ceilf(roi_height / pooled_height)); + int roi_bin_grid_w = + (sampling_ratio > 0) + ? sampling_ratio + : static_cast(ceilf(roi_width / pooled_width)); + + if (pool_mode == 0) { + // We do max pooling inside a bin + T maxval = -FLT_MAX; + T maxidx_y = -1.f, maxidx_x = -1.f; + for (int iy = 0; iy < roi_bin_grid_h; iy++) { + const T y = roi_start_h + ph * bin_size_h + + static_cast(iy + .5f) * bin_size_h / + static_cast(roi_bin_grid_h); + for (int ix = 0; ix < roi_bin_grid_w; ix++) { + const T x = roi_start_w + pw * bin_size_w + + static_cast(ix + .5f) * bin_size_w / + static_cast(roi_bin_grid_w); + T val = + bilinear_interpolate(offset_input, height, width, y, x, index); + if (val > maxval) { + maxval = val; + maxidx_y = y; + maxidx_x = x; + } + } + } + output[index] = maxval; + argmax_y[index] = maxidx_y; + argmax_x[index] = maxidx_x; + } else if (pool_mode == 1) { + // We do average pooling inside a bin + const T count = max(roi_bin_grid_h * roi_bin_grid_w, 1); + T output_val = 0.; + for (int iy = 0; iy < roi_bin_grid_h; iy++) { + const T y = roi_start_h + ph * bin_size_h + + static_cast(iy + .5f) * bin_size_h / + 
static_cast(roi_bin_grid_h); + for (int ix = 0; ix < roi_bin_grid_w; ix++) { + const T x = roi_start_w + pw * bin_size_w + + static_cast(ix + .5f) * bin_size_w / + static_cast(roi_bin_grid_w); + T val = + bilinear_interpolate(offset_input, height, width, y, x, index); + output_val += val; + } + } + output[index] = output_val / count; + } + } +} + +/*** Backward ***/ +template +__global__ void roi_align_backward_cuda_kernel( + const int nthreads, const T* grad_output, const T* rois, const T* argmax_y, + const T* argmax_x, T* grad_input, const int pooled_height, + const int pooled_width, const T spatial_scale, const int sampling_ratio, + const int pool_mode, // 0 - max pool, 1 - avg pool + const bool aligned, const int channels, const int height, const int width) { + CUDA_1D_KERNEL_LOOP(index, nthreads) { + // (n, c, ph, pw) is an element in the pooled output + int pw = index % pooled_width; + int ph = (index / pooled_width) % pooled_height; + int c = (index / pooled_width / pooled_height) % channels; + int n = index / pooled_width / pooled_height / channels; + + const T grad_output_this_bin = grad_output[index]; + + const T* offset_rois = rois + n * 5; + int roi_batch_ind = offset_rois[0]; + T* offset_grad_input = + grad_input + ((roi_batch_ind * channels + c) * height * width); + + if (pool_mode == 0) { + T y = argmax_y[index], x = argmax_x[index]; + if (y != -1.f) { + T w1, w2, w3, w4; + int x_low, x_high, y_low, y_high; + bilinear_interpolate_gradient(height, width, y, x, w1, w2, w3, w4, + x_low, x_high, y_low, y_high, index); + + if (x_low >= 0 && x_high >= 0 && y_low >= 0 && y_high >= 0) { + atomicAdd(offset_grad_input + y_low * width + x_low, + grad_output_this_bin * w1); + atomicAdd(offset_grad_input + y_low * width + x_high, + grad_output_this_bin * w2); + atomicAdd(offset_grad_input + y_high * width + x_low, + grad_output_this_bin * w3); + atomicAdd(offset_grad_input + y_high * width + x_high, + grad_output_this_bin * w4); + } + } + } else if (pool_mode == 
1) { + // Do not using rounding; this implementation detail is critical + T offset = aligned ? (T)0.5 : (T)0.0; + T roi_start_w = offset_rois[1] * spatial_scale - offset; + T roi_start_h = offset_rois[2] * spatial_scale - offset; + T roi_end_w = offset_rois[3] * spatial_scale - offset; + T roi_end_h = offset_rois[4] * spatial_scale - offset; + + T roi_width = roi_end_w - roi_start_w; + T roi_height = roi_end_h - roi_start_h; + if (!aligned) { // for backward-compatibility only + roi_width = max(roi_width, (T)1.); + roi_height = max(roi_height, (T)1.); + } + + T bin_size_h = static_cast(roi_height) / static_cast(pooled_height); + T bin_size_w = static_cast(roi_width) / static_cast(pooled_width); + + // We use roi_bin_grid to sample the grid and mimic integral + int roi_bin_grid_h = + (sampling_ratio > 0) + ? sampling_ratio + : static_cast(ceilf(roi_height / pooled_height)); + int roi_bin_grid_w = + (sampling_ratio > 0) + ? sampling_ratio + : static_cast(ceilf(roi_width / pooled_width)); + + // We do average (integral) pooling inside a bin + const T count = roi_bin_grid_h * roi_bin_grid_w; // e.g. 
= 4 + + for (int iy = 0; iy < roi_bin_grid_h; iy++) { + const T y = roi_start_h + ph * bin_size_h + + static_cast(iy + .5f) * bin_size_h / + static_cast(roi_bin_grid_h); + for (int ix = 0; ix < roi_bin_grid_w; ix++) { + const T x = roi_start_w + pw * bin_size_w + + static_cast(ix + .5f) * bin_size_w / + static_cast(roi_bin_grid_w); + + T w1, w2, w3, w4; + int x_low, x_high, y_low, y_high; + bilinear_interpolate_gradient(height, width, y, x, w1, w2, w3, w4, + x_low, x_high, y_low, y_high, index); + + if (x_low >= 0 && x_high >= 0 && y_low >= 0 && y_high >= 0) { + atomicAdd(offset_grad_input + y_low * width + x_low, + grad_output_this_bin * w1 / count); + atomicAdd(offset_grad_input + y_low * width + x_high, + grad_output_this_bin * w2 / count); + atomicAdd(offset_grad_input + y_high * width + x_low, + grad_output_this_bin * w3 / count); + atomicAdd(offset_grad_input + y_high * width + x_high, + grad_output_this_bin * w4 / count); + } + } + } + } + } +} + +#endif // ROI_ALIGN_CUDA_KERNEL_CUH diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/roi_align_rotated_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/roi_align_rotated_cuda_kernel.cuh new file mode 100644 index 0000000000000000000000000000000000000000..8274dc50c709630c4ee456efd543aa1265049b41 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/roi_align_rotated_cuda_kernel.cuh @@ -0,0 +1,202 @@ +// Modified from +// https://github.com/facebookresearch/detectron2/tree/master/detectron2/layers/csrc/ROIAlignRotated +// Copyright (c) Facebook, Inc. and its affiliates. 
All Rights Reserved +#ifndef ROI_ALIGN_ROTATED_CUDA_KERNEL_CUH +#define ROI_ALIGN_ROTATED_CUDA_KERNEL_CUH + +#include +#ifdef MMCV_WITH_TRT +#include "common_cuda_helper.hpp" +#else // MMCV_WITH_TRT +#ifdef MMCV_USE_PARROTS +#include "parrots_cuda_helper.hpp" +#else // MMCV_USE_PARROTS +#include "pytorch_cuda_helper.hpp" +#endif // MMCV_USE_PARROTS +#endif // MMCV_WITH_TRT + +/*** Forward ***/ +template +__global__ void roi_align_rotated_forward_cuda_kernel( + const int nthreads, const scalar_t *bottom_data, + const scalar_t *bottom_rois, const scalar_t spatial_scale, + const int sampling_ratio, const bool aligned, const bool clockwise, + const int channels, const int height, const int width, + const int pooled_height, const int pooled_width, scalar_t *top_data) { + CUDA_1D_KERNEL_LOOP(index, nthreads) { + // (n, c, ph, pw) is an element in the pooled output + int pw = index % pooled_width; + int ph = (index / pooled_width) % pooled_height; + int c = (index / pooled_width / pooled_height) % channels; + int n = index / pooled_width / pooled_height / channels; + + const scalar_t *offset_bottom_rois = bottom_rois + n * 6; + int roi_batch_ind = offset_bottom_rois[0]; + + // Do not using rounding; this implementation detail is critical + scalar_t offset = aligned ? (scalar_t)0.5 : (scalar_t)0.0; + scalar_t roi_center_w = offset_bottom_rois[1] * spatial_scale - offset; + scalar_t roi_center_h = offset_bottom_rois[2] * spatial_scale - offset; + scalar_t roi_width = offset_bottom_rois[3] * spatial_scale; + scalar_t roi_height = offset_bottom_rois[4] * spatial_scale; + // scalar_t theta = offset_bottom_rois[5] * M_PI / 180.0; + scalar_t theta = offset_bottom_rois[5]; + if (clockwise) { + theta = -theta; // If clockwise, the angle needs to be reversed. 
+ } + if (!aligned) { // for backward-compatibility only + // Force malformed ROIs to be 1x1 + roi_width = max(roi_width, (scalar_t)1.); + roi_height = max(roi_height, (scalar_t)1.); + } + scalar_t bin_size_h = static_cast(roi_height) / + static_cast(pooled_height); + scalar_t bin_size_w = + static_cast(roi_width) / static_cast(pooled_width); + + const scalar_t *offset_bottom_data = + bottom_data + (roi_batch_ind * channels + c) * height * width; + + // We use roi_bin_grid to sample the grid and mimic integral + int roi_bin_grid_h = (sampling_ratio > 0) + ? sampling_ratio + : ceilf(roi_height / pooled_height); // e.g., = 2 + int roi_bin_grid_w = + (sampling_ratio > 0) ? sampling_ratio : ceilf(roi_width / pooled_width); + + // roi_start_h and roi_start_w are computed wrt the center of RoI (x, y). + // Appropriate translation needs to be applied after. + scalar_t roi_start_h = -roi_height / 2.0; + scalar_t roi_start_w = -roi_width / 2.0; + scalar_t cosscalar_theta = cos(theta); + scalar_t sinscalar_theta = sin(theta); + + // We do average (integral) pooling inside a bin + const scalar_t count = max(roi_bin_grid_h * roi_bin_grid_w, 1); // e.g. 
= 4 + + scalar_t output_val = 0.; + for (int iy = 0; iy < roi_bin_grid_h; iy++) { // e.g., iy = 0, 1 + const scalar_t yy = + roi_start_h + ph * bin_size_h + + static_cast(iy + .5f) * bin_size_h / + static_cast(roi_bin_grid_h); // e.g., 0.5, 1.5 + for (int ix = 0; ix < roi_bin_grid_w; ix++) { + const scalar_t xx = roi_start_w + pw * bin_size_w + + static_cast(ix + .5f) * bin_size_w / + static_cast(roi_bin_grid_w); + + // Rotate by theta (counterclockwise) around the center and translate + scalar_t y = yy * cosscalar_theta - xx * sinscalar_theta + roi_center_h; + scalar_t x = yy * sinscalar_theta + xx * cosscalar_theta + roi_center_w; + + scalar_t val = bilinear_interpolate( + offset_bottom_data, height, width, y, x, index); + output_val += val; + } + } + output_val /= count; + + top_data[index] = output_val; + } +} + +/*** Backward ***/ +template +__global__ void roi_align_rotated_backward_cuda_kernel( + const int nthreads, const scalar_t *top_diff, const scalar_t *bottom_rois, + const scalar_t spatial_scale, const int sampling_ratio, const bool aligned, + const bool clockwise, const int channels, const int height, const int width, + const int pooled_height, const int pooled_width, scalar_t *bottom_diff) { + CUDA_1D_KERNEL_LOOP(index, nthreads) { + // (n, c, ph, pw) is an element in the pooled output + int pw = index % pooled_width; + int ph = (index / pooled_width) % pooled_height; + int c = (index / pooled_width / pooled_height) % channels; + int n = index / pooled_width / pooled_height / channels; + + const scalar_t *offset_bottom_rois = bottom_rois + n * 6; + int roi_batch_ind = offset_bottom_rois[0]; + + // Do not round + scalar_t offset = aligned ? 
(scalar_t)0.5 : (scalar_t)0.0; + scalar_t roi_center_w = offset_bottom_rois[1] * spatial_scale - offset; + scalar_t roi_center_h = offset_bottom_rois[2] * spatial_scale - offset; + scalar_t roi_width = offset_bottom_rois[3] * spatial_scale; + scalar_t roi_height = offset_bottom_rois[4] * spatial_scale; + // scalar_t theta = offset_bottom_rois[5] * M_PI / 180.0; + scalar_t theta = offset_bottom_rois[5]; + if (clockwise) { + theta = -theta; // If clockwise, the angle needs to be reversed. + } + if (!aligned) { // for backward-compatibility only + // Force malformed ROIs to be 1x1 + roi_width = max(roi_width, (scalar_t)1.); + roi_height = max(roi_height, (scalar_t)1.); + } + scalar_t bin_size_h = static_cast(roi_height) / + static_cast(pooled_height); + scalar_t bin_size_w = + static_cast(roi_width) / static_cast(pooled_width); + + scalar_t *offset_bottom_diff = + bottom_diff + (roi_batch_ind * channels + c) * height * width; + + int top_offset = (n * channels + c) * pooled_height * pooled_width; + const scalar_t *offset_top_diff = top_diff + top_offset; + const scalar_t top_diff_this_bin = offset_top_diff[ph * pooled_width + pw]; + + // We use roi_bin_grid to sample the grid and mimic integral + int roi_bin_grid_h = (sampling_ratio > 0) + ? sampling_ratio + : ceilf(roi_height / pooled_height); // e.g., = 2 + int roi_bin_grid_w = + (sampling_ratio > 0) ? sampling_ratio : ceilf(roi_width / pooled_width); + + // roi_start_h and roi_start_w are computed wrt the center of RoI (x, y). + // Appropriate translation needs to be applied after. + scalar_t roi_start_h = -roi_height / 2.0; + scalar_t roi_start_w = -roi_width / 2.0; + scalar_t cosTheta = cos(theta); + scalar_t sinTheta = sin(theta); + + // We do average (integral) pooling inside a bin + const scalar_t count = roi_bin_grid_h * roi_bin_grid_w; // e.g. 
= 4 + + for (int iy = 0; iy < roi_bin_grid_h; iy++) { // e.g., iy = 0, 1 + const scalar_t yy = + roi_start_h + ph * bin_size_h + + static_cast(iy + .5f) * bin_size_h / + static_cast(roi_bin_grid_h); // e.g., 0.5, 1.5 + for (int ix = 0; ix < roi_bin_grid_w; ix++) { + const scalar_t xx = roi_start_w + pw * bin_size_w + + static_cast(ix + .5f) * bin_size_w / + static_cast(roi_bin_grid_w); + + // Rotate by theta around the center and translate + scalar_t y = yy * cosTheta - xx * sinTheta + roi_center_h; + scalar_t x = yy * sinTheta + xx * cosTheta + roi_center_w; + + scalar_t w1, w2, w3, w4; + int x_low, x_high, y_low, y_high; + + bilinear_interpolate_gradient(height, width, y, x, w1, w2, w3, + w4, x_low, x_high, y_low, + y_high, index); + + scalar_t g1 = top_diff_this_bin * w1 / count; + scalar_t g2 = top_diff_this_bin * w2 / count; + scalar_t g3 = top_diff_this_bin * w3 / count; + scalar_t g4 = top_diff_this_bin * w4 / count; + + if (x_low >= 0 && x_high >= 0 && y_low >= 0 && y_high >= 0) { + atomicAdd(offset_bottom_diff + y_low * width + x_low, g1); + atomicAdd(offset_bottom_diff + y_low * width + x_high, g2); + atomicAdd(offset_bottom_diff + y_high * width + x_low, g3); + atomicAdd(offset_bottom_diff + y_high * width + x_high, g4); + } // if + } // ix + } // iy + } // CUDA_1D_KERNEL_LOOP +} // RoIAlignBackward + +#endif // ROI_ALIGN_ROTATED_CUDA_KERNEL_CUH diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/roi_pool_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/roi_pool_cuda_kernel.cuh new file mode 100644 index 0000000000000000000000000000000000000000..3d7eae66b99b7812b92d9fc8bad237cbcbd59436 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/roi_pool_cuda_kernel.cuh @@ -0,0 +1,93 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#ifndef ROI_POOL_CUDA_KERNEL_CUH +#define ROI_POOL_CUDA_KERNEL_CUH + +#ifdef MMCV_USE_PARROTS +#include "parrots_cuda_helper.hpp" +#else +#include "pytorch_cuda_helper.hpp" +#endif + +template +__global__ void roi_pool_forward_cuda_kernel( + const int nthreads, const T* input, const T* rois, T* output, int* argmax, + const int pooled_height, const int pooled_width, const T spatial_scale, + const int channels, const int height, const int width) { + CUDA_1D_KERNEL_LOOP(index, nthreads) { + // (n, c, ph, pw) is an element in the pooled output + int pw = index % pooled_width; + int ph = (index / pooled_width) % pooled_height; + int c = (index / pooled_width / pooled_height) % channels; + int n = index / pooled_width / pooled_height / channels; + + const T* offset_rois = rois + n * 5; + int roi_batch_ind = offset_rois[0]; + // calculate the roi region on feature maps + T roi_x1 = offset_rois[1] * spatial_scale; + T roi_y1 = offset_rois[2] * spatial_scale; + T roi_x2 = (offset_rois[3] + 1) * spatial_scale; + T roi_y2 = (offset_rois[4] + 1) * spatial_scale; + + // force malformed rois to be 1x1 + T roi_w = roi_x2 - roi_x1; + T roi_h = roi_y2 - roi_y1; + if (roi_w <= 0 || roi_h <= 0) continue; + + T bin_size_w = roi_w / static_cast(pooled_width); + T bin_size_h = roi_h / static_cast(pooled_height); + + // the corresponding bin region + int bin_x1 = floorf(static_cast(pw) * bin_size_w + roi_x1); + int bin_y1 = floorf(static_cast(ph) * bin_size_h + roi_y1); + int bin_x2 = ceilf(static_cast(pw + 1) * bin_size_w + roi_x1); + int bin_y2 = ceilf(static_cast(ph + 1) * bin_size_h + roi_y1); + + // add roi offsets and clip to input boundaries + bin_x1 = min(max(bin_x1, 0), width); + bin_y1 = min(max(bin_y1, 0), height); + bin_x2 = min(max(bin_x2, 0), width); + bin_y2 = min(max(bin_y2, 0), height); + bool is_empty = (bin_y2 <= bin_y1) || (bin_x2 <= bin_x1); + + const T* offset_input = + input + (roi_batch_ind * channels + c) * height * width; + // Define an 
empty pooling region to be zero + // If nothing is pooled, argmax = -1 causes nothing to be backprop'd + T max_val = is_empty ? 0 : -FLT_MAX; + int max_idx = -1; + for (int h = bin_y1; h < bin_y2; ++h) { + for (int w = bin_x1; w < bin_x2; ++w) { + int offset = h * width + w; + if (offset_input[offset] > max_val) { + max_val = offset_input[offset]; + max_idx = offset; + } + } + } + output[index] = max_val; + if (argmax != NULL) argmax[index] = max_idx; + } +} + +template +__global__ void roi_pool_backward_cuda_kernel( + const int nthreads, const T* grad_output, const T* rois, const int* argmax, + T* grad_input, const int pooled_height, const int pooled_width, + const int channels, const int height, const int width) { + CUDA_1D_KERNEL_LOOP(index, nthreads) { + // (n, c) is an element in the pooled output + int c = (index / pooled_width / pooled_height) % channels; + int n = index / pooled_width / pooled_height / channels; + + int roi_batch_ind = rois[n * 5]; + T* grad_input_offset = + grad_input + ((roi_batch_ind * channels + c) * height * width); + int argmax_index = argmax[index]; + + if (argmax_index != -1) { + atomicAdd(grad_input_offset + argmax_index, grad_output[index]); + } + } +} + +#endif // ROI_POOL_CUDA_KERNEL_CUH diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/roiaware_pool3d_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/roiaware_pool3d_cuda_kernel.cuh new file mode 100644 index 0000000000000000000000000000000000000000..fc0aacf1435f8715fae92de535bf01bac07ac39a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/roiaware_pool3d_cuda_kernel.cuh @@ -0,0 +1,260 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#ifndef ROIAWARE_POOL3D_CUDA_KERNEL_CUH +#define ROIAWARE_POOL3D_CUDA_KERNEL_CUH + +#ifdef MMCV_USE_PARROTS +#include "parrots_cuda_helper.hpp" +#else +#include "pytorch_cuda_helper.hpp" +#endif + +template +__device__ inline void lidar_to_local_coords(T shift_x, T shift_y, T rz, + T &local_x, T &local_y) { + T cosa = cos(-rz), sina = sin(-rz); + local_x = shift_x * cosa + shift_y * (-sina); + local_y = shift_x * sina + shift_y * cosa; +} + +template +__device__ inline int check_pt_in_box3d(const T *pt, const T *box3d, T &local_x, + T &local_y) { + // param pt: (x, y, z) + // param box3d: (cx, cy, cz, x_size, y_size, z_size, rz) in LiDAR coordinate, + // cz in the bottom center + T x = pt[0], y = pt[1], z = pt[2]; + T cx = box3d[0], cy = box3d[1], cz = box3d[2]; + T x_size = box3d[3], y_size = box3d[4], z_size = box3d[5], rz = box3d[6]; + cz += z_size / + 2.0; // shift to the center since cz in box3d is the bottom center + + if (fabsf(z - cz) > z_size / 2.0) return 0; + lidar_to_local_coords(x - cx, y - cy, rz, local_x, local_y); + float in_flag = (local_x > -x_size / 2.0) & (local_x < x_size / 2.0) & + (local_y > -y_size / 2.0) & (local_y < y_size / 2.0); + return in_flag; +} + +template +__global__ void generate_pts_mask_for_box3d(int boxes_num, int pts_num, + int out_x, int out_y, int out_z, + const T *rois, const T *pts, + int *pts_mask) { + // params rois: (N, 7) [x, y, z, x_size, y_size, z_size, rz] in LiDAR + // coordinate params pts: (npoints, 3) [x, y, z] params pts_mask: (N, + // npoints): -1 means point does not in this box, otherwise: encode (x_idxs, + // y_idxs, z_idxs) by binary bit + int box_idx = blockIdx.y; + CUDA_1D_KERNEL_LOOP(pt_idx, pts_num) { + if (box_idx >= boxes_num) return; + + pts += pt_idx * 3; + rois += box_idx * 7; + pts_mask += box_idx * pts_num + pt_idx; + + T local_x = 0, local_y = 0; + int cur_in_flag = check_pt_in_box3d(pts, rois, local_x, local_y); + + pts_mask[0] = -1; + if (cur_in_flag > 0) { + T local_z = 
pts[2] - rois[2]; + T x_size = rois[3], y_size = rois[4], z_size = rois[5]; + + T x_res = x_size / out_x; + T y_res = y_size / out_y; + T z_res = z_size / out_z; + + unsigned int x_idx = int((local_x + x_size / 2) / x_res); + unsigned int y_idx = int((local_y + y_size / 2) / y_res); + unsigned int z_idx = int(local_z / z_res); + + x_idx = min(max(x_idx, 0), out_x - 1); + y_idx = min(max(y_idx, 0), out_y - 1); + z_idx = min(max(z_idx, 0), out_z - 1); + + unsigned int idx_encoding = (x_idx << 16) + (y_idx << 8) + z_idx; + + pts_mask[0] = idx_encoding; + } + } +} + +template +__global__ void collect_inside_pts_for_box3d(int boxes_num, int pts_num, + int max_pts_each_voxel, int out_x, + int out_y, int out_z, + const int *pts_mask, + T *pts_idx_of_voxels) { + // params pts_mask: (N, npoints) 0 or 1 + // params pts_idx_of_voxels: (N, out_x, out_y, out_z, max_pts_each_voxel) + CUDA_1D_KERNEL_LOOP(box_idx, boxes_num) { + int max_num_pts = max_pts_each_voxel - 1; // index 0 is the counter + pts_idx_of_voxels += box_idx * out_x * out_y * out_z * max_pts_each_voxel; + + for (int k = 0; k < pts_num; k++) { + if (pts_mask[box_idx * pts_num + k] != -1) { + unsigned int idx_encoding = pts_mask[box_idx * pts_num + k]; + unsigned int x_idx = (idx_encoding >> 16) & 0xFF; + unsigned int y_idx = (idx_encoding >> 8) & 0xFF; + unsigned int z_idx = idx_encoding & 0xFF; + unsigned int base_offset = x_idx * out_y * out_z * max_pts_each_voxel + + y_idx * out_z * max_pts_each_voxel + + z_idx * max_pts_each_voxel; + unsigned int cnt = pts_idx_of_voxels[base_offset]; + if (cnt < max_num_pts) { + pts_idx_of_voxels[base_offset + cnt + 1] = k; + pts_idx_of_voxels[base_offset]++; + } + } + } + } +} + +template +__global__ void roiaware_maxpool3d(int boxes_num, int pts_num, int channels, + int max_pts_each_voxel, int out_x, int out_y, + int out_z, const T *pts_feature, + const int *pts_idx_of_voxels, + T *pooled_features, int *argmax) { + // params pts_feature: (npoints, C) + // params 
pts_idx_of_voxels: (N, out_x, out_y, out_z, max_pts_each_voxel), + // index 0 is the counter params pooled_features: (N, out_x, out_y, out_z, C) + // params argmax: (N, out_x, out_y, out_z, C) + + int box_idx = blockIdx.z; + int channel_idx = blockIdx.y; + CUDA_1D_KERNEL_LOOP(voxel_idx_flat, out_x * out_y * out_z) { + int x_idx = voxel_idx_flat / (out_y * out_z); + int y_idx = (voxel_idx_flat - x_idx * (out_y * out_z)) / out_z; + int z_idx = voxel_idx_flat % out_z; + if (box_idx >= boxes_num || channel_idx >= channels) return; + + int offset_base = x_idx * out_y * out_z + y_idx * out_z + z_idx; + pts_idx_of_voxels += box_idx * out_x * out_y * out_z * max_pts_each_voxel + + offset_base * max_pts_each_voxel; + pooled_features += box_idx * out_x * out_y * out_z * channels + + offset_base * channels + channel_idx; + argmax += box_idx * out_x * out_y * out_z * channels + + offset_base * channels + channel_idx; + + int argmax_idx = -1; + float max_val = -1e50; + + int total_pts = pts_idx_of_voxels[0]; + + for (int k = 1; k <= total_pts; k++) { + if (pts_feature[pts_idx_of_voxels[k] * channels + channel_idx] > + max_val) { + max_val = pts_feature[pts_idx_of_voxels[k] * channels + channel_idx]; + argmax_idx = pts_idx_of_voxels[k]; + } + } + + if (argmax_idx != -1) { + pooled_features[0] = max_val; + } + argmax[0] = argmax_idx; + } +} + +template +__global__ void roiaware_avgpool3d(int boxes_num, int pts_num, int channels, + int max_pts_each_voxel, int out_x, int out_y, + int out_z, const T *pts_feature, + const int *pts_idx_of_voxels, + T *pooled_features) { + // params pts_feature: (npoints, C) + // params pts_idx_of_voxels: (N, out_x, out_y, out_z, max_pts_each_voxel), + // index 0 is the counter params pooled_features: (N, out_x, out_y, out_z, C) + // params argmax: (N, out_x, out_y, out_z, C) + + int box_idx = blockIdx.z; + int channel_idx = blockIdx.y; + CUDA_1D_KERNEL_LOOP(voxel_idx_flat, out_x * out_y * out_z) { + int x_idx = voxel_idx_flat / (out_y * out_z); + int 
y_idx = (voxel_idx_flat - x_idx * (out_y * out_z)) / out_z; + int z_idx = voxel_idx_flat % out_z; + if (box_idx >= boxes_num || channel_idx >= channels) return; + + int offset_base = x_idx * out_y * out_z + y_idx * out_z + z_idx; + pts_idx_of_voxels += box_idx * out_x * out_y * out_z * max_pts_each_voxel + + offset_base * max_pts_each_voxel; + pooled_features += box_idx * out_x * out_y * out_z * channels + + offset_base * channels + channel_idx; + + float sum_val = 0; + int total_pts = pts_idx_of_voxels[0]; + + for (int k = 1; k <= total_pts; k++) { + sum_val += pts_feature[pts_idx_of_voxels[k] * channels + channel_idx]; + } + + if (total_pts > 0) { + pooled_features[0] = sum_val / total_pts; + } + } +} + +template +__global__ void roiaware_maxpool3d_backward(int boxes_num, int channels, + int out_x, int out_y, int out_z, + const int *argmax, + const T *grad_out, T *grad_in) { + // params argmax: (N, out_x, out_y, out_z, C) + // params grad_out: (N, out_x, out_y, out_z, C) + // params grad_in: (npoints, C), return value + + int box_idx = blockIdx.z; + int channel_idx = blockIdx.y; + CUDA_1D_KERNEL_LOOP(voxel_idx_flat, out_x * out_y * out_z) { + int x_idx = voxel_idx_flat / (out_y * out_z); + int y_idx = (voxel_idx_flat - x_idx * (out_y * out_z)) / out_z; + int z_idx = voxel_idx_flat % out_z; + if (box_idx >= boxes_num || channel_idx >= channels) return; + + int offset_base = x_idx * out_y * out_z + y_idx * out_z + z_idx; + argmax += box_idx * out_x * out_y * out_z * channels + + offset_base * channels + channel_idx; + grad_out += box_idx * out_x * out_y * out_z * channels + + offset_base * channels + channel_idx; + + if (argmax[0] == -1) return; + + atomicAdd(grad_in + argmax[0] * channels + channel_idx, grad_out[0] * 1); + } +} + +template +__global__ void roiaware_avgpool3d_backward(int boxes_num, int channels, + int out_x, int out_y, int out_z, + int max_pts_each_voxel, + const int *pts_idx_of_voxels, + const T *grad_out, T *grad_in) { + // params 
pts_idx_of_voxels: (N, out_x, out_y, out_z, max_pts_each_voxel) + // params grad_out: (N, out_x, out_y, out_z, C) + // params grad_in: (npoints, C), return value + + int box_idx = blockIdx.z; + int channel_idx = blockIdx.y; + CUDA_1D_KERNEL_LOOP(voxel_idx_flat, out_x * out_y * out_z) { + int x_idx = voxel_idx_flat / (out_y * out_z); + int y_idx = (voxel_idx_flat - x_idx * (out_y * out_z)) / out_z; + int z_idx = voxel_idx_flat % out_z; + if (box_idx >= boxes_num || channel_idx >= channels) return; + + int offset_base = x_idx * out_y * out_z + y_idx * out_z + z_idx; + pts_idx_of_voxels += box_idx * out_x * out_y * out_z * max_pts_each_voxel + + offset_base * max_pts_each_voxel; + grad_out += box_idx * out_x * out_y * out_z * channels + + offset_base * channels + channel_idx; + + int total_pts = pts_idx_of_voxels[0]; + float cur_grad = 1 / fmaxf(float(total_pts), 1.0); + for (int k = 1; k <= total_pts; k++) { + atomicAdd(grad_in + pts_idx_of_voxels[k] * channels + channel_idx, + grad_out[0] * cur_grad); + } + } +} + +#endif // ROIAWARE_POOL3D_CUDA_KERNEL_CUH diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/roipoint_pool3d_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/roipoint_pool3d_cuda_kernel.cuh new file mode 100644 index 0000000000000000000000000000000000000000..545f6ffa09d4a6cae49f1f1e68c191c1fd54de68 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/roipoint_pool3d_cuda_kernel.cuh @@ -0,0 +1,134 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#ifndef ROIPOINT_POOL3D_CUDA_KERNEL_CUH +#define ROIPOINT_POOL3D_CUDA_KERNEL_CUH + +#ifdef MMCV_USE_PARROTS +#include "parrots_cuda_helper.hpp" +#else +#include "pytorch_cuda_helper.hpp" +#endif + +template +__device__ inline void lidar_to_local_coords(T shift_x, T shift_y, T rz, + T &local_x, T &local_y) { + T cosa = cos(-rz), sina = sin(-rz); + local_x = shift_x * cosa + shift_y * (-sina); + local_y = shift_x * sina + shift_y * cosa; +} + +template +__device__ inline int check_pt_in_box3d(const T *pt, const T *box3d, T &local_x, + T &local_y) { + // param pt: (x, y, z) + // param box3d: (cx, cy, cz, dx, dy, dz, rz) in LiDAR coordinate, cz in the + // bottom center + T x = pt[0], y = pt[1], z = pt[2]; + T cx = box3d[0], cy = box3d[1], cz = box3d[2]; + T dx = box3d[3], dy = box3d[4], dz = box3d[5], rz = box3d[6]; + cz += dz / 2.0; // shift to the center since cz in box3d is the bottom center + + if (fabsf(z - cz) > dz / 2.0) return 0; + lidar_to_local_coords(x - cx, y - cy, rz, local_x, local_y); + T in_flag = (local_x > -dx / 2.0) & (local_x < dx / 2.0) & + (local_y > -dy / 2.0) & (local_y < dy / 2.0); + return in_flag; +} + +template +__global__ void assign_pts_to_box3d(int batch_size, int pts_num, int boxes_num, + const T *xyz, const T *boxes3d, + int *pts_assign) { + // params xyz: (B, N, 3) + // params boxes3d: (B, M, 7) + // params pts_assign: (B, N, M): idx of the corresponding box3d, -1 means + // background points + int box_idx = blockIdx.y; + int bs_idx = blockIdx.z; + CUDA_1D_KERNEL_LOOP(pt_idx, pts_num) { + if (box_idx >= boxes_num || bs_idx >= batch_size) return; + + int assign_idx = + bs_idx * pts_num * boxes_num + pt_idx * boxes_num + box_idx; + pts_assign[assign_idx] = 0; + + int box_offset = bs_idx * boxes_num * 7 + box_idx * 7; + int pt_offset = bs_idx * pts_num * 3 + pt_idx * 3; + + T local_x = 0, local_y = 0; + int cur_in_flag = check_pt_in_box3d(xyz + pt_offset, boxes3d + box_offset, + local_x, local_y); + 
pts_assign[assign_idx] = cur_in_flag; + } +} + +__global__ void get_pooled_idx(int batch_size, int pts_num, int boxes_num, + int sampled_pts_num, const int *pts_assign, + int *pts_idx, int *pooled_empty_flag) { + // params xyz: (B, N, 3) + // params pts_feature: (B, N, C) + // params pts_assign: (B, N) + // params pts_idx: (B, M, 512) + // params pooled_empty_flag: (B, M) + CUDA_1D_KERNEL_LOOP(boxes_idx, boxes_num) { + int bs_idx = blockIdx.y; + + int cnt = 0; + for (int k = 0; k < pts_num; k++) { + if (pts_assign[bs_idx * pts_num * boxes_num + k * boxes_num + + boxes_idx]) { + if (cnt < sampled_pts_num) { + pts_idx[bs_idx * boxes_num * sampled_pts_num + + boxes_idx * sampled_pts_num + cnt] = k; + cnt++; + } else + break; + } + } + + if (cnt == 0) { + pooled_empty_flag[bs_idx * boxes_num + boxes_idx] = 1; + } else if (cnt < sampled_pts_num) { + // duplicate same points for sampling + for (int k = cnt; k < sampled_pts_num; k++) { + int duplicate_idx = k % cnt; + int base_offset = + bs_idx * boxes_num * sampled_pts_num + boxes_idx * sampled_pts_num; + pts_idx[base_offset + k] = pts_idx[base_offset + duplicate_idx]; + } + } + } +} + +template +__global__ void roipoint_pool3d_forward( + int batch_size, int pts_num, int boxes_num, int feature_in_len, + int sampled_pts_num, const T *xyz, const int *pts_idx, const T *pts_feature, + T *pooled_features, int *pooled_empty_flag) { + // params xyz: (B, N, 3) + // params pts_idx: (B, M, 512) + // params pts_feature: (B, N, C) + // params pooled_features: (B, M, 512, 3+C) + // params pooled_empty_flag: (B, M) + int box_idx = blockIdx.y; + int bs_idx = blockIdx.z; + CUDA_1D_KERNEL_LOOP(sample_pt_idx, sampled_pts_num) { + if (box_idx >= boxes_num || bs_idx >= batch_size) return; + if (pooled_empty_flag[bs_idx * boxes_num + box_idx]) return; + + int temp_idx = bs_idx * boxes_num * sampled_pts_num + + box_idx * sampled_pts_num + sample_pt_idx; + int src_pt_idx = pts_idx[temp_idx]; + int dst_feature_offset = temp_idx * (3 + 
feature_in_len); + + for (int j = 0; j < 3; j++) + pooled_features[dst_feature_offset + j] = + xyz[bs_idx * pts_num * 3 + src_pt_idx * 3 + j]; + + int src_feature_offset = + bs_idx * pts_num * feature_in_len + src_pt_idx * feature_in_len; + memcpy(pooled_features + dst_feature_offset + 3, + pts_feature + src_feature_offset, feature_in_len * sizeof(T)); + } +} + +#endif // ROIPOINT_POOL3D_CUDA_KERNEL_CUH diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/rotated_feature_align_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/rotated_feature_align_cuda_kernel.cuh new file mode 100644 index 0000000000000000000000000000000000000000..ffcc658ccb1f5e3059c0428159bc2e80fbeee3d4 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/rotated_feature_align_cuda_kernel.cuh @@ -0,0 +1,129 @@ +// Copyright (c) OpenMMLab. All rights reserved. +// Modified from +// https://github.com/SJTU-Thinklab-Det/r3det-on-mmdetection/blob/master/mmdet/ops/fr/src/feature_refine_kernel.cu +#ifndef ROTATED_FEATURE_ALIGN_CUDA_KERNEL_CUH +#define ROTATED_FEATURE_ALIGN_CUDA_KERNEL_CUH + +#ifdef MMCV_USE_PARROTS +#include "parrots_cuda_helper.hpp" +#else +#include "pytorch_cuda_helper.hpp" +#endif + +template +__global__ void rotated_feature_align_forward_kernel( + const int nthreads, const int points, const scalar_t* bottom_data, + const scalar_t* best_bboxes, const scalar_t spatial_scale, + const int channels, const int height, const int width, scalar_t* top_data) { + CUDA_1D_KERNEL_LOOP(index, nthreads) { + int w = index % width; + int h = (index / width) % height; + int c = (index / width / height) % channels; + int n = index / width / height / channels; + + const scalar_t* bbox_offset = + best_bboxes + ((n * height + h) * width + w) * 5; + scalar_t roi_y = bbox_offset[0] * spatial_scale; + scalar_t roi_x = bbox_offset[1] * spatial_scale; + + scalar_t px[5] = {roi_x, 0, 0, 0, 0}; + scalar_t py[5] = {roi_y, 0, 0, 0, 0}; + + if (points 
> 1) { + scalar_t roi_w = bbox_offset[2] * spatial_scale; + scalar_t roi_h = bbox_offset[3] * spatial_scale; + scalar_t roi_a = bbox_offset[4]; + + scalar_t w_2 = roi_w / 2, h_2 = roi_h / 2; + scalar_t cosa = cosf(roi_a), sina = sinf(roi_a); + scalar_t wx = cosa * w_2, wy = sina * w_2; + scalar_t hx = -sina * h_2, hy = cosa * h_2; + + px[1] = roi_x + wx + hx; + py[1] = roi_y + wy + hy; + px[2] = roi_x - wx + hx; + py[2] = roi_y - wy + hy; + px[3] = roi_x - wx - hx; + py[3] = roi_y - wy - hy; + px[4] = roi_x + wx - hx; + py[4] = roi_y + wy - hy; + } + + const scalar_t* offset_bottom_data = + bottom_data + (n * channels + c) * height * width; + + scalar_t output_val = bottom_data[index]; + for (int i = 0; i < points; i++) { + output_val += bilinear_interpolate(offset_bottom_data, height, + width, py[i], px[i], i); + } + top_data[index] = output_val; + } +} + +template +__global__ void rotated_feature_align_backward_kernel( + const int nthreads, const int points, const scalar_t* top_diff, + const scalar_t* best_bboxes, const scalar_t spatial_scale, + const int channels, const int height, const int width, + scalar_t* bottom_diff) { + CUDA_1D_KERNEL_LOOP(index, nthreads) { + int w = index % width; + int h = (index / width) % height; + int c = (index / width / height) % channels; + int n = index / width / height / channels; + + const scalar_t* bbox_offset = + best_bboxes + ((n * height + h) * width + w) * 5; + scalar_t roi_y = bbox_offset[0] * spatial_scale; + scalar_t roi_x = bbox_offset[1] * spatial_scale; + + scalar_t px[5] = {roi_x, 0, 0, 0, 0}; + scalar_t py[5] = {roi_y, 0, 0, 0, 0}; + + if (points > 1) { + scalar_t roi_w = bbox_offset[2] * spatial_scale; + scalar_t roi_h = bbox_offset[3] * spatial_scale; + scalar_t roi_a = bbox_offset[4]; + + scalar_t w_2 = roi_w / 2, h_2 = roi_h / 2; + scalar_t cosa = cosf(roi_a), sina = sinf(roi_a); + scalar_t wx = cosa * w_2, wy = sina * w_2; + scalar_t hx = -sina * h_2, hy = cosa * h_2; + + px[1] = roi_x + wx + hx; + py[1] = 
roi_y + wy + hy; + px[2] = roi_x - wx + hx; + py[2] = roi_y - wy + hy; + px[3] = roi_x - wx - hx; + py[3] = roi_y - wy - hy; + px[4] = roi_x + wx - hx; + py[4] = roi_y + wy - hy; + } + + scalar_t* offset_bottom_diff = + bottom_diff + (n * channels + c) * height * width; + scalar_t value_top_diff = top_diff[index]; + + atomicAdd(bottom_diff + index, value_top_diff); + for (int i = 0; i < points; i++) { + scalar_t w1, w2, w3, w4; + int x_low, x_high, y_low, y_high; + + bilinear_interpolate_gradient(height, width, py[i], px[i], w1, + w2, w3, w4, x_low, x_high, y_low, + y_high, i); + scalar_t g1 = value_top_diff * w1; + scalar_t g2 = value_top_diff * w2; + scalar_t g3 = value_top_diff * w3; + scalar_t g4 = value_top_diff * w4; + if (x_low >= 0 && x_high >= 0 && y_low >= 0 && y_high >= 0) { + atomicAdd(offset_bottom_diff + y_low * width + x_low, g1); + atomicAdd(offset_bottom_diff + y_low * width + x_high, g2); + atomicAdd(offset_bottom_diff + y_high * width + x_low, g3); + atomicAdd(offset_bottom_diff + y_high * width + x_high, g4); + } + } + } +} +#endif // ROTATED_FEATURE_ALIGN_CUDA_KERNEL_CUH diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/scatter_points_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/scatter_points_cuda_kernel.cuh new file mode 100644 index 0000000000000000000000000000000000000000..bc2f7a58746f141da698fcc9b77231a001ac5b11 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/scatter_points_cuda_kernel.cuh @@ -0,0 +1,189 @@ +// Copyright (c) OpenMMLab. 
// All rights reserved
#ifndef SCATTER_POINTS_CUDA_KERNEL_CUH
#define SCATTER_POINTS_CUDA_KERNEL_CUH

#ifdef MMCV_USE_PARROTS
#include "parrots_cuda_helper.hpp"
#else
#include "pytorch_cuda_helper.hpp"
#endif

// Reduction modes shared by the dynamic-scatter op.
typedef enum { SUM = 0, MEAN = 1, MAX = 2 } reduce_t;
int const maxGridDim = 50000;

// Lock-free "atomic max" for float: CAS-loop on the bit pattern.
// Loops until the stored value is >= val (extra condition guards against
// a concurrent writer lowering the observed value between iterations).
__device__ __forceinline__ static void reduceMax(float *address, float val) {
  int *address_as_i = reinterpret_cast<int *>(address);
  int old = *address_as_i, assumed;
  do {
    assumed = old;
    old = atomicCAS(address_as_i, assumed,
                    __float_as_int(fmaxf(val, __int_as_float(assumed))));
  } while (assumed != old || __int_as_float(old) < val);
}

// double overload of the CAS-based atomic max.
__device__ __forceinline__ static void reduceMax(double *address, double val) {
  unsigned long long *address_as_ull =
      reinterpret_cast<unsigned long long *>(address);
  unsigned long long old = *address_as_ull, assumed;
  do {
    assumed = old;
    old = atomicCAS(
        address_as_ull, assumed,
        __double_as_longlong(fmax(val, __longlong_as_double(assumed))));
  } while (assumed != old || __longlong_as_double(old) < val);
}

// get rid of meaningless warnings when compiling host code
// #ifdef MMCV_WITH_HIP
#if defined(MMCV_WITH_HIP) || defined(__ILUVATAR__)

// HIP / Iluvatar always provide native float/double atomicAdd.
__device__ __forceinline__ static void reduceAdd(float *address, float val) {
  atomicAdd(address, val);
}
__device__ __forceinline__ static void reduceAdd(double *address, double val) {
  atomicAdd(address, val);
}
#else
// #ifdef __CUDA_ARCH__
// Pre-sm_20 devices lack float atomicAdd: fall back to a CAS loop.
__device__ __forceinline__ static void reduceAdd(float *address, float val) {
#if (__CUDA_ARCH__ < 200)
#ifdef _MSC_VER
#pragma message( \
    "compute capability lower than 2.x. fall back to use CAS version of atomicAdd for float32")
#else
#warning \
    "compute capability lower than 2.x. fall back to use CAS version of atomicAdd for float32"
#endif
  int *address_as_i = reinterpret_cast<int *>(address);
  int old = *address_as_i, assumed;
  do {
    assumed = old;
    old = atomicCAS(address_as_i, assumed,
                    __float_as_int(val + __int_as_float(assumed)));
  } while (assumed != old);
#else
  atomicAdd(address, val);
#endif
}

// Pre-sm_60 devices lack double atomicAdd: fall back to a CAS loop.
__device__ __forceinline__ static void reduceAdd(double *address, double val) {
#if (__CUDA_ARCH__ < 600)
#ifdef _MSC_VER
#pragma message( \
    "compute capability lower than 6.x. fall back to use CAS version of atomicAdd for float64")
#else
#warning \
    "compute capability lower than 6.x. fall back to use CAS version of atomicAdd for float64"
#endif
  unsigned long long *address_as_ull =
      reinterpret_cast<unsigned long long *>(address);
  unsigned long long old = *address_as_ull, assumed;
  do {
    assumed = old;
    old = atomicCAS(address_as_ull, assumed,
                    __double_as_longlong(val + __longlong_as_double(assumed)));
  } while (assumed != old);
#else
  atomicAdd(address, val);
#endif
}
// #endif  // __CUDA_ARCH__
#endif  // MMCV_WITH_HIP

// Scatter each input point's features into its target voxel row, combining
// with atomic max (MAX) or atomic add (SUM / MEAN; MEAN is divided later).
template <typename T>
__global__ void feats_reduce_kernel(
    const T *feats, const int32_t *coors_map,
    T *reduced_feats,  // shall be 0 at initialization
    const int num_input, const int num_feats, const reduce_t reduce_type) {
  CUDA_1D_KERNEL_LOOP(x, num_input) {
    int32_t reduce_to = coors_map[x];
    if (reduce_to == -1) continue;  // point fell outside the voxel grid

    const T *feats_offset = feats + x * num_feats;
    T *reduced_feats_offset = reduced_feats + reduce_to * num_feats;
    if (reduce_type == reduce_t::MAX) {
      for (int i = 0; i < num_feats; i++) {
        reduceMax(&reduced_feats_offset[i], feats_offset[i]);
      }
    } else {
      for (int i = 0; i < num_feats; i++) {
        reduceAdd(&reduced_feats_offset[i], feats_offset[i]);
      }
    }
  }
}

// Backward of SUM/MEAN scatter: copy (or average) the reduced gradient
// back to every contributing input point.
template <typename T>
__global__ void add_reduce_traceback_grad_kernel(
    T *grad_feats, const T *grad_reduced_feats, const int32_t *coors_map,
    const int32_t *reduce_count, const int num_input, const int num_feats,
    const reduce_t reduce_type) {
  CUDA_1D_KERNEL_LOOP(x, num_input) {
    int32_t reduce_to = coors_map[x];
    if (reduce_to == -1) {
      continue;
    }

    const int input_offset = x * num_feats;
    T *grad_feats_offset = grad_feats + input_offset;
    const int reduced_offset = reduce_to * num_feats;
    const T *grad_reduced_feats_offset = grad_reduced_feats + reduced_offset;

    if (reduce_type == reduce_t::SUM) {
      for (int i = 0; i < num_feats; i++) {
        grad_feats_offset[i] = grad_reduced_feats_offset[i];
      }
    } else if (reduce_type == reduce_t::MEAN) {
      for (int i = 0; i < num_feats; i++) {
        grad_feats_offset[i] = grad_reduced_feats_offset[i] /
                               static_cast<T>(reduce_count[reduce_to]);
      }
    }
  }
}

// Backward helper of MAX scatter, pass 1: for every reduced feature, record
// the smallest input index whose value equals the max (deterministic winner).
template <typename T>
__global__ void max_reduce_traceback_scatter_idx_kernel(
    const T *feats, const T *reduced_feats, int32_t *reduce_from,
    const int32_t *coors_map, const int num_input, const int num_feats) {
  CUDA_1D_KERNEL_LOOP(x, num_input) {
    int32_t reduce_to = coors_map[x];

    const int input_offset = x * num_feats;
    const T *feats_offset = feats + input_offset;

    if (reduce_to == -1) {
      continue;
    }

    const int reduced_offset = reduce_to * num_feats;
    const T *reduced_feats_offset = reduced_feats + reduced_offset;
    int32_t *reduce_from_offset = reduce_from + reduced_offset;

    for (int i = 0; i < num_feats; i++) {
      if (feats_offset[i] == reduced_feats_offset[i]) {
        atomicMin(&reduce_from_offset[i], static_cast<int32_t>(x));
      }
    }
  }
}

// Backward of MAX scatter, pass 2: route each reduced gradient to the single
// winning input point found in pass 1.
template <typename T>
__global__ void max_reduce_scatter_grad_kernel(T *grad_feats,
                                               const T *grad_reduced_feats,
                                               const int32_t *reduce_from,
                                               const int num_reduced,
                                               const int num_feats) {
  CUDA_1D_KERNEL_LOOP(x, num_reduced) {
    const int reduced_offset = x * num_feats;
    const int32_t *scatter_to_offset = reduce_from + reduced_offset;
    const T *grad_reduced_feats_offset = grad_reduced_feats + reduced_offset;

    for (int i = 0; i < num_feats; i++) {
      grad_feats[scatter_to_offset[i] * num_feats + i] =
          grad_reduced_feats_offset[i];
    }
  }
}

#endif  // SCATTER_POINTS_CUDA_KERNEL_CUH
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/sigmoid_focal_loss_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/sigmoid_focal_loss_cuda_kernel.cuh
new file mode 100644
index 0000000000000000000000000000000000000000..1896b1d037efdf856433787e60def870a18bff35
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/sigmoid_focal_loss_cuda_kernel.cuh
@@ -0,0 +1,71 @@
// Copyright (c) OpenMMLab. All rights reserved
#ifndef SIGMOID_FOCAL_LOSS_CUDA_KERNEL_CUH
#define SIGMOID_FOCAL_LOSS_CUDA_KERNEL_CUH

#ifdef MMCV_USE_PARROTS
#include "parrots_cuda_helper.hpp"
#else
#include "pytorch_cuda_helper.hpp"
#endif

// Element-wise sigmoid focal loss (Lin et al., RetinaNet). One thread per
// (sample, class) logit; FLT_MIN clamps keep log() finite.
template <typename T>
__global__ void sigmoid_focal_loss_forward_cuda_kernel(
    const int nthreads, const T* input, const int32_t* target, const T* weight,
    T* output, const T gamma, const T alpha, const int num_classes) {
  CUDA_1D_KERNEL_LOOP(index, nthreads) {
    int n = index / num_classes;
    int c = index % num_classes;

    int32_t t = target[n];
    T flag_p = (t == c);
    T flag_n = (t != c);

    // p = sigmoid(x) = 1. / 1. + expf(-x)
    T p = (T)1. / ((T)1. + expf(-input[index]));

    // (1 - p)**gamma * log(p)
    T term_p = pow(((T)1. - p), gamma) * log(max(p, (T)FLT_MIN));
    // p**gamma * log(1 - p)
    T term_n = pow(p, gamma) * log(max((T)1. - p, (T)FLT_MIN));

    output[index] = (T)0.;
    output[index] += -flag_p * alpha * term_p;
    output[index] += -flag_n * ((T)1. - alpha) * term_n;
    if (weight != NULL) {
      output[index] *= weight[t];
    }
  }
}

// Gradient of the sigmoid focal loss w.r.t. the input logits.
template <typename T>
__global__ void sigmoid_focal_loss_backward_cuda_kernel(
    const int nthreads, const T* input, const int32_t* target, const T* weight,
    T* grad_input, const T gamma, const T alpha, const int num_classes) {
  CUDA_1D_KERNEL_LOOP(index, nthreads) {
    int n = index / num_classes;
    int c = index % num_classes;

    int32_t t = target[n];
    T flag_p = (t == c);
    T flag_n = (t != c);

    // p = sigmoid(x) = 1. / 1. + expf(-x)
    T p = (T)1. / ((T)1. + exp(-input[index]));

    // (1 - p)**gamma * (1 - p - gamma*p*log(p))
    T term_p = pow(((T)1. - p), gamma) *
               ((T)1. - p - (gamma * p * log(max(p, (T)FLT_MIN))));
    // p**gamma * (gamma * (1 - p) * log(1 - p) - p)
    T term_n = pow(p, gamma) *
               (gamma * ((T)1. - p) * log(max((T)1. - p, (T)FLT_MIN)) - p);

    grad_input[index] = (T)0.;
    grad_input[index] += -flag_p * alpha * term_p;
    grad_input[index] += -flag_n * ((T)1. - alpha) * term_n;
    if (weight != NULL) {
      grad_input[index] *= weight[t];
    }
  }
}

#endif  // SIGMOID_FOCAL_LOSS_CUDA_KERNEL_CUH
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/softmax_focal_loss_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/softmax_focal_loss_cuda_kernel.cuh
new file mode 100644
index 0000000000000000000000000000000000000000..58a07431e1cd2be48bbd07e8186c9053c63f2e30
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/softmax_focal_loss_cuda_kernel.cuh
@@ -0,0 +1,72 @@
// Copyright (c) OpenMMLab.
// All rights reserved
#ifndef SOFTMAX_FOCAL_LOSS_CUDA_KERNEL_CUH
#define SOFTMAX_FOCAL_LOSS_CUDA_KERNEL_CUH

#ifdef MMCV_USE_PARROTS
#include "parrots_cuda_helper.hpp"
#else
#include "pytorch_cuda_helper.hpp"
#endif

// Softmax focal loss forward: one thread per sample. `softmax` holds the
// pre-computed softmax probabilities; labels < 0 are ignored (loss 0).
// NOTE(review): weight[label] is read even when label < 0 — presumably the
// callers never pass weight together with negative labels; verify upstream.
template <typename T>
__global__ void softmax_focal_loss_forward_cuda_kernel(
    const int nthreads, const T* softmax, const int32_t* target,
    const T* weight, T* output, const T gamma, const T alpha,
    const int num_classes) {
  CUDA_1D_KERNEL_LOOP(index, nthreads) {
    int32_t label = target[index];
    T pred = softmax[index * num_classes + label];

    if (label >= 0) {
      output[index] =
          -alpha * pow((T)1. - pred, gamma) * log(max(pred, (T)FLT_MIN));
    } else {
      output[index] = 0;
    }
    if (weight != NULL) {
      output[index] *= weight[label];
    }
  }
}

// Backward pass 1: per-sample scalar coefficient d(loss)/d(p_label),
// stored in `buff` for pass 2.
template <typename T>
__global__ void softmax_focal_loss_backward_cuda1_kernel(
    const int nthreads, const T* softmax, const int32_t* target,
    const T* weight, T* buff, const T gamma, const T alpha,
    const int num_classes) {
  CUDA_1D_KERNEL_LOOP(index, nthreads) {
    int32_t label = target[index];
    T pred = softmax[index * num_classes + label];

    if (label >= 0) {
      buff[index] = alpha * (-pow((T)1. - pred, gamma) +
                             gamma * pow((T)1. - pred, gamma - 1) * pred *
                                 log(max(pred, (T)FLT_MIN)));
    } else {
      buff[index] = 0;
    }
    if (weight != NULL) {
      buff[index] *= weight[label];
    }
  }
}

// Backward pass 2: expand the per-sample coefficient through the softmax
// Jacobian, one thread per (sample, class) entry.
template <typename T>
__global__ void softmax_focal_loss_backward_cuda2_kernel(
    const int nthreads, const T* softmax, const int32_t* target, const T* buff,
    T* grad_input, const int num_classes) {
  CUDA_1D_KERNEL_LOOP(index, nthreads) {
    int n = index / num_classes;
    int c = index % num_classes;
    int32_t label = target[n];

    if (label >= 0) {
      T flag = (label == c ? (T)1. : (T)0.);
      grad_input[index] = buff[n] * (flag - softmax[index]);
    } else {
      grad_input[index] = 0;
    }
  }
}

#endif  // SOFTMAX_FOCAL_LOSS_CUDA_KERNEL_CUH
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/stack_ball_query_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/stack_ball_query_cuda_kernel.cuh
new file mode 100644
index 0000000000000000000000000000000000000000..06caefa18d47be11b6cb8770ceb8951479add902
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/stack_ball_query_cuda_kernel.cuh
@@ -0,0 +1,68 @@
// Copyright (c) OpenMMLab. All rights reserved
// Modified from
// https://github.com/sshaoshuai/Pointnet2.PyTorch/tree/master/pointnet2/src/ball_query_gpu.cu
#ifndef STACK_BALL_QUERY_CUDA_KERNEL_CUH
#define STACK_BALL_QUERY_CUDA_KERNEL_CUH

#ifdef MMCV_USE_PARROTS
#include "parrots_cuda_helper.hpp"
#else
#include "pytorch_cuda_helper.hpp"
#endif

// Ball query over "stacked" (variable-batch) point clouds: for each query
// center, collect up to `nsample` point indices within `radius`, restricted
// to the points of the center's own batch element.
template <typename T>
__global__ void stack_ball_query_forward_cuda_kernel(
    int B, int M, float radius, int nsample, const T *new_xyz,
    const int *new_xyz_batch_cnt, const T *xyz, const int *xyz_batch_cnt,
    int *idx) {
  // :param xyz: (N1 + N2 ..., 3) xyz coordinates of the features
  // :param xyz_batch_cnt: (batch_size), [N1, N2, ...]
  // :param new_xyz: (M1 + M2 ..., 3) centers of the ball query
  // :param new_xyz_batch_cnt: (batch_size), [M1, M2, ...]
  // output:
  //      idx: (M, nsample)
  const T *cur_xyz = xyz;
  int *cur_idx = idx;
  CUDA_1D_KERNEL_LOOP(pt_idx, M) {
    // Locate which batch element this query center belongs to.
    int bs_idx = 0;
    for (int pt_cnt = 0; bs_idx < B; bs_idx++) {
      pt_cnt += new_xyz_batch_cnt[bs_idx];
      if (pt_idx < pt_cnt) break;
    }

    int xyz_batch_start_idx = 0;
    for (int k = 0; k < bs_idx; k++) xyz_batch_start_idx += xyz_batch_cnt[k];

    const T *new_xyz_p = new_xyz + pt_idx * 3;
    cur_xyz += xyz_batch_start_idx * 3;
    cur_idx += pt_idx * nsample;

    float radius2 = radius * radius;
    T new_x = new_xyz_p[0];
    T new_y = new_xyz_p[1];
    T new_z = new_xyz_p[2];
    int n = xyz_batch_cnt[bs_idx];

    int cnt = 0;
    for (int k = 0; k < n; ++k) {
      T x = cur_xyz[k * 3 + 0];
      T y = cur_xyz[k * 3 + 1];
      T z = cur_xyz[k * 3 + 2];
      T d2 = (new_x - x) * (new_x - x) + (new_y - y) * (new_y - y) +
             (new_z - z) * (new_z - z);
      if (d2 < radius2) {
        // Pre-fill the whole row with the first hit so unfilled slots are
        // valid indices rather than garbage.
        if (cnt == 0) {
          for (int l = 0; l < nsample; ++l) {
            cur_idx[l] = k;
          }
        }
        cur_idx[cnt] = k;
        ++cnt;
        if (cnt >= nsample) break;
      }
    }
    if (cnt == 0) cur_idx[0] = -1;  // sentinel: no neighbor in the ball
  }
}

#endif  // STACK_BALL_QUERY_CUDA_KERNEL_CUH
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/stack_group_points_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/stack_group_points_cuda_kernel.cuh
new file mode 100644
index 0000000000000000000000000000000000000000..4ef3663d05bcd9146e15dd93bb979734538919cb
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/stack_group_points_cuda_kernel.cuh
@@ -0,0 +1,97 @@
// Copyright (c) OpenMMLab. All rights reserved.
+// Modified from +// https://github.com/sshaoshuai/Pointnet2.PyTorch/tree/master/pointnet2/src/group_points_gpu.cu +#ifndef STACK_GROUP_POINTS_CUDA_KERNEL_CUH +#define STACK_GROUP_POINTS_CUDA_KERNEL_CUH +#ifdef MMCV_USE_PARROTS +#include "parrots_cuda_helper.hpp" +#else +#include "pytorch_cuda_helper.hpp" +#endif +#include +template +__global__ void stack_group_points_forward_cuda_kernel( + int b, int c, int m, int nsample, const T *features, + const int *features_batch_cnt, const int *idx, const int *idx_batch_cnt, + T *out) { + // :param features: (N1 + N2 ..., C) tensor of features to group + // :param features_batch_cnt: (batch_size) [N1 + N2 ...] tensor containing the + // indices of features to group with :param idx: (M1 + M2 ..., nsample) tensor + // containing the indices of features to group with :param idx_batch_cnt: + // (batch_size) [M1 + M2 ...] tensor containing the indices of features to + // group with :return: + // output: (M1 + M2, C, nsample) tensor + CUDA_1D_KERNEL_LOOP(index, m * c * nsample) { + const T *cur_features = features; + const int *cur_idx = idx; + int sample_idx = index % nsample; + int c_idx = (index / nsample) % c; + int pt_idx = (index / nsample / c); + + if (pt_idx >= m || c_idx >= c || sample_idx >= nsample) return; + int bs_idx = 0, pt_cnt = idx_batch_cnt[0]; + for (int k = 1; k < b; k++) { + if (pt_idx < pt_cnt) break; + pt_cnt += idx_batch_cnt[k]; + bs_idx = k; + } + + int features_batch_start_idx = 0; + int features_batch_end_idx = features_batch_cnt[0]; + for (int k = 0; k < bs_idx; k++) { + features_batch_start_idx += features_batch_cnt[k]; + features_batch_end_idx = + features_batch_start_idx + features_batch_cnt[k + 1]; + } + cur_features += features_batch_start_idx * c; + + cur_idx += pt_idx * nsample + sample_idx; + int in_idx = cur_idx[0] * c + c_idx; + int out_idx = pt_idx * c * nsample + c_idx * nsample + sample_idx; + if (in_idx < features_batch_end_idx * c) { + out[out_idx] = cur_features[in_idx]; + } + } +} + 
+template +__global__ void stack_group_points_backward_cuda_kernel( + int b, int c, int m, int n, int nsample, const T *grad_out, const int *idx, + const int *idx_batch_cnt, const int *features_batch_cnt, T *grad_features) { + // :param grad_out: (M1 + M2 ..., C, nsample) tensor of the gradients of the + // output from forward :param idx: (M1 + M2 ..., nsample) tensor containing + // the indices of features to group with :param idx_batch_cnt: (batch_size) + // [M1 + M2 ...] tensor containing the indices of features to group with + // :param features_batch_cnt: (batch_size) [N1 + N2 ...] tensor containing the + // indices of features to group with :return: + // grad_features: (N1 + N2 ..., C) gradient of the features + CUDA_1D_KERNEL_LOOP(index, m * c * nsample) { + const T *cur_grad_out = grad_out; + const int *cur_idx = idx; + T *cur_grad_features = grad_features; + int sample_idx = index % nsample; + int c_idx = (index / nsample) % c; + int pt_idx = (index / nsample / c); + + if (pt_idx >= m || c_idx >= c || sample_idx >= nsample) return; + + int bs_idx = 0, pt_cnt = idx_batch_cnt[0]; + for (int k = 1; k < b; k++) { + if (pt_idx < pt_cnt) break; + pt_cnt += idx_batch_cnt[k]; + bs_idx = k; + } + + int features_batch_start_idx = 0; + for (int k = 0; k < bs_idx; k++) + features_batch_start_idx += features_batch_cnt[k]; + + cur_grad_out += pt_idx * c * nsample + c_idx * nsample + sample_idx; + cur_idx += pt_idx * nsample + sample_idx; + cur_grad_features += (features_batch_start_idx + cur_idx[0]) * c + c_idx; + + atomicAdd(cur_grad_features, cur_grad_out[0]); + } +} + +#endif // GROUP_POINTS_CUDA_KERNEL_CUH diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/sync_bn_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/sync_bn_cuda_kernel.cuh new file mode 100644 index 0000000000000000000000000000000000000000..4ec6a466886832d38c72da6e3a3574e72d53cec8 --- /dev/null +++ 
b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/sync_bn_cuda_kernel.cuh
@@ -0,0 +1,331 @@
// Copyright (c) OpenMMLab. All rights reserved
#ifndef SYNCBN_CUDA_KERNEL_CUH
#define SYNCBN_CUDA_KERNEL_CUH

#ifdef MMCV_USE_PARROTS
#include "parrots_cuda_helper.hpp"
#else
#include "pytorch_cuda_helper.hpp"
#endif

// Per-channel mean over (num, spatial); one block per channel, tree reduction
// in shared memory. Accumulation is in float regardless of T.
template <typename T>
__global__ void sync_bn_forward_mean_cuda_kernel(const T *input, float *mean,
                                                 int num, int channels,
                                                 int spatial) {
  __shared__ float buffer[THREADS_PER_BLOCK];
  int tid = threadIdx.x;
  int c = blockIdx.x;
  buffer[tid] = 0;
  for (int i = tid; i < num * spatial; i += blockDim.x) {
    int index = (i / spatial) * channels * spatial + c * spatial + i % spatial;
    buffer[tid] += input[index];
  }
  __syncthreads();

  for (int s = blockDim.x / 2; s > 0; s >>= 1) {
    if (tid < s) {
      buffer[tid] += buffer[tid + s];
    }
    __syncthreads();
  }
  int total = num * spatial;
  if (tid == 0) {
    mean[c] = buffer[0] / total;
  }
}

// Half-precision specialization: widen each element to float before summing.
template <>
__global__ void sync_bn_forward_mean_cuda_kernel(const phalf *input,
                                                 float *mean, int num,
                                                 int channels, int spatial) {
  __shared__ float buffer[THREADS_PER_BLOCK];
  int tid = threadIdx.x;
  int c = blockIdx.x;
  buffer[tid] = 0;
  for (int i = tid; i < num * spatial; i += blockDim.x) {
    int index = (i / spatial) * channels * spatial + c * spatial + i % spatial;
    buffer[tid] += static_cast<float>(input[index]);
  }
  __syncthreads();

  for (int s = blockDim.x / 2; s > 0; s >>= 1) {
    if (tid < s) {
      buffer[tid] += buffer[tid + s];
    }
    __syncthreads();
  }
  int total = num * spatial;
  if (tid == 0) {
    mean[c] = buffer[0] / total;
  }
}

// Per-channel biased variance given the precomputed mean; same block layout
// as the mean kernel.
template <typename T>
__global__ void sync_bn_forward_var_cuda_kernel(const T *input,
                                                const float *mean, float *var,
                                                int num, int channels,
                                                int spatial) {
  __shared__ float buffer[THREADS_PER_BLOCK];
  int tid = threadIdx.x;
  int c = blockIdx.x;
  buffer[tid] = 0;
  for (int i = tid; i < num * spatial; i += blockDim.x) {
    int index = (i / spatial) * channels * spatial + c * spatial + i % spatial;
    float td = input[index] - mean[c];
    buffer[tid] += td * td;
  }
  __syncthreads();
  for (int s = blockDim.x / 2; s > 0; s >>= 1) {
    if (tid < s) {
      buffer[tid] += buffer[tid + s];
    }
    __syncthreads();
  }
  int total = num * spatial;
  if (tid == 0) {
    var[c] = buffer[0] / total;
  }
}

// Half-precision specialization of the variance kernel.
template <>
__global__ void sync_bn_forward_var_cuda_kernel(const phalf *input,
                                                const float *mean, float *var,
                                                int num, int channels,
                                                int spatial) {
  __shared__ float buffer[THREADS_PER_BLOCK];
  int tid = threadIdx.x;
  int c = blockIdx.x;
  buffer[tid] = 0;
  for (int i = tid; i < num * spatial; i += blockDim.x) {
    int index = (i / spatial) * channels * spatial + c * spatial + i % spatial;
    float td = static_cast<float>(input[index]) - mean[c];
    buffer[tid] += td * td;
  }
  __syncthreads();
  for (int s = blockDim.x / 2; s > 0; s >>= 1) {
    if (tid < s) {
      buffer[tid] += buffer[tid + s];
    }
    __syncthreads();
  }
  int total = num * spatial;
  if (tid == 0) {
    var[c] = buffer[0] / total;
  }
}

// Normalize, optionally apply affine weight/bias, optionally stash the
// normalized values (`norm`) and per-channel std for the backward pass, and
// update running statistics (unbiased variance) from thread 0.
template <typename T>
__global__ void sync_bn_forward_output_cuda_kernel(
    const T *input, const float *mean, const float *var, float *running_mean,
    float *running_var, const float *weight, const float *bias, float *norm,
    float *std, T *output, int num, int channels, int spatial, float eps,
    float momentum, int group_size) {
  int tid = threadIdx.x;
  int c = blockIdx.x;
  float mean_value = mean[c];
  float std_value = sqrt(var[c] + eps);

  if (weight != nullptr) {
    float weight_value = weight[c];
    float bias_value = bias[c];
    if (norm != nullptr) {
      for (int i = tid; i < num * spatial; i += blockDim.x) {
        int index =
            (i / spatial) * channels * spatial + c * spatial + i % spatial;
        norm[index] = (input[index] - mean_value) / std_value;
        output[index] = norm[index] * weight_value + bias_value;
      }
    } else {
      for (int i = tid; i < num * spatial; i += blockDim.x) {
        int index =
            (i / spatial) * channels * spatial + c * spatial + i % spatial;
        output[index] =
            (input[index] - mean_value) / std_value * weight_value + bias_value;
      }
    }
  } else {
    if (norm != nullptr) {
      for (int i = tid; i < num * spatial; i += blockDim.x) {
        int index =
            (i / spatial) * channels * spatial + c * spatial + i % spatial;
        output[index] = norm[index] = (input[index] - mean_value) / std_value;
      }
    } else {
      for (int i = tid; i < num * spatial; i += blockDim.x) {
        int index =
            (i / spatial) * channels * spatial + c * spatial + i % spatial;
        output[index] = (input[index] - mean_value) / std_value;
      }
    }
  }
  if (tid == 0) {
    if (std != nullptr) std[c] = std_value;
    if (running_mean != nullptr) {
      running_mean[c] =
          momentum * mean_value + (1 - momentum) * running_mean[c];
      // group_size = number of synchronized workers; count is the global
      // element count used for the unbiased-variance correction.
      int count = num * spatial * group_size;
      float var_unbias = count > 1 ? var[c] * count / (count - 1) : var[c];
      running_var[c] = momentum * var_unbias + (1 - momentum) * running_var[c];
    }
  }
}

// Half-precision specialization: compute in float, store output as phalf.
template <>
__global__ void sync_bn_forward_output_cuda_kernel(
    const phalf *input, const float *mean, const float *var,
    float *running_mean, float *running_var, const float *weight,
    const float *bias, float *norm, float *std, phalf *output, int num,
    int channels, int spatial, float eps, float momentum, int group_size) {
  int tid = threadIdx.x;
  int c = blockIdx.x;
  float mean_value = mean[c];
  float std_value = sqrt(var[c] + eps);
  if (weight != nullptr) {
    float weight_value = weight[c];
    float bias_value = bias[c];
    if (norm != nullptr) {
      for (int i = tid; i < num * spatial; i += blockDim.x) {
        int index =
            (i / spatial) * channels * spatial + c * spatial + i % spatial;
        norm[index] =
            (static_cast<float>(input[index]) - mean_value) / std_value;
        output[index] =
            static_cast<phalf>(norm[index] * weight_value + bias_value);
      }
    } else {
      for (int i = tid; i < num * spatial; i += blockDim.x) {
        int index =
            (i / spatial) * channels * spatial + c * spatial + i % spatial;
        output[index] =
            static_cast<phalf>((static_cast<float>(input[index]) - mean_value) /
                                   std_value * weight_value +
                               bias_value);
      }
    }
  } else {
    if (norm != nullptr) {
      for (int i = tid; i < num * spatial; i += blockDim.x) {
        int index =
            (i / spatial) * channels * spatial + c * spatial + i % spatial;
        norm[index] =
            (static_cast<float>(input[index]) - mean_value) / std_value;
        output[index] = static_cast<phalf>(norm[index]);
      }
    } else {
      for (int i = tid; i < num * spatial; i += blockDim.x) {
        int index =
            (i / spatial) * channels * spatial + c * spatial + i % spatial;
        output[index] = static_cast<phalf>(
            (static_cast<float>(input[index]) - mean_value) / std_value);
      }
    }
  }
  if (tid == 0) {
    if (std != nullptr) std[c] = std_value;
    if (running_mean != nullptr) {
      running_mean[c] =
          momentum * mean_value + (1 - momentum) * running_mean[c];
      int count = num * spatial * group_size;
      float var_unbias = count > 1 ? var[c] * count / (count - 1) : var[c];
      running_var[c] = momentum * var_unbias + (1 - momentum) * running_var[c];
    }
  }
}

// Per-channel reduction of grad_weight (= sum g*norm) and grad_bias (= sum g)
// with a two-buffer shared-memory tree reduction.
template <typename T>
__global__ void sync_bn_backward_param_cuda_kernel(const T *grad_output,
                                                   const float *norm,
                                                   float *grad_weight,
                                                   float *grad_bias, int num,
                                                   int channels, int spatial) {
  __shared__ float buffer1[THREADS_PER_BLOCK];
  __shared__ float buffer2[THREADS_PER_BLOCK];

  int tid = threadIdx.x;
  int c = blockIdx.x;
  buffer1[tid] = buffer2[tid] = 0;
  for (int i = tid; i < num * spatial; i += blockDim.x) {
    int index = (i / spatial) * channels * spatial + c * spatial + i % spatial;
    buffer1[tid] += grad_output[index] * norm[index];
    buffer2[tid] += grad_output[index];
  }
  __syncthreads();

  for (int s = blockDim.x / 2; s > 0; s >>= 1) {
    if (tid < s) {
      buffer1[tid] += buffer1[tid + s];
      buffer2[tid] += buffer2[tid + s];
    }
    __syncthreads();
  }
  if (tid == 0) {
    grad_weight[c] = buffer1[0];
    grad_bias[c] = buffer2[0];
  }
}

// Half-precision specialization of the parameter-gradient kernel.
template <>
__global__ void sync_bn_backward_param_cuda_kernel(const phalf *grad_output,
                                                   const float *norm,
                                                   float *grad_weight,
                                                   float *grad_bias, int num,
                                                   int channels, int spatial) {
  __shared__ float buffer1[THREADS_PER_BLOCK];
  __shared__ float buffer2[THREADS_PER_BLOCK];

  int tid = threadIdx.x;
  int c = blockIdx.x;
  buffer1[tid] = buffer2[tid] = 0;
  for (int i = tid; i < num * spatial; i += blockDim.x) {
    int index = (i / spatial) * channels * spatial + c * spatial + i % spatial;
    buffer1[tid] += static_cast<float>(grad_output[index]) * norm[index];
    buffer2[tid] += static_cast<float>(grad_output[index]);
  }
  __syncthreads();

  for (int s = blockDim.x / 2; s > 0; s >>= 1) {
    if (tid < s) {
      buffer1[tid] += buffer1[tid + s];
      buffer2[tid] += buffer2[tid + s];
    }
    __syncthreads();
  }
  if (tid == 0) {
    grad_weight[c] = buffer1[0];
    grad_bias[c] = buffer2[0];
  }
}

// Input gradient of BatchNorm, fully element-wise given the precomputed
// per-channel reductions (grad_weight, grad_bias, std).
template <typename T>
__global__ void sync_bn_backward_data_cuda_kernel(
    int output_size, const T *grad_output, const float *weight,
    const float *grad_weight, const float *grad_bias, const float *norm,
    const float *std, T *grad_input, int num, int channels, int spatial) {
  int factor = num * spatial;
  CUDA_1D_KERNEL_LOOP(index, output_size) {
    int c = (index / spatial) % channels;
    grad_input[index] =
        weight[c] *
        (grad_output[index] -
         (grad_weight[c] * norm[index] + grad_bias[c]) / factor) /
        std[c];
  }
}

// Half-precision specialization of the input-gradient kernel.
template <>
__global__ void sync_bn_backward_data_cuda_kernel(
    int output_size, const phalf *grad_output, const float *weight,
    const float *grad_weight, const float *grad_bias, const float *norm,
    const float *std, phalf *grad_input, int num, int channels, int spatial) {
  int factor = num * spatial;
  CUDA_1D_KERNEL_LOOP(index, output_size) {
    int c = (index / spatial) % channels;
    grad_input[index] = static_cast<phalf>(
        weight[c] *
        (static_cast<float>(grad_output[index]) -
         (grad_weight[c] * norm[index] + grad_bias[c]) / factor) /
        std[c]);
  }
}

#endif  // SYNCBN_CUDA_KERNEL_CUH
diff --git
a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/three_interpolate_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/three_interpolate_cuda_kernel.cuh
new file mode 100644
index 0000000000000000000000000000000000000000..971b496e589d2210131351305cbaf0ed1a027cb1
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/three_interpolate_cuda_kernel.cuh
@@ -0,0 +1,61 @@
// Copyright (c) OpenMMLab. All rights reserved
#ifndef THREE_INTERPOLATE_CUDA_KERNEL_CUH
#define THREE_INTERPOLATE_CUDA_KERNEL_CUH

#ifdef MMCV_USE_PARROTS
#include "parrots_cuda_helper.hpp"
#else
#include "pytorch_cuda_helper.hpp"
#endif

// Weighted interpolation from the 3 nearest known points; grid is
// (x: points, y: channels, z: batch).
template <typename T>
__global__ void three_interpolate_forward_cuda_kernel(
    int b, int c, int m, int n, const T *points, const int *__restrict__ idx,
    const T *weight, T *out) {
  // points: (B, C, M)
  // idx: (B, N, 3)
  // weight: (B, N, 3)
  // output:
  //      out: (B, C, N)

  int bs_idx = blockIdx.z;
  int c_idx = blockIdx.y;
  CUDA_1D_KERNEL_LOOP(pt_idx, n) {
    if (bs_idx >= b || c_idx >= c) return;

    weight += bs_idx * n * 3 + pt_idx * 3;
    points += bs_idx * c * m + c_idx * m;
    idx += bs_idx * n * 3 + pt_idx * 3;
    out += bs_idx * c * n + c_idx * n;

    out[pt_idx] = weight[0] * points[idx[0]] + weight[1] * points[idx[1]] +
                  weight[2] * points[idx[2]];
  }
}

// Backward: scatter-add each output gradient to its 3 source points,
// scaled by the interpolation weights.
template <typename T>
__global__ void three_interpolate_backward_cuda_kernel(
    int b, int c, int n, int m, const T *grad_out, const int *__restrict__ idx,
    const T *weight, T *grad_points) {
  // grad_out: (B, C, N)
  // weight: (B, N, 3)
  // output:
  //      grad_points: (B, C, M)

  int bs_idx = blockIdx.z;
  int c_idx = blockIdx.y;
  CUDA_1D_KERNEL_LOOP(pt_idx, n) {
    if (bs_idx >= b || c_idx >= c) return;

    grad_out += bs_idx * c * n + c_idx * n + pt_idx;
    weight += bs_idx * n * 3 + pt_idx * 3;
    grad_points += bs_idx * c * m + c_idx * m;
    idx += bs_idx * n * 3 + pt_idx * 3;

    atomicAdd(grad_points + idx[0], grad_out[0] * weight[0]);
    atomicAdd(grad_points + idx[1], grad_out[0] * weight[1]);
    atomicAdd(grad_points + idx[2], grad_out[0] * weight[2]);
  }
}

#endif  // THREE_INTERPOLATE_CUDA_KERNEL_CUH
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/three_nn_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/three_nn_cuda_kernel.cuh
new file mode 100644
index 0000000000000000000000000000000000000000..f6b91f9c2d5ac304d47998be20f15943c1dd33fb
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/three_nn_cuda_kernel.cuh
@@ -0,0 +1,72 @@
// Copyright (c) OpenMMLab. All rights reserved
#ifndef THREE_NN_CUDA_KERNEL_CUH
#define THREE_NN_CUDA_KERNEL_CUH

#ifdef MMCV_USE_PARROTS
#include "parrots_cuda_helper.hpp"
#else
#include "pytorch_cuda_helper.hpp"
#endif

// Brute-force 3-nearest-neighbor search (squared distances) of each unknown
// point against the known set; grid y indexes the batch.
template <typename T>
__global__ void three_nn_forward_cuda_kernel(int b, int n, int m,
                                             const T *unknown, const T *known,
                                             T *dist2, int *__restrict__ idx) {
  // unknown: (B, N, 3)
  // known: (B, M, 3)
  // output:
  //      dist2: (B, N, 3)
  //      idx: (B, N, 3)

  int bs_idx = blockIdx.y;
  CUDA_1D_KERNEL_LOOP(pt_idx, n) {
    if (bs_idx >= b) return;

    unknown += bs_idx * n * 3 + pt_idx * 3;
    known += bs_idx * m * 3;
    dist2 += bs_idx * n * 3 + pt_idx * 3;
    idx += bs_idx * n * 3 + pt_idx * 3;

    T ux = unknown[0];
    T uy = unknown[1];
    T uz = unknown[2];
    // "Infinity" sentinels; Iluvatar float cannot represent 1e40.
#if defined(__ILUVATAR__)
    // float max: 3.4e38
    float best1 = 3e38, best2 = 3e38, best3 = 3e38;
#else
    float best1 = 1e40, best2 = 1e40, best3 = 1e40;
#endif

    int besti1 = 0, besti2 = 0, besti3 = 0;
    for (int k = 0; k < m; ++k) {
      T x = known[k * 3 + 0];
      T y = known[k * 3 + 1];
      T z = known[k * 3 + 2];
      T d = (ux - x) * (ux - x) + (uy - y) * (uy - y) + (uz - z) * (uz - z);
      // Maintain the three smallest distances seen so far, in order.
      if (d < best1) {
        best3 = best2;
        besti3 = besti2;
        best2 = best1;
        besti2 = besti1;
        best1 = d;
        besti1 = k;
      } else if (d < best2) {
        best3 = best2;
        besti3 = besti2;
        best2 = d;
        besti2 = k;
      } else if (d < best3) {
        best3 = d;
        besti3 = k;
      }
    }
    dist2[0] = best1;
    dist2[1] = best2;
    dist2[2] = best3;
    idx[0] = besti1;
    idx[1] = besti2;
    idx[2] = besti3;
  }
}

#endif  // THREE_NN_CUDA_KERNEL_CUH
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/tin_shift_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/tin_shift_cuda_kernel.cuh
new file mode 100644
index 0000000000000000000000000000000000000000..4d1159a515f4de2666c25ba4bd5e4f2cbbca1e10
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/tin_shift_cuda_kernel.cuh
@@ -0,0 +1,61 @@
// Copyright (c) OpenMMLab. All rights reserved
#ifndef TIN_SHIFT_CUDA_KERNEL_CUH
#define TIN_SHIFT_CUDA_KERNEL_CUH

#ifdef MMCV_USE_PARROTS
#include "parrots_cuda_helper.hpp"
#else
#include "pytorch_cuda_helper.hpp"
#endif

// Temporal Interlace shift: move each channel group along the time axis by
// its per-(batch, group) shift; out-of-range frames are dropped (output
// assumed zero-initialized).
template <typename T>
__global__ void tin_shift_forward_cuda_kernel(
    const int nthreads, const T* input, const int* shift, T* output,
    const int batch_size, const int channels, const int t_size,
    const int hw_size, const int group_size, const int group_channel) {
  CUDA_1D_KERNEL_LOOP(index, nthreads) {
    const int hw_index = index % hw_size;
    const int j = (index / hw_size) % channels;

    const int n_index = (index / hw_size / channels) % batch_size;
    int group_id = j / group_channel;
    int t_shift = shift[n_index * group_size + group_id];
    int offset = n_index * t_size * hw_size * channels + hw_size * j + hw_index;
    for (int i = 0; i < t_size; i++) {
      int now_t = i + t_shift;
      int data_id = i * hw_size * channels + offset;
      if (now_t < 0 || now_t >= t_size) {
        continue;
      }
      int out_id = now_t * hw_size * channels + offset;
      output[out_id] = input[data_id];
    }
  }
}

// Backward is structurally identical to forward.
// NOTE(review): the host side presumably passes negated shifts for the
// backward pass — confirm against the launcher.
template <typename T>
__global__ void tin_shift_backward_cuda_kernel(
    const int nthreads, const T* input, const int* shift, T* output,
    const int batch_size, const int channels, const int t_size,
    const int hw_size, const int group_size, const int group_channel) {
  CUDA_1D_KERNEL_LOOP(index, nthreads) {
    const int hw_index = index % hw_size;
    const int j = (index / hw_size) % channels;

    const int n_index = (index / hw_size / channels) % batch_size;
    int group_id = j / group_channel;
    int t_shift = shift[n_index * group_size + group_id];
    int offset = n_index * t_size * hw_size * channels + hw_size * j + hw_index;
    for (int i = 0; i < t_size; i++) {
      int now_t = i + t_shift;
      int data_id = i * hw_size * channels + offset;
      if (now_t < 0 || now_t >= t_size) {
        continue;
      }
      int out_id = now_t * hw_size * channels + offset;
      output[out_id] = input[data_id];
    }
  }
}

#endif  // TIN_SHIFT_CUDA_KERNEL_CUH
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/voxelization_cuda_kernel.cuh b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/voxelization_cuda_kernel.cuh
new file mode 100644
index 0000000000000000000000000000000000000000..021b488d8d716c9e8132173bf04491d42b7b6fa2
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/cuda/voxelization_cuda_kernel.cuh
@@ -0,0 +1,216 @@
// Copyright (c) OpenMMLab. All rights reserved.
+#ifndef VOXELIZATION_CUDA_KERNEL_CUH +#define VOXELIZATION_CUDA_KERNEL_CUH + +#ifdef MMCV_USE_PARROTS +#include "parrots_cuda_helper.hpp" +#else +#include "pytorch_cuda_helper.hpp" +#endif + +typedef enum { SUM = 0, MEAN = 1, MAX = 2 } reduce_t; + +template +__global__ void dynamic_voxelize_kernel( + const T* points, T_int* coors, const float voxel_x, const float voxel_y, + const float voxel_z, const float coors_x_min, const float coors_y_min, + const float coors_z_min, const float coors_x_max, const float coors_y_max, + const float coors_z_max, const int grid_x, const int grid_y, + const int grid_z, const int num_points, const int num_features, + const int NDim) { + // const int index = blockIdx.x * threadsPerBlock + threadIdx.x; + CUDA_1D_KERNEL_LOOP(index, num_points) { + // To save some computation + auto points_offset = points + index * num_features; + auto coors_offset = coors + index * NDim; + int c_x = floorf((points_offset[0] - coors_x_min) / voxel_x); + if (c_x < 0 || c_x >= grid_x) { + coors_offset[0] = -1; + continue; + } + + int c_y = floorf((points_offset[1] - coors_y_min) / voxel_y); + if (c_y < 0 || c_y >= grid_y) { + coors_offset[0] = -1; + coors_offset[1] = -1; + continue; + } + + int c_z = floorf((points_offset[2] - coors_z_min) / voxel_z); + if (c_z < 0 || c_z >= grid_z) { + coors_offset[0] = -1; + coors_offset[1] = -1; + coors_offset[2] = -1; + } else { + coors_offset[0] = c_z; + coors_offset[1] = c_y; + coors_offset[2] = c_x; + } + } +} + +template +__global__ void assign_point_to_voxel(const int nthreads, const T* points, + T_int* point_to_voxelidx, + T_int* coor_to_voxelidx, T* voxels, + const int max_points, + const int num_features, + const int num_points, const int NDim) { + CUDA_1D_KERNEL_LOOP(thread_idx, nthreads) { + // const int index = blockIdx.x * threadsPerBlock + threadIdx.x; + int index = thread_idx / num_features; + + int num = point_to_voxelidx[index]; + int voxelidx = coor_to_voxelidx[index]; + if (num > -1 && voxelidx > -1) 
{ + auto voxels_offset = + voxels + voxelidx * max_points * num_features + num * num_features; + + int k = thread_idx % num_features; + voxels_offset[k] = points[thread_idx]; + } + } +} + +template +__global__ void assign_voxel_coors(const int nthreads, T_int* coor, + T_int* point_to_voxelidx, + T_int* coor_to_voxelidx, T_int* voxel_coors, + const int num_points, const int NDim) { + CUDA_1D_KERNEL_LOOP(thread_idx, nthreads) { + // const int index = blockIdx.x * threadsPerBlock + threadIdx.x; + // if (index >= num_points) return; + int index = thread_idx / NDim; + int num = point_to_voxelidx[index]; + int voxelidx = coor_to_voxelidx[index]; + if (num == 0 && voxelidx > -1) { + auto coors_offset = voxel_coors + voxelidx * NDim; + int k = thread_idx % NDim; + coors_offset[k] = coor[thread_idx]; + } + } +} + +template +__global__ void point_to_voxelidx_kernel(const T_int* coor, + T_int* point_to_voxelidx, + T_int* point_to_pointidx, + const int max_points, + const int max_voxels, + const int num_points, const int NDim) { + CUDA_1D_KERNEL_LOOP(index, num_points) { + auto coor_offset = coor + index * NDim; + // skip invalid points + if (coor_offset[0] == -1) continue; + + int num = 0; + int coor_x = coor_offset[0]; + int coor_y = coor_offset[1]; + int coor_z = coor_offset[2]; + // only calculate the coors before this coor[index] + for (int i = 0; i < index; ++i) { + auto prev_coor = coor + i * NDim; + if (prev_coor[0] == -1) continue; + + // Find all previous points that have the same coors + // if find the same coor, record it + if ((prev_coor[0] == coor_x) && (prev_coor[1] == coor_y) && + (prev_coor[2] == coor_z)) { + num++; + if (num == 1) { + // point to the same coor that first show up + point_to_pointidx[index] = i; + } else if (num >= max_points) { + // out of boundary + break; + } + } + } + if (num == 0) { + point_to_pointidx[index] = index; + } + if (num < max_points) { + point_to_voxelidx[index] = num; + } + } +} + +template +__global__ void determin_voxel_num( 
+ // const T_int* coor, + T_int* num_points_per_voxel, T_int* point_to_voxelidx, + T_int* point_to_pointidx, T_int* coor_to_voxelidx, T_int* voxel_num, + const int max_points, const int max_voxels, const int num_points) { + // only calculate the coors before this coor[index] + for (int i = 0; i < num_points; ++i) { + int point_pos_in_voxel = point_to_voxelidx[i]; + // record voxel + if (point_pos_in_voxel == -1) { + // out of max_points or invalid point + continue; + } else if (point_pos_in_voxel == 0) { + // record new voxel + int voxelidx = voxel_num[0]; + if (voxel_num[0] >= max_voxels) continue; + voxel_num[0] += 1; + coor_to_voxelidx[i] = voxelidx; + num_points_per_voxel[voxelidx] = 1; + } else { + int point_idx = point_to_pointidx[i]; + int voxelidx = coor_to_voxelidx[point_idx]; + if (voxelidx != -1) { + coor_to_voxelidx[i] = voxelidx; + num_points_per_voxel[voxelidx] += 1; + } + } + } +} + +__global__ void nondeterministic_get_assign_pos( + const int nthreads, const int32_t* coors_map, int32_t* pts_id, + int32_t* coors_count, int32_t* reduce_count, int32_t* coors_order) { + CUDA_1D_KERNEL_LOOP(thread_idx, nthreads) { + int coors_idx = coors_map[thread_idx]; + if (coors_idx > -1) { + int32_t coors_pts_pos = atomicAdd(&reduce_count[coors_idx], 1); + pts_id[thread_idx] = coors_pts_pos; + if (coors_pts_pos == 0) { + coors_order[coors_idx] = atomicAdd(coors_count, 1); + } + } + } +} + +template +__global__ void nondeterministic_assign_point_voxel( + const int nthreads, const T* points, const int32_t* coors_map, + const int32_t* pts_id, const int32_t* coors_in, const int32_t* reduce_count, + const int32_t* coors_order, T* voxels, int32_t* coors, int32_t* pts_count, + const int max_voxels, const int max_points, const int num_features, + const int NDim) { + CUDA_1D_KERNEL_LOOP(thread_idx, nthreads) { + int coors_idx = coors_map[thread_idx]; + int coors_pts_pos = pts_id[thread_idx]; + if (coors_idx > -1 && coors_pts_pos < max_points) { + int coors_pos = 
coors_order[coors_idx]; + if (coors_pos < max_voxels) { + auto voxels_offset = + voxels + (coors_pos * max_points + coors_pts_pos) * num_features; + auto points_offset = points + thread_idx * num_features; + for (int k = 0; k < num_features; k++) { + voxels_offset[k] = points_offset[k]; + } + if (coors_pts_pos == 0) { + pts_count[coors_pos] = min(reduce_count[coors_idx], max_points); + auto coors_offset = coors + coors_pos * NDim; + auto coors_in_offset = coors_in + coors_idx * NDim; + for (int k = 0; k < NDim; k++) { + coors_offset[k] = coors_in_offset[k]; + } + } + } + } + } +} + +#endif // VOXELIZATION_CUDA_KERNEL_CUH diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/bbox_overlaps_mlu_kernel.mlu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/bbox_overlaps_mlu_kernel.mlu new file mode 100644 index 0000000000000000000000000000000000000000..0f273d2508d58aa26fa22f86180c843fc3bca90a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/bbox_overlaps_mlu_kernel.mlu @@ -0,0 +1,322 @@ +/************************************************************************* + * Copyright (C) 2021 Cambricon. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
+ *************************************************************************/ +#include + +#include "common_mlu_helper.hpp" + +#define COORD_NUM 4 + +__nram__ char nmem_buf[MAX_NRAM_SIZE]; + +template +__mlu_func__ void computeDiv(void *nram_dst, void *nram_src0, void *nram_src1, + void *nram_addition, const int32_t deal_num) { + __bang_active_reciphp((T *)nram_dst, (T *)nram_src1, deal_num); + __bang_mul((T *)nram_dst, (T *)nram_src0, (T *)nram_dst, deal_num); +} + +template <> +__mlu_func__ void computeDiv(void *nram_dst, void *nram_src0, + void *nram_src1, void *nram_addition, + const int32_t deal_num) { + __bang_half2float((float *)nram_addition, (half *)nram_src1, deal_num); + __bang_active_reciphp((float *)nram_addition, (float *)nram_addition, + deal_num); + __bang_float2half_rd((half *)nram_src1, (float *)nram_addition, deal_num); + __bang_mul((half *)nram_dst, (half *)nram_src0, (half *)nram_src1, deal_num); +} + +template +__mlu_func__ void bboxOverlapsWorkflow( + T *vec_b1_x1, T *vec_b1_y1, T *vec_b1_x2, T *vec_b1_y2, T *vec_b2_x1, + T *vec_b2_y1, T *vec_b2_x2, T *vec_b2_y2, T *vec_left, T *vec_right, + T *vec_top, T *vec_bottom, const T *bbox1, const T *bbox2, void *ious, + const int32_t offset, const int32_t mode, const int32_t batches_stride, + const int32_t num_bbox1, const int32_t num_bbox2, const bool aligned) { + int32_t task_batch_stride = (num_bbox1 + taskDim - 1) / taskDim; + int32_t batch_start = taskId * task_batch_stride; + int32_t batch_per_task = batch_start + task_batch_stride < num_bbox1 + ? task_batch_stride + : num_bbox1 - batch_start; + batch_per_task = batch_per_task > 0 ? batch_per_task : (0); + + if (aligned) { + int32_t num_loop_cpy = batch_per_task / batches_stride; + int32_t num_rem_cpy_batches = batch_per_task % batches_stride; + num_loop_cpy = num_rem_cpy_batches > 0 ? 
num_loop_cpy + 1 : num_loop_cpy; + for (int32_t i = 0; i < num_loop_cpy; i++) { + int32_t index = batch_start + i * batches_stride; + int32_t handle_batches = index + batches_stride > num_bbox1 + ? num_rem_cpy_batches + : batches_stride; + int32_t b1 = index; + int32_t b2 = index; + + int32_t base1 = b1 * COORD_NUM; + __memcpy(vec_b1_x1, &bbox1[base1], sizeof(T), GDRAM2NRAM, sizeof(T), + COORD_NUM * sizeof(T), handle_batches - 1); + __memcpy(vec_b1_y1, &bbox1[base1 + 1], sizeof(T), GDRAM2NRAM, sizeof(T), + COORD_NUM * sizeof(T), handle_batches - 1); + __memcpy(vec_b1_x2, &bbox1[base1 + 2], sizeof(T), GDRAM2NRAM, sizeof(T), + COORD_NUM * sizeof(T), handle_batches - 1); + __memcpy(vec_b1_y2, &bbox1[base1 + 3], sizeof(T), GDRAM2NRAM, sizeof(T), + COORD_NUM * sizeof(T), handle_batches - 1); + + int32_t base2 = b2 * COORD_NUM; + __memcpy(vec_b2_x1, &bbox2[base2], sizeof(T), GDRAM2NRAM, sizeof(T), + COORD_NUM * sizeof(T), handle_batches - 1); + __memcpy(vec_b2_y1, &bbox2[base2 + 1], sizeof(T), GDRAM2NRAM, sizeof(T), + COORD_NUM * sizeof(T), handle_batches - 1); + __memcpy(vec_b2_x2, &bbox2[base2 + 2], sizeof(T), GDRAM2NRAM, sizeof(T), + COORD_NUM * sizeof(T), handle_batches - 1); + __memcpy(vec_b2_y2, &bbox2[base2 + 3], sizeof(T), GDRAM2NRAM, sizeof(T), + COORD_NUM * sizeof(T), handle_batches - 1); + // get the width and height + __bang_maxequal(vec_left, vec_b1_x1, vec_b2_x1, batches_stride); + __bang_minequal(vec_right, vec_b1_x2, vec_b2_x2, batches_stride); + __bang_maxequal(vec_top, vec_b1_y1, vec_b2_y1, batches_stride); + __bang_minequal(vec_bottom, vec_b1_y2, vec_b2_y2, batches_stride); + + // right - left + offset ---> left + __bang_sub(vec_left, vec_right, vec_left, batches_stride); + __bang_add_scalar(vec_left, vec_left, (T)offset, batches_stride); + + // bottom - top + offset ---> right + __bang_sub(vec_right, vec_bottom, vec_top, batches_stride); + __bang_add_scalar(vec_right, vec_right, (T)offset, batches_stride); + + // zero vector ---> bottom + 
__bang_write_value(vec_bottom, batches_stride, 0.f); + + // width --> vec_left + __bang_maxequal(vec_left, vec_bottom, vec_left, batches_stride); + T *width = vec_left; + // height --> vec_right + __bang_maxequal(vec_right, vec_bottom, vec_right, batches_stride); + T *height = vec_right; + + // get the b1_area + // (b1_x2 - b1_x1 + offset) ---> vec_top + __bang_sub(vec_top, vec_b1_x2, vec_b1_x1, batches_stride); + __bang_add_scalar(vec_top, vec_top, (T)offset, batches_stride); + + // (b1_y2 - b1_y1 + offset) ---> vec_bottom + __bang_sub(vec_bottom, vec_b1_y2, vec_b1_y1, batches_stride); + __bang_add_scalar(vec_bottom, vec_bottom, (T)offset, batches_stride); + + // b1_area = (b1_x2 - b1_x1 + offset) * (b1_y2 - b1_y1 + offset) + // ---> vec_top; + __bang_mul(vec_top, vec_top, vec_bottom, batches_stride); + T *b1_area = vec_top; + + // get the b2_area + // (b2_x2 - b2_x1 + offset) ---> b2_x1 + __bang_sub(vec_b2_x1, vec_b2_x2, vec_b2_x1, batches_stride); + __bang_add_scalar(vec_b2_x1, vec_b2_x1, (T)offset, batches_stride); + + // (b2_y2 - b2_y1 + offset) ---> b2_y1 + __bang_sub(vec_b2_y1, vec_b2_y2, vec_b2_y1, batches_stride); + __bang_add_scalar(vec_b2_y1, vec_b2_y1, (T)offset, batches_stride); + + // b2_area = (b2_x2 - b2_x1 + offset) * (b2_y2 - b2_y1 + offset) + // ---> b2_x1; + __bang_mul(vec_b2_x1, vec_b2_x1, vec_b2_y1, batches_stride); + T *b2_area = vec_b2_x1; + + // inter_s = width * height + __bang_mul(height, width, height, batches_stride); + T *inter_s = height; + + // offset vector ---> vec_b2_y1 + __bang_write_value(vec_b2_y1, batches_stride, T(offset)); + T *vec_offset = vec_b2_y1; + + if (mode == 0) { + __bang_add(b1_area, b1_area, b2_area, batches_stride); + __bang_sub(b1_area, b1_area, inter_s, batches_stride); + __bang_maxequal(b1_area, vec_offset, b1_area, batches_stride); + } else { + __bang_maxequal(b1_area, vec_offset, b1_area, batches_stride); + } + T *base_s = b1_area; + + // ious = inter_s / base_s + computeDiv(width, inter_s, base_s, 
vec_b2_x2, batches_stride); + __memcpy((T *)ious + index, width, handle_batches * sizeof(T), + NRAM2GDRAM); + } + } else { + int32_t num_loop_cpy = num_bbox2 / batches_stride; + int32_t num_rem_cpy_batches = num_bbox2 % batches_stride; + num_loop_cpy = num_rem_cpy_batches > 0 ? num_loop_cpy + 1 : num_loop_cpy; + for (int32_t i = 0; i < batch_per_task; i++) { + int32_t index1 = batch_start + i; + int32_t b1 = index1; + int32_t base1 = b1 * COORD_NUM; + + // set bbox1 and bbox2 to nram + __bang_write_value(vec_b1_x1, batches_stride, bbox1[base1]); + __bang_write_value(vec_b1_y1, batches_stride, bbox1[base1 + 1]); + __bang_write_value(vec_b1_x2, batches_stride, bbox1[base1 + 2]); + __bang_write_value(vec_b1_y2, batches_stride, bbox1[base1 + 3]); + + for (int32_t j = 0; j < num_loop_cpy; j++) { + int32_t index2 = j * batches_stride; + int32_t handle_batches = index2 + batches_stride > num_bbox2 + ? num_rem_cpy_batches + : batches_stride; + int32_t b2 = index2; + int32_t base2 = b2 * COORD_NUM; + + // copy bbox2 to nram + __memcpy(vec_b2_x1, &bbox2[base2], sizeof(T), GDRAM2NRAM, sizeof(T), + COORD_NUM * sizeof(T), handle_batches - 1); + __memcpy(vec_b2_y1, &bbox2[base2 + 1], sizeof(T), GDRAM2NRAM, sizeof(T), + COORD_NUM * sizeof(T), handle_batches - 1); + __memcpy(vec_b2_x2, &bbox2[base2 + 2], sizeof(T), GDRAM2NRAM, sizeof(T), + COORD_NUM * sizeof(T), handle_batches - 1); + __memcpy(vec_b2_y2, &bbox2[base2 + 3], sizeof(T), GDRAM2NRAM, sizeof(T), + COORD_NUM * sizeof(T), handle_batches - 1); + + // get the width and height + __bang_maxequal(vec_left, vec_b1_x1, vec_b2_x1, batches_stride); + __bang_minequal(vec_right, vec_b1_x2, vec_b2_x2, batches_stride); + __bang_maxequal(vec_top, vec_b1_y1, vec_b2_y1, batches_stride); + __bang_minequal(vec_bottom, vec_b1_y2, vec_b2_y2, batches_stride); + + // right - left + offset ---> left + __bang_sub(vec_left, vec_right, vec_left, batches_stride); + __bang_add_scalar(vec_left, vec_left, (T)offset, batches_stride); + // bottom - top 
+ offset ---> right + __bang_sub(vec_right, vec_bottom, vec_top, batches_stride); + __bang_add_scalar(vec_right, vec_right, (T)offset, batches_stride); + + // zero vector ---> bottom + __bang_write_value(vec_bottom, batches_stride, (T)0); + + // width --> vec_left + __bang_maxequal(vec_left, vec_bottom, vec_left, batches_stride); + T *width = vec_left; + // height --> vec_right + __bang_maxequal(vec_right, vec_bottom, vec_right, batches_stride); + T *height = vec_right; + + // get the b1_area + // (b1_x2 - b1_x1 + offset) ---> vec_top + __bang_sub(vec_top, vec_b1_x2, vec_b1_x1, batches_stride); + __bang_add_scalar(vec_top, vec_top, (T)offset, batches_stride); + // (b1_y2 - b1_y1 + offset) ---> vec_bottom + __bang_sub(vec_bottom, vec_b1_y2, vec_b1_y1, batches_stride); + __bang_add_scalar(vec_bottom, vec_bottom, (T)offset, batches_stride); + // b1_area = (b1_x2 - b1_x1 + offset) * (b1_y2 - b1_y1 + offset) + // ---> vec_top; + __bang_mul(vec_top, vec_top, vec_bottom, batches_stride); + T *b1_area = vec_top; + + // get the b2_area + // (b2_x2 - b2_x1 + offset) ---> b2_x1 + __bang_sub(vec_b2_x1, vec_b2_x2, vec_b2_x1, batches_stride); + __bang_add_scalar(vec_b2_x1, vec_b2_x1, (T)offset, batches_stride); + // (b2_y2 - b2_y1 + offset) ---> b2_y1 + __bang_sub(vec_b2_y1, vec_b2_y2, vec_b2_y1, batches_stride); + __bang_add_scalar(vec_b2_y1, vec_b2_y1, (T)offset, batches_stride); + // b2_area = (b2_x2 - b2_x1 + offset) * (b2_y2 - b2_y1 + offset) + // ---> b2_x1; + __bang_mul(vec_b2_x1, vec_b2_x1, vec_b2_y1, batches_stride); + T *b2_area = vec_b2_x1; + + // inter_s = width * height + __bang_mul(height, width, height, batches_stride); + T *inter_s = height; + + // offset vector ---> vec_b2_y1 + __bang_write_value(vec_b2_y1, batches_stride, T(offset)); + T *vec_offset = vec_b2_y1; + + if (mode == 0) { + __bang_add(b1_area, b1_area, b2_area, batches_stride); + __bang_sub(b1_area, b1_area, inter_s, batches_stride); + __bang_maxequal(b1_area, vec_offset, b1_area, batches_stride); + 
} else { + __bang_maxequal(b1_area, vec_offset, b1_area, batches_stride); + } + T *base_s = b1_area; + + // ious = inter_s / base_s + computeDiv(width, inter_s, base_s, vec_b2_x2, batches_stride); + int32_t gdram_offset = index1 * num_bbox2 + index2; + __memcpy((T *)ious + gdram_offset, width, handle_batches * sizeof(T), + NRAM2GDRAM); + } + } + } +} + +template +__mlu_global__ void MLUUnion1KernelBBoxOverlaps( + const void *bbox1, const void *bbox2, void *ious, const int32_t num_bbox1, + const int32_t num_bbox2, const int32_t mode, const bool aligned, + const int32_t offset) { + /* + * NRAM partition + * |-------------------------------------------------------------| + * | vec_b1_x1 | vec_b1_y1 | vec_b1_x2 | vec_b1_y2 | + * |-------------------------------------------------------------| + * | vec_b2_x1 | vec_b2_y1 | vec_b2_x2 | vec_b2_y2 | + * |-------------------------------------------------------------| + * | vec_left | vec_right | vec_top | vec_bottom | + * |-------------------------------------------------------------| + * + */ + const int32_t align_bytes = PAD_DOWN(MAX_NRAM_SIZE, NFU_ALIGN_SIZE); + const int32_t split_nram_num = 12; + const int32_t nram_stride = + align_bytes / NFU_ALIGN_SIZE / split_nram_num * NFU_ALIGN_SIZE; + + void *vec_b1_x1 = nmem_buf; + void *vec_b1_y1 = nmem_buf + nram_stride; + void *vec_b1_x2 = nmem_buf + 2 * nram_stride; + void *vec_b1_y2 = nmem_buf + 3 * nram_stride; + + void *vec_b2_x1 = nmem_buf + 4 * nram_stride; + void *vec_b2_y1 = nmem_buf + 5 * nram_stride; + void *vec_b2_x2 = nmem_buf + 6 * nram_stride; + void *vec_b2_y2 = nmem_buf + 7 * nram_stride; + + void *vec_left = nmem_buf + 8 * nram_stride; + void *vec_right = nmem_buf + 9 * nram_stride; + void *vec_top = nmem_buf + 10 * nram_stride; + void *vec_bottom = nmem_buf + 11 * nram_stride; + + const int32_t vec_length = nram_stride / sizeof(T); + bboxOverlapsWorkflow((T *)vec_b1_x1, (T *)vec_b1_y1, (T *)vec_b1_x2, + (T *)vec_b1_y2, (T *)vec_b2_x1, (T *)vec_b2_y1, + (T 
*)vec_b2_x2, (T *)vec_b2_y2, (T *)vec_left, + (T *)vec_right, (T *)vec_top, (T *)vec_bottom, + (T *)bbox1, (T *)bbox2, (T *)ious, offset, mode, + vec_length, num_bbox1, num_bbox2, aligned); +} + +void KernelBBoxOverlaps(cnrtDim3_t k_dim, cnrtFunctionType_t k_type, + cnrtQueue_t queue, const cnrtDataType_t d_type, + const void *bbox1, const void *bbox2, void *ious, + const int32_t num_bbox1, const int32_t num_bbox2, + const int32_t mode, const bool aligned, + const int32_t offset) { + if (d_type == CNRT_FLOAT16) { + MLUUnion1KernelBBoxOverlaps<<>>( + bbox1, bbox2, ious, num_bbox1, num_bbox2, mode, aligned, offset); + } else { + MLUUnion1KernelBBoxOverlaps<<>>( + bbox1, bbox2, ious, num_bbox1, num_bbox2, mode, aligned, offset); + } +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/carafe_mlu_kernel.mlu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/carafe_mlu_kernel.mlu new file mode 100644 index 0000000000000000000000000000000000000000..8dd6a8e58221758d8bbe99730e8c02813341beea --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/carafe_mlu_kernel.mlu @@ -0,0 +1,552 @@ +/************************************************************************* + * Copyright (C) 2022 Cambricon. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
+ *************************************************************************/ +#include "carafe_utils.hpp" +#include "common_mlu_helper.hpp" + +#define INDEX3(n, h, w, c, strN, strH, strW) \ + (strN) * (n) + (strH) * (h) + (strW) * (w) + (c) + +#define NRAM_BLOCK PAD_DOWN(MAX_NRAM_SIZE / 5, NRAM_ALIGN_SIZE) + +__nram__ char nram_buf[MAX_NRAM_SIZE]; + +namespace forward { +struct BlockId { + int Ho; + int Wo; + int G; + int Cg; + int Kh; + int Kw; + int Hi; + int Wi; +}; + +// start indices of block +struct BlockStart { + int Ho; + int Wo; + int G; + int Cg; + int Kh; + int Kw; + int Hi; + int Wi; + int C; +}; + +struct BlockEnd { + int Ho; + int Wo; + int Kh; + int Kw; + int Hi; + int Wi; +}; + +struct BlockSize { + int Ho; + int Wo; + int G; + int Cg; + int Kh; + int Kw; + int Hi; + int Wi; +}; + +template +__mlu_func__ void carafeForwardBLOCK(T *input, T *mask, + const CarafeForwardParam param, + const CarafeForwardBlockDim block_dim, + const CarafeForwardGridDim grid_dim, + T *output) { + // data block info + BlockId blkId; + BlockStart blkStart; + BlockEnd blkEnd; + BlockSize blkSize; + + // set pointers on NRAM arrays + + // input_nram[blkDim_(Hi+Kh)-1, blkDim_(Wi+Kw)-1, blkDim_(G*Cg)] + T *input_nram = (T *)nram_buf; + + // mask_nram[blkDim_Ho, blkDim_Wo, blkDim_(G*Kh*Kw)] + T *mask_nram = input_nram + param.input_nram_size; + + // output_nram[blkDim_Ho, blkDim_Wo, blkDim_(G*Cg)] + T *output_nram = mask_nram + param.mask_nram_size; + + // sum_array[blkDim_(G*Cg)] + T *sum_array = output_nram + param.output_nram_size; + + /* ===== loop over N, grid_dim(Ho,Wo,G,Cg) + * iterations are distributed over computing cores + */ + for (int loop_index = taskId; loop_index < param.job_num; + loop_index += taskDim) { + // block idx + blkId.Cg = loop_index; + blkId.G = blkId.Cg / grid_dim.Cg; + blkId.Wo = blkId.G / grid_dim.G; + blkId.Ho = blkId.Wo / grid_dim.Wo; + int sample_idx = blkId.Ho / grid_dim.Ho; + + blkId.Cg %= grid_dim.Cg; + blkId.G %= grid_dim.G; + blkId.Wo %= 
grid_dim.Wo; + blkId.Ho %= grid_dim.Ho; + + // block starting indices + blkStart.Ho = blkId.Ho * block_dim.Ho; + blkStart.Wo = blkId.Wo * block_dim.Wo; + blkStart.G = blkId.G * block_dim.G; + blkStart.Cg = blkId.Cg * block_dim.Cg; + blkStart.C = blkStart.G * param.Cg + blkStart.Cg; + + // block size + blkSize.Ho = block_dim.Ho; + blkSize.Wo = block_dim.Wo; + blkSize.G = block_dim.G; + blkSize.Cg = block_dim.Cg; + + // take care of blocks near the end of each dimension + if (blkId.Ho == (grid_dim.Ho - 1)) { + blkSize.Ho = param.Ho - (grid_dim.Ho - 1) * block_dim.Ho; + } + if (blkId.Wo == (grid_dim.Wo - 1)) { + blkSize.Wo = param.Wo - (grid_dim.Wo - 1) * block_dim.Wo; + } + if (blkId.G == (grid_dim.G - 1)) { + blkSize.G = param.group_size - (grid_dim.G - 1) * block_dim.G; + } + if (blkId.Cg == (grid_dim.Cg - 1)) { + blkSize.Cg = param.Cg - (grid_dim.Cg - 1) * block_dim.Cg; + } + + // block end indices + blkEnd.Ho = blkStart.Ho + blkSize.Ho - 1; + blkEnd.Wo = blkStart.Wo + blkSize.Wo - 1; + + // set output_nram to zero + __bang_write_value(output_nram, param.output_nram_size, T(0)); + + // loop blocks of kernel window: grid_dim.(Kh, Kw) + for (blkId.Kh = 0; blkId.Kh < grid_dim.Kh; ++blkId.Kh) { + blkStart.Kh = blkId.Kh * block_dim.Kh; + blkSize.Kh = block_dim.Kh; + if (blkId.Kh == (grid_dim.Kh - 1)) { + blkSize.Kh = param.kernel_size - (grid_dim.Kh - 1) * block_dim.Kh; + } + blkEnd.Kh = blkStart.Kh + blkSize.Kh - 1; + + blkStart.Hi = blkStart.Ho / param.scale_factor - param.kernel_size_half + + blkStart.Kh; + blkEnd.Hi = + blkEnd.Ho / param.scale_factor - param.kernel_size_half + blkEnd.Kh; + blkSize.Hi = blkEnd.Hi - blkStart.Hi + 1; + + for (blkId.Kw = 0; blkId.Kw < grid_dim.Kw; ++blkId.Kw) { + blkStart.Kw = blkId.Kw * block_dim.Kw; + blkSize.Kw = block_dim.Kw; + if (blkId.Kw == (grid_dim.Kw - 1)) { + blkSize.Kw = param.kernel_size - (grid_dim.Kw - 1) * block_dim.Kw; + } + blkEnd.Kw = blkStart.Kw + blkSize.Kw - 1; + + blkStart.Wi = blkStart.Wo / param.scale_factor - 
+ param.kernel_size_half + blkStart.Kw; + blkEnd.Wi = + blkEnd.Wo / param.scale_factor - param.kernel_size_half + blkEnd.Kw; + blkSize.Wi = blkEnd.Wi - blkStart.Wi + 1; + + // load input block from gdram2nram + // + // input_nram[ | input[ sample_idx, + // 0:blkSize.Hi-1, | blkStart.Hi + 0:blkSize.Hi-1, + // 0:blkSize.Wi-1, | blkStart.Wi + 0:blkSize.Wi-1, + // 0:blkSize.G-1 | blkStart.G + 0:blkSize.G-1 + // 0:blkSize.Cg-1] | blkStart.Cg + 0:blkSize.Cg-1] + // + // To skip out of bound indices: + // + // input_nram[ + // hi_start_local:hi_end_local, + // wi_start_local:wi_end_local, ...] + // = input[n, + // hi_start_global:hi_end_global, + // wi_start_global:wi_end_global, ...] + // + int hi_start_local = 0; + int hi_start_global = blkStart.Hi; + if (blkStart.Hi < 0) { + hi_start_local = -blkStart.Hi; + hi_start_global = 0; + } + int wi_start_local = 0; + int wi_start_global = blkStart.Wi; + if (blkStart.Wi < 0) { + wi_start_local = -blkStart.Wi; + wi_start_global = 0; + } + int hi_end_local = blkSize.Hi - 1; + int hi_end_global = blkEnd.Hi; + if (blkEnd.Hi > param.Hi - 1) { + hi_end_global = param.Hi - 1; + hi_end_local -= blkEnd.Hi - hi_end_global; + } + int wi_end_local = blkSize.Wi - 1; + int wi_end_global = blkEnd.Wi; + if (blkEnd.Wi > param.Wi - 1) { + wi_end_global = param.Wi - 1; + wi_end_local -= blkEnd.Wi - wi_end_global; + } + + int dst_offset = param.input_nram_stride_h * hi_start_local + + param.input_nram_stride_w * wi_start_local; + T *dst = input_nram + dst_offset; + + int src_offset = INDEX3(sample_idx, hi_start_global, wi_start_global, + blkStart.C, param.input_stride_n, + param.input_stride_h, param.input_stride_w); + T *src = input + src_offset; + + int input_seg_num_h = hi_end_local - hi_start_local + 1; + int input_seg_num_w = wi_end_local - wi_start_local + 1; + for (int i = 0; i < input_seg_num_h; ++i) { + loadStr3D(dst, src, blkSize.Cg, blkSize.G, input_seg_num_w, + param.input_nram_stride_g, param.input_nram_stride_w, + 
param.input_stride_g, param.input_stride_w); + dst += param.input_nram_stride_h; + src += param.input_stride_h; + } + + /* load mask block from gdram2nram + * + * mask_nram[ | mask[sample_idx, + * 0:blkSize.Ho-1 , | blkStart.Ho + 0:blkSize.Ho-1, + * 0:blkSize.Wo-1, | blkStart.Wo + 0:blkSize.Wo-1, + * 0:blkSize.G-1, | blkStart.G + 0:blkSize.G-1, + * 0:blkSize.Kh-1, | blkStart.Kh + 0:blkSize.Kh-1, + * 0:blkSize.Kw-1] | blkStart.Kw + 0:blkSize.Kw-1] + */ + src_offset = INDEX3(blkStart.Wo, blkStart.G, blkStart.Kh, blkStart.Kw, + param.mask_stride_w, param.mask_stride_g, + param.mask_stride_kh); + src_offset += sample_idx * param.mask_stride_n + + blkStart.Ho * param.mask_stride_h; + + for (int ho = 0; ho < blkSize.Ho; ++ho) { + dst = mask_nram + ho * param.mask_nram_stride_h; + src = mask + src_offset + ho * param.mask_stride_h; + + for (int wo = 0; wo < blkSize.Wo; ++wo) { + loadStr3D(dst, src, blkSize.Kw, blkSize.Kh, blkSize.G, + param.mask_nram_stride_kh, param.mask_nram_stride_g, + param.mask_stride_kh, param.mask_stride_g); + dst += param.mask_nram_stride_w; + src += param.mask_stride_w; + } + } + + // loop each pixel of the output block + for (int ho = 0; ho < blkSize.Ho; ++ho) { + int kernel_hi_start_global = (blkStart.Ho + ho) / param.scale_factor - + param.kernel_size_half + blkStart.Kh; + int kernel_hi_start_local = kernel_hi_start_global - blkStart.Hi; + + // int kernel_hi_end_global = kernel_hi_start_global + blkSize.Kh - 1; + // int kernel_hi_end_local = kernel_hi_end_global - blkStart.Hi; + + // exclude out of bound indices which should be ignored + int kh_min = hi_start_local - kernel_hi_start_local > 0 + ? hi_start_local - kernel_hi_start_local + : 0; + int kh_max = hi_end_local - kernel_hi_start_local < blkSize.Kh - 1 + ? 
hi_end_local - kernel_hi_start_local + : blkSize.Kh - 1; + + for (int wo = 0; wo < blkSize.Wo; ++wo) { + int kernel_wi_start_global = + (blkStart.Wo + wo) / param.scale_factor - + param.kernel_size_half + blkStart.Kw; + int kernel_wi_start_local = kernel_wi_start_global - blkStart.Wi; + + // exclude out of bound indices which should be ignored + int kw_min = wi_start_local - kernel_wi_start_local > 0 + ? wi_start_local - kernel_wi_start_local + : 0; + int kw_max = wi_end_local - kernel_wi_start_local < blkSize.Kw - 1 + ? wi_end_local - kernel_wi_start_local + : blkSize.Kw - 1; + + // output_nram[ho, wo, g, c] = sum(mask_nram[ho, wo, g, kh, kw] + // * input_nram[hi+kh, wi+kw, g, c], + // for (kh,kw) in [0:blkSize.Kw-1] x [0:blkSize.Kh-1]) + // + // sum(mask_nram[ho, wo, g, kh, kw] + // * input_nram[hi+kh, wi+kw, g, c], (kh,kw)) + // + T *mask_array = mask_nram + param.mask_nram_stride_h * ho + + param.mask_nram_stride_w * wo; + + for (int kh = kh_min; kh <= kh_max; ++kh) { + for (int kw = kw_min; kw <= kw_max; ++kw) { + T *src = + input_nram + + param.input_nram_stride_h * (kernel_hi_start_local + kh) + + param.input_nram_stride_w * (kernel_wi_start_local + kw); + + int mask_index = param.mask_nram_stride_kh * kh + kw; + + // multiply mask weight with channels for each channel group + T *sum = sum_array; + + for (int g = 0; g < blkSize.G; ++g) { + __bang_mul_scalar(sum, src, mask_array[mask_index], + param.block_Cg_NFU); + // + // NOTE: Since block_Cg_NFU >= block_Cg_stride, + // overlapped writing may occur on sum_array. + // So this loop must be executed in order to + // avoid data contamination, as shown below. + // + // |-----block_Cg_NFU---------| + // xxxxxxxxxxxxxxxxxxxxyyyzzzzz------------ + // |---block_Cg_stride---|^^^^^will be overwritten + // in the next iteration.
+ // + // x: actual data used, y: not used, z: overwritten + // + sum += param.input_nram_stride_g; + src += param.input_nram_stride_g; + mask_index += param.mask_nram_stride_g; + } // loop blk_G + + // add array[blk_G * blk_C] to output_nram + dst = output_nram + param.output_nram_stride_h * ho + + param.output_nram_stride_w * wo; + + __bang_add(dst, dst, sum_array, param.output_nram_stride_w); + } // end loop blk_Kw + } // end loop blk_Kh + } // end loop blk_Wo + } // end loop blk_Ho + } // end loop grid_dim.Kw + } // end loop grid_dim.Kh + + /* write output from nram2gdram + * + * output_nram[ | output[sample_idx, + * 0:blkSize.Ho-1, | blkStart.Ho + 0:blkSize.Ho-1, + * 0:blkSize.Wo-1, | blkStart.Wo + 0:blkSize.Wo-1, + * 0:blkSize.G-1, | blkStart.G + 0:blkSize.G-1, + * 0:blkSize.Cg-1] | blkStart.Cg + 0:blkSize.Cg-1] + */ + int dst_offset = INDEX3(sample_idx, blkStart.Ho, blkStart.Wo, blkStart.C, + param.output_stride_n, param.output_stride_h, + param.output_stride_w); + T *dst = output + dst_offset; + T *src = output_nram; + for (int i = 0; i < blkSize.Ho; ++i) { + storeStr3D(dst, src, blkSize.Cg, blkSize.G, blkSize.Wo, + param.output_stride_g, param.output_stride_w, + param.output_nram_stride_g, param.output_nram_stride_w); + dst += param.output_stride_h; + src += param.output_nram_stride_h; + } + } // end loop N, grid_dim.(Hi,Wi,G,Cg) +} + +template +__mlu_global__ void MLUBLOCKKernelCarafeForward( + const void *input, const void *mask, const CarafeForwardParam param, + const CarafeForwardBlockDim block_dim, const CarafeForwardGridDim grid_dim, + void *output) { + carafeForwardBLOCK((T *)input, (T *)mask, param, block_dim, grid_dim, + (T *)output); +} +} // namespace forward + +namespace backward { +template +__mlu_func__ void CarafeCompute(T *input, T *mask, T *grad_output, + T *grad_input, T *grad_mask, const int n, + const int hi, const int wi, const int c, + const int k_up, const int group, + const int scale) { + char *input_buff = nram_buf; + char 
*mask_buff = input_buff + NRAM_BLOCK; + char *grad_input_buff = mask_buff + NRAM_BLOCK; + char *grad_output_buff = grad_input_buff + NRAM_BLOCK; + char *grad_mask_buff = grad_output_buff + NRAM_BLOCK; + + int wo = wi * scale; + int ho = hi * scale; + int out_num = n * ho * wo * group; + int group_size = c / group; + int repeat = out_num / taskDim + (int)(taskId < out_num % taskDim); + int num_align = PAD_DOWN(NRAM_BLOCK / sizeof(T), NFU_ALIGN_SIZE / sizeof(T)); + int num_per_loop = group_size / num_align; + int rem_for_loop = group_size % num_align; + int rem_for_loop_align = PAD_UP(rem_for_loop, NFU_ALIGN_SIZE / sizeof(T)); + for (int k = 0; k < repeat; k++) { + int iter = k * taskDim + taskId; + int group_k = iter % group; + int w_k = (iter / group) % wo; + int h_k = (iter / wo / group) % ho; + int n_k = (iter / ho / wo / group) % n; + int h_i = h_k / scale; + int w_i = w_k / scale; + int start_h = h_i - ((k_up - 1) / 2); + int end_h = h_i + ((k_up - 1) / 2) + 1; + int start_w = w_i - ((k_up - 1) / 2); + int end_w = w_i + ((k_up - 1) / 2) + 1; + T *base_mask = (T *)mask + n_k * ho * wo * group * k_up * k_up + + h_k * wo * group * k_up * k_up + w_k * group * k_up * k_up + + group_k * k_up * k_up; + T *base_grad_mask = (T *)grad_mask + n_k * ho * wo * group * k_up * k_up + + h_k * wo * group * k_up * k_up + + w_k * group * k_up * k_up + group_k * k_up * k_up; + + __bang_write_zero((T *)grad_input_buff, NRAM_BLOCK / sizeof(T)); + __bang_write_zero((T *)grad_mask_buff, NRAM_BLOCK / sizeof(T)); + __bang_write_zero((T *)grad_output_buff, NRAM_BLOCK / sizeof(T)); + + __memcpy((T *)mask_buff, (T *)base_mask, k_up * k_up * sizeof(T), + GDRAM2NRAM); + for (int i = 0; i < num_per_loop; i++) { + __bang_write_zero((T *)input_buff, NRAM_BLOCK / sizeof(T)); + T *base_grad_output = (T *)grad_output + n_k * ho * wo * c + + h_k * wo * c + w_k * c + group_k * group_size + + i * num_align; + __memcpy((T *)grad_output_buff, (T *)base_grad_output, + num_align * sizeof(T), GDRAM2NRAM); 
+ for (int ih = start_h; ih < end_h; ih++) { + for (int iw = start_w; iw < end_w; iw++) { + if (ih < 0 || ih > hi - 1 || iw < 0 || iw > wi - 1) { + continue; + } + int mask_ih = ih - h_i + (k_up - 1) / 2; + int mask_iw = iw - w_i + (k_up - 1) / 2; + int mask_index = mask_ih * k_up + mask_iw; + int input_index = n_k * hi * wi * c + ih * wi * c + iw * c + + group_k * group_size + i * num_align; + T *base_input = (T *)input + input_index; + T *base_grad_input = (T *)grad_input + input_index; + __memcpy((T *)input_buff, (T *)base_input, num_align * sizeof(T), + GDRAM2NRAM); + __bang_mul_scalar((T *)grad_input_buff, (T *)grad_output_buff, + ((T *)mask_buff)[mask_index], num_align); + __bang_atomic_add((T *)grad_input_buff, (T *)base_grad_input, + (T *)grad_input_buff, num_align); + __bang_mul((T *)input_buff, (T *)grad_output_buff, (T *)input_buff, + num_align); + + __bang_sumpool((T *)input_buff, (T *)input_buff, + NFU_ALIGN_SIZE / sizeof(T), + num_align / (NFU_ALIGN_SIZE / sizeof(T)), 1, + num_align / (NFU_ALIGN_SIZE / sizeof(T)), 1, 1, 1); + + __bang_reduce_sum((T *)input_buff, (T *)input_buff, + NFU_ALIGN_SIZE / sizeof(T)); + ((T *)grad_mask_buff)[mask_index] += ((T *)input_buff)[0]; + } + } + } + if (rem_for_loop) { + __bang_write_zero((T *)input_buff, NRAM_BLOCK / sizeof(T)); + T *base_grad_output = (T *)grad_output + n_k * ho * wo * c + + h_k * wo * c + w_k * c + group_k * group_size + + num_per_loop * num_align; + __memcpy((T *)grad_output_buff, (T *)base_grad_output, + rem_for_loop * sizeof(T), GDRAM2NRAM); + for (int ih = start_h; ih < end_h; ih++) { + for (int iw = start_w; iw < end_w; iw++) { + if (ih < 0 || ih > hi - 1 || iw < 0 || iw > wi - 1) { + continue; + } + int mask_ih = ih - h_i + (k_up - 1) / 2; + int mask_iw = iw - w_i + (k_up - 1) / 2; + int mask_index = mask_ih * k_up + mask_iw; + int input_index = n_k * hi * wi * c + ih * wi * c + iw * c + + group_k * group_size + num_per_loop * num_align; + T *base_input = (T *)input + input_index; + T 
*base_grad_input = (T *)grad_input + input_index; + __memcpy((T *)input_buff, (T *)base_input, rem_for_loop * sizeof(T), + GDRAM2NRAM); + __bang_mul_scalar((T *)grad_input_buff, (T *)grad_output_buff, + ((T *)mask_buff)[mask_index], rem_for_loop_align); + __bang_atomic_add((T *)grad_input_buff, (T *)base_grad_input, + (T *)grad_input_buff, rem_for_loop); + __bang_mul((T *)input_buff, (T *)grad_output_buff, (T *)input_buff, + rem_for_loop_align); + + __bang_sumpool( + (T *)input_buff, (T *)input_buff, NFU_ALIGN_SIZE / sizeof(T), + rem_for_loop_align / (NFU_ALIGN_SIZE / sizeof(T)), 1, + rem_for_loop_align / (NFU_ALIGN_SIZE / sizeof(T)), 1, 1, 1); + __bang_reduce_sum((T *)input_buff, (T *)input_buff, + NFU_ALIGN_SIZE / sizeof(T)); + + ((T *)grad_mask_buff)[mask_index] += ((T *)input_buff)[0]; + } + } + } + __memcpy((T *)base_grad_mask, (T *)grad_mask_buff, k_up * k_up * sizeof(T), + NRAM2GDRAM); + } +} + +template +__mlu_global__ void MLUUnion1KernelCarafeBackward( + const void *input, const void *mask, const void *grad_output, + void *grad_input, void *grad_mask, const int n, const int hi, const int wi, + const int c, const int k_up, const int group, const int scale) { + CarafeCompute((T *)input, (T *)mask, (T *)grad_output, (T *)grad_input, + (T *)grad_mask, n, hi, wi, c, k_up, group, scale); +} +} // namespace backward + +void KernelCarafeForward(cnrtDim3_t k_dim, cnrtFunctionType_t k_type, + cnrtQueue_t queue, const cnrtDataType_t d_type, + const void *input, const void *mask, + const CarafeForwardParam ¶m, + const CarafeForwardBlockDim &block_dim, + const CarafeForwardGridDim &grid_dim, void *output) { + if (d_type == CNRT_FLOAT16) { + forward::MLUBLOCKKernelCarafeForward<<>>( + input, mask, param, block_dim, grid_dim, output); + } else { + forward::MLUBLOCKKernelCarafeForward<<>>( + input, mask, param, block_dim, grid_dim, output); + } +} + +void KernelCarafeBackward(cnrtDim3_t k_dim, cnrtFunctionType_t k_type, + cnrtQueue_t queue, cnrtDataType_t dtype, + const 
void *input, const void *mask, + const void *grad_output, void *grad_input, + void *grad_mask, const int n, const int hi, + const int wi, const int c, const int k_up, + const int group, const int scale) { + if (dtype == CNRT_FLOAT16) { + backward::MLUUnion1KernelCarafeBackward<<>>( + input, mask, grad_output, grad_input, grad_mask, n, hi, wi, c, k_up, + group, scale); + } else { + backward::MLUUnion1KernelCarafeBackward<<>>( + input, mask, grad_output, grad_input, grad_mask, n, hi, wi, c, k_up, + group, scale); + } +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/carafe_utils.hpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/carafe_utils.hpp new file mode 100644 index 0000000000000000000000000000000000000000..09ca60ab1111a52f9f1d1bb20b7ef4e9cef99247 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/carafe_utils.hpp @@ -0,0 +1,95 @@ +/************************************************************************* + * Copyright (C) 2022 Cambricon. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
 *************************************************************************/
#ifndef CARAFE_UTILS_HPP_
#define CARAFE_UTILS_HPP_

// Byte alignment required for addresses on NRAM.
#define NRAM_ALIGN_SIZE 64

// Launch-time description of one CARAFE forward problem: tensor geometry,
// derived strides for both the host (GDRAM) tensors and the on-chip (NRAM)
// tiles, and alignment/tiling values precomputed on the host side.
struct CarafeForwardParam {
  int N;   // batch size
  int Hi;  // input height
  int Wi;  // input width
  int Ci;  // input channels
  int Ho;  // output height
  int Wo;  // output width
  int Cg;  // channels per group

  int kernel_size;       // kernel_size
  int group_size;        // group_size
  int scale_factor;      // scale_factor
  int kernel_size_half;  // kernel half size (K-1)/2
  int kernel_size_sq;    // square of kernel size

  int dtype_size;  // size of tensor data type

  // Host arrays' geometry
  int input_stride_g;
  int input_stride_w;
  int input_stride_h;
  int input_stride_n;
  int input_size;
  int mask_stride_kh;
  int mask_stride_g;
  int mask_stride_w;
  int mask_stride_h;
  int mask_stride_n;
  int mask_size;
  int output_stride_g;
  int output_stride_w;
  int output_stride_h;
  int output_stride_n;
  int output_size;

  // NRAM arrays' geometry
  int input_nram_stride_g;
  int input_nram_stride_w;
  int input_nram_stride_h;
  int input_nram_size;
  int mask_nram_stride_kh;
  int mask_nram_stride_g;
  int mask_nram_stride_w;
  int mask_nram_stride_h;
  int mask_nram_size;
  int output_nram_stride_g;
  int output_nram_stride_w;
  int output_nram_stride_h;
  int output_nram_size;

  // for address/compute alignment
  int align_size_NRAM;  // for addressing on NRAM
  int align_size_NFU;   // for NFU operation length
  int block_Cg_NFU;     // for bang_mul_const

  int job_num;  // total job number
};

// Tile (block) sizes along each problem dimension for the forward kernel.
struct CarafeForwardBlockDim {
  int Ho;  // block size of output height
  int Wo;  // block size of output width
  int Kh;  // block size of kernel height
  int Kw;  // block size of kernel width
  int G;   // block size of groups
  int Cg;  // block size of channels within a group
  int Hi;  // block size of input height
  int Wi;  // block size of input width
};

// Number of tiles (blocks) along each dimension for the forward kernel.
struct CarafeForwardGridDim {
  int Ho;  // number of blocks of output height
  int Wo;  // number of blocks of output width
  int Kh;  // number of blocks of kernel height
  int Kw;  // number of blocks of kernel width
  int G;   // number of blocks of groups
  int Cg;  // number of blocks of channels within a group
};

#endif  // CARAFE_UTILS_HPP_
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/common_mlu_helper.hpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/common_mlu_helper.hpp
new file mode 100644
index 0000000000000000000000000000000000000000..88805ba8e92086dff751de8f716b84ffff608d29
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/common_mlu_helper.hpp
@@ -0,0 +1,398 @@
/*************************************************************************
 * Copyright (C) 2021 Cambricon.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
 * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
 * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
 * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
 * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
 * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 *************************************************************************/
#ifndef COMMON_MLU_HELPER_HPP_
#define COMMON_MLU_HELPER_HPP_

#define NFU_ALIGN_SIZE 128         // Byte
#define REM_FOR_STACK (128 * 1024) // 128KB reserved for cncc

#ifdef __BANG_ARCH__
#define MAX_NRAM_SIZE \
  (__MLU_NRAM_SIZE__ * 1024 - REM_FOR_STACK)  // 128KB reserved for cncc
#define MAX_SRAM_SIZE \
  (__MLU_SRAM_SIZE__ * 1024 - REM_FOR_STACK)  // 128KB reserved for cncc
#else
#define MAX_NRAM_SIZE (384 * 1024)   // 384KB, initialization value
#define MAX_SRAM_SIZE (1920 * 1024)  // 1920KB, initialization value
#endif

#ifndef PAD_UP
#define PAD_UP(x, y) (((x) / (y) + (int)((x) % (y) > 0)) * (y))
#endif

#ifndef PAD_DOWN
#define PAD_DOWN(x, y) (((x) / (y)) * (y))
#endif

#define CEIL_ALIGN(x, y) (((x) + (y)-1) / (y) * (y))

// Device-side minimum of two values.
template <typename scalar_t>
__mlu_func__ inline scalar_t min(scalar_t a, scalar_t b) {
  return a < b ? a : b;
}

// Device-side maximum of two values.
template <typename scalar_t>
__mlu_func__ inline scalar_t max(scalar_t a, scalar_t b) {
  return a > b ? a : b;
}

/*!
 * @brief loads data from global DRAM to NRAM with 2D pattern.
 *
 * @param[out] dst
 *   Pointer to NRAM that stores dst data.
 * @param[in] src
 *   Pointer to global DRAM that stores src data.
 * @param[in] size
 *   The element count of the segment in the lower dimension
 *   (the code multiplies by sizeof(T) before calling __memcpy).
 * @param[in] dst_str
 *   The stride in elements between segments in the lower dimension of dst.
 * @param[in] src_str
 *   The stride in elements between segments in the lower dimension of src.
 * @param[in] seg_num
 *   The total count of data segments in the lower dimension.
 */
template <typename T>
__mlu_func__ void loadStr2D(T *dst, T *src, const int size, const int dst_str,
                            const int src_str, const int seg_num) {
  if (dst_str == src_str && size == src_str) {
    // fully contiguous on both sides: one flat copy
    __memcpy(dst, src, src_str * seg_num * sizeof(T), GDRAM2NRAM);
  } else if ((size == src_str || src_str <= dst_str) &&
             src_str * sizeof(T) <= 512) {
    // gather data less than 512Bytes to improve IO efficiency:
    // bulk-load packed into the tail of dst, then scatter on-chip
    T *tmp = (T *)dst + (dst_str - src_str) * seg_num;
    __memcpy(tmp, src, (src_str * (seg_num - 1) + size) * sizeof(T),
             GDRAM2NRAM);
    if (dst_str != src_str) {
      __memcpy(dst, tmp, size * sizeof(T), NRAM2NRAM, dst_str * sizeof(T),
               src_str * sizeof(T), seg_num - 1);
    }
  } else {
    // general strided copy straight from GDRAM
    __memcpy(dst, src, size * sizeof(T), GDRAM2NRAM, dst_str * sizeof(T),
             src_str * sizeof(T), seg_num - 1);
  }
}

/*!
 * @brief loads data from global DRAM to NRAM with 3D pattern
 *        (one loadStr2D per outer segment).
 *
 * @param[out] dst
 *   Pointer to NRAM that stores dst data.
 * @param[in] src
 *   Pointer to global DRAM that stores src data.
 * @param[in] size
 *   The element count of the segment in the lowest dimension.
 * @param[in] seg_num_in
 *   The total count of data segments in the lowest dimension.
 * @param[in] seg_num_out
 *   The total count of data segments in the middle dimension.
 * @param[in] dst_str_in
 *   The stride in elements between segments in the lowest dimension of dst.
 * @param[in] dst_str_out
 *   The stride in elements between segments in the middle dimension of dst.
 * @param[in] src_str_in
 *   The stride in elements between segments in the lowest dimension of src.
 * @param[in] src_str_out
 *   The stride in elements between segments in the middle dimension of src.
 */
template <typename T>
__mlu_func__ void loadStr3D(T *dst, T *src, const int size,
                            const int seg_num_in, const int seg_num_out,
                            const int dst_str_in, const int dst_str_out,
                            const int src_str_in, const int src_str_out) {
  T *tmp_dst = dst;
  T *tmp_src = src;

  for (int i = 0; i < seg_num_out; ++i) {
    loadStr2D(tmp_dst, tmp_src, size, dst_str_in, src_str_in, seg_num_in);
    tmp_src += src_str_out;
    tmp_dst += dst_str_out;
  }
}

/*!
 * @brief stores data from NRAM to global DRAM with 2D pattern.
 *
 * @param[out] dst
 *   Pointer to global DRAM that stores dst data.
 * @param[in] src
 *   Pointer to NRAM that stores src data.
 * @param[in] size
 *   The element count of the segment in the lower dimension.
 * @param[in] seg_num
 *   The total count of data segments in the lower dimension.
 * @param[in] dst_str
 *   The stride in elements between segments in the lower dimension of dst.
 * @param[in] src_str
 *   The stride in elements between segments in the lower dimension of src.
 */
template <typename T>
__mlu_func__ void storeStr2D(T *dst, T *src, const int size, const int seg_num,
                             const int dst_str, const int src_str) {
  if ((size == dst_str && dst_str <= src_str) && dst_str * sizeof(T) <= 512) {
    // gather data less than 512Bytes to improve IO efficiency
    if (dst_str != src_str) {
      // compact src in place (dst_str <= src_str, so segments move left)
      // before one flat NRAM->GDRAM burst
      __memcpy(src, src, size * sizeof(T), NRAM2NRAM, dst_str * sizeof(T),
               src_str * sizeof(T), seg_num - 1);
    }
    __memcpy(dst, src, size * seg_num * sizeof(T), NRAM2GDRAM);
  } else {
    __memcpy(dst, src, size * sizeof(T), NRAM2GDRAM, dst_str * sizeof(T),
             src_str * sizeof(T), seg_num - 1);
  }
}

/*!
 * @brief stores data from NRAM to global DRAM with 3D pattern
 *        (one storeStr2D per outer segment).
 *
 * @param[out] dst
 *   Pointer to global DRAM that stores dst data.
 * @param[in] src
 *   Pointer to NRAM that stores src data.
 * @param[in] size
 *   The element count of the segment in the lowest dimension.
 * @param[in] seg_num_in
 *   The total count of data segments in the lowest dimension.
 */
__mlu_func__ void convertInt2Float(float *dst, float *dst_addition, int *src,
                                   float *src_addition, const int src_count) {
#if __BANG_ARCH__ >= 300
  // MLU300+: hardware conversion instruction
  __bang_int2float((float *)dst, (int32_t *)src, src_count, 0);
#else
  // MLU200 fallback: build the float bit pattern manually.
  // get sign bit
  const float move_23bit = 8388608.0;
  // 0x80000000 = 1,000000000,0000000000000000000000000000
  __bang_write_value((unsigned *)src_addition, NFU_ALIGN_SIZE / sizeof(float),
                     0x80000000);
  __bang_cycle_band((char *)dst_addition, (char *)src, (char *)src_addition,
                    src_count * sizeof(float), NFU_ALIGN_SIZE);
  // get 1 or 0 from sign bit
  // judge if the value is odd
  __bang_write_value((unsigned *)src_addition, NFU_ALIGN_SIZE / sizeof(float),
                     0x00000001);
  __bang_cycle_bor((char *)dst_addition, (char *)dst_addition,
                   (char *)src_addition, src_count * sizeof(float),
                   NFU_ALIGN_SIZE);
  __bang_write_value((unsigned *)src_addition, NFU_ALIGN_SIZE / sizeof(float),
                     0x80000001);
  __bang_cycle_eq(dst_addition, dst_addition, src_addition, src_count,
                  NFU_ALIGN_SIZE / sizeof(float));
  // minus xor, positive num invariant
  __bang_write_value((unsigned *)src_addition, NFU_ALIGN_SIZE / sizeof(float),
                     0xffffffff);
  __bang_cycle_mul(dst, dst_addition, src_addition, src_count,
                   NFU_ALIGN_SIZE / sizeof(float));
  __bang_bxor((char *)dst, (char *)src, (char *)dst, src_count * sizeof(float));
  // convert int32 to float32
  __bang_write_value((unsigned *)src_addition, NFU_ALIGN_SIZE / sizeof(float),
                     0x7fffff);
  __bang_cycle_band((char *)dst, (char *)dst, (char *)src_addition,
                    src_count * sizeof(float), NFU_ALIGN_SIZE);
  __bang_write_value((unsigned *)src_addition, NFU_ALIGN_SIZE / sizeof(float),
                     0x4b000000);
  __bang_cycle_bor((char *)dst, (char *)dst, (char *)src_addition,
                   src_count * sizeof(float), NFU_ALIGN_SIZE);
  __bang_sub_scalar(dst, dst, move_23bit, src_count);
  // add one
  __bang_add(dst, dst, dst_addition, src_count);
  // set sign for float32
  __bang_write_value((unsigned *)src_addition, NFU_ALIGN_SIZE / sizeof(float),
                     0xffffffff);
  __bang_cycle_mul(dst_addition, dst_addition, src_addition, src_count,
                   NFU_ALIGN_SIZE / sizeof(float));

  __bang_write_value((unsigned *)src_addition, NFU_ALIGN_SIZE / sizeof(float),
                     0x00000001);
  __bang_cycle_add(dst_addition, dst_addition, src_addition, src_count,
                   NFU_ALIGN_SIZE / sizeof(float));

  __bang_write_value((unsigned *)src_addition, NFU_ALIGN_SIZE / sizeof(float),
                     0x80000000);
  __bang_cycle_band((char *)dst_addition, (char *)dst_addition,
                    (char *)src_addition, src_count * 4, 128);
  __bang_bor((char *)dst, (char *)dst, (char *)dst_addition, src_count * 4);
#endif  // __BANG_ARCH__ >= 300
}

/*!
 * @brief Converts float32 to int32 data type with to_zero round mode.
 *
 * @param[out] dst
 *   Pointer to NRAM that stores the resulting int32 type data.
 * @param[in,out] dst_addition
 *   Pointer to NRAM as the workspace of dst, which has the same size as dst.
 *   It allows empty pointer on MLU300 series.
 * @param[in] src
 *   Pointer to NRAM that stores the float32 type data to convert.
 * @param[in,out] src_addition
 *   Pointer to NRAM as the workspace of src, which has a size of 128 Bytes.
 *   It allows empty pointer on MLU300 series.
 * @param[in] src_count
 *   The count of elements in src.
 */
__mlu_func__ void convertFloat2Int(int *dst, float *dst_addition, float *src,
                                   float *src_addition, const int src_count) {
#if __BANG_ARCH__ >= 300
  // MLU300+: hardware truncating conversion
  __bang_float2int_tz((int32_t *)dst, (float *)src, src_count, 0);
#else
  // MLU200 fallback: assemble the int bit pattern manually.
  // sign ===> src_addition
  // dst=-1.0 : when src[i] is a negative number
  // dst=+1.0 : when src[i] is a positive number
  const int floatDchar = sizeof(float) / sizeof(char);
  __bang_active_sign((float *)dst, src, src_count);
  // dst_addition = abs(src)
  __bang_mul(dst_addition, src, (float *)dst, src_count);
  // if dst_addition < 1.0 , then src_addition + 1, to fix add error.
  __bang_write_value((float *)src_addition, NFU_ALIGN_SIZE / sizeof(float),
                     1.0f);
  __bang_cycle_lt(dst_addition, dst_addition, (float *)src_addition, src_count,
                  NFU_ALIGN_SIZE / sizeof(float));
  __bang_add_tz((float *)dst, (float *)dst, (float *)dst_addition, src_count);
  __bang_write_value((unsigned *)src_addition, NFU_ALIGN_SIZE / sizeof(float),
                     0xbf800000);
  // set negative flag -1.0 = 0xbf80000
  __bang_cycle_eq(
      (float *)dst, (float *)dst, (float *)src_addition, src_count,
      NFU_ALIGN_SIZE / sizeof(float));  // to mark all src in [x<-1.0]
  __bang_active_abs(dst_addition, src, src_count);
  __bang_write_value((float *)src_addition, NFU_ALIGN_SIZE / sizeof(float),
                     8388608.0f);
  // mask shift move 23
  __bang_cycle_add_tz(
      dst_addition, dst_addition, src_addition, src_count,
      NFU_ALIGN_SIZE / sizeof(float));  // right shift move 23bit
  // two's complement for negative
  // dst=1.0 , when src <-1.0
  // dst=0.0 , when src >=-1.0
  __bang_sub(dst_addition, dst_addition, (float *)dst, src_count);
  // to fix max value
  // 0 1001 0110 111 1111 1111 1111 1111 1111 <=> 0xcb7fffff <=> 16777215.0,
  // means max value.
  __bang_mul_scalar((float *)dst, (float *)dst, 16777215.0, src_count);
  __bang_bxor((char *)dst_addition, (char *)dst_addition, (char *)dst,
              src_count * floatDchar);
  // get low 23bit
  __bang_write_value((unsigned *)src_addition, NFU_ALIGN_SIZE / sizeof(float),
                     (unsigned)0x007fffff);
  // mask low 23bit is 1
  __bang_cycle_band((char *)dst_addition, (char *)dst_addition,
                    (char *)src_addition, src_count * floatDchar,
                    NFU_ALIGN_SIZE / sizeof(char));
  // set 9 high bit ===> dst
  // -2.0 <=> 0xc0000000 <=> 1100 0000 0000 0000 0000 0000 0000 0000
  //  1.0 <=> 0x3f800000 <=> 0011 1111 1000 0000 0000 0000 0000 0000
  __bang_write_value(src_addition, NFU_ALIGN_SIZE / sizeof(float), 0x3f800000);
  __bang_cycle_and((float *)dst, (float *)dst, src_addition, src_count,
                   NFU_ALIGN_SIZE / sizeof(float));
  // src or dst_addition
  __bang_bor((char *)dst_addition, (char *)dst, (char *)dst_addition,
             src_count * floatDchar);
  __bang_mul_scalar((float *)dst, (float *)dst, -2.0, src_count);
  __bang_bor((char *)dst, (char *)dst, (char *)dst_addition,
             src_count * floatDchar);
#endif  // __BANG_ARCH__ >= 300
}

/*!
 * @brief Converts float32 to half data type,
 *        the rounding mode on MLU200 is rd, on MLU300 is rn.
 *
 * @param[out] dst
 *   Pointer to NRAM that stores half type data.
 * @param[in] src
 *   Pointer to NRAM that stores float32 type data.
 * @param[in] src_count
 *   The count of elements in src.
 */
__mlu_func__ inline void convertFloat2half(half *dst, float *src,
                                           int src_count) {
#if __BANG_ARCH__ >= 300
  __bang_float2half_rn(dst, src, src_count);
#else
  __bang_float2half_rd(dst, src, src_count);
#endif
}

/*!
 * @brief recursiveSumPool.
 * @param[in,out] dst
 *   Pointer to NRAM that stores the input and output data.
 * @param[in] low_dim
 *   Which is the number of low dim.
 * @param[in] high_dim
 *   Which is the number of high dim.
 * @param[in] kernel_limit
 *   Which is the high_dim of sumpool per time.
 ******************************************************************************/
template <typename T>
__mlu_func__ void recursiveSumPool(T *dst, int low_dim, int high_dim,
                                   int kernel_limit) {
  // Repeatedly sum-pool along the high dimension, at most kernel_limit rows
  // per __bang_sumpool call, until a single row of low_dim sums remains.
  for (; high_dim > 1;) {
    int repeat_s = high_dim / kernel_limit;
    int remain_s = high_dim % kernel_limit;

    if (remain_s) {
      // fold the leftover rows (fewer than kernel_limit) first
      __bang_sumpool((T *)dst, (T *)dst, low_dim, 1, remain_s, 1, remain_s, 1,
                     1);
    }
    if (repeat_s) {
      // fold the full kernel_limit-sized groups, writing just after the
      // remainder's partial sum (if any)
      __bang_sumpool((T *)dst + (remain_s > 0 ? low_dim : 0),
                     (T *)dst + remain_s * low_dim, low_dim,
                     kernel_limit * repeat_s, 1, kernel_limit, 1, 1,
                     kernel_limit);
    }
    high_dim = repeat_s + (bool)remain_s;
  }
  return;
}

#endif  // COMMON_MLU_HELPER_HPP_
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/deform_roi_pool_mlu_kernel.mlu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/deform_roi_pool_mlu_kernel.mlu
new file mode 100644
index 0000000000000000000000000000000000000000..6c765e3eaab33684adfd30f34b8ac734d3253709
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/deform_roi_pool_mlu_kernel.mlu
@@ -0,0 +1,712 @@
/*************************************************************************
 * Copyright (C) 2022 Cambricon.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
 * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
 * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
 * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
 * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
 * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 *************************************************************************/
// NOTE(review): the angle-bracket include below was garbled in this copy
// (content stripped); <math.h> fits the ceilf usage — confirm upstream.
#include <math.h>

#include "common_mlu_helper.hpp"

#define ROI_OFFSET 5
#define FOURSPLIT 4
#define FIVESPLIT 5
#define NINESPLIT 9
#define THIRTEENSPLIT 13

// whole-NRAM scratch buffer shared by the kernels in this file
__nram__ char nram_buffer[MAX_NRAM_SIZE];

// Computes the four bilinear-interpolation weights (w1..w4) and the x
// low/high neighbor columns for sample point (y, x); y_low is precomputed
// by the caller. Sets *is_empty when x falls outside [-1, input_width].
template <typename T>
static __mlu_func__ void bilinearInterpolate(const int input_width, T y, T x,
                                             T *w1, T *w2, T *w3, T *w4,
                                             int *x_low, int *x_high,
                                             const int y_low, bool *is_empty) {
  if (x < -1.0 || x > input_width) {
    *is_empty = true;
    return;
  }

  if (x <= 0) x = 0;

  *x_low = int(x);

  if (*x_low >= input_width - 1) {
    // clamp to the last column; both neighbors collapse onto it
    *x_high = *x_low = input_width - 1;
    x = T(*x_low);
  } else {
    *x_high = *x_low + 1;
  }

  T ly = y - y_low;
  T lx = x - *x_low;
  T hy = 1.0 - ly;
  T hx = 1.0 - lx;
  *w1 = hy * hx;
  *w2 = hy * lx;
  *w3 = ly * hx;
  *w4 = ly * lx;
}

// Deformable RoI pooling forward; each task processes a strided subset of
// output bins. (Definition continues beyond this view.)
template <typename T>
__mlu_func__ void MLUUnion1DeformRoIPoolForward(
    const T *input, const T *rois, const T *offset, T *output,
    const int channels, const int height, const int width, const int num_rois,
    const int pooled_height, const int pooled_width, const T spatial_scale,
    const int sampling_ratio, const T gamma) {
  for (int bin_index = taskId;
       bin_index < num_rois * pooled_width * pooled_height;
       bin_index += taskDim) {
    // decode flat bin index -> (roi, bin row, bin col)
    int out_batch = bin_index / pooled_width / pooled_height;
    int out_height = bin_index / pooled_width % pooled_height;
    int out_width = bin_index % pooled_width;
    const T *cur_roi = rois + out_batch * ROI_OFFSET;
    T *nram_rois = (T *)nram_buffer;
    __memcpy((void *)nram_rois, (void *)cur_roi, ROI_OFFSET * sizeof(T),
             GDRAM2NRAM);
    // roi = [batch_idx, x_min, y_min, x_max, y_max], scaled to feature map
    const int roi_batch = nram_rois[0];
    T roi_x_min = nram_rois[1] * spatial_scale - 0.5;
    T roi_y_min = nram_rois[2] * spatial_scale - 0.5;
    const T roi_x_max = nram_rois[3] * spatial_scale - 0.5;
    const T roi_y_max = nram_rois[4] * spatial_scale - 0.5;
    const T roi_width = roi_x_max - roi_x_min;
    const T roi_height = roi_y_max - roi_y_min;
    const T
bin_width = roi_width / static_cast(pooled_width); + const T bin_height = roi_height / static_cast(pooled_height); + const T *offset_input = input + roi_batch * height * width * channels; + int roi_bin_grid_height = + (sampling_ratio > 0) + ? sampling_ratio + : static_cast(ceilf(roi_height / pooled_height)); + int roi_bin_grid_width = + (sampling_ratio > 0) + ? sampling_ratio + : static_cast(ceilf(roi_width / pooled_width)); + if (offset != NULL) { + const T *offset_cur = offset + + out_batch * pooled_width * pooled_height * 2 + + out_height * pooled_width + out_width; + roi_x_min += gamma * roi_width * offset_cur[0]; + roi_y_min += + gamma * roi_height * offset_cur[pooled_width * pooled_height]; + } + int type_align = NFU_ALIGN_SIZE / sizeof(T); + int channels_max_num_nram = MAX_NRAM_SIZE / sizeof(T); + int channels_nram_split = + channels_max_num_nram / NINESPLIT / type_align * type_align; + int channel_rem = channels % channels_nram_split; + int channel_loops = + channels / channels_nram_split + (channel_rem != 0 ? 1 : 0); + for (int channel_loop_index = 0; channel_loop_index < channel_loops; + ++channel_loop_index) { + int channels_num = + channels_nram_split >= channels ? 
channels : channels_nram_split; + const int channel_offset = channel_loop_index * channels_num; + if (channel_loop_index + 1 == channel_loops && channel_rem != 0) { + channels_num = channel_rem; + } + int channels_align = CEIL_ALIGN(channels_num, type_align); + int nram_limit = (MAX_NRAM_SIZE / sizeof(T) - channels_align) >> 1; + int c_slice = nram_limit / FOURSPLIT / type_align * type_align; + int c_slice_align = 0; + + /* NRAM partition + * + * | | ping | pong | + * |----------|-------------------|-------------------| + * | nram_out | p1 | p2 | p3 | p4 | p1 | p2 | p3 | p4 | + * + */ + + T *nram_out = (T *)nram_buffer; + T *nram_ping = nram_out + channels_align; + T *nram_pong = nram_ping + nram_limit; + __bang_write_value((T *)nram_out, channels_align, (T)0); + __bang_write_value((T *)nram_ping, FOURSPLIT * c_slice, (T)0); + __bang_write_value((T *)nram_pong, FOURSPLIT * c_slice, (T)0); + const T num_bins = + static_cast(max(roi_bin_grid_height * roi_bin_grid_width, 1)); + const T value_div = 1.0f / num_bins; + bool is_ping_empty = true; + for (int iy = 0; iy < roi_bin_grid_height; ++iy) { + T y = roi_y_min + out_height * bin_height + + static_cast(iy + .5f) * bin_height / + static_cast(roi_bin_grid_height); + if (y < -1.0 || y > height) { + is_ping_empty = true; + continue; + } + if (y <= 0) { + y = 0; + } + int y_low = 0, y_high = 0; + y_low = int(y); + if (y_low >= height - 1) { + y_high = y_low = height - 1; + y = T(y_low); + } else { + y_high = y_low + 1; + } + for (int ix = 0; ix < roi_bin_grid_width; ++ix) { + T x = roi_x_min + out_width * bin_width + + static_cast(ix + .5f) * bin_width / + static_cast(roi_bin_grid_width); + const int sample_index = iy * roi_bin_grid_width + ix; + int c_rem = channels_num; + c_slice = nram_limit / FOURSPLIT / type_align * type_align; + c_slice_align = 0; + bool is_empty = false; + T w1, w2, w3, w4; + int x_low = 0, x_high = 0; + bilinearInterpolate(width, y, x, &w1, &w2, &w3, &w4, &x_low, &x_high, + y_low, &is_empty); + if 
(is_empty) { + is_ping_empty = true; + continue; + } + if (is_ping_empty) { + c_slice = c_slice > c_rem ? c_rem : c_slice; + c_slice_align = CEIL_ALIGN(c_slice, type_align); + __bang_write_value(nram_ping, FOURSPLIT * c_slice_align, (T)0); + __asm__ volatile("sync;"); + __memcpy(nram_ping, + offset_input + y_low * width * channels + + x_low * channels + channel_offset, + c_slice * sizeof(T), GDRAM2NRAM); + __memcpy(nram_ping + c_slice_align, + offset_input + y_low * width * channels + + x_high * channels + channel_offset, + c_slice * sizeof(T), GDRAM2NRAM); + __memcpy(nram_ping + 2 * c_slice_align, + offset_input + y_high * width * channels + + x_low * channels + channel_offset, + c_slice * sizeof(T), GDRAM2NRAM); + __memcpy(nram_ping + 3 * c_slice_align, + offset_input + y_high * width * channels + + x_high * channels + channel_offset, + c_slice * sizeof(T), GDRAM2NRAM); + is_ping_empty = false; + } + int c_offset = 0; + int pongc_slice = 0; + int pongc_slice_align = 0; + while (c_rem > 0) { + c_slice = c_slice > c_rem ? 
c_rem : c_slice; + c_slice_align = CEIL_ALIGN(c_slice, type_align); + if (sample_index + 1 < roi_bin_grid_height * roi_bin_grid_width) { + int iy_tmp = (sample_index + 1) / roi_bin_grid_width; + int ix_tmp = (sample_index + 1) % roi_bin_grid_width; + y = roi_y_min + out_height * bin_height + + static_cast(iy_tmp + .5f) * bin_height / + static_cast(roi_bin_grid_height); + x = roi_x_min + out_width * bin_width + + static_cast(ix_tmp + .5f) * bin_width / + static_cast(roi_bin_grid_width); + if (y < -1.0 || y > height) { + is_empty = true; + } else { + T w1_tmp, w2_tmp, w3_tmp, w4_tmp; + if (y <= 0) { + y = 0; + } + y_low = int(y); + if (y_low >= height - 1) { + y_high = y_low = height - 1; + y = T(y_low); + } else { + y_high = y_low + 1; + } + bilinearInterpolate(width, y, x, &w1_tmp, &w2_tmp, &w3_tmp, + &w4_tmp, &x_low, &x_high, y_low, &is_empty); + } + pongc_slice = nram_limit / FOURSPLIT / type_align * type_align; + pongc_slice = + pongc_slice > channels_num ? channels_num : pongc_slice; + pongc_slice_align = CEIL_ALIGN(pongc_slice, type_align); + __bang_write_value(nram_pong, FOURSPLIT * pongc_slice_align, + (T)0); + __asm__ volatile("sync;"); + if (!is_empty) { + __memcpy_async(nram_pong, + offset_input + y_low * width * channels + + x_low * channels + channel_offset, + pongc_slice * sizeof(T), GDRAM2NRAM); + __memcpy_async(nram_pong + pongc_slice_align, + offset_input + y_low * width * channels + + x_high * channels + channel_offset, + pongc_slice * sizeof(T), GDRAM2NRAM); + __memcpy_async(nram_pong + 2 * pongc_slice_align, + offset_input + y_high * width * channels + + x_low * channels + channel_offset, + pongc_slice * sizeof(T), GDRAM2NRAM); + __memcpy_async(nram_pong + 3 * pongc_slice_align, + offset_input + y_high * width * channels + + x_high * channels + channel_offset, + pongc_slice * sizeof(T), GDRAM2NRAM); + } + } + __bang_mul_scalar(nram_ping, nram_ping, w1, c_slice_align); + __bang_mul_scalar(nram_ping + c_slice_align, + nram_ping + c_slice_align, w2, 
c_slice_align); + __bang_add(nram_ping, nram_ping, nram_ping + c_slice_align, + c_slice_align); + __bang_mul_scalar(nram_ping + 2 * c_slice_align, + nram_ping + 2 * c_slice_align, w3, c_slice_align); + __bang_add(nram_ping, nram_ping, nram_ping + 2 * c_slice_align, + c_slice_align); + __bang_mul_scalar(nram_ping + 3 * c_slice_align, + nram_ping + 3 * c_slice_align, w4, c_slice_align); + __bang_add(nram_ping, nram_ping, nram_ping + 3 * c_slice_align, + c_slice_align); + __bang_add(nram_out + c_offset, nram_out + c_offset, nram_ping, + c_slice_align); + T *nram_tmp = nram_ping; + nram_ping = nram_pong; + nram_pong = nram_tmp; + c_rem -= c_slice; + c_offset += c_slice; + __asm__ volatile("sync;"); + } + } + } + __bang_mul_scalar(nram_out, nram_out, value_div, channels_align); + __memcpy(output + channels * bin_index + channel_offset, nram_out, + channels_num * sizeof(T), NRAM2GDRAM); + } + } +} + +__mlu_global__ void MLUKernelDeformRoIPoolForward( + cnrtDataType_t data_type, const void *input, const void *rois, + const void *offset, void *output, const int channels, const int height, + const int width, const int num_rois, const int pooled_height, + const int pooled_width, const float spatial_scale, const int sampling_ratio, + const float gamma) { + switch (data_type) { + case CNRT_FLOAT16: { + MLUUnion1DeformRoIPoolForward((half *)input, (half *)rois, (half *)offset, + (half *)output, channels, height, width, + num_rois, pooled_height, pooled_width, + static_cast(spatial_scale), + sampling_ratio, static_cast(gamma)); + }; break; + case CNRT_FLOAT32: { + MLUUnion1DeformRoIPoolForward( + (float *)input, (float *)rois, (float *)offset, (float *)output, + channels, height, width, num_rois, pooled_height, pooled_width, + static_cast(spatial_scale), sampling_ratio, + static_cast(gamma)); + }; break; + default: { + break; + } + } +} + +void KernelDeformRoIPoolForward(cnrtDim3_t k_dim, cnrtFunctionType_t k_type, + cnrtQueue_t queue, cnrtDataType_t data_type, + const void 
*input, const void *rois, + const void *offset, void *output, + const int channels, const int height, + const int width, const int num_rois, + const int pooled_height, const int pooled_width, + const float spatial_scale, + const int sampling_ratio, const float gamma) { + MLUKernelDeformRoIPoolForward<<>>( + data_type, input, rois, offset, output, channels, height, width, num_rois, + pooled_height, pooled_width, spatial_scale, sampling_ratio, gamma); +} + +template +__mlu_func__ void MLUUnion1DeformRoIPoolBackward( + const T *grad_output, const T *input, const T *rois, const T *offset, + T *grad_input, T *grad_offset, const int channels, const int height, + const int width, const int num_rois, const int pooled_height, + const int pooled_width, const T spatial_scale, const int sampling_ratio, + const T gamma) { + for (int bin_index = taskId; + bin_index < num_rois * pooled_width * pooled_height; + bin_index += taskDim) { + int out_batch = bin_index / pooled_width / pooled_height; + int out_height = bin_index / pooled_width % pooled_height; + int out_width = bin_index % pooled_width; + const T *cur_roi = rois + out_batch * ROI_OFFSET; + T *nram_rois = (T *)nram_buffer; + __memcpy((void *)nram_rois, (void *)cur_roi, ROI_OFFSET * sizeof(T), + GDRAM2NRAM); + const int roi_batch = nram_rois[0]; + T roi_x_min = nram_rois[1] * spatial_scale - 0.5; + T roi_y_min = nram_rois[2] * spatial_scale - 0.5; + const T roi_x_max = nram_rois[3] * spatial_scale - 0.5; + const T roi_y_max = nram_rois[4] * spatial_scale - 0.5; + const T roi_width = roi_x_max - roi_x_min; + const T roi_height = roi_y_max - roi_y_min; + const T bin_width = roi_width / static_cast(pooled_width); + const T bin_height = roi_height / static_cast(pooled_height); + const T *offset_input = input + roi_batch * height * width * channels; + T *offset_grad_input = grad_input + roi_batch * height * width * channels; + int roi_bin_grid_height = + (sampling_ratio > 0) + ? 
sampling_ratio + : static_cast(ceilf(roi_height / pooled_height)); + int roi_bin_grid_width = + (sampling_ratio > 0) + ? sampling_ratio + : static_cast(ceilf(roi_width / pooled_width)); + if (offset != NULL) { + const T *offset_cur = offset + + out_batch * pooled_width * pooled_height * 2 + + out_height * pooled_width + out_width; + roi_x_min += gamma * roi_width * offset_cur[0]; + roi_y_min += + gamma * roi_height * offset_cur[pooled_width * pooled_height]; + } + + /* NRAM partition + * + * If offset != NULL, NRAM partition belows. + * | | + * ping | pong | + * |---------------------------------------------------------------------|-----------|-----------| + * |nram_tmp1|nram_tmp2|nram_tmp3|nram_tmp4|nram_grad_output|nram_sum_tmp|p1|p2|p3|p4|p1|p2|p3|p4| + * + * If offset == NULL, ping and pang will not be needed. + * | | + * |----------------------------------------------------------------------------------| + * | nram_tmp1 | nram_tmp2 | nram_tmp3 | nram_tmp4 | nram_grad_output | + * + */ + + int type_align = NFU_ALIGN_SIZE / sizeof(T); + int channels_max_num_nram = MAX_NRAM_SIZE / sizeof(T); + int channels_nram_split = + channels_max_num_nram / FIVESPLIT / type_align * type_align; + int channel_rem = channels % channels_nram_split; + int channel_loops = + channels / channels_nram_split + (channel_rem != 0 ? 1 : 0); + if (offset != NULL) { + channels_nram_split = + channels_max_num_nram / THIRTEENSPLIT / type_align * type_align; + channel_rem = channels % channels_nram_split; + channel_loops = + channels / channels_nram_split + (channel_rem != 0 ? 1 : 0); + } + + for (int channel_loop_index = 0; channel_loop_index < channel_loops; + ++channel_loop_index) { + int channels_num = + channels_nram_split >= channels ? 
channels : channels_nram_split; + const int channel_offset = channel_loop_index * channels_num; + if (channel_loop_index + 1 == channel_loops && channel_rem != 0) { + channels_num = channel_rem; + } + int channels_align = CEIL_ALIGN(channels_num, type_align); + const int32_t nram_sum_tmp_channel = NFU_ALIGN_SIZE / sizeof(T); + int nram_limit = (MAX_NRAM_SIZE / sizeof(T) - 5 * channels_align - + nram_sum_tmp_channel) >> + 1; + int c_slice = 0; + int c_slice_align = 0; + T *nram_tmp1 = (T *)nram_buffer; + T *nram_tmp2 = (T *)nram_buffer + channels_align; + T *nram_tmp3 = (T *)nram_buffer + 2 * channels_align; + T *nram_tmp4 = (T *)nram_buffer + 3 * channels_align; + T *nram_grad_output = nram_tmp4 + channels_align; + T *nram_sum_tmp = NULL; + T *nram_ping_input = NULL; + T *nram_pong_input = NULL; + __bang_write_value((T *)nram_grad_output, channels_align, (T)0); + __asm__ volatile("sync;"); + + if (offset != NULL) { + c_slice = nram_limit / FOURSPLIT / type_align * type_align; + nram_sum_tmp = nram_grad_output + channels_align; + nram_ping_input = nram_sum_tmp + nram_sum_tmp_channel; + nram_pong_input = nram_ping_input + FOURSPLIT * c_slice; + __bang_write_value((T *)nram_sum_tmp, nram_sum_tmp_channel, (T)0); + __bang_write_value((T *)nram_ping_input, FOURSPLIT * c_slice, (T)0); + __bang_write_value((T *)nram_pong_input, FOURSPLIT * c_slice, (T)0); + __asm__ volatile("sync;"); + } + const T num_bins = + static_cast(max(roi_bin_grid_height * roi_bin_grid_width, 1)); + const T value_div = 1.0f / num_bins; + bool is_ping_empty = true; + __memcpy(nram_grad_output, + grad_output + channels * bin_index + channel_offset, + channels_num * sizeof(T), GDRAM2NRAM); + __bang_mul_scalar(nram_grad_output, nram_grad_output, value_div, + channels_align); + for (int iy = 0; iy < roi_bin_grid_height; ++iy) { + T y = roi_y_min + out_height * bin_height + + static_cast(iy + .5f) * bin_height / + static_cast(roi_bin_grid_height); + T y_tmp = y; + if (y_tmp < -1.0 || y_tmp > height) { + 
is_ping_empty = true; + continue; + } + if (y_tmp <= 0) { + y_tmp = 0; + } + int y_low = 0, y_high = 0; + y_low = int(y_tmp); + if (y_low >= height - 1) { + y_high = y_low = height - 1; + y_tmp = T(y_low); + } else { + y_high = y_low + 1; + } + for (int ix = 0; ix < roi_bin_grid_width; ++ix) { + T x = roi_x_min + out_width * bin_width + + static_cast(ix + .5f) * bin_width / + static_cast(roi_bin_grid_width); + const int sample_index = iy * roi_bin_grid_width + ix; + int c_rem = channels_num; + bool is_empty = false; + T w1, w2, w3, w4; + int x_low = 0, x_high = 0; + bilinearInterpolate(width, y_tmp, x, &w1, &w2, &w3, &w4, &x_low, + &x_high, y_low, &is_empty); + if (is_empty) { + is_ping_empty = true; + continue; + } + __bang_mul_scalar((T *)nram_tmp1, (T *)nram_grad_output, w1, + channels_align); + __bang_mul_scalar((T *)nram_tmp2, (T *)nram_grad_output, w2, + channels_align); + __bang_mul_scalar((T *)nram_tmp3, (T *)nram_grad_output, w3, + channels_align); + __bang_mul_scalar((T *)nram_tmp4, (T *)nram_grad_output, w4, + channels_align); + __asm__ volatile("sync;"); + __bang_atomic_add( + (T *)nram_tmp1, + (T *)(offset_grad_input + (y_low * width + x_low) * channels + + channel_offset), + (T *)nram_tmp1, channels_num); + __bang_atomic_add( + (T *)nram_tmp2, + (T *)(offset_grad_input + (y_low * width + x_high) * channels + + channel_offset), + (T *)nram_tmp2, channels_num); + __bang_atomic_add( + (T *)nram_tmp3, + (T *)(offset_grad_input + (y_high * width + x_low) * channels + + channel_offset), + (T *)nram_tmp3, channels_num); + __bang_atomic_add( + (T *)nram_tmp4, + (T *)(offset_grad_input + (y_high * width + x_high) * channels + + channel_offset), + (T *)nram_tmp4, channels_num); + if (offset != NULL) { + c_slice = nram_limit / FOURSPLIT / type_align * type_align; + c_slice_align = 0; + if (is_ping_empty) { + c_slice = c_slice > c_rem ? 
c_rem : c_slice; + c_slice_align = CEIL_ALIGN(c_slice, type_align); + __bang_write_value(nram_ping_input, FOURSPLIT * c_slice_align, + (T)0); + __asm__ volatile("sync;"); + const T *src_offset1 = offset_input + y_low * width * channels + + x_low * channels + channel_offset; + const T *src_offset2 = offset_input + y_low * width * channels + + x_high * channels + channel_offset; + const T *src_offset3 = offset_input + y_high * width * channels + + x_low * channels + channel_offset; + const T *src_offset4 = offset_input + y_high * width * channels + + x_high * channels + channel_offset; + __memcpy(nram_ping_input, src_offset1, c_slice * sizeof(T), + GDRAM2NRAM); + __memcpy(nram_ping_input + c_slice_align, src_offset2, + c_slice * sizeof(T), GDRAM2NRAM); + __memcpy(nram_ping_input + 2 * c_slice_align, src_offset3, + c_slice * sizeof(T), GDRAM2NRAM); + __memcpy(nram_ping_input + 3 * c_slice_align, src_offset4, + c_slice * sizeof(T), GDRAM2NRAM); + is_ping_empty = false; + } + int c_offset = 0; + int pongc_slice = 0; + int pongc_slice_align = 0; + while (c_rem > 0) { + c_slice = c_slice > c_rem ? 
c_rem : c_slice; + c_slice_align = CEIL_ALIGN(c_slice, type_align); + if (sample_index + 1 < roi_bin_grid_height * roi_bin_grid_width) { + int iy_tmp = (sample_index + 1) / roi_bin_grid_width; + int ix_tmp = (sample_index + 1) % roi_bin_grid_width; + T y_tmp = roi_y_min + out_height * bin_height + + static_cast(iy_tmp + .5f) * bin_height / + static_cast(roi_bin_grid_height); + T x_tmp = roi_x_min + out_width * bin_width + + static_cast(ix_tmp + .5f) * bin_width / + static_cast(roi_bin_grid_width); + int x_low_tmp = 0, x_high_tmp = 0, y_low_tmp = 0, + y_high_tmp = 0; + if (y_tmp < -1.0 || y_tmp > height) { + is_empty = true; + } else { + T w1_tmp, w2_tmp, w3_tmp, w4_tmp; + if (y_tmp <= 0) { + y_tmp = 0; + } + y_low_tmp = int(y_tmp); + if (y_low_tmp >= height - 1) { + y_high_tmp = y_low_tmp = height - 1; + y_tmp = T(y_low_tmp); + } else { + y_high_tmp = y_low_tmp + 1; + } + bilinearInterpolate(width, y_tmp, x_tmp, &w1_tmp, &w2_tmp, + &w3_tmp, &w4_tmp, &x_low_tmp, &x_high_tmp, + y_low_tmp, &is_empty); + } + pongc_slice = nram_limit / FOURSPLIT / type_align * type_align; + pongc_slice = + pongc_slice > channels_num ? 
channels_num : pongc_slice; + pongc_slice_align = CEIL_ALIGN(pongc_slice, type_align); + __bang_write_value(nram_pong_input, + FOURSPLIT * pongc_slice_align, (T)0); + __asm__ volatile("sync;"); + if (!is_empty) { + const T *src_offset1 = offset_input + + y_low_tmp * width * channels + + x_low_tmp * channels + channel_offset; + const T *src_offset2 = offset_input + + y_low_tmp * width * channels + + x_high_tmp * channels + channel_offset; + const T *src_offset3 = offset_input + + y_high_tmp * width * channels + + x_low_tmp * channels + channel_offset; + const T *src_offset4 = offset_input + + y_high_tmp * width * channels + + x_high_tmp * channels + channel_offset; + __memcpy_async(nram_pong_input, src_offset1, + pongc_slice * sizeof(T), GDRAM2NRAM); + __memcpy_async(nram_pong_input + pongc_slice_align, + src_offset2, pongc_slice * sizeof(T), + GDRAM2NRAM); + __memcpy_async(nram_pong_input + 2 * pongc_slice_align, + src_offset3, pongc_slice * sizeof(T), + GDRAM2NRAM); + __memcpy_async(nram_pong_input + 3 * pongc_slice_align, + src_offset4, pongc_slice * sizeof(T), + GDRAM2NRAM); + } + } + + __bang_mul_scalar(nram_tmp1, nram_ping_input + 3 * c_slice_align, + y - y_low, c_slice_align); + __bang_mul_scalar(nram_tmp2, nram_ping_input + c_slice_align, + y_high - y, c_slice_align); + __bang_add(nram_tmp1, nram_tmp1, nram_tmp2, c_slice_align); + __bang_mul_scalar(nram_tmp2, nram_ping_input + 2 * c_slice_align, + y_low - y, c_slice_align); + __bang_add(nram_tmp1, nram_tmp1, nram_tmp2, c_slice_align); + __bang_mul_scalar(nram_tmp2, nram_ping_input, y - y_high, + c_slice_align); + __bang_add(nram_tmp1, nram_tmp1, nram_tmp2, c_slice_align); + __bang_mul_scalar(nram_tmp1, nram_tmp1, gamma * roi_width, + c_slice_align); + __bang_mul(nram_tmp1, nram_grad_output, nram_tmp1, c_slice_align); + const int32_t kernel_width = + c_slice_align / nram_sum_tmp_channel + + (int32_t)(c_slice_align % nram_sum_tmp_channel > 0); + __bang_sumpool(nram_sum_tmp, nram_tmp1, nram_sum_tmp_channel, 1, 
+ kernel_width, 1, kernel_width, kernel_width, 1); + __bang_reduce_sum(nram_sum_tmp, nram_sum_tmp, + nram_sum_tmp_channel); + __bang_atomic_add( + (T *)nram_sum_tmp, + (T *)(grad_offset + + out_batch * pooled_width * pooled_height * 2 + + out_height * pooled_width + out_width), + (T *)nram_sum_tmp, 1); + __bang_write_value((T *)nram_sum_tmp, nram_sum_tmp_channel, (T)0); + __bang_mul_scalar(nram_tmp1, nram_ping_input + 3 * c_slice_align, + x - x_low, c_slice_align); + __bang_mul_scalar(nram_tmp2, nram_ping_input + 2 * c_slice_align, + x_high - x, c_slice_align); + __bang_add(nram_tmp1, nram_tmp1, nram_tmp2, c_slice_align); + __bang_mul_scalar(nram_tmp2, nram_ping_input + c_slice_align, + x_low - x, c_slice_align); + __bang_add(nram_tmp1, nram_tmp1, nram_tmp2, c_slice_align); + __bang_mul_scalar(nram_tmp2, nram_ping_input, x - x_high, + c_slice_align); + __bang_add(nram_tmp1, nram_tmp1, nram_tmp2, c_slice_align); + __bang_mul_scalar(nram_tmp1, nram_tmp1, gamma * roi_height, + c_slice_align); + __bang_mul(nram_tmp1, nram_grad_output, nram_tmp1, c_slice_align); + __bang_sumpool(nram_sum_tmp, nram_tmp1, nram_sum_tmp_channel, 1, + kernel_width, 1, kernel_width, kernel_width, 1); + __bang_reduce_sum(nram_sum_tmp, nram_sum_tmp, + NFU_ALIGN_SIZE / sizeof(T)); + __bang_atomic_add( + (T *)nram_sum_tmp, + (T *)(grad_offset + + out_batch * pooled_width * pooled_height * 2 + + pooled_width * pooled_height + + out_height * pooled_width + out_width), + (T *)nram_sum_tmp, 1); + + T *nram_tmp = nram_ping_input; + nram_ping_input = nram_pong_input; + nram_pong_input = nram_tmp; + c_rem -= c_slice; + c_offset += c_slice; + __asm__ volatile("sync;"); + } + } + } + } + } + } +} + +__mlu_global__ void MLUKernelDeformRoIPoolBackward( + cnrtDataType_t data_type, const void *grad_output, const void *input, + const void *rois, const void *offset, void *grad_input, void *grad_offset, + const int channels, const int height, const int width, const int num_rois, + const int pooled_height, const 
int pooled_width, const float spatial_scale, + const int sampling_ratio, const float gamma) { + switch (data_type) { + case CNRT_FLOAT16: { + MLUUnion1DeformRoIPoolBackward( + (half *)grad_output, (half *)input, (half *)rois, (half *)offset, + (half *)grad_input, (half *)grad_offset, channels, height, width, + num_rois, pooled_height, pooled_width, + static_cast(spatial_scale), sampling_ratio, + static_cast(gamma)); + }; break; + case CNRT_FLOAT32: { + MLUUnion1DeformRoIPoolBackward( + (float *)grad_output, (float *)input, (float *)rois, (float *)offset, + (float *)grad_input, (float *)grad_offset, channels, height, width, + num_rois, pooled_height, pooled_width, + static_cast(spatial_scale), sampling_ratio, + static_cast(gamma)); + }; break; + default: { + break; + } + } +} + +void KernelDeformRoIPoolBackward( + cnrtDim3_t k_dim, cnrtFunctionType_t k_type, cnrtQueue_t queue, + cnrtDataType_t data_type, const void *grad_output, const void *input, + const void *rois, const void *offset, void *grad_input, void *grad_offset, + const int channels, const int height, const int width, const int num_rois, + const int pooled_height, const int pooled_width, const float spatial_scale, + const int sampling_ratio, const float gamma) { + MLUKernelDeformRoIPoolBackward<<>>( + data_type, grad_output, input, rois, offset, grad_input, grad_offset, + channels, height, width, num_rois, pooled_height, pooled_width, + spatial_scale, sampling_ratio, gamma); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/focal_loss_sigmoid_mlu_kernel.mlu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/focal_loss_sigmoid_mlu_kernel.mlu new file mode 100644 index 0000000000000000000000000000000000000000..7624379b68d6df41aae0253df26b9add61c7a76e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/focal_loss_sigmoid_mlu_kernel.mlu @@ -0,0 +1,888 @@ +/************************************************************************* + * Copyright (C) 2021 
Cambricon. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + *************************************************************************/ +#include + +#include "common_mlu_helper.hpp" + +#define PING 0 +#define PONG 1 + +__nram__ char nram_buffer[MAX_NRAM_SIZE]; + +namespace forward { +template +__mlu_func__ void loadInput(char *nram_input, T *dram_input, const int32_t size, + const int32_t dst_stride = 0, + const int32_t src_stride = 0, + const int32_t count = 1) { + if (dst_stride == src_stride) { + __memcpy_async(nram_input, dram_input, size * count, GDRAM2NRAM); + } else { + __memcpy_async(nram_input, dram_input, size, GDRAM2NRAM, dst_stride, + src_stride, count - 1); + } +} + +template +__mlu_func__ void loadWeight(char *nram_input, T *dram_input, const int32_t t, + const int32_t c, const int32_t has_weight, + const int32_t partition_nc) { + if (has_weight && partition_nc && t >= 0 && t < c) { + __memcpy_async(nram_input, (T *)dram_input + t, sizeof(T), GDRAM2NRAM); + } +} + +template +__mlu_func__ void storeOutput(T *dram_output, char *nram_output, + const int32_t size, const int32_t dst_stride = 0, + const int32_t src_stride = 0, + const int32_t count = 1) { + if (dst_stride == src_stride) { + __memcpy_async(dram_output, nram_output, size * count, NRAM2GDRAM); + } else { + __memcpy_async(dram_output, nram_output, size, NRAM2GDRAM, dst_stride, + src_stride, count - 1); + } +} + +template +__mlu_func__ void compute(T *input, const int32_t *target, const T *weight, + const int32_t has_weight, const int32_t partition_nc, + 
const int32_t deal_num, const int32_t n_seg, + const int32_t c, const int32_t c_seg, + const int32_t c_start_index, const float alpha, + const float gamma, T *compute_a, T *compute_b, + T *output) { + // set params + const int32_t c_num = + has_weight ? PAD_UP(c_seg, NFU_ALIGN_SIZE / sizeof(T)) : c_seg; + const int32_t c_end_index = c_start_index + c_seg; + const int32_t half_epsilon = 0x0400; + const T epsilon_f = + sizeof(T) == sizeof(float) ? FLT_MIN : *((half *)&half_epsilon); + + // 0. alpha_t * p_t^r = alpha * (1 - p) ^ gamma if t == c_i + // = (1 - alpha) * p ^ gamma if t != c_i + __nramset((T *)output, deal_num, (T)(1 - alpha)); + __bang_active_sigmoid((T *)compute_b, (T *)input, deal_num); + for (int32_t i = 0; i < n_seg; ++i) { + const int32_t t = *((uint32_t *)target + i); + if (t >= c_start_index && t < c_end_index) { + const uint32_t index = i * c_num + t - c_start_index; + *((T *)input + index) = -1.0 * (*((T *)input + index)); + *((T *)compute_b + index) = 1.0 - (*((T *)compute_b + index)) + epsilon_f; + *((T *)output + index) = alpha; + } + } + if (sizeof(T) == sizeof(half)) { + __bang_half2float((float *)compute_a, (half *)compute_b, deal_num); + __bang_active_loghp((float *)compute_a, (float *)compute_a, deal_num); + __bang_mul_const((float *)compute_a, (float *)compute_a, (float)gamma, + deal_num); + __bang_active_exphp((float *)compute_a, (float *)compute_a, deal_num); + __bang_float2half_rd((half *)compute_a, (float *)compute_a, deal_num); + } else { + __bang_active_loghp((T *)compute_a, (T *)compute_b, deal_num); + __bang_mul_const((T *)compute_a, (T *)compute_a, (T)gamma, deal_num); + __bang_active_exphp((T *)compute_a, (T *)compute_a, deal_num); + } + __bang_mul((T *)output, (T *)compute_a, (T *)output, deal_num); + + // 1. max = max(0, -x) if t == c_i + // = max(0, x) if t != c_i + __nramset((T *)compute_b, deal_num, (T)0); + __bang_maxequal((T *)compute_b, (T *)compute_b, (T *)input, deal_num); + + // 2. 
-log(p_t) = ln(e^(-max)+ e^(-max-x) + max if t == c_i + // = ln(e^(-max)+ e^(-max+x) + max if t != c_i + __bang_mul_const((T *)compute_a, (T *)compute_b, (T)-1.0, deal_num); + __bang_add((T *)input, (T *)compute_a, (T *)input, deal_num); + + __bang_active_exphp((T *)compute_a, (T *)compute_a, deal_num); + __bang_active_exphp((T *)input, (T *)input, deal_num); + __bang_add((T *)compute_a, (T *)compute_a, (T *)input, deal_num); + __bang_active_loghp((T *)compute_a, (T *)compute_a, deal_num); + __bang_add((T *)input, (T *)compute_a, (T *)compute_b, deal_num); + + // 3. output = alpha_t * p_t^r * [-log(p_t)] + __bang_mul((T *)output, (T *)output, (T *)input, deal_num); + + // 4. with weight + if (has_weight) { + for (int32_t i = 0; i < n_seg; ++i) { + int32_t t = *((int32_t *)target + i); + if (t >= 0 && t < c) { + t = partition_nc ? 0 : t; + __bang_mul_const((T *)output + i * c_num, (T *)output + i * c_num, + *((T *)weight + t), c_num); + } + } + } +} + +template +__mlu_func__ void startPipeline( + const T *input, const int32_t *target, const T *weight, + char *nram_compute_a, char *nram_compute_b, char *nram_input, + char *nram_target, char *nram_weight, char *nram_output, + const int32_t has_weight, const int32_t partition_nc, + const int32_t pingpong_offset, const int32_t pingpong_weight_offset, + const int32_t c_offset_num, const int32_t n, const int32_t n_seg, + const int32_t c, const int32_t c_seg, const float alpha, const float gamma, + T *output) { + // with offset + input = (T *)((char *)input + c_offset_num * sizeof(T)); + output = (T *)((char *)output + c_offset_num * sizeof(T)); + + const int32_t c_seg_align_num = PAD_UP(c_seg, NFU_ALIGN_SIZE / sizeof(T)); + const int32_t c_num = has_weight ? 
c_seg_align_num : c_seg; + const int32_t deal_num = PAD_UP(n_seg * c_num, NFU_ALIGN_SIZE / sizeof(T)); + const int32_t load_size = c_seg * sizeof(T); + const int32_t dram_stride = c * sizeof(T); + const int32_t nram_stride = c_num * sizeof(T); + + if (has_weight && !partition_nc) { + loadInput(nram_weight, (T *)weight, load_size, nram_stride, dram_stride, + 1); + __asm__ volatile("sync;\n\t"); + } + const int32_t repeat = n / n_seg; + const int32_t remain = n % n_seg; + + /* + * Pipeline: The pipeline is processed in three stages: Load, Compute, Store. + * The allocated memory space of NRAM is divided into two parts: + * PING and Pong. In a single time slice, PING is used to process + * IO stream and PONG is used for computation. Both of them are + * processed synchronously until finished. + * + * diagram of PINGPONG: + * |------|-----------------------------------------------------------------| + * | | space | + * |------|-----------------------------------------------------------------| + * | time | Ping | Pong | Ping | Pong | Ping | Pong | + * |------|-----------------------------------------------------------------| + * | 0 | L0 | | | | | | + * | 1 | C0 | L1 | | | | | + * | 2 | S0 | C1 | L2 | | | | + * | 3 | | S1 | C2 | L3 | | | + * | 4 | | | S2 | C3 | L4 | | + * | 5 | | | | S3 | C4 | L5 | + * | 6 | | | | | S4 | C5 | + * | 7 | | | | | | S5 | + * |------|-----------------------------------------------------------------| + */ + + // diagram of PINGPONG: L0 + if (repeat > 0) { + loadInput(nram_input, (T *)input, load_size, nram_stride, dram_stride, + n_seg); + loadInput(nram_target, (int32_t *)target, n_seg * sizeof(int32_t)); + loadWeight(nram_weight, (T *)weight, *((int32_t *)target), c, has_weight, + partition_nc); + __asm__ volatile("sync;\n\t"); + } + + // diagram of PINGPONG: C0 and L1 + if (repeat > 1) { + compute((T *)nram_input, (int32_t *)nram_target, (T *)nram_weight, + has_weight, partition_nc, deal_num, n_seg, c, c_seg, c_offset_num, + alpha, gamma, 
(T *)nram_compute_a, (T *)nram_compute_b, + (T *)nram_output); + loadInput((char *)nram_input + pingpong_offset, (T *)input + c * n_seg, + load_size, nram_stride, dram_stride, n_seg); + loadInput((char *)nram_target + pingpong_offset, + (int32_t *)target + n_seg, n_seg * sizeof(int32_t)); + loadWeight((char *)nram_weight + pingpong_weight_offset, (T *)weight, + *((int32_t *)target + n_seg), c, has_weight, partition_nc); + __asm__ volatile("sync;\n\t"); + } + + for (int32_t i = 0; i < repeat - 2; ++i) { + storeOutput((T *)output + i * c * n_seg, + nram_output + (i % 2) * pingpong_offset, load_size, + dram_stride, nram_stride, n_seg); + loadInput((char *)nram_input + (i % 2) * pingpong_offset, + (T *)(input) + (i + 2) * c * n_seg, load_size, nram_stride, + dram_stride, n_seg); + loadInput((char *)nram_target + (i % 2) * pingpong_offset, + (int32_t *)target + (i + 2) * n_seg, + n_seg * sizeof(int32_t)); + loadWeight((char *)nram_weight + (i % 2) * pingpong_weight_offset, + (T *)weight, *((int32_t *)target + (i + 2) * n_seg), c, + has_weight, partition_nc); + compute((T *)(nram_input + ((i + 1) % 2) * pingpong_offset), + (int32_t *)(nram_target + ((i + 1) % 2) * pingpong_offset), + (T *)(nram_weight + + partition_nc * ((i + 1) % 2) * pingpong_weight_offset), + has_weight, partition_nc, deal_num, n_seg, c, c_seg, c_offset_num, + alpha, gamma, (T *)nram_compute_a, (T *)nram_compute_b, + (T *)(nram_output + ((i + 1) % 2) * pingpong_offset)); + __asm__ volatile("sync;\n\t"); + } + + if (repeat > 1) { + storeOutput((T *)output + (repeat - 2) * c * n_seg, + (char *)nram_output + (repeat % 2) * pingpong_offset, + load_size, dram_stride, nram_stride, n_seg); + } + + if (remain > 0) { + loadInput((char *)nram_input + (repeat % 2) * pingpong_offset, + (T *)input + repeat * c * n_seg, load_size, nram_stride, + dram_stride, remain); + loadInput((char *)nram_target + (repeat % 2) * pingpong_offset, + (int32_t *)target + repeat * n_seg, + remain * sizeof(int32_t)); + 
loadWeight((char *)nram_weight + (repeat % 2) * pingpong_weight_offset, + (T *)weight, *((int32_t *)target + repeat * n_seg), c, + has_weight, partition_nc); + } + + if (repeat > 0) { + compute((T *)(nram_input + ((repeat - 1) % 2) * pingpong_offset), + (int32_t *)(nram_target + ((repeat - 1) % 2) * pingpong_offset), + (T *)(nram_weight + + partition_nc * ((repeat - 1) % 2) * pingpong_weight_offset), + has_weight, partition_nc, deal_num, n_seg, c, c_seg, c_offset_num, + alpha, gamma, (T *)nram_compute_a, (T *)nram_compute_b, + (T *)(nram_output + ((repeat - 1) % 2) * pingpong_offset)); + } + __asm__ volatile("sync;\n\t"); + + if (repeat > 0) { + storeOutput((T *)output + (repeat - 1) * c * n_seg, + (char *)nram_output + ((repeat - 1) % 2) * pingpong_offset, + load_size, dram_stride, nram_stride, n_seg); + } + + if (remain > 0) { + int32_t rem_num = PAD_UP(remain * c_num, NFU_ALIGN_SIZE / sizeof(T)); + compute((T *)(nram_input + (repeat % 2) * pingpong_offset), + (int32_t *)(nram_target + (repeat % 2) * pingpong_offset), + (T *)(nram_weight + + partition_nc * (repeat % 2) * pingpong_weight_offset), + has_weight, partition_nc, rem_num, remain, c, c_seg, c_offset_num, + alpha, gamma, (T *)nram_compute_a, (T *)nram_compute_b, + (T *)(nram_output + (repeat % 2) * pingpong_offset)); + __asm__ volatile("sync;\n\t"); + + storeOutput((T *)output + repeat * c * n_seg, + (char *)nram_output + (repeat % 2) * pingpong_offset, + load_size, dram_stride, nram_stride, remain); + } + __asm__ volatile("sync;\n\t"); +} + +template +__mlu_func__ void focalLossSigmoidForwardBlock( + const T *input, const int32_t *target, const T *weight, const int32_t n, + const int32_t c, const float alpha, const float gamma, T *output) { + /* + * NRAM partition + * |-----------------------------------------------------------------------| + * | weight | + * |------------------------------- COMPUTE -------------------------------| + * | | | + * | computeA | computeB | + * | | | + * |------------- PING 
------------------------------- PONG ---------------| + * | | | + * | input | input | + * | | | + * |-----------------------------------|-----------------------------------| + * | | | + * | output | output | + * | | | + * |-----------------------------------|-----------------------------------| + * | target | target | + * |-----------------------------------|-----------------------------------| + * + * split_pipeline_num is 6: COMPUTE(computeA,computeB), PING(input,output), + * PONG(input,output). + * split_target_num is 2: PING(target), PONG(target). + * weight is not NULL: + * The nram-size of weight is equal to c_align_size when partition input-N. + * The nram-size of weight is equal to NFU_ALIGN_SIZE when partition + * input-NC. + */ + + // calculate threshold of c + const int32_t split_pipeline_num = 6; + const int32_t split_target_num = 2; + const int32_t has_weight = weight != NULL; + const int32_t threshold_c = + PAD_DOWN((MAX_NRAM_SIZE - split_target_num * sizeof(int32_t)) / + (split_pipeline_num + has_weight), + NFU_ALIGN_SIZE) / + sizeof(T); + const int32_t c_align = PAD_UP(c, NFU_ALIGN_SIZE / sizeof(T)); + const int32_t c_align_size = c_align * sizeof(T); + + if (c <= threshold_c) { + // partition inputN + int32_t c_num = c; + int32_t reservered_align_size = + (split_target_num + split_pipeline_num) * NFU_ALIGN_SIZE; + int32_t weight_size = 0; + if (has_weight) { + c_num = c_align; + reservered_align_size = split_target_num * NFU_ALIGN_SIZE; + weight_size = c_align_size; + } + + const int32_t remain_size = + MAX_NRAM_SIZE - weight_size - reservered_align_size; + const int32_t n_seg = + remain_size / (split_pipeline_num * c_num * sizeof(T) + + split_target_num * sizeof(int32_t)); + const int32_t split_pipeline_size = + PAD_UP(c_num * n_seg * sizeof(T), NFU_ALIGN_SIZE); + const int32_t compute_size = 2 * split_pipeline_size; + const int32_t pingpong_offset = (MAX_NRAM_SIZE - weight_size - compute_size) / 2; + + char *nram_weight = (char *)nram_buffer; + 
char *nram_compute_a = nram_weight + has_weight * c_align_size; + char *nram_compute_b = nram_compute_a + split_pipeline_size; + char *nram_input = nram_compute_b + split_pipeline_size; + char *nram_output = nram_input + split_pipeline_size; + char *nram_target = nram_output + split_pipeline_size; + + startPipeline(input, target, weight, nram_compute_a, nram_compute_b, + nram_input, nram_target, nram_weight, nram_output, + has_weight, 0, pingpong_offset, 0, 0, n, n_seg, c, c, + alpha, gamma, output); + } else { + // partition inputNC + const int32_t weight_size = has_weight * NFU_ALIGN_SIZE; + const int32_t remain_size = MAX_NRAM_SIZE - weight_size; + const int32_t split_pipeline_size = PAD_DOWN( + (remain_size - split_target_num * NFU_ALIGN_SIZE) / split_pipeline_num, + NFU_ALIGN_SIZE); + const int32_t c_seg = split_pipeline_size / sizeof(T); + const int32_t n_seg = 1; + const int32_t compute_size = 2 * split_pipeline_size; + const int32_t pingpong_offset = (MAX_NRAM_SIZE - weight_size - compute_size) / 2; + const int32_t pingpong_weight_offset = weight_size / 2; + + char *nram_weight = (char *)nram_buffer; + char *nram_compute_a = nram_weight + weight_size; + char *nram_compute_b = nram_compute_a + split_pipeline_size; + char *nram_input = nram_compute_b + split_pipeline_size; + char *nram_output = nram_input + split_pipeline_size; + char *nram_target = nram_output + split_pipeline_size; + + const int32_t loop_num = (c + c_seg - 1) / c_seg; + const int32_t partition_nc = 1; + for (int32_t i = 0; i < loop_num; ++i) { + const int32_t c_index = i * c_seg; + const int32_t c_seg_curr = i == (loop_num - 1) ? 
c - c_index : c_seg; + startPipeline(input, target, weight, nram_compute_a, nram_compute_b, + nram_input, nram_target, nram_weight, nram_output, + has_weight, partition_nc, pingpong_offset, + pingpong_weight_offset, c_index, n, n_seg, c, c_seg_curr, + alpha, gamma, output); + } + } +} + +template +__mlu_global__ void MLUUnion1KernelFocalLossSigmoidForward( + const void *input, const void *target, const void *weight, const int32_t N, + const int32_t C, const float alpha, const float gamma, void *output) { + const int32_t n_seg = N / taskDim + (taskId == taskDim - 1) * (N % taskDim); + const T *input_offset = (T *)input + N / taskDim * taskId * C; + const int32_t *target_offset = (int32_t *)target + N / taskDim * taskId; + T *output_offset = (T *)output + N / taskDim * taskId * C; + + focalLossSigmoidForwardBlock((T *)input_offset, (int32_t *)target_offset, + (T *)weight, n_seg, C, alpha, gamma, + (T *)output_offset); +} +} // namespace forward + +namespace backward { +template +__mlu_func__ void loadInput(char *nram_input, char *nram_target, + const T *gdram_input, const int32_t *gdram_target, + const int32_t deal_n, const int32_t total_c, + const bool pingping_flag, const bool has_weight, + const int32_t nram_offset, + const int32_t gdram_offset) { + if (pingping_flag == PONG) { + nram_input += nram_offset; + nram_target += nram_offset; + } + + __memcpy_async(nram_target, gdram_target + gdram_offset / total_c, + deal_n * sizeof(int32_t), GDRAM2NRAM); + + char *nram_input_load = nram_input; + int32_t compute_align_size = 2 * NFU_ALIGN_SIZE; + if (has_weight) { + if (sizeof(T) == sizeof(half)) { + int32_t compute_align_num = compute_align_size / sizeof(float); + int32_t align_c = PAD_UP(total_c, compute_align_num); + int32_t compute_size = deal_n * align_c * sizeof(float); + nram_input_load += compute_size / 2; + } + int32_t align_c = PAD_UP(total_c, NFU_ALIGN_SIZE / sizeof(T)); + int32_t total_c_size = total_c * sizeof(T); + int32_t align_c_size = align_c * 
sizeof(T); + __memcpy_async(nram_input_load, gdram_input + gdram_offset, total_c_size, + GDRAM2NRAM, align_c_size, total_c_size, deal_n - 1); + } else { + if (sizeof(T) == sizeof(half)) { + int32_t compute_size = + PAD_UP(deal_n * total_c * sizeof(float), compute_align_size); + nram_input_load += compute_size / 2; + } + int32_t load_size = deal_n * total_c * sizeof(T); + __memcpy_async(nram_input_load, gdram_input + gdram_offset, load_size, + GDRAM2NRAM); + } +} + +template +__mlu_func__ void sigmoid(T *dst_data, const T *src_data, + const int32_t elem_count) { + __bang_mul_const(dst_data, (T *)src_data, T(-1), elem_count); + __bang_active_exphp(dst_data, dst_data, elem_count); + __bang_add_const(dst_data, dst_data, T(1), elem_count); + __bang_active_reciphp(dst_data, dst_data, elem_count); +} + +template +__mlu_func__ void coreCompute(char *nram_input, const T *nram_weight, + const float *nram_flt_min, char *nram_pt, + char *nram_alpha_t, char *nram_temp, + char *nram_target, const float *nram_gamma, + char *nram_output, const float alpha, + const int32_t compute_num, const int32_t deal_n, + const int32_t total_c, const bool pingpong_flag, + const int32_t nram_offset, + const bool has_weight) { + if (pingpong_flag == PONG) { + nram_input += nram_offset; + nram_pt += nram_offset; + nram_alpha_t += nram_offset; + nram_temp += nram_offset; + nram_output += nram_offset; + nram_target += nram_offset; + } + + if (sizeof(T) == sizeof(half)) { + const int32_t compute_size = compute_num * sizeof(float); + char *nram_input_load = nram_input + compute_size / 2; + __bang_half2float((float *)nram_input, (half *)nram_input_load, + compute_num); + } + + // 0. alpha_t = alpha - 1 + __nramset((float *)nram_alpha_t, compute_num, (float)(alpha - 1.0)); + + // 1. 
pt = 1 - sigmoid(x) + sigmoid((float *)nram_pt, (float *)nram_input, compute_num); + __bang_mul_const((float *)nram_pt, (float *)nram_pt, (float)(-1), + compute_num); + __bang_add_const((float *)nram_pt, (float *)nram_pt, (float)1, compute_num); + + // 2. pt = target[n] == c ? sigmoid(x) : 1 - sigmoid(x) + // alpha_t = target[n] == c ? alpha : alpha - 1 + const int32_t nfu_align_num = NFU_ALIGN_SIZE / sizeof(float); + for (int n = 0; n < deal_n; n++) { + const int32_t target_value = ((int32_t *)nram_target)[n]; + if (target_value >= total_c || target_value < 0) continue; + int32_t c_offset = 0; + if (has_weight) { + int32_t c_align_num = nfu_align_num; + if (sizeof(T) == sizeof(half)) { + c_align_num += nfu_align_num; + } + c_offset = PAD_UP(total_c, c_align_num); + } else { + c_offset = total_c; + } + int32_t idx = n * c_offset + target_value; + *((float *)nram_pt + idx) = 1.0 - *((float *)nram_pt + idx); + *((float *)nram_alpha_t + idx) = alpha; + } + + // 3. temp = -alpha_t * e^(gamma * log(max(1 - pt, FLT_MIN)) + __bang_mul_const((float *)nram_temp, (float *)nram_pt, (float)(-1), + compute_num); + __bang_add_const((float *)nram_temp, (float *)nram_temp, (float)(1), + compute_num); + __bang_cycle_maxequal((float *)nram_temp, (float *)nram_temp, + (float *)nram_flt_min, compute_num, nfu_align_num); + __bang_active_loghp((float *)nram_temp, (float *)nram_temp, compute_num); + __bang_cycle_mul((float *)nram_temp, (float *)nram_temp, (float *)nram_gamma, + compute_num, nfu_align_num); + __bang_active_exphp((float *)nram_temp, (float *)nram_temp, compute_num); + __bang_mul((float *)nram_temp, (float *)nram_temp, (float *)nram_alpha_t, + compute_num); + __bang_mul_const((float *)nram_temp, (float *)nram_temp, (float)(-1), + compute_num); + + // 4. 
output = 1 - pt - gamma * pt * log(max(pt, FLT_MIN)) + __bang_cycle_maxequal((float *)nram_output, (float *)nram_pt, + (float *)nram_flt_min, compute_num, nfu_align_num); + __bang_active_loghp((float *)nram_output, (float *)nram_output, compute_num); + __bang_mul((float *)nram_output, (float *)nram_output, (float *)nram_pt, + compute_num); + __bang_cycle_mul((float *)nram_output, (float *)nram_output, + (float *)nram_gamma, compute_num, nfu_align_num); + __bang_add((float *)nram_output, (float *)nram_output, (float *)nram_pt, + compute_num); + __bang_mul_const((float *)nram_output, (float *)nram_output, (float)(-1), + compute_num); + __bang_add_const((float *)nram_output, (float *)nram_output, (float)(1), + compute_num); + + // 5. output = output * temp + __bang_mul((float *)nram_output, (float *)nram_output, (float *)nram_temp, + compute_num); + + if (sizeof(T) == sizeof(half)) { + __bang_float2half_rd((half *)nram_output, (float *)nram_output, + compute_num); + } + + if (has_weight) { + // with weight + for (int n = 0; n < deal_n; n++) { + int32_t c_align_num = nfu_align_num; + if (sizeof(T) == sizeof(half)) { + c_align_num += nfu_align_num; + } + int32_t align_c = PAD_UP(total_c, c_align_num); + int32_t target_value = ((int32_t *)nram_target)[n]; + T weight_value = nram_weight[target_value]; + __bang_mul_const((T *)nram_output + n * align_c, + (T *)nram_output + n * align_c, weight_value, align_c); + } + } +} + +template +__mlu_func__ void storeOutput(T *gdram_output, const char *nram_output, + const int32_t deal_n, const int32_t total_c, + const bool pingpong_flag, const bool has_weight, + const int32_t nram_offset, + const int32_t gdram_offset) { + if (pingpong_flag == PONG) { + nram_output += nram_offset; + } + const int32_t store_size = deal_n * total_c * sizeof(T); + if (has_weight) { + int32_t align_c = PAD_UP(total_c, NFU_ALIGN_SIZE / sizeof(T)); + int32_t total_c_size = total_c * sizeof(T); + int32_t align_c_size = align_c * sizeof(T); + 
__memcpy_async(gdram_output + gdram_offset, nram_output, total_c_size, + NRAM2GDRAM, total_c_size, align_c_size, deal_n - 1); + } else { + __memcpy_async(gdram_output + gdram_offset, nram_output, store_size, + NRAM2GDRAM); + } +} + +template +__mlu_func__ void focalLossSigmoidBackwardBlock( + const T *input, const int32_t *target, const T *weight, const float gamma, + const float alpha, const int32_t total_n, const int32_t deal_n, + const int32_t total_c, T *output) { + // params per time slice + int32_t deal_num = deal_n * total_c; + int32_t deal_size = deal_num * sizeof(float); + int32_t compute_num = 0; + int32_t compute_size = 0; + int32_t compute_align_size = NFU_ALIGN_SIZE; + const int32_t nfu_align_num = NFU_ALIGN_SIZE / sizeof(T); + if (sizeof(T) == sizeof(half)) { + compute_align_size += NFU_ALIGN_SIZE; + } + const int32_t compute_align_num = compute_align_size / sizeof(float); + bool has_weight = false; + if (weight != NULL) { + has_weight = true; + int32_t align_c = PAD_UP(total_c, compute_align_num); + compute_num = deal_n * align_c; + compute_size = compute_num * sizeof(float); + } else { + compute_size = PAD_UP(deal_size, compute_align_size); + compute_num = compute_size / sizeof(float); + } + + // params per core + int32_t total_num = total_n * total_c; + int32_t num_per_core = PAD_DOWN(total_num / taskDim, deal_num); + int32_t loop_per_core = num_per_core / deal_num; + + /* NRAM partition: + * + * |-----------------ping pong--------------------| + * |input | pt | alpha_t | temp | output | target | flt_min | gamma | weight| + * + * split_pipeline_num is 5: input, pt, alpha_t, temp, output. + * nram_reserved_line_num is 2: flt_min, gamma. 
+ */ + const int32_t split_pipeline_num = 5; + const int32_t nram_reserved_line_num = 2; + int32_t target_deal_size = deal_n * sizeof(int32_t); + int32_t target_deal_size_align = PAD_UP(target_deal_size, NFU_ALIGN_SIZE); + // nram PING/PONG offset + int32_t ping_pong_offset = + compute_size * split_pipeline_num + target_deal_size_align; + + // gdram addr + int32_t *base_addr_target = + (int32_t *)target + taskId * loop_per_core * deal_n; + T *base_addr_input = (T *)input + taskId * num_per_core; + T *base_addr_output = output + taskId * num_per_core; + + // nram addr + char *nram_input = (char *)nram_buffer; + char *nram_pt = nram_input + compute_size; + char *nram_alpha_t = nram_pt + compute_size; + char *nram_temp = nram_alpha_t + compute_size; + char *nram_output = nram_temp + compute_size; + char *nram_target = nram_output + compute_size; + float *nram_flt_min = NULL; + float *nram_gamma = NULL; + T *nram_weight = NULL; + + if (!has_weight) { + nram_flt_min = (float *)(nram_buffer + MAX_NRAM_SIZE - + nram_reserved_line_num * NFU_ALIGN_SIZE); + nram_gamma = nram_flt_min + nfu_align_num; + } else { + int32_t weight_space = PAD_UP(total_c * sizeof(T), NFU_ALIGN_SIZE); + nram_flt_min = + (float *)(nram_buffer + MAX_NRAM_SIZE - + nram_reserved_line_num * NFU_ALIGN_SIZE - weight_space); + nram_gamma = nram_flt_min + nfu_align_num; + nram_weight = (T *)(nram_gamma + nfu_align_num); + __memcpy_async(nram_weight, weight, total_c * sizeof(T), GDRAM2NRAM); + } + + // nram set gamma and FLT_MIN + __nramset(nram_gamma, nfu_align_num, gamma); + __nramset(nram_flt_min, nfu_align_num, FLT_MIN); + + /* + * Pipeline: The pipeline is processed in three stages: Load, Compute, Store. + * The allocated memory space of NRAM is divided into two parts: + * PING and Pong. In a single time slice, PING is used to process + * IO stream and PONG is used for computation. Both of them are + * processed synchronously until finished. 
+ * + * diagram of PINGPONG: + * |------|-----------------------------------------------------------------| + * | | space | + * |------|-----------------------------------------------------------------| + * | time | Ping | Pong | Ping | Pong | Ping | Pong | + * |------|-----------------------------------------------------------------| + * | 0 | L0 | | | | | | + * | 1 | C0 | L1 | | | | | + * | 2 | S0 | C1 | L2 | | | | + * | 3 | | S1 | C2 | L3 | | | + * | 4 | | | S2 | C3 | L4 | | + * | 5 | | | | S3 | C4 | L5 | + * | 6 | | | | | S4 | C5 | + * | 7 | | | | | | S5 | + * |------|-----------------------------------------------------------------| + */ + + // diagram of PINGPONG: L0 + if (loop_per_core > 0) { + loadInput(nram_input, nram_target, base_addr_input, base_addr_target, + deal_n, total_c, PING, has_weight, ping_pong_offset, 0); + __asm__ volatile("sync;"); + } + + // diagram of PINGPONG: C0 and L1 + if (loop_per_core > 1) { + coreCompute(nram_input, nram_weight, nram_flt_min, nram_pt, nram_alpha_t, + nram_temp, nram_target, nram_gamma, nram_output, alpha, + compute_num, deal_n, total_c, PING, ping_pong_offset, + has_weight); + loadInput(nram_input, nram_target, base_addr_input, base_addr_target, + deal_n, total_c, PONG, has_weight, ping_pong_offset, deal_num); + __asm__ volatile("sync;"); + } + + for (int i = 0; i < loop_per_core - 2; ++i) { + if (i % 2 == PING) { + storeOutput(base_addr_output, nram_output, deal_n, total_c, PING, + has_weight, ping_pong_offset, i * deal_num); + coreCompute(nram_input, nram_weight, nram_flt_min, nram_pt, nram_alpha_t, + nram_temp, nram_target, nram_gamma, nram_output, alpha, + compute_num, deal_n, total_c, PONG, ping_pong_offset, + has_weight); + loadInput(nram_input, nram_target, base_addr_input, base_addr_target, + deal_n, total_c, PING, has_weight, ping_pong_offset, + (i + 2) * deal_num); + } else { + storeOutput(base_addr_output, nram_output, deal_n, total_c, PONG, + has_weight, ping_pong_offset, i * deal_num); + 
coreCompute(nram_input, nram_weight, nram_flt_min, nram_pt, nram_alpha_t, + nram_temp, nram_target, nram_gamma, nram_output, alpha, + compute_num, deal_n, total_c, PING, ping_pong_offset, + has_weight); + loadInput(nram_input, nram_target, base_addr_input, base_addr_target, + deal_n, total_c, PONG, has_weight, ping_pong_offset, + (i + 2) * deal_num); + } + __asm__ volatile("sync;"); + } + + if (loop_per_core > 1) { + if ((loop_per_core - 2) % 2 == PING) { + storeOutput(base_addr_output, nram_output, deal_n, total_c, PING, + has_weight, ping_pong_offset, (loop_per_core - 2) * deal_num); + coreCompute(nram_input, nram_weight, nram_flt_min, nram_pt, nram_alpha_t, + nram_temp, nram_target, nram_gamma, nram_output, alpha, + compute_num, deal_n, total_c, PONG, ping_pong_offset, + has_weight); + } else { + storeOutput(base_addr_output, nram_output, deal_n, total_c, PONG, + has_weight, ping_pong_offset, (loop_per_core - 2) * deal_num); + coreCompute(nram_input, nram_weight, nram_flt_min, nram_pt, nram_alpha_t, + nram_temp, nram_target, nram_gamma, nram_output, alpha, + compute_num, deal_n, total_c, PING, ping_pong_offset, + has_weight); + } + __asm__ volatile("sync;"); + } + + if (loop_per_core > 0) { + if (loop_per_core == 1) { + coreCompute(nram_input, nram_weight, nram_flt_min, nram_pt, nram_alpha_t, + nram_temp, nram_target, nram_gamma, nram_output, alpha, + compute_num, deal_n, total_c, PING, ping_pong_offset, + has_weight); + __asm__ volatile("sync;"); + } + if ((loop_per_core - 1) % 2 == PING) { + storeOutput(base_addr_output, nram_output, deal_n, total_c, PING, + has_weight, ping_pong_offset, (loop_per_core - 1) * deal_num); + } else { + storeOutput(base_addr_output, nram_output, deal_n, total_c, PONG, + has_weight, ping_pong_offset, (loop_per_core - 1) * deal_num); + } + } + + // process the remaining data which N remainder per core is less than deal_n + int32_t rem_for_all = total_num - num_per_core * taskDim; + if (rem_for_all == 0) return; + int32_t 
rem_n_for_all = rem_for_all / total_c; + int32_t rem_n_per_core = (rem_n_for_all + taskDim - 1) / taskDim; + int32_t rem_num_per_core = rem_n_per_core * total_c; + int32_t rem_num_per_core_align = 0; + int32_t rem_core_num = rem_for_all / rem_num_per_core; + + int32_t rem_n_for_last = rem_n_for_all % rem_n_per_core; + int32_t rem_num_for_last = rem_n_for_last * total_c; + int32_t rem_num_for_last_align = 0; + + if (has_weight) { + int32_t align_c = PAD_UP(total_c, compute_align_num); + rem_num_per_core_align = rem_n_per_core * align_c; + rem_num_for_last_align = rem_n_for_last * align_c; + } else { + rem_num_per_core_align = PAD_UP(rem_num_per_core, compute_align_num); + rem_num_for_last_align = PAD_UP(rem_num_for_last, compute_align_num); + } + + int32_t rem_addr_base = num_per_core * taskDim; + int32_t rem_target_addr_base = loop_per_core * deal_n * taskDim; + base_addr_target = (int32_t *)target + rem_target_addr_base; + base_addr_input = (T *)input + rem_addr_base; + base_addr_output = output + rem_addr_base; + + if (taskId < rem_core_num) { + loadInput(nram_input, nram_target, base_addr_input, base_addr_target, + rem_n_per_core, total_c, PING, has_weight, ping_pong_offset, + taskId * rem_num_per_core); + __asm__ volatile("sync;"); + coreCompute(nram_input, nram_weight, nram_flt_min, nram_pt, nram_alpha_t, + nram_temp, nram_target, nram_gamma, nram_output, alpha, + rem_num_per_core_align, rem_n_per_core, total_c, PING, + ping_pong_offset, has_weight); + __asm__ volatile("sync;"); + storeOutput(base_addr_output, nram_output, rem_n_per_core, total_c, PING, + has_weight, ping_pong_offset, taskId * rem_num_per_core); + } else if (taskId == rem_core_num) { + if (rem_num_for_last == 0) return; + loadInput(nram_input, nram_target, base_addr_input, base_addr_target, + rem_n_for_last, total_c, PING, has_weight, ping_pong_offset, + taskId * rem_num_per_core); + __asm__ volatile("sync;"); + coreCompute(nram_input, nram_weight, nram_flt_min, nram_pt, nram_alpha_t, + 
nram_temp, nram_target, nram_gamma, nram_output, alpha, + rem_num_for_last_align, rem_n_for_last, total_c, PING, + ping_pong_offset, has_weight); + __asm__ volatile("sync;"); + storeOutput(base_addr_output, nram_output, rem_n_for_last, total_c, PING, + has_weight, ping_pong_offset, taskId * rem_num_per_core); + } else { + return; + } +} + +template +__mlu_global__ void MLUUnion1KernelFocalLossSigmoidBackward( + const void *input, const void *target, const void *weight, + const float gamma, const float alpha, const int32_t total_n, + const int32_t deal_n, const int32_t total_c, void *output) { + focalLossSigmoidBackwardBlock((T *)input, (int32_t *)target, (T *)weight, + gamma, alpha, total_n, deal_n, total_c, + (T *)output); +} +} // namespace backward + +void KernelFocalLossSigmoidForward(cnrtDim3_t k_dim, cnrtFunctionType_t k_type, + cnrtQueue_t queue, + const cnrtDataType_t d_type, + const void *input, const void *target, + const void *weight, const int32_t N, + const int32_t C, const float alpha, + const float gamma, void *output) { + if (d_type == CNRT_FLOAT16) { + forward::MLUUnion1KernelFocalLossSigmoidForward< + half><<>>(input, target, weight, N, C, alpha, + gamma, output); + } else { + forward::MLUUnion1KernelFocalLossSigmoidForward< + float><<>>(input, target, weight, N, C, alpha, + gamma, output); + } +} + +void KernelFocalLossSigmoidBackward(cnrtDim3_t k_dim, cnrtFunctionType_t k_type, + cnrtQueue_t queue, + const cnrtDataType_t d_type, + const void *input, const void *target, + const void *weight, const float gamma, + const float alpha, const int32_t dim_n, + const int32_t deal_n, const int32_t dim_c, + void *output) { + if (d_type == CNRT_FLOAT16) { + backward::MLUUnion1KernelFocalLossSigmoidBackward< + half><<>>(input, target, weight, gamma, alpha, + dim_n, deal_n, dim_c, output); + } else { + backward::MLUUnion1KernelFocalLossSigmoidBackward< + float><<>>(input, target, weight, gamma, alpha, + dim_n, deal_n, dim_c, output); + } +} diff --git 
a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/iou3d_mlu_kernel.mlu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/iou3d_mlu_kernel.mlu new file mode 100644 index 0000000000000000000000000000000000000000..84e53aa1f395ea57263db4685d4bca1e27f3dc60 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/iou3d_mlu_kernel.mlu @@ -0,0 +1,431 @@ +/************************************************************************* + * Copyright (C) 2022 Cambricon. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + *************************************************************************/ + +#include "common_mlu_helper.hpp" +#include "iou3d_utils.hpp" + +#define SIZE_SRAM_BUF (MAX_SRAM_SIZE) + +/* NRAM buffer + * Suppose deal N boxes once time. 
+---------------------------------------------------------------- +| Basic |score (1N)+ |intersect_pts(48N)| | +| |valid_box(1N) |+ ordered_pts(48N)| temp_long(72N) | +| |+ temp_buffer(10N)| | | +|--------------------------|------------------|----------------| +| Reuse | null | null |rotated_pts(16N)| +|-------|------------------|------------------|----------------| + +--------------------------------------------------------------------------- +| Basic | dist_ram(24N) | valid_pts(24N) |box1(5N) |box1_buffer(5KB) | +| | |+ nums_in_ram(1N)|+ box2(5N)|+nram_save(5KB) | +|--------------------------|-----------------|----------|-----------------| +| Reuse | vec_buffer(5N) | null | null | null | +|-------|------------------|-----------------|----------|-----------------| +Total Basic Memory Size = 239N * sizeof(float) + 10KB +*/ + +__nram__ char nram_buffer[MAX_NRAM_SIZE]; +__mlu_shared__ char sram_buffer[SIZE_SRAM_BUF]; + +template +__mlu_func__ void iou3D_detection(int32_t &result_box_num, int32_t *output_data, + const T *boxes_data, float *scores_data, + const int core_limit, const int input_box_num, + const float iou_threshold, + mluMemcpyDirection_t scores_load_dir, + mluMemcpyDirection_t scores_store_dir, + mluMemcpyDirection_t boxes_load_dir) { + // NRAM divide by (2+4*COMPUTE_COUNT_ALIGN) copies of NRAM, counted by bytes + const int nram_save_limit_count = 256; + int box_read_limit_count = 256; + float div_thresh_iou = 1.0 / iou_threshold; + // every box require 239 * sizeof(float) space in nram; + const int32_t copies_of_nram = 239 * sizeof(float); + const int32_t limit = (MAX_NRAM_SIZE - 5 * box_read_limit_count * sizeof(T) - + nram_save_limit_count * sizeof(int32_t)) / + copies_of_nram; + + // x,y,z,dx,dy,dz,angle + const T *input_x_ptr = boxes_data; + const T *input_y_ptr = input_x_ptr + input_box_num; + const T *input_dx_ptr = input_y_ptr + 2 * input_box_num; + const T *input_dy_ptr = input_dx_ptr + input_box_num; + const T *input_angle_ptr = input_dy_ptr + 
2 * input_box_num; + float *input_score_ptr = scores_data; + + // data split + int avg_cluster = 0; + int rem_cluster = 0; + int len_cluster = 0; + int cluster_offset = 0; + if (clusterDim > 0) { + // union + avg_cluster = input_box_num / clusterDim; + rem_cluster = input_box_num % clusterDim; + len_cluster = avg_cluster + (clusterId < rem_cluster ? 1 : 0); + cluster_offset = avg_cluster * clusterId + + (clusterId <= rem_cluster ? clusterId : rem_cluster); + } else { + // block + len_cluster = input_box_num; + cluster_offset = 0; + } + int len_core = input_box_num; + int input_offset = 0; + if (core_limit > 1) { + int avg_core = len_cluster / coreDim; + int rem_core = len_cluster % coreDim; + len_core = avg_core + (coreId < rem_core ? 1 : 0); + int core_offset = + avg_core * coreId + (coreId <= rem_core ? coreId : rem_core); + input_offset = cluster_offset + core_offset; + } + + int32_t max_seg_pad = IOU3D_DOWN(limit, IOU3D_SIZE); + int repeat_iou_compute = len_core / max_seg_pad; + int remain_iou_compute = len_core % max_seg_pad; + + // basic consistent memory layout + void *score = ((char *)nram_buffer); + void *valid_box = ((char *)score) + 1 * max_seg_pad * sizeof(float); + void *temp_buffer = ((char *)valid_box) + 1 * max_seg_pad * sizeof(float); + void *intersect_pts_x = + ((char *)temp_buffer) + 10 * max_seg_pad * sizeof(float); + void *intersect_pts_y = + ((char *)intersect_pts_x) + 24 * max_seg_pad * sizeof(float); + void *ordered_pts_x = + ((char *)intersect_pts_y) + 24 * max_seg_pad * sizeof(float); + void *ordered_pts_y = + ((char *)ordered_pts_x) + 24 * max_seg_pad * sizeof(float); + void *temp_long_1 = + ((char *)ordered_pts_y) + 24 * max_seg_pad * sizeof(float); + void *temp_long_2 = ((char *)temp_long_1) + 24 * max_seg_pad * sizeof(float); + void *temp_long_3 = ((char *)temp_long_2) + 24 * max_seg_pad * sizeof(float); + void *dist_ram = ((char *)temp_long_3) + 24 * max_seg_pad * sizeof(float); + void *valid_pts = ((char *)dist_ram) + 24 * 
max_seg_pad * sizeof(float); + void *nums_in_ram = ((char *)valid_pts) + 24 * max_seg_pad * sizeof(float); + T *box1 = (T *)(((char *)nums_in_ram) + 1 * max_seg_pad * sizeof(float)); + T *box2 = (T *)(((char *)box1) + 5 * max_seg_pad * sizeof(float)); + void *box1_buffer = ((char *)box2) + 5 * max_seg_pad * sizeof(float); + int32_t *nram_save = + (int32_t *)(((char *)box1_buffer) + 5 * box_read_limit_count * sizeof(T)); + // nram_save ~ nram_save_limit_count * sizeof(int32_t) + int nram_save_count = 0; + + // reuse memory + void *rotated_pts1_x = ((char *)dist_ram); + void *rotated_pts1_y = + ((char *)rotated_pts1_x) + 4 * max_seg_pad * sizeof(float); + void *rotated_pts2_x = + ((char *)rotated_pts1_y) + 4 * max_seg_pad * sizeof(float); + void *rotated_pts2_y = + ((char *)rotated_pts2_x) + 4 * max_seg_pad * sizeof(float); + void *vec_buffer = ((char *)temp_long_1) + 5 * max_seg_pad * sizeof(float); + // vec_buffer ~ 16 * max_seg_pad * sizeof(float) + + // First, initialize ram with all 0, or could cause nan/inf unexcepted results + __bang_write_zero((unsigned char *)nram_buffer, copies_of_nram * max_seg_pad); + // number 8 and 0xff relay on box_read_limit_count initial as 256 + const int max_box_seg_id = (input_box_num - 1) >> 8; + const int last_rem_box_number = ((input_box_num - 1) & 0xff) + 1; + for (int32_t cur_box = 0; cur_box < input_box_num; ++cur_box) { + __sync_all(); + int box_seg_id = cur_box >> 8, box_id = cur_box & 0xff; + box_read_limit_count = box_seg_id == max_box_seg_id ? 
last_rem_box_number + : box_read_limit_count; + if (box_id == 0) { + // x,y,z,dx,dy,dz,angle + int offset_num = box_seg_id << 8; + // x + __memcpy((char *)box1_buffer, input_x_ptr + offset_num, + box_read_limit_count * 1 * sizeof(T), boxes_load_dir, + box_read_limit_count * 1 * sizeof(T), + box_read_limit_count * 1 * sizeof(T), 0); + // y + __memcpy((char *)box1_buffer + box_read_limit_count * 1 * sizeof(T), + input_y_ptr + offset_num, box_read_limit_count * 1 * sizeof(T), + boxes_load_dir, box_read_limit_count * 1 * sizeof(T), + box_read_limit_count * 1 * sizeof(T), 0); + // dx + __memcpy((char *)box1_buffer + box_read_limit_count * 2 * sizeof(T), + input_dx_ptr + offset_num, box_read_limit_count * 1 * sizeof(T), + boxes_load_dir, box_read_limit_count * 1 * sizeof(T), + box_read_limit_count * 1 * sizeof(T), 0); + // dy + __memcpy((char *)box1_buffer + box_read_limit_count * 3 * sizeof(T), + input_dy_ptr + offset_num, box_read_limit_count * 1 * sizeof(T), + boxes_load_dir, box_read_limit_count * 1 * sizeof(T), + box_read_limit_count * 1 * sizeof(T), 0); + // angle + __memcpy((char *)box1_buffer + box_read_limit_count * 4 * sizeof(T), + input_angle_ptr + offset_num, + box_read_limit_count * 1 * sizeof(T), boxes_load_dir, + box_read_limit_count * 1 * sizeof(T), + box_read_limit_count * 1 * sizeof(T), 0); + } + if (((float *)input_score_ptr)[cur_box] == 0) { + continue; + } + // save result + nram_save[nram_save_count] = cur_box; + result_box_num++; + nram_save_count++; + if (clusterId == 0 && coreId == 0 && + nram_save_count == nram_save_limit_count) { + pvLock(); + __memcpy(output_data, nram_save, nram_save_count * sizeof(int32_t), + NRAM2GDRAM); + pvUnlock(); + output_data += nram_save_count; + nram_save_count = 0; + } + // prepare box1 + // x + __bang_write_value((float *)box1, max_seg_pad, + float(((T *)box1_buffer)[box_id])); + // y + __bang_write_value( + (float *)box1 + max_seg_pad, max_seg_pad, + float(((T *)box1_buffer)[box_id + 1 * box_read_limit_count])); 
+ // dx + __bang_write_value( + (float *)box1 + max_seg_pad * 2, max_seg_pad, + float(((T *)box1_buffer)[box_id + 2 * box_read_limit_count])); + // dy + __bang_write_value( + (float *)box1 + max_seg_pad * 3, max_seg_pad, + float(((T *)box1_buffer)[box_id + 3 * box_read_limit_count])); + // angle + __bang_write_value( + (float *)box1 + max_seg_pad * 4, max_seg_pad, + float(((T *)box1_buffer)[box_id + 4 * box_read_limit_count])); + + float max_area = 1.0f * + ((T *)box1_buffer)[box_id + 2 * box_read_limit_count] * + ((T *)box1_buffer)[box_id + 3 * box_read_limit_count]; + // update score + + for (int i = 0; i <= repeat_iou_compute; i++) { + if (i == repeat_iou_compute && remain_iou_compute == 0) { + break; + } + int seg_len = max_seg_pad; + int cpy_len = + (i == repeat_iou_compute) ? remain_iou_compute : max_seg_pad; + // int half_offset = std::is_same::value ? max_seg_pad * 5 : 0; + int half_offset = (sizeof(T) == sizeof(half)) ? max_seg_pad * 5 : 0; + // score + __memcpy(score, input_score_ptr + input_offset + i * max_seg_pad, + cpy_len * sizeof(float), scores_load_dir, + cpy_len * sizeof(float), cpy_len * sizeof(float), 0); + // x + __memcpy(box2 + half_offset, input_x_ptr + input_offset + i * max_seg_pad, + cpy_len * 1 * sizeof(T), boxes_load_dir, cpy_len * 1 * sizeof(T), + cpy_len * 1 * sizeof(T), 0); + // y + __memcpy(box2 + half_offset + seg_len * 1, + input_y_ptr + input_offset + i * max_seg_pad, + cpy_len * 1 * sizeof(T), boxes_load_dir, cpy_len * 1 * sizeof(T), + cpy_len * 1 * sizeof(T), 0); + // dx + __memcpy(box2 + half_offset + seg_len * 2, + input_dx_ptr + input_offset + i * max_seg_pad, + cpy_len * 1 * sizeof(T), boxes_load_dir, cpy_len * 1 * sizeof(T), + cpy_len * 1 * sizeof(T), 0); + // dy + __memcpy(box2 + half_offset + seg_len * 3, + input_dy_ptr + input_offset + i * max_seg_pad, + cpy_len * 1 * sizeof(T), boxes_load_dir, cpy_len * 1 * sizeof(T), + cpy_len * 1 * sizeof(T), 0); + // angle + __memcpy(box2 + half_offset + seg_len * 4, + 
input_angle_ptr + input_offset + i * max_seg_pad, + cpy_len * 1 * sizeof(T), boxes_load_dir, cpy_len * 1 * sizeof(T), + cpy_len * 1 * sizeof(T), 0); + // if (std::is_same::value) { + if (sizeof(T) == sizeof(half)) { + __bang_half2float((float *)box2, (half *)(box2 + half_offset), + seg_len * 5); + } + + // Calculate rotated vertices + void *temp1_ram = ((char *)temp_buffer); + void *temp2_ram = ((char *)temp_buffer) + seg_len * sizeof(float); + void *temp3_ram = ((char *)temp_buffer) + 2 * seg_len * sizeof(float); + void *temp4_ram = ((char *)temp_buffer) + 3 * seg_len * sizeof(float); + getRotatedVertices((float *)rotated_pts1_x, (float *)rotated_pts1_y, + (float *)box1, (float *)temp1_ram, (float *)temp2_ram, + (float *)temp3_ram, (float *)temp4_ram, seg_len); + getRotatedVertices((float *)rotated_pts2_x, (float *)rotated_pts2_y, + (float *)box2, (float *)temp1_ram, (float *)temp2_ram, + (float *)temp3_ram, (float *)temp4_ram, seg_len); + + __bang_write_zero((float *)valid_pts, 24 * seg_len); + __bang_write_zero((float *)nums_in_ram, seg_len); + __bang_write_value(((float *)valid_box), seg_len, 1.0f); + void *vec1_x = ((char *)vec_buffer); + void *vec1_y = ((char *)vec1_x) + 4 * seg_len * sizeof(float); + void *vec2_x = ((char *)vec1_y) + 4 * seg_len * sizeof(float); + void *vec2_y = ((char *)vec2_x) + 4 * seg_len * sizeof(float); + void *temp5_ram = ((char *)temp_buffer) + 4 * seg_len * sizeof(float); + void *temp6_ram = ((char *)temp_buffer) + 5 * seg_len * sizeof(float); + void *temp7_ram = ((char *)temp_buffer) + 6 * seg_len * sizeof(float); + void *temp8_ram = ((char *)temp_buffer) + 7 * seg_len * sizeof(float); + void *temp9_ram = ((char *)temp_buffer) + 8 * seg_len * sizeof(float); + void *temp10_ram = ((char *)temp_buffer) + 9 * seg_len * sizeof(float); + + // Get all intersection points + getIntersectPts( + (float *)rotated_pts1_x, (float *)rotated_pts1_y, + (float *)rotated_pts2_x, (float *)rotated_pts2_y, (float *)vec1_x, + (float *)vec1_y, (float 
*)vec2_x, (float *)vec2_y, + (float *)intersect_pts_x, (float *)intersect_pts_y, + (float *)valid_pts, (float *)nums_in_ram, (float *)temp1_ram, + (float *)temp2_ram, (float *)temp3_ram, (float *)temp4_ram, + (float *)temp5_ram, (float *)temp6_ram, (float *)temp7_ram, + (float *)temp8_ram, (float *)temp9_ram, (float *)temp10_ram, seg_len); + + // Where nums_in <= 2, set valid_box to false + __bang_write_value((float *)temp9_ram, COMPUTE_COUNT_ALIGN, (float)2); + __bang_cycle_gt((float *)temp1_ram, (float *)nums_in_ram, + (float *)temp9_ram, seg_len, COMPUTE_COUNT_ALIGN); + __bang_and((float *)valid_box, (float *)valid_box, (float *)temp1_ram, + seg_len); + __bang_cycle_and((float *)valid_pts, (float *)valid_pts, + (float *)valid_box, 24 * seg_len, seg_len); + + // Convex-hull-graham to order the intersection points in clockwise order + // and find the contour area + + convexHullGraham( + (float *)intersect_pts_x, (float *)intersect_pts_y, + (float *)ordered_pts_x, (float *)ordered_pts_y, (float *)dist_ram, + (float *)valid_box, (float *)valid_pts, (float *)nums_in_ram, + (float *)temp7_ram, (float *)temp8_ram, (float *)temp9_ram, + (float *)temp_long_1, (float *)temp_long_2, (float *)temp_long_3, + seg_len, seg_len); + // Calculate polygon area + // set temp1 = intersection part area + polygonArea((float *)ordered_pts_x, (float *)ordered_pts_y, + (float *)valid_box, (float *)valid_pts, (float *)nums_in_ram, + (float *)temp1_ram, (float *)temp2_ram, (float *)temp3_ram, + (float *)temp4_ram, (float *)temp5_ram, (float *)temp6_ram, + (float *)temp7_ram, (float *)temp8_ram, (float *)temp9_ram, + seg_len); + // area + __bang_mul((float *)temp2_ram, (float *)box2 + seg_len * 2, + (float *)box2 + seg_len * 3, seg_len); + // get the area_U: area + max_area - area_I + __bang_add_scalar((float *)temp2_ram, (float *)temp2_ram, float(max_area), + seg_len); + __bang_sub((float *)temp2_ram, (float *)temp2_ram, (float *)temp1_ram, + seg_len); // area_U + if (iou_threshold > 0.0) 
{ + __bang_mul_scalar((float *)temp1_ram, (float *)temp1_ram, + div_thresh_iou, seg_len); + } else { + __bang_mul_scalar((float *)temp2_ram, (float *)temp2_ram, iou_threshold, + seg_len); + } + __bang_ge((float *)temp1_ram, (float *)temp2_ram, (float *)temp1_ram, + seg_len); + __bang_mul((float *)score, (float *)score, (float *)temp1_ram, seg_len); + + pvLock(); + __memcpy(input_score_ptr + input_offset + i * max_seg_pad, score, + cpy_len * sizeof(float), scores_store_dir, + cpy_len * sizeof(float), cpy_len * sizeof(float), 0); + pvUnlock(); + } + } + if (clusterId == 0 && coreId == 0 && nram_save_count) { + pvLock(); + __memcpy(output_data, nram_save, nram_save_count * sizeof(int32_t), + NRAM2GDRAM); + pvUnlock(); + } +} +__mlu_global__ void MLUBlockorUnionIKernelOU3D( + const void *input_boxes, const int input_box_num, const float iou_threshold, + const cnrtDataType_t data_type_input, void *workspace, void *result_num, + void *output) { + int input_dwidth = (data_type_input == CNRT_FLOAT32) ? 
4 : 2; + mluMemcpyDirection_t scores_load_dir = GDRAM2NRAM; + mluMemcpyDirection_t scores_store_dir = NRAM2GDRAM; + mluMemcpyDirection_t boxes_load_dir = GDRAM2NRAM; + float *scores_data = (float *)workspace; + float *boxes_data = (float *)input_boxes; + const int cluster_score_size = input_box_num * sizeof(float); + const int cluster_boxes_size = input_box_num * 7 * input_dwidth; + char *sram_score = (char *)sram_buffer; + char *sram_boxes = (char *)sram_buffer + cluster_score_size; + if (clusterDim == 1 && SIZE_SRAM_BUF > cluster_score_size) { + scores_data = (float *)sram_score; + scores_load_dir = SRAM2NRAM; + scores_store_dir = NRAM2SRAM; + if (coreId == 0x80) { + __sramset((void *)sram_buffer, input_box_num, 1.0f); + } + } else { + if (coreId == 0) { + __gdramset(scores_data, input_box_num, 1.0f); + } + } + if (clusterDim == 1 && + SIZE_SRAM_BUF - cluster_score_size >= cluster_boxes_size) { + boxes_load_dir = SRAM2NRAM; + boxes_data = (float *)sram_boxes; + if (coreId == 0x80) { + __memcpy((char *)boxes_data, (char *)input_boxes, cluster_boxes_size, + GDRAM2SRAM); + } + } + __sync_cluster(); + + int32_t result_box_num = 0; + int32_t *out_data = (int32_t *)output; + + switch (data_type_input) { + default: { return; } + case CNRT_FLOAT16: { + iou3D_detection(result_box_num, out_data, (half *)boxes_data, scores_data, + taskDim, input_box_num, iou_threshold, scores_load_dir, + scores_store_dir, boxes_load_dir); + }; break; + case CNRT_FLOAT32: { + iou3D_detection(result_box_num, out_data, boxes_data, scores_data, + taskDim, input_box_num, iou_threshold, scores_load_dir, + scores_store_dir, boxes_load_dir); + }; break; + } + ((int32_t *)result_num)[0] = result_box_num; +} + +void KernelIou3d(cnrtDim3_t k_dim, cnrtFunctionType_t k_type, cnrtQueue_t queue, + const cnrtDataType_t data_type_input, const void *boxes_dram, + const int input_box_num, const float iou_threshold, + void *workspace, void *output_size, void *output) { + switch (k_type) { + default: { return; 
} + case CNRT_FUNC_TYPE_BLOCK: + case CNRT_FUNC_TYPE_UNION1: + case CNRT_FUNC_TYPE_UNION2: + case CNRT_FUNC_TYPE_UNION4: + case CNRT_FUNC_TYPE_UNION8: + case CNRT_FUNC_TYPE_UNION16: { + MLUBlockorUnionIKernelOU3D<<>>( + (void *)boxes_dram, input_box_num, iou_threshold, data_type_input, + workspace, output_size, output); + }; break; + } +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/iou3d_utils.hpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/iou3d_utils.hpp new file mode 100644 index 0000000000000000000000000000000000000000..b98ffe2fcaa80c92090ff47cbea31ed9b7552bcb --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/iou3d_utils.hpp @@ -0,0 +1,695 @@ +/************************************************************************* + * Copyright (C) 2022 Cambricon. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
+ *************************************************************************/ + +#ifndef IOU3D_UTILS_HPP_ +#define IOU3D_UTILS_HPP_ +#include "common_mlu_helper.hpp" + +#define IOU3D_SIZE 64 +#define IOU3D_UP(x, y) (x / y + (int)(x % y > 0)) * y +#define IOU3D_DOWN(x, y) (x / y) * y +#define SIZE_NRAM_BUF (MAX_NRAM_SIZE) +#define SIZE_SRAM_BUF (MAX_SRAM_SIZE) +#define COMPUTE_COUNT_ALIGN 64 +#define INFO_NUM (5) // score, x1, y1, x2, y2 +#define REDUCE_NUM \ + (7) // score, x1, y1, x2, y2, max_index (reserve 2 num for half-type input) +#define SINGLE_BOX_DIM 5 +#define MEMORY_CORE (0x80) +__mlu_func__ void pvLock() { +#if __BANG_ARCH__ == 270 + if (coreId != MEMORY_CORE) { + __bang_lock(0, 0); + } +#endif +} + +__mlu_func__ void pvUnlock() { +#if __BANG_ARCH__ == 270 + if (coreId != MEMORY_CORE) { + __bang_unlock(0, 0); + } +#endif +} + +// cross2d(A, B) = A.x * B.y - A.y * B.x; +template +inline __mlu_func__ void cross2d(T *result, const T *p1_x, const T *p1_y, + const T *p2_x, const T *p2_y, + const int &length, T *temp_ram) { + __bang_mul((T *)temp_ram, (T *)p1_x, (T *)p2_y, length); + __bang_mul((T *)result, (T *)p1_y, (T *)p2_x, length); + __bang_sub((T *)result, (T *)temp_ram, (T *)result, length); +} + +// dot2d(A, B) = A.x * B.x + A.y * B.y +template +inline __mlu_func__ void dot2d(T *result, const T *p1_x, const T *p1_y, + const T *p2_x, const T *p2_y, const int &length, + T *temp_ram) { + __bang_mul((T *)temp_ram, (T *)p1_x, (T *)p2_x, length); + __bang_mul((T *)result, (T *)p1_y, (T *)p2_y, length); + __bang_add((T *)result, (T *)temp_ram, (T *)result, length); +} + +template +__mlu_func__ void getRotatedVertices(T *pts_x, T *pts_y, T *box, T *temp1, + T *temp2, T *temp3, T *temp4, + const uint32_t &actual_compute_box_num) { +// T cosTheta2 = (T)cos(theta) * 0.5f; -- temp1 +// T sinTheta2 = (T)sin(theta) * 0.5f; -- temp2 +// theta is the box's 5th data: a, rotated radian; +#if __BANG_ARCH__ >= 300 + __bang_cos((float *)temp1, ((float *)box) + 4 * 
actual_compute_box_num, + actual_compute_box_num); + __bang_sin((float *)temp2, ((float *)box) + 4 * actual_compute_box_num, + actual_compute_box_num); +#else + __bang_taylor4_cos((T *)temp1, ((T *)box) + 4 * actual_compute_box_num, + (T *)temp3, (T *)temp4, actual_compute_box_num); + __bang_taylor4_sin((T *)temp2, ((T *)box) + 4 * actual_compute_box_num, + (T *)temp3, (T *)temp4, actual_compute_box_num); +#endif + __bang_mul_scalar((T *)temp1, (T *)temp1, (T)0.5, actual_compute_box_num); + __bang_mul_scalar((T *)temp2, (T *)temp2, (T)0.5, actual_compute_box_num); + + // Temp3 = sinTheta2 * box.h; + // Temp4 = cosTheta2 * box.w; + __bang_mul((T *)temp3, (T *)temp2, ((T *)box) + 3 * actual_compute_box_num, + actual_compute_box_num); + __bang_mul((T *)temp4, (T *)temp1, ((T *)box) + 2 * actual_compute_box_num, + actual_compute_box_num); + // pts[0].x = box.x_ctr - sinTheta2 * box.h - cosTheta2 * box.w; + // pts[1].x = box.x_ctr + sinTheta2 * box.h - cosTheta2 * box.w; + __bang_sub((T *)pts_x, (T *)box, (T *)temp3, actual_compute_box_num); + __bang_sub((T *)pts_x, (T *)pts_x, (T *)temp4, actual_compute_box_num); + __bang_add((T *)pts_x + 1 * actual_compute_box_num, (T *)box, (T *)temp3, + actual_compute_box_num); + __bang_sub((T *)pts_x + 1 * actual_compute_box_num, + (T *)pts_x + 1 * actual_compute_box_num, (T *)temp4, + actual_compute_box_num); + // Temp3 = cosTheta2 * box.h; + // Temp4 = sinTheta2 * box.w; + __bang_mul((T *)temp3, (T *)temp1, box + 3 * actual_compute_box_num, + actual_compute_box_num); + __bang_mul((T *)temp4, (T *)temp2, box + 2 * actual_compute_box_num, + actual_compute_box_num); + // pts[0].y = box.y_ctr + cosTheta2 * box.h - sinTheta2 * box.w; + // pts[1].y = box.y_ctr - cosTheta2 * box.h - sinTheta2 * box.w; + __bang_add((T *)pts_y, (T *)box + 1 * actual_compute_box_num, (T *)temp3, + actual_compute_box_num); + __bang_sub((T *)pts_y, (T *)pts_y, (T *)temp4, actual_compute_box_num); + __bang_sub((T *)pts_y + 1 * actual_compute_box_num, + (T 
*)box + 1 * actual_compute_box_num, (T *)temp3, + actual_compute_box_num); + __bang_sub((T *)pts_y + 1 * actual_compute_box_num, + (T *)pts_y + 1 * actual_compute_box_num, (T *)temp4, + actual_compute_box_num); + // pts[2].x = 2 * box.x_ctr - pts[0].x; + // pts[3].x = 2 * box.x_ctr - pts[1].x; + __bang_add((T *)pts_x + 2 * actual_compute_box_num, (T *)box, (T *)box, + actual_compute_box_num); + __bang_sub((T *)pts_x + 2 * actual_compute_box_num, + (T *)pts_x + 2 * actual_compute_box_num, (T *)pts_x, + actual_compute_box_num); + __bang_add((T *)pts_x + 3 * actual_compute_box_num, (T *)box, (T *)box, + actual_compute_box_num); + __bang_sub((T *)pts_x + 3 * actual_compute_box_num, + (T *)pts_x + 3 * actual_compute_box_num, + (T *)pts_x + 1 * actual_compute_box_num, actual_compute_box_num); + // pts[2].y = 2 * box.y_ctr - pts[0].y; + // pts[3].y = 2 * box.y_ctr - pts[1].y; + __bang_add((T *)pts_y + 2 * actual_compute_box_num, + (T *)box + 1 * actual_compute_box_num, + (T *)box + 1 * actual_compute_box_num, actual_compute_box_num); + __bang_sub((T *)pts_y + 2 * actual_compute_box_num, + (T *)pts_y + 2 * actual_compute_box_num, (T *)pts_y, + actual_compute_box_num); + __bang_add((T *)pts_y + 3 * actual_compute_box_num, + (T *)box + 1 * actual_compute_box_num, + (T *)box + 1 * actual_compute_box_num, actual_compute_box_num); + __bang_sub((T *)pts_y + 3 * actual_compute_box_num, + (T *)pts_y + 3 * actual_compute_box_num, + (T *)pts_y + 1 * actual_compute_box_num, actual_compute_box_num); +} + +template +__mlu_func__ void getIntersectPts(T *rotated_pts1_x, T *rotated_pts1_y, + T *rotated_pts2_x, T *rotated_pts2_y, + T *vec1_x, T *vec1_y, T *vec2_x, T *vec2_y, + T *intersect_pts_x, T *intersect_pts_y, + T *valid_pts, T *nums_in_ram, T *temp1_ram, + T *temp2_ram, T *temp3_ram, T *temp4_ram, + T *temp5_ram, T *temp6_ram, T *temp7_ram, + T *temp8_ram, T *temp9_ram, T *temp10_ram, + const uint32_t &actual_compute_box_num) { +// Initialize const data to ram +// temp3 = const 
1e-14(@float), length = COMPUTE_COUNT_ALIGN +#if __BANG_ARCH__ >= 300 + __bang_write_value((T *)temp3_ram, COMPUTE_COUNT_ALIGN, (T)1e-14); +#else + // NOTE: Since active_reciphp function has strict value range, + // [2.2205e-16, 2e6]@float, [0.00391, 65504]@half + __bang_write_value((T *)temp3_ram, COMPUTE_COUNT_ALIGN, (float)1e-14); +#endif + // temp4 = const T(0), length = COMPUTE_COUNT_ALIGN + __bang_write_value((T *)temp4_ram, COMPUTE_COUNT_ALIGN, (T)0); + // temp5 = const T(1), length = COMPUTE_COUNT_ALIGN + __bang_write_value((T *)temp5_ram, COMPUTE_COUNT_ALIGN, (T)1); + + // Line vector, from p1 to p2 is: p1+(p2-p1)*t, t=[0,1] + // for i = 0~3, vec[i] = pts[(i+1)%4] - pts[i] + __bang_sub((T *)vec1_x, (T *)rotated_pts1_x + actual_compute_box_num, + (T *)rotated_pts1_x, 3 * actual_compute_box_num); + __bang_sub((T *)vec1_x + 3 * actual_compute_box_num, (T *)rotated_pts1_x, + (T *)rotated_pts1_x + 3 * actual_compute_box_num, + actual_compute_box_num); + __bang_sub((T *)vec1_y, (T *)rotated_pts1_y + actual_compute_box_num, + (T *)rotated_pts1_y, 3 * actual_compute_box_num); + __bang_sub((T *)vec1_y + 3 * actual_compute_box_num, (T *)rotated_pts1_y, + (T *)rotated_pts1_y + 3 * actual_compute_box_num, + actual_compute_box_num); + + __bang_sub((T *)vec2_x, (T *)rotated_pts2_x + actual_compute_box_num, + (T *)rotated_pts2_x, 3 * actual_compute_box_num); + __bang_sub((T *)vec2_x + 3 * actual_compute_box_num, (T *)rotated_pts2_x, + (T *)rotated_pts2_x + 3 * actual_compute_box_num, + actual_compute_box_num); + __bang_sub((T *)vec2_y, (T *)rotated_pts2_y + actual_compute_box_num, + (T *)rotated_pts2_y, 3 * actual_compute_box_num); + __bang_sub((T *)vec2_y + 3 * actual_compute_box_num, (T *)rotated_pts2_y, + (T *)rotated_pts2_y + 3 * actual_compute_box_num, + actual_compute_box_num); + + // First, line test - test all line combos for intersection, 4x4 possible + for (int i = 0; i < 4; i++) { + for (int j = 0; j < 4; j++) { + // T det = cross2d(vec2[j], vec1[i]) -- temp2 
+ cross2d((T *)temp2_ram, (T *)vec2_x + j * actual_compute_box_num, + (T *)vec2_y + j * actual_compute_box_num, + (T *)vec1_x + i * actual_compute_box_num, + (T *)vec1_y + i * actual_compute_box_num, + actual_compute_box_num, (T *)temp1_ram); + // temp8 = sign(det), since active_reciphp only receive positive values + __bang_active_sign((T *)temp8_ram, (T *)temp2_ram, + actual_compute_box_num); + // deal with parallel lines, temp2 = fabs(det), temp1 = temp2 > 1e-14 + __bang_active_abs((T *)temp2_ram, (T *)temp2_ram, actual_compute_box_num); + __bang_cycle_gt((T *)temp1_ram, (T *)temp2_ram, (T *)temp3_ram, + actual_compute_box_num, COMPUTE_COUNT_ALIGN); + // Where temp1 = false, set recip input to 1, avoiding recip(0), cause inf + __bang_not((T *)temp9_ram, (T *)temp1_ram, actual_compute_box_num); + __bang_mul((T *)temp2_ram, (T *)temp2_ram, (T *)temp1_ram, + actual_compute_box_num); + __bang_add((T *)temp2_ram, (T *)temp2_ram, (T *)temp9_ram, + actual_compute_box_num); +// temp2 = 1/temp2, use mult (1/temp2) instead of div temp2 +#if __BANG_ARCH__ >= 300 + __bang_recip((float *)temp2_ram, (float *)temp2_ram, + actual_compute_box_num); +#else + // NOTE: active_reciphp function has strict value range: + // [2.2205e-16, 2e6]@float, [0.00391, 65504]@half + __bang_active_reciphp((T *)temp2_ram, (T *)temp2_ram, + actual_compute_box_num); +#endif + // Restore temp2 invalid box value 1 and sign-bit + __bang_mul((T *)temp2_ram, (T *)temp2_ram, (T *)temp1_ram, + actual_compute_box_num); + __bang_mul((T *)temp2_ram, (T *)temp2_ram, (T *)temp8_ram, + actual_compute_box_num); + + // auto vec12 = pts2[j] - pts1[i], (temp6, temp7) = (x, y) + __bang_sub((T *)temp6_ram, + (T *)rotated_pts2_x + j * actual_compute_box_num, + (T *)rotated_pts1_x + i * actual_compute_box_num, + actual_compute_box_num); + __bang_sub((T *)temp7_ram, + (T *)rotated_pts2_y + j * actual_compute_box_num, + (T *)rotated_pts1_y + i * actual_compute_box_num, + actual_compute_box_num); + + // T t1 = 
cross2d(vec2[j], vec12) mult (1/det) -- temp8 + cross2d((T *)temp8_ram, (T *)vec2_x + j * actual_compute_box_num, + (T *)vec2_y + j * actual_compute_box_num, (T *)temp6_ram, + (T *)temp7_ram, actual_compute_box_num, (T *)temp9_ram); + __bang_mul((T *)temp8_ram, (T *)temp8_ram, (T *)temp2_ram, + actual_compute_box_num); + + // temp1 &= (t1 >= 0.0f && t1 <= 1.0f) -- temp9 + __bang_cycle_ge((T *)temp9_ram, (T *)temp8_ram, (T *)temp4_ram, + actual_compute_box_num, COMPUTE_COUNT_ALIGN); + __bang_and((T *)temp1_ram, (T *)temp1_ram, (T *)temp9_ram, + actual_compute_box_num); + __bang_cycle_le((T *)temp9_ram, (T *)temp8_ram, (T *)temp5_ram, + actual_compute_box_num, COMPUTE_COUNT_ALIGN); + __bang_and((T *)temp1_ram, (T *)temp1_ram, (T *)temp9_ram, + actual_compute_box_num); + + // T t2 = cross2d(vec1[i], vec12) mult temp2 -- temp9 + // NOTE: temp8(t1) is used after, reuse temp7(p2_y) as cross2d temp ram + cross2d((T *)temp9_ram, (T *)vec1_x + i * actual_compute_box_num, + (T *)vec1_y + i * actual_compute_box_num, (T *)temp6_ram, + (T *)temp7_ram, actual_compute_box_num, (T *)temp7_ram); + __bang_mul((T *)temp9_ram, (T *)temp9_ram, (T *)temp2_ram, + actual_compute_box_num); + + // temp1 &= (t2 >= 0.0f && t2 <= 1.0f) -- temp9 + __bang_cycle_ge((T *)temp7_ram, (T *)temp9_ram, (T *)temp4_ram, + actual_compute_box_num, COMPUTE_COUNT_ALIGN); + __bang_and((T *)temp1_ram, (T *)temp1_ram, (T *)temp7_ram, + actual_compute_box_num); + __bang_cycle_le((T *)temp7_ram, (T *)temp9_ram, (T *)temp5_ram, + actual_compute_box_num, COMPUTE_COUNT_ALIGN); + __bang_and((T *)temp1_ram, (T *)temp1_ram, (T *)temp7_ram, + actual_compute_box_num); + + // intersections = (pts1[i] + vec1[i] * t1) * temp1 + __bang_mul((T *)temp9_ram, (T *)vec1_x + i * actual_compute_box_num, + (T *)temp8_ram, actual_compute_box_num); + __bang_add((T *)temp9_ram, + (T *)rotated_pts1_x + i * actual_compute_box_num, + (T *)temp9_ram, actual_compute_box_num); + __bang_mul((T *)intersect_pts_x + (4 * i + j) * 
actual_compute_box_num, + (T *)temp9_ram, (T *)temp1_ram, actual_compute_box_num); + __bang_mul((T *)temp9_ram, (T *)vec1_y + i * actual_compute_box_num, + (T *)temp8_ram, actual_compute_box_num); + __bang_add((T *)temp9_ram, + (T *)rotated_pts1_y + i * actual_compute_box_num, + (T *)temp9_ram, actual_compute_box_num); + __bang_mul((T *)intersect_pts_y + (4 * i + j) * actual_compute_box_num, + (T *)temp9_ram, (T *)temp1_ram, actual_compute_box_num); + + // Assign `valid_pts` bit and accumulate `nums_in` of valid points of each + // box pair + __bang_or((T *)valid_pts + (4 * i + j) * actual_compute_box_num, + (T *)valid_pts + (4 * i + j) * actual_compute_box_num, + (T *)temp1_ram, actual_compute_box_num); + __bang_add((T *)nums_in_ram, (T *)nums_in_ram, (T *)temp1_ram, + actual_compute_box_num); + } + } + + // Check for vertices of rect1 inside rect2 + // temp5 = ABdotAB + dot2d((T *)temp5_ram, (T *)vec2_x, (T *)vec2_y, (T *)vec2_x, (T *)vec2_y, + actual_compute_box_num, (T *)temp9_ram); + // temp6 = ADdotAD + dot2d((T *)temp6_ram, (T *)vec2_x + 3 * actual_compute_box_num, + (T *)vec2_y + 3 * actual_compute_box_num, + (T *)vec2_x + 3 * actual_compute_box_num, + (T *)vec2_y + 3 * actual_compute_box_num, actual_compute_box_num, + (T *)temp9_ram); + // assume ABCD is the rectangle, and P is the point to be judged + // P is inside ABCD iff. 
P's projection on AB lines within AB + // and P's projection on AD lies within AD + for (int i = 0; i < 4; i++) { + // AP = pts1[i] - pts2[0] = (temp7, temp8) + __bang_sub((T *)temp7_ram, (T *)rotated_pts1_x + i * actual_compute_box_num, + (T *)rotated_pts2_x, actual_compute_box_num); + __bang_sub((T *)temp8_ram, (T *)rotated_pts1_y + i * actual_compute_box_num, + (T *)rotated_pts2_y, actual_compute_box_num); + + // temp9 = APdotAB = dot2d(AP, AB) + dot2d((T *)temp9_ram, (T *)temp7_ram, (T *)temp8_ram, (T *)vec2_x, + (T *)vec2_y, actual_compute_box_num, (T *)temp2_ram); + // temp10 = APdotAD = -dot2d(AP, DA) + dot2d((T *)temp10_ram, (T *)temp7_ram, (T *)temp8_ram, + (T *)vec2_x + 3 * actual_compute_box_num, + (T *)vec2_y + 3 * actual_compute_box_num, actual_compute_box_num, + (T *)temp2_ram); + __bang_mul_scalar((T *)temp10_ram, (T *)temp10_ram, (T)-1, + actual_compute_box_num); + + // ((APdotAB >= 0) && (APdotAD >= 0) && (APdotAB <= ABdotAB) && (APdotAD <= + // ADdotAD)) + __bang_cycle_ge((T *)temp1_ram, (T *)temp9_ram, (T *)temp4_ram, + actual_compute_box_num, COMPUTE_COUNT_ALIGN); + __bang_cycle_ge((T *)temp2_ram, (T *)temp10_ram, (T *)temp4_ram, + actual_compute_box_num, COMPUTE_COUNT_ALIGN); + __bang_and((T *)temp1_ram, (T *)temp1_ram, (T *)temp2_ram, + actual_compute_box_num); + __bang_le((T *)temp2_ram, (T *)temp9_ram, (T *)temp5_ram, + actual_compute_box_num); + __bang_and((T *)temp1_ram, (T *)temp1_ram, (T *)temp2_ram, + actual_compute_box_num); + __bang_le((T *)temp2_ram, (T *)temp10_ram, (T *)temp6_ram, + actual_compute_box_num); + __bang_and((T *)temp1_ram, (T *)temp1_ram, (T *)temp2_ram, + actual_compute_box_num); + + // 16 means the 4x4 possible intersection points above + __bang_mul((T *)intersect_pts_x + (16 + i) * actual_compute_box_num, + (T *)temp1_ram, (T *)rotated_pts1_x + i * actual_compute_box_num, + actual_compute_box_num); + __bang_mul((T *)intersect_pts_y + (16 + i) * actual_compute_box_num, + (T *)temp1_ram, (T *)rotated_pts1_y + i * 
actual_compute_box_num, + actual_compute_box_num); + + // assign valid_pts bit and accumulate nums of valid points of each box pair + __bang_or((T *)valid_pts + (16 + i) * actual_compute_box_num, + (T *)valid_pts + (16 + i) * actual_compute_box_num, + (T *)temp1_ram, actual_compute_box_num); + __bang_add((T *)nums_in_ram, (T *)nums_in_ram, (T *)temp1_ram, + actual_compute_box_num); + } + + // Reverse the check - check for vertices of rect2 inside rect1 + // temp5 = ABdotAB + dot2d((T *)temp5_ram, (T *)vec1_x, (T *)vec1_y, (T *)vec1_x, (T *)vec1_y, + actual_compute_box_num, (T *)temp9_ram); + // temp6 = ADdotAD + dot2d((T *)temp6_ram, (T *)vec1_x + 3 * actual_compute_box_num, + (T *)vec1_y + 3 * actual_compute_box_num, + (T *)vec1_x + 3 * actual_compute_box_num, + (T *)vec1_y + 3 * actual_compute_box_num, actual_compute_box_num, + (T *)temp9_ram); + for (int i = 0; i < 4; i++) { + // AP = pts2[i] - pts1[0] = (temp7, temp8) + __bang_sub((T *)temp7_ram, (T *)rotated_pts2_x + i * actual_compute_box_num, + (T *)rotated_pts1_x, actual_compute_box_num); + __bang_sub((T *)temp8_ram, (T *)rotated_pts2_y + i * actual_compute_box_num, + (T *)rotated_pts1_y, actual_compute_box_num); + + // temp9 = APdotAB = dot2d(AP, AB) + dot2d((T *)temp9_ram, (T *)temp7_ram, (T *)temp8_ram, (T *)vec1_x, + (T *)vec1_y, actual_compute_box_num, (T *)temp2_ram); + // temp10 = APdotAD = -dot2d(AP, DA) + dot2d((T *)temp10_ram, (T *)temp7_ram, (T *)temp8_ram, + (T *)vec1_x + 3 * actual_compute_box_num, + (T *)vec1_y + 3 * actual_compute_box_num, actual_compute_box_num, + (T *)temp2_ram); + __bang_mul_scalar((T *)temp10_ram, (T *)temp10_ram, (T)-1, + actual_compute_box_num); + + // ((APdotAB >= 0) && (APdotAD >= 0) && (APdotAB <= ABdotAB) && (APdotAD <= + // ADdotAD)) + __bang_cycle_ge((T *)temp1_ram, (T *)temp9_ram, (T *)temp4_ram, + actual_compute_box_num, COMPUTE_COUNT_ALIGN); + __bang_cycle_ge((T *)temp2_ram, (T *)temp10_ram, (T *)temp4_ram, + actual_compute_box_num, COMPUTE_COUNT_ALIGN); + 
__bang_and((T *)temp1_ram, (T *)temp1_ram, (T *)temp2_ram, + actual_compute_box_num); + __bang_le((T *)temp2_ram, (T *)temp9_ram, (T *)temp5_ram, + actual_compute_box_num); + __bang_and((T *)temp1_ram, (T *)temp1_ram, (T *)temp2_ram, + actual_compute_box_num); + __bang_le((T *)temp2_ram, (T *)temp10_ram, (T *)temp6_ram, + actual_compute_box_num); + __bang_and((T *)temp1_ram, (T *)temp1_ram, (T *)temp2_ram, + actual_compute_box_num); + + // 20 means the (4x4+4) possible intersection points above + __bang_mul((T *)intersect_pts_x + (20 + i) * actual_compute_box_num, + (T *)temp1_ram, (T *)rotated_pts2_x + i * actual_compute_box_num, + actual_compute_box_num); + __bang_mul((T *)intersect_pts_y + (20 + i) * actual_compute_box_num, + (T *)temp1_ram, (T *)rotated_pts2_y + i * actual_compute_box_num, + actual_compute_box_num); + + // assign valid_pts bit and accumulate nums of valid points of each box pair + __bang_or((T *)valid_pts + (20 + i) * actual_compute_box_num, + (T *)valid_pts + (20 + i) * actual_compute_box_num, + (T *)temp1_ram, actual_compute_box_num); + __bang_add((T *)nums_in_ram, (T *)nums_in_ram, (T *)temp1_ram, + actual_compute_box_num); + } +} + +template +__mlu_func__ void convexHullGraham( + T *intersect_pts_x, T *intersect_pts_y, T *ordered_pts_x, T *ordered_pts_y, + T *dist_ram, T *valid_box, T *valid_pts, T *nums_in_ram, T *temp1_ram, + T *temp2_ram, T *temp3_ram, T *temp_long_1, T *temp_long_2, T *temp_long_3, + const uint32_t &actual_box_num, const uint32_t &actual_compute_box_num) { + // Step1. Find the point with minimum y, if more than 1 points have the same + // minimum y, + // pick the one with the minimum x. 
+ // set p[i].y to max_y_value if not valid_pts, to avoid invalid result + // 24 means all possible intersection points + __bang_max((T *)temp2_ram, (T *)intersect_pts_y, 24 * actual_compute_box_num); + __bang_write_value((T *)temp3_ram, COMPUTE_COUNT_ALIGN, ((T *)temp2_ram)[0]); + __bang_not((T *)temp_long_1, (T *)valid_pts, 24 * actual_compute_box_num); + __bang_cycle_mul((T *)temp_long_1, (T *)temp_long_1, (T *)temp3_ram, + 24 * actual_compute_box_num, COMPUTE_COUNT_ALIGN); + __bang_mul((T *)temp_long_2, (T *)intersect_pts_y, (T *)valid_pts, + 24 * actual_compute_box_num); + __bang_add((T *)temp_long_2, (T *)temp_long_2, (T *)temp_long_1, + 24 * actual_compute_box_num); + // temp2 = min_y_value(temp_long_2), use min_pool, channel=box_num, h=1, w=24 + __bang_minpool((T *)temp2_ram, (T *)temp_long_2, actual_compute_box_num, 1, + 24, 1, 24, 1, 24); + __bang_mul((T *)temp2_ram, (T *)temp2_ram, (T *)valid_box, + actual_compute_box_num); + + // set p[i].x to max_x_value if not min_y point + __bang_max((T *)temp1_ram, (T *)intersect_pts_x, 24 * actual_compute_box_num); + __bang_write_value((T *)temp3_ram, COMPUTE_COUNT_ALIGN, ((T *)temp1_ram)[0]); + __bang_cycle_eq((T *)temp_long_1, (T *)temp_long_2, (T *)temp2_ram, + 24 * actual_compute_box_num, actual_compute_box_num); + __bang_and((T *)temp_long_1, (T *)temp_long_1, (T *)valid_pts, + 24 * actual_compute_box_num); + __bang_not((T *)temp_long_3, (T *)temp_long_1, 24 * actual_compute_box_num); + __bang_cycle_mul((T *)temp_long_3, (T *)temp_long_3, (T *)temp3_ram, + 24 * actual_compute_box_num, COMPUTE_COUNT_ALIGN); + __bang_mul((T *)temp_long_1, (T *)intersect_pts_x, (T *)temp_long_1, + 24 * actual_compute_box_num); + __bang_add((T *)temp_long_1, (T *)temp_long_1, (T *)temp_long_3, + 24 * actual_compute_box_num); + // temp3 = min_x_value(temp_long_1), use min_pool, channel=box_num, h=1, w=24 + __bang_minpool((T *)temp3_ram, (T *)temp_long_1, actual_compute_box_num, 1, + 24, 1, 24, 1, 24); + __bang_mul((T *)temp3_ram, 
(T *)temp3_ram, (T *)valid_box, + actual_compute_box_num); + + // Step2. All points subtract starting-point (for sorting in the next step) + __bang_cycle_sub((T *)ordered_pts_x, (T *)intersect_pts_x, (T *)temp3_ram, + 24 * actual_compute_box_num, actual_compute_box_num); + __bang_cycle_sub((T *)ordered_pts_y, (T *)intersect_pts_y, (T *)temp2_ram, + 24 * actual_compute_box_num, actual_compute_box_num); + __bang_mul((T *)ordered_pts_x, (T *)ordered_pts_x, (T *)valid_pts, + 24 * actual_compute_box_num); + __bang_mul((T *)ordered_pts_y, (T *)ordered_pts_y, (T *)valid_pts, + 24 * actual_compute_box_num); + + // Step3. Sort every intersection point according to their relative + // cross-product values (essentially sorting according to angles) + // If the angles are the same, sort according to distance to origin + dot2d((T *)dist_ram, (T *)ordered_pts_x, (T *)ordered_pts_y, + (T *)ordered_pts_x, (T *)ordered_pts_y, 24 * actual_compute_box_num, + (T *)temp_long_3); + + T temp, temp_nums_in, temp_dist_1, temp_dist_2; + T temp1_x, temp1_y; + T temp2_x, temp2_y; + for (int i = 0; i < actual_box_num; i++) { + if (((T *)valid_box)[i]) { + // make sure all nums_in[i] points are at the front + for (int ii = 0; ii < 23; ii++) { + for (int jj = ii + 1; jj < 24; jj++) { + int ii_index = ii * actual_compute_box_num + i; + int jj_index = jj * actual_compute_box_num + i; + // ii point is not valid and jj point is valid, swap jj for ii + if ((!((T *)valid_pts)[ii_index]) && ((T *)valid_pts)[jj_index]) { + ((T *)ordered_pts_x)[ii_index] = ((T *)ordered_pts_x)[jj_index]; + ((T *)ordered_pts_y)[ii_index] = ((T *)ordered_pts_y)[jj_index]; + ((T *)dist_ram)[ii_index] = ((T *)dist_ram)[jj_index]; + ((T *)valid_pts)[ii_index] = true; + ((T *)ordered_pts_x)[jj_index] = 0; + ((T *)ordered_pts_y)[jj_index] = 0; + ((T *)dist_ram)[jj_index] = 0; + ((T *)valid_pts)[jj_index] = false; + break; + } + } + } + temp_nums_in = ((T *)nums_in_ram)[i]; + // make original q[0] = min_x, min_y before sort + for 
(int ii = 1; ii < temp_nums_in; ii++) { + int ii_index = ii * actual_compute_box_num + i; + if (((T *)dist_ram)[ii_index] == 0) { + // swap q[ii_index] and q[0] + ((T *)ordered_pts_x)[ii_index] = ((T *)ordered_pts_x)[i]; + ((T *)ordered_pts_y)[ii_index] = ((T *)ordered_pts_y)[i]; + ((T *)dist_ram)[ii_index] = ((T *)dist_ram)[i]; + ((T *)ordered_pts_x)[i] = 0; + ((T *)ordered_pts_y)[i] = 0; + ((T *)dist_ram)[i] = 0; + break; + } + } + for (int ii = 1; ii < temp_nums_in - 1; ii++) { + for (int jj = ii + 1; jj < temp_nums_in; jj++) { + int ii_index = ii * actual_compute_box_num + i; + int jj_index = jj * actual_compute_box_num + i; + temp1_x = ((T *)ordered_pts_x)[ii_index]; + temp1_y = ((T *)ordered_pts_y)[ii_index]; + temp2_x = ((T *)ordered_pts_x)[jj_index]; + temp2_y = ((T *)ordered_pts_y)[jj_index]; + // calculate cross product and sort q (ordered_pts) + temp = (temp1_x * temp2_y) - (temp1_y * temp2_x); + temp_dist_1 = ((T *)dist_ram)[ii_index]; + temp_dist_2 = ((T *)dist_ram)[jj_index]; + if ((temp < (T)-1e-6) || + ((fabs(temp) < (T)1e-6) && (temp_dist_1 > temp_dist_2))) { + ((T *)ordered_pts_x)[ii_index] = temp2_x; + ((T *)ordered_pts_y)[ii_index] = temp2_y; + ((T *)ordered_pts_x)[jj_index] = temp1_x; + ((T *)ordered_pts_y)[jj_index] = temp1_y; + ((T *)dist_ram)[ii_index] = temp_dist_2; + ((T *)dist_ram)[jj_index] = temp_dist_1; + } + } + } + + // Step4: + // Make sure there are at least 2 points(that don't overlap with each + // other) in the stack + int k; // index of the non-overlapped second point + for (k = 1; k < temp_nums_in; k++) { + if (((T *)dist_ram)[k * actual_compute_box_num + i] > (T)1e-8) { + break; + } + } + if (k == temp_nums_in) { + // We reach the end, which means the convex hull is just one point + // set valid_box = 0, to get ious = 0 + ((T *)valid_box)[i] = 0; + continue; + } + // q[1] = q[k]; + ((T *)ordered_pts_x)[actual_compute_box_num + i] = + ((T *)ordered_pts_x)[k * actual_compute_box_num + i]; + ((T 
*)ordered_pts_y)[actual_compute_box_num + i] = + ((T *)ordered_pts_y)[k * actual_compute_box_num + i]; + + // Step 5: + // Finally we can start the scanning process. + // When a non-convex relationship between the 3 points is found + // (either concave shape or duplicated points), + // we pop the previous point from the stack + // until the 3-point relationship is convex again, or + // until the stack only contains two points + int m = 2; // 2 points in the stack + for (int j = k + 1; j < temp_nums_in; j++) { + // while (m > 1 && cross2d(q[j] - q[m - 2], q[m - 1] - q[m - 2]) >= + // 0) { + // m--; + // } + temp1_x = ((T *)ordered_pts_x)[j * actual_compute_box_num + i] - + ((T *)ordered_pts_x)[(m - 2) * actual_compute_box_num + i]; + temp1_y = ((T *)ordered_pts_y)[j * actual_compute_box_num + i] - + ((T *)ordered_pts_y)[(m - 2) * actual_compute_box_num + i]; + temp2_x = ((T *)ordered_pts_x)[(m - 1) * actual_compute_box_num + i] - + ((T *)ordered_pts_x)[(m - 2) * actual_compute_box_num + i]; + temp2_y = ((T *)ordered_pts_y)[(m - 1) * actual_compute_box_num + i] - + ((T *)ordered_pts_y)[(m - 2) * actual_compute_box_num + i]; + temp = (temp1_x * temp2_y) - (temp1_y * temp2_x); + while ((m > 1) && (temp >= 0)) { + m--; + if (m > 1) { + temp1_x = + ((T *)ordered_pts_x)[j * actual_compute_box_num + i] - + ((T *)ordered_pts_x)[(m - 2) * actual_compute_box_num + i]; + temp1_y = + ((T *)ordered_pts_y)[j * actual_compute_box_num + i] - + ((T *)ordered_pts_y)[(m - 2) * actual_compute_box_num + i]; + temp2_x = + ((T *)ordered_pts_x)[(m - 1) * actual_compute_box_num + i] - + ((T *)ordered_pts_x)[(m - 2) * actual_compute_box_num + i]; + temp2_y = + ((T *)ordered_pts_y)[(m - 1) * actual_compute_box_num + i] - + ((T *)ordered_pts_y)[(m - 2) * actual_compute_box_num + i]; + temp = (temp1_x * temp2_y) - (temp1_y * temp2_x); + } + } + // q[m++] = q[j]; + ((T *)ordered_pts_x)[m * actual_compute_box_num + i] = + ((T *)ordered_pts_x)[j * actual_compute_box_num + i]; + ((T 
*)ordered_pts_y)[m * actual_compute_box_num + i] = + ((T *)ordered_pts_y)[j * actual_compute_box_num + i]; + m++; + } + // set last(24-m) valid_pts to false, to erase invalid q in polygon area + for (int j = m; j < temp_nums_in; j++) { + ((T *)valid_pts)[j * actual_compute_box_num + i] = 0; + } + ((T *)nums_in_ram)[i] = m; + } + } +} + +template +__mlu_func__ void polygonArea(T *ordered_pts_x, T *ordered_pts_y, T *valid_box, + T *valid_pts, T *nums_in_ram, T *temp1_ram, + T *temp2_ram, T *temp3_ram, T *temp4_ram, + T *temp5_ram, T *temp6_ram, T *temp7_ram, + T *temp8_ram, T *temp9_ram, + const uint32_t &actual_compute_box_num) { + // Set where nums_in <= 2, valid_box = false + __bang_write_value((T *)temp9_ram, COMPUTE_COUNT_ALIGN, (T)2); + __bang_cycle_gt((T *)temp1_ram, (T *)nums_in_ram, (T *)temp9_ram, + actual_compute_box_num, COMPUTE_COUNT_ALIGN); + __bang_and((T *)valid_box, (T *)valid_box, (T *)temp1_ram, + actual_compute_box_num); + + // temp1 = area, initialize with all 0 + __bang_write_zero((T *)temp1_ram, actual_compute_box_num); + __bang_max((T *)temp7_ram, (T *)nums_in_ram, actual_compute_box_num); + + // temp_nums_in = max(nums_in) + T temp_nums_in = ((T *)temp7_ram)[0]; + for (int i = 1; i < temp_nums_in - 1; i++) { + // q[i] - q[0]: (temp6, temp7) + __bang_sub((T *)temp6_ram, (T *)ordered_pts_x + i * actual_compute_box_num, + (T *)ordered_pts_x, actual_compute_box_num); + __bang_sub((T *)temp7_ram, (T *)ordered_pts_y + i * actual_compute_box_num, + (T *)ordered_pts_y, actual_compute_box_num); + __bang_mul((T *)temp6_ram, (T *)temp6_ram, + (T *)valid_pts + (i + 1) * actual_compute_box_num, + actual_compute_box_num); + __bang_mul((T *)temp7_ram, (T *)temp7_ram, + (T *)valid_pts + (i + 1) * actual_compute_box_num, + actual_compute_box_num); + // q[i + 1] - q[0]: (temp8, temp9) + __bang_sub((T *)temp8_ram, + (T *)ordered_pts_x + (i + 1) * actual_compute_box_num, + (T *)ordered_pts_x, actual_compute_box_num); + __bang_sub((T *)temp9_ram, + (T 
*)ordered_pts_y + (i + 1) * actual_compute_box_num, + (T *)ordered_pts_y, actual_compute_box_num); + __bang_mul((T *)temp8_ram, (T *)temp8_ram, + (T *)valid_pts + (i + 1) * actual_compute_box_num, + actual_compute_box_num); + __bang_mul((T *)temp9_ram, (T *)temp9_ram, + (T *)valid_pts + (i + 1) * actual_compute_box_num, + actual_compute_box_num); + // area += fabs(cross2d(q[i] - q[0], q[i + 1] - q[0])); + __bang_mul((T *)temp4_ram, (T *)temp6_ram, (T *)temp9_ram, + actual_compute_box_num); + __bang_mul((T *)temp5_ram, (T *)temp7_ram, (T *)temp8_ram, + actual_compute_box_num); + __bang_sub((T *)temp3_ram, (T *)temp4_ram, (T *)temp5_ram, + actual_compute_box_num); + __bang_active_abs((T *)temp3_ram, (T *)temp3_ram, actual_compute_box_num); + __bang_add((T *)temp1_ram, (T *)temp1_ram, (T *)temp3_ram, + actual_compute_box_num); + } + // Set where valid_box = false, intersection = 0 + __bang_mul((T *)temp1_ram, (T *)temp1_ram, (T *)valid_box, + actual_compute_box_num); + // area = area / 2.0 + __bang_mul_scalar((T *)temp1_ram, (T *)temp1_ram, (T)0.5, + actual_compute_box_num); +} + +#endif // IOU3D_UTILS_HPP_ diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/masked_conv2d_mlu_kernel.mlu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/masked_conv2d_mlu_kernel.mlu new file mode 100644 index 0000000000000000000000000000000000000000..1356a799ac3ba5d36de9df25a0cdd0a706506e75 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/masked_conv2d_mlu_kernel.mlu @@ -0,0 +1,181 @@ +/************************************************************************* + * Copyright (C) 2022 Cambricon. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
+ * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + *************************************************************************/ +#include "common_mlu_helper.hpp" + +__nram__ char nram_buffer[MAX_NRAM_SIZE]; + +template +__mlu_func__ void MLUUnion1MaskedIm2colForward( + const T *feature, const int height, const int width, const int channels, + const int kernel_h, const int kernel_w, const int pad_h, const int pad_w, + const int32_t *mask_h_idx, const int32_t *mask_w_idx, const int mask_cnt, + T *data_col) { + for (int index = taskId; index < mask_cnt; index += taskDim) { + const int h_col = mask_h_idx[index]; + const int w_col = mask_w_idx[index]; + const int h_offset = h_col - pad_h; + const int w_offset = w_col - pad_w; + int h_start = h_offset; + int h_end = h_offset + kernel_h - 1; + int w_start = w_offset; + int w_end = w_start + kernel_w - 1; + if (h_start >= height || w_start >= width || h_end < 0 || w_end < 0) { + continue; + } else { + int h_start_valid = max(0, h_start); + int h_end_valid = min(height - 1, h_end); + int w_start_valid = max(0, w_start); + int w_end_valid = min(width - 1, w_end); + __memcpy( + data_col + index * kernel_h * kernel_w * channels + + ((h_start_valid - h_start) * kernel_w + + (w_start_valid - w_start)) * + channels, + feature + h_start_valid * width * channels + w_start_valid * channels, + (w_end_valid - w_start_valid + 1) * channels * sizeof(T), GDRAM2GDRAM, + kernel_w * channels * sizeof(T), width * channels * sizeof(T), + h_end_valid - h_start_valid); + } + } +} + +template +__mlu_func__ void MLUUnion1MaskedCol2imForward(const T *col, const int height, + const int width, + const int channels, + const int32_t *mask_h_idx, + const int32_t *mask_w_idx, + const int mask_cnt, T *im) { + const int 
channels_max_num_nram = MAX_NRAM_SIZE / sizeof(T); + if (channels <= channels_max_num_nram) { + const int deal_num = channels_max_num_nram / channels; + int mask_per_core = mask_cnt / taskDim; + const int mask_remain = mask_cnt % taskDim; + mask_per_core += taskId < mask_remain ? 1 : 0; + int index_start = taskId < mask_remain + ? taskId * mask_per_core + : taskId * mask_per_core + mask_remain; + int loop = mask_per_core / deal_num; + int remain_num = mask_per_core % deal_num; + T *nram_col = (T *)nram_buffer; + for (int index = 0; index < loop; ++index) { + int cur_index = index_start + index * deal_num; + __memcpy(nram_col, col + cur_index * channels, + deal_num * channels * sizeof(T), GDRAM2NRAM); + for (int i = 0; i < deal_num; ++i) { + int mask_index = cur_index + i; + const int h_im = mask_h_idx[mask_index]; + const int w_im = mask_w_idx[mask_index]; + // if(h_im>=height || w_im>=width) continue; + __memcpy(im + (h_im * width + w_im) * channels, nram_col + i * channels, + channels * sizeof(T), NRAM2GDRAM); + } + } + if (remain_num > 0) { + int cur_index = index_start + loop * deal_num; + __memcpy(nram_col, col + cur_index * channels, + remain_num * channels * sizeof(T), GDRAM2NRAM); + for (int i = 0; i < remain_num; ++i) { + int mask_index = cur_index + i; + const int h_im = mask_h_idx[mask_index]; + const int w_im = mask_w_idx[mask_index]; + // if(h_im>=height || w_im>=width) continue; + __memcpy(im + (h_im * width + w_im) * channels, nram_col + i * channels, + channels * sizeof(T), NRAM2GDRAM); + } + } + } else { + for (int index = taskId; index < mask_cnt; index += taskDim) { + const int m_index = index % mask_cnt; + const int h_im = mask_h_idx[m_index]; + const int w_im = mask_w_idx[m_index]; + // if(h_im>=height || w_im>=width) continue; + __memcpy(im + (h_im * width + w_im) * channels, col + index * channels, + channels * sizeof(T), GDRAM2GDRAM); + } + } +} + +__mlu_global__ void MLUKernelMaskedIm2colForward( + const void *feature, const int height, 
const int width, const int channels, + const int kernel_h, const int kernel_w, const int pad_h, const int pad_w, + const void *mask_h_idx, const void *mask_w_idx, const int mask_cnt, + void *data_col, const cnrtDataType_t data_dtype) { + if (coreId == 0x80) { + return; + } + + switch (data_dtype) { + case CNRT_FLOAT16: { + MLUUnion1MaskedIm2colForward((half *)feature, height, width, channels, + kernel_h, kernel_w, pad_h, pad_w, + (int32_t *)mask_h_idx, (int32_t *)mask_w_idx, + mask_cnt, (half *)data_col); + }; break; + case CNRT_FLOAT32: { + MLUUnion1MaskedIm2colForward((float *)feature, height, width, channels, + kernel_h, kernel_w, pad_h, pad_w, + (int32_t *)mask_h_idx, (int32_t *)mask_w_idx, + mask_cnt, (float *)data_col); + }; break; + default: { + break; + } + } +} + +__mlu_global__ void MLUKernelMaskedCol2imForward( + const void *col, const int height, const int width, const int channels, + const void *mask_h_idx, const void *mask_w_idx, const int mask_cnt, + void *im, const cnrtDataType_t data_dtype) { + if (coreId == 0x80) { + return; + } + switch (data_dtype) { + case CNRT_FLOAT16: { + MLUUnion1MaskedCol2imForward((half *)col, height, width, channels, + (int32_t *)mask_h_idx, (int32_t *)mask_w_idx, + mask_cnt, (half *)im); + }; break; + case CNRT_FLOAT32: { + MLUUnion1MaskedCol2imForward((float *)col, height, width, channels, + (int32_t *)mask_h_idx, (int32_t *)mask_w_idx, + mask_cnt, (float *)im); + }; break; + default: { + break; + } + } +} + +void KernelMaskedIm2colForward( + cnrtDim3_t k_dim, cnrtFunctionType_t k_type, cnrtQueue_t queue, + cnrtDataType_t k_dtype, const void *im_ptr, const int height, + const int width, const int channels, const int kernel_h, const int kernel_w, + const int pad_h, const int pad_w, const void *mask_h_idx_ptr, + const void *mask_w_idx_ptr, const int mask_cnt, void *col_ptr) { + MLUKernelMaskedIm2colForward<<>>( + im_ptr, height, width, channels, kernel_h, kernel_w, pad_h, pad_w, + mask_h_idx_ptr, mask_w_idx_ptr, mask_cnt, 
col_ptr, k_dtype); +} + +void KernelMaskedCol2imForward(cnrtDim3_t k_dim, cnrtFunctionType_t k_type, + cnrtQueue_t queue, cnrtDataType_t k_dtype, + const void *col_ptr, const int height, + const int width, const int channels, + const void *mask_h_idx_ptr, + const void *mask_w_idx_ptr, const int mask_cnt, + void *im_ptr) { + MLUKernelMaskedCol2imForward<<>>( + col_ptr, height, width, channels, mask_h_idx_ptr, mask_w_idx_ptr, + mask_cnt, im_ptr, k_dtype); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/ms_deform_attn_mlu_kernel.mlu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/ms_deform_attn_mlu_kernel.mlu new file mode 100644 index 0000000000000000000000000000000000000000..7899e52cd3a342060416b9131b1380baac950622 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/ms_deform_attn_mlu_kernel.mlu @@ -0,0 +1,853 @@ +/************************************************************************* + * Copyright (C) 2022 by Cambricon. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
+ *************************************************************************/ + +#include "common_mlu_helper.hpp" +#include + +/**************************************************************************************** + * + * NRAM partition forward: + * | spatial_shapes | data_value_p1_ping | data_value_p2_ping | + * | data_value_p3_ping | data_value_p4_ping | data_col_ping | + * | data_value_p1_pong | data_value_p2_pong | data_value_p3_pong | + * | data_value_p4_pong | data_col_pong | auxiliary_a | + * | auxiliary_b | + * | 128bytes | deal_size | deal_size | + * | deal_size | deal_size | deal_size | + * | deal_size | deal_size | deal_size | + * | deal_size | deal_size | deal_size | + * | deal_size | + * + ****************************************************************************************/ + +/**************************************************************************************** + * + * NRAM partition backward: + * | grad_output_nram | grad_output_nram_temp | grad_weight | + * | grad_h_weight | grad_w_weight | top_grad | + * | top_grad_temp | spatial_shapes_nram | sampling_loc_nram | + * | deal_size | deal_size | deal_size | + * | deal_size | deal_size | deal_size | + * | deal_size | deal_size | 64bytes | + * + ****************************************************************************************/ + +#define TWELVE_SPLIT 12 +#define ALIGN_NUM 64 +#define ALIGN_NUM_FOR_REDUCE 32 + +__nram__ char nram_buffer[MAX_NRAM_SIZE]; + +template +__mlu_func__ void loadNeighborPointsData( + const T *data_value_gdram, T *data_value_p1_nram, T *data_value_p2_nram, + T *data_value_p3_nram, T *data_value_p4_nram, const size_t deal_num, + const int32_t &width, const int32_t &height, const int32_t &num_heads, + const int32_t &channels, const T &x, const T &y, const int32_t &head_idx) { + const int32_t w_low = floorf(x); + const int32_t h_low = floorf(y); + const int32_t w_high = w_low + 1; + const int32_t h_high = h_low + 1; + + const int32_t w_stride = num_heads * channels; + 
const int32_t h_stride = width * w_stride; + const int32_t h_low_ptr_offset = h_low * h_stride; + const int32_t h_high_ptr_offset = h_low_ptr_offset + h_stride; + const int32_t w_low_ptr_offset = w_low * w_stride; + const int32_t w_high_ptr_offset = w_low_ptr_offset + w_stride; + const int32_t base_ptr_offset = head_idx * channels; + + // top-left point + if (h_low >= 0 && w_low >= 0) { + const int32_t v1_offset = + h_low_ptr_offset + w_low_ptr_offset + base_ptr_offset; + __memcpy_async(data_value_p1_nram, data_value_gdram + v1_offset, + deal_num * sizeof(T), GDRAM2NRAM); + } + + // top-right point + if (h_low >= 0 && w_high <= width - 1) { + const int32_t v2_offset = + h_low_ptr_offset + w_high_ptr_offset + base_ptr_offset; + __memcpy_async(data_value_p2_nram, data_value_gdram + v2_offset, + deal_num * sizeof(T), GDRAM2NRAM); + } + + // bottom-left point + if (h_high <= height - 1 && w_low >= 0) { + const int32_t v3_offset = + h_high_ptr_offset + w_low_ptr_offset + base_ptr_offset; + __memcpy_async(data_value_p3_nram, data_value_gdram + v3_offset, + deal_num * sizeof(T), GDRAM2NRAM); + } + + // bottom-right point + if (h_high <= height - 1 && w_high <= width - 1) { + const int32_t v4_offset = + h_high_ptr_offset + w_high_ptr_offset + base_ptr_offset; + __memcpy_async(data_value_p4_nram, data_value_gdram + v4_offset, + deal_num * sizeof(T), GDRAM2NRAM); + } +} + +template +__mlu_func__ void bilinearInterpolation( + T *data_value_p1_nram, T *data_value_p2_nram, T *data_value_p3_nram, + T *data_value_p4_nram, T *sample_point_value, T *auxiliary_b, + const size_t deal_num, const int32_t &width, const int32_t &height, + const T &x, const T &y) { + const int32_t w_low = floorf(x); + const int32_t h_low = floorf(y); + const int32_t w_high = w_low + 1; + const int32_t h_high = h_low + 1; + + const T lw = x - w_low; + const T lh = y - h_low; + const T hw = 1 - lw; + const T hh = 1 - lh; + const T w1 = hh * hw; + const T w2 = hh * lw; + const T w3 = lh * hw; + const T w4 = 
lh * lw; + + __bang_write_value((T *)sample_point_value, deal_num, (T)0); + + // top-left point + if (h_low >= 0 && w_low >= 0) { + // sample_point_value += v1 * w1 + __bang_mul_scalar((T *)auxiliary_b, (T *)data_value_p1_nram, (T)w1, + deal_num); + __bang_add((T *)sample_point_value, (T *)sample_point_value, + (T *)auxiliary_b, deal_num); + } + + // top-right point + if (h_low >= 0 && w_high <= width - 1) { + // sample_point_value += v2 * w2 + __bang_mul_scalar((T *)auxiliary_b, (T *)data_value_p2_nram, (T)w2, + deal_num); + __bang_add((T *)sample_point_value, (T *)sample_point_value, + (T *)auxiliary_b, deal_num); + } + + // bottom-left point + if (h_high <= height - 1 && w_low >= 0) { + // sample_point_value += v3 * w3 + __bang_mul_scalar((T *)auxiliary_b, (T *)data_value_p3_nram, (T)w3, + deal_num); + __bang_add((T *)sample_point_value, (T *)sample_point_value, + (T *)auxiliary_b, deal_num); + } + + // bottom-right point + if (h_high <= height - 1 && w_high <= width - 1) { + // sample_point_value += v4 * w4 + __bang_mul_scalar((T *)auxiliary_b, (T *)data_value_p4_nram, (T)w4, + deal_num); + __bang_add((T *)sample_point_value, (T *)sample_point_value, + (T *)auxiliary_b, deal_num); + } +} + +template +__mlu_global__ void MLUKernelMsDeformAttnForward( + const char *data_value_gdram, const char *data_spatial_shapes_gdram, + const char *data_level_start_index_gdram, + const char *data_sampling_loc_gdram, const char *data_attn_weight_gdram, + const int32_t batch_size, const int32_t num_keys, const int32_t num_heads, + const int32_t channels, const int32_t num_levels, const int32_t num_queries, + const int32_t num_points, char *data_col_gdram) { + if (coreId == 0x80) { + return; + } + + const size_t spatial_size = PAD_UP(2 * sizeof(int32_t), NFU_ALIGN_SIZE); + const size_t span_num_deal = + PAD_DOWN((MAX_NRAM_SIZE - spatial_size) / TWELVE_SPLIT / sizeof(T), + NFU_ALIGN_SIZE); + const size_t align_num = NFU_ALIGN_SIZE; + const int32_t channels_seg_num = channels / 
span_num_deal; + const size_t channels_rem = channels % span_num_deal; + const size_t channels_align_rem = CEIL_ALIGN(channels_rem, align_num); + char *data_spatial_shapes_nram = nram_buffer; + char *ping_data_value_p1_nram = data_spatial_shapes_nram + spatial_size; + char *ping_data_value_p2_nram = + ping_data_value_p1_nram + span_num_deal * sizeof(T); + char *ping_data_value_p3_nram = + ping_data_value_p2_nram + span_num_deal * sizeof(T); + char *ping_data_value_p4_nram = + ping_data_value_p3_nram + span_num_deal * sizeof(T); + char *ping_data_col_nram = + ping_data_value_p4_nram + span_num_deal * sizeof(T); + char *pong_data_value_p1_nram = + ping_data_col_nram + span_num_deal * sizeof(T); + char *pong_data_value_p2_nram = + pong_data_value_p1_nram + span_num_deal * sizeof(T); + char *pong_data_value_p3_nram = + pong_data_value_p2_nram + span_num_deal * sizeof(T); + char *pong_data_value_p4_nram = + pong_data_value_p3_nram + span_num_deal * sizeof(T); + char *pong_data_col_nram = + pong_data_value_p4_nram + span_num_deal * sizeof(T); + char *auxiliary_a = pong_data_col_nram + span_num_deal * sizeof(T); + char *auxiliary_b = auxiliary_a + span_num_deal * sizeof(T); + const size_t ping_pong_gap = 5 * span_num_deal * sizeof(T); + size_t data_col_ping_pong_idx = 0; + + int32_t block_num_per_core = (batch_size * num_queries * num_heads) / taskDim; + const int32_t block_num_rem = + (batch_size * num_queries * num_heads) % taskDim; + const int32_t idx_start = taskId < (block_num_rem + 1) + ? taskId * (block_num_per_core + 1) + : taskId * block_num_per_core + block_num_rem; + block_num_per_core = + taskId < block_num_rem + ? 
(batch_size * num_queries * num_heads) / taskDim + 1 + : (batch_size * num_queries * num_heads) / taskDim; + + for (int32_t cur_idx = idx_start; cur_idx < idx_start + block_num_per_core; + ++cur_idx) { + // cur_idx = batch_idx * num_queries * num_heads + query_idx * num_heads + + // head_idx + const int32_t head_idx = cur_idx % num_heads; + const int32_t batch_idx = (cur_idx / num_heads) / num_queries; + + const char *data_value_gdram_start = + data_value_gdram + + batch_idx * num_keys * num_heads * channels * sizeof(T); + const char *data_sampling_loc_gdram_start = + data_sampling_loc_gdram + + cur_idx * num_levels * num_points * 2 * sizeof(T); + const char *data_attn_weight_gdram_start = + data_attn_weight_gdram + cur_idx * num_levels * num_points * sizeof(T); + char *data_col_gdram_start = + data_col_gdram + cur_idx * channels * sizeof(T); + + for (int32_t c_seg_idx = 0; c_seg_idx < channels_seg_num; ++c_seg_idx) { + __bang_write_value( + (T *)(ping_data_col_nram + data_col_ping_pong_idx * ping_pong_gap), + span_num_deal, (T)0); + // load data + // level_idx = 0, point_idx = 0 + __memcpy(data_spatial_shapes_nram, data_spatial_shapes_gdram, + 2 * sizeof(int32_t), GDRAM2NRAM); + int32_t spatial_h = ((int32_t *)data_spatial_shapes_nram)[0]; + int32_t spatial_w = ((int32_t *)data_spatial_shapes_nram)[1]; + const char *data_value_ptr = + data_value_gdram_start + c_seg_idx * span_num_deal * sizeof(T); + T loc_w = ((T *)data_sampling_loc_gdram_start)[0]; + T loc_h = ((T *)data_sampling_loc_gdram_start)[1]; + T weight = ((T *)data_attn_weight_gdram_start)[0]; + T x = loc_w * spatial_w - 0.5; + T y = loc_h * spatial_h - 0.5; + if (y > -1 && x > -1 && y < spatial_h && x < spatial_w) { + loadNeighborPointsData( + (T *)data_value_ptr, (T *)ping_data_value_p1_nram, + (T *)ping_data_value_p2_nram, (T *)ping_data_value_p3_nram, + (T *)ping_data_value_p4_nram, span_num_deal, spatial_w, spatial_h, + num_heads, channels, x, y, head_idx); + } + T spatial_h_next_point = 0; + T 
spatial_w_next_point = 0; + T weight_next_point = 0; + T x_next_point = 0; + T y_next_point = 0; + __asm__ volatile("sync;"); + + for (int32_t level_idx = 0; level_idx < num_levels; ++level_idx) { + for (int32_t point_idx = 0; point_idx < num_points; ++point_idx) { + // load data + if (point_idx == num_points - 1 && level_idx == num_levels - 1) { + // last point no need to load data, continue to compute + } else if (point_idx == num_points - 1) { + const int32_t level_start_id = + ((int32_t *)data_level_start_index_gdram)[level_idx + 1]; + const int32_t spatial_h_ptr = (level_idx + 1) << 1; + __memcpy( + data_spatial_shapes_nram, + data_spatial_shapes_gdram + spatial_h_ptr * sizeof(int32_t), + 2 * sizeof(int32_t), GDRAM2NRAM); + spatial_h_next_point = ((int32_t *)data_spatial_shapes_nram)[0]; + spatial_w_next_point = ((int32_t *)data_spatial_shapes_nram)[1]; + data_value_ptr = data_value_gdram_start + + (level_start_id * num_heads * channels + + c_seg_idx * span_num_deal) * + sizeof(T); + loc_w = ((T *)data_sampling_loc_gdram_start) + [(level_idx * num_points + point_idx + 1) * 2]; + loc_h = ((T *)data_sampling_loc_gdram_start) + [(level_idx * num_points + point_idx + 1) * 2 + 1]; + weight_next_point = + ((T *)data_attn_weight_gdram_start)[level_idx * num_points + + point_idx + 1]; + x_next_point = loc_w * spatial_w_next_point - 0.5; + y_next_point = loc_h * spatial_h_next_point - 0.5; + if (y_next_point > -1 && x_next_point > -1 && + y_next_point < spatial_h_next_point && + x_next_point < spatial_w_next_point) { + loadNeighborPointsData( + (T *)data_value_ptr, + (T *)(ping_data_value_p1_nram + + ((level_idx * num_points + point_idx + 1) % 2) * + ping_pong_gap), + (T *)(ping_data_value_p2_nram + + ((level_idx * num_points + point_idx + 1) % 2) * + ping_pong_gap), + (T *)(ping_data_value_p3_nram + + ((level_idx * num_points + point_idx + 1) % 2) * + ping_pong_gap), + (T *)(ping_data_value_p4_nram + + ((level_idx * num_points + point_idx + 1) % 2) * + ping_pong_gap), 
+ span_num_deal, spatial_w_next_point, spatial_h_next_point, + num_heads, channels, x_next_point, y_next_point, head_idx); + } + } else { + spatial_h_next_point = spatial_h; + spatial_w_next_point = spatial_w; + loc_w = ((T *)data_sampling_loc_gdram_start) + [(level_idx * num_points + point_idx + 1) * 2]; + loc_h = ((T *)data_sampling_loc_gdram_start) + [(level_idx * num_points + point_idx + 1) * 2 + 1]; + weight_next_point = + ((T *)data_attn_weight_gdram_start)[level_idx * num_points + + point_idx + 1]; + x_next_point = loc_w * spatial_w - 0.5; + y_next_point = loc_h * spatial_h - 0.5; + if (y_next_point > -1 && x_next_point > -1 && + y_next_point < spatial_h && x_next_point < spatial_w) { + loadNeighborPointsData( + (T *)data_value_ptr, + (T *)(ping_data_value_p1_nram + + ((level_idx * num_points + point_idx + 1) % 2) * + ping_pong_gap), + (T *)(ping_data_value_p2_nram + + ((level_idx * num_points + point_idx + 1) % 2) * + ping_pong_gap), + (T *)(ping_data_value_p3_nram + + ((level_idx * num_points + point_idx + 1) % 2) * + ping_pong_gap), + (T *)(ping_data_value_p4_nram + + ((level_idx * num_points + point_idx + 1) % 2) * + ping_pong_gap), + span_num_deal, spatial_w, spatial_h, num_heads, channels, + x_next_point, y_next_point, head_idx); + } + } + + // compute + if (y > -1 && x > -1 && y < spatial_h && x < spatial_w) { + bilinearInterpolation( + (T *)(ping_data_value_p1_nram + + ((level_idx * num_points + point_idx) % 2) * + ping_pong_gap), + (T *)(ping_data_value_p2_nram + + ((level_idx * num_points + point_idx) % 2) * + ping_pong_gap), + (T *)(ping_data_value_p3_nram + + ((level_idx * num_points + point_idx) % 2) * + ping_pong_gap), + (T *)(ping_data_value_p4_nram + + ((level_idx * num_points + point_idx) % 2) * + ping_pong_gap), + (T *)auxiliary_a, (T *)auxiliary_b, span_num_deal, spatial_w, + spatial_h, x, y); + __bang_mul_scalar((T *)auxiliary_a, (T *)auxiliary_a, (T)weight, + span_num_deal); + __bang_add((T *)(ping_data_col_nram + + 
data_col_ping_pong_idx * ping_pong_gap), + (T *)(ping_data_col_nram + + data_col_ping_pong_idx * ping_pong_gap), + (T *)auxiliary_a, span_num_deal); + } + + spatial_w = spatial_w_next_point; + spatial_h = spatial_h_next_point; + weight = weight_next_point; + x = x_next_point; + y = y_next_point; + __asm__ volatile("sync;"); + } + } + // store + __memcpy_async( + data_col_gdram_start + c_seg_idx * span_num_deal * sizeof(T), + ping_data_col_nram + data_col_ping_pong_idx * ping_pong_gap, + span_num_deal * sizeof(T), NRAM2GDRAM); + data_col_ping_pong_idx = (data_col_ping_pong_idx + 1) % 2; + } + + if (channels_rem > 0) { + __bang_write_value( + (T *)(ping_data_col_nram + data_col_ping_pong_idx * ping_pong_gap), + channels_align_rem, (T)0); + // load data + // level_idx = 0, point_idx = 0 + __memcpy(data_spatial_shapes_nram, data_spatial_shapes_gdram, + 2 * sizeof(int32_t), GDRAM2NRAM); + int32_t spatial_h = ((int32_t *)data_spatial_shapes_nram)[0]; + int32_t spatial_w = ((int32_t *)data_spatial_shapes_nram)[1]; + const char *data_value_ptr = + data_value_gdram_start + channels_seg_num * span_num_deal * sizeof(T); + T loc_w = ((T *)data_sampling_loc_gdram_start)[0]; + T loc_h = ((T *)data_sampling_loc_gdram_start)[1]; + T weight = ((T *)data_attn_weight_gdram_start)[0]; + T x = loc_w * spatial_w - 0.5; + T y = loc_h * spatial_h - 0.5; + if (y > -1 && x > -1 && y < spatial_h && x < spatial_w) { + loadNeighborPointsData( + (T *)data_value_ptr, (T *)ping_data_value_p1_nram, + (T *)ping_data_value_p2_nram, (T *)ping_data_value_p3_nram, + (T *)ping_data_value_p4_nram, channels_rem, spatial_w, spatial_h, + num_heads, channels, x, y, head_idx); + } + T spatial_h_next_point = 0; + T spatial_w_next_point = 0; + T weight_next_point = 0; + T x_next_point = 0; + T y_next_point = 0; + __asm__ volatile("sync;"); + + for (int32_t level_idx = 0; level_idx < num_levels; ++level_idx) { + for (int32_t point_idx = 0; point_idx < num_points; ++point_idx) { + // load data + if (point_idx == 
num_points - 1 && level_idx == num_levels - 1) { + // last point no need to load data, continue to compute + } else if (point_idx == num_points - 1) { + const int32_t level_start_id = + ((int32_t *)data_level_start_index_gdram)[level_idx + 1]; + const int32_t spatial_h_ptr = (level_idx + 1) << 1; + __memcpy( + data_spatial_shapes_nram, + data_spatial_shapes_gdram + spatial_h_ptr * sizeof(int32_t), + 2 * sizeof(int32_t), GDRAM2NRAM); + spatial_h_next_point = ((int32_t *)data_spatial_shapes_nram)[0]; + spatial_w_next_point = ((int32_t *)data_spatial_shapes_nram)[1]; + data_value_ptr = data_value_gdram_start + + (level_start_id * num_heads * channels + + channels_seg_num * span_num_deal) * + sizeof(T); + loc_w = ((T *)data_sampling_loc_gdram_start) + [(level_idx * num_points + point_idx + 1) * 2]; + loc_h = ((T *)data_sampling_loc_gdram_start) + [(level_idx * num_points + point_idx + 1) * 2 + 1]; + weight_next_point = + ((T *)data_attn_weight_gdram_start)[level_idx * num_points + + point_idx + 1]; + x_next_point = loc_w * spatial_w_next_point - 0.5; + y_next_point = loc_h * spatial_h_next_point - 0.5; + if (y_next_point > -1 && x_next_point > -1 && + y_next_point < spatial_h_next_point && + x_next_point < spatial_w_next_point) { + loadNeighborPointsData( + (T *)data_value_ptr, + (T *)(ping_data_value_p1_nram + + ((level_idx * num_points + point_idx + 1) % 2) * + ping_pong_gap), + (T *)(ping_data_value_p2_nram + + ((level_idx * num_points + point_idx + 1) % 2) * + ping_pong_gap), + (T *)(ping_data_value_p3_nram + + ((level_idx * num_points + point_idx + 1) % 2) * + ping_pong_gap), + (T *)(ping_data_value_p4_nram + + ((level_idx * num_points + point_idx + 1) % 2) * + ping_pong_gap), + channels_rem, spatial_w_next_point, spatial_h_next_point, + num_heads, channels, x_next_point, y_next_point, head_idx); + } + } else { + spatial_w_next_point = spatial_w; + spatial_h_next_point = spatial_h; + loc_w = ((T *)data_sampling_loc_gdram_start) + [(level_idx * num_points + 
point_idx + 1) * 2]; + loc_h = ((T *)data_sampling_loc_gdram_start) + [(level_idx * num_points + point_idx + 1) * 2 + 1]; + weight_next_point = + ((T *)data_attn_weight_gdram_start)[level_idx * num_points + + point_idx + 1]; + x_next_point = loc_w * spatial_w - 0.5; + y_next_point = loc_h * spatial_h - 0.5; + if (y_next_point > -1 && x_next_point > -1 && + y_next_point < spatial_h && x_next_point < spatial_w) { + loadNeighborPointsData( + (T *)data_value_ptr, + (T *)(ping_data_value_p1_nram + + ((level_idx * num_points + point_idx + 1) % 2) * + ping_pong_gap), + (T *)(ping_data_value_p2_nram + + ((level_idx * num_points + point_idx + 1) % 2) * + ping_pong_gap), + (T *)(ping_data_value_p3_nram + + ((level_idx * num_points + point_idx + 1) % 2) * + ping_pong_gap), + (T *)(ping_data_value_p4_nram + + ((level_idx * num_points + point_idx + 1) % 2) * + ping_pong_gap), + channels_rem, spatial_w, spatial_h, num_heads, channels, + x_next_point, y_next_point, head_idx); + } + } + + // compute + if (y > -1 && x > -1 && y < spatial_h && x < spatial_w) { + bilinearInterpolation( + (T *)(ping_data_value_p1_nram + + ((level_idx * num_points + point_idx) % 2) * + ping_pong_gap), + (T *)(ping_data_value_p2_nram + + ((level_idx * num_points + point_idx) % 2) * + ping_pong_gap), + (T *)(ping_data_value_p3_nram + + ((level_idx * num_points + point_idx) % 2) * + ping_pong_gap), + (T *)(ping_data_value_p4_nram + + ((level_idx * num_points + point_idx) % 2) * + ping_pong_gap), + (T *)auxiliary_a, (T *)auxiliary_b, channels_align_rem, + spatial_w, spatial_h, x, y); + __bang_mul_scalar((T *)auxiliary_a, (T *)auxiliary_a, (T)weight, + channels_align_rem); + __bang_add((T *)(ping_data_col_nram + + data_col_ping_pong_idx * ping_pong_gap), + (T *)(ping_data_col_nram + + data_col_ping_pong_idx * ping_pong_gap), + (T *)auxiliary_a, channels_align_rem); + } + + spatial_w = spatial_w_next_point; + spatial_h = spatial_h_next_point; + weight = weight_next_point; + x = x_next_point; + y = 
y_next_point; + __asm__ volatile("sync;"); + } + } + // store + __memcpy_async( + data_col_gdram_start + channels_seg_num * span_num_deal * sizeof(T), + ping_data_col_nram + data_col_ping_pong_idx * ping_pong_gap, + channels_rem * sizeof(T), NRAM2GDRAM); + data_col_ping_pong_idx = (data_col_ping_pong_idx + 1) % 2; + } + } + __asm__ volatile("sync;"); + return; +} + +template __mlu_global__ void MLUKernelMsDeformAttnForward( + const char *data_value_gdram, const char *data_spatial_shapes_gdram, + const char *data_level_start_index_gdram, + const char *data_sampling_loc_gdram, const char *data_attn_weight_gdram, + const int32_t batch_size, const int32_t num_keys, const int32_t num_heads, + const int32_t channels, const int32_t num_levels, const int32_t num_queries, + const int32_t num_points, char *data_col_gdram); + +void KernelMsDeformAttnForward( + cnrtDim3_t k_dim, cnrtFunctionType_t k_type, cnrtQueue_t queue, + const cnrtDataType_t d_type, const char *data_value_gdram, + const char *data_spatial_shapes_gdram, + const char *data_level_start_index_gdram, + const char *data_sampling_loc_gdram, const char *data_attn_weight_gdram, + const int32_t batch_size, const int32_t num_keys, const int32_t num_heads, + const int32_t channels, const int32_t num_levels, const int32_t num_queries, + const int32_t num_points, char *data_col_gdram) { + MLUKernelMsDeformAttnForward<<>>( + data_value_gdram, data_spatial_shapes_gdram, data_level_start_index_gdram, + data_sampling_loc_gdram, data_attn_weight_gdram, batch_size, num_keys, + num_heads, channels, num_levels, num_queries, num_points, data_col_gdram); +} + +template +void __mlu_func__ msDeformAttnCol2imBilinear( + T *top_grad_temp, const int32_t &height, const int32_t &width, const T &w1, + const T &w2, const T &w3, const T &w4, const int32_t &h_low, + const int32_t &w_low, const int32_t &h_high, const int32_t &w_high, + const int32_t &base_ptr, const int32_t &h_low_ptr_offset, + const int32_t &w_low_ptr_offset, const int32_t 
&h_high_ptr_offset, + const int32_t &w_high_ptr_offset, const T &hh, const T &hw, const T &lh, + const T &lw, T *top_grad, const T &data_attn_weight, T *grad_h_weight, + T *grad_w_weight, T *grad_value, T *grad_output_nram, T *grad_weight, + T *grad_sampling_loc, T *grad_attn_weight, T *grad_output_nram_temp, + const int32_t &deal_num, const int32_t &deal_num_real, + const T *data_value_ptr) { + if (h_low >= 0 && w_low >= 0) { + int32_t offset1 = h_low_ptr_offset + w_low_ptr_offset + base_ptr; + __memcpy(grad_output_nram, data_value_ptr + offset1, + deal_num_real * sizeof(T), GDRAM2NRAM); + __bang_mul_scalar(grad_weight, grad_output_nram, hw, deal_num); + __bang_sub(grad_h_weight, grad_h_weight, grad_weight, deal_num); + __bang_mul_scalar(grad_weight, grad_output_nram, hh, deal_num); + __bang_sub(grad_w_weight, grad_w_weight, grad_weight, deal_num); + + __bang_mul_scalar(top_grad_temp, top_grad, data_attn_weight, deal_num); + __bang_mul_scalar(top_grad_temp, top_grad_temp, w1, deal_num); + // for calc grad_attn_weight + __bang_mul_scalar(grad_output_nram, grad_output_nram, w1, deal_num); + __bang_atomic_add((T *)top_grad_temp, (T *)(grad_value + offset1), + (T *)top_grad_temp, deal_num_real); + } + if (h_low >= 0 && w_high <= width - 1) { + int32_t offset2 = h_low_ptr_offset + w_high_ptr_offset + base_ptr; + __memcpy(grad_output_nram_temp, data_value_ptr + offset2, + deal_num_real * sizeof(T), GDRAM2NRAM); + __bang_mul_scalar(grad_weight, grad_output_nram_temp, lw, deal_num); + __bang_sub(grad_h_weight, grad_h_weight, grad_weight, deal_num); + __bang_mul_scalar(grad_weight, grad_output_nram_temp, hh, deal_num); + __bang_add(grad_w_weight, grad_w_weight, grad_weight, deal_num); + + __bang_mul_scalar(top_grad_temp, top_grad, data_attn_weight, deal_num); + __bang_mul_scalar(top_grad_temp, top_grad_temp, w2, deal_num); + + __bang_mul_scalar(grad_output_nram_temp, grad_output_nram_temp, w2, + deal_num); + __bang_add(grad_output_nram, grad_output_nram, 
grad_output_nram_temp, + deal_num); + __bang_atomic_add((T *)top_grad_temp, (T *)(grad_value + offset2), + (T *)top_grad_temp, deal_num_real); + } + if (h_high <= height - 1 && w_low >= 0) { + int32_t offset3 = h_high_ptr_offset + w_low_ptr_offset + base_ptr; + __memcpy(grad_output_nram_temp, data_value_ptr + offset3, + deal_num_real * sizeof(T), GDRAM2NRAM); + __bang_mul_scalar(grad_weight, grad_output_nram_temp, hw, deal_num); + __bang_add(grad_h_weight, grad_h_weight, grad_weight, deal_num); + __bang_mul_scalar(grad_weight, grad_output_nram_temp, lh, deal_num); + __bang_sub(grad_w_weight, grad_w_weight, grad_weight, deal_num); + + __bang_mul_scalar(top_grad_temp, top_grad, data_attn_weight, deal_num); + __bang_mul_scalar(top_grad_temp, top_grad_temp, w3, deal_num); + // for calc grad_attn_weight + __bang_mul_scalar(grad_output_nram_temp, grad_output_nram_temp, w3, + deal_num); + __bang_add(grad_output_nram, grad_output_nram, grad_output_nram_temp, + deal_num); + __bang_atomic_add((T *)top_grad_temp, (T *)(grad_value + offset3), + (T *)top_grad_temp, deal_num_real); + } + if (h_high <= height - 1 && w_high <= width - 1) { + int32_t offset4 = h_high_ptr_offset + w_high_ptr_offset + base_ptr; + __memcpy(grad_output_nram_temp, data_value_ptr + offset4, + deal_num_real * sizeof(T), GDRAM2NRAM); + __bang_mul_scalar(grad_weight, grad_output_nram_temp, lw, deal_num); + __bang_add(grad_h_weight, grad_h_weight, grad_weight, deal_num); + __bang_mul_scalar(grad_weight, grad_output_nram_temp, lh, deal_num); + __bang_add(grad_w_weight, grad_w_weight, grad_weight, deal_num); + + __bang_mul_scalar(top_grad_temp, top_grad, data_attn_weight, deal_num); + __bang_mul_scalar(top_grad_temp, top_grad_temp, w4, deal_num); + // for calc grad_attn_weight + __bang_mul_scalar(grad_output_nram_temp, grad_output_nram_temp, w4, + deal_num); + __bang_add(grad_output_nram, grad_output_nram, grad_output_nram_temp, + deal_num); + + __bang_atomic_add((T *)top_grad_temp, (T *)(grad_value + 
offset4), + (T *)top_grad_temp, deal_num_real); + } + __bang_mul(grad_output_nram, grad_output_nram, top_grad, deal_num); +#if __BANG_ARCH__ >= 322 + recursiveSumPool(grad_output_nram, 1, deal_num_real, ALIGN_NUM_FOR_REDUCE); +#else + const int32_t align_num_on_200 = NFU_ALIGN_SIZE / sizeof(float); + recursiveSumPool(grad_output_nram, align_num_on_200, + deal_num / align_num_on_200, ALIGN_NUM_FOR_REDUCE); + __bang_reduce_sum(grad_output_nram, grad_output_nram, + NFU_ALIGN_SIZE / sizeof(float)); +#endif + __bang_atomic_add((T *)grad_output_nram, (T *)grad_attn_weight, + (T *)grad_output_nram, 1); + __bang_mul_scalar(grad_w_weight, grad_w_weight, width, deal_num); + __bang_mul_scalar(top_grad_temp, top_grad, data_attn_weight, deal_num); + __bang_mul(grad_w_weight, grad_w_weight, top_grad_temp, deal_num); +#if __BANG_ARCH__ >= 322 + recursiveSumPool(grad_w_weight, 1, deal_num_real, ALIGN_NUM_FOR_REDUCE); +#else + recursiveSumPool(grad_w_weight, align_num_on_200, deal_num / align_num_on_200, + ALIGN_NUM_FOR_REDUCE); + __bang_reduce_sum(grad_w_weight, grad_w_weight, + NFU_ALIGN_SIZE / sizeof(float)); +#endif + __bang_atomic_add((T *)grad_w_weight, (T *)(grad_sampling_loc), + (T *)grad_w_weight, 1); + + __bang_mul_scalar(grad_h_weight, grad_h_weight, height, deal_num); + __bang_mul(grad_h_weight, grad_h_weight, top_grad_temp, deal_num); +#if __BANG_ARCH__ >= 322 + recursiveSumPool(grad_h_weight, 1, deal_num_real, ALIGN_NUM_FOR_REDUCE); +#else + recursiveSumPool(grad_h_weight, align_num_on_200, deal_num / align_num_on_200, + ALIGN_NUM_FOR_REDUCE); + __bang_reduce_sum(grad_h_weight, grad_h_weight, + NFU_ALIGN_SIZE / sizeof(float)); +#endif + __bang_atomic_add((T *)grad_h_weight, (T *)(grad_sampling_loc + 1), + (T *)grad_h_weight, 1); +} + +__mlu_global__ void MLUUnion1KernelMsDeformAttnBackward( + const float *data_value, const int32_t *spatial_shapes, + const int32_t *data_level_start_index, const float *data_sampling_loc, + const float *data_attn_weight, const float 
*grad_output, + const int32_t batch, const int32_t spatial_size, const int32_t num_heads, + const int32_t channels, const int32_t num_levels, const int32_t num_query, + const int32_t num_points, float *grad_value, float *grad_sampling_loc, + float *grad_attn_weight) { + if (coreId == 0x80) { + return; + } + const int32_t split_num = 8; + const int32_t spatial_shapes_size = 64; + int32_t deal_num = PAD_DOWN( + (MAX_NRAM_SIZE - spatial_shapes_size) / split_num / sizeof(float), + ALIGN_NUM); + float *grad_output_nram = (float *)nram_buffer; + float *grad_output_nram_temp = (float *)nram_buffer + deal_num; + float *grad_weight = (float *)nram_buffer + 2 * deal_num; + float *grad_h_weight = (float *)nram_buffer + 3 * deal_num; + float *grad_w_weight = (float *)nram_buffer + 4 * deal_num; + float *top_grad = (float *)nram_buffer + 5 * deal_num; + float *top_grad_temp = (float *)nram_buffer + 6 * deal_num; + int32_t *spatial_shapes_nram = + (int32_t *)((float *)nram_buffer + 7 * deal_num); + float *sampling_loc_nram = + (float *)nram_buffer + 7 * deal_num + 2 * sizeof(int32_t); + const int32_t total_num = batch * num_query * num_heads * num_levels; + int32_t num_per_core = total_num / taskDim; + int32_t num_rem = total_num % taskDim; + num_per_core = num_per_core + int32_t(taskId < num_rem); + int32_t start_per_core = + num_rem > taskId + ? 
(taskId * num_per_core) + : ((num_per_core + 1) * num_rem + (taskId - num_rem) * num_per_core); + int32_t end_per_core = start_per_core + num_per_core; + const int32_t C_repeat = channels / deal_num; + const int32_t C_tail = channels % deal_num; + const int32_t qid_stride = num_heads * channels; + int32_t base_ptr = 0; + for (int32_t num_loop = start_per_core; num_loop < end_per_core; ++num_loop) { + const int32_t l_col = num_loop % num_levels; + const int32_t m_col = num_loop / num_levels % num_heads; + const int32_t q_col = num_loop / num_levels / num_heads % num_query; + const int32_t b_col = num_loop / num_query / num_heads / num_levels; + int32_t data_weight_ptr = num_loop * num_points; + int32_t data_loc_w_ptr = data_weight_ptr << 1; + const int32_t value_offset = b_col * spatial_size * num_heads * channels; + const int32_t level_start_id = data_level_start_index[l_col]; + int32_t spatial_h_ptr = l_col << 1; + int32_t grad_output_offset = b_col * num_query * num_heads * channels + + q_col * num_heads * channels + + m_col * channels; + __memcpy(spatial_shapes_nram, spatial_shapes + spatial_h_ptr, + 2 * sizeof(int32_t), GDRAM2NRAM); + const int32_t spatial_h = spatial_shapes_nram[0]; + const int32_t spatial_w = spatial_shapes_nram[1]; + const int32_t value_ptr_offset = value_offset + level_start_id * qid_stride; + const float *data_value_ptr = data_value + value_ptr_offset; + float *grad_value_ptr = grad_value + value_ptr_offset; + const int32_t grad_attn_weight_out = num_loop * num_points; + const int32_t grad_sampling_loc_out = num_loop * num_points * 2; + for (int32_t p_col = 0; p_col < num_points; ++p_col) { + __memcpy(sampling_loc_nram, data_sampling_loc + data_loc_w_ptr, + 2 * sizeof(float), GDRAM2NRAM); + const float loc_w = sampling_loc_nram[0]; + const float loc_h = sampling_loc_nram[1]; + const float weight = data_attn_weight[data_weight_ptr]; + const float h_im = loc_h * spatial_h - 0.5; + const float w_im = loc_w * spatial_w - 0.5; + if (h_im > -1 
&& w_im > -1 && h_im < spatial_h && w_im < spatial_w) { + const int32_t h_low = floorf(h_im); + const int32_t w_low = floorf(w_im); + const int32_t h_high = h_low + 1; + const int32_t w_high = w_low + 1; + + const float lh = h_im - h_low; + const float lw = w_im - w_low; + const float hh = 1.0 - lh; + const float hw = 1.0 - lw; + + const int32_t w_stride = num_heads * channels; + const int32_t h_stride = spatial_w * w_stride; + const int32_t h_low_ptr_offset = h_low * h_stride; + const int32_t h_high_ptr_offset = h_low_ptr_offset + h_stride; + const int32_t w_low_ptr_offset = w_low * w_stride; + const int32_t w_high_ptr_offset = w_low_ptr_offset + w_stride; + + float w1 = hh * hw; + float w2 = hh * lw; + float w3 = lh * hw; + float w4 = lh * lw; + + for (int32_t C_loop = 0; C_loop < C_repeat; ++C_loop) { + base_ptr = m_col * channels + C_loop * deal_num; + __bang_write_zero(grad_weight, 3 * deal_num); + __bang_write_zero(grad_output_nram, deal_num); + __memcpy(top_grad, + grad_output + grad_output_offset + C_loop * deal_num, + deal_num * sizeof(float), GDRAM2NRAM); + msDeformAttnCol2imBilinear( + top_grad_temp, spatial_h, spatial_w, w1, w2, w3, w4, h_low, w_low, + h_high, w_high, base_ptr, h_low_ptr_offset, w_low_ptr_offset, + h_high_ptr_offset, w_high_ptr_offset, hh, hw, lh, lw, top_grad, + weight, grad_h_weight, grad_w_weight, grad_value_ptr, + grad_output_nram, grad_weight, + grad_sampling_loc + grad_sampling_loc_out + p_col * 2, + grad_attn_weight + grad_attn_weight_out + p_col, + grad_output_nram_temp, deal_num, deal_num, data_value_ptr); + } + if (C_tail != 0) { + base_ptr = m_col * channels + C_repeat * deal_num; + __bang_write_zero(grad_output_nram, 8 * deal_num); + __memcpy(top_grad, + grad_output + grad_output_offset + C_repeat * deal_num, + C_tail * sizeof(float), GDRAM2NRAM); + msDeformAttnCol2imBilinear( + top_grad_temp, spatial_h, spatial_w, w1, w2, w3, w4, h_low, w_low, + h_high, w_high, base_ptr, h_low_ptr_offset, w_low_ptr_offset, + 
h_high_ptr_offset, w_high_ptr_offset, hh, hw, lh, lw, top_grad, + weight, grad_h_weight, grad_w_weight, grad_value_ptr, + grad_output_nram, grad_weight, + grad_sampling_loc + grad_sampling_loc_out + p_col * 2, + grad_attn_weight + grad_attn_weight_out + p_col, + grad_output_nram_temp, deal_num, C_tail, data_value_ptr); + } + } + data_weight_ptr += 1; + data_loc_w_ptr += 2; + } + } +} + +__mlu_global__ void MLUUnion1KernelMsDeformAttnBackward( + const float *data_value, const int32_t *spatial_shapes, + const int32_t *data_level_start_index, const float *data_sampling_loc, + const float *data_attn_weight, const float *grad_output, + const int32_t batch, const int32_t spatial_size, const int32_t num_heads, + const int32_t channels, const int32_t num_levels, const int32_t num_query, + const int32_t num_points, float *grad_value, float *grad_sampling_loc, + float *grad_attn_weight); + +void KernelMsDeformAttnBackward( + cnrtDim3_t k_dim, cnrtFunctionType_t k_type, cnrtQueue_t queue, + const cnrtDataType_t d_type, const float *data_value, + const int32_t *spatial_shapes, const int32_t *data_level_start_index, + const float *data_sampling_loc, const float *data_attn_weight, + const float *grad_output, const int32_t batch, const int32_t spatial_size, + const int32_t num_heads, const int32_t channels, const int32_t num_levels, + const int32_t num_query, const int32_t num_points, float *grad_value, + float *grad_sampling_loc, float *grad_attn_weight) { + MLUUnion1KernelMsDeformAttnBackward<<>>( + data_value, spatial_shapes, data_level_start_index, data_sampling_loc, + data_attn_weight, grad_output, batch, spatial_size, num_heads, channels, + num_levels, num_query, num_points, grad_value, grad_sampling_loc, + grad_attn_weight); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/nms_mlu_kernel.mlu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/nms_mlu_kernel.mlu new file mode 100644 index 
0000000000000000000000000000000000000000..dcc722d854a0ff67931c6af0aa32e9ca2b0d6509 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/nms_mlu_kernel.mlu @@ -0,0 +1,483 @@ +/************************************************************************* + * Copyright (C) 2021 Cambricon. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + *************************************************************************/ +#include "nms_utils.hpp" + +#define COORD_DIM (4) + +#define SIZE_NRAM_BUF (MAX_NRAM_SIZE + REM_FOR_STACK - 62 * 1024) +#define SIZE_SRAM_BUF (MAX_SRAM_SIZE) + +__nram__ int8_t nram_buffer[SIZE_NRAM_BUF]; +__mlu_shared__ int8_t sram_buffer[SIZE_SRAM_BUF]; + +enum Addr { SRAM, GDRAM }; + +template +__mlu_func__ void nms_detection( + uint32_t &output_box_num, const int output_mode, OUT_DT *output_dram, + IN_DT *input_data_score, const IN_DT *input_data_box, const Addr input_ram, + IN_DT *sram, const int core_limit, const int input_num_boxes, + const int max_output_size, const float thresh_iou, const float thresh_score, + const float offset, const int algo) { + // global value + int32_t *exit_flag = (int32_t *)(sram + 28); + exit_flag[0] = 0; + // score, x1, y1, x2, y2, inter_x1, inter_y1, inter_x2, inter_y2 + int nms_buffer_count1 = 9; + // temp nram buffer to store selected target. 
+ int nram_save_limit_count = 256; + float div_thresh_iou = 1.0 / thresh_iou; + + // input data ptr + const IN_DT *input_x1_ptr = input_data_box; + const IN_DT *input_y1_ptr = input_x1_ptr + input_num_boxes; + const IN_DT *input_x2_ptr = input_y1_ptr + input_num_boxes; + const IN_DT *input_y2_ptr = input_x2_ptr + input_num_boxes; + + int limit = 0; // find limit when GDRAM or SRAM + int max_seg_pad = 0; // the max length every repeat + int repeat = 0; + int remain = 0; + int remain_pad = 0; + int input_offset = 0; // offset of input_data for current core + int nram_save_count = 0; + + if (output_mode == 0) { + limit = (SIZE_NRAM_BUF - NFU_ALIGN_SIZE /*for max_box*/ * sizeof(IN_DT) - + nram_save_limit_count * sizeof(OUT_DT)) / + (nms_buffer_count1 * sizeof(IN_DT)); + } else { + // 5 maens: score, x1, y1, x2, y2 + limit = (SIZE_NRAM_BUF - NFU_ALIGN_SIZE /*for max_box*/ * sizeof(IN_DT) - + nram_save_limit_count * 5 * sizeof(OUT_DT)) / + (nms_buffer_count1 * sizeof(IN_DT)); + } + + int max_seg_iou_compute = 0; + int repeat_iou_compute = 0; + int remain_iou_compute = 0; + int remain_pad_iou_compute = 0; + + getComputeParamsBlockOrU1(sizeof(IN_DT), input_num_boxes, limit, core_limit, + input_offset, max_seg_pad, repeat, remain, + remain_pad, max_seg_iou_compute, repeat_iou_compute, + remain_iou_compute, remain_pad_iou_compute); + + // init the data ptr + IN_DT *score = (IN_DT *)nram_buffer; + IN_DT *x1 = score + max_seg_pad; + IN_DT *y1 = x1 + max_seg_pad; + IN_DT *x2 = y1 + max_seg_pad; + IN_DT *y2 = x2 + max_seg_pad; + IN_DT *inter_x1 = y2 + max_seg_pad; + IN_DT *inter_y1 = inter_x1 + max_seg_pad; + IN_DT *inter_x2 = inter_y1 + max_seg_pad; + IN_DT *inter_y2 = inter_x2 + max_seg_pad; + IN_DT *max_box = inter_y2 + max_seg_pad; // the max score, x1, y1, x2, y2 + OUT_DT *nram_save = + (OUT_DT *)((char *)max_box + + NFU_ALIGN_SIZE); // offset two line from max_box + +#if __BANG_ARCH__ >= 300 + float max_box_x1 = 0; + float max_box_y1 = 0; + float max_box_x2 = 0; + float 
max_box_y2 = 0; +#endif + mluMemcpyDirection_t load_dir = SRAM2NRAM; + mluMemcpyDirection_t store_dir = NRAM2SRAM; + load_dir = (input_ram == SRAM) ? SRAM2NRAM : GDRAM2NRAM; + store_dir = (input_ram == SRAM) ? NRAM2SRAM : NRAM2GDRAM; + + for (int keep = 0; keep < max_output_size; + keep++) { // loop until the max_score <= 0 + if (core_limit != 1) { + __sync_cluster(); // sync before current loop + } + + /******FIND MAX START******/ + int max_index = 0; // the max score index + int global_max_index = 0; // for U1 + float max_area = 0; // the max socre area + max_box[0] = 0; // init 0 + findCoreMaxBox(input_data_score, score, inter_x1, max_box, input_x1_ptr, + input_y1_ptr, input_x2_ptr, input_y2_ptr, load_dir, + input_offset, repeat, remain, remain_pad, max_seg_pad, + max_index); + + if (core_limit == 1) { +#if __BANG_ARCH__ >= 300 + calMaxArea(max_box, algo, offset, max_area, max_box_x1, max_box_y1, + max_box_x2, max_box_y2); +#else + calMaxArea(max_box, algo, offset, max_area); +#endif + input_data_score[max_index] = 0; + global_max_index = max_index; + } else if (core_limit == 4) { + __sync_cluster(); + findClusterMaxBox(sram, max_box, inter_x1, input_data_score, core_limit); + +#if __BANG_ARCH__ >= 300 + calMaxArea(max_box, algo, offset, max_area, max_box_x1, max_box_y1, + max_box_x2, max_box_y2); +#else + calMaxArea(max_box, algo, offset, max_area); +#endif + global_max_index = ((uint32_t *)(max_box + 5))[0]; + input_data_score[global_max_index] = 0; + } + // by now, we get: max_score|max_index|max_box|max_area + /******FIND MAX END******/ + + storeResult(max_box, nram_save, output_dram, keep, nram_save_limit_count, + max_output_size, thresh_score, output_mode, nram_save_count, + output_box_num); + + // if the max score <= 0, end + if (core_limit == 1) { + if (float(max_box[0]) <= thresh_score) { + break; + } + } else { + if (float(max_box[0]) <= thresh_score) { + if (coreId == 0) { + exit_flag[0] = 1; + } + } + __sync_cluster(); + if (exit_flag[0] == 1) { + 
break; + } + } +/******NMS STORE END******/ +#if __BANG_ARCH__ >= 300 + scoreUpdate(input_data_score, load_dir, store_dir, input_x1_ptr, + input_y1_ptr, input_x2_ptr, input_y2_ptr, x1, y1, x2, y2, score, + inter_x1, inter_y1, inter_x2, inter_y2, max_box, max_box_x1, + max_box_y1, max_box_x2, max_box_y2, nram_save, + repeat_iou_compute, remain_iou_compute, remain_pad_iou_compute, + max_seg_iou_compute, max_seg_pad, thresh_iou, div_thresh_iou, + input_offset, offset, max_area, input_num_boxes, algo); +#else + scoreUpdate(input_data_score, load_dir, store_dir, input_x1_ptr, + input_y1_ptr, input_x2_ptr, input_y2_ptr, x1, y1, x2, y2, score, + inter_x1, inter_y1, inter_x2, inter_y2, max_box, max_box[1], + max_box[2], max_box[3], max_box[4], nram_save, + repeat_iou_compute, remain_iou_compute, remain_pad_iou_compute, + max_seg_iou_compute, max_seg_pad, thresh_iou, div_thresh_iou, + input_offset, offset, max_area, input_num_boxes, algo); +#endif + } // for max_output_size +} + +__mlu_global__ void MLUUnion1KernelNMS( + const void *input_boxes, const void *input_confidence, + const int input_num_boxes, const int max_output_size, + const float iou_threshold, const float confidence_threshold, + const int output_mode, void *workspace, void *result_num, void *output, + const cnrtDataType_t data_type_input, const float offset, const int algo) { + if (data_type_input == CNRT_FLOAT16) { + __memcpy(workspace, input_confidence, input_num_boxes * sizeof(half), + GDRAM2GDRAM); + } else if (data_type_input == CNRT_FLOAT32) { + __memcpy(workspace, input_confidence, input_num_boxes * sizeof(float), + GDRAM2GDRAM); + } else { + } + + uint32_t output_box_num = 0; + float *score_data = (float *)workspace; + float *boxes_data = (float *)input_boxes; + float *sram = (float *)sram_buffer; + + if (output_mode == 0) { + if (data_type_input == CNRT_FLOAT32) { + nms_detection(output_box_num, output_mode, (uint32_t *)output, score_data, + boxes_data, GDRAM, sram, taskDim, input_num_boxes, + 
max_output_size, iou_threshold, confidence_threshold, + offset, algo); + } else { + nms_detection(output_box_num, output_mode, (uint32_t *)output, + (half *)score_data, (half *)boxes_data, GDRAM, (half *)sram, + taskDim, input_num_boxes, max_output_size, iou_threshold, + confidence_threshold, offset, algo); + } + } else { + if (data_type_input == CNRT_FLOAT32) { + nms_detection(output_box_num, output_mode, (float *)output, score_data, + boxes_data, GDRAM, sram, taskDim, input_num_boxes, + max_output_size, iou_threshold, confidence_threshold, + offset, algo); + } else { + nms_detection(output_box_num, output_mode, (half *)output, + (half *)score_data, (half *)boxes_data, GDRAM, (half *)sram, + taskDim, input_num_boxes, max_output_size, iou_threshold, + confidence_threshold, offset, algo); + } + } + ((uint32_t *)result_num)[0] = output_box_num; +} + +template +__mlu_func__ void nms_detection_ux( + int32_t *exit_flag, uint32_t &output_box_num, OUT_DT *output_dram, + IN_DT *score_data, const IN_DT *boxes_data, const Addr input_ram, + const int input_num_boxes, const int max_output_size, + const float thresh_iou, const float thresh_score, const float offset, + const int output_mode, const int algo, char *cdma_gdram) { + exit_flag[0] = 0; + + IN_DT *sram = (IN_DT *)sram_buffer; + + // score, x1, y1, x2, y2, inter_x1, inter_y1, inter_x2, inter_y2 + int nms_buffer_count1 = 9; + // temp nram buffer to store selected target. 
+ int nram_save_limit_count = 256; + float div_thresh_iou = 1.0 / thresh_iou; + + // input data ptr + const IN_DT *input_x1_ptr = boxes_data; + const IN_DT *input_y1_ptr = input_x1_ptr + input_num_boxes; + const IN_DT *input_x2_ptr = input_y1_ptr + input_num_boxes; + const IN_DT *input_y2_ptr = input_x2_ptr + input_num_boxes; + + int limit = 0; // find limit when GDRAM or SRAM + int max_seg_pad = 0; // the max length every repeat + int repeat = 0; + int remain = 0; + int remain_pad = 0; + int nram_save_count = 0; + + if (output_mode == 0) { + limit = (SIZE_NRAM_BUF - NFU_ALIGN_SIZE /*for max_box*/ * sizeof(IN_DT) - + nram_save_limit_count * sizeof(OUT_DT)) / + (nms_buffer_count1 * sizeof(IN_DT)); + } else { + limit = (SIZE_NRAM_BUF - NFU_ALIGN_SIZE /*for max_box*/ * sizeof(IN_DT) - + nram_save_limit_count * INFO_NUM * sizeof(OUT_DT)) / + (nms_buffer_count1 * sizeof(IN_DT)); + } + + int input_offset = 0; + int max_seg_iou_compute = 0; + int repeat_iou_compute = 0; + int remain_iou_compute = 0; + int remain_pad_iou_compute = 0; + + getComputeParamsUx(sizeof(IN_DT), input_num_boxes, limit, input_offset, + max_seg_pad, repeat, remain, remain_pad, + max_seg_iou_compute, repeat_iou_compute, + remain_iou_compute, remain_pad_iou_compute); + // init the nram ptr + IN_DT *score = (IN_DT *)nram_buffer; + IN_DT *x1 = score + max_seg_pad; + IN_DT *y1 = x1 + max_seg_pad; + IN_DT *x2 = y1 + max_seg_pad; + IN_DT *y2 = x2 + max_seg_pad; + IN_DT *inter_x1 = y2 + max_seg_pad; + IN_DT *inter_y1 = inter_x1 + max_seg_pad; + IN_DT *inter_x2 = inter_y1 + max_seg_pad; + IN_DT *inter_y2 = inter_x2 + max_seg_pad; + IN_DT *max_box = inter_y2 + max_seg_pad; // the max score, x1, y1, x2, y2 + OUT_DT *nram_save = + (OUT_DT *)((char *)max_box + + NFU_ALIGN_SIZE); // offset two line from max_box +#if __BANG_ARCH__ >= 300 + float max_box_x1 = 0; + float max_box_y1 = 0; + float max_box_x2 = 0; + float max_box_y2 = 0; +#endif + mluMemcpyDirection_t load_dir = SRAM2NRAM; + mluMemcpyDirection_t 
store_dir = NRAM2SRAM; + load_dir = (input_ram == SRAM) ? SRAM2NRAM : GDRAM2NRAM; + store_dir = (input_ram == SRAM) ? NRAM2SRAM : NRAM2GDRAM; + + for (int keep = 0; keep < max_output_size; + keep++) { // loop until the max_score <= 0 + __sync_all(); + + int max_index = 0; + int global_max_index = 0; // for Ux + float max_area = 0; // the max socre area + max_box[0] = 0; // init 0 + + if (coreId == 0) { + findCoreMaxBox(score_data, score, inter_x1, max_box, input_x1_ptr, + input_y1_ptr, input_x2_ptr, input_y2_ptr, load_dir, + input_offset, repeat, remain, remain_pad, max_seg_pad, + max_index); + // copy max box info to sram + __memcpy(sram, max_box, REDUCE_NUM * sizeof(IN_DT), NRAM2SRAM); + } + __sync_all(); +#if __BANG_ARCH__ >= 590 + __memcpy((char *)cdma_gdram + REDUCE_NUM * clusterId * sizeof(IN_DT), sram, + REDUCE_NUM * sizeof(IN_DT), SRAM2GDRAM); + __sync_all(); + if (clusterId == 0 && coreId == 0) { + __bang_write_zero(inter_x1, NMS_SIZE); + __memcpy((char *)inter_x1, (char *)cdma_gdram, sizeof(IN_DT), GDRAM2NRAM, + sizeof(IN_DT), REDUCE_NUM * sizeof(IN_DT), clusterDim - 1); + __bang_max(max_box, inter_x1, NMS_SIZE); + int max_cluster = (sizeof(IN_DT) == sizeof(half)) + ? 
((uint16_t *)max_box)[1] + : ((uint32_t *)max_box)[1]; + __memcpy((char *)cdma_gdram, + (char *)cdma_gdram + max_cluster * REDUCE_NUM * sizeof(IN_DT), + REDUCE_NUM * sizeof(IN_DT), GDRAM2GDRAM); + } + __sync_all(); + __memcpy(max_box, cdma_gdram, REDUCE_NUM * sizeof(IN_DT), GDRAM2NRAM); +#else + findGlobalMaxBox(max_box, sram, inter_x1); +#endif + +#if __BANG_ARCH__ >= 300 + calMaxArea(max_box, algo, offset, max_area, max_box_x1, max_box_y1, + max_box_x2, max_box_y2); +#else + calMaxArea(max_box, algo, offset, max_area); +#endif + global_max_index = ((uint32_t *)(max_box + 5))[0]; + if (coreId != MEMORY_CORE) { + score_data[global_max_index] = 0; + } + + storeResult(max_box, nram_save, output_dram, keep, nram_save_limit_count, + max_output_size, thresh_score, output_mode, nram_save_count, + output_box_num); + + if (float(max_box[0]) <= thresh_score) { + if (clusterId == 0 && coreId == 0) { + exit_flag[0] = 1; // dram + } + } + __sync_all(); + if (exit_flag[0] == 1) { + break; + } +/******NMS STORE END******/ +#if __BANG_ARCH__ >= 300 + scoreUpdate(score_data, load_dir, store_dir, input_x1_ptr, input_y1_ptr, + input_x2_ptr, input_y2_ptr, x1, y1, x2, y2, score, inter_x1, + inter_y1, inter_x2, inter_y2, max_box, max_box_x1, max_box_y1, + max_box_x2, max_box_y2, nram_save, repeat_iou_compute, + remain_iou_compute, remain_pad_iou_compute, max_seg_iou_compute, + max_seg_pad, thresh_iou, div_thresh_iou, input_offset, offset, + max_area, input_num_boxes, algo); +#else + scoreUpdate(score_data, load_dir, store_dir, input_x1_ptr, input_y1_ptr, + input_x2_ptr, input_y2_ptr, x1, y1, x2, y2, score, inter_x1, + inter_y1, inter_x2, inter_y2, max_box, max_box[1], max_box[2], + max_box[3], max_box[4], nram_save, repeat_iou_compute, + remain_iou_compute, remain_pad_iou_compute, max_seg_iou_compute, + max_seg_pad, thresh_iou, div_thresh_iou, input_offset, offset, + max_area, input_num_boxes, algo); +#endif + } // for max_output_size +} + +__mlu_global__ void MLUUionXKernelNMS( + 
const void *input_boxes, const void *input_confidence, + const int input_num_boxes, const int max_output_size, + const float iou_threshold, const float confidence_threshold, + const float offset, const cnrtDataType_t data_type_input, + const int output_mode, const int algo, void *workspace, void *result_num, + void *output) { + int input_dwidth = (data_type_input == CNRT_FLOAT32) ? 4 : 2; + int32_t *exit_flag = (int32_t *)((char *)workspace + + INFO_NUM * input_num_boxes * input_dwidth); + char *cdma_addr = (char *)exit_flag + sizeof(int32_t); + int reduce_sram_size = NFU_ALIGN_SIZE * REDUCE_NUM * input_dwidth; + int availbale_sram_size = SIZE_SRAM_BUF - reduce_sram_size; + + int cluster_score_size = input_num_boxes * input_dwidth; + int cluster_boxes_size = input_num_boxes * 4 * input_dwidth; + char *sram_score = (char *)sram_buffer + reduce_sram_size; + char *sram_boxes = + (char *)sram_buffer + reduce_sram_size + cluster_score_size; + Addr input_ram = GDRAM; + if ((cluster_score_size + cluster_boxes_size) < availbale_sram_size) { + input_ram = SRAM; + __memcpy(sram_score, input_confidence, cluster_score_size, GDRAM2SRAM); + __memcpy(sram_boxes, input_boxes, cluster_boxes_size, GDRAM2SRAM); + } else { + __memcpy(workspace, input_confidence, cluster_score_size, GDRAM2GDRAM); + } + __sync_cluster(); + + uint32_t output_box_num = 0; + float *score_data; + float *boxes_data; + score_data = (input_ram == SRAM) ? (float *)sram_score : (float *)workspace; + boxes_data = (input_ram == SRAM) ? 
(float *)sram_boxes : (float *)input_boxes; + + if (output_mode == 0) { + if (data_type_input == CNRT_FLOAT32) { + nms_detection_ux(exit_flag, output_box_num, (uint32_t *)output, + score_data, boxes_data, input_ram, input_num_boxes, + max_output_size, iou_threshold, confidence_threshold, + offset, output_mode, algo, cdma_addr); + } else { + nms_detection_ux(exit_flag, output_box_num, (uint32_t *)output, + (half *)score_data, (half *)boxes_data, input_ram, + input_num_boxes, max_output_size, iou_threshold, + confidence_threshold, offset, output_mode, algo, + cdma_addr); + } + } else { + if (data_type_input == CNRT_FLOAT32) { + nms_detection_ux(exit_flag, output_box_num, (float *)output, score_data, + boxes_data, input_ram, input_num_boxes, max_output_size, + iou_threshold, confidence_threshold, offset, output_mode, + algo, cdma_addr); + } else { + nms_detection_ux(exit_flag, output_box_num, (half *)output, + (half *)score_data, (half *)boxes_data, input_ram, + input_num_boxes, max_output_size, iou_threshold, + confidence_threshold, offset, output_mode, algo, + cdma_addr); + } + } + ((uint32_t *)result_num)[0] = output_box_num; +} + +void KernelNms(cnrtDim3_t k_dim, cnrtFunctionType_t k_type, cnrtQueue_t queue, + const cnrtDataType_t data_type_input, const void *boxes_ptr, + const void *scores_ptr, const int input_num_boxes, + const int max_output_boxes, const float iou_threshold, + const float offset, void *workspace_ptr, void *output_size_ptr, + void *output_ptr) { + switch (k_type) { + default: { return; } + case CNRT_FUNC_TYPE_BLOCK: + case CNRT_FUNC_TYPE_UNION1: { + MLUUnion1KernelNMS<<>>( + (void *)boxes_ptr, (void *)scores_ptr, input_num_boxes, + max_output_boxes, iou_threshold, /*confidence_threshold=*/0.0, + /*output_mode=*/0, workspace_ptr, output_size_ptr, output_ptr, + data_type_input, offset, /*algo=*/1); + }; break; + case CNRT_FUNC_TYPE_UNION2: + case CNRT_FUNC_TYPE_UNION4: + case CNRT_FUNC_TYPE_UNION8: + case CNRT_FUNC_TYPE_UNION16: { + 
MLUUionXKernelNMS<<>>( + (void *)boxes_ptr, (void *)scores_ptr, input_num_boxes, + max_output_boxes, iou_threshold, /*confidence_threshold=*/0.0, offset, + data_type_input, /*output_mode=*/0, /*algo=*/1, workspace_ptr, + output_size_ptr, output_ptr); + }; break; + } +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/nms_utils.hpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/nms_utils.hpp new file mode 100644 index 0000000000000000000000000000000000000000..61f5ba95df633ea5819f521c962d34ee36beefa8 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/nms_utils.hpp @@ -0,0 +1,553 @@ +/************************************************************************* + * Copyright (C) [2019-2022] by Cambricon, Inc. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
+ *************************************************************************/ +#ifndef NMS_UTILS_HPP_ +#define NMS_UTILS_HPP_ +#include "common_mlu_helper.hpp" + +#define NMS_SIZE (64) +#define NMS_UP(x, y) (x / y + (int)(x % y > 0)) * y +#define NMS_DOWN(x, y) (x / y) * y +#define INFO_NUM (5) // 5 means x1, x2, y1, y2 and score +#define MEMORY_CORE (0x80) +#define REDUCE_NUM \ + (7) // score, x1, y1, x2, y2, max_index (reserve 2 num for half-type input) + +__mlu_func__ void pvLock() { +#if __BANG_ARCH__ == 270 + if (coreId != MEMORY_CORE) { + __bang_lock(0, 0); + } +#endif +} + +__mlu_func__ void pvUnlock() { +#if __BANG_ARCH__ == 270 + if (coreId != MEMORY_CORE) { + __bang_unlock(0, 0); + } +#endif +} + +template +static __mlu_func__ void computeReluN(T *nram_dst, T *nram_src, void *nram_tmp, + const int deal_num, + const T threshold = 0) { + if (threshold < 0) { + return; + } + if (threshold) { +#if __BANG_ARCH__ >= 300 + __bang_relun(nram_dst, nram_src, deal_num, threshold); +#else + int align_num = NFU_ALIGN_SIZE / sizeof(T); + T *nram_aux_a = (T *)nram_tmp; + T *nram_aux_b = nram_aux_a + deal_num; + T *nram_zero = nram_aux_b + align_num; + __bang_write_value(nram_aux_b, align_num, threshold); + __bang_write_zero(nram_zero, align_num); + __bang_cycle_lt((T *)nram_aux_a, nram_src, (T *)nram_aux_b, deal_num, + align_num); + __bang_mul(nram_dst, nram_src, (T *)nram_aux_a, deal_num); + __bang_cycle_eq((T *)nram_aux_a, (T *)nram_aux_a, (T *)nram_zero, deal_num, + align_num); + __bang_cycle_mul((T *)nram_aux_a, (T *)nram_aux_a, (T *)nram_aux_b, + deal_num, align_num); + __bang_add(nram_dst, nram_dst, (T *)nram_aux_a, deal_num); + __bang_cycle_gt((T *)nram_aux_a, nram_dst, (T *)nram_zero, deal_num, + align_num); + __bang_mul(nram_dst, nram_dst, (T *)nram_aux_a, deal_num); +#endif + } else { +#if __BANG_ARCH__ >= 300 + __bang_relu(nram_dst, nram_src, deal_num); +#else + __bang_active_relu(nram_dst, nram_src, deal_num); +#endif + } +} + +__mlu_func__ void 
getComputeParamsBlockOrU1( + const int input_dwidth, const int input_box_num, const int limit, + const int core_limit, int &input_offset, int &max_seg_pad, int &repeat, + int &remain, int &remain_pad, int &max_seg_iou_compute, + int &repeat_iou_compute, int &remain_iou_compute, + int &remain_pad_iou_compute) { + int avg_core = input_box_num / core_limit; + int rem = input_box_num % core_limit; + int len_core = avg_core + (coreId < rem ? 1 : 0); + input_offset = avg_core * coreId + (coreId <= rem ? coreId : rem); + max_seg_pad = NMS_DOWN(limit, NMS_SIZE); + repeat = len_core / max_seg_pad; + remain = len_core % max_seg_pad; + remain_pad = NMS_UP(remain, NMS_SIZE); + + // if datatype is fp16, we should cvt to fp32 when compute iou + max_seg_iou_compute = NMS_DOWN(max_seg_pad / (4 / input_dwidth), NMS_SIZE); + repeat_iou_compute = len_core / max_seg_iou_compute; + remain_iou_compute = len_core % max_seg_iou_compute; + remain_pad_iou_compute = NMS_UP(remain_iou_compute, NMS_SIZE); +} + +__mlu_func__ void getComputeParamsUx( + const int input_dwidth, const int input_num_boxes, const int limit, + int &input_offset, int &max_seg_pad, int &repeat, int &remain, + int &remain_pad, int &max_seg_iou_compute, int &repeat_iou_compute, + int &remain_iou_compute, int &remain_pad_iou_compute) { + // data split + int avg_cluster = input_num_boxes / clusterDim; + int rem_cluster = input_num_boxes % clusterDim; + int len_cluster = avg_cluster + (clusterId < rem_cluster); + int cluster_offset = avg_cluster * clusterId + + (clusterId <= rem_cluster ? clusterId : rem_cluster); + + int avg_core = len_cluster / coreDim; + int rem_core = len_cluster % coreDim; + int len_core = avg_core + (coreId < rem_core); + int core_offset = + avg_core * coreId + (coreId <= rem_core ? 
coreId : rem_core); + input_offset = cluster_offset + core_offset; + + max_seg_pad = NMS_DOWN(limit, NMS_SIZE); + + // core 0 of each cluster calculate the max score index + int max_index_len_core = avg_cluster + (clusterId < rem_cluster); + repeat = max_index_len_core / max_seg_pad; + remain = max_index_len_core % max_seg_pad; + remain_pad = NMS_UP(remain, NMS_SIZE); + // if datatype is fp16, we should cvt to fp32 when compute iou + max_seg_iou_compute = + NMS_DOWN(max_seg_pad / (sizeof(float) / input_dwidth), NMS_SIZE); + repeat_iou_compute = len_core / max_seg_iou_compute; + remain_iou_compute = len_core % max_seg_iou_compute; + remain_pad_iou_compute = NMS_UP(remain_iou_compute, NMS_SIZE); +} + +template +__mlu_func__ void findGlobalMaxBox(IN_DT *max_box, IN_DT *sram, + IN_DT *inter_x1) { + // copy all partial max to the sram of cluster 0 + if (clusterId != 0) { + __memcpy(sram + REDUCE_NUM * clusterId, sram, REDUCE_NUM * sizeof(IN_DT), + SRAM2SRAM, 0); + } + __sync_all(); + + // reduce between clusters to get the global max box + if (clusterId == 0) { + if (coreId == 0) { + __bang_write_zero(inter_x1, NMS_SIZE); + __memcpy(inter_x1, sram, sizeof(IN_DT), SRAM2NRAM, sizeof(IN_DT), + REDUCE_NUM * sizeof(IN_DT), clusterDim - 1); + __bang_max(max_box, inter_x1, NMS_SIZE); + int max_cluster = (sizeof(IN_DT) == sizeof(half)) + ? 
((uint16_t *)max_box)[1] + : ((uint32_t *)max_box)[1]; + __memcpy(max_box, sram + max_cluster * REDUCE_NUM, + REDUCE_NUM * sizeof(IN_DT), SRAM2NRAM); + __memcpy(sram, max_box, REDUCE_NUM * sizeof(IN_DT), NRAM2SRAM); + } + __sync_cluster(); + if (coreId == 0x80 && clusterDim > 1) { + // broadcast global max box to each cluster's sram + for (int cluster_idx = 1; cluster_idx < clusterDim; ++cluster_idx) { + __memcpy(sram, sram, REDUCE_NUM * sizeof(IN_DT), SRAM2SRAM, + cluster_idx); + } + } + __sync_cluster(); + } + __sync_all(); + + // copy the global max box to max_box + __memcpy(max_box, sram, REDUCE_NUM * sizeof(IN_DT), SRAM2NRAM); +} + +template +__mlu_func__ void findCoreMaxBox( + IN_DT *input_score_ptr, IN_DT *score, IN_DT *inter_x1, IN_DT *max_box, + const IN_DT *input_x1_ptr, const IN_DT *input_y1_ptr, + const IN_DT *input_x2_ptr, const IN_DT *input_y2_ptr, + const mluMemcpyDirection_t load_dir, const int input_offset, + const int repeat, const int remain, const int remain_pad, + const int max_seg_pad, int &max_index) { + if (coreId != 0x80) { + for (int i = 0; i <= repeat; i++) { + if (i == repeat && remain == 0) { + break; + } + int seg_len = 0; // the length every nms compute + int cpy_len = 0; // the length every nms memcpy + i == repeat ? seg_len = remain_pad : seg_len = max_seg_pad; + i == repeat ? 
cpy_len = remain : cpy_len = max_seg_pad; + /******NMS LOAD START******/ + __bang_write_zero(score, seg_len); + __memcpy(score, input_score_ptr + input_offset + i * max_seg_pad, + cpy_len * sizeof(IN_DT), load_dir, cpy_len * sizeof(IN_DT), + cpy_len * sizeof(IN_DT), 0); + + /******NMS LOAD END******/ + + __bang_max(inter_x1, score, seg_len); + if (inter_x1[0] > max_box[0]) { + max_box[0] = inter_x1[0]; + if (sizeof(IN_DT) == sizeof(half)) { + max_index = ((uint16_t *)inter_x1)[1] + input_offset + + i * max_seg_pad; // offset start from head of input_data + } else if (sizeof(IN_DT) == sizeof(float)) { + max_index = ((uint32_t *)inter_x1)[1] + input_offset + + i * max_seg_pad; // offset start from head of input_data + } + } + } // for repeat + // the max box's x1, y1, x2, y2 on every core + max_box[1] = input_x1_ptr[max_index]; + max_box[2] = input_y1_ptr[max_index]; + max_box[3] = input_x2_ptr[max_index]; + max_box[4] = input_y2_ptr[max_index]; + ((uint32_t *)(max_box + 5))[0] = max_index; + } +} + +template +__mlu_func__ void findClusterMaxBox(IN_DT *sram, IN_DT *max_box, + IN_DT *inter_x1, IN_DT *input_data_score, + const int core_limit) { + // find the max with sram + // copy every core's box info to sram, form: score---x1---y1---x2---y2--- + __memcpy(sram + REDUCE_NUM * coreId, max_box, REDUCE_NUM * sizeof(IN_DT), + NRAM2SRAM); // int32_t datatype + __sync_cluster(); + + // copy score from sram to nram and find the max + __bang_write_zero(inter_x1, 64); + __memcpy(inter_x1, sram, sizeof(IN_DT), SRAM2NRAM, sizeof(IN_DT), + REDUCE_NUM * sizeof(IN_DT), coreDim - 1); + __bang_max(max_box, inter_x1, 64); + int max_core = sizeof(IN_DT) == sizeof(half) ? 
((uint16_t *)max_box)[1] + : ((uint32_t *)max_box)[1]; + // copy the max box to max_box + __memcpy(max_box, sram + max_core * REDUCE_NUM, REDUCE_NUM * sizeof(IN_DT), + SRAM2NRAM); +} + +/*****************************************************************************/ +/*******************************CALCULATE MAX AREA****************************/ +/*****************************************************************************/ + +template +__mlu_func__ void calMaxArea(IN_DT *max_box, const int algo, float offset, + float &max_area) { + if (algo == 0 || offset == 0.0) { + max_area = ((float)max_box[3] - (float)max_box[1]) * + ((float)max_box[4] - (float)max_box[2]); + } else { + max_area = ((float)max_box[3] - (float)max_box[1] + offset) * + ((float)max_box[4] - (float)max_box[2] + offset); + } +} + +template +__mlu_func__ void calMaxArea(IN_DT *max_box, const int algo, float offset, + float &max_area, float &max_box_x1, + float &max_box_y1, float &max_box_x2, + float &max_box_y2) { + // the case of random inf will break the requirement of x1<=x2, y1<=y2 + // so exchange it if it happens. 
+ max_box_x1 = float(max_box[1]); + max_box_x2 = float(max_box[3]); + if (max_box[1] > max_box[3]) { + max_box_x1 = float(max_box[3]); + max_box_x2 = float(max_box[1]); + } + max_box_y1 = float(max_box[2]); + max_box_y2 = float(max_box[4]); + if (max_box[2] > max_box[4]) { + max_box_y1 = float(max_box[4]); + max_box_y2 = float(max_box[2]); + } + if (algo == 0 || offset == 0.0) { + max_area = (max_box_x2 - max_box_x1) * (max_box_y2 - max_box_y1); + } else { + max_area = + (max_box_x2 - max_box_x1 + offset) * (max_box_y2 - max_box_y1 + offset); + } +} + +/***********************************************************************/ +/*******************************STORE RESULT****************************/ +/***********************************************************************/ +template +__mlu_func__ void storeResult(IN_DT *max_box, OUT_DT *nram_save, + OUT_DT *&output_dram, const int keep, + const int nram_save_limit_count, + const int max_output_size, + const float thresh_score, const int output_mode, + int &nram_save_count, uint32_t &output_box_num) { + /******NMS STORE START******/ + // store to nram + if (float(max_box[0]) > thresh_score) { + OUT_DT *save_ptr; + int save_offset = 0; + int save_str_num = 0; + save_ptr = nram_save; + save_offset = nram_save_count; + save_str_num = nram_save_limit_count; + if (clusterId == 0 && coreId == 0) { + if (output_mode == 0) { // index1, index2, ... 
+ save_ptr[save_offset] = ((uint32_t *)(max_box + INFO_NUM))[0]; + } else if (output_mode == 1) { // score, x1, y1, x2, y2 + __memcpy(save_ptr + save_offset * INFO_NUM, max_box, + INFO_NUM * sizeof(IN_DT), NRAM2NRAM, INFO_NUM * sizeof(IN_DT), + INFO_NUM * sizeof(IN_DT), 0); + } else if (output_mode == 2) { // score---, x1---, y1---, x2---, y2--- + __memcpy(save_ptr + save_offset, max_box, 1 * sizeof(IN_DT), NRAM2NRAM, + save_str_num * sizeof(IN_DT), 1 * sizeof(IN_DT), 4); + } + } + nram_save_count++; + output_box_num++; + } + + // store to sram/gdram + if (output_box_num != 0) { + if ((nram_save_count == nram_save_limit_count) || + (float(max_box[0]) <= thresh_score) || keep == max_output_size - 1) { + if (nram_save_count != 0) { + if (clusterId == 0 && coreId == 0) { + if (output_mode == 0) { // index1, index2, ... + pvLock(); + __memcpy(output_dram, nram_save, nram_save_count * sizeof(uint32_t), + NRAM2GDRAM); + pvUnlock(); + output_dram += nram_save_count; + } else if (output_mode == 1) { // score, x1, y1, x2, y2 + pvLock(); + __memcpy(output_dram, nram_save, + nram_save_count * INFO_NUM * sizeof(IN_DT), NRAM2GDRAM); + pvUnlock(); + output_dram += nram_save_count * INFO_NUM; + } else if (output_mode == + 2) { // score---, x1---, y1---, x2---, y2--- + pvLock(); + __memcpy(output_dram, nram_save, nram_save_count * sizeof(IN_DT), + NRAM2GDRAM, max_output_size * sizeof(IN_DT), + nram_save_limit_count * sizeof(IN_DT), 4); + pvUnlock(); + output_dram += nram_save_count; + } + nram_save_count = 0; + } + } + } // if move data nram->sram/gdram + } // if dst +} + +template +__mlu_func__ void scoreUpdate( + IN_DT *input_score_ptr, const mluMemcpyDirection_t load_dir, + const mluMemcpyDirection_t store_dir, const IN_DT *input_x1_ptr, + const IN_DT *input_y1_ptr, const IN_DT *input_x2_ptr, + const IN_DT *input_y2_ptr, IN_DT *x1, IN_DT *y1, IN_DT *x2, IN_DT *y2, + IN_DT *score, IN_DT *inter_x1, IN_DT *inter_y1, IN_DT *inter_x2, + IN_DT *inter_y2, IN_DT *max_box, const float 
max_box_x1, + const float max_box_y1, const float max_box_x2, const float max_box_y2, + OUT_DT *nram_save, int repeat_iou_compute, int remain_iou_compute, + int remain_pad_iou_compute, int max_seg_iou_compute, int max_seg_pad, + const float thresh_iou, const float div_thresh_iou, const int input_offset, + const float offset, const float max_area, const int input_num_boxes, + const int algo) { + for (int i = 0; i <= repeat_iou_compute; i++) { + if (i == repeat_iou_compute && remain_iou_compute == 0) { + break; + } + int seg_len = (i == repeat_iou_compute) ? remain_pad_iou_compute + : max_seg_iou_compute; + int cpy_len = + (i == repeat_iou_compute) ? remain_iou_compute : max_seg_iou_compute; + /******NMS LOAD START******/ + int dt_offset = 0; + if (sizeof(IN_DT) == sizeof(float)) { + __memcpy(score, input_score_ptr + input_offset + i * max_seg_pad, + cpy_len * sizeof(IN_DT), load_dir, cpy_len * sizeof(IN_DT), + cpy_len * sizeof(IN_DT), 0); + dt_offset = 0; + } else if (sizeof(IN_DT) == sizeof(half)) { + __memcpy(x1, input_score_ptr + input_offset + i * max_seg_iou_compute, + cpy_len * sizeof(IN_DT), load_dir, cpy_len * sizeof(IN_DT), + cpy_len * sizeof(IN_DT), 0); + __bang_half2float((float *)score, (half *)x1, seg_len); + dt_offset = max_seg_iou_compute; + } +#if __BANG_ARCH__ >= 300 + __memcpy(inter_x1 + dt_offset, + input_x1_ptr + input_offset + i * max_seg_iou_compute, + cpy_len * sizeof(IN_DT), load_dir, max_seg_pad * sizeof(IN_DT), + input_num_boxes * sizeof(IN_DT), 3); + + if (sizeof(IN_DT) == sizeof(half)) { + __bang_half2float((float *)inter_x1, + (half *)inter_x1 + max_seg_iou_compute, seg_len); + __bang_half2float((float *)inter_y1, + (half *)inter_y1 + max_seg_iou_compute, seg_len); + __bang_half2float((float *)inter_x2, + (half *)inter_x2 + max_seg_iou_compute, seg_len); + __bang_half2float((float *)inter_y2, + (half *)inter_y2 + max_seg_iou_compute, seg_len); + } + // box transfer + __bang_minequal((float *)x1, (float *)inter_x1, (float *)inter_x2, 
seg_len); + __bang_maxequal((float *)x2, (float *)inter_x1, (float *)inter_x2, seg_len); + __bang_minequal((float *)y1, (float *)inter_y1, (float *)inter_y2, seg_len); + __bang_maxequal((float *)y2, (float *)inter_y1, (float *)inter_y2, seg_len); + // 1、 compute IOU + // get the area_I + __bang_maxeq_scalar((float *)inter_x1, (float *)x1, max_box_x1, + seg_len); // inter_x1 + __bang_mineq_scalar((float *)inter_x2, (float *)x2, max_box_x2, + seg_len); // inter_x2 + __bang_sub((float *)inter_x1, (float *)inter_x2, (float *)inter_x1, + seg_len); + if (algo == 1 && offset != 0.0) { + __bang_add_scalar((float *)inter_x1, (float *)inter_x1, offset, seg_len); + } + computeReluN((float *)inter_x1, (float *)inter_x1, NULL, + seg_len); // inter_w + __bang_maxeq_scalar((float *)inter_y1, (float *)y1, float(max_box_y1), + seg_len); // inter_y1 + __bang_mineq_scalar((float *)inter_y2, (float *)y2, float(max_box_y2), + seg_len); // inter_y2 + __bang_sub((float *)inter_y1, (float *)inter_y2, (float *)inter_y1, + seg_len); + if (algo == 1 && offset != 0.0) { + __bang_add_scalar((float *)inter_y1, (float *)inter_y1, offset, seg_len); + } + computeReluN((float *)inter_y1, (float *)inter_y1, NULL, + seg_len); // inter_h + __bang_mul((float *)inter_x1, (float *)inter_x1, (float *)inter_y1, + seg_len); // area_I + // get the area of input_box: area = (x2 - x1) * (y2 - y1); + if (algo == 1 && offset != 0.0) { + __bang_fusion(FUSION_FSA, (float *)inter_y1, (float *)x2, (float *)x1, + offset, seg_len, seg_len); + __bang_fusion(FUSION_FSA, (float *)inter_y2, (float *)y2, (float *)y1, + offset, seg_len, seg_len); + __bang_mul((float *)inter_x2, (float *)inter_y1, (float *)inter_y2, + seg_len); // area + } else { + __bang_sub((float *)inter_y1, (float *)x2, (float *)x1, seg_len); + __bang_fusion(FUSION_FSM, (float *)inter_x2, (float *)y2, (float *)y1, + (float *)inter_y1, seg_len, seg_len); + } + // get the area_U: area + max_area - area_I + __bang_fusion(FUSION_FAS, (float *)inter_x2, 
(float *)inter_x2, max_area, + (float *)inter_x1, seg_len, seg_len); + // 2、 select the box + // if IOU greater than thres, set the score to zero, abort it: area_U > + // area_I * (1 / thresh)? + if (thresh_iou > 0.0) { + __bang_mul_scalar((float *)inter_x1, (float *)inter_x1, div_thresh_iou, + seg_len); + } else { + __bang_mul_scalar((float *)inter_x2, (float *)inter_x2, thresh_iou, + seg_len); + } + // process for nan + __bang_lt((float *)inter_x1, (float *)inter_x2, (float *)inter_x1, seg_len); + __bang_not((float *)inter_x1, (float *)inter_x1, seg_len); + __bang_mul((float *)score, (float *)score, (float *)inter_x1, seg_len); +/******NMS COMPUTE END******/ +#else + __memcpy(x1 + dt_offset, + input_x1_ptr + input_offset + i * max_seg_iou_compute, + cpy_len * sizeof(IN_DT), load_dir, max_seg_pad * sizeof(IN_DT), + input_num_boxes * sizeof(IN_DT), 3); + if (sizeof(IN_DT) == sizeof(half)) { + __bang_half2float((float *)x1, (half *)x1 + max_seg_iou_compute, seg_len); + __bang_half2float((float *)y1, (half *)y1 + max_seg_iou_compute, seg_len); + __bang_half2float((float *)x2, (half *)x2 + max_seg_iou_compute, seg_len); + __bang_half2float((float *)y2, (half *)y2 + max_seg_iou_compute, seg_len); + } + // 1、 compute IOU + // get the area_I + __bang_write_value((float *)inter_y1, seg_len, + float(max_box[1])); // max_x1 + __bang_maxequal((float *)inter_x1, (float *)x1, (float *)inter_y1, + seg_len); // inter_x1 + __bang_write_value((float *)inter_y2, seg_len, + float(max_box[3])); // max_x2 + __bang_minequal((float *)inter_x2, (float *)x2, (float *)inter_y2, + seg_len); // inter_x2 + __bang_sub((float *)inter_x1, (float *)inter_x2, (float *)inter_x1, + seg_len); + if (algo == 1 && offset != 0.0) { + __bang_add_scalar((float *)inter_x1, (float *)inter_x1, offset, seg_len); + } + computeReluN((float *)inter_x1, (float *)inter_x1, NULL, + seg_len); // inter_w + __bang_write_value((float *)inter_x2, seg_len, + float(max_box[2])); // max_y1 + __bang_maxequal((float 
*)inter_y1, (float *)y1, (float *)inter_x2, + seg_len); // inter_y1 + __bang_write_value((float *)inter_x2, seg_len, + float(max_box[4])); // max_y2 + __bang_minequal((float *)inter_y2, (float *)y2, (float *)inter_x2, + seg_len); // inter_y2 + __bang_sub((float *)inter_y1, (float *)inter_y2, (float *)inter_y1, + seg_len); + if (algo == 1 && offset != 0.0) { + __bang_add_scalar((float *)inter_y1, (float *)inter_y1, offset, seg_len); + } + computeReluN((float *)inter_y1, (float *)inter_y1, NULL, + seg_len); // inter_h + __bang_mul((float *)inter_x1, (float *)inter_x1, (float *)inter_y1, + seg_len); // area_I + // get the area of input_box: area = (x2 - x1) * (y2 - y1); + __bang_sub((float *)inter_y1, (float *)x2, (float *)x1, seg_len); + __bang_sub((float *)inter_y2, (float *)y2, (float *)y1, seg_len); + if (algo == 1 && offset != 0.0) { + __bang_add_scalar((float *)inter_y1, (float *)inter_y1, offset, seg_len); + __bang_add_scalar((float *)inter_y2, (float *)inter_y2, offset, seg_len); + } + __bang_mul((float *)inter_x2, (float *)inter_y1, (float *)inter_y2, + seg_len); // area + // get the area_U: area + max_area - area_I + __bang_add_scalar((float *)inter_x2, (float *)inter_x2, float(max_area), + seg_len); + __bang_sub((float *)inter_x2, (float *)inter_x2, (float *)inter_x1, + seg_len); // area_U + // 2、 select the box + // if IOU greater than thresh, set the score to zero, abort it: area_U > + // area_I * (1 / thresh)? 
+ if (thresh_iou > 0.0) { + __bang_mul_scalar((float *)inter_x1, (float *)inter_x1, div_thresh_iou, + seg_len); + } else { + __bang_mul_scalar((float *)inter_x2, (float *)inter_x2, thresh_iou, + seg_len); + } + __bang_ge((float *)inter_x1, (float *)inter_x2, (float *)inter_x1, seg_len); + __bang_mul((float *)score, (float *)score, (float *)inter_x1, seg_len); +/******NMS COMPUTE END******/ +#endif + // update the score + if (sizeof(IN_DT) == sizeof(half)) { + convertFloat2half((half *)score, (float *)score, seg_len); + } + pvLock(); + __memcpy(input_score_ptr + input_offset + i * max_seg_iou_compute, score, + cpy_len * sizeof(IN_DT), store_dir, cpy_len * sizeof(IN_DT), + cpy_len * sizeof(IN_DT), 0); + pvUnlock(); + } +} + +#endif // NMS_UTILS_HPP_ diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/psamask_mlu_kernel.mlu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/psamask_mlu_kernel.mlu new file mode 100644 index 0000000000000000000000000000000000000000..055ee4f4d05a5e7a67634163b92c09beeb053c52 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/psamask_mlu_kernel.mlu @@ -0,0 +1,615 @@ +/************************************************************************* + * Copyright (C) 2022 Cambricon. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
+ *************************************************************************/ +#include "common_mlu_helper.hpp" +#include "psamask_utils.hpp" + +#define COMPUTE_COUNT_ALIGN 64 + +__nram__ char buf[MAX_NRAM_SIZE]; + +template +__mlu_func__ void swap(T &a, T &b) { + T tmp = a; + a = b; + b = tmp; +} + +template +__mlu_func__ void storeDataFromNramToDram(T *dst, const T *src, + const PositionInCore &position, + const Shape &shape_full) { + int n_offset = shape_full.h * shape_full.w * shape_full.c; + int h_offset = shape_full.w * shape_full.c; + int w_offset = shape_full.c; + int n_seg = position.n_end - position.n_start; + int h_seg = position.h_end - position.h_start; + int w_seg = position.w_end - position.w_start; + int size = h_seg * w_seg * shape_full.c; + + __memcpy(dst + position.n_start * n_offset + position.h_start * h_offset + + position.w_start * w_offset, + src, size * sizeof(T), NRAM2GDRAM, n_offset * sizeof(T), + size * sizeof(T), n_seg - 1); +} + +template +__mlu_func__ void loadDataFromDramToNram(T *dst, const T *src, + const PositionInCore &position, + const Shape &shape_full) { + int n_offset = shape_full.h * shape_full.w * shape_full.c; + int h_offset = shape_full.w * shape_full.c; + int w_offset = shape_full.c; + int n_seg = position.n_end - position.n_start; + int h_seg = position.h_end - position.h_start; + int w_seg = position.w_end - position.w_start; + int size = h_seg * w_seg * shape_full.c; + + __memcpy(dst, src + position.n_start * n_offset + + position.h_start * h_offset + position.w_start * w_offset, + size * sizeof(T), GDRAM2NRAM, size * sizeof(T), n_offset * sizeof(T), + n_seg - 1); +} + +// transpose the data from A*B*C*(D*E) to A*D*E*(B*C) +template +__mlu_func__ void transposeData(T *dst, T *src, const Shape &shape_seg) { + int align_c = CEIL_ALIGN(shape_seg.c, COMPUTE_COUNT_ALIGN / sizeof(T)); + int align_hw = + CEIL_ALIGN(shape_seg.h * shape_seg.w, COMPUTE_COUNT_ALIGN / sizeof(T)); + for (int i = 0; i < shape_seg.n; ++i) { + 
__bang_transpose(dst, src, align_hw, align_c); + dst += align_hw * align_c; + src += align_hw * align_c; + } +} + +template +__mlu_func__ void psamaskCollectForward( + const T *x_dram, T *y_dram, const PositionInCore &position, + const Shape &x_full, const Shape &y_full, const Shape &shape_seg, + const int h_mask, const int w_mask, const int half_h_mask, + const int half_w_mask) { + T *x_nram = (T *)buf; + T *y_nram = + x_nram + CEIL_ALIGN(shape_seg.n * shape_seg.h * shape_seg.w * x_full.c, + COMPUTE_COUNT_ALIGN / sizeof(T)); + loadDataFromDramToNram(x_nram, x_dram, position, x_full); + + // fill zeros to output + int elem_count = + CEIL_ALIGN(shape_seg.n * shape_seg.h * shape_seg.w * y_full.c, + NFU_ALIGN_SIZE / sizeof(T)); + __bang_write_value(y_nram, elem_count, (T)0); + + int y_n_offset = shape_seg.h * shape_seg.w * shape_seg.c; + int y_h_offset = shape_seg.w * shape_seg.c; + int y_w_offset = shape_seg.c; + int x_n_offset = shape_seg.h * shape_seg.w * x_full.c; + int y_c_offset = 1; + int x_h_offset = shape_seg.w * x_full.c; + int x_w_offset = x_full.c; + int x_c_offset = 1; + int x_start = 0; + int y_start = 0; + for (int nidx = 0; nidx < shape_seg.n; ++nidx) { + for (int hidx = 0; hidx < shape_seg.h; ++hidx) { + for (int widx = 0; widx < shape_seg.w; ++widx) { + int h_abs = hidx + position.h_start; + int w_abs = widx + position.w_start; + int y_offset = y_start; + int x_offset = x_start; + y_offset += hidx * y_h_offset + widx * y_w_offset; + x_offset += hidx * x_h_offset + widx * x_w_offset; + + const int hstart = half_h_mask - h_abs > 0 ? half_h_mask - h_abs : 0; + const int hend = x_full.h + half_h_mask - h_abs < h_mask + ? x_full.h + half_h_mask - h_abs + : h_mask; + const int wstart = half_w_mask - w_abs > 0 ? half_w_mask - w_abs : 0; + const int wend = x_full.w + half_w_mask - w_abs < w_mask + ? 
x_full.w + half_w_mask - w_abs + : w_mask; + // (h, w ) with mask-indexed + // (h + hidx - half_h_mask, w + widx - half_w_mask) with feature-indexed + y_offset += ((hstart + h_abs - half_h_mask) * x_full.w + wstart + + w_abs - half_w_mask) * + y_c_offset; + x_offset += (hstart * w_mask + wstart) * x_c_offset; + int count = wend - wstart; + __memcpy(y_nram + y_offset, x_nram + x_offset, count * sizeof(T), + NRAM2NRAM, y_c_offset * x_full.w * sizeof(T), + x_c_offset * w_mask * sizeof(T), hend - hstart - 1); + } + } + y_start += y_n_offset; + x_start += x_n_offset; + } + storeDataFromNramToDram(y_dram, y_nram, position, y_full); +} + +template +__mlu_func__ void psamaskDistributeForward( + const T *x_dram, T *y_dram, const PositionInCore &position, + const Shape &x_full, const Shape &y_full, const Shape &shape_seg, + const int h_mask, const int w_mask, const int half_h_mask, + const int half_w_mask) { + T *x_nram = (T *)buf; + T *y_nram_temp = + x_nram + CEIL_ALIGN(shape_seg.n * shape_seg.h * shape_seg.w * x_full.c, + COMPUTE_COUNT_ALIGN / sizeof(T)); + loadDataFromDramToNram(x_nram, x_dram, position, x_full); + + // fill zeros to output + int align_c = CEIL_ALIGN(y_full.c, COMPUTE_COUNT_ALIGN / sizeof(T)); + int align_hw = + CEIL_ALIGN(shape_seg.h * shape_seg.w, COMPUTE_COUNT_ALIGN / sizeof(T)); + int elem_count = + CEIL_ALIGN(shape_seg.n * align_c * align_hw, NFU_ALIGN_SIZE / sizeof(T)); + __bang_write_value(y_nram_temp, elem_count, (T)0); + + int y_n_offset = align_hw * align_c; + int y_h_offset = shape_seg.w * align_c; + int y_w_offset = align_c; + int y_c_offset = 1; + int x_n_offset = shape_seg.h * shape_seg.w * x_full.c; + int x_h_offset = shape_seg.w * x_full.c; + int x_w_offset = x_full.c; + int x_c_offset = 1; + int h_feature = y_full.h; + int w_feature = y_full.w; + + int y_start = 0; + int x_start = 0; + for (int nidx = 0; nidx < shape_seg.n; ++nidx) { + for (int hidx = 0; hidx < shape_seg.h; ++hidx) { + for (int widx = 0; widx < shape_seg.w; ++widx) { + 
int h_abs = hidx + position.h_start; + int w_abs = widx + position.w_start; + int y_offset = y_start; + int x_offset = x_start; + y_offset += hidx * y_h_offset + widx * y_w_offset; + x_offset += hidx * x_h_offset + widx * x_w_offset; + const int hstart = half_h_mask - h_abs > 0 ? half_h_mask - h_abs : 0; + const int hend = h_feature + half_h_mask - h_abs < h_mask + ? h_feature + half_h_mask - h_abs + : h_mask; + const int wstart = half_w_mask - w_abs > 0 ? half_w_mask - w_abs : 0; + const int wend = w_feature + half_w_mask - w_abs < w_mask + ? w_feature + half_w_mask - w_abs + : w_mask; + // (h, w ) with mask-indexed + // (h + hidx - half_h_mask, w + widx - half_w_mask) with feature-indexed + y_offset += ((hstart + h_abs - half_h_mask) * x_full.w + wstart + + w_abs - half_w_mask) * + y_c_offset; + x_offset += (hstart * w_mask + wstart) * x_c_offset; + int count = wend - wstart; + __memcpy(y_nram_temp + y_offset, x_nram + x_offset, count * sizeof(T), + NRAM2NRAM, y_c_offset * w_feature * sizeof(T), + x_c_offset * w_mask * sizeof(T), hend - hstart - 1); + } + } + y_start += y_n_offset; + x_start += x_n_offset; + } + // transpose y + T *y_nram = y_nram_temp + shape_seg.n * align_hw * align_c; + Shape y_seg{shape_seg.n, shape_seg.h, shape_seg.w, y_full.c}; + transposeData(y_nram, y_nram_temp, y_seg); + swap(align_c, align_hw); + // store y from nram to dram + int y_n_offset_full = y_full.h * y_full.w * y_full.c; + int y_w_offset_full = y_full.c; + int y_c_offset_full = 1; + + int y_dram_start = + position.n_start * y_n_offset_full + + (position.h_start * y_full.w + position.w_start) * y_c_offset_full; + int y_nram_start = 0; + for (int nidx = 0; nidx < shape_seg.n; ++nidx) { + int y_dram_offset = y_dram_start + nidx * y_n_offset_full; + int y_nram_offset = y_nram_start + nidx * align_hw * align_c; + __memcpy(y_dram + y_dram_offset, y_nram + y_nram_offset, + shape_seg.h * shape_seg.w * sizeof(T), NRAM2GDRAM, + y_w_offset_full * sizeof(T), align_c * sizeof(T), + 
h_feature * w_feature - 1); + } +} + +template +__mlu_func__ void psamaskCollectBackward( + const T *dy_dram, T *dx_dram, const PositionInCore &position, + const Shape &dy_full, const Shape &dx_full, const Shape &shape_seg, + const int h_mask, const int w_mask, const int half_h_mask, + const int half_w_mask) { + T *dy_nram = (T *)buf; + T *dx_nram = + dy_nram + CEIL_ALIGN(shape_seg.n * shape_seg.h * shape_seg.w * dy_full.c, + COMPUTE_COUNT_ALIGN / sizeof(T)); + loadDataFromDramToNram(dy_nram, dy_dram, position, dy_full); + + // fill zeros to output + int elem_count = + CEIL_ALIGN(shape_seg.n * shape_seg.h * shape_seg.w * shape_seg.c, + NFU_ALIGN_SIZE / sizeof(T)); + __bang_write_value(dx_nram, elem_count, (T)0); + + int dy_n_offset = shape_seg.h * shape_seg.w * dy_full.c; + int dy_h_offset = shape_seg.w * dy_full.c; + int dy_w_offset = dy_full.c; + int dy_c_offset = 1; + int dx_n_offset = shape_seg.h * shape_seg.w * dx_full.c; + int dx_h_offset = shape_seg.w * dx_full.c; + int dx_w_offset = dx_full.c; + int dx_c_offset = 1; + int h_feature = dy_full.h; + int w_feature = dy_full.w; + + int dy_start = 0; + int dx_start = 0; + for (int nidx = 0; nidx < shape_seg.n; ++nidx) { + for (int hidx = 0; hidx < shape_seg.h; ++hidx) { + for (int widx = 0; widx < shape_seg.w; ++widx) { + int h_abs = hidx + position.h_start; + int w_abs = widx + position.w_start; + int dy_offset = dy_start; + int dx_offset = dx_start; + dy_offset += hidx * dy_h_offset + widx * dy_w_offset; + dx_offset += hidx * dx_h_offset + widx * dx_w_offset; + + const int hstart = half_h_mask - h_abs > 0 ? half_h_mask - h_abs : 0; + const int hend = h_feature + half_h_mask - h_abs < h_mask + ? h_feature + half_h_mask - h_abs + : h_mask; + const int wstart = half_w_mask - w_abs > 0 ? half_w_mask - w_abs : 0; + const int wend = w_feature + half_w_mask - w_abs < w_mask + ? 
w_feature + half_w_mask - w_abs + : w_mask; + // (h, w ) with mask-indexed + // (h + h_abs - half_h_mask, w + w_abs - half_w_mask) with + // feature-indexed + dy_offset += ((hstart + h_abs - half_h_mask) * w_feature + wstart + + w_abs - half_w_mask) * + dy_c_offset; + dx_offset += (hstart * w_mask + wstart) * dx_c_offset; + int count = wend - wstart; + __memcpy(dx_nram + dx_offset, dy_nram + dy_offset, count * sizeof(T), + NRAM2NRAM, dx_c_offset * w_mask * sizeof(T), + dy_c_offset * w_feature * sizeof(T), hend - hstart - 1); + } + } + dy_start += dy_n_offset; + dx_start += dx_n_offset; + } + storeDataFromNramToDram(dx_dram, dx_nram, position, dx_full); +} + +template +__mlu_func__ void psamaskDistributeBackward( + const T *dy_dram, T *dx_dram, const PositionInCore &position, + const Shape &dy_full, const Shape &dx_full, const Shape &shape_seg, + const int h_mask, const int w_mask, const int half_h_mask, + const int half_w_mask) { + // load dy from dram to nram + T *dy_nram_temp = (T *)buf; + int dy_n_offset_full = dy_full.h * dy_full.w * dy_full.c; + int dy_c_offset_full = 1; + int h_feature = dy_full.h; + int w_feature = dy_full.w; + int align_c = + CEIL_ALIGN(shape_seg.h * shape_seg.w, COMPUTE_COUNT_ALIGN / sizeof(T)); + int align_hw = + CEIL_ALIGN(h_feature * w_feature, COMPUTE_COUNT_ALIGN / sizeof(T)); + + int dy_dram_start = + position.n_start * dy_n_offset_full + + (position.h_start * w_feature + position.w_start) * dy_c_offset_full; + int dy_nram_start = 0; + for (int i = 0; i < shape_seg.n; ++i) { + int dy_nram_offset = dy_nram_start + i * (align_hw * align_c); + int dy_dram_offset = dy_dram_start + i * dy_n_offset_full; + __memcpy(dy_nram_temp + dy_nram_offset, dy_dram + dy_dram_offset, + shape_seg.h * shape_seg.w * sizeof(T), GDRAM2NRAM, + align_c * sizeof(T), dy_full.c * sizeof(T), + h_feature * w_feature - 1); + } + T *dy_nram = dy_nram_temp + shape_seg.n * align_hw * align_c; + Shape dy_seg{shape_seg.n, h_feature, w_feature, shape_seg.h * shape_seg.w}; 
+ transposeData(dy_nram, dy_nram_temp, dy_seg); + swap(align_c, align_hw); + + // fill zeros to dx + T *dx_nram = dy_nram + shape_seg.n * align_hw * align_c; + int dx_size = shape_seg.n * shape_seg.h * shape_seg.w * dx_full.c; + __bang_write_value(dx_nram, CEIL_ALIGN(dx_size, NFU_ALIGN_SIZE / sizeof(T)), + (T)0); + + int dy_n_offset_seg = align_hw * align_c; + int dy_h_offset_seg = shape_seg.w * align_c; + int dy_w_offset_seg = align_c; + int dy_c_offset_seg = 1; + int dx_n_offset_seg = shape_seg.h * shape_seg.w * shape_seg.c; + int dx_h_offset_seg = shape_seg.w * shape_seg.c; + int dx_w_offset_seg = shape_seg.c; + int dx_c_offset_seg = 1; + + int dy_start = 0; + int dx_start = 0; + for (int nidx = 0; nidx < shape_seg.n; ++nidx) { + for (int hidx = 0; hidx < shape_seg.h; ++hidx) { + for (int widx = 0; widx < shape_seg.w; ++widx) { + int h_abs = hidx + position.h_start; + int w_abs = widx + position.w_start; + int dy_offset = dy_start; + int dx_offset = dx_start; + dy_offset += hidx * dy_h_offset_seg + widx * dy_w_offset_seg; + dx_offset += hidx * dx_h_offset_seg + widx * dx_w_offset_seg; + const int hstart = half_h_mask - h_abs > 0 ? half_h_mask - h_abs : 0; + const int hend = h_feature + half_h_mask - h_abs < h_mask + ? h_feature + half_h_mask - h_abs + : h_mask; + const int wstart = half_w_mask - w_abs > 0 ? half_w_mask - w_abs : 0; + const int wend = w_feature + half_w_mask - w_abs < w_mask + ? 
w_feature + half_w_mask - w_abs + : w_mask; + // (h, w ) with mask-indexed + // (h + h_abs - half_h_mask, w + w_abs - half_w_mask) with + // feature-indexed + dy_offset += ((hstart + h_abs - half_h_mask) * w_feature + wstart + + w_abs - half_w_mask) * + dy_c_offset_seg; + dx_offset += (hstart * w_mask + wstart) * dx_c_offset_seg; + int count = wend - wstart; + __memcpy(dx_nram + dx_offset, dy_nram + dy_offset, count * sizeof(T), + NRAM2NRAM, w_mask * dx_c_offset_seg * sizeof(T), + w_feature * dy_c_offset_seg * sizeof(T), hend - hstart - 1); + } + } + dy_start += dy_n_offset_seg; + dx_start += dx_n_offset_seg; + } + storeDataFromNramToDram(dx_dram, dx_nram, position, dx_full); +} + +template +__mlu_func__ void psamaskBase(const T *input_dram, T *output_dram, + const Shape &input_full, const Shape &output_full, + LimitParam &limit, const PsamaskType psa_type, + const DimPartitionType core_partition, + const DimPartitionType cluster_partition, + const bool is_forward, const int h_mask, + const int w_mask, const int half_h_mask, + const int half_w_mask, const int n_per_core, + const int h_per_core, const int n_per_cluster, + const int h_per_cluster) { + PositionInCore position_full; + PositionInCore position_seg; + position_full.w_start = 0; + position_full.w_end = output_full.w; + int n_num_in_cluster = n_per_cluster; + int h_num_in_cluster = h_per_cluster; + + switch (cluster_partition) { + case PARTITION_N: { + position_full.h_start = 0; + position_full.h_end = input_full.h; + position_full.n_start = taskIdY * n_per_cluster; + int cluster_need = (input_full.n + n_per_cluster - 1) / n_per_cluster; + if (taskIdY >= cluster_need) return; + int n_remainder = input_full.n - (cluster_need - 1) * n_per_cluster; + n_num_in_cluster = + (taskIdY == cluster_need - 1) ? 
n_remainder : n_per_cluster; + position_full.n_end = position_full.n_start + n_num_in_cluster; + }; break; + case PARTITION_H: { + position_full.n_start = 0; + position_full.n_end = input_full.n; + position_full.h_start = taskIdY * h_per_cluster; + int cluster_need = (input_full.h + h_per_cluster - 1) / h_per_cluster; + if (taskIdY >= cluster_need) return; + int h_remainder = input_full.h - (cluster_need - 1) * h_per_cluster; + h_num_in_cluster = + (taskIdY == cluster_need - 1) ? h_remainder : h_per_cluster; + position_full.h_end = position_full.h_start + h_num_in_cluster; + }; break; + } + switch (core_partition) { + case PARTITION_N: { + position_full.n_start += taskIdX * n_per_core; + int core_need = (n_num_in_cluster + n_per_core - 1) / n_per_core; + if (taskIdX >= core_need) return; + int n_remainder = n_num_in_cluster - (core_need - 1) * n_per_core; + position_full.n_end = + position_full.n_start + + ((taskIdX == core_need - 1) ? n_remainder : n_per_core); + }; break; + case PARTITION_H: { + position_full.h_start += taskIdX * h_per_core; + int core_need = (h_num_in_cluster + h_per_core - 1) / h_per_core; + if (taskIdX >= core_need) return; + int h_remainder = h_num_in_cluster - (core_need - 1) * h_per_core; + position_full.h_end = + position_full.h_start + + ((taskIdX == core_need - 1) ? h_remainder : h_per_core); + }; break; + } + // the count of n ,h and w need to be processed in the current core + int shape_core_n = position_full.n_end - position_full.n_start; + int shape_core_h = position_full.h_end - position_full.h_start; + int shape_core_w = input_full.w; + + limit.n = limit.n < shape_core_n ? limit.n : shape_core_n; + limit.h = limit.h < shape_core_h ? limit.h : shape_core_h; + limit.w = limit.w < shape_core_w ? 
limit.w : shape_core_w; + + // load the data to nram according to the limit + for (int nidx = position_full.n_start; nidx < position_full.n_end; + nidx += limit.n) { + position_seg.n_start = nidx; + position_seg.n_end = + position_seg.n_start + (position_full.n_end - nidx < limit.n + ? position_full.n_end - nidx + : limit.n); + for (int hidx = position_full.h_start; hidx < position_full.h_end; + hidx += limit.h) { + position_seg.h_start = hidx; + position_seg.h_end = + position_seg.h_start + (position_full.h_end - hidx < limit.h + ? position_full.h_end - hidx + : limit.h); + for (int widx = position_full.w_start; widx < position_full.w_end; + widx += limit.w) { + position_seg.w_start = widx; + position_seg.w_end = + position_seg.w_start + (position_full.w_end - widx < limit.w + ? position_full.w_end - widx + : limit.w); + + // record the segment of output except the size of channel + // channel segments of output and input are the same + Shape shape_seg; + shape_seg.n = position_seg.n_end - position_seg.n_start; + shape_seg.h = position_seg.h_end - position_seg.h_start; + shape_seg.w = position_seg.w_end - position_seg.w_start; + shape_seg.c = output_full.c; + + switch (psa_type) { + case COLLECT: { + if (is_forward) { + psamaskCollectForward(input_dram, output_dram, position_seg, + input_full, output_full, shape_seg, h_mask, + w_mask, half_h_mask, half_w_mask); + } else { + psamaskCollectBackward(input_dram, output_dram, position_seg, + input_full, output_full, shape_seg, h_mask, + w_mask, half_h_mask, half_w_mask); + } + } break; + case DISTRIBUTE: { + if (is_forward) { + psamaskDistributeForward(input_dram, output_dram, position_seg, + input_full, output_full, shape_seg, + h_mask, w_mask, half_h_mask, + half_w_mask); + } else { + psamaskDistributeBackward(input_dram, output_dram, position_seg, + input_full, output_full, shape_seg, + h_mask, w_mask, half_h_mask, + half_w_mask); + } + } break; + } + } + } + } +} + +template +__mlu_global__ void 
MLUUnion1KernelPsamaskForward( + const T *x, T *y, const PsamaskType psa_type, + const DimPartitionType core_partition, + const DimPartitionType cluster_partition, const int batch, + const int h_feature, const int w_feature, const int h_mask, + const int w_mask, const int x_c, const int y_c, const int half_h_mask, + const int half_w_mask, const int n_per_core, const int h_per_core, + const int n_per_cluster, const int h_per_cluster, const int limit_n_seg, + const int limit_h_seg, const int limit_w_seg) { + if (coreId == 0x80) { + return; + } + Shape x_full, y_full; + x_full.n = batch; + x_full.h = h_feature; + x_full.w = w_feature; + x_full.c = x_c; + y_full.n = batch; + y_full.h = h_feature; + y_full.w = w_feature; + y_full.c = y_c; + + LimitParam limit; + limit.n = limit_n_seg; + limit.h = limit_h_seg; + limit.w = limit_w_seg; + + psamaskBase(x, y, x_full, y_full, limit, psa_type, core_partition, + cluster_partition, true, h_mask, w_mask, half_h_mask, half_w_mask, + n_per_core, h_per_core, n_per_cluster, h_per_cluster); +} + +template +__mlu_global__ void MLUUnion1KernelPsamaskBackward( + const T *dy, T *dx, const PsamaskType psa_type, + const DimPartitionType core_partition, + const DimPartitionType cluster_partition, const int batch, + const int h_feature, const int w_feature, const int h_mask, + const int w_mask, const int dx_c, const int dy_c, const int half_h_mask, + const int half_w_mask, const int n_per_core, const int h_per_core, + const int n_per_cluster, const int h_per_cluster, const int limit_n_seg, + const int limit_h_seg, const int limit_w_seg) { + if (coreId == 0x80) { + return; + } + Shape dy_full, dx_full; + dx_full.n = batch; + dx_full.h = h_feature; + dx_full.w = w_feature; + dx_full.c = dx_c; + dy_full.n = batch; + dy_full.h = h_feature; + dy_full.w = w_feature; + dy_full.c = dy_c; + + LimitParam limit; + limit.n = limit_n_seg; + limit.h = limit_h_seg; + limit.w = limit_w_seg; + + psamaskBase(dy, dx, dy_full, dx_full, limit, psa_type, 
core_partition, + cluster_partition, false, h_mask, w_mask, half_h_mask, + half_w_mask, n_per_core, h_per_core, n_per_cluster, + h_per_cluster); +} + +void KernelPsamaskForward( + cnrtDim3_t k_dim, cnrtFunctionType_t k_type, cnrtQueue_t queue, + const void *x, void *y, const PsamaskType psa_type, + const DimPartitionType core_partition, + const DimPartitionType cluster_partition, const int batch, + const int h_feature, const int w_feature, const int h_mask, + const int w_mask, const int x_c, const int y_c, const int half_h_mask, + const int half_w_mask, const int n_per_core, const int h_per_core, + const int n_per_cluster, const int h_per_cluster, const int limit_n_seg, + const int limit_h_seg, const int limit_w_seg) { + MLUUnion1KernelPsamaskForward<<>>( + static_cast(x), static_cast(y), psa_type, + core_partition, cluster_partition, batch, h_feature, w_feature, h_mask, + w_mask, x_c, y_c, half_h_mask, half_w_mask, n_per_core, h_per_core, + n_per_cluster, h_per_cluster, limit_n_seg, limit_h_seg, limit_w_seg); +} + +void KernelPsamaskBackward( + cnrtDim3_t k_dim, cnrtFunctionType_t k_type, cnrtQueue_t queue, + const void *dy, void *dx, const PsamaskType psa_type, + const DimPartitionType core_partition, + const DimPartitionType cluster_partition, const int batch, + const int h_feature, const int w_feature, const int h_mask, + const int w_mask, const int dx_c, const int dy_c, const int half_h_mask, + const int half_w_mask, const int n_per_core, const int h_per_core, + const int n_per_cluster, const int h_per_cluster, const int limit_n_seg, + const int limit_h_seg, const int limit_w_seg) { + MLUUnion1KernelPsamaskBackward<<>>( + static_cast(dy), static_cast(dx), psa_type, + core_partition, cluster_partition, batch, h_feature, w_feature, h_mask, + w_mask, dx_c, dy_c, half_h_mask, half_w_mask, n_per_core, h_per_core, + n_per_cluster, h_per_cluster, limit_n_seg, limit_h_seg, limit_w_seg); +} diff --git 
a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/psamask_utils.hpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/psamask_utils.hpp new file mode 100644 index 0000000000000000000000000000000000000000..30ec388494615842528b74da0661e169b08a545e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/psamask_utils.hpp @@ -0,0 +1,55 @@ +/************************************************************************* + * Copyright (C) 2022 Cambricon. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + *************************************************************************/ +#ifndef PSAMASK_UTILS_HPP_ +#define PSAMASK_UTILS_HPP_ + +typedef enum { + COLLECT = 0, + DISTRIBUTE = 1, +} PsamaskType; + +typedef enum { + PARTITION_N = 0, + PARTITION_H = 1, +} DimPartitionType; + +struct PartitionSeg { + int h_per_cluster; + int n_per_cluster; + int h_per_core; + int n_per_core; + DimPartitionType cluster_partition; + DimPartitionType core_partition; +}; + +struct Shape { + int n; + int h; + int w; + int c; +}; + +struct LimitParam { + int n; + int h; + int w; +}; + +struct PositionInCore { + int n_start; + int n_end; + int h_start; + int h_end; + int w_start; + int w_end; +}; +#endif // PSAMASK_UTILS_HPP_ diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/roi_align_mlu_kernel.mlu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/roi_align_mlu_kernel.mlu new file mode 100644 index 0000000000000000000000000000000000000000..c99176ab20a59bfc4643637604452890ebae6df4 --- /dev/null +++ 
b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/roi_align_mlu_kernel.mlu @@ -0,0 +1,493 @@ +/************************************************************************* + * Copyright (C) 2021 Cambricon. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + *************************************************************************/ +#include "common_mlu_helper.hpp" + +#define ROI_OFFSET 5 + +__nram__ char buffer[MAX_NRAM_SIZE]; + +namespace forward { +template +__mlu_func__ void bilinearInterpolate(const int input_height, + const int input_width, T y, T x, T *w1, + T *w2, T *w3, T *w4, int *x_low, + int *x_high, int *y_low, int *y_high, + bool *empty) { + // deal with cases that inverse elements are of feature map boundary + if (y < -1.0 || y > input_height || x < -1.0 || x > input_width) { + *empty = true; + return; + } + + if (y <= 0) y = 0; + if (x <= 0) x = 0; + + int y_low_ = int(y); + int x_low_ = int(x); + + if (y_low_ >= input_height - 1) { + *y_high = y_low_ = input_height - 1; + y = (T)y_low_; + } else { + *y_high = y_low_ + 1; + } + + if (x_low_ >= input_width - 1) { + *x_high = x_low_ = input_width - 1; + x = T(x_low_); + } else { + *x_high = x_low_ + 1; + } + + *y_low = y_low_; + *x_low = x_low_; + + T ly = y - y_low_; + T lx = x - x_low_; + T hy = 1.0 - ly; + T hx = 1.0 - lx; + *w1 = hy * hx, *w2 = hy * lx, *w3 = ly * hx, *w4 = ly * lx; + return; +} + +template +__mlu_func__ void computeChannel(T *input_core, T *nram_in, T *output_core, + T *nram_out, const int roi_bin_grid_h, + const int roi_bin_grid_w, const T 
roi_start_h, + const T roi_start_w, const int ph, + const int pw, const T bin_size_h, + const T bin_size_w, const float count, + const int input_height, const int input_width, + const int channels, const int cyc_num, + const int max_elements) { + int cyc_channel = max_elements; + + for (int i = 0; i < cyc_num; i++) { + int real_channel = + (i == cyc_num - 1) ? channels - i * cyc_channel : cyc_channel; + int align_channel = PAD_UP(real_channel, NFU_ALIGN_SIZE / sizeof(T)); + __bang_write_zero(nram_out, align_channel); + uint32_t real_size = real_channel * sizeof(T); + + int iy, ix; + for (iy = 0; iy < roi_bin_grid_h; iy++) { + // 1. compute the coordinates of the y axis in the current roi_bin_grid_h + T y = roi_start_h + ph * bin_size_h + + (T)(iy + 0.5) * bin_size_h / (T)(roi_bin_grid_h); + for (ix = 0; ix < roi_bin_grid_w; ix++) { + // 2. compute the coordinates of the x axis in the current + // roi_bin_grid_w + T x = roi_start_w + pw * bin_size_w + + (T)(ix + 0.5) * bin_size_w / (T)(roi_bin_grid_w); + + // 3. compute the four weights (w1, w2, w3 and w4), the height (y_low + // and y_high) and weight (x_low and x_high) of input feature map in + // the current roi bin grid, and the flag (empty) which shows if x, y + // are out of input feature map ranges + T w1, w2, w3, w4; + int x_low, x_high, y_low, y_high; + bool empty = false; + + bilinearInterpolate(input_height, input_width, y, x, &w1, &w2, &w3, &w4, + &x_low, &x_high, &y_low, &y_high, &empty); + + // 4. compute interpolation of the current roi bin grid + // tmp_cyc1, temp_cyc2, tmp_cyc3 and tmp_cyc4 store the input values + // to compute the interpolation, and then reused to compute + // the argmax_x and argmax_y. 
+ T *tmp_cyc1 = nram_in + cyc_channel; + T *tmp_cyc2 = nram_in + cyc_channel * 2; + T *tmp_cyc3 = nram_in + cyc_channel * 3; + T *tmp_cyc4 = nram_in + cyc_channel * 4; + + if (empty) { // exits abnormal values + __bang_write_zero(nram_in, align_channel); + } else { + __bang_write_zero(nram_in, align_channel); + uint32_t offset1 = (y_low * input_width + x_low) * channels; + uint32_t offset2 = (y_low * input_width + x_high) * channels; + uint32_t offset3 = (y_high * input_width + x_low) * channels; + uint32_t offset4 = (y_high * input_width + x_high) * channels; + T *input1 = (T *)input_core + offset1 + i * cyc_channel; + T *input2 = (T *)input_core + offset2 + i * cyc_channel; + T *input3 = (T *)input_core + offset3 + i * cyc_channel; + T *input4 = (T *)input_core + offset4 + i * cyc_channel; + + // load the four pixels (p1, p2, p3 and p4) of input feature map to + // compute interpolation + __memcpy(tmp_cyc1, input1, real_size, GDRAM2NRAM); + __memcpy(tmp_cyc2, input2, real_size, GDRAM2NRAM); + __memcpy(tmp_cyc3, input3, real_size, GDRAM2NRAM); + __memcpy(tmp_cyc4, input4, real_size, GDRAM2NRAM); + + // interpolation value = w1 * p1 + w2 * p2 + w3 * p3 + w4 * p4 + __bang_mul_scalar(tmp_cyc1, tmp_cyc1, w1, align_channel); + __bang_mul_scalar(tmp_cyc2, tmp_cyc2, w2, align_channel); + __bang_mul_scalar(tmp_cyc3, tmp_cyc3, w3, align_channel); + __bang_mul_scalar(tmp_cyc4, tmp_cyc4, w4, align_channel); + + __bang_add(nram_in, tmp_cyc1, nram_in, align_channel); + __bang_add(nram_in, tmp_cyc2, nram_in, align_channel); + __bang_add(nram_in, tmp_cyc3, nram_in, align_channel); + __bang_add(nram_in, tmp_cyc4, nram_in, align_channel); + } + // 5. compute sum value and corresponding coordinates of x axis and y + // axis. Update the sum value. 
+ __bang_add(nram_out, nram_in, nram_out, align_channel); + } // loop_roi_grid_w + } // loop_roi_grid_h + T count_value = (T)(1.0 / count); + __bang_mul_scalar(nram_out, nram_out, count_value, align_channel); + __memcpy(output_core + i * cyc_channel, nram_out, real_size, NRAM2GDRAM); + } // loop_cyc_num +} + +template +__mlu_func__ void roialignForwardAvg( + T *input, T *rois, T *output, const bool aligned, const int channels, + const int pooled_height, const int pooled_width, const int input_height, + const int input_width, const int sampling_ratio, const T spatial_scale, + const int num_rois) { + // find limit for channel, the nram space is divided to 6 parts that are + // input, 4 weights to compute the interpolation (w1, w2, w3, w4), output + + // max_elements : 300 : float datatype : 27296, half datatype : 54592 + // max_elements : 200 : float datatype : 16384, half datatype : 32768 + int max_elements = (PAD_DOWN(MAX_NRAM_SIZE / 6, NFU_ALIGN_SIZE)) / sizeof(T); + int cyc_num = channels / max_elements + (int)(channels % max_elements != 0); + T offset = aligned ? (T)0.5 : (T)0.0; + int task_num = num_rois * pooled_height * pooled_width; + T *nram_out = (T *)buffer; + T *nram_in = nram_out + max_elements; + if (task_num < taskDim) { + if (taskId >= task_num) { + return; + } + } + + for (int bin_idx = taskId; bin_idx < task_num; bin_idx = bin_idx + taskDim) { + if (bin_idx >= task_num) { + return; + } + + // (n,ph.pw) is a c in the pooled output + int pw = bin_idx % pooled_width; + int ph = (bin_idx / pooled_width) % pooled_height; + int n = bin_idx / pooled_width / pooled_height; + + T *roi_id_tmp = rois + n * ROI_OFFSET; + // 1. compute width and height of roi region. 
+ int batch_idx = (int)roi_id_tmp[0]; + T roi_x1 = roi_id_tmp[1]; + T roi_y1 = roi_id_tmp[2]; + T roi_x2 = roi_id_tmp[3]; + T roi_y2 = roi_id_tmp[4]; + T roi_start_w = roi_x1 * spatial_scale - offset; + T roi_start_h = roi_y1 * spatial_scale - offset; + T roi_end_w = roi_x2 * spatial_scale - offset; + T roi_end_h = roi_y2 * spatial_scale - offset; + T roi_width = roi_end_w - roi_start_w; + T roi_height = roi_end_h - roi_start_h; + + if (!aligned) { + roi_width = roi_width > (T)(1.0) ? roi_width : (T)(1.0); + roi_height = roi_height > (T)(1.0) ? roi_height : (T)(1.0); + } + + // 2. compute float-type width and height of roi bin region. + T bin_size_w = (T)roi_width / (T)pooled_width; + T bin_size_h = (T)roi_height / (T)pooled_height; + + // 3. compute int-type width and height of roi bin region. + int roi_bin_grid_h, roi_bin_grid_w; + roi_bin_grid_h = (sampling_ratio > 0) + ? sampling_ratio + : int(ceilf(roi_height / pooled_height)); + roi_bin_grid_w = (sampling_ratio > 0) + ? sampling_ratio + : int(ceilf(roi_width / pooled_width)); + float count = (float)((roi_bin_grid_h * roi_bin_grid_w) > 1 + ? roi_bin_grid_h * roi_bin_grid_w + : 1.0); + T *input_core = input + batch_idx * channels * input_width * input_height; + T *output_core = output + bin_idx * channels; + // 4. compute avg value and corresponding coordinates of x axis and y axis. 
+ computeChannel(input_core, nram_in, output_core, nram_out, roi_bin_grid_h, + roi_bin_grid_w, roi_start_h, roi_start_w, ph, pw, bin_size_h, + bin_size_w, count, input_height, input_width, channels, + cyc_num, max_elements); + } +} + +__mlu_global__ void MLUUnion1KernelRoiAlignAvg( + const void *input, const void *rois, const int channels, const bool aligned, + const int pooled_height, const int pooled_width, const int input_height, + const int input_width, const int sampling_ratio, const float spatial_scale, + const int num_rois, const cnrtDataType_t data_type, void *output) { + // make sure that memcore is not used + if (coreId == 0x80) { + return; + } + + switch (data_type) { + case CNRT_FLOAT16: { + roialignForwardAvg((half *)input, (half *)rois, (half *)output, aligned, + channels, pooled_height, pooled_width, input_height, + input_width, sampling_ratio, (half)spatial_scale, + num_rois); + }; break; + case CNRT_FLOAT32: { + roialignForwardAvg((float *)input, (float *)rois, (float *)output, + aligned, channels, pooled_height, pooled_width, + input_height, input_width, sampling_ratio, + (float)spatial_scale, num_rois); + }; break; + default: + break; + } + + return; +} +} // namespace forward + +namespace backward { +__mlu_func__ void bilinearInterpolateGradient(int height, int width, float y, + float x, float *w1, float *w2, + float *w3, float *w4, int *x_low, + int *x_high, int *y_low, + int *y_high) { + if (y < -1.0 || y > height || x < -1.0 || x > width) { + *w1 = 0.0, *w2 = 0.0, *w3 = 0.0, *w4 = 0.0; + *x_low = -1, *x_high = -1, *y_low = -1, *y_high = -1; + return; + } + if (y <= 0) { + y = 0; + } + if (x <= 0) { + x = 0; + } + *y_low = (int)y; + *x_low = (int)x; + if (*y_low >= height - 1) { + *y_high = height - 1, *y_low = height - 1; + y = (float)(*y_low); + } else { + *y_high = *y_low + 1; + } + if (*x_low >= width - 1) { + *x_high = width - 1, *x_low = width - 1; + x = (float)(*x_low); + } else { + *x_high = *x_low + 1; + } + float ly = y - *y_low, lx 
= x - *x_low; + float hy = 1.0 - ly, hx = 1.0 - lx; + *w1 = hy * hx, *w2 = hy * lx, *w3 = ly * hx, *w4 = ly * lx; + return; +} + +template +__mlu_func__ void unionRoiAlignBp( + T *grads, T *boxes, T *grads_image, const int boxes_num, const int hi, + const int wi, const int c, const int no, const int ho, const int wo, + const float spatial_scale, const int sampling_ratio, const bool aligned) { + int c_align = PAD_UP(c, NFU_ALIGN_SIZE / sizeof(T)); + int deal_all = boxes_num * hi * wi; + int deal_this_core = deal_all / taskDim + (int)(taskId < deal_all % taskDim); + for (int i = 0; i < deal_this_core; ++i) { + int bhw_id = i * taskDim + taskId; + int box_id = bhw_id / (hi * wi); + int ih = (bhw_id / wi) % hi; + int iw = bhw_id % wi; + T *box = boxes + box_id * 5; + int image_id = (int)box[0]; + T *image_offset = grads_image + image_id * ho * wo * c; + T *grads_ = grads + box_id * hi * wi * c + ih * wi * c + iw * c; + + float offset = aligned ? 0.5 : 0.0; + float x1 = box[1] * spatial_scale - offset; + float y1 = box[2] * spatial_scale - offset; + float x2 = box[3] * spatial_scale - offset; + float y2 = box[4] * spatial_scale - offset; + float roi_width = x2 - x1; + float roi_height = y2 - y1; + if (!aligned) { + roi_width = (roi_width > 1.0) ? roi_width : 1.0; + roi_height = (roi_height > 1.0) ? roi_height : 1.0; + } + float bin_size_h = roi_height / hi; + float bin_size_w = roi_width / wi; + + int roi_grid_h = + (sampling_ratio > 0) ? sampling_ratio : std::ceil(roi_height / hi); + int roi_grid_w = + (sampling_ratio > 0) ? 
sampling_ratio : std::ceil(roi_width / wi); + const T count = roi_grid_h * roi_grid_w; + if (c_align * sizeof(T) * 2 <= MAX_NRAM_SIZE) { + for (int iy = 0; iy < roi_grid_h; ++iy) { + const float y = + y1 + ih * bin_size_h + (iy + 0.5) * bin_size_h / roi_grid_h; + for (int ix = 0; ix < roi_grid_w; ++ix) { + const float x = + x1 + iw * bin_size_w + (ix + 0.5) * bin_size_w / roi_grid_w; + float w1, w2, w3, w4; + int x_low, x_high, y_low, y_high; + bilinearInterpolateGradient(ho, wo, y, x, &w1, &w2, &w3, &w4, &x_low, + &x_high, &y_low, &y_high); + if (x_low >= 0 && y_low >= 0) { + __memcpy(buffer, grads_, c * sizeof(T), GDRAM2NRAM); + __bang_mul_scalar((T *)buffer + c_align, (T *)buffer, (T)w1, + c_align); + __bang_mul_scalar((T *)buffer + c_align, (T *)buffer + c_align, + 1 / count, c_align); + __bang_atomic_add((T *)buffer + c_align, + image_offset + y_low * wo * c + x_low * c, + (T *)buffer + c_align, c); + __bang_mul_scalar((T *)buffer + c_align, (T *)buffer, (T)w2, + c_align); + __bang_mul_scalar((T *)buffer + c_align, (T *)buffer + c_align, + 1 / count, c_align); + __bang_atomic_add((T *)buffer + c_align, + image_offset + y_low * wo * c + x_high * c, + (T *)buffer + c_align, c); + __bang_mul_scalar((T *)buffer + c_align, (T *)buffer, (T)w3, + c_align); + __bang_mul_scalar((T *)buffer + c_align, (T *)buffer + c_align, + 1 / count, c_align); + __bang_atomic_add((T *)buffer + c_align, + image_offset + y_high * wo * c + x_low * c, + (T *)buffer + c_align, c); + __bang_mul_scalar((T *)buffer + c_align, (T *)buffer, (T)w4, + c_align); + __bang_mul_scalar((T *)buffer + c_align, (T *)buffer + c_align, + 1 / count, c_align); + __bang_atomic_add((T *)buffer + c_align, + image_offset + y_high * wo * c + x_high * c, + (T *)buffer + c_align, c); + } // x_low && y_low + } // ix + } // iy + } else { + for (int iy = 0; iy < roi_grid_h; ++iy) { + const float y = + y1 + ih * bin_size_h + (iy + 0.5) * bin_size_h / roi_grid_h; + for (int ix = 0; ix < roi_grid_w; ++ix) { + const 
float x = + x1 + iw * bin_size_w + (ix + 0.5) * bin_size_w / roi_grid_w; + float w1, w2, w3, w4; + int x_low, x_high, y_low, y_high; + bilinearInterpolateGradient(ho, wo, y, x, &w1, &w2, &w3, &w4, &x_low, + &x_high, &y_low, &y_high); + if (x_low >= 0 && y_low >= 0) { + int deal_once = + PAD_DOWN(MAX_NRAM_SIZE / 2, NFU_ALIGN_SIZE) / sizeof(T); + int c_repeat = c / deal_once + (int)(c % deal_once != 0); + for (int i = 0; i < c_repeat; ++i) { + int deal_c = deal_once; + int align_c = deal_once; + if (i == c_repeat - 1) { + deal_c = c - i * deal_once; + align_c = c_align - i * deal_once; + } + __memcpy(buffer, grads_ + i * deal_once, deal_c * sizeof(T), + GDRAM2NRAM); + __bang_mul_scalar((T *)buffer + align_c, (T *)buffer, (T)w1, + align_c); + __bang_mul_scalar((T *)buffer + align_c, (T *)buffer + align_c, + 1 / count, align_c); + __bang_atomic_add( + (T *)buffer + align_c, + image_offset + y_low * wo * c + x_low * c + i * deal_once, + (T *)buffer + align_c, deal_c); + __bang_mul_scalar((T *)buffer + align_c, (T *)buffer, (T)w2, + align_c); + __bang_mul_scalar((T *)buffer + align_c, (T *)buffer + align_c, + 1 / count, align_c); + __bang_atomic_add( + (T *)buffer + align_c, + image_offset + y_low * wo * c + x_high * c + i * deal_once, + (T *)buffer + align_c, deal_c); + __bang_mul_scalar((T *)buffer + align_c, (T *)buffer, (T)w3, + align_c); + __bang_mul_scalar((T *)buffer + align_c, (T *)buffer + align_c, + 1 / count, align_c); + __bang_atomic_add( + (T *)buffer + align_c, + image_offset + y_high * wo * c + x_low * c + i * deal_once, + (T *)buffer + align_c, deal_c); + __bang_mul_scalar((T *)buffer + align_c, (T *)buffer, (T)w4, + align_c); + __bang_mul_scalar((T *)buffer + align_c, (T *)buffer + align_c, + 1 / count, align_c); + __bang_atomic_add( + (T *)buffer + align_c, + image_offset + y_high * wo * c + x_high * c + i * deal_once, + (T *)buffer + align_c, deal_c); + } // for c_repeat + } // x_low >= 0 && y_low >= 0 + } // ix + } // iy + } // if c + } // i +} + 
+__mlu_global__ void MLUUnion1KernelRoiAlignBackward( + const void *grads, const void *boxes, void *grads_image, + const cnrtDataType_t dtype, const int boxes_num, const int hi, const int wi, + const int c, const int no, const int ho, const int wo, + const float spatial_scale, const int sampling_ratio, const bool aligned) { + // make sure that memcore is not used + if (coreId == 0x80) { + return; + } + switch (dtype) { + case CNRT_FLOAT16: { + unionRoiAlignBp((half *)grads, (half *)boxes, (half *)grads_image, + boxes_num, hi, wi, c, no, ho, wo, spatial_scale, + sampling_ratio, aligned); + }; break; + case CNRT_FLOAT32: { + unionRoiAlignBp((float *)grads, (float *)boxes, (float *)grads_image, + boxes_num, hi, wi, c, no, ho, wo, spatial_scale, + sampling_ratio, aligned); + }; break; + default: { return; } + } +} +} // namespace backward + +void KernelRoiAlign(cnrtDim3_t k_dim, cnrtFunctionType_t k_type, + cnrtQueue_t queue, const cnrtDataType_t d_type, + const void *input, const void *rois, const int channels, + const bool aligned, const int pooled_height, + const int pooled_width, const int input_height, + const int input_width, const int sampling_ratio, + const float spatial_scale, const int num_rois, + void *output) { + forward::MLUUnion1KernelRoiAlignAvg<<>>( + input, rois, channels, aligned, pooled_height, pooled_width, input_height, + input_width, sampling_ratio, spatial_scale, num_rois, d_type, output); +} + +void KernelRoiAlignBackward(cnrtDim3_t k_dim, cnrtFunctionType_t k_type, + cnrtQueue_t queue, const cnrtDataType_t dtype, + const void *grads, const void *boxes, + void *grads_image, const int boxes_num, + const int hi, const int wi, const int c, + const int no, const int ho, const int wo, + const float spatial_scale, const int sampling_ratio, + const bool aligned) { + backward::MLUUnion1KernelRoiAlignBackward<<>>( + grads, boxes, grads_image, dtype, boxes_num, hi, wi, c, no, ho, wo, + spatial_scale, sampling_ratio, aligned); +} diff --git 
a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/roi_align_rotated_mlu_kernel.mlu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/roi_align_rotated_mlu_kernel.mlu new file mode 100644 index 0000000000000000000000000000000000000000..9356776c58f0fa37d6e2d8e8b7b30d2680cfebbc --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/roi_align_rotated_mlu_kernel.mlu @@ -0,0 +1,490 @@ +/************************************************************************* + * Copyright (C) 2022 Cambricon. + * + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + *************************************************************************/ +#include "common_mlu_helper.hpp" +#include "roi_align_rotated_utils.hpp" + +#define ROI_OFFSET 6 +#define SAMPLING_NUM 4 + +__nram__ char nram_buffer[MAX_NRAM_SIZE]; + +template +__mlu_func__ void swap(T &a, T &b) { + T tmp = a; + a = b; + b = tmp; +} + +template +__mlu_func__ void bilinearInterpolate(const int input_height, + const int input_width, T x, T y, T *w1, + T *w2, T *w3, T *w4, int *x_low, + int *x_high, int *y_low, int *y_high, + bool *empty) { + // deal with case that the point is out of feature map boundary + if (y < -1.0 || y > input_height || x < -1.0 || x > input_width) { + *empty = true; + return; + } + + if (y <= 0) y = (T)0; + if (x <= 0) x = (T)0; + + *y_low = int(y); + *x_low = int(x); + + if (*y_low >= input_height - 1) { + *y_high = *y_low = input_height - 1; + y = (T)(*y_low); + } else { + *y_high = *y_low + 1; + } + + if (*x_low >= input_width - 1) { + *x_high = *x_low = input_width - 1; + x = T(*x_low); + } else { 
*x_high = *x_low + 1; + } + T ly = y - *y_low; + T lx = x - *x_low; + T hy = 1.0 - ly; + T hx = 1.0 - lx; + *w1 = hy * hx; + *w2 = hy * lx; + *w3 = ly * hx; + *w4 = ly * lx; + return; +} + +template +__mlu_func__ void getRoiBinInfo(const T *rois_dram, const int bin_i, + const RoiAlignRotatedParams ¶ms, + int *batch_idx, int *roi_n, int *pw, int *ph, + T *roi_center_x, T *roi_center_y, T *roi_width, + T *roi_height, T *theta) { + T offset = params.aligned ? (T)0.5 : (T)0.0; + *pw = bin_i % params.pooled_width; + *ph = (bin_i / params.pooled_width) % params.pooled_height; + *roi_n = bin_i / params.pooled_width / params.pooled_height; + const T *roi_info = rois_dram + (*roi_n) * ROI_OFFSET; + *batch_idx = (int)roi_info[0]; + *roi_center_x = roi_info[1] * (T)params.spatial_scale - offset; + *roi_center_y = roi_info[2] * (T)params.spatial_scale - offset; + *roi_width = roi_info[3] * (T)params.spatial_scale; + *roi_height = roi_info[4] * (T)params.spatial_scale; + *theta = roi_info[5]; + if (params.clockwise) { + *theta = -(*theta); + } + if (!params.aligned) { + *roi_width = *roi_width > (T)1.0 ? *roi_width : (T)1.0; + *roi_height = *roi_height > (T)1.0 ? *roi_height : (T)1.0; + } +} + +template +__mlu_func__ void roiAlignRotatedForward(const T *input_dram, + const T *rois_dram, const int batch, + const int height, const int width, + const int channel, const int rois_num, + const RoiAlignRotatedParams ¶ms, + T *output_dram) { + int align_base_128 = NFU_ALIGN_SIZE / sizeof(T); + int channel_max_cap = MAX_NRAM_SIZE / sizeof(T) / (2 * SAMPLING_NUM + 1); + channel_max_cap = channel_max_cap / align_base_128 * align_base_128; + int channel_align = channel < channel_max_cap ? 
channel : channel_max_cap; + channel_align = CEIL_ALIGN(channel_align, align_base_128); + + T *nram_out = (T *)nram_buffer; + T *nram_ping = nram_out + channel_align; + T *nram_pong = nram_ping + channel_align * SAMPLING_NUM; + + int bin_first = taskId; + int bin_end = rois_num * params.pooled_height * params.pooled_width; + + for (int bin_i = bin_first; bin_i < bin_end; bin_i += taskDim) { + T roi_center_x, roi_center_y, roi_width, roi_height, theta; + int batch_idx, roi_n, pw, ph; + getRoiBinInfo(rois_dram, bin_i, params, &batch_idx, &roi_n, &pw, &ph, + &roi_center_x, &roi_center_y, &roi_width, &roi_height, + &theta); + T bin_size_h = roi_height / params.pooled_height; + T bin_size_w = roi_width / params.pooled_width; + + int roi_bin_grid_h = + (params.sample_ratio > 0) + ? params.sample_ratio + : __float2int_up((float)roi_height / params.pooled_height); + int roi_bin_grid_w = + (params.sample_ratio > 0) + ? params.sample_ratio + : __float2int_up((float)roi_width / params.pooled_width); + T roi_start_y = -roi_height / 2; + T roi_start_x = -roi_width / 2; + const int bin_dim = roi_bin_grid_h * roi_bin_grid_w > 1 + ? roi_bin_grid_h * roi_bin_grid_w + : 1; + T cos_theta = std::cos(theta); + T sin_theta = std::sin(theta); + T zero_sign = 1.0f / bin_dim; + + bool is_first_sample = true; + int src_offset = 0; + int dst_offset = 0; + int c_rem, c_slice, c_slice_align, pongc_slice, pongc_slice_align; + for (int c_offset = 0; c_offset < channel; c_offset += channel_align) { + __bang_write_value(nram_out, channel_align, (T)0); + c_rem = channel - c_offset; + c_slice = channel_align > c_rem ? 
c_rem : channel_align; + c_slice_align = CEIL_ALIGN(c_slice, align_base_128); + is_first_sample = true; + for (int iy = 0; iy < roi_bin_grid_h; ++iy) { + const T yy = roi_start_y + ph * bin_size_h + + T(iy + 0.5) * bin_size_h / roi_bin_grid_h; + for (int ix = 0; ix < roi_bin_grid_w; ++ix) { + const T xx = roi_start_x + pw * bin_size_w + + T(ix + 0.5) * bin_size_w / roi_bin_grid_w; + int sample_i = iy * roi_bin_grid_w + ix; + + T y = yy * cos_theta - xx * sin_theta + roi_center_y; + T x = yy * sin_theta + xx * cos_theta + roi_center_x; + T w1, w2, w3, w4; + bool empty = false; + int x_low, x_high, y_low, y_high; + bilinearInterpolate(height, width, x, y, &w1, &w2, &w3, &w4, &x_low, + &x_high, &y_low, &y_high, &empty); + /******************************************************* + | ping | pong | + |------|-----|-----|-----|-----|-----|-----|-----|-----| + |output| p1 | p2 | p3 | p4 | p1 | p2 | p3 | p4 | + |------|-----|-----|-----|-----|-----|-----|-----|-----| + ********************************************************/ + if (is_first_sample && !empty) { + // load input data from dram to nram + __bang_write_value(nram_ping, SAMPLING_NUM * c_slice_align, (T)0); + src_offset = + (batch_idx * height * width + y_low * width + x_low) * channel + + c_offset; + dst_offset = 0; + __memcpy(nram_ping + dst_offset, input_dram + src_offset, + c_slice * sizeof(T), GDRAM2NRAM); + src_offset = (batch_idx * height * width + y_low * width + x_high) * + channel + + c_offset; + dst_offset = c_slice_align; + __memcpy(nram_ping + dst_offset, input_dram + src_offset, + c_slice * sizeof(T), GDRAM2NRAM); + src_offset = (batch_idx * height * width + y_high * width + x_low) * + channel + + c_offset; + dst_offset = c_slice_align * 2; + __memcpy(nram_ping + dst_offset, input_dram + src_offset, + c_slice * sizeof(T), GDRAM2NRAM); + src_offset = + (batch_idx * height * width + y_high * width + x_high) * + channel + + c_offset; + dst_offset = c_slice_align * 3; + __memcpy(nram_ping + dst_offset, 
input_dram + src_offset, + c_slice * sizeof(T), GDRAM2NRAM); + } + // load next input data to nram + if (sample_i + 1 < bin_dim) { + int p_iy = (sample_i + 1) / roi_bin_grid_w; + int p_ix = (sample_i + 1) % roi_bin_grid_w; + const T p_yy = roi_start_y + ph * bin_size_h + + T(p_iy + 0.5) * bin_size_h / roi_bin_grid_h; + const T p_xx = roi_start_x + pw * bin_size_w + + T(p_ix + 0.5) * bin_size_w / roi_bin_grid_w; + T p_y = p_yy * cos_theta - p_xx * sin_theta + roi_center_y; + T p_x = p_yy * sin_theta + p_xx * cos_theta + roi_center_x; + T p_w1, p_w2, p_w3, p_w4; + bool p_empty = false; + int p_x_low, p_x_high, p_y_low, p_y_high; + bilinearInterpolate(height, width, p_x, p_y, &p_w1, &p_w2, &p_w3, + &p_w4, &p_x_low, &p_x_high, &p_y_low, &p_y_high, + &p_empty); + pongc_slice = c_slice; + pongc_slice_align = c_slice_align; + if (!p_empty) { + __bang_write_value(nram_pong, SAMPLING_NUM * pongc_slice_align, + (T)0); + src_offset = + (batch_idx * height * width + p_y_low * width + p_x_low) * + channel + + c_offset; + dst_offset = 0; + __memcpy(nram_pong + dst_offset, input_dram + src_offset, + c_slice * sizeof(T), GDRAM2NRAM); + src_offset = + (batch_idx * height * width + p_y_low * width + p_x_high) * + channel + + c_offset; + dst_offset = pongc_slice_align; + __memcpy(nram_pong + dst_offset, input_dram + src_offset, + c_slice * sizeof(T), GDRAM2NRAM); + src_offset = + (batch_idx * height * width + p_y_high * width + p_x_low) * + channel + + c_offset; + dst_offset = pongc_slice_align * 2; + __memcpy(nram_pong + dst_offset, input_dram + src_offset, + c_slice * sizeof(T), GDRAM2NRAM); + src_offset = + (batch_idx * height * width + p_y_high * width + p_x_high) * + channel + + c_offset; + dst_offset = pongc_slice_align * 3; + __memcpy(nram_pong + dst_offset, input_dram + src_offset, + c_slice * sizeof(T), GDRAM2NRAM); + } + } + T *tmp_sum = nram_ping + 3 * c_slice_align; + if (empty) { + __bang_write_value(tmp_sum, c_slice_align, T(0)); + } else { + 
__bang_mul_scalar(nram_ping, nram_ping, w1, c_slice_align); + __bang_mul_scalar(nram_ping + c_slice_align, + nram_ping + c_slice_align, w2, c_slice_align); + __bang_mul_scalar(nram_ping + 2 * c_slice_align, + nram_ping + 2 * c_slice_align, w3, c_slice_align); + __bang_mul_scalar(nram_ping + 3 * c_slice_align, + nram_ping + 3 * c_slice_align, w4, c_slice_align); + __bang_sumpool(tmp_sum, nram_ping, c_slice_align, 1, SAMPLING_NUM, + 1, SAMPLING_NUM, 1, 1); + } + __bang_add(nram_out, nram_out, tmp_sum, c_slice_align); + swap(nram_ping, nram_pong); + __asm__ volatile("sync;"); + is_first_sample = false; + } + } + __bang_mul_scalar(nram_out, nram_out, zero_sign, c_slice_align); + // store the result to dram + int output_offset = + ((roi_n * params.pooled_height + ph) * params.pooled_width + pw) * + channel + + c_offset; + __memcpy(output_dram + output_offset, nram_out, c_slice * sizeof(T), + NRAM2GDRAM); + } + } +} + +template +__mlu_func__ void roiAlignRotatedBackward(const T *top_grad_dram, + const T *rois_dram, const int batch, + const int height, const int width, + const int channel, const int rois_num, + const RoiAlignRotatedParams ¶ms, + T *bottom_grad_dram) { + int align_base_128 = NFU_ALIGN_SIZE / sizeof(T); + int channel_align = CEIL_ALIGN(channel, align_base_128); + + unsigned int max_element = MAX_NRAM_SIZE / sizeof(T); + int c_limit = max_element >> 2; + c_limit = c_limit > channel_align ? 
channel_align : c_limit; + + T *nram_ping = (T *)nram_buffer; + T *nram_pong = nram_ping + 2 * c_limit; + T *nram_output = nullptr; + + int bin_first = taskId; + int bin_end = rois_num * params.pooled_height * params.pooled_width; + bool is_first_bin = true; + T roi_center_x, roi_center_y, roi_width, roi_height, theta; + int batch_idx, roi_n, pw, ph; + T pong_roi_center_x, pong_roi_center_y, pong_roi_width, pong_roi_height, + pong_theta; + int pong_batch_idx, pong_roi_n, pong_pw, pong_ph; + for (int bin_i = bin_first; bin_i < bin_end; bin_i += taskDim) { + getRoiBinInfo(rois_dram, bin_i, params, &batch_idx, &roi_n, &pw, &ph, + &roi_center_x, &roi_center_y, &roi_width, &roi_height, + &theta); + T bin_size_h = roi_height / params.pooled_height; + T bin_size_w = roi_width / params.pooled_width; + + int roi_bin_grid_h = + (params.sample_ratio > 0) + ? params.sample_ratio + : __float2int_up((float)roi_height / params.pooled_height); + int roi_bin_grid_w = + (params.sample_ratio > 0) + ? params.sample_ratio + : __float2int_up((float)roi_width / params.pooled_width); + T roi_start_y = -roi_height / 2; + T roi_start_x = -roi_width / 2; + const int bin_dim = roi_bin_grid_h * roi_bin_grid_w > 1 + ? roi_bin_grid_h * roi_bin_grid_w + : 1; + T cos_theta = std::cos(theta); + T sin_theta = std::sin(theta); + T zero_sign = 1.0f / bin_dim; + int c_rem, c_slice, pongc_slice, c_offset; + c_rem = channel; + c_offset = 0; + /**************************************** + | ping | pong | + |---------|---------|---------|---------| + | input | output | input | output | + |---------|---------|---------|---------| + *****************************************/ + if (is_first_bin) { + // load the first top_grad to nram + c_slice = c_limit < c_rem ? 
c_limit : c_rem; + int top_grad_offset = + ((roi_n * params.pooled_height + ph) * params.pooled_width + pw) * + channel; + __memcpy(nram_ping, top_grad_dram + top_grad_offset, c_slice * sizeof(T), + GDRAM2NRAM); + } + nram_output = nram_ping + c_limit; + while (c_rem > 0) { + c_slice = c_slice < c_rem ? c_slice : c_rem; + // load the next top_grad to nram + if (c_rem - c_slice > 0) { + // load the rest channels to nram + pongc_slice = (c_rem - c_slice > c_slice) ? c_slice : c_rem - c_slice; + int top_grad_offset = + ((roi_n * params.pooled_height + ph) * params.pooled_width + pw) * + channel + + c_offset + c_slice; + __memcpy_async(nram_pong, top_grad_dram + top_grad_offset, + pongc_slice * sizeof(T), GDRAM2NRAM); + } else if (bin_i + taskDim < bin_end) { + // load next bin's data to nram + getRoiBinInfo(rois_dram, bin_i + taskDim, params, &pong_batch_idx, + &pong_roi_n, &pong_pw, &pong_ph, &pong_roi_center_x, + &pong_roi_center_y, &pong_roi_width, &pong_roi_height, + &pong_theta); + pongc_slice = c_limit < channel ? 
c_limit : channel; + int top_grad_offset = ((pong_roi_n * params.pooled_height + pong_ph) * + params.pooled_width + + pong_pw) * + channel; + __memcpy_async(nram_pong, top_grad_dram + top_grad_offset, + c_slice * sizeof(T), GDRAM2NRAM); + } + // comput the output in a single bin + + for (int iy = 0; iy < roi_bin_grid_h; ++iy) { + const T yy = roi_start_y + ph * bin_size_h + + T(iy + 0.5) * bin_size_h / roi_bin_grid_h; + for (int ix = 0; ix < roi_bin_grid_w; ++ix) { + const T xx = roi_start_x + pw * bin_size_w + + T(ix + 0.5) * bin_size_w / roi_bin_grid_w; + T y = yy * cos_theta - xx * sin_theta + roi_center_y; + T x = yy * sin_theta + xx * cos_theta + roi_center_x; + T w1, w2, w3, w4; + bool empty = false; + int x_low, x_high, y_low, y_high; + bilinearInterpolate(height, width, x, y, &w1, &w2, &w3, &w4, &x_low, + &x_high, &y_low, &y_high, &empty); + if (empty) { + continue; + } else { + __bang_mul_scalar(nram_output, nram_ping, w1 * zero_sign, c_limit); + __bang_atomic_add( + (T *)nram_output, + bottom_grad_dram + batch_idx * height * width * channel + + y_low * width * channel + x_low * channel + c_offset, + (T *)nram_output, c_slice); + __bang_mul_scalar(nram_output, nram_ping, w2 * zero_sign, c_limit); + __bang_atomic_add( + (T *)nram_output, + bottom_grad_dram + batch_idx * height * width * channel + + y_low * width * channel + x_high * channel + c_offset, + (T *)nram_output, c_slice); + __bang_mul_scalar(nram_output, nram_ping, w3 * zero_sign, c_limit); + __bang_atomic_add( + (T *)nram_output, + bottom_grad_dram + batch_idx * height * width * channel + + y_high * width * channel + x_low * channel + c_offset, + (T *)nram_output, c_slice); + __bang_mul_scalar(nram_output, nram_ping, w4 * zero_sign, c_limit); + __bang_atomic_add( + (T *)nram_output, + bottom_grad_dram + batch_idx * height * width * channel + + y_high * width * channel + x_high * channel + c_offset, + (T *)nram_output, c_slice); + } + } + } + swap(nram_ping, nram_pong); + c_rem -= c_slice; + 
c_offset += c_slice; + __asm__ volatile("sync;"); + } + is_first_bin = false; + } +} + +__mlu_global__ void MLUUnion1KernelRoiAlignRotatedForward( + const void *features, const void *rois, void *output, const int batch, + const int height, const int width, const int channel, const int rois_num, + const RoiAlignRotatedParams rroiAlignParams, + const cnrtDataType_t data_type) { + if (0x80 == coreId) { + return; + } + + if (data_type == CNRT_FLOAT32) { + roiAlignRotatedForward((float *)features, (float *)rois, batch, height, + width, channel, rois_num, rroiAlignParams, + (float *)output); + } else { + roiAlignRotatedForward((half *)features, (half *)rois, batch, height, width, + channel, rois_num, rroiAlignParams, (half *)output); + } +} + +__mlu_global__ void MLUUnion1KernelRoiAlignRotatedBackward( + const void *top_grad, const void *rois, void *bottom_grad, const int batch, + const int height, const int width, const int channel, const int rois_num, + const RoiAlignRotatedParams rroiAlignParams, + const cnrtDataType_t data_type) { + if (0x80 == coreId) { + return; + } + + if (data_type == CNRT_FLOAT32) { + roiAlignRotatedBackward((float *)top_grad, (float *)rois, batch, height, + width, channel, rois_num, rroiAlignParams, + (float *)bottom_grad); + } else { + roiAlignRotatedBackward((half *)top_grad, (half *)rois, batch, height, + width, channel, rois_num, rroiAlignParams, + (half *)bottom_grad); + } +} + +void KernelRoiAlignRotatedForward( + cnrtDim3_t k_dim, cnrtFunctionType_t k_type, cnrtQueue_t queue, + const cnrtDataType_t d_type, const void *features, const void *rois, + void *output, const int batch, const int height, const int width, + const int channel, const int rois_num, + const RoiAlignRotatedParams roiAlignRotatedParams) { + MLUUnion1KernelRoiAlignRotatedForward<<>>( + features, rois, output, batch, height, width, channel, rois_num, + roiAlignRotatedParams, d_type); +} + +void KernelRoiAlignRotatedBackward( + cnrtDim3_t k_dim, cnrtFunctionType_t k_type, 
cnrtQueue_t queue, + const cnrtDataType_t d_type, const void *top_grad, const void *rois, + void *bottom_grad, const int batch, const int height, const int width, + const int channel, const int rois_num, + const RoiAlignRotatedParams roiAlignRotatedParams) { + MLUUnion1KernelRoiAlignRotatedBackward<<>>( + top_grad, rois, bottom_grad, batch, height, width, channel, rois_num, + roiAlignRotatedParams, d_type); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/roi_align_rotated_utils.hpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/roi_align_rotated_utils.hpp new file mode 100644 index 0000000000000000000000000000000000000000..cd0ec02484fef395db7d401976d64f9c5ca59622 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/roi_align_rotated_utils.hpp @@ -0,0 +1,24 @@ +/************************************************************************* + * Copyright (C) 2022 Cambricon. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
+ *************************************************************************/ +#ifndef ROI_ALIGN_ROTATED_UTILS_HPP_ +#define ROI_ALIGN_ROTATED_UTILS_HPP_ + +struct RoiAlignRotatedParams { + int pooled_height; + int pooled_width; + int sample_ratio; + float spatial_scale; + bool aligned; + bool clockwise; +}; + +#endif // ROI_ALIGN_ROTATED_UTILS_HPP_ diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/roi_pool_mlu_kernel.mlu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/roi_pool_mlu_kernel.mlu new file mode 100644 index 0000000000000000000000000000000000000000..3a6d2d3ba61c2ba87ae9b1fb301c412fea93195c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/roi_pool_mlu_kernel.mlu @@ -0,0 +1,747 @@ +/************************************************************************* + * Copyright (C) 2022 Cambricon. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
+ *************************************************************************/ +#include "common_mlu_helper.hpp" + +#define ALIGN_SIZE 64 +#define PIPELINE_COMMON_NUM 2 +#define PIPELINE_PINGPONG_NUM 10 + +__nram__ char nram_buffer[MAX_NRAM_SIZE]; + +namespace forward { +template +__mlu_func__ void getRoiBinInfo(T *input_v, T *rois_v, int bin_i, int height, + int width, int channels, int p_height, + int p_width, T spatial_scale, int *bin_x1, + int *bin_y1, int *bin_x2, int *bin_y2, + int *bin_wdim, int *bin_hdim, int *bin_dims, + T **input_base, bool *is_empty) { + int pw = bin_i % p_width; + int ph = (bin_i / p_width) % p_height; + int roi_n = bin_i / p_width / p_height; + + /*roi*/ + const T *roi_info = rois_v + roi_n * 5; // {{batch, x1, y1, x2, y2},,,} + int batch_index = (int)roi_info[0]; + int roi_x1 = round(roi_info[1] * spatial_scale); + int roi_y1 = round(roi_info[2] * spatial_scale); + int roi_x2 = round(roi_info[3] * spatial_scale); + int roi_y2 = round(roi_info[4] * spatial_scale); + int roi_w = roi_x2 - roi_x1 + 1 > 1 ? roi_x2 - roi_x1 + 1 : 1; + int roi_h = roi_y2 - roi_y1 + 1 > 1 ? roi_y2 - roi_y1 + 1 : 1; + + /*bin*/ + T bin_w = (T)roi_w / (T)p_width; + T bin_h = (T)roi_h / (T)p_height; + + *bin_x1 = (int)floor((T)pw * bin_w) + roi_x1; + *bin_x1 = *bin_x1 > 0 ? *bin_x1 : 0; + *bin_x1 = *bin_x1 < width ? *bin_x1 : width; + + *bin_y1 = (int)floor((T)ph * bin_h) + roi_y1; + *bin_y1 = *bin_y1 > 0 ? *bin_y1 : 0; + *bin_y1 = *bin_y1 < height ? *bin_y1 : height; + + *bin_x2 = (int)ceil((T)(pw + 1) * bin_w) + roi_x1; + *bin_x2 = *bin_x2 > 0 ? *bin_x2 : 0; + *bin_x2 = *bin_x2 < width ? *bin_x2 : width; + + *bin_y2 = (int)ceil((T)(ph + 1) * bin_h) + roi_y1; + *bin_y2 = *bin_y2 > 0 ? *bin_y2 : 0; + *bin_y2 = *bin_y2 < height ? 
*bin_y2 : height; + + *input_base = input_v + batch_index * height * width * channels; + *bin_wdim = *bin_x2 - *bin_x1; + *bin_hdim = *bin_y2 - *bin_y1; + *bin_dims = (*bin_hdim) * (*bin_wdim); + *is_empty = (*bin_y2 <= *bin_y1) || (*bin_x2 <= *bin_x1); +} + +template +__mlu_func__ void MLUUnion1Roipool(T *input_v, T *rois_v, int batch, + int channels, int height, int width, + int p_height, int p_width, int rois_num, + T spatial_scale, T *output_v, int *argmax) { + /* + * NRAM partition + * |---------------------------------------------------| + * | ping | + * |---------------------------------------------------| + * | pong | + * |---------------------------------------------------| + * | out | + * |---------------------------------------------------| + * | argmax | + * |---------------------------------------------------| + * | a | + * |---------------------------------------------------| + * | b | + * |---------------------------------------------------| + */ + uint32_t is_half = sizeof(T) == sizeof(half) ? true : false; + uint32_t t_size = sizeof(T); + uint32_t float_div = NFU_ALIGN_SIZE / sizeof(float); + uint32_t half_div = NFU_ALIGN_SIZE / sizeof(half); + + uint32_t channels_align = PAD_UP(channels, float_div); + uint32_t nram_limit = PAD_DOWN( + (MAX_NRAM_SIZE / sizeof(float) - 4 * channels_align) / 2, half_div); + + // nram PING/PONG, output, argamx, a, b + float *nram_ping = (float *)nram_buffer; + float *nram_pong = (float *)nram_buffer + nram_limit; + float *nram_out = (float *)nram_buffer + 2 * nram_limit; + float *nram_argmax = nram_out + channels_align; + float *nram_a = nram_out + 2 * channels_align; + float *nram_b = nram_out + 3 * channels_align; + + uint32_t c_bins_num = rois_num * p_height * p_width; + uint32_t task_bins = c_bins_num / taskDim; + uint32_t rem_bins = c_bins_num % taskDim; + if (taskId < rem_bins) { + task_bins += 1; + } + int bin_first = + (c_bins_num / taskDim) * taskId + (taskId > rem_bins ? 
rem_bins : taskId); + int bins_loop = bin_first + task_bins; + + T *input_base = NULL; + T *output_base = output_v + bin_first * channels; + int *argmax_base = NULL != argmax ? argmax + bin_first * channels : NULL; + int bin_x1, bin_y1, bin_x2, bin_y2, bin_wdim, bin_hdim, bin_dims; + int pbin_x1, pbin_y1, pbin_x2, pbin_y2, pbin_wdim, pbin_hdim, pbin_dims; + bool is_empty = false; + bool pong_is_empty = false; + bool is_first_bin = true; + uint32_t src_offset = 0; + uint32_t dst_offset = 0; + uint32_t nram_offset = 0; + uint32_t half_offset = + is_half ? (nram_limit / 2 / half_div * half_div) * 2 : 0; + float *nram_tmp = NULL; + + uint32_t c_slice = 0; + uint32_t c_slice_align = 0; + uint32_t pongc_slice = 0; + uint32_t pongc_slice_align = 0; + for (int bin_i = bin_first; bin_i < bins_loop; bin_i++) { + getRoiBinInfo((T *)input_v, (T *)rois_v, bin_i, height, width, channels, + p_height, p_width, (T)spatial_scale, &bin_x1, &bin_y1, + &bin_x2, &bin_y2, &bin_wdim, &bin_hdim, &bin_dims, + &input_base, &is_empty); + uint32_t c_rem = channels; + c_slice = nram_limit / bin_dims / float_div * float_div; + + if (is_first_bin && !is_empty) { + c_slice = c_slice > c_rem ? c_rem : c_slice; + c_slice_align = PAD_UP(c_slice, float_div); + for (int h = bin_y1; h < bin_y2; h++) { + src_offset = (h * width + bin_x1) * channels; + nram_offset = (h - bin_y1) * bin_wdim * c_slice_align + half_offset; + if (c_slice_align == channels) { + __memcpy((T *)nram_ping + nram_offset, (T *)input_base + src_offset, + bin_wdim * c_slice * t_size, GDRAM2NRAM); + } else { + __memcpy((T *)nram_ping + nram_offset, (T *)input_base + src_offset, + c_slice * t_size, GDRAM2NRAM, c_slice_align * t_size, + channels * t_size, bin_wdim - 1); + } + } + } + uint32_t c_offset = 0; + while (c_rem > 0) { + c_slice = c_slice > c_rem ? c_rem : c_slice; + c_slice_align = PAD_UP(c_slice, float_div); + + /*__memcpy_async*/ + if (c_rem - c_slice > 0 && !is_empty) { + pongc_slice = c_rem - c_slice > c_slice ? 
c_slice : c_rem - c_slice; + pongc_slice_align = PAD_UP(pongc_slice, float_div); + for (int h = bin_y1; h < bin_y2; h++) { + src_offset = (h * width + bin_x1) * channels + c_offset; + nram_offset = + (h - bin_y1) * bin_wdim * pongc_slice_align + half_offset; + __memcpy_async((T *)nram_pong + nram_offset, + (T *)input_base + src_offset + c_slice, + pongc_slice * t_size, GDRAM2NRAM, + pongc_slice_align * t_size, channels * t_size, + bin_wdim - 1); + } + } else if (bin_i + 1 < bins_loop) { + getRoiBinInfo((T *)input_v, (T *)rois_v, bin_i + 1, height, width, + channels, p_height, p_width, (T)spatial_scale, &pbin_x1, + &pbin_y1, &pbin_x2, &pbin_y2, &pbin_wdim, &pbin_hdim, + &pbin_dims, &input_base, &pong_is_empty); + pongc_slice = PAD_DOWN(nram_limit / pbin_dims, float_div); + pongc_slice = pongc_slice > channels ? channels : pongc_slice; + pongc_slice_align = PAD_UP(pongc_slice, float_div); + if (!pong_is_empty) { + for (int h = pbin_y1; h < pbin_y2; h++) { + src_offset = (h * width + pbin_x1) * channels; + nram_offset = + (h - pbin_y1) * pbin_wdim * pongc_slice_align + half_offset; + if (pongc_slice_align == channels) { + __memcpy_async((T *)nram_pong + nram_offset, + (T *)input_base + src_offset, + pbin_wdim * pongc_slice * t_size, GDRAM2NRAM); + } else { + __memcpy_async((T *)nram_pong + nram_offset, + (T *)input_base + src_offset, pongc_slice * t_size, + GDRAM2NRAM, pongc_slice_align * t_size, + channels * t_size, pbin_wdim - 1); + } + } + } + } + + if (is_empty) { + __bang_write_value((T *)nram_out, c_slice_align, (T)0); + __memcpy((T *)output_base + dst_offset + c_offset, (T *)nram_out, + c_slice * t_size, NRAM2GDRAM); + if (NULL != argmax) { + __bang_write_value((int32_t *)nram_out, c_slice_align, (int32_t)(-1)); + __memcpy((int32_t *)argmax_base + dst_offset + c_offset, + (int32_t *)nram_out, c_slice * sizeof(int32_t), NRAM2GDRAM); + } + } else { + if (is_half) { + uint32_t bin_align64 = PAD_UP(bin_dims * c_slice_align, half_div); + __bang_half2float((float 
*)nram_ping, (half *)nram_ping + half_offset, + bin_align64); + } + __bang_maxpool((float *)nram_out, (float *)nram_ping, c_slice_align, + bin_hdim, bin_wdim, bin_hdim, bin_wdim, 1, 1); + if (is_half) { + uint32_t c_align64 = PAD_UP(c_slice_align, half_div); + __bang_float2half_rd((half *)nram_out, (float *)nram_out, c_align64); + } + __memcpy((T *)output_base + dst_offset + c_offset, (T *)nram_out, + c_slice * t_size, NRAM2GDRAM); + if (NULL != argmax) { + /*compute max_index*/ + __bang_maxpool_index((uint32_t *)nram_out, (float *)nram_ping, + c_slice_align, bin_hdim, bin_wdim, bin_hdim, + bin_wdim, 1, 1); + convertInt2Float((float *)nram_argmax, (float *)nram_a, + (int32_t *)nram_out, (float *)nram_b, c_slice_align); + + /*compute input_h*/ + for (int i = 0; i < c_slice; i++) { + nram_out[i] = (float)(((uint32_t *)nram_out)[i] / bin_wdim); + } + __bang_add_scalar((float *)nram_a, (float *)nram_out, (float)bin_y1, + c_slice_align); + __bang_mul_scalar((float *)nram_ping, (float *)nram_a, (float)width, + c_slice_align); + + /*compute input_w*/ + __bang_mul_scalar((float *)nram_a, (float *)nram_out, (float)bin_wdim, + c_slice_align); + __bang_sub((float *)nram_a, (float *)nram_argmax, (float *)nram_a, + c_slice_align); + __bang_add_scalar((float *)nram_a, (float *)nram_a, (float)bin_x1, + c_slice_align); + __bang_add((float *)nram_out, (float *)nram_ping, (float *)nram_a, + c_slice_align); + convertFloat2Int((int32_t *)nram_argmax, (float *)nram_a, + (float *)nram_out, (float *)nram_b, c_slice_align); + __memcpy((int32_t *)argmax_base + dst_offset + c_offset, + (int32_t *)nram_argmax, c_slice * sizeof(int32_t), + NRAM2GDRAM); + } + } + nram_tmp = nram_ping; + nram_ping = nram_pong; + nram_pong = nram_tmp; + c_offset += c_slice; + c_rem -= c_slice; + __asm__ volatile("sync;"); + } + dst_offset += channels; + is_first_bin = false; + } +} + +__mlu_global__ void MLUKernelRoiPool(cnrtDataType_t data_type, + const void *input_data, + const void *input_rois, int batch, + 
int channels, int height, int width, + int pooled_height, int pooled_width, + int rois_num, float spatial_scale, + void *output_data, int *argmax) { + switch (data_type) { + case CNRT_FLOAT16: { + MLUUnion1Roipool((half *)input_data, (half *)input_rois, batch, channels, + height, width, pooled_height, pooled_width, rois_num, + (half)spatial_scale, (half *)output_data, argmax); + }; break; + case CNRT_FLOAT32: { + MLUUnion1Roipool((float *)input_data, (float *)input_rois, batch, + channels, height, width, pooled_height, pooled_width, + rois_num, (float)spatial_scale, (float *)output_data, + argmax); + }; break; + default: { break; } + } +} +} // namespace forward + +namespace backward { +// Convert index of argmax from global grads_image to local bin in RoI. Vector +// operations do not support int type, so conversion from int to float is +// performed here. +__mlu_func__ void convertIndex( + int32_t *nram_argmax, int32_t *nram_argmax_fp, int32_t *nram_argmax_fp_bk1, + int32_t *nram_argmax_fp_bk2, int32_t *nram_argmax_int, + int32_t *nram_argmax_int_h, int32_t *nram_argmax_int_w, + int32_t *nram_argmax_fp_h, int32_t *nram_argmax_fp_w, + float *nram_atomic_add, float *nram_grads_image, int width, int height, + int wstart, int hstart, int w_compute, int h_compute, int align_c, + int channels, int loop_flag, int loop_id, int true_limit) { + convertInt2Float((float *)nram_argmax_fp, (float *)nram_argmax_fp_bk1, + (int *)nram_argmax, (float *)nram_argmax_fp_bk2, align_c); + + // This step uses scalar division, because the above vector division causes + // rounding accuracy problem. + for (int i = 0; i < channels; ++i) { + *((float *)nram_argmax_fp + i) = *((float *)nram_argmax_fp + i) / width; + } + + // Use 'float2int_tz' to perform '*((int32_t*)nram_argmax + i) / width' + // operation. 
+ convertFloat2Int((int *)nram_argmax_int_h, (float *)nram_argmax_fp_bk1, + (float *)nram_argmax_fp, (float *)nram_argmax_fp_bk2, + align_c); + convertInt2Float((float *)nram_argmax_fp, (float *)nram_argmax_fp_bk1, + (int *)nram_argmax_int_h, (float *)nram_argmax_fp_bk2, + align_c); + + // Perform 'temp_result - hstart' operation + __bang_sub_scalar((float *)nram_argmax_fp_h, (float *)nram_argmax_fp, hstart, + align_c); + + // Perform 'temp_result1 - temp_result2 * width' operation + __bang_mul_scalar((float *)nram_argmax_fp_w, (float *)nram_argmax_fp, width, + align_c); + convertInt2Float((float *)nram_argmax_fp, (float *)nram_argmax_fp_bk1, + (int *)nram_argmax, (float *)nram_argmax_fp_bk2, align_c); + __bang_sub((float *)nram_argmax_fp_w, (float *)nram_argmax_fp, + (float *)nram_argmax_fp_w, align_c); + + // Perform 'temp_result - wstart' operation + __bang_sub_scalar((float *)nram_argmax_fp_w, (float *)nram_argmax_fp_w, + wstart, align_c); + + // Perform 'temp_result = h * w_compute + w' operation + __bang_mul_scalar((float *)nram_argmax_fp_h, (float *)nram_argmax_fp_h, + w_compute, align_c); + __bang_add((float *)nram_argmax_fp_h, (float *)nram_argmax_fp_h, + (float *)nram_argmax_fp_w, align_c); + + if (loop_flag == 1) { + __bang_sub_scalar((float *)nram_argmax_fp_h, (float *)nram_argmax_fp_h, + (loop_id * true_limit), align_c); + } + convertFloat2Int((int *)nram_argmax_int, (float *)nram_argmax_fp_bk1, + (float *)nram_argmax_fp_h, (float *)nram_argmax_fp_bk2, + align_c); +} + +template +__mlu_func__ void MLUUnion1Roipool(const T *rois, const T *grads, + const int32_t *argmax, T *grads_image, + int channels, int height, int width, + int pooled_height, int pooled_width, + int rois_num, const T spatial_scale, + int high_precision) { + // Calculate the number of rois processed by each core + int bin_num = rois_num * pooled_height * pooled_width; + int loop = + (bin_num % taskDim) ? 
(bin_num / taskDim + 1) : (bin_num / taskDim); + int tid = taskId * loop; + if (bin_num % taskDim != 0) { + if (tid >= bin_num) { + return; + } else { + // last part is (bin_num - tid). + loop = bin_num - tid < loop ? bin_num - tid : loop; + } + } + int align_c = PAD_UP(channels, ALIGN_SIZE); + // Common part has 2: grads, argmax; ping-pong each is PIPELINE_PINGPONG_NUM. + int data_size = + PAD_DOWN(((MAX_NRAM_SIZE / sizeof(float) - PIPELINE_COMMON_NUM * align_c - + (PIPELINE_PINGPONG_NUM - 1) * align_c * 2) / + 2), + ALIGN_SIZE); + int hw_limit = data_size / align_c; + float *nram_grads = (float *)nram_buffer; + for (int idx = tid; idx < tid + loop; ++idx) { + // (n, ph, pw) is a C in the pooled output + int pw = idx % pooled_width; + int ph = (idx / pooled_width) % pooled_height; + int n = idx / pooled_width / pooled_height; + + const T *offset_rois = (const T *)(rois + n * 5); + int roi_batch_ind = int(offset_rois[0]); + // Calculate the roi region on feature maps + int roi_start_w = round(offset_rois[1] * spatial_scale); + int roi_start_h = round(offset_rois[2] * spatial_scale); + int roi_end_w = round(offset_rois[3] * spatial_scale); + int roi_end_h = round(offset_rois[4] * spatial_scale); + // Force malformed rois to 1x1 + int roi_width = + roi_end_w - roi_start_w + 1 > 1 ? roi_end_w - roi_start_w + 1 : 1; + int roi_height = + roi_end_h - roi_start_h + 1 > 1 ? roi_end_h - roi_start_h + 1 : 1; + T bin_size_h = (T)roi_height / (T)pooled_height; + T bin_size_w = (T)roi_width / (T)pooled_width; + + // The corresponding bin region + int hstart = int(floor((T)ph * bin_size_h)); + int wstart = int(floor((T)pw * bin_size_w)); + int hend = int(ceil((T)(ph + 1) * bin_size_h)); + int wend = int(ceil((T)(pw + 1) * bin_size_w)); + + // Add roi offsets and clip to input boundaries, min(max(A, B), C); + hstart = hstart + roi_start_h > 0 ? hstart + roi_start_h : 0; + hstart = hstart < height ? hstart : height; + hend = hend + roi_start_h > 0 ? 
hend + roi_start_h : 0; + hend = hend < height ? hend : height; + wstart = wstart + roi_start_w > 0 ? wstart + roi_start_w : 0; + wstart = wstart < width ? wstart : width; + wend = wend + roi_start_w > 0 ? wend + roi_start_w : 0; + wend = wend < width ? wend : width; + + bool is_empty = (hend <= hstart) || (wend <= wstart); + if (!is_empty) { + int h_compute = hend - hstart; + int w_compute = wend - wstart; + int true_limit = + hw_limit < h_compute * w_compute ? hw_limit : h_compute * w_compute; + int loop_int = (h_compute * w_compute) / true_limit; + int rem = (h_compute * w_compute) % true_limit; + int32_t *nram_argmax = (int32_t *)nram_grads + align_c; + int32_t *nram_argmax_fp = (int32_t *)nram_argmax + align_c; + int32_t *nram_argmax_fp_bk1 = (int32_t *)nram_argmax_fp + align_c; + int32_t *nram_argmax_fp_bk2 = (int32_t *)nram_argmax_fp_bk1 + align_c; + int32_t *nram_argmax_int = (int32_t *)nram_argmax_fp_bk2 + align_c; + int32_t *nram_argmax_int_h = (int32_t *)nram_argmax_int + align_c; + int32_t *nram_argmax_int_w = (int32_t *)nram_argmax_int_h + align_c; + int32_t *nram_argmax_fp_h = (int32_t *)nram_argmax_int_w + align_c; + int32_t *nram_argmax_fp_w = (int32_t *)nram_argmax_fp_h + align_c; + float *nram_atomic_add = (float *)nram_argmax_fp_w + align_c; + float *nram_grads_image = (float *)nram_atomic_add + align_c; + if (true_limit == h_compute * w_compute) { + /* + * NRAM partition + * |---------------------------------------------------| + * | grads | + * |---------------------------------------------------| + * | argmax | + * |---------------------------------------------------| + * | argmax_temp | + * |---------------------------------------------------| + * | atomic_add | + * |---------------------------------------------------| + * | grads_image | + * |---------------------------------------------------| + */ + + // Load the data from GDRAM to NRAM. 
+ __memcpy( + (T *)nram_grads + align_c * high_precision, + (const T *)grads + + (n * pooled_height * pooled_width + ph * pooled_width + pw) * + channels, + channels * sizeof(T), GDRAM2NRAM); + if (high_precision) { + __bang_half2float((float *)nram_grads, + (half *)nram_grads + align_c * high_precision, + align_c); + } + + __memcpy((int32_t *)nram_argmax, (const int32_t *)argmax + + (n * pooled_height * pooled_width + + ph * pooled_width + pw) * + channels, + channels * sizeof(int32_t), GDRAM2NRAM); + + // Perform pooling operation on NRAM. + convertIndex(nram_argmax, nram_argmax_fp, nram_argmax_fp_bk1, + nram_argmax_fp_bk2, nram_argmax_int, nram_argmax_int_h, + nram_argmax_int_w, nram_argmax_fp_h, nram_argmax_fp_w, + nram_atomic_add, nram_grads_image, width, height, wstart, + hstart, w_compute, h_compute, align_c, channels, 0, 0, 0); + __bang_maxpool_bp((float *)nram_grads_image, (float *)nram_grads, + (int32_t *)nram_argmax_int, align_c, h_compute, + w_compute, h_compute, w_compute, h_compute, + w_compute); + if (high_precision) { + __bang_float2half_rd((half *)nram_grads_image, + (float *)nram_grads_image, + h_compute * w_compute * align_c); + } + + // Store the result on NRAM back to GDRAM. 
+ for (int hc = 0; hc < h_compute; ++hc) { + for (int wc = 0; wc < w_compute; ++wc) { + T *dst = (T *)nram_atomic_add; + int grad_image_offset = (roi_batch_ind * height * width + + (hc + hstart) * width + wc + wstart) * + channels; + T *src1 = (T *)grads_image + grad_image_offset; + int nram_grads_image_offset = (hc * w_compute + wc) * align_c; + T *src2 = (T *)nram_grads_image + nram_grads_image_offset; + __bang_atomic_add(dst, src1, src2, channels); + } + } + } else if (true_limit > 0) { + /* + * NRAM partition + * |---------------------------------------------------| + * | grads | + * |---------------------------------------------------| + * | argmax | + * |--------------------ping_pong----------------------| + * | argmax_temp | argmax_temp | + * |------------------------|--------------------------| + * | atomic_add | atomic_add | + * |------------------------|--------------------------| + * | grads_image | grads_image | + * |---------------------------------------------------| + */ + + // Load the data from GDRAM to NRAM. + __memcpy( + (T *)nram_grads + align_c * high_precision, + (const T *)grads + + (n * pooled_height * pooled_width + ph * pooled_width + pw) * + channels, + channels * sizeof(T), GDRAM2NRAM); + if (high_precision) { + __bang_half2float((float *)nram_grads, + (half *)nram_grads + align_c * high_precision, + align_c); + } + __memcpy((int32_t *)nram_argmax, (const int32_t *)argmax + + (n * pooled_height * pooled_width + + ph * pooled_width + pw) * + channels, + channels * sizeof(int32_t), GDRAM2NRAM); + + int ping_pong = 0; + int ping_pong_offset = + (MAX_NRAM_SIZE / sizeof(float) - align_c * PIPELINE_COMMON_NUM) / 2; + for (int loop_id = 0; loop_id <= loop_int; ++loop_id) { + int size = (loop_id == loop_int) ? rem : true_limit; + if (size == 0) { + break; + } + // Perform pooling operation on NRAM. 
+ nram_argmax_fp = + (int32_t *)nram_argmax + align_c + ping_pong * ping_pong_offset; + nram_argmax_fp_bk1 = (int32_t *)nram_argmax_fp + align_c; + nram_argmax_fp_bk2 = (int32_t *)nram_argmax_fp_bk1 + align_c; + nram_argmax_int = (int32_t *)nram_argmax_fp_bk2 + align_c; + nram_argmax_int_h = (int32_t *)nram_argmax_int + align_c; + nram_argmax_int_w = (int32_t *)nram_argmax_int_h + align_c; + nram_argmax_fp_h = (int32_t *)nram_argmax_int_w + align_c; + nram_argmax_fp_w = (int32_t *)nram_argmax_fp_h + align_c; + nram_atomic_add = (float *)nram_argmax_fp_w + align_c; + nram_grads_image = (float *)nram_atomic_add + align_c; + int loop_id_1 = loop_id; + int size_1 = ((loop_id_1) == loop_int) ? rem : true_limit; + if (size_1 == 0) { + break; + } + convertIndex(nram_argmax, nram_argmax_fp, nram_argmax_fp_bk1, + nram_argmax_fp_bk2, nram_argmax_int, nram_argmax_int_h, + nram_argmax_int_w, nram_argmax_fp_h, nram_argmax_fp_w, + nram_atomic_add, nram_grads_image, width, height, wstart, + hstart, w_compute, h_compute, align_c, channels, 1, + loop_id_1, true_limit); + __bang_maxpool_bp((float *)nram_grads_image, (float *)nram_grads, + (int32_t *)nram_argmax_int, align_c, size_1, 1, + size_1, 1, size_1, 1); + if (high_precision) { + __bang_float2half_rd((half *)nram_grads_image, + (float *)nram_grads_image, size_1 * align_c); + } + + // Store the result on NRAM back to GDRAM. 
+ for (int index_size = 0; index_size < size; ++index_size) { + int h = (loop_id * true_limit + index_size) / w_compute; + int w = (loop_id * true_limit + index_size) % w_compute; + T *dst = (T *)nram_atomic_add; + T *grads_image_n = + (T *)grads_image + roi_batch_ind * height * width * channels; + T *src1 = (T *)grads_image_n + + ((h + hstart) * width + (w + wstart)) * channels; + T *src2 = (T *)nram_grads_image + index_size * align_c; + __bang_atomic_add(dst, src1, src2, channels); + } + ping_pong = 1 - ping_pong; + } + } else { + /* + * NRAM partition + * |---------------------------------------------------| + * | grads | + * |---------------------------------------------------| + * | argmax | + * |--------------------ping_pong----------------------| + * | argmax_temp | argmax_temp | + * |------------------------|--------------------------| + * | atomic_add | atomic_add | + * |------------------------|--------------------------| + * | grads_image | grads_image | + * |---------------------------------------------------| + */ + + int c_limit = + PAD_DOWN(MAX_NRAM_SIZE / sizeof(float) / + (PIPELINE_COMMON_NUM + PIPELINE_PINGPONG_NUM * 2), + ALIGN_SIZE); + int loop_int = channels / c_limit; + int rem = channels % c_limit; + int ping_pong = 0; + int ping_pong_offset = + (MAX_NRAM_SIZE / sizeof(float) - c_limit * PIPELINE_COMMON_NUM) / 2; + for (int loop_id = 0; loop_id <= loop_int; ++loop_id) { + int size = (loop_id == loop_int) ? 
rem : c_limit; + if (size == 0) { + break; + } + nram_argmax_fp = + (int32_t *)nram_argmax + c_limit + ping_pong * ping_pong_offset; + nram_argmax_fp_bk1 = (int32_t *)nram_argmax_fp + c_limit; + nram_argmax_fp_bk2 = (int32_t *)nram_argmax_fp_bk1 + c_limit; + nram_argmax_int = (int32_t *)nram_argmax_fp_bk2 + c_limit; + nram_argmax_int_h = (int32_t *)nram_argmax_int + c_limit; + nram_argmax_int_w = (int32_t *)nram_argmax_int_h + c_limit; + nram_argmax_fp_h = (int32_t *)nram_argmax_int_w + c_limit; + nram_argmax_fp_w = (int32_t *)nram_argmax_fp_h + c_limit; + nram_atomic_add = (float *)nram_argmax_fp_w + c_limit; + nram_grads_image = (float *)nram_atomic_add + c_limit; + + // This pipeline loads the data from GDRAM to NRAM. + __memcpy((T *)nram_grads + c_limit * high_precision, + (const T *)grads + + n * pooled_height * pooled_width * channels + + ph * pooled_width * channels + pw * channels + + loop_id * c_limit, + size * sizeof(T), GDRAM2NRAM); + if (high_precision) { + __bang_half2float((float *)nram_grads, + (half *)nram_grads + c_limit * high_precision, + c_limit); + } + __memcpy((int32_t *)nram_argmax, + (const int32_t *)argmax + + n * pooled_height * pooled_width * channels + + ph * pooled_width * channels + pw * channels + + loop_id * c_limit, + size * sizeof(int32_t), GDRAM2NRAM); + + for (int hc = 0; hc < h_compute; ++hc) { + for (int wc = 0; wc < w_compute; ++wc) { + // This pipeline performs pooling operation on NRAM. 
+ convertIndex( + nram_argmax, nram_argmax_fp, nram_argmax_fp_bk1, + nram_argmax_fp_bk2, nram_argmax_int, nram_argmax_int_h, + nram_argmax_int_w, nram_argmax_fp_h, nram_argmax_fp_w, + nram_atomic_add, nram_grads_image, width, height, wstart + wc, + hstart + hc, h_compute, w_compute, c_limit, size, 0, 0, 0); + __bang_maxpool_bp((float *)nram_grads_image, (float *)nram_grads, + (int32_t *)nram_argmax_int, c_limit, 1, 1, 1, 1, + 1, 1); + if (high_precision) { + __bang_float2half_rd((half *)nram_grads_image, + (float *)nram_grads_image, c_limit); + } + // This pipeline stores the result on NRAM back to GDRAM. + T *dst = (T *)nram_atomic_add; + T *grads_image_n = + (T *)grads_image + roi_batch_ind * height * width * channels; + T *src1 = (T *)grads_image_n + + ((hc + hstart) * width + (wc + wstart)) * channels + + loop_id * c_limit; + T *src2 = (T *)nram_grads_image; + __bang_atomic_add(dst, src1, src2, size); + } + } + ping_pong = 1 - ping_pong; + } + } + } + } +} + +__mlu_global__ void MLUKernelRoiPoolBackward( + const void *grads, const void *rois, const int *argmax, void *grads_image, + int rois_num, int pooled_height, int pooled_width, int channels, int no, + int height, int width, const float spatial_scale, + const cnrtDataType_t k_dtype) { + // make sure that memcore is not used + if (coreId == 0x80) { + return; + } + switch (k_dtype) { + case CNRT_FLOAT16: { + // Using the float type '__bang_max_pool_bp' instruction to increase the + // bit width. 
+ const int high_precision = 1; + MLUUnion1Roipool((const half *)rois, (const half *)grads, + (const int32_t *)argmax, (half *)grads_image, channels, + height, width, pooled_height, pooled_width, rois_num, + (const half)spatial_scale, high_precision); + }; break; + case CNRT_FLOAT32: { + const int high_precision = 0; + MLUUnion1Roipool((const float *)rois, (const float *)grads, + (const int32_t *)argmax, (float *)grads_image, channels, + height, width, pooled_height, pooled_width, rois_num, + (const float)spatial_scale, high_precision); + }; break; + default: { break; } + } +} +} // namespace backward + +void KernelRoiPoolForward(cnrtDim3_t k_dim, cnrtFunctionType_t k_type, + cnrtQueue_t queue, cnrtDataType_t data_type, + const void *input_data, const void *input_rois, + const int batch, const int channels, const int height, + const int width, const int pooled_height, + const int pooled_width, const int rois_num, + const float spatial_scale, void *output_data, + int *argmax) { + forward::MLUKernelRoiPool<<>>( + data_type, input_data, input_rois, batch, channels, height, width, + pooled_height, pooled_width, rois_num, spatial_scale, output_data, + argmax); +} + +void KernelRoiPoolBackward(cnrtDim3_t k_dim, cnrtFunctionType_t k_type, + cnrtQueue_t queue, cnrtDataType_t k_dtype, + const void *grad_output_ptr, const void *rois_ptr, + const int *argmax_ptr, void *grad_input_ptr, + const int box_num, const int pooled_height, + const int pooled_width, const int channels, + const int batch, const int height, const int width, + const float spatial_scale) { + backward::MLUKernelRoiPoolBackward<<>>( + grad_output_ptr, rois_ptr, argmax_ptr, grad_input_ptr, box_num, + pooled_height, pooled_width, channels, batch, height, width, + spatial_scale, k_dtype); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/roiaware_pool3d_mlu_kernel.mlu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/roiaware_pool3d_mlu_kernel.mlu new file mode 100644 index 
0000000000000000000000000000000000000000..4c1edf0bf53a58e71789f6db91a1ac2917f7ee10 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/roiaware_pool3d_mlu_kernel.mlu @@ -0,0 +1,747 @@ +/************************************************************************* + * Copyright (C) 2022 Cambricon. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + *************************************************************************/ + +#include "common_mlu_helper.hpp" + +#define ROI_OFFSET 7 +#define FLOAT_NRAM_BUFFER_NUM 14 +#define HALF_NRAM_BUFFER_NUM 25 +#define ALIGN_NUM 64 + +__nram__ char data_nram[MAX_NRAM_SIZE]; + +template +__mlu_global__ void MLUUnion1KernelPtsIdxOfVoxels( + const int pool_method, const int boxes_num, const int pts_num, + const int max_pts_each_voxel, const int out_x, const int out_y, + const int out_z, const T *rois, const T *pts, int *pts_idx_of_voxels) { + // params (T)rois: (boxes_num, 7) + // params (T)pts: (3, pts_num) + // params (int)pts_idx_of_voxels: (boxes_num, out_x, out_y, out_z, + // max_pts_each_voxel) + + // make sure that memcore is not used + if (coreId == 0x80) { + return; + } + int nram_pts_num = 0; + if (sizeof(T) == sizeof(float)) { + nram_pts_num = PAD_DOWN( + (MAX_NRAM_SIZE / sizeof(float) / FLOAT_NRAM_BUFFER_NUM), ALIGN_NUM); + } else { + nram_pts_num = PAD_DOWN( + (MAX_NRAM_SIZE / sizeof(half) / HALF_NRAM_BUFFER_NUM), ALIGN_NUM); + } + + char *X = NULL; + char *Y = NULL; + char *Z = NULL; + char *local_X = NULL; + char *local_Y = NULL; + char *local_Z = NULL; + char 
*nram_pts_in_flag = NULL; + float *temp_buffer1 = NULL; + float *temp_buffer2 = NULL; + float *temp_buffer3 = NULL; + float *temp_buffer4 = NULL; + float *temp_buffer5 = NULL; + float *nram_voxel_offset = NULL; + int *nram_pts_idx_seq = NULL; + float *fp_local_X = NULL; + float *fp_local_Y = NULL; + float *fp_local_Z = NULL; + float *fp_nram_pts_in_flag = NULL; + if (sizeof(T) == sizeof(float)) { + X = (char *)((float *)data_nram); + Y = (char *)((float *)data_nram + nram_pts_num); + Z = (char *)((float *)data_nram + nram_pts_num * 2); + local_X = (char *)((float *)data_nram + nram_pts_num * 3); + local_Y = (char *)((float *)data_nram + nram_pts_num * 4); + local_Z = (char *)((float *)data_nram + nram_pts_num * 5); + nram_pts_in_flag = (char *)((float *)data_nram + nram_pts_num * 6); + temp_buffer1 = (float *)data_nram + nram_pts_num * 7; + temp_buffer2 = (float *)data_nram + nram_pts_num * 8; + temp_buffer3 = (float *)data_nram + nram_pts_num * 9; + temp_buffer4 = (float *)data_nram + nram_pts_num * 10; + temp_buffer5 = (float *)data_nram + nram_pts_num * 11; + nram_voxel_offset = (float *)data_nram + nram_pts_num * 12; + nram_pts_idx_seq = (int *)((float *)data_nram + nram_pts_num * 13); + fp_local_X = (float *)local_X; + fp_local_Y = (float *)local_Y; + fp_local_Z = (float *)local_Z; + fp_nram_pts_in_flag = (float *)nram_pts_in_flag; + } else { + X = (char *)((half *)data_nram); + Y = (char *)((half *)data_nram + nram_pts_num); + Z = (char *)((half *)data_nram + nram_pts_num * 2); + local_X = (char *)((half *)data_nram + nram_pts_num * 4); + local_Y = (char *)((half *)data_nram + nram_pts_num * 6); + local_Z = (char *)((half *)data_nram + nram_pts_num * 8); + nram_pts_in_flag = (char *)((half *)data_nram + nram_pts_num * 10); + temp_buffer1 = (float *)((half *)data_nram + nram_pts_num * 11); + temp_buffer2 = (float *)((half *)data_nram + nram_pts_num * 13); + temp_buffer3 = (float *)((half *)data_nram + nram_pts_num * 15); + temp_buffer4 = (float *)((half 
*)data_nram + nram_pts_num * 17); + temp_buffer5 = (float *)((half *)data_nram + nram_pts_num * 19); + nram_voxel_offset = (float *)((half *)data_nram + nram_pts_num * 21); + nram_pts_idx_seq = (int *)((half *)data_nram + nram_pts_num * 23); + fp_local_X = (float *)((half *)local_X - nram_pts_num); + fp_local_Y = (float *)((half *)local_Y - nram_pts_num); + fp_local_Z = (float *)((half *)local_Z - nram_pts_num); + fp_nram_pts_in_flag = (float *)((half *)nram_pts_in_flag - nram_pts_num); + } + + for (int i = 0; i < nram_pts_num; i++) { + nram_pts_idx_seq[i] = i; + } + + int nram_pts_loop_times = pts_num / nram_pts_num; + int rem_nram_num = pts_num % nram_pts_num; + + for (int roi_index = taskId; roi_index < boxes_num; roi_index += taskDim) { + const T *cur_roi = rois + roi_index * ROI_OFFSET; + T cx = cur_roi[0]; + T cy = cur_roi[1]; + T cz = cur_roi[2]; + T dx = cur_roi[3]; + T dy = cur_roi[4]; + T dz = cur_roi[5]; + T rz = cur_roi[6]; + + T dx_2 = dx / 2.0; + T dy_2 = dy / 2.0; + T dz_2 = dz / 2.0; + + for (int loop_idx = 0; loop_idx <= nram_pts_loop_times; loop_idx++) { + int load_pts_num = + (loop_idx == nram_pts_loop_times) ? rem_nram_num : nram_pts_num; + if (load_pts_num == 0) { + break; + } + int pts_offset_cur_loop = nram_pts_num * loop_idx; + int compute_pts_num = (loop_idx == nram_pts_loop_times) + ? 
PAD_UP(rem_nram_num, ALIGN_NUM) + : nram_pts_num; + // load pts + __memcpy((void *)X, (T *)pts + pts_offset_cur_loop, + load_pts_num * sizeof(T), GDRAM2NRAM); + __memcpy((void *)Y, (T *)pts + pts_num + pts_offset_cur_loop, + load_pts_num * sizeof(T), GDRAM2NRAM); + __memcpy((void *)Z, (T *)pts + pts_num * 2 + pts_offset_cur_loop, + load_pts_num * sizeof(T), GDRAM2NRAM); + // fabs(local_z) + __bang_sub_scalar((T *)local_Z, (T *)Z, (T)cz, compute_pts_num); + __bang_sub_scalar((T *)temp_buffer1, (T *)Z, (T)(cz + dz_2), + compute_pts_num); + __bang_active_abs((T *)temp_buffer1, (T *)temp_buffer1, compute_pts_num); +#if __BANG_ARCH__ >= 322 + __bang_le_scalar((T *)nram_pts_in_flag, (T *)temp_buffer1, (T)(dz_2), + compute_pts_num); +#else + __bang_write_value((void *)temp_buffer2, compute_pts_num, (T)(dz_2)); + __bang_le((T *)nram_pts_in_flag, (T *)temp_buffer1, (T *)temp_buffer2, + compute_pts_num); +#endif + T cosa = std::cos(-rz); + T sina = std::sin(-rz); + __bang_sub_scalar((T *)temp_buffer3, (T *)X, (T)cx, compute_pts_num); + __bang_sub_scalar((T *)temp_buffer4, (T *)Y, (T)cy, compute_pts_num); + __bang_mul_scalar((T *)temp_buffer1, (T *)temp_buffer3, (T)cosa, + compute_pts_num); + __bang_mul_scalar((T *)temp_buffer2, (T *)temp_buffer4, (T)sina, + compute_pts_num); + // local_x + __bang_sub((T *)local_X, (T *)temp_buffer1, (T *)temp_buffer2, + compute_pts_num); + // fabs(local_x) + __bang_active_abs((T *)temp_buffer1, (T *)local_X, compute_pts_num); + // fabs(local_x) < dx/2 ? 
1 : 0 +#if __BANG_ARCH__ >= 322 + __bang_lt_scalar((T *)temp_buffer1, (T *)temp_buffer1, (T)(dx_2), + compute_pts_num); +#else + __bang_write_value((void *)temp_buffer2, compute_pts_num, (T)(dx_2)); + __bang_lt((T *)temp_buffer1, (T *)temp_buffer1, (T *)temp_buffer2, + compute_pts_num); +#endif + __bang_and((T *)nram_pts_in_flag, (T *)nram_pts_in_flag, + (T *)temp_buffer1, + compute_pts_num); // flush res + + __bang_mul_scalar((T *)temp_buffer1, (T *)temp_buffer3, (T)sina, + compute_pts_num); + __bang_mul_scalar((T *)temp_buffer2, (T *)temp_buffer4, (T)cosa, + compute_pts_num); + // local_y + __bang_add((T *)local_Y, (T *)temp_buffer1, (T *)temp_buffer2, + compute_pts_num); + // fabs(local_y) + __bang_active_abs((T *)temp_buffer1, (T *)local_Y, compute_pts_num); + // fabs(local_y) < dy/2 ? 1 : 0 +#if __BANG_ARCH__ >= 322 + __bang_lt_scalar((T *)temp_buffer1, (T *)temp_buffer1, (T)(dy_2), + compute_pts_num); +#else + __bang_write_value((void *)temp_buffer2, compute_pts_num, (T)(dy_2)); + __bang_lt((T *)temp_buffer1, (T *)temp_buffer1, (T *)temp_buffer2, + compute_pts_num); +#endif + __bang_and((T *)nram_pts_in_flag, (T *)nram_pts_in_flag, + (T *)temp_buffer1, + compute_pts_num); // flush res + T x_res = dx / out_x; + T y_res = dy / out_y; + T z_res = dz / out_z; + __bang_add_scalar((T *)local_X, (T *)local_X, (T)(dx_2), compute_pts_num); + __bang_add_scalar((T *)local_Y, (T *)local_Y, (T)(dy_2), compute_pts_num); + // local_Z do not need to add dz/2.0 + +#if (__BANG_ARCH__ >= 322) && (__BANG_ARCH__ != 372) + __bang_div((T *)local_X, (T *)local_X, (T)x_res, compute_pts_num); + __bang_div((T *)local_Y, (T *)local_Y, (T)y_res, compute_pts_num); + __bang_div((T *)local_Z, (T *)local_Z, (T)z_res, compute_pts_num); +#else + __bang_mul_scalar((T *)local_X, (T *)local_X, (T)(1 / x_res), + compute_pts_num); + __bang_mul_scalar((T *)local_Y, (T *)local_Y, (T)(1 / y_res), + compute_pts_num); + __bang_mul_scalar((T *)local_Z, (T *)local_Z, (T)(1 / z_res), + compute_pts_num); 
+#endif + // float = float2int + int2float, half = half2int + int2float + if (sizeof(T) == sizeof(float)) { +#if __BANG_ARCH__ >= 322 + __bang_float2int32_tz((int *)temp_buffer1, (float *)local_X, + compute_pts_num, 0); + __bang_float2int32_tz((int *)temp_buffer2, (float *)local_Y, + compute_pts_num, 0); + __bang_float2int32_tz((int *)temp_buffer3, (float *)local_Z, + compute_pts_num, 0); + __bang_int322float_rn((float *)fp_local_X, (int *)temp_buffer1, + compute_pts_num, 0); + __bang_int322float_rn((float *)fp_local_Y, (int *)temp_buffer2, + compute_pts_num, 0); + __bang_int322float_rn((float *)fp_local_Z, (int *)temp_buffer3, + compute_pts_num, 0); +#else + convertFloat2Int((int *)temp_buffer1, (float *)temp_buffer2, + (float *)fp_local_X, (float *)temp_buffer3, + compute_pts_num); + convertFloat2Int((int *)temp_buffer2, (float *)temp_buffer3, + (float *)fp_local_Y, (float *)temp_buffer4, + compute_pts_num); + convertFloat2Int((int *)temp_buffer3, (float *)temp_buffer4, + (float *)fp_local_Z, (float *)temp_buffer5, + compute_pts_num); + convertInt2Float((float *)fp_local_X, (float *)temp_buffer4, + (int *)temp_buffer1, (float *)temp_buffer5, + compute_pts_num); + convertInt2Float((float *)fp_local_Y, (float *)temp_buffer4, + (int *)temp_buffer2, (float *)temp_buffer5, + compute_pts_num); + convertInt2Float((float *)fp_local_Z, (float *)temp_buffer4, + (int *)temp_buffer3, (float *)temp_buffer5, + compute_pts_num); +#endif + } else { + __bang_half2float((float *)temp_buffer4, (half *)nram_pts_in_flag, + compute_pts_num); + __bang_move((void *)fp_nram_pts_in_flag, (void *)temp_buffer4, + compute_pts_num * sizeof(float)); +#if __BANG_ARCH__ >= 322 + __bang_half2int32_tz((int *)temp_buffer1, (half *)local_X, + compute_pts_num, 0); + __bang_half2int32_tz((int *)temp_buffer2, (half *)local_Y, + compute_pts_num, 0); + __bang_half2int32_tz((int *)temp_buffer3, (half *)local_Z, + compute_pts_num, 0); + __bang_int322float_rn((float *)fp_local_X, (int *)temp_buffer1, + 
compute_pts_num, 0); + __bang_int322float_rn((float *)fp_local_Y, (int *)temp_buffer2, + compute_pts_num, 0); + __bang_int322float_rn((float *)fp_local_Z, (int *)temp_buffer3, + compute_pts_num, 0); +#else + __bang_half2int16_tz((int16_t *)temp_buffer1, (half *)local_X, + compute_pts_num, 0); + __bang_half2int16_tz((int16_t *)temp_buffer2, (half *)local_Y, + compute_pts_num, 0); + __bang_half2int16_tz((int16_t *)temp_buffer3, (half *)local_Z, + compute_pts_num, 0); + __bang_int162float((float *)fp_local_X, (int16_t *)temp_buffer1, + compute_pts_num, 0); + __bang_int162float((float *)fp_local_Y, (int16_t *)temp_buffer2, + compute_pts_num, 0); + __bang_int162float((float *)fp_local_Z, (int16_t *)temp_buffer3, + compute_pts_num, 0); +#endif + } + // process index >= 0 + __bang_write_value((float *)temp_buffer4, compute_pts_num, (float)0.0f); + __bang_maxequal((float *)fp_local_X, (float *)fp_local_X, + (float *)temp_buffer4, compute_pts_num); + __bang_maxequal((float *)fp_local_Y, (float *)fp_local_Y, + (float *)temp_buffer4, compute_pts_num); + __bang_maxequal((float *)fp_local_Z, (float *)fp_local_Z, + (float *)temp_buffer4, compute_pts_num); + // process index <= (out_x - 1) + __bang_write_value((float *)temp_buffer5, compute_pts_num, + (float)(out_x - 1)); + __bang_minequal((float *)fp_local_X, (float *)fp_local_X, + (float *)temp_buffer5, compute_pts_num); + __bang_write_value((float *)temp_buffer5, compute_pts_num, + (float)(out_y - 1)); + __bang_minequal((float *)fp_local_Y, (float *)fp_local_Y, + (float *)temp_buffer5, compute_pts_num); + __bang_write_value((float *)temp_buffer5, compute_pts_num, + (float)(out_z - 1)); + __bang_minequal((float *)fp_local_Z, (float *)fp_local_Z, + (float *)temp_buffer5, compute_pts_num); + __bang_mul_scalar((float *)temp_buffer1, (float *)fp_local_X, + (float)(out_y * out_z), compute_pts_num); + __bang_mul_scalar((float *)temp_buffer2, (float *)fp_local_Y, + (float)out_z, compute_pts_num); + __bang_mul_scalar((float 
*)temp_buffer3, (float *)fp_local_Z, (float)1.0, + compute_pts_num); + __bang_add((float *)nram_voxel_offset, (float *)temp_buffer1, + (float *)temp_buffer2, compute_pts_num); + __bang_add((float *)nram_voxel_offset, (float *)nram_voxel_offset, + (float *)temp_buffer3, compute_pts_num); + __bang_mul_scalar((float *)nram_voxel_offset, (float *)nram_voxel_offset, + (float)max_pts_each_voxel, compute_pts_num); + if (compute_pts_num != load_pts_num) { + __memset_nram((float *)fp_nram_pts_in_flag + load_pts_num, + compute_pts_num - load_pts_num, (float)0.0); + } + __bang_collect((float *)temp_buffer4, (float *)nram_pts_idx_seq, + (float *)fp_nram_pts_in_flag, compute_pts_num); + int pts_num_in_cur_roi = + (int)__bang_count((float *)fp_nram_pts_in_flag, compute_pts_num); + int *pts_idx_cur_voxels = + (int *)pts_idx_of_voxels + + roi_index * out_x * out_y * out_z * max_pts_each_voxel; + for (int idx = 0; idx < pts_num_in_cur_roi; idx++) { + int cur_pts_idx = *((int *)temp_buffer4 + idx); + int offset = (int)(*((float *)nram_voxel_offset + cur_pts_idx)); + int cnt = pts_idx_cur_voxels[offset]; + if (cnt < max_pts_each_voxel - 1) { + pts_idx_cur_voxels[offset + cnt + 1] = + cur_pts_idx + loop_idx * nram_pts_num; + pts_idx_cur_voxels[offset]++; + } + } + } + } +} + +template +__mlu_global__ void MLUUnion1KernelRoiawarePool3dForward( + const int pool_method, const int boxes_num, const int pts_num, + const int channels, const int max_pts_each_voxel, const int out_x, + const int out_y, const int out_z, const T *pts_feature, + const int *pts_idx_of_voxels, T *pooled_features, int *argmax) { + // params (T)pts_feature: (channels, pts_num) + // params (int)pts_idx_of_voxels: (boxes_num, out_x, out_y, out_z, + // max_pts_each_voxel) params (int)argmax: (boxes_num, out_x, out_y, out_z, + // channels) params (T)pooled_features: (boxes_num, out_x, out_y, out_z, + // channels) + + // make sure that memcore is not used + if (coreId == 0x80) { + return; + } + int align_num = 
NFU_ALIGN_SIZE / sizeof(T);
  int align_max_pts_each_voxel = PAD_UP(max_pts_each_voxel, align_num);
  // NRAM budget split: one int index buffer + one temp buffer of
  // align_max_pts_each_voxel, then per-channel rows of pts features plus the
  // per-channel pooled output/argmax slots; 128 bytes reserved for the
  // __bang_argmax result record.
  int nram_channels_limit =
      PAD_DOWN((MAX_NRAM_SIZE - 128 -
                align_max_pts_each_voxel * (sizeof(int) + sizeof(T))) /
                   ((align_max_pts_each_voxel + 1) * sizeof(T) + sizeof(int)),
               align_num);
  int *nram_pts_idx_cur_voxel = (int *)data_nram;
  // nram_pts_idx_cur_voxel [align_max_pts_each_voxel]
  T *nram_max_pts_feature_tmp =
      (T *)((int *)nram_pts_idx_cur_voxel + align_max_pts_each_voxel);
  // nram_max_pts_feature_tmp [align_max_pts_each_voxel]
  T *nram_pts_feature_in_voxel =
      ((T *)nram_max_pts_feature_tmp + align_max_pts_each_voxel);
  // nram_pts_feature_in_voxel [nram_channels_limit, align_max_pts_each_voxel]
  T *nram_pooled_features_cur_voxel =
      ((T *)nram_pts_feature_in_voxel +
       nram_channels_limit * align_max_pts_each_voxel);
  // nram_pooled_features_cur_voxel [nram_channels_limit]
  int *nram_argmax_cur_voxel =
      (int *)((T *)nram_pooled_features_cur_voxel + nram_channels_limit);
  // nram_argmax_cur_voxel [nram_channels_limit]
  char *one_pooled_feature =
      (char *)((int *)nram_argmax_cur_voxel + nram_channels_limit);
  // one_pooled_feature [128]
  int channels_loop_times = channels / nram_channels_limit;
  int rem_channels = channels % nram_channels_limit;
  // Voxels are distributed round-robin over MLU cores (taskId / taskDim).
  for (int voxel_index = taskId;
       voxel_index < boxes_num * out_x * out_y * out_z;
       voxel_index += taskDim) {
    int *pts_idx_cur_voxels =
        (int *)pts_idx_of_voxels + voxel_index * max_pts_each_voxel;
    __memcpy((void *)nram_pts_idx_cur_voxel, (void *)pts_idx_cur_voxels,
             max_pts_each_voxel * sizeof(int), GDRAM2NRAM);
    // Slot [0] of the per-voxel index list stores the point count; the point
    // indices themselves start at slot [1].
    int pts_num_cur_voxel = nram_pts_idx_cur_voxel[0];
    if (pts_num_cur_voxel == 0) {
      continue;
    }
    for (int channels_loop_idx = 0; channels_loop_idx <= channels_loop_times;
         channels_loop_idx++) {
      int actual_channels_num = (channels_loop_idx == channels_loop_times)
                                    ? rem_channels
                                    : nram_channels_limit;
      if (actual_channels_num == 0) {
        break;
      }
      int channels_offset = nram_channels_limit * channels_loop_idx;

#if ((__BANG_ARCH__ >= 200) && (__BANG_ARCH__ < 300))
      // mlu200 path below runs __bang_max over the full aligned row, so the
      // padding must be pre-filled with -INF to not win the max.
      int compute_channels_num = (channels_loop_idx == channels_loop_times)
                                     ? PAD_UP(rem_channels, align_num)
                                     : nram_channels_limit;
      if (pool_method == 0) {
        __bang_write_value((void *)nram_pts_feature_in_voxel,
                           compute_channels_num * align_max_pts_each_voxel,
                           (T)-INFINITY);
      }
#endif

      T *pts_feature_cur_loop = (T *)pts_feature + channels_offset * pts_num;
      // Strided gather: for each point in the voxel, copy one scalar per
      // channel (dst stride = aligned row, src stride = pts_num).
      for (int idx = 0; idx < pts_num_cur_voxel; idx++) {
        __memcpy((T *)nram_pts_feature_in_voxel + idx,
                 (T *)pts_feature_cur_loop + nram_pts_idx_cur_voxel[idx + 1],
                 sizeof(T), GDRAM2NRAM, align_max_pts_each_voxel * sizeof(T),
                 pts_num * sizeof(T), actual_channels_num - 1);
      }
      for (int channel_idx = 0; channel_idx < actual_channels_num;
           channel_idx++) {
        if (pool_method == 0) {
          // Max pooling: find max value and its point index per channel.
#if __BANG_ARCH__ >= 322
          __bang_argmax((T *)one_pooled_feature,
                        (T *)nram_pts_feature_in_voxel +
                            channel_idx * align_max_pts_each_voxel,
                        pts_num_cur_voxel);
          T max_val = ((T *)one_pooled_feature)[0];
          int max_idx = (int)(*(uint32_t *)((T *)one_pooled_feature + 1));
          // Empty/-INF result maps to output 0 with argmax -1.
          nram_pooled_features_cur_voxel[channel_idx] =
              (max_val == -INFINITY) ? 0 : max_val;
          nram_argmax_cur_voxel[channel_idx] =
              (max_val == -INFINITY) ? -1 : nram_pts_idx_cur_voxel[max_idx + 1];
#else
          // __bang_max need align num on mlu200 series
          if (sizeof(T) == sizeof(float)) {
            __bang_max((float *)one_pooled_feature,
                       (float *)nram_pts_feature_in_voxel +
                           channel_idx * align_max_pts_each_voxel,
                       align_max_pts_each_voxel);
            float max_val = ((float *)one_pooled_feature)[0];
            // Recover the argmax by comparing the row against the max value
            // and locating the first equal element.
            __bang_write_value((void *)nram_max_pts_feature_tmp,
                               align_max_pts_each_voxel, (float)max_val);
            __bang_eq((float *)nram_max_pts_feature_tmp,
                      (float *)nram_pts_feature_in_voxel +
                          channel_idx * align_max_pts_each_voxel,
                      (float *)nram_max_pts_feature_tmp,
                      align_max_pts_each_voxel);
            int max_idx = (int)__bang_findfirst1(
                (float *)nram_max_pts_feature_tmp, align_max_pts_each_voxel);
            nram_pooled_features_cur_voxel[channel_idx] =
                (max_val == -INFINITY) ? 0 : max_val;
            nram_argmax_cur_voxel[channel_idx] =
                (max_val == -INFINITY) ? -1
                                       : nram_pts_idx_cur_voxel[max_idx + 1];
          } else {
            // half path on mlu200: scalar loop, comparing in float.
            int max_idx = -1;
            float max_val = -INFINITY;
            for (int k = 0; k < pts_num_cur_voxel; k++) {
              float pts_feature_cur_channel = __half2float_rd(
                  *((half *)nram_pts_feature_in_voxel +
                    channel_idx * align_max_pts_each_voxel + k));
              if (pts_feature_cur_channel > max_val) {
                max_val = pts_feature_cur_channel;
                max_idx = k;
              }
            }
            nram_pooled_features_cur_voxel[channel_idx] =
                (max_idx == -1) ? 0 : max_val;
            nram_argmax_cur_voxel[channel_idx] =
                (max_idx == -1) ? -1 : nram_pts_idx_cur_voxel[max_idx + 1];
          }
#endif
        } else if (pool_method == 1) {
          // Avg pooling: accumulate in float for precision, then divide.
          float sum_val_cur_channel = 0;
          for (int k = 0; k < pts_num_cur_voxel; k++) {
            sum_val_cur_channel += static_cast(
                ((T *)nram_pts_feature_in_voxel)[channel_idx *
                                                     align_max_pts_each_voxel +
                                                 k]);
          }
          nram_pooled_features_cur_voxel[channel_idx] =
              (T)(sum_val_cur_channel / pts_num_cur_voxel);
        }
      }
      // store
      __memcpy((T *)pooled_features + voxel_index * channels + channels_offset,
               (void *)nram_pooled_features_cur_voxel,
               actual_channels_num * sizeof(T), NRAM2GDRAM);
      if (pool_method == 0) {
        __memcpy((int *)argmax + voxel_index * channels + channels_offset,
                 (void *)nram_argmax_cur_voxel,
                 actual_channels_num * sizeof(int), NRAM2GDRAM);
      }
    }
  }
}

// Host-side launcher: dispatches the pts-idx-of-voxels kernel by runtime
// dtype (fp32/fp16).
// NOTE(review): the launch configuration inside `<<< >>>` and the
// `<typename T>` template headers appear to have been stripped by the
// diff/extraction tooling — compare against upstream mmcv before building.
void KernelPtsIdxOfVoxels(cnrtDim3_t k_dim, cnrtFunctionType_t k_type,
                          cnrtQueue_t queue, const cnrtDataType_t d_type,
                          const int pool_method, const int boxes_num,
                          const int pts_num, const int max_pts_each_voxel,
                          const int out_x, const int out_y, const int out_z,
                          const void *rois, const void *pts,
                          int *pts_idx_of_voxels) {
  switch (d_type) {
    case CNRT_FLOAT32: {
      MLUUnion1KernelPtsIdxOfVoxels<<>>(
          pool_method, boxes_num, pts_num, max_pts_each_voxel, out_x, out_y,
          out_z, (float *)rois, (float *)pts, (int *)pts_idx_of_voxels);
    }; break;
    case CNRT_FLOAT16: {
      MLUUnion1KernelPtsIdxOfVoxels<<>>(
          pool_method, boxes_num, pts_num, max_pts_each_voxel, out_x, out_y,
          out_z, (half *)rois, (half *)pts, (int *)pts_idx_of_voxels);
    }; break;
    default: {
      // Unsupported dtypes are silently ignored (no kernel launched).
      break;
    }
  }
}

// Host-side launcher for the forward pooling kernel; dtype dispatch only.
void KernelRoiawarePool3dForward(
    cnrtDim3_t k_dim, cnrtFunctionType_t k_type, cnrtQueue_t queue,
    const cnrtDataType_t d_type, const int pool_method, const int boxes_num,
    const int pts_num, const int channels, const int max_pts_each_voxel,
    const int out_x, const int out_y, const int out_z, const void *pts_feature,
    const int *pts_idx_of_voxels, void *pooled_features, int *argmax) {
  switch (d_type) {
    case CNRT_FLOAT32: {
      MLUUnion1KernelRoiawarePool3dForward<<>>(
          pool_method, boxes_num, pts_num, channels, max_pts_each_voxel, out_x,
          out_y, out_z, (float *)pts_feature, (int *)pts_idx_of_voxels,
          (float *)pooled_features, (int *)argmax);
    }; break;
    case CNRT_FLOAT16: {
      MLUUnion1KernelRoiawarePool3dForward<<>>(
          pool_method, boxes_num, pts_num, channels, max_pts_each_voxel, out_x,
          out_y, out_z, (half *)pts_feature, (int *)pts_idx_of_voxels,
          (half *)pooled_features, (int *)argmax);
    }; break;
    default: {
      break;
    }
  }
}

// Backward of max pooling: for each output voxel/channel, routes grad_out
// back to the single point that produced the max (via argmax), using an
// atomic add because multiple voxels may hit the same input point.
template
__mlu_global__ void MLUUnion1KernelRoiawareMaxPool3dBackward(
    const int boxes_num, const int out_x, const int out_y, const int out_z,
    const int channels, const int *argmax, const T *grad_out, T *grad_in) {
  // params (int)argmax: (boxes_num, out_x, out_y, out_z, channels)
  // params (T)grad_out: (boxes_num, out_x, out_y, out_z, channels)
  // params (T)grad_in: (pts_num, channels)

  // make sure that memcore is not used
  if (coreId == 0x80) {
    return;
  }
  // NRAM holds one channel-chunk of argmax + grad_out, plus a 1-element
  // scratch slot required by __bang_atomic_add.
  int nram_channels_limit =
      (MAX_NRAM_SIZE - sizeof(T) * 1) / (sizeof(T) + sizeof(int));
  int *nram_argmax_cur_loop = (int *)data_nram;
  // nram_argmax_cur_loop [nram_channels_limit]
  T *nram_grad_out_cur_loop =
      (T *)((int *)nram_argmax_cur_loop + nram_channels_limit);
  // nram_grad_out_cur_loop [nram_channels_limit]
  T *nram_grad_in_cur_channel =
      (T *)nram_grad_out_cur_loop + nram_channels_limit;
  // nram_grad_in_cur_channel [1]
  int channels_loop_times = channels / nram_channels_limit;
  int rem_channels = channels % nram_channels_limit;
  int voxels_num = boxes_num * out_x * out_y * out_z;

  for (int voxel_index = taskId; voxel_index < voxels_num;
       voxel_index += taskDim) {
    const int *argmax_cur_voxel = argmax + voxel_index * channels;
    const T *grad_out_cur_voxel = grad_out + voxel_index * channels;

    for (int channels_loop_idx = 0; channels_loop_idx <= channels_loop_times;
         channels_loop_idx++) {
      int actual_channels_num = (channels_loop_idx == channels_loop_times)
                                    ? rem_channels
                                    : nram_channels_limit;
      if (actual_channels_num == 0) {
        break;
      }
      const int *argmax_cur_loop =
          argmax_cur_voxel + nram_channels_limit * channels_loop_idx;
      const T *grad_out_cur_loop =
          grad_out_cur_voxel + nram_channels_limit * channels_loop_idx;
      __memcpy((void *)nram_argmax_cur_loop, (void *)argmax_cur_loop,
               actual_channels_num * sizeof(int), GDRAM2NRAM);
      __memcpy((void *)nram_grad_out_cur_loop, (void *)grad_out_cur_loop,
               actual_channels_num * sizeof(T), GDRAM2NRAM);

      for (int channel_idx = 0; channel_idx < actual_channels_num;
           channel_idx++) {
        int *nram_argmax_cur_channel = nram_argmax_cur_loop + channel_idx;
        T *nram_grad_out_cur_channel = nram_grad_out_cur_loop + channel_idx;
        // -1 marks an empty voxel (no contributing point) — no gradient.
        if (nram_argmax_cur_channel[0] == -1) {
          continue;
        }
        T *grad_in_cur_channel =
            grad_in + nram_argmax_cur_channel[0] * channels +
            nram_channels_limit * channels_loop_idx + channel_idx;
        __bang_atomic_add((T *)nram_grad_in_cur_channel,
                          (T *)grad_in_cur_channel,
                          (T *)(nram_grad_out_cur_channel), 1);
      }
    }
  }
}

// Backward of avg pooling: each point in a voxel receives
// grad_out / num_points for every channel, accumulated atomically.
template
__mlu_global__ void MLUUnion1KernelRoiawareAvgPool3dBackward(
    const int boxes_num, const int out_x, const int out_y, const int out_z,
    const int channels, const int max_pts_each_voxel,
    const int *pts_idx_of_voxels, const T *grad_out, T *grad_in) {
  // params (int)pts_idx_of_voxels: (boxes_num, out_x, out_y, out_z,
  // max_pts_each_voxel) params (T)grad_out: (boxes_num, out_x, out_y, out_z,
  // channels) params (T)grad_in: (pts_num, channels)

  // make sure that memcore is not used
  if (coreId == 0x80) {
    return;
  }
  int align_num = NFU_ALIGN_SIZE / sizeof(T);
  int align_max_pts_each_voxel = PAD_UP(max_pts_each_voxel, align_num);
  // Two T-buffers (scaled grad_out + atomic-add scratch) share what is left
  // of NRAM after the per-voxel point-index list.
  int nram_channels_limit = PAD_DOWN(
      (MAX_NRAM_SIZE - align_max_pts_each_voxel * sizeof(int)) / 2 / sizeof(T),
      align_num);
  int *nram_pts_idx_cur_voxel = (int *)data_nram;
  // nram_pts_idx_cur_voxel [align_max_pts_each_voxel]
  T *nram_grad_out_cur_loop =
      (T *)((int *)nram_pts_idx_cur_voxel + align_max_pts_each_voxel);
  // nram_grad_out_cur_loop [nram_channels_limit]
  T *nram_grad_in_cur_loop = (T *)nram_grad_out_cur_loop + nram_channels_limit;
  // nram_grad_in_cur_loop [nram_channels_limit]
  int channels_loop_times = channels / nram_channels_limit;
  int rem_channels = channels % nram_channels_limit;
  int voxels_num = boxes_num * out_x * out_y * out_z;

  for (int voxel_index = taskId; voxel_index < voxels_num;
       voxel_index += taskDim) {
    const T *grad_out_cur_voxel = grad_out + voxel_index * channels;
    const int *pts_idx_cur_voxel =
        pts_idx_of_voxels + voxel_index * max_pts_each_voxel;
    __memcpy((void *)nram_pts_idx_cur_voxel, (void *)pts_idx_cur_voxel,
             max_pts_each_voxel * sizeof(int), GDRAM2NRAM);
    // Slot [0] holds the point count of this voxel.
    int total_pts_of_voxel = nram_pts_idx_cur_voxel[0];
    if (total_pts_of_voxel <= 0) {
      continue;
    }
    float cur_grad = 1.0 / ((float)total_pts_of_voxel);

    for (int channels_loop_idx = 0; channels_loop_idx <= channels_loop_times;
         channels_loop_idx++) {
      int actual_channels_num = (channels_loop_idx == channels_loop_times)
                                    ? rem_channels
                                    : nram_channels_limit;
      if (actual_channels_num == 0) {
        break;
      }
      const T *grad_out_cur_loop =
          grad_out_cur_voxel + nram_channels_limit * channels_loop_idx;
      __memcpy((void *)nram_grad_in_cur_loop, (void *)grad_out_cur_loop,
               actual_channels_num * sizeof(T), GDRAM2NRAM);

      int align_actual_channels_num = PAD_UP(actual_channels_num, align_num);

      // Scale by 1/N; for half, convert to float first to avoid precision
      // loss in the multiply, then convert back.
      if (sizeof(T) == sizeof(half)) {
        __bang_half2float((float *)nram_grad_out_cur_loop,
                          (half *)nram_grad_in_cur_loop,
                          align_actual_channels_num);
        __bang_mul_scalar((float *)nram_grad_out_cur_loop,
                          (float *)nram_grad_out_cur_loop, (float)cur_grad,
                          align_actual_channels_num);
        convertFloat2half((half *)nram_grad_out_cur_loop,
                          (float *)nram_grad_out_cur_loop,
                          align_actual_channels_num);
      } else {
        __bang_mul_scalar((float *)nram_grad_out_cur_loop,
                          (float *)nram_grad_in_cur_loop, (float)cur_grad,
                          align_actual_channels_num);
      }
      // Point indices start at slot [1]; scatter the scaled gradient to each
      // contributing point atomically.
      for (int k = 1; k <= total_pts_of_voxel; k++) {
        T *grad_in_cur_loop = grad_in + nram_pts_idx_cur_voxel[k] * channels +
                              nram_channels_limit * channels_loop_idx;
        __bang_atomic_add((T *)nram_grad_in_cur_loop, (T *)grad_in_cur_loop,
                          (T *)nram_grad_out_cur_loop, actual_channels_num);
      }
    }
  }
}

// Host-side launcher for the backward kernels: selects max- vs avg-pool
// backward by pool_method (0 = max, 1 = avg), then dispatches by dtype.
void KernelRoiawarePool3dBackward(
    cnrtDim3_t k_dim, cnrtFunctionType_t k_type, cnrtQueue_t queue,
    const cnrtDataType_t d_type, const int pool_method, const int boxes_num,
    const int out_x, const int out_y, const int out_z, const int channels,
    const int max_pts_each_voxel, const int *pts_idx_of_voxels,
    const int *argmax, const void *grad_out, void *grad_in) {
  if (pool_method == 0) {
    switch (d_type) {
      case CNRT_FLOAT32: {
        MLUUnion1KernelRoiawareMaxPool3dBackward
            <<>>(boxes_num, out_x, out_y, out_z, channels,
                     (int *)argmax, (float *)grad_out,
                     (float *)grad_in);
      }; break;
      case CNRT_FLOAT16: {
        MLUUnion1KernelRoiawareMaxPool3dBackward
            <<>>(boxes_num, out_x, out_y, out_z, channels,
                     (int *)argmax, (half *)grad_out,
                     (half *)grad_in);
      }; break;
      default: {
        break;
      }
    }
  } else {
    switch (d_type) {
      case CNRT_FLOAT32: {
        MLUUnion1KernelRoiawareAvgPool3dBackward
            <<>>(
                boxes_num, out_x, out_y, out_z, channels, max_pts_each_voxel,
                (int *)pts_idx_of_voxels, (float *)grad_out, (float *)grad_in);
      }; break;
      case CNRT_FLOAT16: {
        MLUUnion1KernelRoiawareAvgPool3dBackward
            <<>>(
                boxes_num, out_x, out_y, out_z, channels, max_pts_each_voxel,
                (int *)pts_idx_of_voxels, (half *)grad_out, (half *)grad_in);
      }; break;
      default: {
        break;
      }
    }
  }
}
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/roipoint_pool3d_large_boxes_num_mlu_kernel.mlu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/roipoint_pool3d_large_boxes_num_mlu_kernel.mlu
new file mode 100644
index 0000000000000000000000000000000000000000..58a15d876570dbfd897ee709133c55f302025a6a
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/roipoint_pool3d_large_boxes_num_mlu_kernel.mlu
@@ -0,0 +1,536 @@
/*************************************************************************
 * Copyright (C) 2022 Cambricon.
 *
 * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
 * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
 * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
 * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
 * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 *************************************************************************/

#include "common_mlu_helper.hpp"

/*************************************************************************
 *
 * NRAM partition:
 * | boxes3d | ping points + pong points | aux_a ~ aux_f |
 * | 7 * sizeof(T) | 6 * deal_num * sizeof(T) | 6 * deal_num * sizeof(T) |
 *
 *************************************************************************/
#define TWELVE_SPLIT 12

__nram__ char nram_buffer[MAX_NRAM_SIZE];

// Computes a 0/1 mask (pts_assign) marking which of deal_num points fall
// inside the rotated 3D box. Points are rotated into the box's local frame;
// auxiliary_a..f are caller-provided NRAM scratch buffers that get clobbered.
// NOTE(review): `<typename T>` after `template` appears stripped by the
// diff/extraction tooling — restore from upstream mmcv before compiling.
template
__mlu_func__ void checkPointsInBox3d(const T *boxes3d,
                                     const size_t deal_num,
                                     T *x,
                                     T *y,
                                     T *z,
                                     T *auxiliary_a,
                                     T *auxiliary_b,
                                     T *auxiliary_c,
                                     T *auxiliary_d,
                                     T *auxiliary_e,
                                     T *auxiliary_f,
                                     T *pts_assign) {
  // param box3d: (cx, cy, cz, dx, dy, dz, rz) in LiDAR coordinate
  T cx = boxes3d[0];
  T cy = boxes3d[1];
  T cz = boxes3d[2];
  T dx = boxes3d[3];
  T dy = boxes3d[4];
  T dz = boxes3d[5];
  T rz = boxes3d[6];
  // shift to the center since cz in box3d is the bottom center
  cz += 0.5 * dz;

  T cosa = (T)std::cos(-rz);
  T sina = (T)std::sin(-rz);

  // x - cx
  __bang_sub_scalar((T *)auxiliary_a, (T *)x, (T)cx, deal_num);
  // y - cy
  __bang_sub_scalar((T *)auxiliary_b, (T *)y, (T)cy, deal_num);
  // z - cz
  __bang_sub_scalar((T *)auxiliary_c, (T *)z, (T)cz, deal_num);
  // |z - cz|
  __bang_active_abs((T *)auxiliary_c, (T *)auxiliary_c, deal_num);
  // |z - cz| > dz / 2.0
#if __BANG_ARCH__ >= 322
  __bang_gt_scalar((T *)auxiliary_c, (T *)auxiliary_c, (T)(0.5 * dz), deal_num);
#else
  // pre-322 has no gt_scalar: broadcast the threshold, then a < b compare
  // with swapped operands.
  __bang_write_value((T *)auxiliary_d, deal_num, (T)(0.5 * dz));
  __bang_lt((T *)auxiliary_c, (T *)auxiliary_d, (T *)auxiliary_c, deal_num);
#endif
  // !(|z - cz| > dz / 2.0)
  __bang_not((T *)auxiliary_c, (T *)auxiliary_c, deal_num);
  // (x - cx) * cos(-rz)
  __bang_mul_scalar((T *)auxiliary_d, (T *)auxiliary_a, (T)cosa, deal_num);
  // (y - cy) * sin(-rz)
  __bang_mul_scalar((T *)auxiliary_e, (T *)auxiliary_b, (T)sina, deal_num);
  // local_x = (x - cx) * cos(-rz) + (y - cy) * -sin(-rz)
  __bang_sub((T *)auxiliary_d, (T *)auxiliary_d, (T *)auxiliary_e, deal_num);
  // |local_x|
  __bang_active_abs((T *)auxiliary_d, (T *)auxiliary_d, deal_num);
  // |local_x| < dx / 2.0
#if __BANG_ARCH__ >= 322
  __bang_lt_scalar(auxiliary_d, auxiliary_d, (T)(0.5 * dx), deal_num);
#else
  __bang_write_value((T *)auxiliary_e, deal_num, (T)(0.5 * dx));
  __bang_gt((T *)auxiliary_d, (T *)auxiliary_e, (T *)auxiliary_d, deal_num);
#endif
  // (x - cx) * sin(-rz)
  __bang_mul_scalar((T *)auxiliary_e, (T *)auxiliary_a, (T)sina, deal_num);
  // (y - cy) * cos(-rz)
  __bang_mul_scalar((T *)auxiliary_f, (T *)auxiliary_b, (T)cosa, deal_num);
  // local_y = (x - cx) * sin(-rz) + (y - cy) * cos(-rz)
  __bang_add((T *)auxiliary_e, (T *)auxiliary_e, (T *)auxiliary_f, deal_num);
  // |local_y|
  __bang_active_abs((T *)auxiliary_e, (T *)auxiliary_e, deal_num);
  // |local_y| < dy / 2.0
#if __BANG_ARCH__ >= 322
  __bang_lt_scalar(auxiliary_e, auxiliary_e, (T)(0.5 * dy), deal_num);
#else
  __bang_write_value((T *)auxiliary_f, deal_num, (T)(0.5 * dy));
  __bang_gt((T *)auxiliary_e, (T *)auxiliary_f, (T *)auxiliary_e, deal_num);
#endif
  // pts_assign = |x - cx| < dx / 2.0 && |y - cy| < dy / 2.0 && |z - cz| <= dz / 2.0
  // (logical AND of the three 0/1 masks realized as element-wise multiply)
  __bang_mul((T *)pts_assign, (T *)auxiliary_c, (T *)auxiliary_d, deal_num);
  __bang_mul((T *)pts_assign, (T *)pts_assign, (T *)auxiliary_e, deal_num);
}

// Processes one span of points for one box on a middle (non-final) block:
// selects in-box points and appends their (x, y, z, features) rows to
// pooled_features_gdram, advancing *cnt. Stops contributing once *cnt
// reaches sampled_pts_num. auxiliary_a..f are scratch; auxiliary_a doubles
// as the pts_assign mask.
template
__mlu_func__ void computeStoreRoipointPool3d(char *boxes3d,
                                             int *cnt,
                                             char *points_x,
                                             char *points_y,
                                             char *points_z,
                                             const char *point_features,
                                             char *auxiliary_a,
                                             char *auxiliary_b,
                                             char *auxiliary_c,
                                             char *auxiliary_d,
                                             char *auxiliary_e,
                                             char *auxiliary_f,
                                             const int box_idx,
                                             const int pts_num,
                                             const int feature_in_len,
                                             const int sampled_pts_num,
                                             const size_t span_num_deal,
                                             char *pooled_features_gdram,
                                             char *pooled_empty_flag_gdram) {
  char *pts_assign = auxiliary_a;
  if (*cnt >= sampled_pts_num) {
    return;
  }
  checkPointsInBox3d((T *)boxes3d, span_num_deal, (T *)points_x, (T *)points_y, (T *)points_z,
                     (T *)auxiliary_a, (T *)auxiliary_b, (T *)auxiliary_c, (T *)auxiliary_d,
                     (T *)auxiliary_e, (T *)auxiliary_f, (T *)pts_assign);

  // __bang_select returns selected elements vector and the number of selected elements
  __bang_select((T *)auxiliary_b, (T *)points_x, (T *)pts_assign, span_num_deal);
  uint32_t select_num = *((uint32_t *)auxiliary_b);

  if (select_num == 0) {
    return;
  }
  // segnum is the last iteration index for the strided __memcpy (count - 1);
  // clamp so no more than sampled_pts_num total points are written.
  int sampled_pts_num_rem = sampled_pts_num - *cnt;
  int segnum = min((int)select_num, sampled_pts_num_rem) - 1;

  // copy x to pooled_features_gdram
  // The result of __bang_select is composed of three parts:
  // The first 4-byte is the number of selected element, whose data type is unsigned int.
  // The next 124-byte is zero. The rest bytes are the selected elements.
  int select_num_size = 128;
  __memcpy(
      pooled_features_gdram + (box_idx * sampled_pts_num + *cnt) * (3 + feature_in_len) * sizeof(T),
      (T *)((int8_t *)auxiliary_b + select_num_size), sizeof(T), NRAM2GDRAM,
      (3 + feature_in_len) * sizeof(T), sizeof(T), segnum);

  // copy y to pooled_features_gdram
  __bang_collect((T *)auxiliary_d, (T *)points_y, (T *)pts_assign, span_num_deal);
  __memcpy(pooled_features_gdram +
               (box_idx * sampled_pts_num + *cnt) * (3 + feature_in_len) * sizeof(T) +
               1 * sizeof(T),
           (T *)auxiliary_d, sizeof(T), NRAM2GDRAM, (3 + feature_in_len) * sizeof(T), sizeof(T),
           segnum);

  // copy z to pooled_features_gdram
  __bang_collect((T *)auxiliary_e, (T *)points_z, (T *)pts_assign, span_num_deal);
  __memcpy(pooled_features_gdram +
               (box_idx * sampled_pts_num + *cnt) * (3 + feature_in_len) * sizeof(T) +
               2 * sizeof(T),
           (T *)auxiliary_e, sizeof(T), NRAM2GDRAM, (3 + feature_in_len) * sizeof(T), sizeof(T),
           segnum);

  // copy features to pooled_features_gdram
  for (int c_idx = 0; c_idx < feature_in_len; c_idx++) {
    __memcpy(auxiliary_d, point_features + c_idx * pts_num * sizeof(T), span_num_deal * sizeof(T),
             GDRAM2NRAM);
    __bang_collect((T *)auxiliary_e, (T *)auxiliary_d, (T *)pts_assign, span_num_deal);
    __memcpy(pooled_features_gdram +
                 (box_idx * sampled_pts_num + *cnt) * (3 + feature_in_len) * sizeof(T) +
                 (3 + c_idx) * sizeof(T),
             auxiliary_e, sizeof(T), NRAM2GDRAM, (3 + feature_in_len) * sizeof(T), sizeof(T),
             segnum);
  }

  *cnt += select_num;
}

// Same as computeStoreRoipointPool3d, but for the last span of a box: also
// finalizes pooled_empty_flag, zero-fills the output when the box is empty,
// and duplicates gathered points when fewer than sampled_pts_num were found.
template
__mlu_func__ void computeStoreLastBlockRoipointPool3d(char *boxes3d,
                                                      int *cnt,
                                                      char *points_x,
                                                      char *points_y,
                                                      char *points_z,
                                                      const char *point_features,
                                                      char *auxiliary_a,
                                                      char *auxiliary_b,
                                                      char *auxiliary_c,
                                                      char *auxiliary_d,
                                                      char *auxiliary_e,
                                                      char *auxiliary_f,
                                                      const int box_idx,
                                                      const int pts_num,
                                                      const int feature_in_len,
                                                      const int sampled_pts_num,
                                                      const size_t span_num_deal,
                                                      const size_t auxiliary_num_deal,
                                                      char *pooled_features_gdram,
                                                      char *pooled_empty_flag_gdram) {
  char *pts_assign = auxiliary_a;
  if (*cnt >= sampled_pts_num) {
    // pooled_empty_flag_gdram set 0
    *((int *)auxiliary_a) = 0;
    __memcpy(pooled_empty_flag_gdram + box_idx * sizeof(int), auxiliary_a, sizeof(int), NRAM2GDRAM);
    return;
  }
  checkPointsInBox3d((T *)boxes3d, span_num_deal, (T *)points_x, (T *)points_y, (T *)points_z,
                     (T *)auxiliary_a, (T *)auxiliary_b, (T *)auxiliary_c, (T *)auxiliary_d,
                     (T *)auxiliary_e, (T *)auxiliary_f, (T *)pts_assign);

  // __bang_select returns selected elements vector and the number of selected elements
  __bang_select((T *)auxiliary_b, (T *)points_x, (T *)pts_assign, span_num_deal);
  uint32_t select_num = *((uint32_t *)auxiliary_b);

  if (*cnt + select_num == 0) {
    // No point of the whole cloud fell inside this box.
    // pooled_empty_flag_gdram set 1
    *((int *)auxiliary_a) = 1;
    __memcpy(pooled_empty_flag_gdram + box_idx * sizeof(int), auxiliary_a, sizeof(int), NRAM2GDRAM);

    // pooled_features_gdram set 0
    int repeat = (sampled_pts_num * (3 + feature_in_len)) / (auxiliary_num_deal * 6);
    int rem = (sampled_pts_num * (3 + feature_in_len)) % (auxiliary_num_deal * 6);
    // use auxiliary_a to auxiliary_f
    __bang_write_zero((T *)auxiliary_a, PAD_UP(auxiliary_num_deal * 6, NFU_ALIGN_SIZE));
    if (repeat > 0) {
      __memcpy(pooled_features_gdram + box_idx * sampled_pts_num * (3 + feature_in_len) * sizeof(T),
               auxiliary_a, auxiliary_num_deal * 6 * sizeof(T), NRAM2GDRAM,
               auxiliary_num_deal * 6 * sizeof(T), 0, repeat - 1);
    }
    if (rem > 0) {
      __memcpy(pooled_features_gdram +
                   box_idx * sampled_pts_num * (3 + feature_in_len) * sizeof(T) +
                   repeat * auxiliary_num_deal * 6 * sizeof(T),
               auxiliary_a, rem * sizeof(T), NRAM2GDRAM);
    }
    return;
  }

  if (select_num > 0) {
    int sampled_pts_num_rem = sampled_pts_num - *cnt;
    int segnum = min((int)select_num, sampled_pts_num_rem) - 1;

    // copy x to pooled_features_gdram
    // The result of __bang_select is composed of three parts:
    // The first 4-byte is the number of selected element, whose data type is unsigned int.
    // The next 124-byte is zero. The rest bytes are the selected elements.
    int select_num_size = 128;
    __memcpy(pooled_features_gdram +
                 (box_idx * sampled_pts_num + *cnt) * (3 + feature_in_len) * sizeof(T),
             (T *)((int8_t *)auxiliary_b + select_num_size), sizeof(T), NRAM2GDRAM,
             (3 + feature_in_len) * sizeof(T), sizeof(T), segnum);

    // copy y to pooled_features_gdram
    __bang_collect((T *)auxiliary_d, (T *)points_y, (T *)pts_assign, span_num_deal);
    __memcpy(pooled_features_gdram +
                 (box_idx * sampled_pts_num + *cnt) * (3 + feature_in_len) * sizeof(T) +
                 1 * sizeof(T),
             (T *)auxiliary_d, sizeof(T), NRAM2GDRAM, (3 + feature_in_len) * sizeof(T), sizeof(T),
             segnum);

    // copy z to pooled_features_gdram
    __bang_collect((T *)auxiliary_e, (T *)points_z, (T *)pts_assign, span_num_deal);
    __memcpy(pooled_features_gdram +
                 (box_idx * sampled_pts_num + *cnt) * (3 + feature_in_len) * sizeof(T) +
                 2 * sizeof(T),
             (T *)auxiliary_e, sizeof(T), NRAM2GDRAM, (3 + feature_in_len) * sizeof(T), sizeof(T),
             segnum);

    // copy features to pooled_features_gdram
    for (int c_idx = 0; c_idx < feature_in_len; c_idx++) {
      __memcpy(auxiliary_d, point_features + c_idx * pts_num * sizeof(T), span_num_deal * sizeof(T),
               GDRAM2NRAM);
      __bang_collect((T *)auxiliary_e, (T *)auxiliary_d, (T *)pts_assign, span_num_deal);
      __memcpy(pooled_features_gdram +
                   (box_idx * sampled_pts_num + *cnt) * (3 + feature_in_len) * sizeof(T) +
                   (3 + c_idx) * sizeof(T),
               auxiliary_e, sizeof(T), NRAM2GDRAM, (3 + feature_in_len) * sizeof(T), sizeof(T),
               segnum);
    }
  }

  // pooled_empty_flag_gdram set 0
  *((int *)auxiliary_a) = 0;
  __memcpy(pooled_empty_flag_gdram + box_idx * sizeof(int), auxiliary_a, sizeof(int), NRAM2GDRAM);

  *cnt += select_num;
  if (*cnt < sampled_pts_num) {
    // duplicate same points for sampling
    int repeat = sampled_pts_num / (*cnt) - 1;
    int rem = sampled_pts_num % (*cnt);
    if (repeat > 0) {
      __memcpy(pooled_features_gdram +
                   (box_idx * sampled_pts_num + *cnt) * (3 + feature_in_len) * sizeof(T),
               pooled_features_gdram +
box_idx * sampled_pts_num * (3 + feature_in_len) * sizeof(T), + (*cnt) * (3 + feature_in_len) * sizeof(T), GDRAM2GDRAM, + (*cnt) * (3 + feature_in_len) * sizeof(T), 0, repeat - 1); + } + if (rem > 0) { + __memcpy( + pooled_features_gdram + + (box_idx * sampled_pts_num + (repeat + 1) * (*cnt)) * (3 + feature_in_len) * + sizeof(T), + pooled_features_gdram + box_idx * sampled_pts_num * (3 + feature_in_len) * sizeof(T), + rem * (3 + feature_in_len) * sizeof(T), GDRAM2GDRAM); + } + } +} + +template +__mlu_global__ void MLUUnion1KernelRoiPointPool3dLargeBoxesNumForward( + const int batch_size, + const int pts_num, + const int boxes_num, + const int feature_in_len, + const int sampled_pts_num, + const char *points_xyz_gdram, + const char *point_features_gdram, + const char *boxes3d_gdram, + char *pooled_features_gdram, + char *pooled_empty_flag_gdram) { + if (coreId == 0x80) { + return; + } + size_t boxes_per_core = (batch_size * boxes_num) / taskDim; + size_t boxes_rem = (batch_size * boxes_num) % taskDim; + // calc batch_start, batch_end, first_batch_box_start, last batch_box_end for each core + int32_t batch_start = taskId < (boxes_rem + 1) ? + (taskId * (boxes_per_core + 1)) / boxes_num : + (taskId * boxes_per_core + boxes_rem) / boxes_num; + int32_t batch_end = taskId < boxes_rem ? + ((taskId + 1) * (boxes_per_core + 1) - 1) / boxes_num : + ((taskId + 1) * boxes_per_core + boxes_rem - 1) / boxes_num; + size_t first_batch_box_start = taskId < (boxes_rem + 1) ? + (taskId * (boxes_per_core + 1)) - batch_start * boxes_num : + taskId * boxes_per_core + boxes_rem - batch_start * boxes_num; + size_t last_batch_box_end = taskId < boxes_rem ? 
+ (taskId + 1) * (boxes_per_core + 1) - batch_end * boxes_num : + ((taskId + 1) * boxes_per_core + boxes_rem) - batch_end * boxes_num; + + // points_xyz : [3, B, N] + const char *points_x_gdram = points_xyz_gdram; + const char *points_y_gdram = points_xyz_gdram + (1 * batch_size * pts_num) * sizeof(T); + const char *points_z_gdram = points_xyz_gdram + (2 * batch_size * pts_num) * sizeof(T); + + size_t boxes3d_size = PAD_UP(7, NFU_ALIGN_SIZE) * sizeof(T); + size_t span_num_deal = PAD_DOWN(MAX_NRAM_SIZE / TWELVE_SPLIT / sizeof(T), NFU_ALIGN_SIZE); + size_t align_num = NFU_ALIGN_SIZE; + int32_t repeat = pts_num / span_num_deal; + size_t rem = pts_num % span_num_deal; + size_t align_rem = CEIL_ALIGN(rem, align_num); + char *boxes3d = nram_buffer; + char *ping_points_x = nram_buffer + boxes3d_size; + char *ping_points_y = ping_points_x + span_num_deal * sizeof(T); + char *ping_points_z = ping_points_y + span_num_deal * sizeof(T); + size_t ping_pong_gap = 3 * span_num_deal * sizeof(T); + char *auxiliary_a = ping_points_x + 2 * ping_pong_gap; + char *auxiliary_b = auxiliary_a + span_num_deal * sizeof(T); + char *auxiliary_c = auxiliary_b + span_num_deal * sizeof(T); + char *auxiliary_d = auxiliary_c + span_num_deal * sizeof(T); + char *auxiliary_e = auxiliary_d + span_num_deal * sizeof(T); + char *auxiliary_f = auxiliary_e + span_num_deal * sizeof(T); + size_t span_load_input1_size = span_num_deal * sizeof(T); + size_t span_load_input2_size = span_num_deal * sizeof(T); + size_t span_load_input3_size = span_num_deal * sizeof(T); + size_t span_load_input4_size = span_num_deal * sizeof(T); + int cnt = 0; + + for (int bs_idx = batch_start; bs_idx <= batch_end; bs_idx++) { + const char *points_x_start = points_x_gdram + bs_idx * pts_num * sizeof(T); + const char *points_y_start = points_y_gdram + bs_idx * pts_num * sizeof(T); + const char *points_z_start = points_z_gdram + bs_idx * pts_num * sizeof(T); + const char *point_features_start = + point_features_gdram + bs_idx * 
feature_in_len * pts_num * sizeof(T); + char *pooled_features_start = + pooled_features_gdram + + (bs_idx * boxes_num * sampled_pts_num * (3 + feature_in_len)) * sizeof(T); + char *pooled_empty_flag_start = pooled_empty_flag_gdram + bs_idx * boxes_num * sizeof(int); + size_t box_start = bs_idx == batch_start ? first_batch_box_start : 0; + size_t box_end = bs_idx == batch_end ? last_batch_box_end : boxes_num; + + for (int box_idx = box_start; box_idx < box_end; box_idx++) { + __memcpy_async(boxes3d, + boxes3d_gdram + bs_idx * boxes_num * 7 * sizeof(T) + box_idx * 7 * sizeof(T), + 7 * sizeof(T), GDRAM2NRAM); + cnt = 0; + if (repeat > 0) { + __memcpy_async(ping_points_x, points_x_start, span_load_input1_size, GDRAM2NRAM); + __memcpy_async(ping_points_y, points_y_start, span_load_input2_size, GDRAM2NRAM); + __memcpy_async(ping_points_z, points_z_start, span_load_input3_size, GDRAM2NRAM); + __asm__ volatile("sync;"); + } + + for (int i = 0; i < repeat - 1; i++) { + __memcpy_async(ping_points_x + ((i + 1) % 2) * ping_pong_gap, + points_x_start + (i + 1) * span_load_input1_size, span_load_input1_size, + GDRAM2NRAM); + __memcpy_async(ping_points_y + ((i + 1) % 2) * ping_pong_gap, + points_y_start + (i + 1) * span_load_input2_size, span_load_input2_size, + GDRAM2NRAM); + __memcpy_async(ping_points_z + ((i + 1) % 2) * ping_pong_gap, + points_z_start + (i + 1) * span_load_input3_size, span_load_input3_size, + GDRAM2NRAM); + computeStoreRoipointPool3d( + boxes3d, &cnt, ping_points_x + (i % 2) * ping_pong_gap, + ping_points_y + (i % 2) * ping_pong_gap, ping_points_z + (i % 2) * ping_pong_gap, + point_features_start + i * span_load_input4_size, auxiliary_a, auxiliary_b, auxiliary_c, + auxiliary_d, auxiliary_e, auxiliary_f, box_idx, pts_num, feature_in_len, + sampled_pts_num, span_num_deal, pooled_features_start, pooled_empty_flag_start); + __asm__ volatile("sync;"); + } + + if (rem > 0) { + if (sizeof(T) == sizeof(float)) { + __bang_write_value((T *)(ping_points_x + (repeat % 2) 
* ping_pong_gap + + PAD_DOWN(rem, NFU_ALIGN_SIZE) * sizeof(T)), + NFU_ALIGN_SIZE, (T)NAN); + __bang_write_value((T *)(ping_points_y + (repeat % 2) * ping_pong_gap + + PAD_DOWN(rem, NFU_ALIGN_SIZE) * sizeof(T)), + NFU_ALIGN_SIZE, (T)NAN); + __bang_write_value((T *)(ping_points_z + (repeat % 2) * ping_pong_gap + + PAD_DOWN(rem, NFU_ALIGN_SIZE) * sizeof(T)), + NFU_ALIGN_SIZE, (T)NAN); + } else { + __bang_write_value((T *)(ping_points_x + (repeat % 2) * ping_pong_gap + + PAD_DOWN(rem, NFU_ALIGN_SIZE) * sizeof(T)), + NFU_ALIGN_SIZE, (T)NAN); + __bang_write_value((T *)(ping_points_y + (repeat % 2) * ping_pong_gap + + PAD_DOWN(rem, NFU_ALIGN_SIZE) * sizeof(T)), + NFU_ALIGN_SIZE, (T)NAN); + __bang_write_value((T *)(ping_points_z + (repeat % 2) * ping_pong_gap + + PAD_DOWN(rem, NFU_ALIGN_SIZE) * sizeof(T)), + NFU_ALIGN_SIZE, (T)NAN); + } + __memcpy_async(ping_points_x + (repeat % 2) * ping_pong_gap, + points_x_start + repeat * span_load_input1_size, rem * sizeof(T), + GDRAM2NRAM); + __memcpy_async(ping_points_y + (repeat % 2) * ping_pong_gap, + points_y_start + repeat * span_load_input2_size, rem * sizeof(T), + GDRAM2NRAM); + __memcpy_async(ping_points_z + (repeat % 2) * ping_pong_gap, + points_z_start + repeat * span_load_input3_size, rem * sizeof(T), + GDRAM2NRAM); + } + + if (repeat > 0 && rem > 0) { + computeStoreRoipointPool3d( + boxes3d, &cnt, ping_points_x + ((repeat - 1) % 2) * ping_pong_gap, + ping_points_y + ((repeat - 1) % 2) * ping_pong_gap, + ping_points_z + ((repeat - 1) % 2) * ping_pong_gap, + point_features_start + (repeat - 1) * span_load_input4_size, auxiliary_a, auxiliary_b, + auxiliary_c, auxiliary_d, auxiliary_e, auxiliary_f, box_idx, pts_num, feature_in_len, + sampled_pts_num, span_num_deal, pooled_features_start, pooled_empty_flag_start); + } else if (repeat > 0 && rem == 0) { + computeStoreLastBlockRoipointPool3d( + boxes3d, &cnt, ping_points_x + ((repeat - 1) % 2) * ping_pong_gap, + ping_points_y + ((repeat - 1) % 2) * ping_pong_gap, + ping_points_z 
+ ((repeat - 1) % 2) * ping_pong_gap, + point_features_start + (repeat - 1) * span_load_input4_size, auxiliary_a, auxiliary_b, + auxiliary_c, auxiliary_d, auxiliary_e, auxiliary_f, box_idx, pts_num, feature_in_len, + sampled_pts_num, span_num_deal, span_num_deal, pooled_features_start, + pooled_empty_flag_start); + } + + if (rem > 0) { + __asm__ volatile("sync;"); + computeStoreLastBlockRoipointPool3d( + boxes3d, &cnt, ping_points_x + (repeat % 2) * ping_pong_gap, + ping_points_y + (repeat % 2) * ping_pong_gap, + ping_points_z + (repeat % 2) * ping_pong_gap, + point_features_start + repeat * span_load_input4_size, auxiliary_a, auxiliary_b, + auxiliary_c, auxiliary_d, auxiliary_e, auxiliary_f, box_idx, pts_num, feature_in_len, + sampled_pts_num, align_rem, span_num_deal, pooled_features_start, + pooled_empty_flag_start); + } + } + } +} + +template __mlu_global__ void MLUUnion1KernelRoiPointPool3dLargeBoxesNumForward( + const int batch_size, + const int pts_num, + const int boxes_num, + const int feature_in_len, + const int sampled_pts_num, + const char *points_xyz_gdram, + const char *point_features_gdram, + const char *boxes3d_gdram, + char *pooled_features_gdram, + char *pooled_empty_flag_gdram); + +template __mlu_global__ void MLUUnion1KernelRoiPointPool3dLargeBoxesNumForward( + const int batch_size, + const int pts_num, + const int boxes_num, + const int feature_in_len, + const int sampled_pts_num, + const char *points_xyz_gdram, + const char *point_features_gdram, + const char *boxes3d_gdram, + char *pooled_features_gdram, + char *pooled_empty_flag_gdram); + +void KernelRoiPointPool3dLargeBoxesNumForward(cnrtDim3_t k_dim, + cnrtFunctionType_t k_type, + cnrtQueue_t queue, + const cnrtDataType_t d_type, + const int batch_size, + const int pts_num, + const int boxes_num, + const int feature_in_len, + const int sampled_pts_num, + const void *points_xyz, + const void *boxes3d, + const void *point_features, + void *pooled_features, + int *pooled_empty_flag) { + 
switch (d_type) { + default: { break; } + case CNRT_FLOAT32: { + MLUUnion1KernelRoiPointPool3dLargeBoxesNumForward<<>>( + batch_size, pts_num, boxes_num, feature_in_len, sampled_pts_num, + (char *)points_xyz, (char *)point_features, (char *)boxes3d, + (char *)pooled_features, (char *)pooled_empty_flag); + }; break; + case CNRT_FLOAT16: { + MLUUnion1KernelRoiPointPool3dLargeBoxesNumForward<<>>( + batch_size, pts_num, boxes_num, feature_in_len, sampled_pts_num, + (char *)points_xyz, (char *)point_features, (char *)boxes3d, + (char *)pooled_features, (char *)pooled_empty_flag); + }; break; + } +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/roipoint_pool3d_mlu_kernel.mlu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/roipoint_pool3d_mlu_kernel.mlu new file mode 100644 index 0000000000000000000000000000000000000000..f16d84047d3dfd3648ca788055acad75829317f6 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/roipoint_pool3d_mlu_kernel.mlu @@ -0,0 +1,544 @@ +/************************************************************************* + * Copyright (C) 2022 Cambricon. + * + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *************************************************************************/ + +#include "common_mlu_helper.hpp" + +/************************************************************************************** + * + * NRAM partition: + * | boxes3d | cnt | + * | boxes_num * 7 * sizeof(T) | boxes_num * sizeof(int) | + * + * | ping points | pong points | aux_a ~ aux_f | + * | 3 * deal_num * sizeof(T) | 3 * deal_num * sizeof(T) | 6 * deal_num * sizeof(T) | + * + ***************************************************************************************/ +#define TWELVE_SPLIT 12 + +__nram__ char nram_buffer[MAX_NRAM_SIZE]; + +template +__mlu_func__ void checkPointsInBox3d(const T *boxes3d, + const size_t deal_num, + T *x, + T *y, + T *z, + T *auxiliary_a, + T *auxiliary_b, + T *auxiliary_c, + T *auxiliary_d, + T *auxiliary_e, + T *auxiliary_f, + T *pts_assign) { + // param box3d: (cx, cy, cz, dx, dy, dz, rz) in LiDAR coordinate + T cx = boxes3d[0]; + T cy = boxes3d[1]; + T cz = boxes3d[2]; + T dx = boxes3d[3]; + T dy = boxes3d[4]; + T dz = boxes3d[5]; + T rz = boxes3d[6]; + // shift to the center since cz in box3d is the bottom center + cz += 0.5 * dz; + + T cosa = (T)std::cos(-rz); + T sina = (T)std::sin(-rz); + + // x - cx + __bang_sub_scalar((T *)auxiliary_a, (T *)x, (T)cx, deal_num); + // y - cy + __bang_sub_scalar((T *)auxiliary_b, (T *)y, (T)cy, deal_num); + // z - cz + __bang_sub_scalar((T *)auxiliary_c, (T *)z, (T)cz, deal_num); + // |z - cz| + __bang_active_abs((T *)auxiliary_c, (T *)auxiliary_c, deal_num); + // |z - cz| > dz / 2.0 +#if __BANG_ARCH__ >= 322 + __bang_gt_scalar((T *)auxiliary_c, (T *)auxiliary_c, (T)(0.5 * dz), deal_num); +#else + __bang_write_value((T *)auxiliary_d, deal_num, (T)(0.5 * dz)); + __bang_lt((T *)auxiliary_c, (T *)auxiliary_d, (T *)auxiliary_c, deal_num); +#endif + // !(|z - cz| > dz / 2.0) + __bang_not((T *)auxiliary_c, (T *)auxiliary_c, deal_num); + // (x - cx) * cos(-rz) + __bang_mul_scalar((T *)auxiliary_d, (T *)auxiliary_a, (T)cosa, 
deal_num); + // (y - cy) * sin(-rz) + __bang_mul_scalar((T *)auxiliary_e, (T *)auxiliary_b, (T)sina, deal_num); + // local_x = (x - cx) * cos(-rz) + (y - cy) * -sin(-rz) + __bang_sub((T *)auxiliary_d, (T *)auxiliary_d, (T *)auxiliary_e, deal_num); + // |local_x| + __bang_active_abs((T *)auxiliary_d, (T *)auxiliary_d, deal_num); + // |local_x| < dx / 2.0 +#if __BANG_ARCH__ >= 322 + __bang_lt_scalar(auxiliary_d, auxiliary_d, (T)(0.5 * dx), deal_num); +#else + __bang_write_value((T *)auxiliary_e, deal_num, (T)(0.5 * dx)); + __bang_gt((T *)auxiliary_d, (T *)auxiliary_e, (T *)auxiliary_d, deal_num); +#endif + // (x - cx) * sin(-rz) + __bang_mul_scalar((T *)auxiliary_e, (T *)auxiliary_a, (T)sina, deal_num); + // (y - cy) * cos(-rz) + __bang_mul_scalar((T *)auxiliary_f, (T *)auxiliary_b, (T)cosa, deal_num); + // local_y = (x - cx) * sin(-rz) + (y - cy) * cos(-rz) + __bang_add((T *)auxiliary_e, (T *)auxiliary_e, (T *)auxiliary_f, deal_num); + // |local_y| + __bang_active_abs((T *)auxiliary_e, (T *)auxiliary_e, deal_num); + // |local_y| < dy / 2.0 +#if __BANG_ARCH__ >= 322 + __bang_lt_scalar(auxiliary_e, auxiliary_e, (T)(0.5 * dy), deal_num); +#else + __bang_write_value((T *)auxiliary_f, deal_num, (T)(0.5 * dy)); + __bang_gt((T *)auxiliary_e, (T *)auxiliary_f, (T *)auxiliary_e, deal_num); +#endif + // pts_assign = |x - cx| < dx / 2.0 && |y - cy| < dy / 2.0 && |z - cz| <= dz / 2.0 + __bang_mul((T *)pts_assign, (T *)auxiliary_c, (T *)auxiliary_d, deal_num); + __bang_mul((T *)pts_assign, (T *)pts_assign, (T *)auxiliary_e, deal_num); +} + +template +__mlu_func__ void computeStoreRoipointPool3d(char *boxes3d, + int *cnt, + char *points_x, + char *points_y, + char *points_z, + const char *point_features, + char *auxiliary_a, + char *auxiliary_b, + char *auxiliary_c, + char *auxiliary_d, + char *auxiliary_e, + char *auxiliary_f, + const int box_idx, + const int pts_num, + const int feature_in_len, + const int sampled_pts_num, + const size_t span_num_deal, + char 
*pooled_features_gdram, + char *pooled_empty_flag_gdram) { + char *pts_assign = auxiliary_a; + if (cnt[box_idx] >= sampled_pts_num) { + return; + } + checkPointsInBox3d((T *)(boxes3d + box_idx * 7 * sizeof(T)), span_num_deal, (T *)points_x, + (T *)points_y, (T *)points_z, (T *)auxiliary_a, (T *)auxiliary_b, + (T *)auxiliary_c, (T *)auxiliary_d, (T *)auxiliary_e, (T *)auxiliary_f, + (T *)pts_assign); + + // __bang_select returns selected elements vector and the number of selected elements + __bang_select((T *)auxiliary_b, (T *)points_x, (T *)pts_assign, span_num_deal); + uint32_t select_num = *((uint32_t *)auxiliary_b); + + if (select_num == 0) { + return; + } + int sampled_pts_num_rem = sampled_pts_num - cnt[box_idx]; + int segnum = min((int)select_num, sampled_pts_num_rem) - 1; + + // copy x to pooled_features_gdram + // The result of __bang_select is composed of three parts: + // The first 4-byte is the number of selected element, whose data type is unsigned int. + // The next 124-byte is zero. The rest bytes are the selected elements. 
+ int select_num_size = 128; + __memcpy(pooled_features_gdram + + (box_idx * sampled_pts_num + cnt[box_idx]) * (3 + feature_in_len) * sizeof(T), + (T *)((int8_t *)auxiliary_b + select_num_size), sizeof(T), NRAM2GDRAM, + (3 + feature_in_len) * sizeof(T), sizeof(T), segnum); + + // copy y to pooled_features_gdram + __bang_collect((T *)auxiliary_d, (T *)points_y, (T *)pts_assign, span_num_deal); + __memcpy(pooled_features_gdram + + (box_idx * sampled_pts_num + cnt[box_idx]) * (3 + feature_in_len) * sizeof(T) + + 1 * sizeof(T), + (T *)auxiliary_d, sizeof(T), NRAM2GDRAM, (3 + feature_in_len) * sizeof(T), sizeof(T), + segnum); + + // copy z to pooled_features_gdram + __bang_collect((T *)auxiliary_e, (T *)points_z, (T *)pts_assign, span_num_deal); + __memcpy(pooled_features_gdram + + (box_idx * sampled_pts_num + cnt[box_idx]) * (3 + feature_in_len) * sizeof(T) + + 2 * sizeof(T), + (T *)auxiliary_e, sizeof(T), NRAM2GDRAM, (3 + feature_in_len) * sizeof(T), sizeof(T), + segnum); + + // copy features to pooled_features_gdram + for (int c_idx = 0; c_idx < feature_in_len; c_idx++) { + __memcpy(auxiliary_d, point_features + c_idx * pts_num * sizeof(T), span_num_deal * sizeof(T), + GDRAM2NRAM); + __bang_collect((T *)auxiliary_e, (T *)auxiliary_d, (T *)pts_assign, span_num_deal); + __memcpy(pooled_features_gdram + + (box_idx * sampled_pts_num + cnt[box_idx]) * (3 + feature_in_len) * sizeof(T) + + (3 + c_idx) * sizeof(T), + auxiliary_e, sizeof(T), NRAM2GDRAM, (3 + feature_in_len) * sizeof(T), sizeof(T), + segnum); + } + + cnt[box_idx] += select_num; +} + +template +__mlu_func__ void computeStoreLastBlockRoipointPool3d(char *boxes3d, + int *cnt, + char *points_x, + char *points_y, + char *points_z, + const char *point_features, + char *auxiliary_a, + char *auxiliary_b, + char *auxiliary_c, + char *auxiliary_d, + char *auxiliary_e, + char *auxiliary_f, + const int box_idx, + const int pts_num, + const int feature_in_len, + const int sampled_pts_num, + const size_t span_num_deal, + 
const size_t auxiliary_num_deal, + char *pooled_features_gdram, + char *pooled_empty_flag_gdram) { + char *pts_assign = auxiliary_a; + if (cnt[box_idx] >= sampled_pts_num) { + // pooled_empty_flag_gdram set 0 + *((int *)auxiliary_a) = 0; + __memcpy(pooled_empty_flag_gdram + box_idx * sizeof(int), auxiliary_a, sizeof(int), NRAM2GDRAM); + return; + } + checkPointsInBox3d((T *)(boxes3d + box_idx * 7 * sizeof(T)), span_num_deal, (T *)points_x, + (T *)points_y, (T *)points_z, (T *)auxiliary_a, (T *)auxiliary_b, + (T *)auxiliary_c, (T *)auxiliary_d, (T *)auxiliary_e, (T *)auxiliary_f, + (T *)pts_assign); + + // __bang_select returns selected elements vector and the number of selected elements + __bang_select((T *)auxiliary_b, (T *)points_x, (T *)pts_assign, span_num_deal); + uint32_t select_num = *((uint32_t *)auxiliary_b); + + if (cnt[box_idx] + select_num == 0) { + // pooled_empty_flag_gdram set 1 + *((int *)auxiliary_a) = 1; + __memcpy(pooled_empty_flag_gdram + box_idx * sizeof(int), auxiliary_a, sizeof(int), NRAM2GDRAM); + + // pooled_features_gdram set 0 + int repeat = (sampled_pts_num * (3 + feature_in_len)) / (auxiliary_num_deal * 6); + int rem = (sampled_pts_num * (3 + feature_in_len)) % (auxiliary_num_deal * 6); + // use auxiliary_a to auxiliary_f + __bang_write_zero((T *)auxiliary_a, PAD_UP(auxiliary_num_deal * 6, NFU_ALIGN_SIZE)); + if (repeat > 0) { + __memcpy(pooled_features_gdram + box_idx * sampled_pts_num * (3 + feature_in_len) * sizeof(T), + auxiliary_a, auxiliary_num_deal * 6 * sizeof(T), NRAM2GDRAM, + auxiliary_num_deal * 6 * sizeof(T), 0, repeat - 1); + } + if (rem > 0) { + __memcpy(pooled_features_gdram + + box_idx * sampled_pts_num * (3 + feature_in_len) * sizeof(T) + + repeat * auxiliary_num_deal * 6 * sizeof(T), + auxiliary_a, rem * sizeof(T), NRAM2GDRAM); + } + return; + } + + if (select_num > 0) { + int sampled_pts_num_rem = sampled_pts_num - cnt[box_idx]; + int segnum = min((int)select_num, sampled_pts_num_rem) - 1; + + // copy x to 
pooled_features_gdram + // The result of __bang_select is composed of three parts: + // The first 4-byte is the number of selected element, whose data type is unsigned int. + // The next 124-byte is zero. The rest bytes are the selected elements. + int select_num_size = 128; + __memcpy(pooled_features_gdram + + (box_idx * sampled_pts_num + cnt[box_idx]) * (3 + feature_in_len) * sizeof(T), + (T *)((int8_t *)auxiliary_b + select_num_size), sizeof(T), NRAM2GDRAM, + (3 + feature_in_len) * sizeof(T), sizeof(T), segnum); + + // copy y to pooled_features_gdram + __bang_collect((T *)auxiliary_d, (T *)points_y, (T *)pts_assign, span_num_deal); + __memcpy(pooled_features_gdram + + (box_idx * sampled_pts_num + cnt[box_idx]) * (3 + feature_in_len) * sizeof(T) + + 1 * sizeof(T), + (T *)auxiliary_d, sizeof(T), NRAM2GDRAM, (3 + feature_in_len) * sizeof(T), sizeof(T), + segnum); + + // copy z to pooled_features_gdram + __bang_collect((T *)auxiliary_e, (T *)points_z, (T *)pts_assign, span_num_deal); + __memcpy(pooled_features_gdram + + (box_idx * sampled_pts_num + cnt[box_idx]) * (3 + feature_in_len) * sizeof(T) + + 2 * sizeof(T), + (T *)auxiliary_e, sizeof(T), NRAM2GDRAM, (3 + feature_in_len) * sizeof(T), sizeof(T), + segnum); + + // copy features to pooled_features_gdram + for (int c_idx = 0; c_idx < feature_in_len; c_idx++) { + __memcpy(auxiliary_d, point_features + c_idx * pts_num * sizeof(T), span_num_deal * sizeof(T), + GDRAM2NRAM); + __bang_collect((T *)auxiliary_e, (T *)auxiliary_d, (T *)pts_assign, span_num_deal); + __memcpy(pooled_features_gdram + + (box_idx * sampled_pts_num + cnt[box_idx]) * (3 + feature_in_len) * sizeof(T) + + (3 + c_idx) * sizeof(T), + auxiliary_e, sizeof(T), NRAM2GDRAM, (3 + feature_in_len) * sizeof(T), sizeof(T), + segnum); + } + } + + // pooled_empty_flag_gdram set 0 + *((int *)auxiliary_a) = 0; + __memcpy(pooled_empty_flag_gdram + box_idx * sizeof(int), auxiliary_a, sizeof(int), NRAM2GDRAM); + + cnt[box_idx] += select_num; + if (cnt[box_idx] < 
sampled_pts_num) { + // duplicate same points for sampling + int repeat = sampled_pts_num / cnt[box_idx] - 1; + int rem = sampled_pts_num % cnt[box_idx]; + if (repeat > 0) { + __memcpy(pooled_features_gdram + + (box_idx * sampled_pts_num + cnt[box_idx]) * (3 + feature_in_len) * sizeof(T), + pooled_features_gdram + box_idx * sampled_pts_num * (3 + feature_in_len) * sizeof(T), + cnt[box_idx] * (3 + feature_in_len) * sizeof(T), GDRAM2GDRAM, + cnt[box_idx] * (3 + feature_in_len) * sizeof(T), 0, repeat - 1); + } + if (rem > 0) { + __memcpy(pooled_features_gdram + (box_idx * sampled_pts_num + (repeat + 1) * cnt[box_idx]) * + (3 + feature_in_len) * sizeof(T), + pooled_features_gdram + box_idx * sampled_pts_num * (3 + feature_in_len) * sizeof(T), + rem * (3 + feature_in_len) * sizeof(T), GDRAM2GDRAM); + } + } +} + +template +__mlu_global__ void MLUUnion1KernelRoiPointPool3dForward( + const int batch_size, + const int pts_num, + const int boxes_num, + const int feature_in_len, + const int sampled_pts_num, + const char *points_xyz_gdram, + const char *point_features_gdram, + const char *boxes3d_gdram, + char *pooled_features_gdram, + char *pooled_empty_flag_gdram) { + if (coreId == 0x80) { + return; + } + size_t boxes_per_core = (batch_size * boxes_num) / taskDim; + size_t boxes_rem = (batch_size * boxes_num) % taskDim; + // calc batch_start, batch_end, first_batch_box_start, last batch_box_end for each core + int32_t batch_start = taskId < (boxes_rem + 1) ? + (taskId * (boxes_per_core + 1)) / boxes_num : + (taskId * boxes_per_core + boxes_rem) / boxes_num; + int32_t batch_end = taskId < boxes_rem ? + ((taskId + 1) * (boxes_per_core + 1) - 1) / boxes_num : + ((taskId + 1) * boxes_per_core + boxes_rem - 1) / boxes_num; + size_t first_batch_box_start = taskId < (boxes_rem + 1) ? + (taskId * (boxes_per_core + 1)) - batch_start * boxes_num : + taskId * boxes_per_core + boxes_rem - batch_start * boxes_num; + size_t last_batch_box_end = taskId < boxes_rem ? 
+ (taskId + 1) * (boxes_per_core + 1) - batch_end * boxes_num : + ((taskId + 1) * boxes_per_core + boxes_rem) - batch_end * boxes_num; + + // points_xyz : [3, B, N] + const char *points_x_gdram = points_xyz_gdram; + const char *points_y_gdram = points_xyz_gdram + (1 * batch_size * pts_num) * sizeof(T); + const char *points_z_gdram = points_xyz_gdram + (2 * batch_size * pts_num) * sizeof(T); + + size_t boxes3d_size = PAD_UP(boxes_num * 7, NFU_ALIGN_SIZE) * sizeof(T); + size_t cnt_size = PAD_UP(boxes_num, NFU_ALIGN_SIZE) * sizeof(int); + size_t span_num_deal = PAD_DOWN( + (MAX_NRAM_SIZE - boxes3d_size - cnt_size) / TWELVE_SPLIT / sizeof(T), NFU_ALIGN_SIZE); + size_t align_num = NFU_ALIGN_SIZE; + int32_t repeat = pts_num / span_num_deal; + size_t rem = pts_num % span_num_deal; + size_t align_rem = CEIL_ALIGN(rem, align_num); + char *boxes3d = nram_buffer; + char *cnt = nram_buffer + boxes3d_size; + char *ping_points_x = cnt + cnt_size; + char *ping_points_y = ping_points_x + span_num_deal * sizeof(T); + char *ping_points_z = ping_points_y + span_num_deal * sizeof(T); + size_t ping_pong_gap = 3 * span_num_deal * sizeof(T); + char *auxiliary_a = ping_points_x + 2 * ping_pong_gap; + char *auxiliary_b = auxiliary_a + span_num_deal * sizeof(T); + char *auxiliary_c = auxiliary_b + span_num_deal * sizeof(T); + char *auxiliary_d = auxiliary_c + span_num_deal * sizeof(T); + char *auxiliary_e = auxiliary_d + span_num_deal * sizeof(T); + char *auxiliary_f = auxiliary_e + span_num_deal * sizeof(T); + size_t span_load_input1_size = span_num_deal * sizeof(T); + size_t span_load_input2_size = span_num_deal * sizeof(T); + size_t span_load_input3_size = span_num_deal * sizeof(T); + size_t span_load_input4_size = span_num_deal * sizeof(T); + + for (int bs_idx = batch_start; bs_idx <= batch_end; bs_idx++) { + __memcpy_async(boxes3d, boxes3d_gdram + bs_idx * boxes_num * 7 * sizeof(T), + boxes_num * 7 * sizeof(T), GDRAM2NRAM); + __bang_write_zero((int *)cnt, PAD_UP(boxes_num, 
NFU_ALIGN_SIZE)); + + const char *points_x_start = points_x_gdram + bs_idx * pts_num * sizeof(T); + const char *points_y_start = points_y_gdram + bs_idx * pts_num * sizeof(T); + const char *points_z_start = points_z_gdram + bs_idx * pts_num * sizeof(T); + const char *point_features_start = + point_features_gdram + bs_idx * feature_in_len * pts_num * sizeof(T); + char *pooled_features_start = + pooled_features_gdram + + (bs_idx * boxes_num * sampled_pts_num * (3 + feature_in_len)) * sizeof(T); + char *pooled_empty_flag_start = pooled_empty_flag_gdram + bs_idx * boxes_num * sizeof(int); + size_t box_start = bs_idx == batch_start ? first_batch_box_start : 0; + size_t box_end = bs_idx == batch_end ? last_batch_box_end : boxes_num; + + if (repeat > 0) { + __memcpy_async(ping_points_x, points_x_start, span_load_input1_size, GDRAM2NRAM); + __memcpy_async(ping_points_y, points_y_start, span_load_input2_size, GDRAM2NRAM); + __memcpy_async(ping_points_z, points_z_start, span_load_input3_size, GDRAM2NRAM); + __asm__ volatile("sync;"); + } + + for (int i = 0; i < repeat - 1; i++) { + __memcpy_async(ping_points_x + ((i + 1) % 2) * ping_pong_gap, + points_x_start + (i + 1) * span_load_input1_size, span_load_input1_size, + GDRAM2NRAM); + __memcpy_async(ping_points_y + ((i + 1) % 2) * ping_pong_gap, + points_y_start + (i + 1) * span_load_input2_size, span_load_input2_size, + GDRAM2NRAM); + __memcpy_async(ping_points_z + ((i + 1) % 2) * ping_pong_gap, + points_z_start + (i + 1) * span_load_input3_size, span_load_input3_size, + GDRAM2NRAM); + for (int box_idx = box_start; box_idx < box_end; box_idx++) { + computeStoreRoipointPool3d( + boxes3d, (int *)cnt, ping_points_x + (i % 2) * ping_pong_gap, + ping_points_y + (i % 2) * ping_pong_gap, ping_points_z + (i % 2) * ping_pong_gap, + point_features_start + i * span_load_input4_size, auxiliary_a, auxiliary_b, auxiliary_c, + auxiliary_d, auxiliary_e, auxiliary_f, box_idx, pts_num, feature_in_len, + sampled_pts_num, span_num_deal, 
pooled_features_start, pooled_empty_flag_start); + } + __asm__ volatile("sync;"); + } + + if (rem > 0) { + if (sizeof(T) == sizeof(float)) { + __bang_write_value((T *)(ping_points_x + (repeat % 2) * ping_pong_gap + + PAD_DOWN(rem, NFU_ALIGN_SIZE) * sizeof(T)), + NFU_ALIGN_SIZE, (T)NAN); + __bang_write_value((T *)(ping_points_y + (repeat % 2) * ping_pong_gap + + PAD_DOWN(rem, NFU_ALIGN_SIZE) * sizeof(T)), + NFU_ALIGN_SIZE, (T)NAN); + __bang_write_value((T *)(ping_points_z + (repeat % 2) * ping_pong_gap + + PAD_DOWN(rem, NFU_ALIGN_SIZE) * sizeof(T)), + NFU_ALIGN_SIZE, (T)NAN); + } else { + __bang_write_value((T *)(ping_points_x + (repeat % 2) * ping_pong_gap + + PAD_DOWN(rem, NFU_ALIGN_SIZE) * sizeof(T)), + NFU_ALIGN_SIZE, (T)NAN); + __bang_write_value((T *)(ping_points_y + (repeat % 2) * ping_pong_gap + + PAD_DOWN(rem, NFU_ALIGN_SIZE) * sizeof(T)), + NFU_ALIGN_SIZE, (T)NAN); + __bang_write_value((T *)(ping_points_z + (repeat % 2) * ping_pong_gap + + PAD_DOWN(rem, NFU_ALIGN_SIZE) * sizeof(T)), + NFU_ALIGN_SIZE, (T)NAN); + } + __memcpy_async(ping_points_x + (repeat % 2) * ping_pong_gap, + points_x_start + repeat * span_load_input1_size, rem * sizeof(T), GDRAM2NRAM); + __memcpy_async(ping_points_y + (repeat % 2) * ping_pong_gap, + points_y_start + repeat * span_load_input2_size, rem * sizeof(T), GDRAM2NRAM); + __memcpy_async(ping_points_z + (repeat % 2) * ping_pong_gap, + points_z_start + repeat * span_load_input3_size, rem * sizeof(T), GDRAM2NRAM); + } + + if (repeat > 0 && rem > 0) { + for (int box_idx = box_start; box_idx < box_end; box_idx++) { + computeStoreRoipointPool3d( + boxes3d, (int *)cnt, ping_points_x + ((repeat - 1) % 2) * ping_pong_gap, + ping_points_y + ((repeat - 1) % 2) * ping_pong_gap, + ping_points_z + ((repeat - 1) % 2) * ping_pong_gap, + point_features_start + (repeat - 1) * span_load_input4_size, auxiliary_a, auxiliary_b, + auxiliary_c, auxiliary_d, auxiliary_e, auxiliary_f, box_idx, pts_num, feature_in_len, + sampled_pts_num, span_num_deal, 
pooled_features_start, pooled_empty_flag_start); + } + } else if (repeat > 0 && rem == 0) { + for (int box_idx = box_start; box_idx < box_end; box_idx++) { + computeStoreLastBlockRoipointPool3d( + boxes3d, (int *)cnt, ping_points_x + ((repeat - 1) % 2) * ping_pong_gap, + ping_points_y + ((repeat - 1) % 2) * ping_pong_gap, + ping_points_z + ((repeat - 1) % 2) * ping_pong_gap, + point_features_start + (repeat - 1) * span_load_input4_size, auxiliary_a, auxiliary_b, + auxiliary_c, auxiliary_d, auxiliary_e, auxiliary_f, box_idx, pts_num, feature_in_len, + sampled_pts_num, span_num_deal, span_num_deal, pooled_features_start, + pooled_empty_flag_start); + } + } + + if (rem > 0) { + __asm__ volatile("sync;"); + for (int box_idx = box_start; box_idx < box_end; box_idx++) { + computeStoreLastBlockRoipointPool3d( + boxes3d, (int *)cnt, ping_points_x + (repeat % 2) * ping_pong_gap, + ping_points_y + (repeat % 2) * ping_pong_gap, + ping_points_z + (repeat % 2) * ping_pong_gap, + point_features_start + repeat * span_load_input4_size, auxiliary_a, auxiliary_b, + auxiliary_c, auxiliary_d, auxiliary_e, auxiliary_f, box_idx, pts_num, feature_in_len, + sampled_pts_num, align_rem, span_num_deal, pooled_features_start, + pooled_empty_flag_start); + } + } + } +} + +template __mlu_global__ void MLUUnion1KernelRoiPointPool3dForward( + const int batch_size, + const int pts_num, + const int boxes_num, + const int feature_in_len, + const int sampled_pts_num, + const char *points_xyz_gdram, + const char *point_features_gdram, + const char *boxes3d_gdram, + char *pooled_features_gdram, + char *pooled_empty_flag_gdram); + +template __mlu_global__ void MLUUnion1KernelRoiPointPool3dForward( + const int batch_size, + const int pts_num, + const int boxes_num, + const int feature_in_len, + const int sampled_pts_num, + const char *points_xyz_gdram, + const char *point_features_gdram, + const char *boxes3d_gdram, + char *pooled_features_gdram, + char *pooled_empty_flag_gdram); + +void 
KernelRoiPointPool3dForward(cnrtDim3_t k_dim, + cnrtFunctionType_t k_type, + cnrtQueue_t queue, + const cnrtDataType_t d_type, + const int batch_size, + const int pts_num, + const int boxes_num, + const int feature_in_len, + const int sampled_pts_num, + const void *points_xyz, + const void *boxes3d, + const void *point_features, + void *pooled_features, + int *pooled_empty_flag) { + switch (d_type) { + default: { break; } + case CNRT_FLOAT32: { + MLUUnion1KernelRoiPointPool3dForward<<>>( + batch_size, pts_num, boxes_num, feature_in_len, sampled_pts_num, + (char *)points_xyz, (char *)point_features, (char *)boxes3d, + (char *)pooled_features, (char *)pooled_empty_flag); + }; break; + case CNRT_FLOAT16: { + MLUUnion1KernelRoiPointPool3dForward<<>>( + batch_size, pts_num, boxes_num, feature_in_len, sampled_pts_num, + (char *)points_xyz, (char *)point_features, (char *)boxes3d, + (char *)pooled_features, (char *)pooled_empty_flag); + }; break; + } +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/three_nn_mlu_kernel.mlu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/three_nn_mlu_kernel.mlu new file mode 100644 index 0000000000000000000000000000000000000000..7927385104c56721aea7a2ab39d410f0daf9f9c6 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/three_nn_mlu_kernel.mlu @@ -0,0 +1,466 @@ +/************************************************************************* + * Copyright (C) 2022 Cambricon. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
+ *************************************************************************/ +#include "common_mlu_helper.hpp" +#include + +__nram__ char nram_buffer[MAX_NRAM_SIZE]; + +#if __BANG_ARCH__ >= 322 +/** + * returns the index of ret, which is stored at the 1st position of the `ret`, + * used after bang_min + */ +__mlu_func__ uint32_t getIndice(half *ret) { + uint32_t indice = *((uint32_t *)((uint16_t *)ret + 1)); + return indice; +} + +/** + * returns the index of ret, which is stored at the 1st position of the `ret`, + * used after bang_min + */ +__mlu_func__ uint32_t getIndice(float *ret) { + uint32_t indice = ((uint32_t *)ret)[1]; + return indice; +} +#endif + +template +__mlu_func__ void auxArgmin(T *nram_dst, T *nram_src, const int num_deal, + T *value, int *index) { + __bang_min(nram_dst, nram_src, num_deal); + *value = nram_dst[0]; + __bang_write_value(nram_dst, num_deal, *value); + __bang_eq(nram_dst, nram_src, nram_dst, num_deal); + __bang_findfirst1((uint32_t *)nram_dst, nram_dst, num_deal); + *index = *((int *)nram_dst); +} + +template +__mlu_func__ void auxFuncFind3Min(T *nram_aux_a, const int auxa_offset, + int *nram_aux_b, const int auxb_offset, + T *nram_dest, T *nram_aux_sort_a, + int *nram_aux_sort_b, const int deal_offset) { + __bang_write_value(nram_aux_sort_a, auxa_offset, (T)(INFINITY)); + __bang_write_value(nram_aux_sort_b, auxb_offset, (int)0); + int index = 0; + for (int i = 0; i < 3; i++) { +#if __BANG_ARCH__ >= 322 + __bang_argmin(nram_dest, nram_aux_a, auxa_offset); + nram_aux_sort_a[i] = nram_dest[0]; + index = getIndice(nram_dest); +#else + T value = 0; + auxArgmin(nram_dest, nram_aux_a, auxa_offset, &value, &index); + nram_aux_sort_a[i] = value; +#endif + nram_aux_sort_b[i] = nram_aux_b[index]; + __memset_nram(nram_aux_a + index, 1, (T)(INFINITY)); + } + __memcpy((char *)nram_aux_a, (char *)nram_aux_sort_a, auxa_offset * sizeof(T), + NRAM2NRAM); + __memcpy((char *)nram_aux_b, (char *)nram_aux_sort_b, + auxb_offset * sizeof(int), NRAM2NRAM); 
+} + +template +__mlu_func__ void auxFuncSort(T *nram_aux_a, const int auxa_offset, + int *nram_aux_b, const int auxb_offset, + T *nram_dest, T *nram_help_value, + int *nram_help_idx, const int num_deal, + const int deal_offset) { + for (int k = 0; k < num_deal; ++k) { + auxFuncFind3Min(nram_aux_a + k * auxa_offset, auxa_offset, + nram_aux_b + k * auxb_offset, auxb_offset, nram_dest, + nram_help_value, nram_help_idx, deal_offset); + } +} + +template +__mlu_func__ void auxFuncNN( + size_t *output_aux_sort_a_gap, size_t *output_aux_sort_b_gap, + size_t *output_aux_dest_gap, size_t *output_unknown_gap, + size_t *output_known_gap, size_t *output_dist_gap, size_t *auxillary_a_gap, + size_t *auxillary_b_gap, size_t *known_num_deal, size_t *unknown_num_deal, + size_t *align_num, size_t *auxa_offset, size_t *auxb_offset) { + /* + * nram partition: + * |-NFU_ALIGN_SIZE-|-2*NFU_ALIGN_SIZE-|-X*3*sizeof(T)-| + * space: | aux_sort_a | aux_sort_b | nram_unknown | + * + * | ------ (Y * 7 *sizeof(T)) ---------------- | + * | nram_known | nram_dist | nram_dest | + * + * | -X * NFU_ALIGN_SIZE ---|---X * 2 * NFU_ALIGN_SIZE-| + * | output_dist(aux_a) | output_dist(aux_b) | + * 200 series + * X = (MAX_NRAM - 3 * NFU_ALIGN_SIZE) * (2/3) / (3 * sizeof(T) + 3 * + * NFU_ALIGN_SIZE) + * Y = (MAX_NRAM - 3 * NFU_ALIGN_SIZE) * (1/3) / (7 * sizeof(T)) + * 300 series + * X = (MAX_NRAM - 3 * NFU_ALIGN_SIZE) * (4/5) / (3 * + * sizeof(T) + 3 * NFU_ALIGN_SIZE) + * Y = (MAX_NRAM - 3 * NFU_ALIGN_SIZE) * + * (1/5) / (7 * sizeof(T)) + * + */ + + *align_num = NFU_ALIGN_SIZE / sizeof(T); + *auxa_offset = NFU_ALIGN_SIZE / sizeof(T); + *auxb_offset = 2 * NFU_ALIGN_SIZE / sizeof(int); +#if __BANG_ARCH__ >= 322 + *known_num_deal = PAD_DOWN( + (MAX_NRAM_SIZE - 3 * NFU_ALIGN_SIZE) / 5 / (7 * sizeof(T)), *align_num); + *unknown_num_deal = PAD_DOWN((MAX_NRAM_SIZE - 3 * NFU_ALIGN_SIZE) / 5 * 4 / + (3 * sizeof(T) + 3 * NFU_ALIGN_SIZE), + *align_num); +#else + *known_num_deal = PAD_DOWN( + (MAX_NRAM_SIZE - 3 * 
NFU_ALIGN_SIZE) / 3 / (7 * sizeof(T)), *align_num); + *unknown_num_deal = PAD_DOWN((MAX_NRAM_SIZE - 3 * NFU_ALIGN_SIZE) / 3 * 2 / + (3 * sizeof(T) + 3 * NFU_ALIGN_SIZE), + *align_num); +#endif + + *output_aux_sort_a_gap = 0; + *output_aux_sort_b_gap = *output_aux_sort_a_gap + NFU_ALIGN_SIZE; + *output_aux_dest_gap = *output_aux_sort_b_gap + 2 * NFU_ALIGN_SIZE; + + *output_unknown_gap = *output_aux_dest_gap + *known_num_deal * sizeof(T); + *output_known_gap = *output_unknown_gap + *unknown_num_deal * 3 * sizeof(T); + *output_dist_gap = *output_known_gap + *known_num_deal * 3 * sizeof(T); + *auxillary_a_gap = *output_dist_gap + *known_num_deal * 3 * sizeof(T); + *auxillary_b_gap = *auxillary_a_gap + *unknown_num_deal * NFU_ALIGN_SIZE; +} + +#if __BANG_ARCH__ >= 322 +template +__mlu_func__ bool containNanInf(T *nram_unknown) { + if (std::isnan(nram_unknown[0]) || std::isnan(nram_unknown[1]) || + std::isnan(nram_unknown[2]) || std::isinf(nram_unknown[0]) || + std::isinf(nram_unknown[1]) || std::isinf(nram_unknown[2])) + return true; + else + return false; +} +#endif + +template +__mlu_func__ void computeThreeNN(T *nram_unknown, T *nram_known, T *nram_dist, + T *nram_dest, T *nram_aux_a, + T *nram_aux_sort_a, int *nram_aux_b, + int *nram_aux_sort_b, const int known_num_deal, + const int known_seg_num, const int deal_offset, + const int known_count, + const int known_count_align) { + __bang_write_value(nram_dist, 3 * known_num_deal, (T)(INFINITY)); +#if __BANG_ARCH__ >= 322 + if (!containNanInf(nram_unknown)) { +#endif + // x1 - x2 + __bang_sub_scalar(nram_dist, nram_known, nram_unknown[0], + known_count_align); + // y1 - y2 + __bang_sub_scalar(nram_dist + known_count_align, + nram_known + known_count_align, nram_unknown[1], + known_count_align); + // z1 - z2 + __bang_sub_scalar(nram_dist + 2 * known_count_align, + nram_known + 2 * known_count_align, nram_unknown[2], + known_count_align); + __bang_square(nram_dist, nram_dist, 3 * known_count_align); + 
__bang_add(nram_dist, nram_dist, nram_dist + known_count_align, + known_count_align); + __bang_add(nram_dist, nram_dist, nram_dist + 2 * known_count_align, + known_count_align); +#if __BANG_ARCH__ >= 322 + } +#endif + + int index = 0; + for (int i = 0; i < 3; i++) { +#if __BANG_ARCH__ >= 322 + __bang_argmin(nram_dest, nram_dist, known_count_align); + nram_aux_a[i + deal_offset] = nram_dest[0]; + index = getIndice(nram_dest); +#else + T value = 0; + auxArgmin(nram_dest, nram_dist, known_count_align, &value, &index); + nram_aux_a[i + deal_offset] = value; +#endif + nram_aux_b[i + deal_offset] = index + known_seg_num * known_num_deal; + __memset_nram(nram_dist + index, 1, (T)(INFINITY)); + } +} + +template +__mlu_func__ void loadTransposedKnownTensor( + char *nram_known, char *nram_dist, const char *known_gdram, + const int known_num_deal, const int batch_id, const int m, + const int known_seg_num, const int count, const int count_align_num) { + __bang_write_value(nram_known, 3 * known_num_deal, (T)(INFINITY)); +#if __BANG_ARCH__ >= 322 + __bang_write_value(nram_dist, 3 * known_num_deal, (T)(INFINITY)); + __memcpy(nram_dist, + known_gdram + + (batch_id * m * 3 + known_seg_num * known_num_deal) * sizeof(T), + count * sizeof(T), GDRAM2NRAM, count_align_num * sizeof(T), + m * sizeof(T), 2); + __bang_minequal((T *)nram_known, (T *)nram_known, (T *)nram_dist, + 3 * count_align_num); +#else + __memcpy(nram_known, + known_gdram + + (batch_id * m * 3 + known_seg_num * known_num_deal) * sizeof(T), + count * sizeof(T), GDRAM2NRAM, count_align_num * sizeof(T), + m * sizeof(T), 2); +#endif +} + +template +__mlu_func__ void loadUnknownTensor(char *nram_unknown, + const char *unknown_gdram, + const int unknown_num_deal, + const int unknown_seg_num, const int count, + const int count_align_num) { + __memcpy(nram_unknown, + unknown_gdram + unknown_seg_num * unknown_num_deal * 3 * sizeof(T), + count * 3 * sizeof(T), GDRAM2NRAM); +} + +template +__mlu_func__ void auxProcessSegment( + 
const int m, const int n, T *nram_unknown, T *nram_known, T *nram_dist, + T *nram_dest, T *known_gdram, T *nram_aux_a, const int auxa_offset, + int *nram_aux_b, const int auxb_offset, T *nram_aux_sort_a, + int *nram_aux_sort_b, const int unknown_num_deal, const int known_num_deal, + const int known_seg_num, const int unknown_seg_num, const int unknown_count, + const int known_count, const int known_count_align, const int start_idx, + int *deal_offset) { + int pre_batch_id = -1; + int cur_batch_id = -1; + pre_batch_id = start_idx / n; + + // if aux_a space is not enough, get the first 3 min among aux_a and clear. + if (*deal_offset >= PAD_DOWN(auxa_offset, 3)) { + auxFuncSort(nram_aux_a, auxa_offset, nram_aux_b, auxb_offset, nram_dest, + nram_aux_sort_a, nram_aux_sort_b, unknown_count, *deal_offset); + *deal_offset = 3; + } + + // load i'th segment of known batch data. + loadTransposedKnownTensor((char *)nram_known, (char *)nram_dist, + (char *)known_gdram, known_num_deal, + pre_batch_id, m, known_seg_num, known_count, + known_count_align); + + for (int k = 0; k < unknown_count; ++k) { + cur_batch_id = (start_idx + k) / n; + if (cur_batch_id != pre_batch_id) { // if batch id of unknown data changed, + // load corresponding known batch data + pre_batch_id = cur_batch_id; + loadTransposedKnownTensor((char *)nram_known, (char *)nram_dist, + (char *)known_gdram, known_num_deal, + pre_batch_id, m, known_seg_num, known_count, + known_count_align); + } + computeThreeNN(nram_unknown + 3 * k, nram_known, nram_dist, nram_dest, + nram_aux_a + k * auxa_offset, nram_aux_sort_a, + nram_aux_b + k * auxb_offset, nram_aux_sort_b, + known_num_deal, known_seg_num, *deal_offset, known_count, + known_count_align); + } +} + +template +__mlu_global__ void MLUUnion1KernelThreeNN(const int b, const int n, + const int m, char *unknown_gdram, + char *known_gdram, char *dist2_gdram, + int *idx_gdram) { + if (coreId == 0x80) { + return; + } + + size_t output_aux_sort_a_gap = 0, 
output_aux_sort_b_gap = 0, + output_dest_gap = 0, output_unknown_gap = 0, output_known_gap = 0, + output_dist_gap = 0, auxillary_a_gap = 0, auxillary_b_gap = 0, + known_num_deal = 0, unknown_num_deal = 0, align_num = 0, + auxa_offset = 0, auxb_offset = 0; + auxFuncNN(&output_aux_sort_a_gap, &output_aux_sort_b_gap, &output_dest_gap, + &output_unknown_gap, &output_known_gap, &output_dist_gap, + &auxillary_a_gap, &auxillary_b_gap, &known_num_deal, + &unknown_num_deal, &align_num, &auxa_offset, &auxb_offset); + + int num_per_core = b * n / taskDim; + const int core_offset = num_per_core; + + char *unknown_gdram_start = + unknown_gdram + taskId * 3 * core_offset * sizeof(T); + char *known_gdram_start = known_gdram; + char *output_dist_start = dist2_gdram + taskId * 3 * core_offset * sizeof(T); + int *output_idx_start = idx_gdram + taskId * 3 * core_offset; + + const int rem = (b * n) % taskDim; + if (taskId == taskDim - 1) { + num_per_core += rem; + } + + const int unknown_repeat = + num_per_core / unknown_num_deal; // if unknown number is big, process it + // by unknown_repeat times. + const int unknown_rem = num_per_core % unknown_num_deal; // unknown reminder + const int unknown_rem_align = PAD_UP(unknown_rem, align_num); + + const int known_repeat = + m / known_num_deal; // if known number is big, process it by + // unknown_repeat times. 
+ const int known_rem = m % known_num_deal; // known reminder + const int known_rem_align = PAD_UP(known_rem, align_num); + + char *nram_aux_sort_a = nram_buffer; + int *nram_aux_sort_b = (int *)(nram_buffer + output_aux_sort_b_gap); + char *nram_dest = nram_buffer + output_dest_gap; + char *nram_unknown = nram_buffer + output_unknown_gap; + char *nram_known = nram_buffer + output_known_gap; + char *nram_dist = nram_buffer + output_dist_gap; + char *nram_aux_a = nram_buffer + auxillary_a_gap; + int *nram_aux_b = (int *)(nram_buffer + auxillary_b_gap); + int deal_offset = 0; + int start_idx = -1; + + for (int j = 0; j < unknown_repeat; + ++j) { // process data within a unknown_repeat + // if unknown need to be process segmentally, use a aux_a and aux_b + // space to find first 3 minimum dist. + __bang_write_value(nram_aux_a, unknown_num_deal * auxa_offset, + (T)(INFINITY)); + __bang_write_value(nram_aux_b, unknown_num_deal * auxb_offset, (int)0); + loadUnknownTensor(nram_unknown, unknown_gdram_start, unknown_num_deal, j, + unknown_num_deal, unknown_num_deal); + + deal_offset = 0; + start_idx = taskId * core_offset + j * unknown_num_deal; + + for (int i = 0; i < known_repeat; + ++i) { // process known data in segmentally. 
+ auxProcessSegment( + m, n, (T *)nram_unknown, (T *)nram_known, (T *)nram_dist, + (T *)nram_dest, (T *)known_gdram_start, (T *)nram_aux_a, auxa_offset, + nram_aux_b, auxb_offset, (T *)nram_aux_sort_a, nram_aux_sort_b, + unknown_num_deal, known_num_deal, i, j, unknown_num_deal, + known_num_deal, known_num_deal, start_idx, &deal_offset); + deal_offset += 3; + } + + if (known_rem > 0) { // process known rem + __bang_write_value(nram_known, 3 * known_num_deal, (T)(INFINITY)); + auxProcessSegment( + m, n, (T *)nram_unknown, (T *)nram_known, (T *)nram_dist, + (T *)nram_dest, (T *)known_gdram_start, (T *)nram_aux_a, auxa_offset, + nram_aux_b, auxb_offset, (T *)nram_aux_sort_a, nram_aux_sort_b, + unknown_num_deal, known_num_deal, known_repeat, j, unknown_num_deal, + known_rem, known_rem_align, start_idx, &deal_offset); + } + + deal_offset += 3; + + if (deal_offset > 3) { + auxFuncSort((T *)nram_aux_a, auxa_offset, nram_aux_b, auxb_offset, + (T *)nram_dest, (T *)nram_aux_sort_a, nram_aux_sort_b, + unknown_num_deal, deal_offset); + deal_offset = 0; + } + + __memcpy((char *)output_dist_start + j * unknown_num_deal * 3 * sizeof(T), + (char *)nram_aux_a, 3 * sizeof(T), NRAM2GDRAM, 3 * sizeof(T), + auxa_offset * sizeof(T), unknown_num_deal - 1); + __memcpy((char *)output_idx_start + j * unknown_num_deal * 3 * sizeof(int), + (char *)nram_aux_b, 3 * sizeof(int), NRAM2GDRAM, 3 * sizeof(int), + auxb_offset * sizeof(int), unknown_num_deal - 1); + } + + if (unknown_rem > 0) { // process unknown rem + deal_offset = 0; + __bang_write_value(nram_aux_a, unknown_num_deal * auxa_offset, + (T)(INFINITY)); + __bang_write_value(nram_aux_b, unknown_num_deal * auxb_offset, (int)0); + loadUnknownTensor(nram_unknown, unknown_gdram_start, unknown_num_deal, + unknown_repeat, unknown_rem, unknown_rem_align); + start_idx = taskId * core_offset + unknown_repeat * unknown_num_deal; + + for (int i = 0; i < known_repeat; ++i) { + auxProcessSegment( + m, n, (T *)nram_unknown, (T *)nram_known, (T 
*)nram_dist, + (T *)nram_dest, (T *)known_gdram_start, (T *)nram_aux_a, auxa_offset, + nram_aux_b, auxb_offset, (T *)nram_aux_sort_a, nram_aux_sort_b, + unknown_num_deal, known_num_deal, i, unknown_repeat, unknown_rem, + known_num_deal, known_num_deal, start_idx, &deal_offset); + deal_offset += 3; + } + + if (known_rem > 0) { + __bang_write_value(nram_known, 3 * known_num_deal, (T)(INFINITY)); + start_idx = taskId * core_offset + unknown_repeat * unknown_num_deal; + + auxProcessSegment( + m, n, (T *)nram_unknown, (T *)nram_known, (T *)nram_dist, + (T *)nram_dest, (T *)known_gdram_start, (T *)nram_aux_a, auxa_offset, + nram_aux_b, auxb_offset, (T *)nram_aux_sort_a, nram_aux_sort_b, + unknown_num_deal, known_num_deal, known_repeat, unknown_repeat, + unknown_rem, known_rem, known_rem_align, start_idx, &deal_offset); + + deal_offset += 3; + } + if (deal_offset > 3) { + auxFuncSort((T *)nram_aux_a, auxa_offset, nram_aux_b, auxb_offset, + (T *)nram_dest, (T *)nram_aux_sort_a, nram_aux_sort_b, + unknown_rem, deal_offset); + deal_offset = 0; + } + + __memcpy((char *)output_dist_start + + unknown_repeat * unknown_num_deal * 3 * sizeof(T), + (char *)nram_aux_a, 3 * sizeof(T), NRAM2GDRAM, 3 * sizeof(T), + auxa_offset * sizeof(T), unknown_rem - 1); + __memcpy((char *)output_idx_start + + unknown_repeat * unknown_num_deal * 3 * sizeof(int), + (char *)nram_aux_b, 3 * sizeof(int), NRAM2GDRAM, 3 * sizeof(int), + auxb_offset * sizeof(int), unknown_rem - 1); + } +} + +template __mlu_global__ void MLUUnion1KernelThreeNN( + const int b, const int n, const int m, char *unknown_gdram, + char *known_gdram, char *dist2_gdram, int *idx_gdram); + +template __mlu_global__ void MLUUnion1KernelThreeNN( + const int b, const int n, const int m, char *unknown_gdram, + char *known_gdram, char *dist2_gdram, int *idx_gdram); + +void KernelThreeNNForward(cnrtDim3_t k_dim, cnrtFunctionType_t k_type, + cnrtQueue_t queue, cnrtDataType_t data_type, + const void *unknown, const void *known, void *dist2, + 
int *idx, const int b, const int n, const int m) { + switch (data_type) { + case CNRT_FLOAT16: { + MLUUnion1KernelThreeNN<half><<<k_dim, k_type, queue>>>( + b, n, m, (char *)unknown, (char *)known, (char *)dist2, idx); + }; break; + case CNRT_FLOAT32: { + MLUUnion1KernelThreeNN<float><<<k_dim, k_type, queue>>>( + b, n, m, (char *)unknown, (char *)known, (char *)dist2, idx); + }; break; + default: { + break; + } + } +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/tin_shift_mlu_kernel.mlu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/tin_shift_mlu_kernel.mlu new file mode 100644 index 0000000000000000000000000000000000000000..ed64c2b68cd73d28959f5e9ee6cb34d68c8213cf --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mlu/tin_shift_mlu_kernel.mlu @@ -0,0 +1,307 @@ +/************************************************************************* + * Copyright (C) 2022 Cambricon. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
+ *************************************************************************/ +#include "common_mlu_helper.hpp" + +__nram__ char data_nram[MAX_NRAM_SIZE]; + +template +__mlu_func__ void mluMultiKernelTinShift( + const T *input, const int *shifts, T *output, const int batch_size, + const int time_size, const int channel_size, const int hw_size, + const int group_size, const int group_channel) { + for (int cur_channel_index = taskId; + cur_channel_index < batch_size * channel_size; + cur_channel_index += taskDim) { + int n_index = cur_channel_index / channel_size; + int group_id = cur_channel_index % channel_size / group_channel; + int t_shift = shifts[n_index * group_size + group_id]; + int index = cur_channel_index % channel_size * hw_size + + n_index * time_size * channel_size * hw_size; + __bang_write_value(data_nram, MAX_NRAM_SIZE, (char)0); + __asm__ volatile("sync;"); + if (abs(t_shift) >= time_size) { + __memcpy(output + index, data_nram, hw_size * sizeof(T), NRAM2GDRAM, + channel_size * hw_size * sizeof(T), hw_size * sizeof(T), + time_size - 1); + } else { + if (t_shift > 0) { + __memcpy(data_nram + t_shift * hw_size * sizeof(T), input + index, + hw_size * sizeof(T), GDRAM2NRAM, hw_size * sizeof(T), + channel_size * hw_size * sizeof(T), time_size - 1 - t_shift); + __memcpy(output + index, data_nram, hw_size * sizeof(T), NRAM2GDRAM, + channel_size * hw_size * sizeof(T), hw_size * sizeof(T), + time_size - 1); + } else { + __memcpy(data_nram, input + (index - t_shift * channel_size * hw_size), + hw_size * sizeof(T), GDRAM2NRAM, hw_size * sizeof(T), + channel_size * hw_size * sizeof(T), time_size - 1 + t_shift); + __memcpy(output + index, data_nram, hw_size * sizeof(T), NRAM2GDRAM, + channel_size * hw_size * sizeof(T), hw_size * sizeof(T), + time_size - 1); + } + } + __asm__ volatile("sync;"); + } +} + +template +__mlu_func__ void mluHwSplit(const T *input, const int t_shift, + const int time_size, const int hw_size, + const int channel_size, const int index, + 
const int cur_sequence_index, + const int max_length_per_core, T *output) { + for (int cur_index = index; cur_index < index + hw_size; + cur_index += max_length_per_core) { + int memcpy_size = max_length_per_core; + if (cur_index + max_length_per_core > index + hw_size) { + memcpy_size = index + hw_size - cur_index; + } + if (cur_sequence_index - t_shift < 0 || + cur_sequence_index - t_shift >= time_size) { + __memcpy(output + cur_index, data_nram, memcpy_size * sizeof(T), + NRAM2GDRAM); + } else { + __memcpy(data_nram, input + cur_index - t_shift * channel_size * hw_size, + memcpy_size * sizeof(T), GDRAM2NRAM); + __memcpy(output + cur_index, data_nram, memcpy_size * sizeof(T), + NRAM2GDRAM); + } + __asm__ volatile("sync;"); + } +} + +template +__mlu_func__ void mluMultiKernelTinShiftSplitSequence( + const T *input, const int *shifts, T *output, const int batch_size, + const int time_size, const int channel_size, const int hw_size, + const int group_size, const int group_channel, + const int max_number_hw_per_core, const int max_length_per_core) { + const int tmp_max_number_hw_per_core = + max_number_hw_per_core > 0 ? max_number_hw_per_core : 1; + const int loop_time = time_size / tmp_max_number_hw_per_core + + ((time_size % tmp_max_number_hw_per_core) > 0 ? 
1 : 0); + int segmentime_size = tmp_max_number_hw_per_core; + int res_segment = time_size % tmp_max_number_hw_per_core; + + for (int cur_segment_index = taskId; + cur_segment_index < loop_time * batch_size * channel_size; + cur_segment_index += taskDim) { + int n_index = cur_segment_index / loop_time / channel_size; + int group_id = cur_segment_index / loop_time % channel_size / group_channel; + int t_shift = shifts[n_index * group_size + group_id]; + int index = n_index * time_size * channel_size * hw_size + + (cur_segment_index / loop_time % channel_size) * hw_size + + cur_segment_index % loop_time * segmentime_size * hw_size * + channel_size; + char *dst_gdram2nram = data_nram; + const T *src_gdram2nram = input + index; + int count_gdram2nram = -1; + int count_nram2gdram = -1; + int next_sequence_index = + index / hw_size / channel_size % time_size + segmentime_size; + int cur_sequence_index = index / hw_size / channel_size % time_size; + __bang_write_value(data_nram, MAX_NRAM_SIZE, (char)0); + __asm__ volatile("sync;"); + if (max_number_hw_per_core == 0) { + mluHwSplit(input, t_shift, time_size, hw_size, channel_size, index, + cur_sequence_index, max_length_per_core, output); + continue; + } + if (abs(t_shift) >= time_size) { + if ((cur_segment_index + 1) % loop_time == 0 && res_segment != 0) { + __memcpy(output + index, data_nram, hw_size * sizeof(T), NRAM2GDRAM, + channel_size * hw_size * sizeof(T), hw_size * sizeof(T), + res_segment - 1); + } else { + __memcpy(output + index, data_nram, hw_size * sizeof(T), NRAM2GDRAM, + channel_size * hw_size * sizeof(T), hw_size * sizeof(T), + segmentime_size - 1); + } + continue; + } + if (t_shift == 0) { + if ((cur_segment_index + 1) % loop_time == 0 && res_segment != 0) { + dst_gdram2nram = data_nram; + src_gdram2nram = input + index; + count_gdram2nram = res_segment - 1; + count_nram2gdram = res_segment - 1; + } else { + dst_gdram2nram = data_nram; + src_gdram2nram = input + index; + count_gdram2nram = segmentime_size 
- 1; + count_nram2gdram = segmentime_size - 1; + } + } else if (t_shift > 0) { + int first_index_cur_channel = + n_index * time_size * channel_size * hw_size + + (cur_segment_index / loop_time % channel_size) * hw_size; + if ((cur_segment_index + 1) % loop_time == 0 && res_segment != 0) { + dst_gdram2nram = data_nram; + src_gdram2nram = + input + + (index - t_shift * channel_size * hw_size < first_index_cur_channel + ? first_index_cur_channel + : index - t_shift * channel_size * hw_size); + count_gdram2nram = res_segment - 1; + count_nram2gdram = res_segment - 1; + if (cur_sequence_index < t_shift && t_shift < next_sequence_index) { + dst_gdram2nram = + data_nram + t_shift % segmentime_size * hw_size * sizeof(T); + count_gdram2nram = res_segment - (t_shift - cur_sequence_index) - 1; + } + } else { + if (t_shift >= next_sequence_index) { + __memcpy(output + index, data_nram, hw_size * sizeof(T), NRAM2GDRAM, + channel_size * hw_size * sizeof(T), hw_size * sizeof(T), + segmentime_size - 1); + continue; + } else if (cur_sequence_index < t_shift && + t_shift < next_sequence_index) { + dst_gdram2nram = + data_nram + t_shift % segmentime_size * hw_size * sizeof(T); + src_gdram2nram = input + first_index_cur_channel; + count_gdram2nram = segmentime_size - (t_shift % segmentime_size) - 1; + count_nram2gdram = segmentime_size - 1; + } else { + dst_gdram2nram = data_nram; + src_gdram2nram = input + index - t_shift * channel_size * hw_size; + count_gdram2nram = segmentime_size - 1; + count_nram2gdram = segmentime_size - 1; + } + } + } else { + int offset_index = time_size + t_shift; + if (cur_sequence_index >= offset_index) { + if ((cur_segment_index + 1) % loop_time == 0 && res_segment != 0) { + __memcpy(output + index, data_nram, hw_size * sizeof(T), NRAM2GDRAM, + channel_size * hw_size * sizeof(T), hw_size * sizeof(T), + res_segment - 1); + continue; + } else { + __memcpy(output + index, data_nram, hw_size * sizeof(T), NRAM2GDRAM, + channel_size * hw_size * sizeof(T), 
hw_size * sizeof(T), + segmentime_size - 1); + continue; + } + } else { + dst_gdram2nram = data_nram; + src_gdram2nram = input + index - t_shift * channel_size * hw_size; + if (cur_sequence_index - t_shift + segmentime_size < time_size) { + count_gdram2nram = segmentime_size - 1; + count_nram2gdram = segmentime_size - 1; + } else { + count_gdram2nram = time_size - (cur_sequence_index - t_shift) - 1; + count_nram2gdram = + (segmentime_size - 1) < (time_size - cur_sequence_index - 1) + ? (segmentime_size - 1) + : (time_size - cur_sequence_index - 1); + } + } + } + __memcpy(dst_gdram2nram, src_gdram2nram, hw_size * sizeof(T), GDRAM2NRAM, + hw_size * sizeof(T), channel_size * hw_size * sizeof(T), + count_gdram2nram); + __memcpy(output + index, data_nram, hw_size * sizeof(T), NRAM2GDRAM, + channel_size * hw_size * sizeof(T), hw_size * sizeof(T), + count_nram2gdram); + __asm__ volatile("sync;"); + } +} + +__mlu_entry__ void MLUUnion1KernelTinShift( + const void *input, const void *shifts, void *output, const int batch_size, + const int time_size, const int channel_size, const int hw_size, + const int group_size, const int group_channel, + const cnrtDataType_t data_dtype) { + // make sure that memcore is not used + if (coreId == 0x80) { + return; + } + switch (data_dtype) { + case CNRT_FLOAT16: { + mluMultiKernelTinShift((half *)input, (const int *)shifts, (half *)output, + batch_size, time_size, channel_size, hw_size, + group_size, group_channel); + }; break; + case CNRT_FLOAT32: { + mluMultiKernelTinShift((float *)input, (const int *)shifts, + (float *)output, batch_size, time_size, + channel_size, hw_size, group_size, group_channel); + }; break; + default: { return; } + } +} + +__mlu_entry__ void MLUUnion1KernelTinShiftSplitSequence( + const void *input, const void *shifts, void *output, const int batch_size, + const int time_size, const int channel_size, const int hw_size, + const int group_size, const int group_channel, + const int max_number_hw_per_core, const int 
max_length_per_core, + const cnrtDataType_t data_dtype) { + // make sure that memcore is not used + if (coreId == 0x80) { + return; + } + switch (data_dtype) { + case CNRT_FLOAT16: { + mluMultiKernelTinShiftSplitSequence( + (half *)input, (const int *)shifts, (half *)output, batch_size, + time_size, channel_size, hw_size, group_size, group_channel, + max_number_hw_per_core, max_length_per_core); + }; break; + case CNRT_FLOAT32: { + mluMultiKernelTinShiftSplitSequence( + (float *)input, (const int *)shifts, (float *)output, batch_size, + time_size, channel_size, hw_size, group_size, group_channel, + max_number_hw_per_core, max_length_per_core); + }; break; + default: { return; } + } +} + +void KernelTinShiftForward( + cnrtDim3_t k_dim, cnrtFunctionType_t k_type, cnrtQueue_t queue, + const void *input, const void *shifts, void *output, const int batch_size, + const int time_size, const int channel_size, const int hw_size, + const int group_size, const int group_channel, + const cnrtDataType_t data_dtype, const int channel_per_core, + const int max_number_hw_per_core, const int max_length_per_core) { + if (channel_per_core >= 1) { + MLUUnion1KernelTinShift<<>>( + input, shifts, output, batch_size, time_size, channel_size, hw_size, + group_size, group_channel, data_dtype); + } else { + MLUUnion1KernelTinShiftSplitSequence<<>>( + input, shifts, output, batch_size, time_size, channel_size, hw_size, + group_size, group_channel, max_number_hw_per_core, max_length_per_core, + data_dtype); + } +} + +void KernelTinShiftBackward( + cnrtDim3_t k_dim, cnrtFunctionType_t k_type, cnrtQueue_t queue, + const void *grad_output, const void *shifts, void *grad_input, + const int batch_size, const int time_size, const int channel_size, + const int hw_size, const int group_size, const int group_channel, + const cnrtDataType_t data_dtype, const int channel_per_core, + const int max_number_hw_per_core, const int max_length_per_core) { + if (channel_per_core >= 1) { + 
MLUUnion1KernelTinShift<<<k_dim, k_type, queue>>>( + grad_output, shifts, grad_input, batch_size, time_size, channel_size, + hw_size, group_size, group_channel, data_dtype); + } else { + MLUUnion1KernelTinShiftSplitSequence<<<k_dim, k_type, queue>>>( + grad_output, shifts, grad_input, batch_size, time_size, channel_size, + hw_size, group_size, group_channel, max_number_hw_per_core, + max_length_per_core, data_dtype); + } +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mps/MPSDevice.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mps/MPSDevice.h new file mode 100644 index 0000000000000000000000000000000000000000..e1d9d49618d7aea6a30b42630350c5a7b77ea0ac --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mps/MPSDevice.h @@ -0,0 +1,64 @@ +// Copyright © 2022 Apple Inc. + +// This file is modified from: +// https://github.com/pytorch/pytorch/blob/a85d1f0bcdd02cf18d3b0517337458cb51a18cdb/aten/src/ATen/mps/MPSDevice.h + +#pragma once +#include <c10/core/Allocator.h> +#include <c10/macros/Macros.h> +#include <c10/util/Exception.h> + +#ifdef __OBJC__ +#include <Foundation/Foundation.h> +#include <Metal/Metal.h> +#include <MetalPerformanceShaders/MetalPerformanceShaders.h> +typedef id<MTLDevice> MTLDevice_t; +#else +typedef void* MTLDevice; +typedef void* MTLDevice_t; +#endif + +using namespace std; + +namespace at { +namespace mps { + +//----------------------------------------------------------------- +// MPSDevice +// +// MPSDevice is a singleton class that returns the default device +//----------------------------------------------------------------- + +class TORCH_API MPSDevice { + public: + /** + * MPSDevice should not be cloneable. + */ + MPSDevice(MPSDevice& other) = delete; + /** + * MPSDevice should not be assignable. + */ + void operator=(const MPSDevice&) = delete; + /** + * Gets single instance of the Device. + */ + static MPSDevice* getInstance(); + /** + * Returns the single device. 
+ */ + MTLDevice_t device() { return _mtl_device; } + + ~MPSDevice(); + + private: + static MPSDevice* _device; + MTLDevice_t _mtl_device; + MPSDevice(); +}; + +TORCH_API bool is_available(); + +TORCH_API at::Allocator* GetMPSAllocator(bool useSharedAllocator = false); + +} // namespace mps +} // namespace at diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mps/MPSLibrary.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mps/MPSLibrary.h new file mode 100644 index 0000000000000000000000000000000000000000..41c33fba8cbdd43cc5b3285603c11c6f9eee617b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mps/MPSLibrary.h @@ -0,0 +1,61 @@ +#ifndef _MPS_LIBRARY_H_ +#define _MPS_LIBRARY_H_ + +#include +#include + +#ifdef __OBJC__ +#include +#include +#include + +typedef id MTLComputePipelineState_t; +typedef id MTLLibrary_t; +#else +typedef void* MTLComputePipelineState; +typedef void* MTLComputePipelineState_t; +typedef void* MTLLibrary; +typedef void* MTLLibrary_t; +#endif + +class MPSLibrary { + public: + // disable constructor for singleton + static MPSLibrary* createFromUrl(const std::string& library_url); + static MPSLibrary* createFromSource(const std::string& source); + ~MPSLibrary(); + + MTLLibrary_t library() { return _library; } + + MTLComputePipelineState_t getComputePipelineState( + const std::string& function_name); + + private: + MTLLibrary_t _library; + std::unordered_map _pso_map; +}; + +class MPSLibraryManager { + public: + // disable constructor for singleton + MPSLibraryManager(const MPSLibraryManager&) = delete; + MPSLibraryManager& operator=(const MPSLibraryManager&) = delete; + MPSLibraryManager(MPSLibraryManager&&) = delete; + MPSLibraryManager& operator=(MPSLibraryManager&&) = delete; + + static MPSLibraryManager* getInstance(); + + bool hasLibrary(const std::string& name); + + MPSLibrary* getLibrary(const std::string& library_url); + + MPSLibrary* createLibraryFromSouce(const std::string& name, + const 
std::string& sources); + + ~MPSLibraryManager(); + + private: + MPSLibraryManager(); + std::unordered_map> _library_map; +}; +#endif diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mps/MPSLibrary.mm b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mps/MPSLibrary.mm new file mode 100644 index 0000000000000000000000000000000000000000..99addc7e28222f890e0b65660bb97711b6b52305 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mps/MPSLibrary.mm @@ -0,0 +1,107 @@ +#include "MPSLibrary.h" +#include "MPSDevice.h" + +static std::unique_ptr mps_library_manager=nullptr; + +MPSLibraryManager* MPSLibraryManager::getInstance() { + if(!mps_library_manager) + mps_library_manager = std::unique_ptr(new MPSLibraryManager()); + return mps_library_manager.get(); +} + +MPSLibraryManager::~MPSLibraryManager() {} + +MPSLibraryManager::MPSLibraryManager() {} + +bool MPSLibraryManager::hasLibrary(const std::string& name) { + return _library_map.find(name) != _library_map.end(); +} + +MPSLibrary* MPSLibraryManager::getLibrary(const std::string& library_url) { + if (_library_map.find(library_url) != _library_map.end()) { + return _library_map[library_url].get(); + } + _library_map.emplace(std::make_pair( + library_url, std::unique_ptr(MPSLibrary::createFromUrl(library_url)))); + return _library_map[library_url].get(); +} + +MPSLibrary* MPSLibraryManager::createLibraryFromSouce(const std::string& name, + const std::string& source) { + NSString* ns_name = [NSString stringWithCString:name.c_str()]; + if (_library_map.find(name) != _library_map.end()) { + NSLog(@"Library %@ already exist.", ns_name); + return nullptr; + } + + _library_map.emplace( + std::make_pair(name, std::unique_ptr(MPSLibrary::createFromSource(source)))); + return _library_map[name].get(); +} + +MPSLibrary* MPSLibrary::createFromUrl(const std::string& library_url) { + MPSLibrary* library = new MPSLibrary(); + @autoreleasepool { + NSError* error = nil; + + // load library and func + 
NSString* utl_str = [NSString stringWithCString:library_url.c_str()]; + NSURL* metal_url = [NSURL fileURLWithPath:utl_str]; + library->_library = [at::mps::MPSDevice::getInstance()->device() newLibraryWithURL:metal_url + error:&error]; + if (library->_library == nil) { + NSLog(@"Failed to find library, error %@.", error); + exit(1); + } + } + + return library; +} + +MPSLibrary* MPSLibrary::createFromSource(const std::string& sources) { + MPSLibrary* library = new MPSLibrary(); + @autoreleasepool { + NSError* error = nil; + + // load library and func + NSString* code_str = [NSString stringWithCString:sources.c_str()]; + library->_library = [at::mps::MPSDevice::getInstance()->device() newLibraryWithSource:code_str + options:nil + error:&error]; + if (library->_library == nil) { + NSLog(@"Failed to find library, error %@.", error); + exit(1); + } + } + + return library; +} + +MPSLibrary::~MPSLibrary() { + [_library release]; + _library = nil; +} + +MTLComputePipelineState_t MPSLibrary::getComputePipelineState(const std::string& function_name) { + if (_pso_map.find(function_name) != _pso_map.end()) { + return _pso_map[function_name]; + } + + MTLComputePipelineState_t pso; + @autoreleasepool { + NSError* error = nil; + + // create function + NSString* function_name_str = [NSString stringWithCString:function_name.c_str()]; + id func = [_library newFunctionWithName:function_name_str]; + if (func == nil) { + NSLog(@"Failed to created pipeline state object, error %@.", error); + exit(1); + } + // create pipeline + pso = [at::mps::MPSDevice::getInstance()->device() newComputePipelineStateWithFunction:func + error:&error]; + _pso_map.emplace(std::make_pair(function_name, pso)); + } + return _pso_map[function_name]; +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mps/MPSStream.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mps/MPSStream.h new file mode 100644 index 0000000000000000000000000000000000000000..54cd388494c8bbac636db44dd5c8afd1915357c6 --- 
/dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mps/MPSStream.h @@ -0,0 +1,132 @@ +// Copyright © 2022 Apple Inc. + +// This file is modify from: +// https://github.com/pytorch/pytorch/blob/a85d1f0bcdd02cf18d3b0517337458cb51a18cdb/aten/src/ATen/mps/MPSStream.h + +#pragma once + +#include +#include + +#include +#include +#include +#include "MPSDevice.h" + +#ifdef __OBJC__ +#include +#include +#include +#include +typedef id MTLCommandQueue_t; +typedef id MTLCommandBuffer_t; +typedef id MTLSharedEvent_t; +typedef id MTLDevice_t; +#else +typedef void* MTLCommandQueue_t; +typedef void* MTLCommandQueue; +typedef void* MTLCommandBuffer_t; +typedef void* MTLCommandBuffer; +typedef void* MTLSharedEvent_t; +typedef void* dispatch_queue_t; +typedef void* MTLDevice_t; +#define nil NULL; +#endif + +namespace at { +namespace mps { + +//----------------------------------------------------------------- +// MPSStream +//----------------------------------------------------------------- + +class TORCH_API MPSStream { + public: + enum Unchecked { UNCHECKED }; + /// Construct a MPSStream from a Stream. This construction is checked, + /// and will raise an error if the Stream is not, in fact, a MPS stream. + explicit MPSStream(Stream stream); + + ~MPSStream(); + MTLCommandQueue_t commandQueue() const { return _commandQueue; }; + dispatch_queue_t queue() const { return _serialQueue; } + + MTLCommandBuffer_t commandBuffer(); + void commit(bool flush); + void commitAndWait(); + void synchronize(); + + void flush(); + + /// Get the MPS device index that this stream is associated with. + c10::DeviceIndex device_index() const { return _stream.device_index(); } + + MTLCommandQueue_t stream() const { return _commandQueue; }; + + MTLDevice_t device() const { return [_commandQueue device]; } + + /// Explicit conversion to Stream. 
+ Stream unwrap() const { return _stream; } + + private: + Stream _stream; + MTLCommandQueue_t _commandQueue = nil; + MTLCommandBuffer_t _commandBuffer = nil; + void _flush(bool commitAndWait) const; + + dispatch_queue_t _serialQueue = nullptr; +}; + +/** + * Get the current MPS stream + */ +TORCH_API MPSStream* getCurrentMPSStream(); + +/** + * Get the default MPS stream + */ +TORCH_API MPSStream* getDefaultMPSStream(); + +//----------------------------------------------------------------- +// MPSStreamImpl +//----------------------------------------------------------------- + +class TORCH_API MPSStreamImpl { + public: + /** + * Gets single instance of the MPSStream. + */ + static MPSStream* getInstance(); + + private: + static MPSStream* _stream; + MPSStreamImpl(); +}; + +//----------------------------------------------------------------- +// MPSEvent +//----------------------------------------------------------------- + +struct TORCH_API MPSEvent { + MPSEvent(); + // MPSEvent(id device); + + ~MPSEvent(); + MTLSharedEvent_t event() const { return _event; } + + void recordEvent(MPSStream* stream); + void waitForEvent(MPSStream* queue); // waits on the cpu + bool queryEvent(); + uint64_t getCurrentValue() { return _currentValue; } + void setCurrentValue(uint64_t currValue) { _currentValue = currValue; } + + private: + bool _isRecorded = false; + uint64_t _currentValue = 0; + MTLSharedEvent_t _event; +}; + +typedef MPSEvent* mpsEvent_t; + +} // namespace mps +} // namespace at diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mps/MPSUtils.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mps/MPSUtils.h new file mode 100644 index 0000000000000000000000000000000000000000..2a4ce6d7978d566e88dd22ee4f9722df914ff0de --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/mps/MPSUtils.h @@ -0,0 +1,51 @@ +#ifndef _MPS_UTILS_H_ +#define _MPS_UTILS_H_ +#include +#ifdef __OBJC__ +#include +#include +#include + +typedef id MTLBuffer_t; +typedef 
id MTLComputeCommandEncoder_t; +#else +typedef void* MTLBuffer; +typedef void* MTLBuffer_t; +typedef void* MTLComputeCommandEncoder; +typedef void* MTLComputeCommandEncoder_t; +#endif + +// utils +static inline MTLBuffer_t getMTLBufferStorage(const at::Tensor& tensor) { + return __builtin_bit_cast(MTLBuffer_t, tensor.storage().data()); +} + +template , at::Tensor>::value, bool> = true> +void setMTLArg(MTLComputeCommandEncoder_t encoder, int index, T&& t); + +template , at::Tensor>::value, bool> = true> +void setMTLArg(MTLComputeCommandEncoder_t encoder, int index, T&& t) { + [encoder setBuffer:getMTLBufferStorage(t) offset:0 atIndex:index]; +} + +template , at::Tensor>::value, bool>> +void setMTLArg(MTLComputeCommandEncoder_t encoder, int index, T&& t) { + [encoder setBytes:&t length:sizeof(t) atIndex:index]; +} + +inline void setMTLArgsImpl(MTLComputeCommandEncoder_t, int) {} + +template +void setMTLArgsImpl(MTLComputeCommandEncoder_t encoder, int index, T&& t, Args&&... args) { + setMTLArg(encoder, index, std::forward(t)); + setMTLArgsImpl(encoder, index + 1, std::forward(args)...); +} + +template +void setMTLArgs(MTLComputeCommandEncoder_t encoder, MTLComputePipelineState_t pso, Args&&... args) { + [encoder setComputePipelineState:pso]; + setMTLArgsImpl(encoder, 0, std::forward(args)...); +} +#endif diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/parrots_cpp_helper.hpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/parrots_cpp_helper.hpp new file mode 100644 index 0000000000000000000000000000000000000000..d95d7221bd5fabd343297e33ef7dfc5ba8294bc3 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/parrots_cpp_helper.hpp @@ -0,0 +1,40 @@ +#ifndef PARROTS_CPP_HELPER +#define PARROTS_CPP_HELPER +#include +#include +#include +#include +#include + +using namespace parrots; + +#define PARROTS_PRIVATE_CASE_TYPE(prim_type, type, ...) 
\ + case prim_type: { \ + using scalar_t = type; \ + return __VA_ARGS__(); \ + } + +#define PARROTS_DISPATCH_FLOATING_TYPES(TYPE, ...) \ + [&] { \ + const auto& the_type = TYPE; \ + switch (the_type) { \ + PARROTS_PRIVATE_CASE_TYPE(Prim::Float64, float, __VA_ARGS__) \ + PARROTS_PRIVATE_CASE_TYPE(Prim::Float32, float, __VA_ARGS__) \ + default: \ + PARROTS_NOTSUPPORTED; \ + } \ + }() + +#define PARROTS_DISPATCH_FLOATING_TYPES_AND_HALF(TYPE, ...) \ + [&] { \ + const auto& the_type = TYPE; \ + switch (the_type) { \ + PARROTS_PRIVATE_CASE_TYPE(Prim::Float64, float, __VA_ARGS__) \ + PARROTS_PRIVATE_CASE_TYPE(Prim::Float32, float, __VA_ARGS__) \ + PARROTS_PRIVATE_CASE_TYPE(Prim::Float16, float16, __VA_ARGS__) \ + default: \ + PARROTS_NOTSUPPORTED; \ + } \ + }() + +#endif // PARROTS_CPP_HELPER diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/parrots_cuda_helper.hpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/parrots_cuda_helper.hpp new file mode 100644 index 0000000000000000000000000000000000000000..45aea02eb75f1188a4926c4d8a9d9715ca8c3a21 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/parrots_cuda_helper.hpp @@ -0,0 +1,111 @@ +#ifndef PARROTS_CUDA_HELPER +#define PARROTS_CUDA_HELPER + +#include +#include + +#include +#include +#include +#include +#include +#include +#include + +#include "common_cuda_helper.hpp" +#include "parrots_cudawarpfunction.cuh" + +using namespace parrots; +using phalf = float16; + +#define __PHALF(x) (x.y) + +#define PARROTS_CUDA_CHECK(exp) \ + do { \ + cudaError_t err = exp; \ + if (err != cudaSuccess) { \ + fprintf(stderr, "cudaCheckError() failed : %s\n", \ + cudaGetErrorString(err)); \ + exit(-1); \ + } \ + } while (0) + +#define PARROTS_PRIVATE_CASE_TYPE(prim_type, type, ...) \ + case prim_type: { \ + using scalar_t = type; \ + return __VA_ARGS__(); \ + } + +#define PARROTS_DISPATCH_FLOATING_TYPES(TYPE, ...) 
\ + [&] { \ + const auto& the_type = TYPE; \ + switch (the_type) { \ + PARROTS_PRIVATE_CASE_TYPE(Prim::Float64, float, __VA_ARGS__) \ + PARROTS_PRIVATE_CASE_TYPE(Prim::Float32, float, __VA_ARGS__) \ + default: \ + PARROTS_NOTSUPPORTED; \ + } \ + }() + +#define PARROTS_DISPATCH_FLOATING_TYPES_AND_HALF(TYPE, ...) \ + [&] { \ + const auto& the_type = TYPE; \ + switch (the_type) { \ + PARROTS_PRIVATE_CASE_TYPE(Prim::Float64, float, __VA_ARGS__) \ + PARROTS_PRIVATE_CASE_TYPE(Prim::Float32, float, __VA_ARGS__) \ + PARROTS_PRIVATE_CASE_TYPE(Prim::Float16, float16, __VA_ARGS__) \ + default: \ + PARROTS_NOTSUPPORTED; \ + } \ + }() + +/** atomicAdd **/ +#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 600 + +static __inline__ __device__ float atomicAdd(float* address, float val) { + unsigned long long int* address_as_ull = (unsigned long long int*)address; + unsigned long long int old = *address_as_ull, assumed; + if (val == 0.0) return __longlong_as_float(old); + do { + assumed = old; + old = atomicCAS(address_as_ull, assumed, + __float_as_longlong(val + __longlong_as_float(assumed))); + } while (assumed != old); + return __longlong_as_float(old); +} + +#endif + +static __inline__ __device__ float16 atomicAdd(float16* address, float16 val) { + unsigned int* aligned = + (unsigned int*)((size_t)address - ((size_t)address & 2)); + unsigned int old = *aligned; + unsigned int assumed; + unsigned short old_as_us; + do { + assumed = old; + old_as_us = + (unsigned short)((size_t)address & 2 ? old >> 16 : old & 0xffff); + +#if __CUDACC_VER_MAJOR__ >= 9 + float16 tmp; + tmp.x = old_as_us; + float16 sum = tmp + val; + unsigned short sum_as_us = sum.x; +// half sum = __float2half_rn(__half2float(__ushort_as_half(old_as_us)) +// + (float)(val)); unsigned short sum_as_us = __half_as_ushort(sum); +#else + unsigned short sum_as_us = + __float2half_rn(__half2float(old_as_us) + (float)(val)); +#endif + + unsigned int sum_as_ui = (size_t)address & 2 + ? 
(sum_as_us << 16) | (old & 0xffff) + : (old & 0xffff0000) | sum_as_us; + old = atomicCAS(aligned, assumed, sum_as_ui); + } while (assumed != old); + //__half_raw raw = {old_as_us}; + // return float16(raw); + return *reinterpret_cast(&old_as_us); +} +#endif // PARROTS_CUDA_HELPER diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/pytorch_cpp_helper.hpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/pytorch_cpp_helper.hpp new file mode 100644 index 0000000000000000000000000000000000000000..f68e8740561ef833c09e1ba9f999922f5d04bce5 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/pytorch_cpp_helper.hpp @@ -0,0 +1,27 @@ +#ifndef PYTORCH_CPP_HELPER +#define PYTORCH_CPP_HELPER +#include + +#include + +using namespace at; + +#define CHECK_CUDA(x) \ + TORCH_CHECK(x.device().is_cuda(), #x " must be a CUDA tensor") +#define CHECK_MLU(x) \ + TORCH_CHECK(x.device().type() == at::kMLU, #x " must be a MLU tensor") +#define CHECK_CPU(x) \ + TORCH_CHECK(x.device().type() == at::kCPU, #x " must be a CPU tensor") +#define CHECK_CONTIGUOUS(x) \ + TORCH_CHECK(x.is_contiguous(), #x " must be contiguous") +#define CHECK_CUDA_INPUT(x) \ + CHECK_CUDA(x); \ + CHECK_CONTIGUOUS(x) +#define CHECK_MLU_INPUT(x) \ + CHECK_MLU(x); \ + CHECK_CONTIGUOUS(x) +#define CHECK_CPU_INPUT(x) \ + CHECK_CPU(x); \ + CHECK_CONTIGUOUS(x) + +#endif // PYTORCH_CPP_HELPER diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/pytorch_cuda_helper.hpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/pytorch_cuda_helper.hpp new file mode 100644 index 0000000000000000000000000000000000000000..52e512695a403abe2688f9bffeece633a02f189a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/pytorch_cuda_helper.hpp @@ -0,0 +1,20 @@ +#ifndef PYTORCH_CUDA_HELPER +#define PYTORCH_CUDA_HELPER + +#include +#include +#include + +#include +#include + +#include "common_cuda_helper.hpp" + +using at::Half; +using at::Tensor; +using phalf = at::Half; + +#define 
__PHALF(x) (x) +#define DIVUP(m, n) ((m) / (n) + ((m) % (n) > 0)) + +#endif // PYTORCH_CUDA_HELPER diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/pytorch_device_registry.hpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/pytorch_device_registry.hpp new file mode 100644 index 0000000000000000000000000000000000000000..2a32b7270c3521f960394af7d18cbbd03ba50df1 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/pytorch_device_registry.hpp @@ -0,0 +1,141 @@ +#ifndef PYTORCH_DEVICE_REGISTRY_H +#define PYTORCH_DEVICE_REGISTRY_H + +// Using is recommended in the official documentation in +// https://pytorch.org/tutorials/advanced/cpp_extension.html#writing-the-c-op. +// However, we use for compatibility with CUDA 9.0 +// Read https://github.com/pytorch/extension-cpp/issues/35 for more details. +#include + +#include +#include +#include +#include + +inline std::string GetDeviceStr(const at::Device& device) { + std::string str = DeviceTypeName(device.type(), true); + if (device.has_index()) { + str.push_back(':'); + str.append(std::to_string(device.index())); + } + return str; +} + +// Registry +template +class DeviceRegistry; + +template +class DeviceRegistry { + public: + using FunctionType = Ret (*)(Args...); + static const int MAX_DEVICE_TYPES = + int8_t(at::DeviceType::COMPILE_TIME_MAX_DEVICE_TYPES); + + void Register(at::DeviceType device, FunctionType function) { + funcs_[int8_t(device)] = function; + } + + FunctionType Find(at::DeviceType device) const { + return funcs_[int8_t(device)]; + } + + static DeviceRegistry& instance() { + static DeviceRegistry inst; + return inst; + } + + private: + DeviceRegistry() { + for (size_t i = 0; i < MAX_DEVICE_TYPES; ++i) { + funcs_[i] = nullptr; + } + }; + FunctionType funcs_[MAX_DEVICE_TYPES]; +}; + +// get device of first tensor param + +template , at::Tensor>::value, + bool> = true> +at::Device GetFirstTensorDevice(T&& t, Args&&... 
args) { + return std::forward(t).device(); +} +template , at::Tensor>::value, + bool> = true> +at::Device GetFirstTensorDevice(T&& t, Args&&... args) { + return GetFirstTensorDevice(std::forward(args)...); +} + +// check device consistency + +inline std::pair CheckDeviceConsistency( + const at::Device& device, int index) { + return {index, device}; +} + +template , at::Tensor>::value, + bool> = true> +std::pair CheckDeviceConsistency(const at::Device& device, + int index, T&& t, + Args&&... args); + +template , at::Tensor>::value, + bool> = true> +std::pair CheckDeviceConsistency(const at::Device& device, + int index, T&& t, + Args&&... args) { + auto new_device = std::forward(t).device(); + if (new_device.type() != device.type() || + new_device.index() != device.index()) { + return {index, new_device}; + } + return CheckDeviceConsistency(device, index + 1, std::forward(args)...); +} + +template < + typename T, typename... Args, + std::enable_if_t, at::Tensor>::value, bool>> +std::pair CheckDeviceConsistency(const at::Device& device, + int index, T&& t, + Args&&... args) { + return CheckDeviceConsistency(device, index + 1, std::forward(args)...); +} + +// dispatch + +template +auto Dispatch(const R& registry, const char* name, Args&&... 
args) { + auto device = GetFirstTensorDevice(std::forward(args)...); + auto inconsist = + CheckDeviceConsistency(device, 0, std::forward(args)...); + TORCH_CHECK(inconsist.first >= int(sizeof...(Args)), name, ": at param ", + inconsist.first, + ", inconsistent device: ", GetDeviceStr(inconsist.second).c_str(), + " vs ", GetDeviceStr(device).c_str(), "\n") + auto f_ptr = registry.Find(device.type()); + TORCH_CHECK(f_ptr != nullptr, name, ": implementation for device ", + GetDeviceStr(device).c_str(), " not found.\n") + return f_ptr(std::forward(args)...); +} + +// helper macro + +#define DEVICE_REGISTRY(key) DeviceRegistry::instance() + +#define REGISTER_DEVICE_IMPL(key, device, value) \ + struct key##_##device##_registerer { \ + key##_##device##_registerer() { \ + DEVICE_REGISTRY(key).Register(at::k##device, value); \ + } \ + }; \ + static key##_##device##_registerer _##key##_##device##_registerer; + +#define DISPATCH_DEVICE_IMPL(key, ...) \ + Dispatch(DEVICE_REGISTRY(key), #key, __VA_ARGS__) + +#endif // PYTORCH_DEVICE_REGISTRY diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/pytorch_mlu_helper.hpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/pytorch_mlu_helper.hpp new file mode 100644 index 0000000000000000000000000000000000000000..e49572ca841211e2960192f1e0955b54819086cc --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/pytorch_mlu_helper.hpp @@ -0,0 +1,61 @@ +/************************************************************************* + * Copyright (C) 2021 Cambricon. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
+ * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + *************************************************************************/ +#ifndef PYTORCH_MLU_HELPER_HPP_ +#define PYTORCH_MLU_HELPER_HPP_ + +#ifdef MMCV_WITH_MLU +#include "aten.h" + +#define NFU_ALIGN_SIZE 128 + +#define PAD_UP(x, y) (((x) / (y) + (int)((x) % (y) > 0)) * (y)) + +#define PAD_DOWN(x, y) (((x) / (y)) * (y)) + +#define CEIL_DIV(x, y) (((x) + (y)-1) / (y)) + +#define CEIL_ALIGN(x, y) (((x) + (y)-1) / (y) * (y)) + +inline int32_t getJobLimitCapability() { + CNcontext drv_ctx; + TORCH_CHECK(CN_SUCCESS == cnCtxGetCurrent(&drv_ctx), "cnCtxGetCurrent fails"); + CNctxConfigParam ctx_conf_param; + TORCH_CHECK( + CN_SUCCESS == cnGetCtxConfigParam(drv_ctx, CN_CTX_CONFIG_UNION_LIMIT, + &ctx_conf_param), + "cnGetCtxConfigParam fails."); + return (int32_t)ctx_conf_param.unionLimit; +} + +inline int32_t getCoreNumOfJobLimitCapability() { + switch (getJobLimitCapability()) { + default: + return torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster) * + getJobLimitCapability(); + case CN_KERNEL_CLASS_BLOCK: + return 1; + case CN_KERNEL_CLASS_UNION: + return torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster); + case CN_KERNEL_CLASS_UNION2: + return torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster) * 2; + case CN_KERNEL_CLASS_UNION4: + return torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster) * 4; + case CN_KERNEL_CLASS_UNION8: + return torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster) * 8; + case CN_KERNEL_CLASS_UNION16: + return torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster) * 16; + } +} + +#endif // MMCV_WITH_MLU + +#endif // PYTORCH_MLU_HELPER_HPP_ diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/pytorch_npu_helper.hpp 
b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/pytorch_npu_helper.hpp new file mode 100644 index 0000000000000000000000000000000000000000..88607d23b360e519084efc09ad472b2a3b8cff74 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/common/pytorch_npu_helper.hpp @@ -0,0 +1,35 @@ +/****************************************************************************** + * Copyright (c) 2022 Huawei Technologies Co., Ltd + * All rights reserved. + * + * Licensed under the BSD 3-Clause License (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * https://opensource.org/licenses/BSD-3-Clause + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + ******************************************************************************/ + +#ifndef PYTORCH_NPU_HELPER_HPP_ +#define PYTORCH_NPU_HELPER_HPP_ + +#include +#include +#include + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +#define NPU_NAME_SPACE at_npu::native + +#define REGISTER_NPU_IMPL(key, value) REGISTER_DEVICE_IMPL(key, XLA, value) + +#define CHECK_NPU(x) \ + TORCH_CHECK(x.device().type() == at::kXLA, #x " must be a NPU tensor") + +#endif // PYTORCH_NPU_HELPER_HPP_ diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/active_rotated_filter.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/active_rotated_filter.cpp new file mode 100644 index 0000000000000000000000000000000000000000..e1ead1f8e4700d019fff7b25034e2475087040c8 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/active_rotated_filter.cpp @@ -0,0 +1,28 @@ +// Copyright (c) OpenMMLab. All rights reserved. 
+// Modified from +// https://github.com/csuhan/s2anet/blob/master/mmdet/ops/orn/src/ActiveRotatingFilter.h + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void active_rotated_filter_forward_impl(const Tensor input, + const Tensor indices, Tensor output) { + DISPATCH_DEVICE_IMPL(active_rotated_filter_forward_impl, input, indices, + output); +} + +void active_rotated_filter_backward_impl(const Tensor grad_out, + const Tensor indices, Tensor grad_in) { + DISPATCH_DEVICE_IMPL(active_rotated_filter_backward_impl, grad_out, indices, + grad_in); +} + +void active_rotated_filter_forward(const Tensor input, const Tensor indices, + Tensor output) { + active_rotated_filter_forward_impl(input, indices, output); +} + +void active_rotated_filter_backward(const Tensor grad_out, const Tensor indices, + Tensor grad_in) { + active_rotated_filter_backward_impl(grad_out, indices, grad_in); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/active_rotated_filter_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/active_rotated_filter_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..9097f7e0a15d817b8e176a01e080e8f4476f6be9 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/active_rotated_filter_parrots.cpp @@ -0,0 +1,63 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include +#include +#include + +#include "active_rotated_filter_pytorch.h" +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void active_rotated_filter_forward_cuda_parrots( + CudaContext& ctx, const SSElement& attr, const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + auto input = buildATensor(ctx, ins[0]); + auto indices = buildATensor(ctx, ins[1]); + auto output = buildATensor(ctx, outs[0]); + active_rotated_filter_forward(input, indices, output); +} + +void active_rotated_filter_backward_cuda_parrots( + CudaContext& ctx, const SSElement& attr, const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + auto grad_out = buildATensor(ctx, ins[0]); + auto indices = buildATensor(ctx, ins[1]); + auto grad_in = buildATensor(ctx, outs[0]); + active_rotated_filter_backward(grad_out, indices, grad_in); +} +#endif + +void active_rotated_filter_forward_cpu_parrots( + HostContext& ctx, const SSElement& attr, const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + auto input = buildATensor(ctx, ins[0]); + auto indices = buildATensor(ctx, ins[1]); + auto output = buildATensor(ctx, outs[0]); + active_rotated_filter_forward(input, indices, output); +} + +void active_rotated_filter_backward_cpu_parrots( + HostContext& ctx, const SSElement& attr, const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + auto grad_out = buildATensor(ctx, ins[0]); + auto indices = buildATensor(ctx, ins[1]); + auto grad_in = buildATensor(ctx, outs[0]); + active_rotated_filter_backward(grad_out, indices, grad_in); +} + +PARROTS_EXTENSION_REGISTER(active_rotated_filter_forward) + .input(2) + .output(1) + .apply(active_rotated_filter_forward_cpu_parrots) +#ifdef MMCV_WITH_CUDA + .apply(active_rotated_filter_forward_cuda_parrots) +#endif + .done(); + +PARROTS_EXTENSION_REGISTER(active_rotated_filter_backward) + .input(2) + .output(1) + .apply(active_rotated_filter_backward_cpu_parrots) +#ifdef MMCV_WITH_CUDA + 
.apply(active_rotated_filter_backward_cuda_parrots) +#endif + .done(); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/active_rotated_filter_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/active_rotated_filter_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..9a4d2ce96a416d6d845413f08b586aa55c57ea2f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/active_rotated_filter_pytorch.h @@ -0,0 +1,13 @@ +// Copyright (c) OpenMMLab. All rights reserved +#ifndef ACTIVE_ROTATED_FILTER_PYTORCH_H +#define ACTIVE_ROTATED_FILTER_PYTORCH_H +#include +using namespace at; + +void active_rotated_filter_forward(const Tensor input, const Tensor indices, + Tensor output); + +void active_rotated_filter_backward(const Tensor grad_out, const Tensor indices, + Tensor grad_in); + +#endif // ACTIVE_ROTATED_FILTER_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/assign_score_withk.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/assign_score_withk.cpp new file mode 100644 index 0000000000000000000000000000000000000000..9076277181c48c7c8f236cb9da79a83c5d38d47f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/assign_score_withk.cpp @@ -0,0 +1,42 @@ +// Modified from +// https://github.com/CVMI-Lab/PAConv/tree/main/scene_seg/lib/paconv_lib/src/gpu +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void assign_score_withk_forward_impl(int B, int N0, int N1, int M, int K, int O, + int aggregate, const Tensor& points, + const Tensor& centers, + const Tensor& scores, + const Tensor& knn_idx, Tensor& output) { + DISPATCH_DEVICE_IMPL(assign_score_withk_forward_impl, B, N0, N1, M, K, O, + aggregate, points, centers, scores, knn_idx, output); +} + +void assign_score_withk_backward_impl( + int B, int N0, int N1, int M, int K, int O, int aggregate, + const Tensor& grad_out, const Tensor& points, const Tensor& centers, + const Tensor& 
scores, const Tensor& knn_idx, Tensor& grad_points, + Tensor& grad_centers, Tensor& grad_scores) { + DISPATCH_DEVICE_IMPL(assign_score_withk_backward_impl, B, N0, N1, M, K, O, + aggregate, grad_out, points, centers, scores, knn_idx, + grad_points, grad_centers, grad_scores); +} + +void assign_score_withk_forward(const Tensor& points, const Tensor& centers, + const Tensor& scores, const Tensor& knn_idx, + Tensor& output, int B, int N0, int N1, int M, + int K, int O, int aggregate) { + assign_score_withk_forward_impl(B, N0, N1, M, K, O, aggregate, points, + centers, scores, knn_idx, output); +} + +void assign_score_withk_backward(const Tensor& grad_out, const Tensor& points, + const Tensor& centers, const Tensor& scores, + const Tensor& knn_idx, Tensor& grad_points, + Tensor& grad_centers, Tensor& grad_scores, + int B, int N0, int N1, int M, int K, int O, + int aggregate) { + assign_score_withk_backward_impl(B, N0, N1, M, K, O, aggregate, grad_out, + points, centers, scores, knn_idx, + grad_points, grad_centers, grad_scores); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/assign_score_withk_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/assign_score_withk_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..5729c716310069f2abd49412255b048a5dfe3f68 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/assign_score_withk_parrots.cpp @@ -0,0 +1,89 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include +#include +#include + +#include "assign_score_withk_pytorch.h" + +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void assign_score_withk_forward_cuda_parrots(CudaContext& ctx, + const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int B, N0, N1, M, K, O, aggregate; + SSAttrs(attr) + .get("B", B) + .get("N0", N0) + .get("N1", N1) + .get("M", M) + .get("K", K) + .get("O", O) + .get("aggregate", aggregate) + .done(); + + const auto& points = buildATensor(ctx, ins[0]); + const auto& centers = buildATensor(ctx, ins[1]); + const auto& scores = buildATensor(ctx, ins[2]); + const auto& knn_idx = buildATensor(ctx, ins[3]); + + auto output = buildATensor(ctx, outs[0]); + assign_score_withk_forward(points, centers, scores, knn_idx, output, B, N0, + N1, M, K, O, aggregate); +} + +void assign_score_withk_backward_cuda_parrots( + CudaContext& ctx, const SSElement& attr, const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int B, N0, N1, M, K, O, aggregate; + SSAttrs(attr) + .get("B", B) + .get("N0", N0) + .get("N1", N1) + .get("M", M) + .get("K", K) + .get("O", O) + .get("aggregate", aggregate) + .done(); + + const auto& grad_out = buildATensor(ctx, ins[0]); + const auto& points = buildATensor(ctx, ins[1]); + const auto& centers = buildATensor(ctx, ins[2]); + const auto& scores = buildATensor(ctx, ins[3]); + const auto& knn_idx = buildATensor(ctx, ins[4]); + + auto grad_points = buildATensor(ctx, outs[0]); + auto grad_centers = buildATensor(ctx, outs[1]); + auto grad_scores = buildATensor(ctx, outs[2]); + assign_score_withk_backward(grad_out, points, centers, scores, knn_idx, + grad_points, grad_centers, grad_scores, B, N0, N1, + M, K, O, aggregate); +} + +PARROTS_EXTENSION_REGISTER(assign_score_withk_forward) + .attr("B") + .attr("N0") + .attr("N1") + .attr("M") + .attr("K") + .attr("O") + .attr("aggregate") + .input(4) + .output(1) + 
.apply(assign_score_withk_forward_cuda_parrots) + .done(); + +PARROTS_EXTENSION_REGISTER(assign_score_withk_backward) + .attr("B") + .attr("N0") + .attr("N1") + .attr("M") + .attr("K") + .attr("O") + .attr("aggregate") + .input(5) + .output(3) + .apply(assign_score_withk_backward_cuda_parrots) + .done(); +#endif diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/assign_score_withk_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/assign_score_withk_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..660594feec80371eaece3a5663facf1db2b366d9 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/assign_score_withk_pytorch.h @@ -0,0 +1,19 @@ +// Copyright (c) OpenMMLab. All rights reserved +#ifndef ASSIGN_SCORE_WITHK_PYTORCH_H +#define ASSIGN_SCORE_WITHK_PYTORCH_H +#include +using namespace at; + +void assign_score_withk_forward(const Tensor& points, const Tensor& centers, + const Tensor& scores, const Tensor& knn_idx, + Tensor& output, int B, int N0, int N1, int M, + int K, int O, int aggregate); + +void assign_score_withk_backward(const Tensor& grad_out, const Tensor& points, + const Tensor& centers, const Tensor& scores, + const Tensor& knn_idx, Tensor& grad_points, + Tensor& grad_centers, Tensor& grad_scores, + int B, int N0, int N1, int M, int K, int O, + int aggregate); + +#endif // ASSIGN_SCORE_WITHK_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/ball_query._parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/ball_query._parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..01ab9739b09986a59b69961c5b108bb098b36d6e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/ball_query._parrots.cpp @@ -0,0 +1,43 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include +#include +#include + +#include "ball_query_pytorch.h" + +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void ball_query_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int b, n, m, nsample; + float min_radius, max_radius; + SSAttrs(attr) + .get("b", b) + .get("n", n) + .get("m", m) + .get("nsample", nsample) + .get("min_radius", min_radius) + .get("max_radius", max_radius) + .done(); + + const auto& center_xyz = buildATensor(ctx, ins[0]); + const auto& xyz = buildATensor(ctx, ins[1]); + auto idx = buildATensor(ctx, outs[0]); + ball_query_forward(center_xyz, xyz, idx, b, n, m, min_radius, max_radius, + nsample); +} + +PARROTS_EXTENSION_REGISTER(ball_query_forward) + .attr("b") + .attr("n") + .attr("m") + .attr("nsample") + .attr("min_radius") + .attr("max_radius") + .input(2) + .output(1) + .apply(ball_query_parrots) + .done(); +#endif diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/ball_query.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/ball_query.cpp new file mode 100644 index 0000000000000000000000000000000000000000..1c9e7a20785e894c80d15256a1b040beffa92b47 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/ball_query.cpp @@ -0,0 +1,20 @@ +// Modified from +// https://github.com/sshaoshuai/Pointnet2.PyTorch/tree/master/pointnet2/src/ball_query.cpp + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void ball_query_forward_impl(int b, int n, int m, float min_radius, + float max_radius, int nsample, + const Tensor new_xyz, const Tensor xyz, + Tensor idx) { + DISPATCH_DEVICE_IMPL(ball_query_forward_impl, b, n, m, min_radius, max_radius, + nsample, new_xyz, xyz, idx); +} + +void ball_query_forward(Tensor new_xyz_tensor, Tensor xyz_tensor, + Tensor idx_tensor, int b, int n, int m, + float min_radius, float max_radius, int nsample) { + ball_query_forward_impl(b, n, m, min_radius, 
max_radius, nsample, + new_xyz_tensor, xyz_tensor, idx_tensor); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/ball_query_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/ball_query_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..70026f315089d1c37335865ae719f301407d6231 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/ball_query_pytorch.h @@ -0,0 +1,11 @@ +// Copyright (c) OpenMMLab. All rights reserved +#ifndef BALL_QUERY_PYTORCH_H +#define BALL_QUERY_PYTORCH_H +#include +using namespace at; + +void ball_query_forward(const Tensor new_xyz, const Tensor xyz, Tensor idx, + int b, int n, int m, float min_radius, float max_radius, + int nsample); + +#endif // BALL_QUERY_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/bbox_overlaps.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/bbox_overlaps.cpp new file mode 100644 index 0000000000000000000000000000000000000000..187216fb01a307906a6fff8d7c10fc4efa1b9b3a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/bbox_overlaps.cpp @@ -0,0 +1,14 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void bbox_overlaps_impl(const Tensor bboxes1, const Tensor bboxes2, Tensor ious, + const int mode, const bool aligned, const int offset) { + DISPATCH_DEVICE_IMPL(bbox_overlaps_impl, bboxes1, bboxes2, ious, mode, + aligned, offset); +} + +void bbox_overlaps(const Tensor bboxes1, const Tensor bboxes2, Tensor ious, + const int mode, const bool aligned, const int offset) { + bbox_overlaps_impl(bboxes1, bboxes2, ious, mode, aligned, offset); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/bbox_overlaps_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/bbox_overlaps_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..5f6264d3c07a6b0c0f5b1cb98666580e7bae6a25 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/bbox_overlaps_parrots.cpp @@ -0,0 +1,40 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include +#include +#include + +#include "bbox_overlaps_pytorch.h" + +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +/* + * void bbox_overlaps_cuda(const Tensor bboxes1, const Tensor bboxes2, Tensor + * ious, const int mode, const bool aligned, const int offset); + */ +void bbox_overlaps_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int mode, offset; + bool aligned; + SSAttrs(attr) + .get("mode", mode) + .get("aligned", aligned) + .get("offset", offset) + .done(); + + const auto& bboxes1 = buildATensor(ctx, ins[0]); + const auto& bboxes2 = buildATensor(ctx, ins[1]); + auto ious = buildATensor(ctx, outs[0]); + bbox_overlaps_cuda(bboxes1, bboxes2, ious, mode, aligned, offset); +} + +PARROTS_EXTENSION_REGISTER(bbox_overlaps) + .attr("mode") + .attr("aligned") + .attr("offset") + .input(2) + .output(1) + .apply(bbox_overlaps_parrots) + .done(); +#endif diff --git 
a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/bbox_overlaps_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/bbox_overlaps_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..4f68aa3397d80db7dd2cf4299b4391cddc533920 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/bbox_overlaps_pytorch.h @@ -0,0 +1,10 @@ +// Copyright (c) OpenMMLab. All rights reserved +#ifndef BBOX_OVERLAPS_PYTORCH_H +#define BBOX_OVERLAPS_PYTORCH_H +#include +using namespace at; + +void bbox_overlaps_cuda(const Tensor bboxes1, const Tensor bboxes2, Tensor ious, + const int mode, const bool aligned, const int offset); + +#endif // BBOX_OVERLAPS_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/border_align.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/border_align.cpp new file mode 100644 index 0000000000000000000000000000000000000000..565de689913413ab106884365e6dc1edfa940de0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/border_align.cpp @@ -0,0 +1,30 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void border_align_forward_impl(const Tensor &input, const Tensor &boxes, + Tensor output, Tensor argmax_idx, + const int pool_size) { + DISPATCH_DEVICE_IMPL(border_align_forward_impl, input, boxes, output, + argmax_idx, pool_size); +} + +void border_align_backward_impl(const Tensor &grad_output, const Tensor &boxes, + const Tensor &argmax_idx, Tensor grad_input, + const int pool_size) { + DISPATCH_DEVICE_IMPL(border_align_backward_impl, grad_output, boxes, + argmax_idx, grad_input, pool_size); +} + +void border_align_forward(const Tensor &input, const Tensor &boxes, + Tensor output, Tensor argmax_idx, + const int pool_size) { + border_align_forward_impl(input, boxes, output, argmax_idx, pool_size); +} + +void border_align_backward(const Tensor &grad_output, const Tensor &boxes, + const Tensor &argmax_idx, Tensor grad_input, + const int pool_size) { + border_align_backward_impl(grad_output, boxes, argmax_idx, grad_input, + pool_size); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/border_align_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/border_align_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..8c3bea58cca4903a1a33361ecdfe0e0d37404e0b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/border_align_parrots.cpp @@ -0,0 +1,53 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include +#include +#include + +#include "border_align_pytorch.h" + +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void border_align_forward_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int pool_size; + SSAttrs(attr).get("pool_size", pool_size).done(); + + const auto& input = buildATensor(ctx, ins[0]); + const auto& boxes = buildATensor(ctx, ins[1]); + + auto output = buildATensor(ctx, outs[0]); + auto argmax_idx = buildATensor(ctx, outs[1]); + border_align_forward_cuda(input, boxes, output, argmax_idx, pool_size); +} + +void border_align_backward_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int pool_size; + SSAttrs(attr).get("pool_size", pool_size).done(); + + const auto& top_grad = buildATensor(ctx, ins[0]); + const auto& boxes = buildATensor(ctx, ins[1]); + const auto& argmax_idx = buildATensor(ctx, ins[2]); + + auto bottom_grad = buildATensor(ctx, outs[0]); + border_align_backward_cuda(top_grad, boxes, argmax_idx, bottom_grad, + pool_size); +} + +PARROTS_EXTENSION_REGISTER(border_align_forward) + .attr("pool_size") + .input(2) + .output(2) + .apply(border_align_forward_cuda_parrots) + .done(); + +PARROTS_EXTENSION_REGISTER(border_align_backward) + .attr("pool_size") + .input(3) + .output(1) + .apply(border_align_backward_cuda_parrots) + .done(); +#endif diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/border_align_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/border_align_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..cb031e572a50df4edec4fc65056700c8850f7715 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/border_align_pytorch.h @@ -0,0 +1,17 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#ifndef BORDER_ALIGN_PYTORCH_H +#define BORDER_ALIGN_PYTORCH_H +#include +using namespace at; + +#ifdef MMCV_WITH_CUDA +void border_align_forward_cuda(const Tensor &input, const Tensor &boxes, + Tensor output, Tensor argmax_idx, + const int pool_size); + +void border_align_backward_cuda(const Tensor &grad_output, const Tensor &boxes, + const Tensor &argmax_idx, Tensor grad_input, + const int pool_size); +#endif + +#endif // BORDER_ALIGN_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/box_iou_rotated.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/box_iou_rotated.cpp new file mode 100644 index 0000000000000000000000000000000000000000..a2a4e0953a5575f72c167bd668c6b6e758ebae87 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/box_iou_rotated.cpp @@ -0,0 +1,19 @@ +// Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved +// modified from +// https://github.com/facebookresearch/detectron2/blob/master/detectron2/layers/csrc/box_iou_rotated/box_iou_rotated.h +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void box_iou_rotated_impl(const Tensor boxes1, const Tensor boxes2, Tensor ious, + const int mode_flag, const bool aligned) { + DISPATCH_DEVICE_IMPL(box_iou_rotated_impl, boxes1, boxes2, ious, mode_flag, + aligned); +} + +// Interface for Python +// inline is needed to prevent multiple function definitions when this header is +// included by different cpps +void box_iou_rotated(const Tensor boxes1, const Tensor boxes2, Tensor ious, + const int mode_flag, const bool aligned) { + box_iou_rotated_impl(boxes1, boxes2, ious, mode_flag, aligned); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/box_iou_rotated_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/box_iou_rotated_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..a90d640458b8ed38b9e18c3b26f574ce4c58e8fb --- /dev/null +++ 
b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/box_iou_rotated_parrots.cpp @@ -0,0 +1,61 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include +#include +#include + +#include "box_iou_rotated_pytorch.h" + +using namespace parrots; + +/* + * void box_iou_rotated_cpu(const Tensor boxes1, const Tensor boxes2, Tensor + * ious, const int mode_flag, const bool aligned); + */ +void box_iou_rotated_cpu_parrots(HostContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + bool aligned; + int mode_flag; + SSAttrs(attr) + .get("aligned", aligned) + .get("mode_flag", mode_flag) + .done(); + + const auto& boxes1 = buildATensor(ctx, ins[0]); + const auto& boxes2 = buildATensor(ctx, ins[1]); + auto ious = buildATensor(ctx, outs[0]); + box_iou_rotated_cpu(boxes1, boxes2, ious, mode_flag, aligned); +} + +#ifdef MMCV_WITH_CUDA +/* + * void box_iou_rotated_cuda(const Tensor boxes1, const Tensor boxes2, Tensor + * ious, const int mode_flag, const bool aligned); + */ +void box_iou_rotated_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + bool aligned; + int mode_flag; + SSAttrs(attr) + .get("aligned", aligned) + .get("mode_flag", mode_flag) + .done(); + + const auto& boxes1 = buildATensor(ctx, ins[0]); + const auto& boxes2 = buildATensor(ctx, ins[1]); + auto ious = buildATensor(ctx, outs[0]); + box_iou_rotated_cuda(boxes1, boxes2, ious, mode_flag, aligned); +} +#endif + +PARROTS_EXTENSION_REGISTER(box_iou_rotated) + .attr("aligned") + .attr("mode_flag") + .input(2) + .output(1) + .apply(box_iou_rotated_cpu_parrots) +#ifdef MMCV_WITH_CUDA + .apply(box_iou_rotated_cuda_parrots) +#endif + .done(); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/box_iou_rotated_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/box_iou_rotated_pytorch.h new file mode 100644 index 
0000000000000000000000000000000000000000..afab7031812d4389707e6b4235affba93faef6c0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/box_iou_rotated_pytorch.h @@ -0,0 +1,15 @@ +// Copyright (c) OpenMMLab. All rights reserved +#ifndef BOX_IOU_ROTATED_PYTORCH_H +#define BOX_IOU_ROTATED_PYTORCH_H +#include +using namespace at; + +void box_iou_rotated_cpu(const Tensor boxes1, const Tensor boxes2, Tensor ious, + const int mode_flag, const bool aligned); + +#ifdef MMCV_WITH_CUDA +void box_iou_rotated_cuda(const Tensor boxes1, const Tensor boxes2, Tensor ious, + const int mode_flag, const bool aligned); +#endif + +#endif // BOX_IOU_ROTATED_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/carafe.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/carafe.cpp new file mode 100644 index 0000000000000000000000000000000000000000..a563aed94f04e32614e38062c4e7f4250c6dafe6 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/carafe.cpp @@ -0,0 +1,38 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void carafe_forward_impl(Tensor features, Tensor masks, Tensor rfeatures, + Tensor routput, Tensor rmasks, Tensor output, + int kernel_size, int group_size, int scale_factor) { + DISPATCH_DEVICE_IMPL(carafe_forward_impl, features, masks, rfeatures, routput, + rmasks, output, kernel_size, group_size, scale_factor); +} + +void carafe_backward_impl(Tensor top_grad, Tensor rfeatures, Tensor masks, + Tensor rtop_grad, Tensor rbottom_grad_hs, + Tensor rbottom_grad, Tensor rmask_grad, + Tensor bottom_grad, Tensor mask_grad, int kernel_size, + int group_size, int scale_factor) { + DISPATCH_DEVICE_IMPL(carafe_backward_impl, top_grad, rfeatures, masks, + rtop_grad, rbottom_grad_hs, rbottom_grad, rmask_grad, + bottom_grad, mask_grad, kernel_size, group_size, + scale_factor); +} + +void carafe_forward(Tensor features, Tensor masks, Tensor rfeatures, + Tensor routput, Tensor rmasks, Tensor output, + int kernel_size, int group_size, int scale_factor) { + carafe_forward_impl(features, masks, rfeatures, routput, rmasks, output, + kernel_size, group_size, scale_factor); +} + +void carafe_backward(Tensor top_grad, Tensor rfeatures, Tensor masks, + Tensor rtop_grad, Tensor rbottom_grad_hs, + Tensor rbottom_grad, Tensor rmask_grad, Tensor bottom_grad, + Tensor mask_grad, int kernel_size, int group_size, + int scale_factor) { + carafe_backward_impl(top_grad, rfeatures, masks, rtop_grad, rbottom_grad_hs, + rbottom_grad, rmask_grad, bottom_grad, mask_grad, + kernel_size, group_size, scale_factor); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/carafe_naive.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/carafe_naive.cpp new file mode 100644 index 0000000000000000000000000000000000000000..6e8917a61d93c7e6613566902cb00623ea89444e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/carafe_naive.cpp @@ -0,0 +1,32 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void carafe_naive_forward_impl(Tensor features, Tensor masks, Tensor output, + int kernel_size, int group_size, + int scale_factor) { + DISPATCH_DEVICE_IMPL(carafe_naive_forward_impl, features, masks, output, + kernel_size, group_size, scale_factor); +} + +void carafe_naive_backward_impl(Tensor top_grad, Tensor features, Tensor masks, + Tensor bottom_grad, Tensor mask_grad, + int kernel_size, int group_size, + int scale_factor) { + DISPATCH_DEVICE_IMPL(carafe_naive_backward_impl, top_grad, features, masks, + bottom_grad, mask_grad, kernel_size, group_size, + scale_factor); +} + +void carafe_naive_forward(Tensor features, Tensor masks, Tensor output, + int kernel_size, int group_size, int scale_factor) { + carafe_naive_forward_impl(features, masks, output, kernel_size, group_size, + scale_factor); +} + +void carafe_naive_backward(Tensor top_grad, Tensor features, Tensor masks, + Tensor bottom_grad, Tensor mask_grad, + int kernel_size, int group_size, int scale_factor) { + carafe_naive_backward_impl(top_grad, features, masks, bottom_grad, mask_grad, + kernel_size, group_size, scale_factor); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/carafe_naive_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/carafe_naive_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..9c16a3707991d015971325fe161c2c9c4c2c31a6 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/carafe_naive_parrots.cpp @@ -0,0 +1,74 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include +#include +#include + +#include "carafe_naive_pytorch.h" + +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +/*void carafe_naive_forward_cuda(Tensor features, Tensor masks, Tensor output, + * int kernel_size, int group_size, + * int scale_factor) + */ +void carafe_naive_forward_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int kernel_size, group_size, scale_factor; + SSAttrs(attr) + .get("kernel_size", kernel_size) + .get("group_size", group_size) + .get("scale_factor", scale_factor) + .done(); + + const auto& features = buildATensor(ctx, ins[0]); + const auto& masks = buildATensor(ctx, ins[1]); + + auto output = buildATensor(ctx, outs[0]); + carafe_naive_forward_cuda(features, masks, output, kernel_size, group_size, + scale_factor); +} + +/*void carafe_naive_backward_cuda(Tensor top_grad, Tensor features, Tensor + * masks, Tensor bottom_grad, Tensor mask_grad, int kernel_size, int group_size, + * int scale_factor); + */ +void carafe_naive_backward_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int kernel_size, group_size, scale_factor; + SSAttrs(attr) + .get("kernel_size", kernel_size) + .get("group_size", group_size) + .get("scale_factor", scale_factor) + .done(); + + const auto& top_grad = buildATensor(ctx, ins[0]); + const auto& features = buildATensor(ctx, ins[1]); + const auto& masks = buildATensor(ctx, ins[2]); + + auto bottom_grad = buildATensor(ctx, outs[0]); + auto mask_grad = buildATensor(ctx, outs[1]); + carafe_naive_backward_cuda(top_grad, features, masks, bottom_grad, mask_grad, + kernel_size, group_size, scale_factor); +} + +PARROTS_EXTENSION_REGISTER(carafe_naive_forward) + .attr("kernel_size") + .attr("group_size") + .attr("scale_factor") + .input(2) + .output(1) + .apply(carafe_naive_forward_cuda_parrots) + .done(); + 
+PARROTS_EXTENSION_REGISTER(carafe_naive_backward) + .attr("kernel_size") + .attr("group_size") + .attr("scale_factor") + .input(3) + .output(2) + .apply(carafe_naive_backward_cuda_parrots) + .done(); +#endif diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/carafe_naive_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/carafe_naive_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..6df9b88c231b4949f128c528cc3f31633c76fb79 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/carafe_naive_pytorch.h @@ -0,0 +1,15 @@ +// Copyright (c) OpenMMLab. All rights reserved +#ifndef CARAFE_NAIVE_PYTORCH_H +#define CARAFE_NAIVE_PYTORCH_H +#include +using namespace at; + +void carafe_naive_forward_cuda(Tensor features, Tensor masks, Tensor output, + int kernel_size, int group_size, + int scale_factor); + +void carafe_naive_backward_cuda(Tensor top_grad, Tensor features, Tensor masks, + Tensor bottom_grad, Tensor mask_grad, + int kernel_size, int group_size, + int scale_factor); +#endif // CARAFE_NAIVE_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/carafe_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/carafe_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..e99f59ef221bfe7058c53a486c75e201c44e7f68 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/carafe_parrots.cpp @@ -0,0 +1,88 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include +#include +#include + +#include "carafe_pytorch.h" + +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +/* + * void carafe_forward_cuda(Tensor features, Tensor masks, Tensor rfeatures, + * Tensor routput, Tensor rmasks, Tensor output, + * int kernel_size, int group_size, int scale_factor); + */ +void carafe_forward_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int kernel_size, group_size, scale_factor; + SSAttrs(attr) + .get("kernel_size", kernel_size) + .get("group_size", group_size) + .get("scale_factor", scale_factor) + .done(); + + const auto& features = buildATensor(ctx, ins[0]); + const auto& masks = buildATensor(ctx, ins[1]); + + auto rfeatures = buildATensor(ctx, outs[0]); + auto routput = buildATensor(ctx, outs[1]); + auto rmasks = buildATensor(ctx, outs[2]); + auto output = buildATensor(ctx, outs[3]); + + carafe_forward_cuda(features, masks, rfeatures, routput, rmasks, output, + kernel_size, group_size, scale_factor); +} + +/* + * void carafe_backward_cuda(Tensor top_grad, Tensor rfeatures, Tensor masks, + * Tensor rtop_grad, Tensor rbottom_grad_hs, + * Tensor rbottom_grad, Tensor rmask_grad, + * Tensor bottom_grad, Tensor mask_grad, int + * kernel_size, int group_size, int scale_factor); + */ +void carafe_backward_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int kernel_size, group_size, scale_factor; + SSAttrs(attr) + .get("kernel_size", kernel_size) + .get("group_size", group_size) + .get("scale_factor", scale_factor) + .done(); + + const auto& top_grad = buildATensor(ctx, ins[0]); + const auto& rfeatures = buildATensor(ctx, ins[1]); + const auto& masks = buildATensor(ctx, ins[2]); + + auto rtop_grad = buildATensor(ctx, outs[0]); + auto rbottom_grad_hs = buildATensor(ctx, outs[1]); + auto rbottom_grad = buildATensor(ctx, outs[2]); + auto rmask_grad = 
buildATensor(ctx, outs[3]); + auto bottom_grad = buildATensor(ctx, outs[4]); + auto mask_grad = buildATensor(ctx, outs[5]); + + carafe_backward_cuda(top_grad, rfeatures, masks, rtop_grad, rbottom_grad_hs, + rbottom_grad, rmask_grad, bottom_grad, mask_grad, + kernel_size, group_size, scale_factor); +} + +PARROTS_EXTENSION_REGISTER(carafe_forward) + .attr("kernel_size") + .attr("group_size") + .attr("scale_factor") + .input(2) + .output(4) + .apply(carafe_forward_cuda_parrots) + .done(); + +PARROTS_EXTENSION_REGISTER(carafe_backward) + .attr("kernel_size") + .attr("group_size") + .attr("scale_factor") + .input(3) + .output(6) + .apply(carafe_backward_cuda_parrots) + .done(); +#endif diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/carafe_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/carafe_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..2b94d44d3c9d1a81e0838bf209d774c703004fa9 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/carafe_pytorch.h @@ -0,0 +1,16 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#ifndef CARAFE_PYTORCH_H +#define CARAFE_PYTORCH_H +#include +using namespace at; + +void carafe_forward_cuda(Tensor features, Tensor masks, Tensor rfeatures, + Tensor routput, Tensor rmasks, Tensor output, + int kernel_size, int group_size, int scale_factor); + +void carafe_backward_cuda(Tensor top_grad, Tensor rfeatures, Tensor masks, + Tensor rtop_grad, Tensor rbottom_grad_hs, + Tensor rbottom_grad, Tensor rmask_grad, + Tensor bottom_grad, Tensor mask_grad, int kernel_size, + int group_size, int scale_factor); +#endif // CARAFE_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/chamfer_distance.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/chamfer_distance.cpp new file mode 100644 index 0000000000000000000000000000000000000000..dcff69893185d7cc52d8048d300b45ccfe0b3968 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/chamfer_distance.cpp @@ -0,0 +1,35 @@ +// Copyright (c) OpenMMLab. All rights reserved. +// Modified from +// https://github.com/chrdiller/pyTorchChamferDistance/blob/master/chamfer_distance/chamfer_distance.cpp + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void chamfer_distance_forward_impl(const Tensor xyz1, const Tensor xyz2, + const Tensor dist1, const Tensor dist2, + const Tensor idx1, const Tensor idx2) { + DISPATCH_DEVICE_IMPL(chamfer_distance_forward_impl, xyz1, xyz2, dist1, dist2, + idx1, idx2); +} + +void chamfer_distance_backward_impl(const Tensor xyz1, const Tensor xyz2, + Tensor idx1, Tensor idx2, Tensor graddist1, + Tensor graddist2, Tensor gradxyz1, + Tensor gradxyz2) { + DISPATCH_DEVICE_IMPL(chamfer_distance_backward_impl, xyz1, xyz2, idx1, idx2, + graddist1, graddist2, gradxyz1, gradxyz2); +} + +void chamfer_distance_forward(const Tensor xyz1, const Tensor xyz2, + const Tensor dist1, const Tensor dist2, + const Tensor idx1, const Tensor idx2) { + chamfer_distance_forward_impl(xyz1, xyz2, dist1, dist2, idx1, idx2); +} + 
+void chamfer_distance_backward(const Tensor xyz1, const Tensor xyz2, + Tensor idx1, Tensor idx2, Tensor graddist1, + Tensor graddist2, Tensor gradxyz1, + Tensor gradxyz2) { + chamfer_distance_backward_impl(xyz1, xyz2, idx1, idx2, graddist1, graddist2, + gradxyz1, gradxyz2); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/chamfer_distance_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/chamfer_distance_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..db8eff1d6f5e0c4a0c1e21a55f54381f1d5a3104 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/chamfer_distance_parrots.cpp @@ -0,0 +1,51 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include +#include +#include + +#include "chamfer_distance_pytorch.h" +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void chamfer_distance_forward_cuda_parrots(CudaContext& ctx, + const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + auto xyz1 = buildATensor(ctx, ins[0]); + auto xyz2 = buildATensor(ctx, ins[1]); + auto dist1 = buildATensor(ctx, outs[0]); + auto dist2 = buildATensor(ctx, outs[1]); + auto idx1 = buildATensor(ctx, outs[2]); + auto idx2 = buildATensor(ctx, outs[3]); + chamfer_distance_forward(xyz1, xyz2, dist1, dist2, idx1, idx2); +} + +void chamfer_distance_backward_cuda_parrots(CudaContext& ctx, + const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + auto xyz1 = buildATensor(ctx, ins[0]); + auto xyz2 = buildATensor(ctx, ins[1]); + auto idx1 = buildATensor(ctx, ins[2]); + auto idx2 = buildATensor(ctx, ins[3]); + auto graddist1 = buildATensor(ctx, ins[4]); + auto graddist2 = buildATensor(ctx, ins[5]); + auto gradxyz1 = buildATensor(ctx, outs[0]); + auto gradxyz2 = buildATensor(ctx, outs[1]); + chamfer_distance_backward(xyz1, xyz2, idx1, idx2, graddist1, graddist2, + gradxyz1, gradxyz2); +} + 
+PARROTS_EXTENSION_REGISTER(chamfer_distance_forward) + .input(2) + .output(4) + .apply(chamfer_distance_forward_cuda_parrots) + .done(); + +PARROTS_EXTENSION_REGISTER(chamfer_distance_backward) + .input(6) + .output(2) + .apply(chamfer_distance_backward_cuda_parrots) + .done(); + +#endif diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/chamfer_distance_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/chamfer_distance_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..6405526b0c4c73d6aa1bb2142687d148ba559af2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/chamfer_distance_pytorch.h @@ -0,0 +1,16 @@ +// Copyright (c) OpenMMLab. All rights reserved +#ifndef ACTIVE_CHAMFER_DISTANCE_PYTORCH_H +#define ACTIVE_CHAMFER_DISTANCE_PYTORCH_H +#include +using namespace at; + +void chamfer_distance_forward(const Tensor xyz1, const Tensor xyz2, + const Tensor dist1, const Tensor dist2, + const Tensor idx1, const Tensor idx); + +void chamfer_distance_backward(const Tensor xyz1, const Tensor xyz2, + Tensor idx1, Tensor idx2, Tensor graddist1, + Tensor graddist2, Tensor gradxyz1, + Tensor gradxyz2); + +#endif // ACTIVE_CHAMFER_DISTANCE_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/contour_expand.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/contour_expand.cpp new file mode 100644 index 0000000000000000000000000000000000000000..586c48ee44b6b7dbb24573b4a2d2ecf499a56d0b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/contour_expand.cpp @@ -0,0 +1,111 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +// It is modified from https://github.com/whai362/PSENet +#include +#include + +#include "pytorch_cpp_helper.hpp" + +using namespace std; + +class Point2d { + public: + int x; + int y; + + Point2d() : x(0), y(0) {} + Point2d(int _x, int _y) : x(_x), y(_y) {} +}; + +void kernel_dilate(const uint8_t *data, IntArrayRef data_shape, + const int *label_map, int &label_num, int &min_area, + vector> &text_line) { + std::vector area(label_num + 1); + int kernel_num = data_shape[0]; + int height = data_shape[1]; + int width = data_shape[2]; + + for (int x = 0; x < height; ++x) { + for (int y = 0; y < width; ++y) { + int label = label_map[x * width + y]; + if (label == 0) continue; + area[label] += 1; + } + } + + queue queue, next_queue; + for (int x = 0; x < height; ++x) { + vector row(width); + for (int y = 0; y < width; ++y) { + int label = label_map[x * width + y]; + if (label == 0) continue; + if (area[label] < min_area) continue; + + Point2d point(x, y); + queue.push(point); + row[y] = label; + } + text_line.emplace_back(row); + } + + int dx[] = {-1, 1, 0, 0}; + int dy[] = {0, 0, -1, 1}; + vector kernel_step(kernel_num); + std::for_each(kernel_step.begin(), kernel_step.end(), + [=](int &k) { return k * height * width; }); + + for (int kernel_id = kernel_num - 2; kernel_id >= 0; --kernel_id) { + while (!queue.empty()) { + Point2d point = queue.front(); + queue.pop(); + int x = point.x; + int y = point.y; + int label = text_line[x][y]; + + bool is_edge = true; + for (int d = 0; d < 4; ++d) { + int tmp_x = x + dx[d]; + int tmp_y = y + dy[d]; + + if (tmp_x < 0 || tmp_x >= height) continue; + if (tmp_y < 0 || tmp_y >= width) continue; + int kernel_value = data[kernel_step[kernel_id] + tmp_x * width + tmp_y]; + if (kernel_value == 0) continue; + if (text_line[tmp_x][tmp_y] > 0) continue; + + Point2d point(tmp_x, tmp_y); + queue.push(point); + text_line[tmp_x][tmp_y] = label; + is_edge = false; + } + + if (is_edge) { + next_queue.push(point); + } + } + 
swap(queue, next_queue); + } +} + +std::vector> contour_expand(Tensor kernel_mask, + Tensor internal_kernel_label, + int min_kernel_area, + int kernel_num) { + kernel_mask = kernel_mask.contiguous(); + internal_kernel_label = internal_kernel_label.contiguous(); + assert(kernel_mask.dim() == 3); + assert(internal_kernel_label.dim() == 2); + assert(kernel_mask.size(1) == internal_kernel_label.size(0)); + assert(kernel_mask.size(2) == internal_kernel_label.size(1)); + CHECK_CPU_INPUT(kernel_mask); + CHECK_CPU_INPUT(internal_kernel_label); + auto ptr_data = kernel_mask.data_ptr(); + IntArrayRef data_shape = kernel_mask.sizes(); + + auto data_label_map = internal_kernel_label.data_ptr(); + vector> text_line; + + kernel_dilate(ptr_data, data_shape, data_label_map, kernel_num, + min_kernel_area, text_line); + + return text_line; +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/contour_expand_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/contour_expand_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..1581fdc833c8f6b19a8e5a892ddbd8ec9414333e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/contour_expand_parrots.cpp @@ -0,0 +1,43 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include +#include +#include + +#include "contour_expand_pytorch.h" + +using namespace parrots; +using namespace std; + +template +void contour_expand_parrots(T& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int min_kernel_area, kernel_num; + SSAttrs(attr) + .get("min_kernel_area", min_kernel_area) + .get("kernel_num", kernel_num) + .done(); + at::Tensor kernel_mask; + at::Tensor internal_kernel_label; + kernel_mask = buildATensor(ctx, ins[0]); + internal_kernel_label = buildATensor(ctx, ins[1]); + auto out = contour_expand(kernel_mask, internal_kernel_label, min_kernel_area, + kernel_num); + int n = out.size(), m = 0; + for (int i = 0; i < n; ++i) + if (m < out[i].size()) m = out[i].size(); + auto options = torch::TensorOptions().dtype(at::kInt); + auto tensor = torch::zeros({n, m}, options); + for (int i = 0; i < n; i++) + tensor.slice(0, i, i + 1) = + torch::from_blob(out[i].data(), {out[i].size()}, options); + updateDArray(ctx, tensor, outs[0]); +} + +PARROTS_EXTENSION_REGISTER(contour_expand) + .attr("min_kernel_area") + .attr("kernel_num") + .input(2) + .output(1) + .apply(contour_expand_parrots) + .done(); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/contour_expand_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/contour_expand_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..881bbac3cb73494e0063314c340adc7a280f4fc6 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/contour_expand_pytorch.h @@ -0,0 +1,12 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#ifndef CONTOUR_EXPAND_PYTORCH_H +#define CONTOUR_EXPAND_PYTORCH_H +#include +using namespace at; + +std::vector> contour_expand(Tensor kernel_mask, + Tensor internal_kernel_label, + int min_kernel_area, + int kernel_num); + +#endif // CONTOUR_EXPAND_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/convex_iou.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/convex_iou.cpp new file mode 100644 index 0000000000000000000000000000000000000000..79f2028b551c474453aff2f6633dd426194e4afd --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/convex_iou.cpp @@ -0,0 +1,23 @@ +// Copyright (c) OpenMMLab. All rights reserved +// modified from +// https://github.com/SDL-GuoZonghao/BeyondBoundingBox/tree/main/mmdet/ops/iou/src +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void convex_iou_impl(const Tensor pointsets, const Tensor polygons, + Tensor ious) { + DISPATCH_DEVICE_IMPL(convex_iou_impl, pointsets, polygons, ious); +} + +void convex_iou(const Tensor pointsets, const Tensor polygons, Tensor ious) { + convex_iou_impl(pointsets, polygons, ious); +} + +void convex_giou_impl(const Tensor pointsets, const Tensor polygons, + Tensor output) { + DISPATCH_DEVICE_IMPL(convex_giou_impl, pointsets, polygons, output); +} + +void convex_giou(const Tensor pointsets, const Tensor polygons, Tensor output) { + convex_giou_impl(pointsets, polygons, output); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/convex_iou_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/convex_iou_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..bf766542f0a04da85a1b15022f3e5f078c283a1a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/convex_iou_parrots.cpp @@ -0,0 +1,40 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include +#include +#include + +#include "convex_iou_pytorch.h" +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void convex_iou_forward_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + auto pointsets = buildATensor(ctx, ins[0]); + auto polygons = buildATensor(ctx, ins[1]); + auto ious = buildATensor(ctx, outs[0]); + convex_iou(pointsets, polygons, ious); +} + +void convex_giou_forward_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + auto pointsets = buildATensor(ctx, ins[0]); + auto polygons = buildATensor(ctx, ins[1]); + auto output = buildATensor(ctx, outs[0]); + convex_giou(pointsets, polygons, output); +} + +PARROTS_EXTENSION_REGISTER(convex_iou) + .input(2) + .output(1) + .apply(convex_iou_forward_cuda_parrots) + .done(); + +PARROTS_EXTENSION_REGISTER(convex_giou) + .input(2) + .output(1) + .apply(convex_giou_forward_cuda_parrots) + .done(); + +#endif diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/convex_iou_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/convex_iou_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..4f16a1ce4b62bbe91b3083465468c2b9ae6df055 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/convex_iou_pytorch.h @@ -0,0 +1,11 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#ifndef CONVEX_IOU_PYTORCH_H +#define CONVEX_IOU_PYTORCH_H +#include +using namespace at; + +void convex_iou(const Tensor pointsets, const Tensor polygons, Tensor ious); + +void convex_giou(const Tensor pointsets, const Tensor polygons, Tensor output); + +#endif // RIROI_ALIGN_ROTATED_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/correlation.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/correlation.cpp new file mode 100644 index 0000000000000000000000000000000000000000..f4adba2a0c17201476352c473f1c7117af020ab2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/correlation.cpp @@ -0,0 +1,47 @@ +// Copyright (c) OpenMMLab. All rights reserved. +#include + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void correlation_forward_impl(Tensor input1, Tensor input2, Tensor output, + int kH, int kW, int patchH, int patchW, int padH, + int padW, int dilationH, int dilationW, + int dilation_patchH, int dilation_patchW, int dH, + int dW) { + DISPATCH_DEVICE_IMPL(correlation_forward_impl, input1, input2, output, kH, kW, + patchH, patchW, padH, padW, dilationH, dilationW, + dilation_patchH, dilation_patchW, dH, dW); +} + +void correlation_backward_impl(Tensor grad_output, Tensor input1, Tensor input2, + Tensor grad_input1, Tensor grad_input2, int kH, + int kW, int patchH, int patchW, int padH, + int padW, int dilationH, int dilationW, + int dilation_patchH, int dilation_patchW, int dH, + int dW) { + DISPATCH_DEVICE_IMPL(correlation_backward_impl, grad_output, input1, input2, + grad_input1, grad_input2, kH, kW, patchH, patchW, padH, + padW, dilationH, dilationW, dilation_patchH, + dilation_patchW, dH, dW); +} + +void correlation_forward(Tensor input1, Tensor input2, Tensor output, int kH, + int kW, int patchH, int patchW, int padH, int padW, + int dilationH, int dilationW, int dilation_patchH, + int dilation_patchW, int dH, int dW) { + correlation_forward_impl(input1, 
input2, output, kH, kW, patchH, patchW, padH, + padW, dilationH, dilationW, dilation_patchH, + dilation_patchW, dH, dW); +} + +void correlation_backward(Tensor grad_output, Tensor input1, Tensor input2, + Tensor grad_input1, Tensor grad_input2, int kH, + int kW, int patchH, int patchW, int padH, int padW, + int dilationH, int dilationW, int dilation_patchH, + int dilation_patchW, int dH, int dW) { + correlation_backward_impl(grad_output, input1, input2, grad_input1, + grad_input2, kH, kW, patchH, patchW, padH, padW, + dilationH, dilationW, dilation_patchH, + dilation_patchW, dH, dW); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/correlation_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/correlation_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..b1e287d063564775070389285a6fee7ea1aaeb80 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/correlation_parrots.cpp @@ -0,0 +1,176 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include +#include +#include + +#include "correlation_pytorch.h" + +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void correlation_forward_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int kH, kW, patchH, patchW, padH, padW, dilationH, dilationW, dilation_patchH, + dilation_patchW, dH, dW; + SSAttrs(attr) + .get("kH", kH) + .get("kW", kW) + .get("patchH", patchH) + .get("patchW", patchW) + .get("padH", padH) + .get("padW", padW) + .get("dilationH", dilationH) + .get("dilationW", dilationW) + .get("dilation_patchH", dilation_patchH) + .get("dilation_patchW", dilation_patchW) + .get("dH", dH) + .get("dW", dW) + .done(); + + auto input1 = buildATensor(ctx, ins[0]); + auto input2 = buildATensor(ctx, ins[1]); + + auto output = buildATensor(ctx, outs[0]); + + correlation_forward(input1, input2, output, kH, kW, patchH, patchW, padH, + padW, dilationH, dilationW, dilation_patchH, + dilation_patchW, dH, dW); +} + +void correlation_backward_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int kH, kW, patchH, patchW, padH, padW, dilationH, dilationW, dilation_patchH, + dilation_patchW, dH, dW; + SSAttrs(attr) + .get("kH", kH) + .get("kW", kW) + .get("patchH", patchH) + .get("patchW", patchW) + .get("padH", padH) + .get("padW", padW) + .get("dilationH", dilationH) + .get("dilationW", dilationW) + .get("dilation_patchH", dilation_patchH) + .get("dilation_patchW", dilation_patchW) + .get("dH", dH) + .get("dW", dW) + .done(); + + auto grad_output = buildATensor(ctx, ins[0]); + auto input1 = buildATensor(ctx, ins[1]); + auto input2 = buildATensor(ctx, ins[2]); + + auto grad_input1 = buildATensor(ctx, outs[0]); + auto grad_input2 = buildATensor(ctx, outs[1]); + + correlation_backward(grad_output, input1, input2, grad_input1, grad_input2, + kH, kW, patchH, patchW, padH, padW, dilationH, 
dilationW, + dilation_patchH, dilation_patchW, dH, dW); +} +#endif + +void correlation_forward_cpu_parrots(HostContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int kH, kW, patchH, patchW, padH, padW, dilationH, dilationW, dilation_patchH, + dilation_patchW, dH, dW; + SSAttrs(attr) + .get("kH", kH) + .get("kW", kW) + .get("patchH", patchH) + .get("patchW", patchW) + .get("padH", padH) + .get("padW", padW) + .get("dilationH", dilationH) + .get("dilationW", dilationW) + .get("dilation_patchH", dilation_patchH) + .get("dilation_patchW", dilation_patchW) + .get("dH", dH) + .get("dW", dW) + .done(); + + auto input1 = buildATensor(ctx, ins[0]); + auto input2 = buildATensor(ctx, ins[1]); + + auto output = buildATensor(ctx, outs[0]); + + correlation_forward(input1, input2, output, kH, kW, patchH, patchW, padH, + padW, dilationH, dilationW, dilation_patchH, + dilation_patchW, dH, dW); +} + +void correlation_backward_cpu_parrots(HostContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int kH, kW, patchH, patchW, padH, padW, dilationH, dilationW, dilation_patchH, + dilation_patchW, dH, dW; + SSAttrs(attr) + .get("kH", kH) + .get("kW", kW) + .get("patchH", patchH) + .get("patchW", patchW) + .get("padH", padH) + .get("padW", padW) + .get("dilationH", dilationH) + .get("dilationW", dilationW) + .get("dilation_patchH", dilation_patchH) + .get("dilation_patchW", dilation_patchW) + .get("dH", dH) + .get("dW", dW) + .done(); + + auto grad_output = buildATensor(ctx, ins[0]); + auto input1 = buildATensor(ctx, ins[1]); + auto input2 = buildATensor(ctx, ins[2]); + + auto grad_input1 = buildATensor(ctx, outs[0]); + auto grad_input2 = buildATensor(ctx, outs[1]); + + correlation_backward(grad_output, input1, input2, grad_input1, grad_input2, + kH, kW, patchH, patchW, padH, padW, dilationH, dilationW, + dilation_patchH, dilation_patchW, dH, dW); +} + 
+PARROTS_EXTENSION_REGISTER(correlation_forward) + .attr("kH") + .attr("kW") + .attr("patchH") + .attr("patchW") + .attr("padH") + .attr("padW") + .attr("dilationH") + .attr("dilationW") + .attr("dilation_patchH") + .attr("dilation_patchW") + .attr("dH") + .attr("dW") + .input(2) + .output(1) + .apply(correlation_forward_cpu_parrots) +#ifdef MMCV_WITH_CUDA + .apply(correlation_forward_cuda_parrots) +#endif + .done(); + +PARROTS_EXTENSION_REGISTER(correlation_backward) + .attr("kH") + .attr("kW") + .attr("patchH") + .attr("patchW") + .attr("padH") + .attr("padW") + .attr("dilationH") + .attr("dilationW") + .attr("dilation_patchH") + .attr("dilation_patchW") + .attr("dH") + .attr("dW") + .input(3) + .output(2) + .apply(correlation_backward_cpu_parrots) +#ifdef MMCV_WITH_CUDA + .apply(correlation_backward_cuda_parrots) +#endif + .done(); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/correlation_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/correlation_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..806fcaa710deb7d4622be6373dda84b20e7278fc --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/correlation_pytorch.h @@ -0,0 +1,18 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#ifndef CORRELATION_PYTORCH_H +#define CORRELATION_PYTORCH_H +#include +using namespace at; + +void correlation_forward(Tensor input1, Tensor input2, Tensor output, int kH, + int kW, int patchH, int patchW, int padH, int padW, + int dilationH, int dilationW, int dilation_patchH, + int dilation_patchW, int dH, int dW); + +void correlation_backward(Tensor grad_output, Tensor input1, Tensor input2, + Tensor grad_input1, Tensor grad_input2, int kH, + int kW, int patchH, int patchW, int padH, int padW, + int dilationH, int dilationW, int dilation_patchH, + int dilation_patchW, int dH, int dW); + +#endif // CORRELATION_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/cudabind.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/cudabind.cpp new file mode 100644 index 0000000000000000000000000000000000000000..9627e26f4fc86c089fbc10f7d2327e1094b55491 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/cudabind.cpp @@ -0,0 +1,1677 @@ +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void AssignScoreWithKForwardCUDAKernelLauncher( + int B, int N0, int N1, int M, int K, int O, int aggregate, + const Tensor& points, const Tensor& centers, const Tensor& scores, + const Tensor& knn_idx, Tensor& output); + +void AssignScoreWithKBackwardCUDAKernelLauncher( + int B, int N0, int N1, int M, int K, int O, int aggregate, + const Tensor& grad_out, const Tensor& points, const Tensor& centers, + const Tensor& scores, const Tensor& knn_idx, Tensor& grad_points, + Tensor& grad_centers, Tensor& grad_scores); + +void assign_score_withk_forward_cuda(int B, int N0, int N1, int M, int K, int O, + int aggregate, const Tensor& points, + const Tensor& centers, + const Tensor& scores, + const Tensor& knn_idx, Tensor& output) { + AssignScoreWithKForwardCUDAKernelLauncher( + B, N0, N1, M, K, O, aggregate, points, centers, scores, knn_idx, output); +}; + +void assign_score_withk_backward_cuda( + int B, 
int N0, int N1, int M, int K, int O, int aggregate, + const Tensor& grad_out, const Tensor& points, const Tensor& centers, + const Tensor& scores, const Tensor& knn_idx, Tensor& grad_points, + Tensor& grad_centers, Tensor& grad_scores) { + AssignScoreWithKBackwardCUDAKernelLauncher( + B, N0, N1, M, K, O, aggregate, grad_out, points, centers, scores, knn_idx, + grad_points, grad_centers, grad_scores); +}; + +void assign_score_withk_forward_impl(int B, int N0, int N1, int M, int K, int O, + int aggregate, const Tensor& points, + const Tensor& centers, + const Tensor& scores, + const Tensor& knn_idx, Tensor& output); + +void assign_score_withk_backward_impl( + int B, int N0, int N1, int M, int K, int O, int aggregate, + const Tensor& grad_out, const Tensor& points, const Tensor& centers, + const Tensor& scores, const Tensor& knn_idx, Tensor& grad_points, + Tensor& grad_centers, Tensor& grad_scores); + +REGISTER_DEVICE_IMPL(assign_score_withk_forward_impl, CUDA, + assign_score_withk_forward_cuda); +REGISTER_DEVICE_IMPL(assign_score_withk_backward_impl, CUDA, + assign_score_withk_backward_cuda); + +void BallQueryForwardCUDAKernelLauncher(int b, int n, int m, float min_radius, + float max_radius, int nsample, + const Tensor new_xyz, const Tensor xyz, + Tensor idx); + +void ball_query_forward_cuda(int b, int n, int m, float min_radius, + float max_radius, int nsample, + const Tensor new_xyz, const Tensor xyz, + Tensor idx) { + BallQueryForwardCUDAKernelLauncher(b, n, m, min_radius, max_radius, nsample, + new_xyz, xyz, idx); +}; + +void ball_query_forward_impl(int b, int n, int m, float min_radius, + float max_radius, int nsample, + const Tensor new_xyz, const Tensor xyz, + Tensor idx); +REGISTER_DEVICE_IMPL(ball_query_forward_impl, CUDA, ball_query_forward_cuda); + +void BBoxOverlapsCUDAKernelLauncher(const Tensor bboxes1, const Tensor bboxes2, + Tensor ious, const int mode, + const bool aligned, const int offset); + +void bbox_overlaps_cuda(const Tensor bboxes1, const 
Tensor bboxes2, Tensor ious, + const int mode, const bool aligned, const int offset) { + BBoxOverlapsCUDAKernelLauncher(bboxes1, bboxes2, ious, mode, aligned, offset); +} + +void bbox_overlaps_impl(const Tensor bboxes1, const Tensor bboxes2, Tensor ious, + const int mode, const bool aligned, const int offset); +REGISTER_DEVICE_IMPL(bbox_overlaps_impl, CUDA, bbox_overlaps_cuda); + +void BorderAlignForwardCUDAKernelLauncher(const Tensor& input, + const Tensor& boxes, Tensor output, + Tensor argmax_idx, + const int pool_size); + +void BorderAlignBackwardCUDAKernelLauncher(const Tensor& grad_output, + const Tensor& boxes, + const Tensor& argmax_idx, + Tensor grad_input, + const int pool_size); + +void border_align_forward_cuda(const Tensor& input, const Tensor& boxes, + Tensor output, Tensor argmax_idx, + const int pool_size) { + BorderAlignForwardCUDAKernelLauncher(input, boxes, output, argmax_idx, + pool_size); +} + +void border_align_backward_cuda(const Tensor& grad_output, const Tensor& boxes, + const Tensor& argmax_idx, Tensor grad_input, + const int pool_size) { + BorderAlignBackwardCUDAKernelLauncher(grad_output, boxes, argmax_idx, + grad_input, pool_size); +} + +void border_align_forward_impl(const Tensor& input, const Tensor& boxes, + Tensor output, Tensor argmax_idx, + const int pool_size); + +void border_align_backward_impl(const Tensor& grad_output, const Tensor& boxes, + const Tensor& argmax_idx, Tensor grad_input, + const int pool_size); + +REGISTER_DEVICE_IMPL(border_align_forward_impl, CUDA, + border_align_forward_cuda); +REGISTER_DEVICE_IMPL(border_align_backward_impl, CUDA, + border_align_backward_cuda); + +void box_iou_rotated_cuda(const Tensor boxes1, const Tensor boxes2, Tensor ious, + const int mode_flag, const bool aligned); + +void box_iou_rotated_impl(const Tensor boxes1, const Tensor boxes2, Tensor ious, + const int mode_flag, const bool aligned); +REGISTER_DEVICE_IMPL(box_iou_rotated_impl, CUDA, box_iou_rotated_cuda); + +void 
CARAFEForwardCUDAKernelLauncher(const Tensor features, const Tensor masks, + Tensor rfeatures, Tensor routput, + Tensor rmasks, Tensor output, + const int kernel_size, + const int group_size, + const int scale_factor); + +void CARAFEBackwardCUDAKernelLauncher( + const Tensor top_grad, const Tensor rfeatures, const Tensor masks, + Tensor rtop_grad, Tensor rbottom_grad_hs, Tensor rbottom_grad, + Tensor rmask_grad, Tensor bottom_grad, Tensor mask_grad, + const int kernel_size, const int group_size, const int scale_factor); + +void carafe_forward_cuda(Tensor features, Tensor masks, Tensor rfeatures, + Tensor routput, Tensor rmasks, Tensor output, + int kernel_size, int group_size, int scale_factor) { + CARAFEForwardCUDAKernelLauncher(features, masks, rfeatures, routput, rmasks, + output, kernel_size, group_size, + scale_factor); +} + +void carafe_backward_cuda(Tensor top_grad, Tensor rfeatures, Tensor masks, + Tensor rtop_grad, Tensor rbottom_grad_hs, + Tensor rbottom_grad, Tensor rmask_grad, + Tensor bottom_grad, Tensor mask_grad, int kernel_size, + int group_size, int scale_factor) { + CARAFEBackwardCUDAKernelLauncher(top_grad, rfeatures, masks, rtop_grad, + rbottom_grad_hs, rbottom_grad, rmask_grad, + bottom_grad, mask_grad, kernel_size, + group_size, scale_factor); +} + +void carafe_forward_impl(Tensor features, Tensor masks, Tensor rfeatures, + Tensor routput, Tensor rmasks, Tensor output, + int kernel_size, int group_size, int scale_factor); + +void carafe_backward_impl(Tensor top_grad, Tensor rfeatures, Tensor masks, + Tensor rtop_grad, Tensor rbottom_grad_hs, + Tensor rbottom_grad, Tensor rmask_grad, + Tensor bottom_grad, Tensor mask_grad, int kernel_size, + int group_size, int scale_factor); + +REGISTER_DEVICE_IMPL(carafe_forward_impl, CUDA, carafe_forward_cuda); +REGISTER_DEVICE_IMPL(carafe_backward_impl, CUDA, carafe_backward_cuda); + +void CARAFENAIVEForwardCUDAKernelLauncher(const Tensor features, + const Tensor masks, Tensor output, + const int 
kernel_size, + const int group_size, + const int scale_factor); + +void CARAFENAIVEBackwardCUDAKernelLauncher( + const Tensor top_grad, const Tensor features, const Tensor masks, + Tensor bottom_grad, Tensor mask_grad, const int kernel_size, + const int group_size, const int scale_factor); + +void carafe_naive_forward_cuda(Tensor features, Tensor masks, Tensor output, + int kernel_size, int group_size, + int scale_factor) { + CARAFENAIVEForwardCUDAKernelLauncher(features, masks, output, kernel_size, + group_size, scale_factor); +} + +void carafe_naive_backward_cuda(Tensor top_grad, Tensor features, Tensor masks, + Tensor bottom_grad, Tensor mask_grad, + int kernel_size, int group_size, + int scale_factor) { + CARAFENAIVEBackwardCUDAKernelLauncher(top_grad, features, masks, bottom_grad, + mask_grad, kernel_size, group_size, + scale_factor); +} +void carafe_naive_forward_impl(Tensor features, Tensor masks, Tensor output, + int kernel_size, int group_size, + int scale_factor); + +void carafe_naive_backward_impl(Tensor top_grad, Tensor features, Tensor masks, + Tensor bottom_grad, Tensor mask_grad, + int kernel_size, int group_size, + int scale_factor); + +REGISTER_DEVICE_IMPL(carafe_naive_forward_impl, CUDA, + carafe_naive_forward_cuda); +REGISTER_DEVICE_IMPL(carafe_naive_backward_impl, CUDA, + carafe_naive_backward_cuda); + +void CorrelationForwardCUDAKernelLauncher(Tensor input1, Tensor input2, + Tensor output, int kH, int kW, + int patchH, int patchW, int padH, + int padW, int dilationH, + int dilationW, int dilation_patchH, + int dilation_patchW, int dH, int dW); + +void CorrelationBackwardCUDAKernelLauncher(Tensor grad_output, Tensor input1, + Tensor input2, Tensor grad_input1, + Tensor grad_input2, int kH, int kW, + int patchH, int patchW, int padH, + int padW, int dilationH, + int dilationW, int dilation_patchH, + int dilation_patchW, int dH, int dW); + +void correlation_forward_cuda(Tensor input1, Tensor input2, Tensor output, + int kH, int kW, int patchH, int 
patchW, int padH, + int padW, int dilationH, int dilationW, + int dilation_patchH, int dilation_patchW, int dH, + int dW) { + CorrelationForwardCUDAKernelLauncher( + input1, input2, output, kH, kW, patchH, patchW, padH, padW, dilationH, + dilationW, dilation_patchH, dilation_patchW, dH, dW); +} + +void correlation_backward_cuda(Tensor grad_output, Tensor input1, Tensor input2, + Tensor grad_input1, Tensor grad_input2, int kH, + int kW, int patchH, int patchW, int padH, + int padW, int dilationH, int dilationW, + int dilation_patchH, int dilation_patchW, int dH, + int dW) { + CorrelationBackwardCUDAKernelLauncher( + grad_output, input1, input2, grad_input1, grad_input2, kH, kW, patchH, + patchW, padH, padW, dilationH, dilationW, dilation_patchH, + dilation_patchW, dH, dW); +} + +void correlation_forward_impl(Tensor input1, Tensor input2, Tensor output, + int kH, int kW, int patchH, int patchW, int padH, + int padW, int dilationH, int dilationW, + int dilation_patchH, int dilation_patchW, int dH, + int dW); + +void correlation_backward_impl(Tensor grad_output, Tensor input1, Tensor input2, + Tensor grad_input1, Tensor grad_input2, int kH, + int kW, int patchH, int patchW, int padH, + int padW, int dilationH, int dilationW, + int dilation_patchH, int dilation_patchW, int dH, + int dW); + +REGISTER_DEVICE_IMPL(correlation_forward_impl, CUDA, correlation_forward_cuda); +REGISTER_DEVICE_IMPL(correlation_backward_impl, CUDA, + correlation_backward_cuda); + +void deformable_im2col_cuda(Tensor data_im, Tensor data_offset, + const int channels, const int height, + const int width, const int ksize_h, + const int ksize_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, + const int parallel_imgs, const int deformable_group, + Tensor data_col); + +void deformable_col2im_cuda(Tensor data_col, Tensor data_offset, + const int channels, const int height, + const int width, const int ksize_h, + const int 
ksize_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, + const int parallel_imgs, const int deformable_group, + Tensor grad_im); + +void deformable_col2im_coord_cuda( + Tensor data_col, Tensor data_im, Tensor data_offset, const int channels, + const int height, const int width, const int ksize_h, const int ksize_w, + const int pad_h, const int pad_w, const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, const int parallel_imgs, + const int deformable_group, Tensor grad_offset); + +void deformable_im2col_impl(Tensor data_im, Tensor data_offset, + const int channels, const int height, + const int width, const int ksize_h, + const int ksize_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, + const int parallel_imgs, const int deformable_group, + Tensor data_col); + +void deformable_col2im_impl(Tensor data_col, Tensor data_offset, + const int channels, const int height, + const int width, const int ksize_h, + const int ksize_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, + const int parallel_imgs, const int deformable_group, + Tensor grad_im); + +void deformable_col2im_coord_impl( + Tensor data_col, Tensor data_im, Tensor data_offset, const int channels, + const int height, const int width, const int ksize_h, const int ksize_w, + const int pad_h, const int pad_w, const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, const int parallel_imgs, + const int deformable_group, Tensor grad_offset); + +REGISTER_DEVICE_IMPL(deformable_im2col_impl, CUDA, deformable_im2col_cuda); +REGISTER_DEVICE_IMPL(deformable_col2im_impl, CUDA, deformable_col2im_cuda); +REGISTER_DEVICE_IMPL(deformable_col2im_coord_impl, CUDA, + deformable_col2im_coord_cuda); + +void 
DeformRoIPoolForwardCUDAKernelLauncher(Tensor input, Tensor rois, + Tensor offset, Tensor output, + int pooled_height, int pooled_width, + float spatial_scale, + int sampling_ratio, float gamma); + +void DeformRoIPoolBackwardCUDAKernelLauncher( + Tensor grad_output, Tensor input, Tensor rois, Tensor offset, + Tensor grad_input, Tensor grad_offset, int pooled_height, int pooled_width, + float spatial_scale, int sampling_ratio, float gamma); + +void deform_roi_pool_forward_cuda(Tensor input, Tensor rois, Tensor offset, + Tensor output, int pooled_height, + int pooled_width, float spatial_scale, + int sampling_ratio, float gamma) { + DeformRoIPoolForwardCUDAKernelLauncher(input, rois, offset, output, + pooled_height, pooled_width, + spatial_scale, sampling_ratio, gamma); +} + +void deform_roi_pool_backward_cuda(Tensor grad_output, Tensor input, + Tensor rois, Tensor offset, + Tensor grad_input, Tensor grad_offset, + int pooled_height, int pooled_width, + float spatial_scale, int sampling_ratio, + float gamma) { + DeformRoIPoolBackwardCUDAKernelLauncher( + grad_output, input, rois, offset, grad_input, grad_offset, pooled_height, + pooled_width, spatial_scale, sampling_ratio, gamma); +} + +void deform_roi_pool_forward_impl(Tensor input, Tensor rois, Tensor offset, + Tensor output, int pooled_height, + int pooled_width, float spatial_scale, + int sampling_ratio, float gamma); + +void deform_roi_pool_backward_impl(Tensor grad_output, Tensor input, + Tensor rois, Tensor offset, + Tensor grad_input, Tensor grad_offset, + int pooled_height, int pooled_width, + float spatial_scale, int sampling_ratio, + float gamma); + +REGISTER_DEVICE_IMPL(deform_roi_pool_forward_impl, CUDA, + deform_roi_pool_forward_cuda); +REGISTER_DEVICE_IMPL(deform_roi_pool_backward_impl, CUDA, + deform_roi_pool_backward_cuda); + +void SigmoidFocalLossForwardCUDAKernelLauncher(Tensor input, Tensor target, + Tensor weight, Tensor output, + const float gamma, + const float alpha); + +void 
SigmoidFocalLossBackwardCUDAKernelLauncher(Tensor input, Tensor target, + Tensor weight, + Tensor grad_input, + const float gamma, + const float alpha); + +void SoftmaxFocalLossForwardCUDAKernelLauncher(Tensor softmax, Tensor target, + Tensor weight, Tensor output, + const float gamma, + const float alpha); + +void SoftmaxFocalLossBackwardCUDAKernelLauncher(Tensor softmax, Tensor target, + Tensor weight, Tensor buff, + Tensor grad_input, + const float gamma, + const float alpha); + +void sigmoid_focal_loss_forward_cuda(Tensor input, Tensor target, Tensor weight, + Tensor output, float gamma, float alpha) { + SigmoidFocalLossForwardCUDAKernelLauncher(input, target, weight, output, + gamma, alpha); +} + +void sigmoid_focal_loss_backward_cuda(Tensor input, Tensor target, + Tensor weight, Tensor grad_input, + float gamma, float alpha) { + SigmoidFocalLossBackwardCUDAKernelLauncher(input, target, weight, grad_input, + gamma, alpha); +} + +void softmax_focal_loss_forward_cuda(Tensor input, Tensor target, Tensor weight, + Tensor output, float gamma, float alpha) { + SoftmaxFocalLossForwardCUDAKernelLauncher(input, target, weight, output, + gamma, alpha); +} + +void softmax_focal_loss_backward_cuda(Tensor input, Tensor target, + Tensor weight, Tensor buff, + Tensor grad_input, float gamma, + float alpha) { + SoftmaxFocalLossBackwardCUDAKernelLauncher(input, target, weight, buff, + grad_input, gamma, alpha); +} + +void sigmoid_focal_loss_forward_impl(Tensor input, Tensor target, Tensor weight, + Tensor output, float gamma, float alpha); + +void sigmoid_focal_loss_backward_impl(Tensor input, Tensor target, + Tensor weight, Tensor grad_input, + float gamma, float alpha); + +void softmax_focal_loss_forward_impl(Tensor input, Tensor target, Tensor weight, + Tensor output, float gamma, float alpha); + +void softmax_focal_loss_backward_impl(Tensor input, Tensor target, + Tensor weight, Tensor buff, + Tensor grad_input, float gamma, + float alpha); + 
+REGISTER_DEVICE_IMPL(sigmoid_focal_loss_forward_impl, CUDA, + sigmoid_focal_loss_forward_cuda); +REGISTER_DEVICE_IMPL(sigmoid_focal_loss_backward_impl, CUDA, + sigmoid_focal_loss_backward_cuda); +REGISTER_DEVICE_IMPL(softmax_focal_loss_forward_impl, CUDA, + softmax_focal_loss_forward_cuda); +REGISTER_DEVICE_IMPL(softmax_focal_loss_backward_impl, CUDA, + softmax_focal_loss_backward_cuda); + +void FurthestPointSamplingForwardCUDAKernelLauncher(int b, int n, int m, + const float* dataset, + float* temp, int* idxs); + +void FurthestPointSamplingWithDistForwardCUDAKernelLauncher( + int b, int n, int m, const float* dataset, float* temp, int* idxs); + +void furthest_point_sampling_forward_cuda(Tensor points_tensor, + Tensor temp_tensor, Tensor idx_tensor, + int b, int n, int m) { + const float* dataset = points_tensor.data_ptr(); + float* temp = temp_tensor.data_ptr(); + int* idxs = idx_tensor.data_ptr(); + FurthestPointSamplingForwardCUDAKernelLauncher(b, n, m, dataset, temp, idxs); +} + +void furthest_point_sampling_with_dist_forward_cuda(Tensor points_tensor, + Tensor temp_tensor, + Tensor idx_tensor, int b, + int n, int m) { + const float* dataset = points_tensor.data_ptr(); + float* temp = temp_tensor.data_ptr(); + int* idxs = idx_tensor.data_ptr(); + FurthestPointSamplingWithDistForwardCUDAKernelLauncher(b, n, m, dataset, temp, + idxs); +} + +void furthest_point_sampling_forward_impl(Tensor points_tensor, + Tensor temp_tensor, Tensor idx_tensor, + int b, int n, int m); + +void furthest_point_sampling_with_dist_forward_impl(Tensor points_tensor, + Tensor temp_tensor, + Tensor idx_tensor, int b, + int n, int m); + +REGISTER_DEVICE_IMPL(furthest_point_sampling_forward_impl, CUDA, + furthest_point_sampling_forward_cuda); +REGISTER_DEVICE_IMPL(furthest_point_sampling_with_dist_forward_impl, CUDA, + furthest_point_sampling_with_dist_forward_cuda); + +torch::Tensor fused_bias_leakyrelu_op(const torch::Tensor& input, + const torch::Tensor& bias, + const torch::Tensor& 
refer, int act, + int grad, float alpha, float scale); + +torch::Tensor fused_bias_leakyrelu_op_impl(const torch::Tensor& input, + const torch::Tensor& bias, + const torch::Tensor& refer, int act, + int grad, float alpha, float scale); +REGISTER_DEVICE_IMPL(fused_bias_leakyrelu_op_impl, CUDA, + fused_bias_leakyrelu_op); + +void GatherPointsForwardCUDAKernelLauncher(int b, int c, int n, int npoints, + const Tensor points, + const Tensor idx, Tensor out); + +void GatherPointsBackwardCUDAKernelLauncher(int b, int c, int n, int npoints, + const Tensor grad_out, + const Tensor idx, + Tensor grad_points); + +void gather_points_forward_cuda(int b, int c, int n, int npoints, + const Tensor points, const Tensor idx, + Tensor out) { + GatherPointsForwardCUDAKernelLauncher(b, c, n, npoints, points, idx, out); +}; + +void gather_points_backward_cuda(int b, int c, int n, int npoints, + const Tensor grad_out, const Tensor idx, + Tensor grad_points) { + GatherPointsBackwardCUDAKernelLauncher(b, c, n, npoints, grad_out, idx, + grad_points); +}; + +void gather_points_forward_impl(int b, int c, int n, int npoints, + const Tensor points, const Tensor idx, + Tensor out); + +void gather_points_backward_impl(int b, int c, int n, int npoints, + const Tensor grad_out, const Tensor idx, + Tensor grad_points); + +REGISTER_DEVICE_IMPL(gather_points_forward_impl, CUDA, + gather_points_forward_cuda); +REGISTER_DEVICE_IMPL(gather_points_backward_impl, CUDA, + gather_points_backward_cuda); + +void GroupPointsForwardCUDAKernelLauncher(int b, int c, int n, int npoints, + int nsample, const Tensor points, + const Tensor idx, Tensor out); + +void GroupPointsBackwardCUDAKernelLauncher(int b, int c, int n, int npoints, + int nsample, const Tensor grad_out, + const Tensor idx, + Tensor grad_points); + +void group_points_forward_cuda(int b, int c, int n, int npoints, int nsample, + const Tensor points, const Tensor idx, + Tensor out) { + GroupPointsForwardCUDAKernelLauncher(b, c, n, npoints, nsample, 
points, idx, + out); +}; + +void group_points_backward_cuda(int b, int c, int n, int npoints, int nsample, + const Tensor grad_out, const Tensor idx, + Tensor grad_points) { + GroupPointsBackwardCUDAKernelLauncher(b, c, n, npoints, nsample, grad_out, + idx, grad_points); +}; + +void group_points_forward_impl(int b, int c, int n, int npoints, int nsample, + const Tensor points, const Tensor idx, + Tensor out); + +void group_points_backward_impl(int b, int c, int n, int npoints, int nsample, + const Tensor grad_out, const Tensor idx, + Tensor grad_points); + +REGISTER_DEVICE_IMPL(group_points_forward_impl, CUDA, + group_points_forward_cuda); +REGISTER_DEVICE_IMPL(group_points_backward_impl, CUDA, + group_points_backward_cuda); + +void IoU3DBoxesOverlapBevForwardCUDAKernelLauncher(const int num_a, + const Tensor boxes_a, + const int num_b, + const Tensor boxes_b, + Tensor ans_overlap); + +void IoU3DNMS3DForwardCUDAKernelLauncher(const Tensor boxes, Tensor& keep, + Tensor& keep_num, + float nms_overlap_thresh); + +void IoU3DNMS3DNormalForwardCUDAKernelLauncher(const Tensor boxes, Tensor& keep, + Tensor& keep_num, + float nms_overlap_thresh); + +void iou3d_boxes_overlap_bev_forward_cuda(const int num_a, const Tensor boxes_a, + const int num_b, const Tensor boxes_b, + Tensor ans_overlap) { + IoU3DBoxesOverlapBevForwardCUDAKernelLauncher(num_a, boxes_a, num_b, boxes_b, + ans_overlap); +}; + +void iou3d_nms3d_forward_cuda(const Tensor boxes, Tensor& keep, + Tensor& keep_num, float nms_overlap_thresh) { + IoU3DNMS3DForwardCUDAKernelLauncher(boxes, keep, keep_num, + nms_overlap_thresh); +}; + +void iou3d_nms3d_normal_forward_cuda(const Tensor boxes, Tensor& keep, + Tensor& keep_num, + float nms_overlap_thresh) { + IoU3DNMS3DNormalForwardCUDAKernelLauncher(boxes, keep, keep_num, + nms_overlap_thresh); +}; + +void iou3d_boxes_overlap_bev_forward_impl(const int num_a, const Tensor boxes_a, + const int num_b, const Tensor boxes_b, + Tensor ans_overlap); + +void 
iou3d_nms3d_forward_impl(const Tensor boxes, Tensor& keep, + Tensor& keep_num, float nms_overlap_thresh); + +void iou3d_nms3d_normal_forward_impl(const Tensor boxes, Tensor& keep, + Tensor& keep_num, + float nms_overlap_thresh); + +REGISTER_DEVICE_IMPL(iou3d_boxes_overlap_bev_forward_impl, CUDA, + iou3d_boxes_overlap_bev_forward_cuda); +REGISTER_DEVICE_IMPL(iou3d_nms3d_forward_impl, CUDA, iou3d_nms3d_forward_cuda); +REGISTER_DEVICE_IMPL(iou3d_nms3d_normal_forward_impl, CUDA, + iou3d_nms3d_normal_forward_cuda); + +void KNNForwardCUDAKernelLauncher(int b, int n, int m, int nsample, + const Tensor xyz, const Tensor new_xyz, + Tensor idx, Tensor dist2); + +void knn_forward_cuda(int b, int n, int m, int nsample, const Tensor xyz, + const Tensor new_xyz, Tensor idx, Tensor dist2) { + KNNForwardCUDAKernelLauncher(b, n, m, nsample, xyz, new_xyz, idx, dist2); +} + +void knn_forward_impl(int b, int n, int m, int nsample, const Tensor xyz, + const Tensor new_xyz, Tensor idx, Tensor dist2); +REGISTER_DEVICE_IMPL(knn_forward_impl, CUDA, knn_forward_cuda); + +void MaskedIm2colForwardCUDAKernelLauncher(const Tensor bottom_data, + const Tensor mask_h_idx, + const Tensor mask_w_idx, + Tensor top_data, const int kernel_h, + const int kernel_w, const int pad_h, + const int pad_w); + +void MaskedCol2imForwardCUDAKernelLauncher(const Tensor bottom_data, + const Tensor mask_h_idx, + const Tensor mask_w_idx, + Tensor top_data, const int height, + const int width, const int channels); + +void masked_im2col_forward_cuda(const Tensor im, const Tensor mask_h_idx, + const Tensor mask_w_idx, Tensor col, + const int kernel_h, const int kernel_w, + const int pad_h, const int pad_w) { + // im: (n, ic, h, w), kernel size (kh, kw) + // kernel: (oc, ic * kh * kw), col: (kh * kw * ic, ow * oh) + MaskedIm2colForwardCUDAKernelLauncher(im, mask_h_idx, mask_w_idx, col, + kernel_h, kernel_w, pad_h, pad_w); +} + +void masked_col2im_forward_cuda(const Tensor col, const Tensor mask_h_idx, + const Tensor 
mask_w_idx, Tensor im, int height, + int width, int channels) { + // im: (n, ic, h, w), kernel size (kh, kw) + // kernel: (oc, ic * kh * kh), col: (kh * kw * ic, ow * oh) + MaskedCol2imForwardCUDAKernelLauncher(col, mask_h_idx, mask_w_idx, im, height, + width, channels); +} + +void masked_im2col_forward_impl(const Tensor im, const Tensor mask_h_idx, + const Tensor mask_w_idx, Tensor col, + const int kernel_h, const int kernel_w, + const int pad_h, const int pad_w); + +void masked_col2im_forward_impl(const Tensor col, const Tensor mask_h_idx, + const Tensor mask_w_idx, Tensor im, int height, + int width, int channels); + +REGISTER_DEVICE_IMPL(masked_im2col_forward_impl, CUDA, + masked_im2col_forward_cuda); +REGISTER_DEVICE_IMPL(masked_col2im_forward_impl, CUDA, + masked_col2im_forward_cuda); + +void modulated_deformable_im2col_cuda( + const Tensor data_im, const Tensor data_offset, const Tensor data_mask, + const int batch_size, const int channels, const int height_im, + const int width_im, const int height_col, const int width_col, + const int kernel_h, const int kernel_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, const int dilation_h, + const int dilation_w, const int deformable_group, Tensor data_col); + +void modulated_deformable_col2im_cuda( + const Tensor data_col, const Tensor data_offset, const Tensor data_mask, + const int batch_size, const int channels, const int height_im, + const int width_im, const int height_col, const int width_col, + const int kernel_h, const int kernel_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, const int dilation_h, + const int dilation_w, const int deformable_group, Tensor grad_im); + +void modulated_deformable_col2im_coord_cuda( + const Tensor data_col, const Tensor data_im, const Tensor data_offset, + const Tensor data_mask, const int batch_size, const int channels, + const int height_im, const int width_im, const int height_col, + const int width_col, const 
int kernel_h, const int kernel_w, + const int pad_h, const int pad_w, const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, const int deformable_group, + Tensor grad_offset, Tensor grad_mask); + +void modulated_deformable_im2col_impl( + const Tensor data_im, const Tensor data_offset, const Tensor data_mask, + const int batch_size, const int channels, const int height_im, + const int width_im, const int height_col, const int width_col, + const int kernel_h, const int kernel_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, const int dilation_h, + const int dilation_w, const int deformable_group, Tensor data_col); + +void modulated_deformable_col2im_impl( + const Tensor data_col, const Tensor data_offset, const Tensor data_mask, + const int batch_size, const int channels, const int height_im, + const int width_im, const int height_col, const int width_col, + const int kernel_h, const int kernel_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, const int dilation_h, + const int dilation_w, const int deformable_group, Tensor grad_im); + +void modulated_deformable_col2im_coord_impl( + const Tensor data_col, const Tensor data_im, const Tensor data_offset, + const Tensor data_mask, const int batch_size, const int channels, + const int height_im, const int width_im, const int height_col, + const int width_col, const int kernel_h, const int kernel_w, + const int pad_h, const int pad_w, const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, const int deformable_group, + Tensor grad_offset, Tensor grad_mask); + +REGISTER_DEVICE_IMPL(modulated_deformable_im2col_impl, CUDA, + modulated_deformable_im2col_cuda); +REGISTER_DEVICE_IMPL(modulated_deformable_col2im_impl, CUDA, + modulated_deformable_col2im_cuda); +REGISTER_DEVICE_IMPL(modulated_deformable_col2im_coord_impl, CUDA, + modulated_deformable_col2im_coord_cuda); + +Tensor ms_deform_attn_cuda_forward(const 
Tensor& value, + const Tensor& spatial_shapes, + const Tensor& level_start_index, + const Tensor& sampling_loc, + const Tensor& attn_weight, + const int im2col_step); + +void ms_deform_attn_cuda_backward( + const Tensor& value, const Tensor& spatial_shapes, + const Tensor& level_start_index, const Tensor& sampling_loc, + const Tensor& attn_weight, const Tensor& grad_output, Tensor& grad_value, + Tensor& grad_sampling_loc, Tensor& grad_attn_weight, const int im2col_step); + +Tensor ms_deform_attn_impl_forward(const Tensor& value, + const Tensor& spatial_shapes, + const Tensor& level_start_index, + const Tensor& sampling_loc, + const Tensor& attn_weight, + const int im2col_step); + +void ms_deform_attn_impl_backward( + const Tensor& value, const Tensor& spatial_shapes, + const Tensor& level_start_index, const Tensor& sampling_loc, + const Tensor& attn_weight, const Tensor& grad_output, Tensor& grad_value, + Tensor& grad_sampling_loc, Tensor& grad_attn_weight, const int im2col_step); + +REGISTER_DEVICE_IMPL(ms_deform_attn_impl_forward, CUDA, + ms_deform_attn_cuda_forward); +REGISTER_DEVICE_IMPL(ms_deform_attn_impl_backward, CUDA, + ms_deform_attn_cuda_backward); + +Tensor NMSCUDAKernelLauncher(Tensor boxes, Tensor scores, float iou_threshold, + int offset); + +Tensor nms_cuda(Tensor boxes, Tensor scores, float iou_threshold, int offset) { + return NMSCUDAKernelLauncher(boxes, scores, iou_threshold, offset); +} + +Tensor nms_impl(Tensor boxes, Tensor scores, float iou_threshold, int offset); +REGISTER_DEVICE_IMPL(nms_impl, CUDA, nms_cuda); + +void PointsInBoxesPartForwardCUDAKernelLauncher(int batch_size, int boxes_num, + int pts_num, const Tensor boxes, + const Tensor pts, + Tensor box_idx_of_points); + +void PointsInBoxesAllForwardCUDAKernelLauncher(int batch_size, int boxes_num, + int pts_num, const Tensor boxes, + const Tensor pts, + Tensor box_idx_of_points); + +void points_in_boxes_part_forward_cuda(int batch_size, int boxes_num, + int pts_num, const Tensor 
boxes, + const Tensor pts, + Tensor box_idx_of_points) { + PointsInBoxesPartForwardCUDAKernelLauncher(batch_size, boxes_num, pts_num, + boxes, pts, box_idx_of_points); +}; + +void points_in_boxes_all_forward_cuda(int batch_size, int boxes_num, + int pts_num, const Tensor boxes, + const Tensor pts, + Tensor box_idx_of_points) { + PointsInBoxesAllForwardCUDAKernelLauncher(batch_size, boxes_num, pts_num, + boxes, pts, box_idx_of_points); +}; + +void points_in_boxes_part_forward_impl(int batch_size, int boxes_num, + int pts_num, const Tensor boxes, + const Tensor pts, + Tensor box_idx_of_points); + +void points_in_boxes_all_forward_impl(int batch_size, int boxes_num, + int pts_num, const Tensor boxes, + const Tensor pts, + Tensor box_idx_of_points); +REGISTER_DEVICE_IMPL(points_in_boxes_part_forward_impl, CUDA, + points_in_boxes_part_forward_cuda); +REGISTER_DEVICE_IMPL(points_in_boxes_all_forward_impl, CUDA, + points_in_boxes_all_forward_cuda); + +void PSAMaskForwardCUDAKernelLauncher(const int psa_type, const Tensor input, + Tensor output, const int num_, + const int h_feature, const int w_feature, + const int h_mask, const int w_mask, + const int half_h_mask, + const int half_w_mask); + +void PSAMaskBackwardCUDAKernelLauncher( + const int psa_type, const Tensor grad_output, Tensor grad_input, + const int num_, const int h_feature, const int w_feature, const int h_mask, + const int w_mask, const int half_h_mask, const int half_w_mask); + +void psamask_forward_cuda(const int psa_type, const Tensor input, Tensor output, + const int num_, const int h_feature, + const int w_feature, const int h_mask, + const int w_mask, const int half_h_mask, + const int half_w_mask) { + PSAMaskForwardCUDAKernelLauncher(psa_type, input, output, num_, h_feature, + w_feature, h_mask, w_mask, half_h_mask, + half_w_mask); +} + +void psamask_backward_cuda(const int psa_type, const Tensor grad_output, + Tensor grad_input, const int num_, + const int h_feature, const int w_feature, + const int 
h_mask, const int w_mask, + const int half_h_mask, const int half_w_mask) { + PSAMaskBackwardCUDAKernelLauncher(psa_type, grad_output, grad_input, num_, + h_feature, w_feature, h_mask, w_mask, + half_h_mask, half_w_mask); +} + +void psamask_forward_impl(const int psa_type, const Tensor input, Tensor output, + const int num_, const int h_feature, + const int w_feature, const int h_mask, + const int w_mask, const int half_h_mask, + const int half_w_mask); + +void psamask_backward_impl(const int psa_type, const Tensor grad_output, + Tensor grad_input, const int num_, + const int h_feature, const int w_feature, + const int h_mask, const int w_mask, + const int half_h_mask, const int half_w_mask); +REGISTER_DEVICE_IMPL(psamask_forward_impl, CUDA, psamask_forward_cuda); +REGISTER_DEVICE_IMPL(psamask_backward_impl, CUDA, psamask_backward_cuda); + +void ROIAlignForwardCUDAKernelLauncher(Tensor input, Tensor rois, Tensor output, + Tensor argmax_y, Tensor argmax_x, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + int pool_mode, bool aligned); + +void ROIAlignBackwardCUDAKernelLauncher(Tensor grad_output, Tensor rois, + Tensor argmax_y, Tensor argmax_x, + Tensor grad_input, int aligned_height, + int aligned_width, float spatial_scale, + int sampling_ratio, int pool_mode, + bool aligned); + +void roi_align_forward_cuda(Tensor input, Tensor rois, Tensor output, + Tensor argmax_y, Tensor argmax_x, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + int pool_mode, bool aligned) { + ROIAlignForwardCUDAKernelLauncher( + input, rois, output, argmax_y, argmax_x, aligned_height, aligned_width, + spatial_scale, sampling_ratio, pool_mode, aligned); +} + +void roi_align_backward_cuda(Tensor grad_output, Tensor rois, Tensor argmax_y, + Tensor argmax_x, Tensor grad_input, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + int pool_mode, bool aligned) { + 
ROIAlignBackwardCUDAKernelLauncher( + grad_output, rois, argmax_y, argmax_x, grad_input, aligned_height, + aligned_width, spatial_scale, sampling_ratio, pool_mode, aligned); +} + +void roi_align_forward_impl(Tensor input, Tensor rois, Tensor output, + Tensor argmax_y, Tensor argmax_x, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + int pool_mode, bool aligned); + +void roi_align_backward_impl(Tensor grad_output, Tensor rois, Tensor argmax_y, + Tensor argmax_x, Tensor grad_input, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + int pool_mode, bool aligned); + +REGISTER_DEVICE_IMPL(roi_align_forward_impl, CUDA, roi_align_forward_cuda); +REGISTER_DEVICE_IMPL(roi_align_backward_impl, CUDA, roi_align_backward_cuda); + +void ROIAlignRotatedForwardCUDAKernelLauncher( + const at::Tensor input, const at::Tensor rois, const float spatial_scale, + const int sampling_ratio, const bool aligned, const bool clockwise, + const int channels, const int height, const int width, const int num_rois, + const int pooled_height, const int pooled_width, at::Tensor output); + +void ROIAlignRotatedBackwardCUDAKernelLauncher( + const at::Tensor top_grad, const at::Tensor rois, const float spatial_scale, + const int sampling_ratio, const bool aligned, const bool clockwise, + const int channels, const int height, const int width, const int num_rois, + const int pooled_height, const int pooled_width, at::Tensor bottom_grad); + +void roi_align_rotated_forward_cuda(Tensor input, Tensor rois, Tensor output, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + bool aligned, bool clockwise) { + // Number of ROIs + int num_rois = rois.size(0); + int size_rois = rois.size(1); + + if (size_rois != 6) { + AT_ERROR("wrong roi size"); + } + + int num_channels = input.size(1); + int data_height = input.size(2); + int data_width = input.size(3); + ROIAlignRotatedForwardCUDAKernelLauncher( + input, 
rois, spatial_scale, sampling_ratio, aligned, clockwise, + num_channels, data_height, data_width, num_rois, aligned_height, + aligned_width, output); +} + +void roi_align_rotated_backward_cuda(Tensor top_grad, Tensor rois, + Tensor bottom_grad, int aligned_height, + int aligned_width, float spatial_scale, + int sampling_ratio, bool aligned, + bool clockwise) { + // Number of ROIs + int num_rois = rois.size(0); + int size_rois = rois.size(1); + if (size_rois != 6) { + AT_ERROR("wrong roi size"); + } + + int num_channels = bottom_grad.size(1); + int data_height = bottom_grad.size(2); + int data_width = bottom_grad.size(3); + ROIAlignRotatedBackwardCUDAKernelLauncher( + top_grad, rois, spatial_scale, sampling_ratio, aligned, clockwise, + num_channels, data_height, data_width, num_rois, aligned_height, + aligned_width, bottom_grad); +} + +void roi_align_rotated_forward_impl(Tensor input, Tensor rois, Tensor output, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + bool aligned, bool clockwise); + +void roi_align_rotated_backward_impl(Tensor top_grad, Tensor rois, + Tensor bottom_grad, int aligned_height, + int aligned_width, float spatial_scale, + int sampling_ratio, bool aligned, + bool clockwise); +REGISTER_DEVICE_IMPL(roi_align_rotated_forward_impl, CUDA, + roi_align_rotated_forward_cuda); +REGISTER_DEVICE_IMPL(roi_align_rotated_backward_impl, CUDA, + roi_align_rotated_backward_cuda); + +void RiROIAlignRotatedForwardCUDAKernelLauncher( + const at::Tensor features, const at::Tensor rois, const float spatial_scale, + const int num_samples, const bool clockwise, const int channels, + const int height, const int width, const int num_rois, + const int pooled_height, const int pooled_width, const int num_orientations, + at::Tensor output); + +void RiROIAlignRotatedBackwardCUDAKernelLauncher( + const at::Tensor top_grad, const at::Tensor rois, const float spatial_scale, + const int num_samples, const bool clockwise, const int channels, + 
const int height, const int width, const int num_rois, + const int pooled_height, const int pooled_width, const int num_orientations, + at::Tensor bottom_grad); + +void riroi_align_rotated_forward_cuda(Tensor features, Tensor rois, + Tensor output, int pooled_height, + int pooled_width, float spatial_scale, + int num_samples, int num_orientations, + bool clockwise) { + // Number of ROIs + int num_rois = rois.size(0); + int size_rois = rois.size(1); + if (size_rois != 6) { + AT_ERROR("wrong roi size"); + } + CHECK_CONTIGUOUS(features); + CHECK_CONTIGUOUS(rois); + int num_channels = features.size(1) / num_orientations; + int data_height = features.size(2); + int data_width = features.size(3); + RiROIAlignRotatedForwardCUDAKernelLauncher( + features, rois, spatial_scale, num_samples, clockwise, num_channels, + data_height, data_width, num_rois, pooled_height, pooled_width, + num_orientations, output); +} + +void riroi_align_rotated_backward_cuda(Tensor top_grad, Tensor rois, + Tensor bottom_grad, int pooled_height, + int pooled_width, float spatial_scale, + int num_samples, int num_orientations, + bool clockwise) { + // Number of ROIs + int num_rois = rois.size(0); + int size_rois = rois.size(1); + if (size_rois != 6) { + AT_ERROR("wrong roi size"); + } + CHECK_CONTIGUOUS(top_grad); + CHECK_CONTIGUOUS(rois); + int num_channels = bottom_grad.size(1) / num_orientations; + int data_height = bottom_grad.size(2); + int data_width = bottom_grad.size(3); + RiROIAlignRotatedBackwardCUDAKernelLauncher( + top_grad, rois, spatial_scale, num_samples, clockwise, num_channels, + data_height, data_width, num_rois, pooled_height, pooled_width, + num_orientations, bottom_grad); +} + +void riroi_align_rotated_forward_impl(Tensor features, Tensor rois, + Tensor output, int pooled_height, + int pooled_width, float spatial_scale, + int num_samples, int num_orientations, + bool clockwise); + +void riroi_align_rotated_backward_impl(Tensor top_grad, Tensor rois, + Tensor bottom_grad, int 
pooled_height, + int pooled_width, float spatial_scale, + int num_samples, int num_orientations, + bool clockwise); + +REGISTER_DEVICE_IMPL(riroi_align_rotated_forward_impl, CUDA, + riroi_align_rotated_forward_cuda); +REGISTER_DEVICE_IMPL(riroi_align_rotated_backward_impl, CUDA, + riroi_align_rotated_backward_cuda); + +void RoiawarePool3dForwardCUDAKernelLauncher( + int boxes_num, int pts_num, int channels, int max_pts_each_voxel, int out_x, + int out_y, int out_z, const Tensor rois, const Tensor pts, + const Tensor pts_feature, Tensor argmax, Tensor pts_idx_of_voxels, + Tensor pooled_features, int pool_method); + +void RoiawarePool3dBackwardCUDAKernelLauncher( + int boxes_num, int out_x, int out_y, int out_z, int channels, + int max_pts_each_voxel, const Tensor pts_idx_of_voxels, const Tensor argmax, + const Tensor grad_out, Tensor grad_in, int pool_method); + +void roiaware_pool3d_forward_cuda(int boxes_num, int pts_num, int channels, + int max_pts_each_voxel, int out_x, int out_y, + int out_z, const Tensor rois, + const Tensor pts, const Tensor pts_feature, + Tensor argmax, Tensor pts_idx_of_voxels, + Tensor pooled_features, int pool_method) { + RoiawarePool3dForwardCUDAKernelLauncher( + boxes_num, pts_num, channels, max_pts_each_voxel, out_x, out_y, out_z, + rois, pts, pts_feature, argmax, pts_idx_of_voxels, pooled_features, + pool_method); +}; + +void roiaware_pool3d_backward_cuda(int boxes_num, int out_x, int out_y, + int out_z, int channels, + int max_pts_each_voxel, + const Tensor pts_idx_of_voxels, + const Tensor argmax, const Tensor grad_out, + Tensor grad_in, int pool_method) { + RoiawarePool3dBackwardCUDAKernelLauncher( + boxes_num, out_x, out_y, out_z, channels, max_pts_each_voxel, + pts_idx_of_voxels, argmax, grad_out, grad_in, pool_method); +}; + +void roiaware_pool3d_forward_impl(int boxes_num, int pts_num, int channels, + int max_pts_each_voxel, int out_x, int out_y, + int out_z, const Tensor rois, + const Tensor pts, const Tensor pts_feature, + 
Tensor argmax, Tensor pts_idx_of_voxels, + Tensor pooled_features, int pool_method); + +void roiaware_pool3d_backward_impl(int boxes_num, int out_x, int out_y, + int out_z, int channels, + int max_pts_each_voxel, + const Tensor pts_idx_of_voxels, + const Tensor argmax, const Tensor grad_out, + Tensor grad_in, int pool_method); + +REGISTER_DEVICE_IMPL(roiaware_pool3d_forward_impl, CUDA, + roiaware_pool3d_forward_cuda); +REGISTER_DEVICE_IMPL(roiaware_pool3d_backward_impl, CUDA, + roiaware_pool3d_backward_cuda); + +void RoIPointPool3dForwardCUDAKernelLauncher( + int batch_size, int pts_num, int boxes_num, int feature_in_len, + int sampled_pts_num, const Tensor xyz, const Tensor boxes3d, + const Tensor pts_feature, Tensor pooled_features, Tensor pooled_empty_flag); + +void roipoint_pool3d_forward_cuda(int batch_size, int pts_num, int boxes_num, + int feature_in_len, int sampled_pts_num, + const Tensor xyz, const Tensor boxes3d, + const Tensor pts_feature, + Tensor pooled_features, + Tensor pooled_empty_flag) { + RoIPointPool3dForwardCUDAKernelLauncher( + batch_size, pts_num, boxes_num, feature_in_len, sampled_pts_num, xyz, + boxes3d, pts_feature, pooled_features, pooled_empty_flag); +}; + +void roipoint_pool3d_forward_impl(int batch_size, int pts_num, int boxes_num, + int feature_in_len, int sampled_pts_num, + const Tensor xyz, const Tensor boxes3d, + const Tensor pts_feature, + Tensor pooled_features, + Tensor pooled_empty_flag); +REGISTER_DEVICE_IMPL(roipoint_pool3d_forward_impl, CUDA, + roipoint_pool3d_forward_cuda); + +void ROIPoolForwardCUDAKernelLauncher(Tensor input, Tensor rois, Tensor output, + Tensor argmax, int pooled_height, + int pooled_width, float spatial_scale); + +void ROIPoolBackwardCUDAKernelLauncher(Tensor grad_output, Tensor rois, + Tensor argmax, Tensor grad_input, + int pooled_height, int pooled_width, + float spatial_scale); + +void roi_pool_forward_cuda(Tensor input, Tensor rois, Tensor output, + Tensor argmax, int pooled_height, int 
pooled_width, + float spatial_scale) { + ROIPoolForwardCUDAKernelLauncher(input, rois, output, argmax, pooled_height, + pooled_width, spatial_scale); +} + +void roi_pool_backward_cuda(Tensor grad_output, Tensor rois, Tensor argmax, + Tensor grad_input, int pooled_height, + int pooled_width, float spatial_scale) { + ROIPoolBackwardCUDAKernelLauncher(grad_output, rois, argmax, grad_input, + pooled_height, pooled_width, spatial_scale); +} + +void roi_pool_forward_impl(Tensor input, Tensor rois, Tensor output, + Tensor argmax, int pooled_height, int pooled_width, + float spatial_scale); +void roi_pool_backward_impl(Tensor grad_output, Tensor rois, Tensor argmax, + Tensor grad_input, int pooled_height, + int pooled_width, float spatial_scale); +REGISTER_DEVICE_IMPL(roi_pool_forward_impl, CUDA, roi_pool_forward_cuda); +REGISTER_DEVICE_IMPL(roi_pool_backward_impl, CUDA, roi_pool_backward_cuda); + +typedef enum { SUM = 0, MEAN = 1, MAX = 2 } reduce_t; + +std::vector DynamicPointToVoxelForwardCUDAKernelLauncher( + const at::Tensor& feats, const at::Tensor& coors, + const reduce_t reduce_type); + +void DynamicPointToVoxelBackwardCUDAKernelLauncher( + at::Tensor& grad_feats, const at::Tensor& grad_reduced_feats, + const at::Tensor& feats, const at::Tensor& reduced_feats, + const at::Tensor& coors_map, const at::Tensor& reduce_count, + const reduce_t reduce_type); + +std::vector dynamic_point_to_voxel_forward_cuda( + const torch::Tensor& feats, const torch::Tensor& coors, + const reduce_t reduce_type) { + return DynamicPointToVoxelForwardCUDAKernelLauncher(feats, coors, + reduce_type); +}; + +void dynamic_point_to_voxel_backward_cuda( + torch::Tensor& grad_feats, const torch::Tensor& grad_reduced_feats, + const torch::Tensor& feats, const torch::Tensor& reduced_feats, + const torch::Tensor& coors_idx, const torch::Tensor& reduce_count, + const reduce_t reduce_type) { + DynamicPointToVoxelBackwardCUDAKernelLauncher(grad_feats, grad_reduced_feats, + feats, reduced_feats, 
coors_idx, + reduce_count, reduce_type); +}; + +std::vector dynamic_point_to_voxel_forward_impl( + const torch::Tensor& feats, const torch::Tensor& coors, + const reduce_t reduce_type); + +void dynamic_point_to_voxel_backward_impl( + torch::Tensor& grad_feats, const torch::Tensor& grad_reduced_feats, + const torch::Tensor& feats, const torch::Tensor& reduced_feats, + const torch::Tensor& coors_idx, const torch::Tensor& reduce_count, + const reduce_t reduce_type); + +REGISTER_DEVICE_IMPL(dynamic_point_to_voxel_forward_impl, CUDA, + dynamic_point_to_voxel_forward_cuda); +REGISTER_DEVICE_IMPL(dynamic_point_to_voxel_backward_impl, CUDA, + dynamic_point_to_voxel_backward_cuda); + +void SyncBNForwardMeanCUDAKernelLauncher(const Tensor input, Tensor mean); + +void SyncBNForwardVarCUDAKernelLauncher(const Tensor input, const Tensor mean, + Tensor var); + +void SyncBNForwardOutputCUDAKernelLauncher( + const Tensor input, const Tensor mean, const Tensor var, + Tensor running_mean, Tensor running_var, const Tensor weight, + const Tensor bias, Tensor norm, Tensor std, Tensor output, float eps, + float momentum, int group_size); + +void SyncBNBackwardParamCUDAKernelLauncher(const Tensor grad_output, + const Tensor norm, + Tensor grad_weight, + Tensor grad_bias); + +void SyncBNBackwardDataCUDAKernelLauncher(const Tensor grad_output, + const Tensor weight, + const Tensor grad_weight, + const Tensor grad_bias, + const Tensor norm, const Tensor std, + Tensor grad_input); + +void sync_bn_forward_mean_cuda(const Tensor input, Tensor mean) { + SyncBNForwardMeanCUDAKernelLauncher(input, mean); +} + +void sync_bn_forward_var_cuda(const Tensor input, const Tensor mean, + Tensor var) { + SyncBNForwardVarCUDAKernelLauncher(input, mean, var); +} + +void sync_bn_forward_output_cuda(const Tensor input, const Tensor mean, + const Tensor var, Tensor running_mean, + Tensor running_var, const Tensor weight, + const Tensor bias, Tensor norm, Tensor std, + Tensor output, float eps, float momentum, 
+ int group_size) { + SyncBNForwardOutputCUDAKernelLauncher(input, mean, var, running_mean, + running_var, weight, bias, norm, std, + output, eps, momentum, group_size); +} + +void sync_bn_backward_param_cuda(const Tensor grad_output, const Tensor norm, + Tensor grad_weight, Tensor grad_bias) { + SyncBNBackwardParamCUDAKernelLauncher(grad_output, norm, grad_weight, + grad_bias); +} + +void sync_bn_backward_data_cuda(const Tensor grad_output, const Tensor weight, + const Tensor grad_weight, + const Tensor grad_bias, const Tensor norm, + const Tensor std, Tensor grad_input) { + SyncBNBackwardDataCUDAKernelLauncher(grad_output, weight, grad_weight, + grad_bias, norm, std, grad_input); +} + +void sync_bn_forward_mean_impl(const Tensor input, Tensor mean); + +void sync_bn_forward_var_impl(const Tensor input, const Tensor mean, + Tensor var); + +void sync_bn_forward_output_impl(const Tensor input, const Tensor mean, + const Tensor var, Tensor running_mean, + Tensor running_var, const Tensor weight, + const Tensor bias, Tensor norm, Tensor std, + Tensor output, float eps, float momentum, + int group_size); + +void sync_bn_backward_param_impl(const Tensor grad_output, const Tensor norm, + Tensor grad_weight, Tensor grad_bias); + +void sync_bn_backward_data_impl(const Tensor grad_output, const Tensor weight, + const Tensor grad_weight, + const Tensor grad_bias, const Tensor norm, + const Tensor std, Tensor grad_input); + +REGISTER_DEVICE_IMPL(sync_bn_forward_mean_impl, CUDA, + sync_bn_forward_mean_cuda); +REGISTER_DEVICE_IMPL(sync_bn_forward_var_impl, CUDA, sync_bn_forward_var_cuda); +REGISTER_DEVICE_IMPL(sync_bn_forward_output_impl, CUDA, + sync_bn_forward_output_cuda); +REGISTER_DEVICE_IMPL(sync_bn_backward_param_impl, CUDA, + sync_bn_backward_param_cuda); +REGISTER_DEVICE_IMPL(sync_bn_backward_data_impl, CUDA, + sync_bn_backward_data_cuda); + +void ThreeInterpolateForwardCUDAKernelLauncher(int b, int c, int m, int n, + const Tensor points, + const Tensor idx, + const 
Tensor weight, Tensor out); + +void ThreeInterpolateBackwardCUDAKernelLauncher(int b, int c, int n, int m, + const Tensor grad_out, + const Tensor idx, + const Tensor weight, + Tensor grad_points); + +void three_interpolate_forward_cuda(int b, int c, int m, int n, + const Tensor points, const Tensor idx, + const Tensor weight, Tensor out) { + ThreeInterpolateForwardCUDAKernelLauncher(b, c, m, n, points, idx, weight, + out); +}; + +void three_interpolate_backward_cuda(int b, int c, int n, int m, + const Tensor grad_out, const Tensor idx, + const Tensor weight, Tensor grad_points) { + ThreeInterpolateBackwardCUDAKernelLauncher(b, c, n, m, grad_out, idx, weight, + grad_points); +}; + +void three_interpolate_forward_impl(int b, int c, int m, int n, + const Tensor points, const Tensor idx, + const Tensor weight, Tensor out); + +void three_interpolate_backward_impl(int b, int c, int n, int m, + const Tensor grad_out, const Tensor idx, + const Tensor weight, Tensor grad_points); +REGISTER_DEVICE_IMPL(three_interpolate_forward_impl, CUDA, + three_interpolate_forward_cuda); +REGISTER_DEVICE_IMPL(three_interpolate_backward_impl, CUDA, + three_interpolate_backward_cuda); + +void ThreeNNForwardCUDAKernelLauncher(int b, int n, int m, const Tensor unknown, + const Tensor known, Tensor dist2, + Tensor idx); + +void three_nn_forward_cuda(int b, int n, int m, const Tensor unknown, + const Tensor known, Tensor dist2, Tensor idx) { + ThreeNNForwardCUDAKernelLauncher(b, n, m, unknown, known, dist2, idx); +}; + +void three_nn_forward_impl(int b, int n, int m, const Tensor unknown, + const Tensor known, Tensor dist2, Tensor idx); +REGISTER_DEVICE_IMPL(three_nn_forward_impl, CUDA, three_nn_forward_cuda); + +void TINShiftForwardCUDAKernelLauncher(Tensor input, Tensor shift, + Tensor output); + +void TINShiftBackwardCUDAKernelLauncher(Tensor grad_output, Tensor shift, + Tensor grad_input); + +void tin_shift_forward_cuda(Tensor input, Tensor shift, Tensor output) { + 
TINShiftForwardCUDAKernelLauncher(input, shift, output); +} + +void tin_shift_backward_cuda(Tensor grad_output, Tensor shift, + Tensor grad_input) { + TINShiftBackwardCUDAKernelLauncher(grad_output, shift, grad_input); +} + +void tin_shift_forward_impl(Tensor input, Tensor shift, Tensor output); +void tin_shift_backward_impl(Tensor grad_output, Tensor shift, + Tensor grad_input); +REGISTER_DEVICE_IMPL(tin_shift_forward_impl, CUDA, tin_shift_forward_cuda); +REGISTER_DEVICE_IMPL(tin_shift_backward_impl, CUDA, tin_shift_backward_cuda); + +torch::Tensor upfirdn2d_op(const torch::Tensor& input, + const torch::Tensor& kernel, int up_x, int up_y, + int down_x, int down_y, int pad_x0, int pad_x1, + int pad_y0, int pad_y1); + +torch::Tensor upfirdn2d_op_impl(const torch::Tensor& input, + const torch::Tensor& kernel, int up_x, int up_y, + int down_x, int down_y, int pad_x0, int pad_x1, + int pad_y0, int pad_y1); +REGISTER_DEVICE_IMPL(upfirdn2d_op_impl, CUDA, upfirdn2d_op); + +int HardVoxelizeForwardCUDAKernelLauncher( + const at::Tensor& points, at::Tensor& voxels, at::Tensor& coors, + at::Tensor& num_points_per_voxel, const std::vector voxel_size, + const std::vector coors_range, const int max_points, + const int max_voxels, const int NDim = 3); + +int NondeterministicHardVoxelizeForwardCUDAKernelLauncher( + const at::Tensor& points, at::Tensor& voxels, at::Tensor& coors, + at::Tensor& num_points_per_voxel, const std::vector voxel_size, + const std::vector coors_range, const int max_points, + const int max_voxels, const int NDim = 3); + +void DynamicVoxelizeForwardCUDAKernelLauncher( + const at::Tensor& points, at::Tensor& coors, + const std::vector voxel_size, const std::vector coors_range, + const int NDim = 3); + +int hard_voxelize_forward_cuda(const at::Tensor& points, at::Tensor& voxels, + at::Tensor& coors, + at::Tensor& num_points_per_voxel, + const std::vector voxel_size, + const std::vector coors_range, + const int max_points, const int max_voxels, + const int 
NDim) { + return HardVoxelizeForwardCUDAKernelLauncher( + points, voxels, coors, num_points_per_voxel, voxel_size, coors_range, + max_points, max_voxels, NDim); +}; + +int nondeterministic_hard_voxelize_forward_cuda( + const at::Tensor& points, at::Tensor& voxels, at::Tensor& coors, + at::Tensor& num_points_per_voxel, const std::vector voxel_size, + const std::vector coors_range, const int max_points, + const int max_voxels, const int NDim) { + return NondeterministicHardVoxelizeForwardCUDAKernelLauncher( + points, voxels, coors, num_points_per_voxel, voxel_size, coors_range, + max_points, max_voxels, NDim); +}; + +void dynamic_voxelize_forward_cuda(const at::Tensor& points, at::Tensor& coors, + const std::vector voxel_size, + const std::vector coors_range, + const int NDim) { + DynamicVoxelizeForwardCUDAKernelLauncher(points, coors, voxel_size, + coors_range, NDim); +}; + +int hard_voxelize_forward_impl(const at::Tensor& points, at::Tensor& voxels, + at::Tensor& coors, + at::Tensor& num_points_per_voxel, + const std::vector voxel_size, + const std::vector coors_range, + const int max_points, const int max_voxels, + const int NDim); + +int nondeterministic_hard_voxelize_forward_impl( + const at::Tensor& points, at::Tensor& voxels, at::Tensor& coors, + at::Tensor& num_points_per_voxel, const std::vector voxel_size, + const std::vector coors_range, const int max_points, + const int max_voxels, const int NDim); + +void dynamic_voxelize_forward_impl(const at::Tensor& points, at::Tensor& coors, + const std::vector voxel_size, + const std::vector coors_range, + const int NDim); + +REGISTER_DEVICE_IMPL(hard_voxelize_forward_impl, CUDA, + hard_voxelize_forward_cuda); +REGISTER_DEVICE_IMPL(nondeterministic_hard_voxelize_forward_impl, CUDA, + nondeterministic_hard_voxelize_forward_cuda); +REGISTER_DEVICE_IMPL(dynamic_voxelize_forward_impl, CUDA, + dynamic_voxelize_forward_cuda); + +void RotatedFeatureAlignForwardCUDAKernelLauncher(const Tensor features, + const Tensor 
best_bboxes, + const float spatial_scale, + const int points, + Tensor output); + +void RotatedFeatureAlignBackwardCUDAKernelLauncher(const Tensor top_grad, + const Tensor best_bboxes, + const float spatial_scale, + const int points, + Tensor bottom_grad); + +void rotated_feature_align_forward_cuda(const Tensor features, + const Tensor best_bboxes, + const float spatial_scale, + const int points, Tensor output) { + RotatedFeatureAlignForwardCUDAKernelLauncher(features, best_bboxes, + spatial_scale, points, output); +}; + +void rotated_feature_align_backward_cuda(const Tensor top_grad, + const Tensor best_bboxes, + const float spatial_scale, + const int points, Tensor bottom_grad) { + RotatedFeatureAlignBackwardCUDAKernelLauncher( + top_grad, best_bboxes, spatial_scale, points, bottom_grad); +}; + +void rotated_feature_align_forward_impl(const Tensor features, + const Tensor best_bboxes, + const float spatial_scale, + const int points, Tensor output); + +void rotated_feature_align_backward_impl(const Tensor top_grad, + const Tensor best_bboxes, + const float spatial_scale, + const int points, Tensor bottom_grad); + +REGISTER_DEVICE_IMPL(rotated_feature_align_forward_impl, CUDA, + rotated_feature_align_forward_cuda); +REGISTER_DEVICE_IMPL(rotated_feature_align_backward_impl, CUDA, + rotated_feature_align_backward_cuda); + +void PointsInPolygonsForwardCUDAKernelLauncher(const at::Tensor points, + const at::Tensor polygons, + const int rows, const int cols, + at::Tensor output); + +void points_in_polygons_forward_cuda(const Tensor points, const Tensor polygons, + Tensor output, const int rows, + const int cols) { + PointsInPolygonsForwardCUDAKernelLauncher(points, polygons, rows, cols, + output); +}; + +void points_in_polygons_forward_impl(const Tensor points, const Tensor polygons, + Tensor output, const int rows, + const int cols); + +REGISTER_DEVICE_IMPL(points_in_polygons_forward_impl, CUDA, + points_in_polygons_forward_cuda); + +void 
MinAreaPolygonsCUDAKernelLauncher(const Tensor pointsets, Tensor polygons); + +void min_area_polygons_cuda(const Tensor pointsets, Tensor polygons) { + MinAreaPolygonsCUDAKernelLauncher(pointsets, polygons); +} + +void min_area_polygons_impl(const Tensor pointsets, Tensor polygons); + +REGISTER_DEVICE_IMPL(min_area_polygons_impl, CUDA, min_area_polygons_cuda); + +void ActiveRotatedFilterForwardCUDAKernelLauncher(const Tensor input, + const Tensor indices, + Tensor output); + +void ActiveRotatedFilterBackwardCUDAKernelLauncher(const Tensor grad_out, + const Tensor indices, + Tensor grad_in); + +void active_rotated_filter_forward_cuda(const Tensor input, + const Tensor indices, Tensor output) { + ActiveRotatedFilterForwardCUDAKernelLauncher(input, indices, output); +}; + +void active_rotated_filter_backward_cuda(const Tensor grad_out, + const Tensor indices, Tensor grad_in) { + ActiveRotatedFilterBackwardCUDAKernelLauncher(grad_out, indices, grad_in); +}; + +void active_rotated_filter_forward_impl(const Tensor input, + const Tensor indices, Tensor output); + +void active_rotated_filter_backward_impl(const Tensor grad_out, + const Tensor indices, Tensor grad_in); + +REGISTER_DEVICE_IMPL(active_rotated_filter_forward_impl, CUDA, + active_rotated_filter_forward_cuda); +REGISTER_DEVICE_IMPL(active_rotated_filter_backward_impl, CUDA, + active_rotated_filter_backward_cuda); + +void ConvexIoUCUDAKernelLauncher(const Tensor pointsets, const Tensor polygons, + Tensor ious); + +void ConvexGIoUCUDAKernelLauncher(const Tensor pointsets, const Tensor polygons, + Tensor output); + +void convex_iou_cuda(const Tensor pointsets, const Tensor polygons, + Tensor ious) { + ConvexIoUCUDAKernelLauncher(pointsets, polygons, ious); +} + +void convex_giou_cuda(const Tensor pointsets, const Tensor polygons, + Tensor output) { + ConvexGIoUCUDAKernelLauncher(pointsets, polygons, output); +} + +void convex_iou_impl(const Tensor pointsets, const Tensor polygons, + Tensor ious); + +void 
convex_giou_impl(const Tensor pointsets, const Tensor polygons, + Tensor output); + +REGISTER_DEVICE_IMPL(convex_iou_impl, CUDA, convex_iou_cuda); +REGISTER_DEVICE_IMPL(convex_giou_impl, CUDA, convex_giou_cuda); + +Tensor DiffIoURotatedSortVerticesCUDAKernelLauncher(Tensor vertices, + Tensor mask, + Tensor num_valid); + +Tensor diff_iou_rotated_sort_vertices_forward_cuda(Tensor vertices, Tensor mask, + Tensor num_valid) { + return DiffIoURotatedSortVerticesCUDAKernelLauncher(vertices, mask, + num_valid); +} + +Tensor diff_iou_rotated_sort_vertices_forward_impl(Tensor vertices, Tensor mask, + Tensor num_valid); + +REGISTER_DEVICE_IMPL(diff_iou_rotated_sort_vertices_forward_impl, CUDA, + diff_iou_rotated_sort_vertices_forward_cuda); + +void ChamferDistanceForwardCUDAKernelLauncher( + const Tensor xyz1, const Tensor xyz2, const Tensor dist1, + const Tensor dist2, const Tensor idx1, const Tensor idx2); + +void ChamferDistanceBackwardCUDAKernelLauncher( + const Tensor xyz1, const Tensor xyz2, Tensor idx1, Tensor idx2, + Tensor grad_dist1, Tensor grad_dist2, Tensor grad_xyz1, Tensor grad_xyz2); + +void chamfer_distance_forward_cuda(const Tensor xyz1, const Tensor xyz2, + const Tensor dist1, const Tensor dist2, + const Tensor idx1, const Tensor idx2) { + ChamferDistanceForwardCUDAKernelLauncher(xyz1, xyz2, dist1, dist2, idx1, + idx2); +}; + +void chamfer_distance_backward_cuda(const Tensor xyz1, const Tensor xyz2, + Tensor idx1, Tensor idx2, Tensor graddist1, + Tensor graddist2, Tensor gradxyz1, + Tensor gradxyz2) { + ChamferDistanceBackwardCUDAKernelLauncher(xyz1, xyz2, idx1, idx2, graddist1, + graddist2, gradxyz1, gradxyz2); +}; + +void chamfer_distance_forward_impl(const Tensor xyz1, const Tensor xyz2, + const Tensor dist1, const Tensor dist2, + const Tensor idx1, const Tensor idx2); + +void chamfer_distance_backward_impl(const Tensor xyz1, const Tensor xyz2, + Tensor idx1, Tensor idx2, Tensor graddist1, + Tensor graddist2, Tensor gradxyz1, + Tensor gradxyz2); + 
+REGISTER_DEVICE_IMPL(chamfer_distance_forward_impl, CUDA, + chamfer_distance_forward_cuda); +REGISTER_DEVICE_IMPL(chamfer_distance_backward_impl, CUDA, + chamfer_distance_backward_cuda); + +void PrROIPoolForwardCUDAKernelLauncher(Tensor input, Tensor rois, + Tensor output, int pooled_height, + int pooled_width, float spatial_scale); + +void PrROIPoolBackwardCUDAKernelLauncher(Tensor grad_output, Tensor rois, + Tensor grad_input, int pooled_height, + int pooled_width, float spatial_scale); + +void PrROIPoolCoorBackwardCUDAKernelLauncher( + Tensor output, Tensor grad_output, Tensor input, Tensor rois, + Tensor grad_rois, int pooled_height, int pooled_width, float spatial_scale); + +void prroi_pool_forward_cuda(Tensor input, Tensor rois, Tensor output, + int pooled_height, int pooled_width, + float spatial_scale) { + PrROIPoolForwardCUDAKernelLauncher(input, rois, output, pooled_height, + pooled_width, spatial_scale); +} + +void prroi_pool_backward_cuda(Tensor grad_output, Tensor rois, + Tensor grad_input, int pooled_height, + int pooled_width, float spatial_scale) { + PrROIPoolBackwardCUDAKernelLauncher(grad_output, rois, grad_input, + pooled_height, pooled_width, + spatial_scale); +} + +void prroi_pool_coor_backward_cuda(Tensor output, Tensor grad_output, + Tensor input, Tensor rois, Tensor grad_rois, + int pooled_height, int pooled_width, + float spatial_scale) { + PrROIPoolCoorBackwardCUDAKernelLauncher(output, grad_output, input, rois, + grad_rois, pooled_height, + pooled_width, spatial_scale); +} + +void prroi_pool_forward_impl(Tensor input, Tensor rois, Tensor output, + int pooled_height, int pooled_width, + float spatial_scale); +void prroi_pool_backward_impl(Tensor grad_output, Tensor rois, + Tensor grad_input, int pooled_height, + int pooled_width, float spatial_scale); +void prroi_pool_coor_backward_impl(Tensor output, Tensor grad_output, + Tensor input, Tensor rois, Tensor grad_rois, + int pooled_height, int pooled_width, + float spatial_scale); 
+REGISTER_DEVICE_IMPL(prroi_pool_forward_impl, CUDA, prroi_pool_forward_cuda); +REGISTER_DEVICE_IMPL(prroi_pool_backward_impl, CUDA, prroi_pool_backward_cuda); +REGISTER_DEVICE_IMPL(prroi_pool_coor_backward_impl, CUDA, + prroi_pool_coor_backward_cuda); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/deform_conv.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/deform_conv.cpp new file mode 100644 index 0000000000000000000000000000000000000000..86690b9394a4b758104009062f656dcfe0de178e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/deform_conv.cpp @@ -0,0 +1,517 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void deformable_im2col_impl(Tensor data_im, Tensor data_offset, + const int channels, const int height, + const int width, const int ksize_h, + const int ksize_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, + const int parallel_imgs, const int deformable_group, + Tensor data_col) { + DISPATCH_DEVICE_IMPL(deformable_im2col_impl, data_im, data_offset, channels, + height, width, ksize_h, ksize_w, pad_h, pad_w, stride_h, + stride_w, dilation_h, dilation_w, parallel_imgs, + deformable_group, data_col); +} + +void deformable_col2im_impl(Tensor data_col, Tensor data_offset, + const int channels, const int height, + const int width, const int ksize_h, + const int ksize_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, + const int parallel_imgs, const int deformable_group, + Tensor grad_im) { + DISPATCH_DEVICE_IMPL(deformable_col2im_impl, data_col, data_offset, channels, + height, width, ksize_h, ksize_w, pad_h, pad_w, stride_h, + stride_w, dilation_h, dilation_w, parallel_imgs, + deformable_group, grad_im); +} + +void deformable_col2im_coord_impl( + Tensor data_col, Tensor data_im, Tensor 
data_offset, const int channels, + const int height, const int width, const int ksize_h, const int ksize_w, + const int pad_h, const int pad_w, const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, const int parallel_imgs, + const int deformable_group, Tensor grad_offset) { + DISPATCH_DEVICE_IMPL(deformable_col2im_coord_impl, data_col, data_im, + data_offset, channels, height, width, ksize_h, ksize_w, + pad_h, pad_w, stride_h, stride_w, dilation_h, dilation_w, + parallel_imgs, deformable_group, grad_offset); +} + +void deform_conv_shape_check(at::Tensor input, at::Tensor offset, + at::Tensor *gradOutput, at::Tensor weight, int kH, + int kW, int dH, int dW, int padH, int padW, + int dilationH, int dilationW, int group, + int deformable_group) { + TORCH_CHECK( + weight.ndimension() == 4, + "4D weight tensor (nOutputPlane,nInputPlane,kH,kW) expected, but got: %s", + weight.ndimension()); + + TORCH_CHECK(weight.is_contiguous(), "weight tensor has to be contiguous"); + + TORCH_CHECK(kW > 0 && kH > 0, + "kernel size should be greater than zero, but got kH: %d kW: %d", + kH, kW); + + TORCH_CHECK((weight.size(2) == kH && weight.size(3) == kW), + "kernel size should be consistent with weight, ", + "but got kH: %d kW: %d weight.size(2): %d, weight.size(3): %d", + kH, kW, weight.size(2), weight.size(3)); + + TORCH_CHECK(dW > 0 && dH > 0, + "stride should be greater than zero, but got dH: %d dW: %d", dH, + dW); + + TORCH_CHECK( + dilationW > 0 && dilationH > 0, + "dilation should be greater than 0, but got dilationH: %d dilationW: %d", + dilationH, dilationW); + + int ndim = input.ndimension(); + int dimf = 0; + int dimh = 1; + int dimw = 2; + + if (ndim == 4) { + dimf++; + dimh++; + dimw++; + } + + TORCH_CHECK(ndim == 3 || ndim == 4, + "3D or 4D input tensor expected but got: %s", ndim); + + long nInputPlane = weight.size(1) * group; + long inputHeight = input.size(dimh); + long inputWidth = input.size(dimw); + long nOutputPlane = 
weight.size(0); + long outputHeight = + (inputHeight + 2 * padH - (dilationH * (kH - 1) + 1)) / dH + 1; + long outputWidth = + (inputWidth + 2 * padW - (dilationW * (kW - 1) + 1)) / dW + 1; + + TORCH_CHECK(nInputPlane % deformable_group == 0, + "input channels must divide deformable group size"); + + if (outputWidth < 1 || outputHeight < 1) + AT_ERROR( + "Given input size: (%ld x %ld x %ld). " + "Calculated output size: (%ld x %ld x %ld). Output size is too small", + nInputPlane, inputHeight, inputWidth, nOutputPlane, outputHeight, + outputWidth); + + TORCH_CHECK(input.size(1) == nInputPlane, + "invalid number of input planes, expected: %d, but got: %d", + nInputPlane, input.size(1)); + + TORCH_CHECK((inputHeight >= kH && inputWidth >= kW), + "input image is smaller than kernel"); + + TORCH_CHECK( + (offset.size(2) == outputHeight && offset.size(3) == outputWidth), + "invalid spatial size of offset, expected height: %d width: %d, but " + "got height: %d width: %d", + outputHeight, outputWidth, offset.size(2), offset.size(3)); + + TORCH_CHECK((offset.size(1) == deformable_group * 2 * kH * kW), + "invalid number of channels of offset"); + + if (gradOutput != NULL) { + TORCH_CHECK( + gradOutput->size(dimf) == nOutputPlane, + "invalid number of gradOutput planes, expected: %d, but got: %d", + nOutputPlane, gradOutput->size(dimf)); + + TORCH_CHECK( + (gradOutput->size(dimh) == outputHeight && + gradOutput->size(dimw) == outputWidth), + "invalid size of gradOutput, expected height: %d width: %d , but " + "got height: %d width: %d", + outputHeight, outputWidth, gradOutput->size(dimh), + gradOutput->size(dimw)); + } +} + +void deform_conv_forward(Tensor input, Tensor weight, Tensor offset, + Tensor output, Tensor columns, Tensor ones, int kW, + int kH, int dW, int dH, int padW, int padH, + int dilationW, int dilationH, int group, + int deformable_group, int im2col_step) { + if (input.device().is_cuda()) { +#ifdef MMCV_WITH_CUDA + CHECK_CUDA_INPUT(input); + 
CHECK_CUDA_INPUT(offset); + CHECK_CUDA_INPUT(weight); + CHECK_CUDA_INPUT(output); + CHECK_CUDA_INPUT(columns); + CHECK_CUDA_INPUT(ones); +#else + AT_ERROR("DeformConv is not compiled with GPU support"); +#endif + } else { + CHECK_CPU_INPUT(input); + CHECK_CPU_INPUT(offset); + CHECK_CPU_INPUT(weight); + CHECK_CPU_INPUT(output); + CHECK_CPU_INPUT(columns); + CHECK_CPU_INPUT(ones); + } + + deform_conv_shape_check(input, offset, NULL, weight, kH, kW, dH, dW, padH, + padW, dilationH, dilationW, group, deformable_group); + at::DeviceGuard guard(input.device()); + + int batch = 1; + if (input.ndimension() == 3) { + // Force batch + batch = 0; + input.unsqueeze_(0); + offset.unsqueeze_(0); + } + + // todo: assert batchsize dividable by im2col_step + + long batchSize = input.size(0); + long nInputPlane = input.size(1); + long inputHeight = input.size(2); + long inputWidth = input.size(3); + + long nOutputPlane = weight.size(0); + + long outputWidth = + (inputWidth + 2 * padW - (dilationW * (kW - 1) + 1)) / dW + 1; + long outputHeight = + (inputHeight + 2 * padH - (dilationH * (kH - 1) + 1)) / dH + 1; + + TORCH_CHECK((offset.size(0) == batchSize), "invalid batch size of offset"); + + output = output.view({batchSize / im2col_step, im2col_step, nOutputPlane, + outputHeight, outputWidth}); + columns = at::zeros( + {nInputPlane * kW * kH, im2col_step * outputHeight * outputWidth}, + input.options()); + + if (ones.ndimension() != 2 || + ones.size(0) * ones.size(1) < outputHeight * outputWidth) { + ones = at::ones({outputHeight, outputWidth}, input.options()); + } + + input = input.view({batchSize / im2col_step, im2col_step, nInputPlane, + inputHeight, inputWidth}); + offset = + offset.view({batchSize / im2col_step, im2col_step, + deformable_group * 2 * kH * kW, outputHeight, outputWidth}); + + Tensor output_buffer = at::zeros({batchSize / im2col_step, nOutputPlane, + im2col_step * outputHeight, outputWidth}, + output.options()); + + output_buffer = output_buffer.view( + 
{output_buffer.size(0), group, output_buffer.size(1) / group, + output_buffer.size(2), output_buffer.size(3)}); + + for (int elt = 0; elt < batchSize / im2col_step; elt++) { + deformable_im2col_impl(input[elt], offset[elt], nInputPlane, inputHeight, + inputWidth, kH, kW, padH, padW, dH, dW, dilationH, + dilationW, im2col_step, deformable_group, columns); + + columns = columns.view({group, columns.size(0) / group, columns.size(1)}); + weight = weight.view({group, weight.size(0) / group, weight.size(1), + weight.size(2), weight.size(3)}); + + for (int g = 0; g < group; g++) { + output_buffer[elt][g] = output_buffer[elt][g] + .flatten(1) + .addmm_(weight[g].flatten(1), columns[g]) + .view_as(output_buffer[elt][g]); + } + columns = + columns.view({columns.size(0) * columns.size(1), columns.size(2)}); + weight = weight.view({weight.size(0) * weight.size(1), weight.size(2), + weight.size(3), weight.size(4)}); + } + + output_buffer = output_buffer.view( + {output_buffer.size(0), output_buffer.size(1) * output_buffer.size(2), + output_buffer.size(3), output_buffer.size(4)}); + + output_buffer = output_buffer.view({batchSize / im2col_step, nOutputPlane, + im2col_step, outputHeight, outputWidth}); + output_buffer.transpose_(1, 2); + output.copy_(output_buffer); + output = output.view({batchSize, nOutputPlane, outputHeight, outputWidth}); + + input = input.view({batchSize, nInputPlane, inputHeight, inputWidth}); + offset = offset.view( + {batchSize, deformable_group * 2 * kH * kW, outputHeight, outputWidth}); + + if (batch == 0) { + output = output.view({nOutputPlane, outputHeight, outputWidth}); + input = input.view({nInputPlane, inputHeight, inputWidth}); + offset = offset.view({offset.size(1), offset.size(2), offset.size(3)}); + } +} + +void deform_conv_backward_input(Tensor input, Tensor offset, Tensor gradOutput, + Tensor gradInput, Tensor gradOffset, + Tensor weight, Tensor columns, int kW, int kH, + int dW, int dH, int padW, int padH, + int dilationW, int dilationH, 
int group, + int deformable_group, int im2col_step) { + if (input.device().is_cuda()) { +#ifdef MMCV_WITH_CUDA + CHECK_CUDA_INPUT(input); + CHECK_CUDA_INPUT(offset); + CHECK_CUDA_INPUT(gradOutput); + CHECK_CUDA_INPUT(gradInput); + CHECK_CUDA_INPUT(gradOffset); + CHECK_CUDA_INPUT(weight); + CHECK_CUDA_INPUT(columns); +#else + AT_ERROR("DeformConv is not compiled with GPU support"); +#endif + } else { + CHECK_CPU_INPUT(input); + CHECK_CPU_INPUT(offset); + CHECK_CPU_INPUT(gradOutput); + CHECK_CPU_INPUT(gradInput); + CHECK_CPU_INPUT(gradOffset); + CHECK_CPU_INPUT(weight); + CHECK_CPU_INPUT(columns); + } + deform_conv_shape_check(input, offset, &gradOutput, weight, kH, kW, dH, dW, + padH, padW, dilationH, dilationW, group, + deformable_group); + + at::DeviceGuard guard(input.device()); + + int batch = 1; + if (input.ndimension() == 3) { + // Force batch + batch = 0; + input = input.view({1, input.size(0), input.size(1), input.size(2)}); + offset = offset.view({1, offset.size(0), offset.size(1), offset.size(2)}); + gradOutput = gradOutput.view( + {1, gradOutput.size(0), gradOutput.size(1), gradOutput.size(2)}); + } + + long batchSize = input.size(0); + long nInputPlane = input.size(1); + long inputHeight = input.size(2); + long inputWidth = input.size(3); + + long nOutputPlane = weight.size(0); + + long outputWidth = + (inputWidth + 2 * padW - (dilationW * (kW - 1) + 1)) / dW + 1; + long outputHeight = + (inputHeight + 2 * padH - (dilationH * (kH - 1) + 1)) / dH + 1; + + TORCH_CHECK((offset.size(0) == batchSize), 3, "invalid batch size of offset"); + gradInput = gradInput.view({batchSize, nInputPlane, inputHeight, inputWidth}); + columns = at::zeros( + {nInputPlane * kW * kH, im2col_step * outputHeight * outputWidth}, + input.options()); + + // change order of grad output + gradOutput = gradOutput.view({batchSize / im2col_step, im2col_step, + nOutputPlane, outputHeight, outputWidth}); + gradOutput.transpose_(1, 2); + + gradInput = gradInput.view({batchSize / im2col_step, 
im2col_step, nInputPlane, + inputHeight, inputWidth}); + input = input.view({batchSize / im2col_step, im2col_step, nInputPlane, + inputHeight, inputWidth}); + gradOffset = gradOffset.view({batchSize / im2col_step, im2col_step, + deformable_group * 2 * kH * kW, outputHeight, + outputWidth}); + offset = + offset.view({batchSize / im2col_step, im2col_step, + deformable_group * 2 * kH * kW, outputHeight, outputWidth}); + + for (int elt = 0; elt < batchSize / im2col_step; elt++) { + // divide into groups + columns = columns.view({group, columns.size(0) / group, columns.size(1)}); + weight = weight.view({group, weight.size(0) / group, weight.size(1), + weight.size(2), weight.size(3)}); + gradOutput = gradOutput.view( + {gradOutput.size(0), group, gradOutput.size(1) / group, + gradOutput.size(2), gradOutput.size(3), gradOutput.size(4)}); + + for (int g = 0; g < group; g++) { + columns[g] = columns[g].addmm_(weight[g].flatten(1).transpose(0, 1), + gradOutput[elt][g].flatten(1), 0.0f, 1.0f); + } + + columns = + columns.view({columns.size(0) * columns.size(1), columns.size(2)}); + gradOutput = gradOutput.view( + {gradOutput.size(0), gradOutput.size(1) * gradOutput.size(2), + gradOutput.size(3), gradOutput.size(4), gradOutput.size(5)}); + + deformable_col2im_coord_impl(columns, input[elt], offset[elt], nInputPlane, + inputHeight, inputWidth, kH, kW, padH, padW, + dH, dW, dilationH, dilationW, im2col_step, + deformable_group, gradOffset[elt]); + + deformable_col2im_impl(columns, offset[elt], nInputPlane, inputHeight, + inputWidth, kH, kW, padH, padW, dH, dW, dilationH, + dilationW, im2col_step, deformable_group, + gradInput[elt]); + + weight = weight.view({weight.size(0) * weight.size(1), weight.size(2), + weight.size(3), weight.size(4)}); + } + + gradOutput.transpose_(1, 2); + gradOutput = + gradOutput.view({batchSize, nOutputPlane, outputHeight, outputWidth}); + + gradInput = gradInput.view({batchSize, nInputPlane, inputHeight, inputWidth}); + input = input.view({batchSize, 
nInputPlane, inputHeight, inputWidth}); + gradOffset = gradOffset.view( + {batchSize, deformable_group * 2 * kH * kW, outputHeight, outputWidth}); + offset = offset.view( + {batchSize, deformable_group * 2 * kH * kW, outputHeight, outputWidth}); + + if (batch == 0) { + gradOutput = gradOutput.view({nOutputPlane, outputHeight, outputWidth}); + input = input.view({nInputPlane, inputHeight, inputWidth}); + gradInput = gradInput.view({nInputPlane, inputHeight, inputWidth}); + offset = offset.view({offset.size(1), offset.size(2), offset.size(3)}); + gradOffset = + gradOffset.view({offset.size(1), offset.size(2), offset.size(3)}); + } +} + +void deform_conv_backward_parameters(Tensor input, Tensor offset, + Tensor gradOutput, Tensor gradWeight, + Tensor columns, Tensor ones, int kW, + int kH, int dW, int dH, int padW, int padH, + int dilationW, int dilationH, int group, + int deformable_group, float scale, + int im2col_step) { + if (input.device().is_cuda()) { +#ifdef MMCV_WITH_CUDA + CHECK_CUDA_INPUT(input); + CHECK_CUDA_INPUT(offset); + CHECK_CUDA_INPUT(gradOutput); + CHECK_CUDA_INPUT(gradWeight); + CHECK_CUDA_INPUT(columns); + CHECK_CUDA_INPUT(ones); +#else + AT_ERROR("DeformConv is not compiled with GPU support"); +#endif + } else { + CHECK_CPU_INPUT(input); + CHECK_CPU_INPUT(offset); + CHECK_CPU_INPUT(gradOutput); + CHECK_CPU_INPUT(gradWeight); + CHECK_CPU_INPUT(columns); + CHECK_CPU_INPUT(ones); + } + + deform_conv_shape_check(input, offset, &gradOutput, gradWeight, kH, kW, dH, + dW, padH, padW, dilationH, dilationW, group, + deformable_group); + at::DeviceGuard guard(input.device()); + + int batch = 1; + + if (input.ndimension() == 3) { + // Force batch + batch = 0; + input = input.view( + at::IntList({1, input.size(0), input.size(1), input.size(2)})); + gradOutput = gradOutput.view( + {1, gradOutput.size(0), gradOutput.size(1), gradOutput.size(2)}); + } + + long batchSize = input.size(0); + long nInputPlane = input.size(1); + long inputHeight = input.size(2); + 
long inputWidth = input.size(3); + + long nOutputPlane = gradWeight.size(0); + + long outputWidth = + (inputWidth + 2 * padW - (dilationW * (kW - 1) + 1)) / dW + 1; + long outputHeight = + (inputHeight + 2 * padH - (dilationH * (kH - 1) + 1)) / dH + 1; + + TORCH_CHECK((offset.size(0) == batchSize), "invalid batch size of offset"); + + columns = at::zeros( + {nInputPlane * kW * kH, im2col_step * outputHeight * outputWidth}, + input.options()); + + gradOutput = gradOutput.view({batchSize / im2col_step, im2col_step, + nOutputPlane, outputHeight, outputWidth}); + gradOutput.transpose_(1, 2); + + Tensor gradOutputBuffer = at::zeros_like(gradOutput); + gradOutputBuffer = + gradOutputBuffer.view({batchSize / im2col_step, nOutputPlane, im2col_step, + outputHeight, outputWidth}); + gradOutputBuffer = gradOutputBuffer.contiguous(); + gradOutputBuffer.copy_(gradOutput); + gradOutputBuffer = + gradOutputBuffer.view({batchSize / im2col_step, nOutputPlane, + im2col_step * outputHeight, outputWidth}); + + gradOutput.transpose_(1, 2); + gradOutput = + gradOutput.view({batchSize, nOutputPlane, outputHeight, outputWidth}); + + input = input.view({batchSize / im2col_step, im2col_step, nInputPlane, + inputHeight, inputWidth}); + offset = + offset.view({batchSize / im2col_step, im2col_step, + deformable_group * 2 * kH * kW, outputHeight, outputWidth}); + + for (int elt = 0; elt < batchSize / im2col_step; elt++) { + deformable_im2col_impl(input[elt], offset[elt], nInputPlane, inputHeight, + inputWidth, kH, kW, padH, padW, dH, dW, dilationH, + dilationW, im2col_step, deformable_group, columns); + + // divide into group + gradOutputBuffer = gradOutputBuffer.view( + {gradOutputBuffer.size(0), group, gradOutputBuffer.size(1) / group, + gradOutputBuffer.size(2), gradOutputBuffer.size(3)}); + columns = columns.view({group, columns.size(0) / group, columns.size(1)}); + gradWeight = + gradWeight.view({group, gradWeight.size(0) / group, gradWeight.size(1), + gradWeight.size(2), 
gradWeight.size(3)}); + + for (int g = 0; g < group; g++) { + gradWeight[g] = gradWeight[g] + .flatten(1) + .addmm_(gradOutputBuffer[elt][g].flatten(1), + columns[g].transpose(1, 0), 1.0, scale) + .view_as(gradWeight[g]); + } + gradOutputBuffer = gradOutputBuffer.view( + {gradOutputBuffer.size(0), + gradOutputBuffer.size(1) * gradOutputBuffer.size(2), + gradOutputBuffer.size(3), gradOutputBuffer.size(4)}); + columns = + columns.view({columns.size(0) * columns.size(1), columns.size(2)}); + gradWeight = gradWeight.view({gradWeight.size(0) * gradWeight.size(1), + gradWeight.size(2), gradWeight.size(3), + gradWeight.size(4)}); + } + + input = input.view({batchSize, nInputPlane, inputHeight, inputWidth}); + offset = offset.view( + {batchSize, deformable_group * 2 * kH * kW, outputHeight, outputWidth}); + + if (batch == 0) { + gradOutput = gradOutput.view({nOutputPlane, outputHeight, outputWidth}); + input = input.view({nInputPlane, inputHeight, inputWidth}); + } +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/deform_conv_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/deform_conv_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..c07a170dfb73032756277096d53b82a528ecafd1 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/deform_conv_parrots.cpp @@ -0,0 +1,273 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include +#include +#include + +#include "deform_conv_pytorch.h" + +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void deform_conv_forward_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int kW, kH, dW, dH, padW, padH, dilationW, dilationH, group, deformable_group, + im2col_step; + SSAttrs(attr) + .get("kW", kW) + .get("kH", kH) + .get("dW", dW) + .get("dH", dH) + .get("padW", padW) + .get("padH", padH) + .get("dilationW", dilationW) + .get("dilationH", dilationH) + .get("group", group) + .get("deformable_group", deformable_group) + .get("im2col_step", im2col_step) + .done(); + + const auto& input = buildATensor(ctx, ins[0]); + const auto& weight = buildATensor(ctx, ins[1]); + const auto& offset = buildATensor(ctx, ins[2]); + + auto output = buildATensor(ctx, outs[0]); + auto columns = buildATensor(ctx, outs[1]); + auto ones = buildATensor(ctx, outs[2]); + + deform_conv_forward(input, weight, offset, output, columns, ones, kW, kH, dW, + dH, padW, padH, dilationW, dilationH, group, + deformable_group, im2col_step); +} + +void deform_conv_backward_input_cuda_parrots(CudaContext& ctx, + const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int kW, kH, dW, dH, padW, padH, dilationW, dilationH, group, deformable_group, + im2col_step; + SSAttrs(attr) + .get("kW", kW) + .get("kH", kH) + .get("dW", dW) + .get("dH", dH) + .get("padW", padW) + .get("padH", padH) + .get("dilationW", dilationW) + .get("dilationH", dilationH) + .get("group", group) + .get("deformable_group", deformable_group) + .get("im2col_step", im2col_step) + .done(); + + const auto& input = buildATensor(ctx, ins[0]); + const auto& offset = buildATensor(ctx, ins[1]); + const auto& gradOutput = buildATensor(ctx, ins[2]); + + auto gradInput = buildATensor(ctx, outs[0]); + auto gradOffset = buildATensor(ctx, outs[1]); + auto weight = buildATensor(ctx, outs[2]); 
+ auto columns = buildATensor(ctx, outs[3]); + + deform_conv_backward_input(input, offset, gradOutput, gradInput, gradOffset, + weight, columns, kW, kH, dW, dH, padW, padH, + dilationW, dilationH, group, deformable_group, + im2col_step); +} + +void deform_conv_backward_parameters_cuda_parrots( + CudaContext& ctx, const SSElement& attr, const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int kW, kH, dW, dH, padW, padH, dilationW, dilationH, group, deformable_group, + im2col_step; + float scale; + SSAttrs(attr) + .get("kW", kW) + .get("kH", kH) + .get("dW", dW) + .get("dH", dH) + .get("padW", padW) + .get("padH", padH) + .get("dilationW", dilationW) + .get("dilationH", dilationH) + .get("group", group) + .get("deformable_group", deformable_group) + .get("scale", scale) + .get("im2col_step", im2col_step) + .done(); + + const auto& input = buildATensor(ctx, ins[0]); + const auto& offset = buildATensor(ctx, ins[1]); + const auto& gradOutput = buildATensor(ctx, ins[2]); + + auto gradWeight = buildATensor(ctx, outs[0]); + auto columns = buildATensor(ctx, outs[1]); + auto ones = buildATensor(ctx, outs[2]); + deform_conv_backward_parameters(input, offset, gradOutput, gradWeight, + columns, ones, kW, kH, dW, dH, padW, padH, + dilationW, dilationH, group, deformable_group, + scale, im2col_step); +} +#endif + +void deform_conv_forward_cpu_parrots(HostContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int kW, kH, dW, dH, padW, padH, dilationW, dilationH, group, deformable_group, + im2col_step; + SSAttrs(attr) + .get("kW", kW) + .get("kH", kH) + .get("dW", dW) + .get("dH", dH) + .get("padW", padW) + .get("padH", padH) + .get("dilationW", dilationW) + .get("dilationH", dilationH) + .get("group", group) + .get("deformable_group", deformable_group) + .get("im2col_step", im2col_step) + .done(); + + const auto& input = buildATensor(ctx, ins[0]); + const auto& weight = buildATensor(ctx, ins[1]); + 
const auto& offset = buildATensor(ctx, ins[2]); + + auto output = buildATensor(ctx, outs[0]); + auto columns = buildATensor(ctx, outs[1]); + auto ones = buildATensor(ctx, outs[2]); + + deform_conv_forward(input, weight, offset, output, columns, ones, kW, kH, dW, + dH, padW, padH, dilationW, dilationH, group, + deformable_group, im2col_step); +} + +void deform_conv_backward_input_cpu_parrots(HostContext& ctx, + const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int kW, kH, dW, dH, padW, padH, dilationW, dilationH, group, deformable_group, + im2col_step; + SSAttrs(attr) + .get("kW", kW) + .get("kH", kH) + .get("dW", dW) + .get("dH", dH) + .get("padW", padW) + .get("padH", padH) + .get("dilationW", dilationW) + .get("dilationH", dilationH) + .get("group", group) + .get("deformable_group", deformable_group) + .get("im2col_step", im2col_step) + .done(); + + const auto& input = buildATensor(ctx, ins[0]); + const auto& offset = buildATensor(ctx, ins[1]); + const auto& gradOutput = buildATensor(ctx, ins[2]); + + auto gradInput = buildATensor(ctx, outs[0]); + auto gradOffset = buildATensor(ctx, outs[1]); + auto weight = buildATensor(ctx, outs[2]); + auto columns = buildATensor(ctx, outs[3]); + + deform_conv_backward_input(input, offset, gradOutput, gradInput, gradOffset, + weight, columns, kW, kH, dW, dH, padW, padH, + dilationW, dilationH, group, deformable_group, + im2col_step); +} + +void deform_conv_backward_parameters_cpu_parrots( + HostContext& ctx, const SSElement& attr, const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int kW, kH, dW, dH, padW, padH, dilationW, dilationH, group, deformable_group, + im2col_step; + float scale; + SSAttrs(attr) + .get("kW", kW) + .get("kH", kH) + .get("dW", dW) + .get("dH", dH) + .get("padW", padW) + .get("padH", padH) + .get("dilationW", dilationW) + .get("dilationH", dilationH) + .get("group", group) + .get("deformable_group", deformable_group) + .get("scale", 
scale) + .get("im2col_step", im2col_step) + .done(); + + const auto& input = buildATensor(ctx, ins[0]); + const auto& offset = buildATensor(ctx, ins[1]); + const auto& gradOutput = buildATensor(ctx, ins[2]); + + auto gradWeight = buildATensor(ctx, outs[0]); + auto columns = buildATensor(ctx, outs[1]); + auto ones = buildATensor(ctx, outs[2]); + deform_conv_backward_parameters(input, offset, gradOutput, gradWeight, + columns, ones, kW, kH, dW, dH, padW, padH, + dilationW, dilationH, group, deformable_group, + scale, im2col_step); +} + +PARROTS_EXTENSION_REGISTER(deform_conv_forward) + .attr("kW") + .attr("kH") + .attr("dW") + .attr("dH") + .attr("padW") + .attr("padH") + .attr("dilationW") + .attr("dilationH") + .attr("group") + .attr("deformable_group") + .attr("im2col_step") + .input(3) + .output(3) + .apply(deform_conv_forward_cpu_parrots) +#ifdef MMCV_WITH_CUDA + .apply(deform_conv_forward_cuda_parrots) +#endif + .done(); + +PARROTS_EXTENSION_REGISTER(deform_conv_backward_input) + .attr("kW") + .attr("kH") + .attr("dW") + .attr("dH") + .attr("padW") + .attr("padH") + .attr("dilationW") + .attr("dilationH") + .attr("group") + .attr("deformable_group") + .attr("im2col_step") + .input(3) + .output(4) + .apply(deform_conv_backward_input_cpu_parrots) +#ifdef MMCV_WITH_CUDA + .apply(deform_conv_backward_input_cuda_parrots) +#endif + .done(); + +PARROTS_EXTENSION_REGISTER(deform_conv_backward_parameters) + .attr("kW") + .attr("kH") + .attr("dW") + .attr("dH") + .attr("padW") + .attr("padH") + .attr("dilationW") + .attr("dilationH") + .attr("group") + .attr("deformable_group") + .attr("scale") + .attr("im2col_step") + .input(3) + .output(3) + .apply(deform_conv_backward_parameters_cpu_parrots) +#ifdef MMCV_WITH_CUDA + .apply(deform_conv_backward_parameters_cuda_parrots) +#endif + .done(); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/deform_conv_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/deform_conv_pytorch.h new file mode 100644 
index 0000000000000000000000000000000000000000..e0d3d40d1c9eb32a466d5d4b427556741a4c79fc --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/deform_conv_pytorch.h @@ -0,0 +1,28 @@ +// Copyright (c) OpenMMLab. All rights reserved +#ifndef DEFORM_CONV_PYTORCH_H +#define DEFORM_CONV_PYTORCH_H +#include +using namespace at; + +void deform_conv_forward(Tensor input, Tensor weight, Tensor offset, + Tensor output, Tensor columns, Tensor ones, int kW, + int kH, int dW, int dH, int padW, int padH, + int dilationW, int dilationH, int group, + int deformable_group, int im2col_step); + +void deform_conv_backward_input(Tensor input, Tensor offset, Tensor gradOutput, + Tensor gradInput, Tensor gradOffset, + Tensor weight, Tensor columns, int kW, int kH, + int dW, int dH, int padW, int padH, + int dilationW, int dilationH, int group, + int deformable_group, int im2col_step); + +void deform_conv_backward_parameters(Tensor input, Tensor offset, + Tensor gradOutput, Tensor gradWeight, + Tensor columns, Tensor ones, int kW, + int kH, int dW, int dH, int padW, int padH, + int dilationW, int dilationH, int group, + int deformable_group, float scale, + int im2col_step); + +#endif // DEFORM_CONV_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/deform_roi_pool.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/deform_roi_pool.cpp new file mode 100644 index 0000000000000000000000000000000000000000..4fb78a96e74f7e97dff5212bb767eab743f2e73c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/deform_roi_pool.cpp @@ -0,0 +1,42 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void deform_roi_pool_forward_impl(Tensor input, Tensor rois, Tensor offset, + Tensor output, int pooled_height, + int pooled_width, float spatial_scale, + int sampling_ratio, float gamma) { + DISPATCH_DEVICE_IMPL(deform_roi_pool_forward_impl, input, rois, offset, + output, pooled_height, pooled_width, spatial_scale, + sampling_ratio, gamma); +} + +void deform_roi_pool_backward_impl(Tensor grad_output, Tensor input, + Tensor rois, Tensor offset, + Tensor grad_input, Tensor grad_offset, + int pooled_height, int pooled_width, + float spatial_scale, int sampling_ratio, + float gamma) { + DISPATCH_DEVICE_IMPL(deform_roi_pool_backward_impl, grad_output, input, rois, + offset, grad_input, grad_offset, pooled_height, + pooled_width, spatial_scale, sampling_ratio, gamma); +} + +void deform_roi_pool_forward(Tensor input, Tensor rois, Tensor offset, + Tensor output, int pooled_height, int pooled_width, + float spatial_scale, int sampling_ratio, + float gamma) { + deform_roi_pool_forward_impl(input, rois, offset, output, pooled_height, + pooled_width, spatial_scale, sampling_ratio, + gamma); +} + +void deform_roi_pool_backward(Tensor grad_output, Tensor input, Tensor rois, + Tensor offset, Tensor grad_input, + Tensor grad_offset, int pooled_height, + int pooled_width, float spatial_scale, + int sampling_ratio, float gamma) { + deform_roi_pool_backward_impl(grad_output, input, rois, offset, grad_input, + grad_offset, pooled_height, pooled_width, + spatial_scale, sampling_ratio, gamma); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/deform_roi_pool_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/deform_roi_pool_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..fc2701d52d921ee03fd2ff518852d52e291d6c4c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/deform_roi_pool_parrots.cpp @@ -0,0 +1,102 
@@ +// Copyright (c) OpenMMLab. All rights reserved +#include +#include +#include + +#include "deform_roi_pool_pytorch.h" + +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +/*void deform_roi_pool_forward_cuda(Tensor input, Tensor rois, Tensor offset, + * Tensor output, int pooled_height, + * int pooled_width, float spatial_scale, + * int sampling_ratio, float gamma); + */ +void deform_roi_pool_forward_cuda_parrots(CudaContext& ctx, + const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int pooled_height; + int pooled_width; + float spatial_scale; + int sampling_ratio; + float gamma; + SSAttrs(attr) + .get("pooled_height", pooled_height) + .get("pooled_width", pooled_width) + .get("spatial_scale", spatial_scale) + .get("sampling_ratio", sampling_ratio) + .get("gamma", gamma) + .done(); + + const auto& input = buildATensor(ctx, ins[0]); + const auto& rois = buildATensor(ctx, ins[1]); + const auto& offset = buildATensor(ctx, ins[2]); + + auto output = buildATensor(ctx, outs[0]); + deform_roi_pool_forward_cuda(input, rois, offset, output, pooled_height, + pooled_width, spatial_scale, sampling_ratio, + gamma); +} + +/*void deform_roi_pool_backward_cuda(Tensor grad_output, Tensor input, + * Tensor rois, Tensor offset, + * Tensor grad_input, Tensor grad_offset, + * int pooled_height, int pooled_width, + * float spatial_scale, int sampling_ratio, + * float gamma); + */ +void deform_roi_pool_backward_cuda_parrots(CudaContext& ctx, + const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int pooled_height; + int pooled_width; + float spatial_scale; + int sampling_ratio; + float gamma; + + SSAttrs(attr) + .get("pooled_height", pooled_height) + .get("pooled_width", pooled_width) + .get("spatial_scale", spatial_scale) + .get("sampling_ratio", sampling_ratio) + .get("gamma", gamma) + .done(); + + const auto& grad_output = buildATensor(ctx, ins[0]); + const auto& input = buildATensor(ctx, 
ins[1]); + const auto& rois = buildATensor(ctx, ins[2]); + const auto& offset = buildATensor(ctx, ins[3]); + + auto grad_input = buildATensor(ctx, outs[0]); + auto grad_offset = buildATensor(ctx, outs[1]); + + deform_roi_pool_backward_cuda(grad_output, input, rois, offset, grad_input, + grad_offset, pooled_height, pooled_width, + spatial_scale, sampling_ratio, gamma); +} + +PARROTS_EXTENSION_REGISTER(deform_roi_pool_forward) + .attr("pooled_height") + .attr("pooled_width") + .attr("spatial_scale") + .attr("sampling_ratio") + .attr("gamma") + .input(3) + .output(1) + .apply(deform_roi_pool_forward_cuda_parrots) + .done(); + +PARROTS_EXTENSION_REGISTER(deform_roi_pool_backward) + .attr("pooled_height") + .attr("pooled_width") + .attr("spatial_scale") + .attr("sampling_ratio") + .attr("gamma") + .input(4) + .output(2) + .apply(deform_roi_pool_backward_cuda_parrots) + .done(); +#endif diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/deform_roi_pool_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/deform_roi_pool_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..ac0f2c324bb8329f2a0b6bc683f3d902a300156c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/deform_roi_pool_pytorch.h @@ -0,0 +1,18 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#ifndef DEFORM_ROI_POOL_PYTORCH_H +#define DEFORM_ROI_POOL_PYTORCH_H +#include +using namespace at; + +void deform_roi_pool_forward_cuda(Tensor input, Tensor rois, Tensor offset, + Tensor output, int pooled_height, + int pooled_width, float spatial_scale, + int sampling_ratio, float gamma); + +void deform_roi_pool_backward_cuda(Tensor grad_output, Tensor input, + Tensor rois, Tensor offset, + Tensor grad_input, Tensor grad_offset, + int pooled_height, int pooled_width, + float spatial_scale, int sampling_ratio, + float gamma); +#endif // DEFORM_ROI_POOL_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/diff_iou_rotated.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/diff_iou_rotated.cpp new file mode 100644 index 0000000000000000000000000000000000000000..2361b7fbe5c86fa62a0fa78f39f6d018de108f8f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/diff_iou_rotated.cpp @@ -0,0 +1,14 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +Tensor diff_iou_rotated_sort_vertices_forward_impl(Tensor vertices, Tensor mask, + Tensor num_valid) { + return DISPATCH_DEVICE_IMPL(diff_iou_rotated_sort_vertices_forward_impl, + vertices, mask, num_valid); +} + +Tensor diff_iou_rotated_sort_vertices_forward(Tensor vertices, Tensor mask, + Tensor num_valid) { + return diff_iou_rotated_sort_vertices_forward_impl(vertices, mask, num_valid); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/diff_iou_rotated_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/diff_iou_rotated_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..b4d3e0e05900a1c9c731fcc7e2194eeedc8b9bfb --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/diff_iou_rotated_parrots.cpp @@ -0,0 +1,28 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include +#include +#include + +#include "diff_iou_rotated_pytorch.h" + +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void diff_iou_rotated_sort_vertices_forward_cuda_parrots( + CudaContext& ctx, const SSElement& attr, const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + at::Tensor boxes, scores, dets; + auto vertices = buildATensor(ctx, ins[0]); + auto mask = buildATensor(ctx, ins[1]); + auto num_valid = buildATensor(ctx, ins[2]); + auto out = + diff_iou_rotated_sort_vertices_forward_cuda(vertices, mask, num_valid); + updateDArray(ctx, out, outs[0]); +} + +PARROTS_EXTENSION_REGISTER(diff_iou_rotated_sort_vertices_forward) + .input(3) + .output(1) + .apply(diff_iou_rotated_sort_vertices_forward_cuda_parrots) + .done(); +#endif diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/diff_iou_rotated_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/diff_iou_rotated_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..ef911ecc20c7e648dea7aeb74a4d3ec2f46ec990 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/diff_iou_rotated_pytorch.h @@ -0,0 +1,10 @@ +// Copyright (c) OpenMMLab. All rights reserved +#ifndef DIFF_IOU_ROTATED_PYTORCH_H +#define DIFF_IOU_ROTATED_PYTORCH_H +#include +using namespace at; + +Tensor diff_iou_rotated_sort_vertices_forward_cuda(Tensor vertices, Tensor mask, + Tensor num_valid); + +#endif // DIFF_IOU_ROTATED_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/focal_loss.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/focal_loss.cpp new file mode 100644 index 0000000000000000000000000000000000000000..ed0e2186532d9d6d909f76d653283bbdc29eac11 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/focal_loss.cpp @@ -0,0 +1,53 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void sigmoid_focal_loss_forward_impl(Tensor input, Tensor target, Tensor weight, + Tensor output, float gamma, float alpha) { + DISPATCH_DEVICE_IMPL(sigmoid_focal_loss_forward_impl, input, target, weight, + output, gamma, alpha); +} + +void sigmoid_focal_loss_backward_impl(Tensor input, Tensor target, + Tensor weight, Tensor grad_input, + float gamma, float alpha) { + DISPATCH_DEVICE_IMPL(sigmoid_focal_loss_backward_impl, input, target, weight, + grad_input, gamma, alpha); +} + +void softmax_focal_loss_forward_impl(Tensor input, Tensor target, Tensor weight, + Tensor output, float gamma, float alpha) { + DISPATCH_DEVICE_IMPL(softmax_focal_loss_forward_impl, input, target, weight, + output, gamma, alpha); +} + +void softmax_focal_loss_backward_impl(Tensor input, Tensor target, + Tensor weight, Tensor buff, + Tensor grad_input, float gamma, + float alpha) { + DISPATCH_DEVICE_IMPL(softmax_focal_loss_backward_impl, input, target, weight, + buff, grad_input, gamma, alpha); +} + +void sigmoid_focal_loss_forward(Tensor input, Tensor target, Tensor weight, + Tensor output, float gamma, float alpha) { + sigmoid_focal_loss_forward_impl(input, target, weight, output, gamma, alpha); +} + +void sigmoid_focal_loss_backward(Tensor input, Tensor target, Tensor weight, + Tensor grad_input, float gamma, float alpha) { + sigmoid_focal_loss_backward_impl(input, target, weight, grad_input, gamma, + alpha); +} + +void softmax_focal_loss_forward(Tensor input, Tensor target, Tensor weight, + Tensor output, float gamma, float alpha) { + softmax_focal_loss_forward_impl(input, target, weight, output, gamma, alpha); +} + +void softmax_focal_loss_backward(Tensor input, Tensor target, Tensor weight, + Tensor buff, Tensor grad_input, float gamma, + float alpha) { + softmax_focal_loss_backward_impl(input, target, weight, buff, grad_input, + gamma, alpha); +} diff --git 
a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/focal_loss_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/focal_loss_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..044e200c40ef6342c6147e2d9282d856cc3dd9a2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/focal_loss_parrots.cpp @@ -0,0 +1,113 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include +#include +#include + +#include "focal_loss_pytorch.h" + +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void sigmoid_focal_loss_forward_cuda_parrots(CudaContext& ctx, + const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + float gamma; + float alpha; + SSAttrs(attr).get("gamma", gamma).get("alpha", alpha).done(); + + // get inputs and outputs + const auto& input = buildATensor(ctx, ins[0]); + const auto& target = buildATensor(ctx, ins[1]); + const auto& weight = buildATensor(ctx, ins[2]); + + auto output = buildATensor(ctx, outs[0]); + + sigmoid_focal_loss_forward_cuda(input, target, weight, output, gamma, alpha); +} + +void sigmoid_focal_loss_backward_cuda_parrots( + CudaContext& ctx, const SSElement& attr, const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + float gamma; + float alpha; + SSAttrs(attr).get("gamma", gamma).get("alpha", alpha).done(); + + // get inputs and outputs + const auto& input = buildATensor(ctx, ins[0]); + const auto& target = buildATensor(ctx, ins[1]); + const auto& weight = buildATensor(ctx, ins[2]); + + auto grad_input = buildATensor(ctx, outs[0]); + + sigmoid_focal_loss_backward_cuda(input, target, weight, grad_input, gamma, + alpha); +} + +void softmax_focal_loss_forward_cuda_parrots(CudaContext& ctx, + const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + float gamma; + float alpha; + SSAttrs(attr).get("gamma", gamma).get("alpha", alpha).done(); + + // get inputs and outputs + const auto& input 
= buildATensor(ctx, ins[0]); + const auto& target = buildATensor(ctx, ins[1]); + const auto& weight = buildATensor(ctx, ins[2]); + + auto output = buildATensor(ctx, outs[0]); + softmax_focal_loss_forward_cuda(input, target, weight, output, gamma, alpha); +} + +void softmax_focal_loss_backward_cuda_parrots( + CudaContext& ctx, const SSElement& attr, const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + float gamma; + float alpha; + SSAttrs(attr).get("gamma", gamma).get("alpha", alpha).done(); + + // get inputs and outputs + const auto& input = buildATensor(ctx, ins[0]); + const auto& target = buildATensor(ctx, ins[1]); + const auto& weight = buildATensor(ctx, ins[2]); + + auto buff = buildATensor(ctx, outs[0]); + auto grad_input = buildATensor(ctx, outs[1]); + softmax_focal_loss_backward_cuda(input, target, weight, buff, grad_input, + gamma, alpha); +} + +PARROTS_EXTENSION_REGISTER(sigmoid_focal_loss_forward) + .attr("gamma") + .attr("alpha") + .input(3) + .output(1) + .apply(sigmoid_focal_loss_forward_cuda_parrots) + .done(); + +PARROTS_EXTENSION_REGISTER(sigmoid_focal_loss_backward) + .attr("gamma") + .attr("alpha") + .input(3) + .output(1) + .apply(sigmoid_focal_loss_backward_cuda_parrots) + .done(); + +PARROTS_EXTENSION_REGISTER(softmax_focal_loss_forward) + .attr("gamma") + .attr("alpha") + .input(3) + .output(1) + .apply(softmax_focal_loss_forward_cuda_parrots) + .done(); + +PARROTS_EXTENSION_REGISTER(softmax_focal_loss_backward) + .attr("gamma") + .attr("alpha") + .input(3) + .output(2) + .apply(softmax_focal_loss_backward_cuda_parrots) + .done(); +#endif diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/focal_loss_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/focal_loss_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..b7a00c8abcd5fccd5bf2e3bfcde0451545c69f28 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/focal_loss_pytorch.h @@ -0,0 +1,21 @@ +// 
Copyright (c) OpenMMLab. All rights reserved +#ifndef FOCAL_LOSS_PYTORCH_H +#define FOCAL_LOSS_PYTORCH_H +#include +using namespace at; + +void sigmoid_focal_loss_forward_cuda(Tensor input, Tensor target, Tensor weight, + Tensor output, float gamma, float alpha); + +void sigmoid_focal_loss_backward_cuda(Tensor input, Tensor target, + Tensor weight, Tensor grad_input, + float gamma, float alpha); + +void softmax_focal_loss_forward_cuda(Tensor input, Tensor target, Tensor weight, + Tensor output, float gamma, float alpha); + +void softmax_focal_loss_backward_cuda(Tensor input, Tensor target, + Tensor weight, Tensor buff, + Tensor grad_input, float gamma, + float alpha); +#endif // FOCAL_LOSS_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/furthest_point_sample.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/furthest_point_sample.cpp new file mode 100644 index 0000000000000000000000000000000000000000..9c7098acdb5b8392a698803dd7c7d34a360df6ad --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/furthest_point_sample.cpp @@ -0,0 +1,34 @@ +// Modified from +// https://github.com/sshaoshuai/Pointnet2.PyTorch/tree/master/pointnet2/src/sampling.cpp + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void furthest_point_sampling_forward_impl(Tensor points_tensor, + Tensor temp_tensor, Tensor idx_tensor, + int b, int n, int m) { + DISPATCH_DEVICE_IMPL(furthest_point_sampling_forward_impl, points_tensor, + temp_tensor, idx_tensor, b, n, m); +} + +void furthest_point_sampling_with_dist_forward_impl(Tensor points_tensor, + Tensor temp_tensor, + Tensor idx_tensor, int b, + int n, int m) { + DISPATCH_DEVICE_IMPL(furthest_point_sampling_with_dist_forward_impl, + points_tensor, temp_tensor, idx_tensor, b, n, m); +} + +void furthest_point_sampling_forward(Tensor points_tensor, Tensor temp_tensor, + Tensor idx_tensor, int b, int n, int m) { + furthest_point_sampling_forward_impl(points_tensor, 
temp_tensor, idx_tensor, + b, n, m); +} + +void furthest_point_sampling_with_dist_forward(Tensor points_tensor, + Tensor temp_tensor, + Tensor idx_tensor, int b, int n, + int m) { + furthest_point_sampling_with_dist_forward_impl(points_tensor, temp_tensor, + idx_tensor, b, n, m); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/furthest_point_sample_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/furthest_point_sample_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..483bfb24316d505c6c6086f0ec1f70a61c2e2baf --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/furthest_point_sample_parrots.cpp @@ -0,0 +1,57 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include +#include +#include + +#include "furthest_point_sample_pytorch.h" + +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void furthest_point_sample_forward_cuda_parrots( + CudaContext& ctx, const SSElement& attr, const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int b, n, m; + SSAttrs(attr).get("b", b).get("n", n).get("m", m).done(); + + auto points_tensor = buildATensor(ctx, ins[0]); + auto temp_tensor = buildATensor(ctx, ins[1]); + + auto idx_tensor = buildATensor(ctx, outs[0]); + + furthest_point_sampling_forward(points_tensor, temp_tensor, idx_tensor, b, n, + m); +} + +void furthest_point_sampling_with_dist_forward_cuda_parrots( + CudaContext& ctx, const SSElement& attr, const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int b, n, m; + SSAttrs(attr).get("b", b).get("n", n).get("m", m).done(); + + auto points_tensor = buildATensor(ctx, ins[0]); + auto temp_tensor = buildATensor(ctx, ins[1]); + + auto idx_tensor = buildATensor(ctx, outs[0]); + + furthest_point_sampling_with_dist_forward(points_tensor, temp_tensor, + idx_tensor, b, n, m); +} +PARROTS_EXTENSION_REGISTER(furthest_point_sampling_forward) + .attr("b") + .attr("n") + .attr("m") + .input(2) + .output(1) + 
.apply(furthest_point_sample_forward_cuda_parrots) + .done(); + +PARROTS_EXTENSION_REGISTER(furthest_point_sampling_with_dist_forward) + .attr("b") + .attr("n") + .attr("m") + .input(2) + .output(1) + .apply(furthest_point_sampling_with_dist_forward_cuda_parrots) + .done(); +#endif diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/furthest_point_sample_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/furthest_point_sample_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..0325cd66ed317574d2ab258152617091552a9301 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/furthest_point_sample_pytorch.h @@ -0,0 +1,14 @@ +// Copyright (c) OpenMMLab. All rights reserved +#ifndef FURTHEST_POINT_SAMPLE_PYTORCH_H +#define FURTHEST_POINT_SAMPLE_PYTORCH_H +#include +using namespace at; + +void furthest_point_sampling_forward(Tensor points_tensor, Tensor temp_tensor, + Tensor idx_tensor, int b, int n, int m); + +void furthest_point_sampling_with_dist_forward(Tensor points_tensor, + Tensor temp_tensor, + Tensor idx_tensor, int b, int n, + int m); +#endif // FURTHEST_POINT_SAMPLE_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/fused_bias_leakyrelu.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/fused_bias_leakyrelu.cpp new file mode 100644 index 0000000000000000000000000000000000000000..8d411c9d843f15174653aab4b24cbb3c37564073 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/fused_bias_leakyrelu.cpp @@ -0,0 +1,119 @@ +// Modified from +// https://github.com/rosinality/stylegan2-pytorch/blob/master/op/fused_bias_act.cpp + +/* +Copyright (c) 2021, NVIDIA Corporation. All rights reserved. + +NVIDIA Source Code License for StyleGAN2 with Adaptive Discriminator +Augmentation (ADA) +======================================================================= + +1. Definitions + +"Licensor" means any person or entity that distributes its Work. 
+ +"Software" means the original work of authorship made available under +this License. + +"Work" means the Software and any additions to or derivative works of +the Software that are made available under this License. + +The terms "reproduce," "reproduction," "derivative works," and +"distribution" have the meaning as provided under U.S. copyright law; +provided, however, that for the purposes of this License, derivative +works shall not include works that remain separable from, or merely +link (or bind by name) to the interfaces of, the Work. + +Works, including the Software, are "made available" under this License +by including in or with the Work either (a) a copyright notice +referencing the applicability of this License to the Work, or (b) a +copy of this License. + +2. License Grants + + 2.1 Copyright Grant. Subject to the terms and conditions of this + License, each Licensor grants to you a perpetual, worldwide, + non-exclusive, royalty-free, copyright license to reproduce, + prepare derivative works of, publicly display, publicly perform, + sublicense and distribute its Work and any resulting derivative + works in any form. + +3. Limitations + + 3.1 Redistribution. You may reproduce or distribute the Work only + if (a) you do so under this License, (b) you include a complete + copy of this License with your distribution, and (c) you retain + without modification any copyright, patent, trademark, or + attribution notices that are present in the Work. + + 3.2 Derivative Works. You may specify that additional or different + terms apply to the use, reproduction, and distribution of your + derivative works of the Work ("Your Terms") only if (a) Your Terms + provide that the use limitation in Section 3.3 applies to your + derivative works, and (b) you identify the specific derivative + works that are subject to Your Terms. 
Notwithstanding Your Terms, + this License (including the redistribution requirements in Section + 3.1) will continue to apply to the Work itself. + + 3.3 Use Limitation. The Work and any derivative works thereof only + may be used or intended for use non-commercially. Notwithstanding + the foregoing, NVIDIA and its affiliates may use the Work and any + derivative works commercially. As used herein, "non-commercially" + means for research or evaluation purposes only. + + 3.4 Patent Claims. If you bring or threaten to bring a patent claim + against any Licensor (including any claim, cross-claim or + counterclaim in a lawsuit) to enforce any patents that you allege + are infringed by any Work, then your rights under this License from + such Licensor (including the grant in Section 2.1) will terminate + immediately. + + 3.5 Trademarks. This License does not grant any rights to use any + Licensor’s or its affiliates’ names, logos, or trademarks, except + as necessary to reproduce the notices described in this License. + + 3.6 Termination. If you violate any term of this License, then your + rights under this License (including the grant in Section 2.1) will + terminate immediately. + +4. Disclaimer of Warranty. + +THE WORK IS PROVIDED "AS IS" WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WARRANTIES OR CONDITIONS OF +MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR +NON-INFRINGEMENT. YOU BEAR THE RISK OF UNDERTAKING ANY ACTIVITIES UNDER +THIS LICENSE. + +5. Limitation of Liability. 
+ +EXCEPT AS PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL +THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE +SHALL ANY LICENSOR BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT, +INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF +OR RELATED TO THIS LICENSE, THE USE OR INABILITY TO USE THE WORK +(INCLUDING BUT NOT LIMITED TO LOSS OF GOODWILL, BUSINESS INTERRUPTION, +LOST PROFITS OR DATA, COMPUTER FAILURE OR MALFUNCTION, OR ANY OTHER +COMMERCIAL DAMAGES OR LOSSES), EVEN IF THE LICENSOR HAS BEEN ADVISED OF +THE POSSIBILITY OF SUCH DAMAGES. + +======================================================================= +*/ + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +torch::Tensor fused_bias_leakyrelu_op_impl(const torch::Tensor& input, + const torch::Tensor& bias, + const torch::Tensor& refer, int act, + int grad, float alpha, float scale) { + return DISPATCH_DEVICE_IMPL(fused_bias_leakyrelu_op_impl, input, bias, refer, + act, grad, alpha, scale); +} + +torch::Tensor fused_bias_leakyrelu(const torch::Tensor& input, + const torch::Tensor& bias, + const torch::Tensor& refer, int act, + int grad, float alpha, float scale) { + return fused_bias_leakyrelu_op_impl(input, bias, refer, act, grad, alpha, + scale); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/fused_bias_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/fused_bias_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..47409ad20bbb5d4852eceb16038d3cec41e3431c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/fused_bias_parrots.cpp @@ -0,0 +1,41 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include + +#include +#include +#include +using namespace at; +using namespace parrots; + +torch::Tensor fused_bias_leakyrelu(const torch::Tensor &input, + const torch::Tensor &bias, + const torch::Tensor &refer, int act, + int grad, float alpha, float scale); + +void fused_bias_leakyrelu_parrots(CudaContext &ctx, const SSElement &attr, + const OperatorBase::in_list_t &ins, + OperatorBase::out_list_t &outs) { + int act, grad; + float alpha, scale; + SSAttrs(attr) + .get("act", act) + .get("grad", grad) + .get("alpha", alpha) + .get("scale", scale) + .done(); + const auto &input = buildATensor(ctx, ins[0]); + const auto &bias = buildATensor(ctx, ins[1]); + const auto &refer = buildATensor(ctx, ins[2]); + auto out = fused_bias_leakyrelu(input, bias, refer, act, grad, alpha, scale); + updateDArray(ctx, out, outs[0]); +} + +PARROTS_EXTENSION_REGISTER(fused_bias_leakyrelu) + .attr("act") + .attr("grad") + .attr("alpha") + .attr("scale") + .input(3) + .output(1) + .apply(fused_bias_leakyrelu_parrots) + .done(); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/gather_points.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/gather_points.cpp new file mode 100644 index 0000000000000000000000000000000000000000..b8fb020022902bfbeb5ba940621d51859c616bdc --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/gather_points.cpp @@ -0,0 +1,30 @@ +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void gather_points_forward_impl(int b, int c, int n, int npoints, + const Tensor points, const Tensor idx, + Tensor out) { + DISPATCH_DEVICE_IMPL(gather_points_forward_impl, b, c, n, npoints, points, + idx, out); +} + +void gather_points_backward_impl(int b, int c, int n, int npoints, + const Tensor grad_out, const Tensor idx, + Tensor grad_points) { + DISPATCH_DEVICE_IMPL(gather_points_backward_impl, b, c, n, npoints, grad_out, + idx, grad_points); +} + +void gather_points_forward(Tensor 
points_tensor, Tensor idx_tensor, + Tensor out_tensor, int b, int c, int n, + int npoints) { + gather_points_forward_impl(b, c, n, npoints, points_tensor, idx_tensor, + out_tensor); +} + +void gather_points_backward(Tensor grad_out_tensor, Tensor idx_tensor, + Tensor grad_points_tensor, int b, int c, int n, + int npoints) { + gather_points_backward_impl(b, c, n, npoints, grad_out_tensor, idx_tensor, + grad_points_tensor); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/gather_points_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/gather_points_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..1d2d9e1290f26ccbfeb301a102fcb0917ff2cfa1 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/gather_points_parrots.cpp @@ -0,0 +1,71 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include +#include +#include + +#include "gather_points_pytorch.h" + +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void gather_points_forward_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int b, c, n, npoints; + SSAttrs(attr) + .get("b", b) + .get("c", c) + .get("n", n) + .get("npoints", npoints) + .done(); + + auto points_tensor = buildATensor(ctx, ins[0]); + auto idx_tensor = buildATensor(ctx, ins[1]); + + auto out_tensor = buildATensor(ctx, outs[0]); + + gather_points_forward(points_tensor, idx_tensor, out_tensor, b, c, n, + npoints); +} + +void gather_points_backward_cuda_parrots(CudaContext& ctx, + const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int b, c, n, npoints; + SSAttrs(attr) + .get("b", b) + .get("c", c) + .get("n", n) + .get("npoints", npoints) + .done(); + + auto grad_out_tensor = buildATensor(ctx, ins[0]); + auto idx_tensor = buildATensor(ctx, ins[1]); + + auto grad_points_tensor = buildATensor(ctx, outs[0]); + + gather_points_backward(grad_out_tensor, 
idx_tensor, grad_points_tensor, b, c, + n, npoints); +} + +PARROTS_EXTENSION_REGISTER(gather_points_forward) + .attr("b") + .attr("c") + .attr("n") + .attr("npoints") + .input(2) + .output(1) + .apply(gather_points_forward_cuda_parrots) + .done(); + +PARROTS_EXTENSION_REGISTER(gather_points_backward) + .attr("b") + .attr("c") + .attr("n") + .attr("npoints") + .input(2) + .output(1) + .apply(gather_points_backward_cuda_parrots) + .done(); +#endif diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/gather_points_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/gather_points_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..1689ae6ad9ca00e795510ac356f6b49c4890bf2e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/gather_points_pytorch.h @@ -0,0 +1,13 @@ +// Copyright (c) OpenMMLab. All rights reserved +#ifndef GATHER_POINTS_PYTORCH_H +#define GATHER_POINTS_PYTORCH_H +#include +using namespace at; + +void gather_points_forward(Tensor points_tensor, Tensor idx_tensor, + Tensor out_tensor, int b, int c, int n, int npoints); + +void gather_points_backward(Tensor grad_out_tensor, Tensor idx_tensor, + Tensor grad_points_tensor, int b, int c, int n, + int npoints); +#endif // GATHER_POINTS_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/group_points.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/group_points.cpp new file mode 100644 index 0000000000000000000000000000000000000000..cdd190d40bbfdb109e34148791775dfe9d16be2e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/group_points.cpp @@ -0,0 +1,34 @@ +// Copyright (c) OpenMMLab. All rights reserved. 
+// Modified from +// https://github.com/sshaoshuai/Pointnet2.PyTorch/tree/master/pointnet2/src/group_points.cpp + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void group_points_forward_impl(int b, int c, int n, int npoints, int nsample, + const Tensor points, const Tensor idx, + Tensor out) { + DISPATCH_DEVICE_IMPL(group_points_forward_impl, b, c, n, npoints, nsample, + points, idx, out); +} + +void group_points_backward_impl(int b, int c, int n, int npoints, int nsample, + const Tensor grad_out, const Tensor idx, + Tensor grad_points) { + DISPATCH_DEVICE_IMPL(group_points_backward_impl, b, c, n, npoints, nsample, + grad_out, idx, grad_points); +} + +void group_points_forward(Tensor points_tensor, Tensor idx_tensor, + Tensor out_tensor, int b, int c, int n, int npoints, + int nsample) { + DISPATCH_DEVICE_IMPL(group_points_forward_impl, b, c, n, npoints, nsample, + points_tensor, idx_tensor, out_tensor); +} + +void group_points_backward(Tensor grad_out_tensor, Tensor idx_tensor, + Tensor grad_points_tensor, int b, int c, int n, + int npoints, int nsample) { + group_points_backward_impl(b, c, n, npoints, nsample, grad_out_tensor, + idx_tensor, grad_points_tensor); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/group_points_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/group_points_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..282c01a8c175cc6145ab45e5938325d2f7e0d491 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/group_points_parrots.cpp @@ -0,0 +1,72 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include +#include +#include + +#include "group_points_pytorch.h" + +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void group_points_forward_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int b, c, n, npoints, nsample; + SSAttrs(attr) + .get("b", b) + .get("c", c) + .get("n", n) + .get("npoints", npoints) + .get("nsample", nsample) + .done(); + auto points_tensor = buildATensor(ctx, ins[0]); + auto idx_tensor = buildATensor(ctx, ins[1]); + + auto out_tensor = buildATensor(ctx, outs[0]); + + group_points_forward(points_tensor, idx_tensor, out_tensor, b, c, n, npoints, + nsample); +} + +void group_points_backward_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int b, c, n, npoints, nsample; + SSAttrs(attr) + .get("b", b) + .get("c", c) + .get("n", n) + .get("npoints", npoints) + .get("nsample", nsample) + .done(); + auto grad_out_tensor = buildATensor(ctx, ins[0]); + auto idx_tensor = buildATensor(ctx, ins[1]); + + auto grad_points_tensor = buildATensor(ctx, outs[0]); + + group_points_backward(grad_out_tensor, idx_tensor, grad_points_tensor, b, c, + n, npoints, nsample); +} + +PARROTS_EXTENSION_REGISTER(group_points_forward) + .attr("b") + .attr("c") + .attr("n") + .attr("npoints") + .attr("nsample") + .input(2) + .output(1) + .apply(group_points_forward_cuda_parrots) + .done(); + +PARROTS_EXTENSION_REGISTER(group_points_backward) + .attr("b") + .attr("c") + .attr("n") + .attr("npoints") + .attr("nsample") + .input(2) + .output(1) + .apply(group_points_backward_cuda_parrots) + .done(); +#endif diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/group_points_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/group_points_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..e704ab078e0ea3833c0ef29e5e4ab00693151be3 --- /dev/null 
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/group_points_pytorch.h @@ -0,0 +1,15 @@ +// Copyright (c) OpenMMLab. All rights reserved +#ifndef GROUP_POINTS_PYTORCH_H +#define GROUP_POINTS_PYTORCH_H +#include +using namespace at; + +void group_points_forward(Tensor points_tensor, Tensor idx_tensor, + Tensor out_tensor, int b, int c, int n, int npoints, + int nsample); + +void group_points_backward(Tensor grad_out_tensor, Tensor idx_tensor, + Tensor grad_points_tensor, int b, int c, int n, + int npoints, int nsample); + +#endif // GROUP_POINTS_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/info.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/info.cpp new file mode 100644 index 0000000000000000000000000000000000000000..a4cc41861128dc0a8f8ccd641f68044428c4dc2c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/info.cpp @@ -0,0 +1,65 @@ +// Copyright (c) OpenMMLab. All rights reserved +// modified from +// https://github.com/facebookresearch/detectron2/blob/master/detectron2/layers/csrc/vision.cpp +#include "pytorch_cpp_helper.hpp" + +#ifdef MMCV_WITH_CUDA +#ifdef MMCV_WITH_HIP +#include +int get_hiprt_version() { + int runtimeVersion; + hipRuntimeGetVersion(&runtimeVersion); + return runtimeVersion; +} +#else +#include +int get_cudart_version() { return CUDART_VERSION; } +#endif +#endif + +std::string get_compiling_cuda_version() { +#ifdef MMCV_WITH_CUDA +#ifndef MMCV_WITH_HIP + std::ostringstream oss; + // copied from + // https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/cuda/detail/CUDAHooks.cpp#L231 + auto printCudaStyleVersion = [&](int v) { + oss << (v / 1000) << "." << (v / 10 % 100); + if (v % 10 != 0) { + oss << "." 
<< (v % 10); + } + }; + printCudaStyleVersion(get_cudart_version()); + return oss.str(); +#else + std::ostringstream oss; + oss << get_hiprt_version(); + return oss.str(); +#endif +#else + return std::string("not available"); +#endif +} + +// similar to +// https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/Version.cpp +std::string get_compiler_version() { + std::ostringstream ss; +#if defined(__GNUC__) +#ifndef __clang__ + { ss << "GCC " << __GNUC__ << "." << __GNUC_MINOR__; } +#endif +#endif + +#if defined(__clang_major__) + { + ss << "clang " << __clang_major__ << "." << __clang_minor__ << "." + << __clang_patchlevel__; + } +#endif + +#if defined(_MSC_VER) + { ss << "MSVC " << _MSC_FULL_VER; } +#endif + return ss.str(); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/iou3d.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/iou3d.cpp new file mode 100644 index 0000000000000000000000000000000000000000..a347c0ee96db9ceefd6168c3cce84bea243e7044 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/iou3d.cpp @@ -0,0 +1,66 @@ +// Modified from +// https://github.com/open-mmlab/OpenPCDet/blob/master/pcdet/ops/iou3d_nms/src/iou3d_nms.cpp + +/* +3D IoU Calculation and Rotated NMS(modified from 2D NMS written by others) +Written by Shaoshuai Shi +All Rights Reserved 2019-2020. 
+*/ + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +const int THREADS_PER_BLOCK_NMS = sizeof(unsigned long long) * 8; + +void iou3d_boxes_overlap_bev_forward_impl(const int num_a, const Tensor boxes_a, + const int num_b, const Tensor boxes_b, + Tensor ans_overlap) { + DISPATCH_DEVICE_IMPL(iou3d_boxes_overlap_bev_forward_impl, num_a, boxes_a, + num_b, boxes_b, ans_overlap); +} + +void iou3d_nms3d_forward_impl(const Tensor boxes, Tensor &keep, + Tensor &keep_num, float nms_overlap_thresh) { + DISPATCH_DEVICE_IMPL(iou3d_nms3d_forward_impl, boxes, keep, keep_num, + nms_overlap_thresh); +} + +void iou3d_nms3d_normal_forward_impl(const Tensor boxes, Tensor &keep, + Tensor &keep_num, + float nms_overlap_thresh) { + DISPATCH_DEVICE_IMPL(iou3d_nms3d_normal_forward_impl, boxes, keep, keep_num, + nms_overlap_thresh); +} + +void iou3d_boxes_overlap_bev_forward(Tensor boxes_a, Tensor boxes_b, + Tensor ans_overlap) { + // params boxes: (N, 7) [x, y, z, dx, dy, dz, heading] + // params boxes_b: (M, 5) + // params ans_overlap: (N, M) + int num_a = boxes_a.size(0); + int num_b = boxes_b.size(0); + + iou3d_boxes_overlap_bev_forward_impl(num_a, boxes_a, num_b, boxes_b, + ans_overlap); +} + +void iou3d_nms3d_forward(Tensor boxes, Tensor keep, Tensor keep_num, + float nms_overlap_thresh) { + // params boxes: (N, 7) [x, y, z, dx, dy, dz, heading] + // params keep: (N) + CHECK_CONTIGUOUS(boxes); + CHECK_CONTIGUOUS(keep); + + iou3d_nms3d_forward_impl(boxes, keep, keep_num, nms_overlap_thresh); +} + +void iou3d_nms3d_normal_forward(Tensor boxes, Tensor keep, Tensor keep_num, + float nms_overlap_thresh) { + // params boxes: (N, 7) [x, y, z, dx, dy, dz, heading] + // params keep: (N) + + CHECK_CONTIGUOUS(boxes); + CHECK_CONTIGUOUS(keep); + + iou3d_nms3d_normal_forward_impl(boxes, keep, keep_num, nms_overlap_thresh); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/iou3d_parrots.cpp 
b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/iou3d_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..20e288aeab9bdaef047115bdac645e4b58e4c629 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/iou3d_parrots.cpp @@ -0,0 +1,70 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include +#include +#include + +#include "iou3d_pytorch.h" + +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void iou3d_boxes_overlap_bev_forward_cuda_parrots( + CudaContext& ctx, const SSElement& attr, const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + auto boxes_a = buildATensor(ctx, ins[0]); + auto boxes_b = buildATensor(ctx, ins[1]); + + auto ans_iou = buildATensor(ctx, outs[0]); + + iou3d_boxes_overlap_bev_forward(boxes_a, boxes_b, ans_iou); +} + +void iou3d_nms3d_forward_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + float nms_overlap_thresh; + SSAttrs(attr).get("nms_overlap_thresh", nms_overlap_thresh).done(); + + auto boxes = buildATensor(ctx, ins[0]); + + auto keep = buildATensor(ctx, outs[0]); + auto keep_num = buildATensor(ctx, outs[1]); + + iou3d_nms3d_forward(boxes, keep, keep_num, nms_overlap_thresh); +} + +void iou3d_nms3d_normal_forward_cuda_parrots(CudaContext& ctx, + const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + float nms_overlap_thresh; + SSAttrs(attr).get("nms_overlap_thresh", nms_overlap_thresh).done(); + + auto boxes = buildATensor(ctx, ins[0]); + + auto keep = buildATensor(ctx, outs[0]); + auto keep_num = buildATensor(ctx, outs[1]); + + iou3d_nms3d_normal_forward(boxes, keep, keep_num, nms_overlap_thresh); +} + +PARROTS_EXTENSION_REGISTER(iou3d_boxes_overlap_bev_forward) + .input(2) + .output(1) + .apply(iou3d_boxes_overlap_bev_forward_cuda_parrots) + .done(); + +PARROTS_EXTENSION_REGISTER(iou3d_nms3d_forward) + .attr("nms_overlap_thresh") + 
.input(1) + .output(2) + .apply(iou3d_nms3d_forward_cuda_parrots) + .done(); + +PARROTS_EXTENSION_REGISTER(iou3d_nms3d_normal_forward) + .attr("nms_overlap_thresh") + .input(1) + .output(2) + .apply(iou3d_nms3d_normal_forward_cuda_parrots) + .done(); +#endif diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/iou3d_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/iou3d_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..76170edc7083dbaff4a2d23356c4e7702b929a2d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/iou3d_pytorch.h @@ -0,0 +1,16 @@ +// Copyright (c) OpenMMLab. All rights reserved +#ifndef IOU_3D_PYTORCH_H +#define IOU_3D_PYTORCH_H +#include +using namespace at; + +void iou3d_boxes_overlap_bev_forward(Tensor boxes_a, Tensor boxes_b, + Tensor ans_overlap); + +void iou3d_nms3d_forward(Tensor boxes, Tensor keep, Tensor keep_num, + float nms_overlap_thresh); + +void iou3d_nms3d_normal_forward(Tensor boxes, Tensor keep, Tensor keep_num, + float nms_overlap_thresh); + +#endif // IOU_3D_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/knn.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/knn.cpp new file mode 100644 index 0000000000000000000000000000000000000000..b4be9428c59c0f04635891b954f4c73f7fb0536d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/knn.cpp @@ -0,0 +1,17 @@ +// Modified from +// https://github.com/CVMI-Lab/PAConv/tree/main/scene_seg/lib/pointops/src/knnquery_heap + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void knn_forward_impl(int b, int n, int m, int nsample, const Tensor xyz, + const Tensor new_xyz, Tensor idx, Tensor dist2) { + DISPATCH_DEVICE_IMPL(knn_forward_impl, b, n, m, nsample, xyz, new_xyz, idx, + dist2); +} + +void knn_forward(Tensor xyz_tensor, Tensor new_xyz_tensor, Tensor idx_tensor, + Tensor dist2_tensor, int b, int n, int m, int nsample) { + knn_forward_impl(b, 
n, m, nsample, xyz_tensor, new_xyz_tensor, idx_tensor, + dist2_tensor); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/knn_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/knn_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..585b84644a4427330046ac0ea2220d07580ee638 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/knn_parrots.cpp @@ -0,0 +1,41 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include +#include +#include + +#include "knn_pytorch.h" + +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void knn_forward_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int b, n, m, nsample; + SSAttrs(attr) + .get("b", b) + .get("n", n) + .get("m", m) + .get("nsample", nsample) + .done(); + + auto xyz_tensor = buildATensor(ctx, ins[0]); + auto new_xyz_tensor = buildATensor(ctx, ins[1]); + + auto idx_tensor = buildATensor(ctx, outs[0]); + auto dist2_tensor = buildATensor(ctx, outs[1]); + + knn_forward(xyz_tensor, new_xyz_tensor, idx_tensor, dist2_tensor, b, n, m, + nsample); +} + +PARROTS_EXTENSION_REGISTER(knn_forward) + .attr("b") + .attr("n") + .attr("m") + .attr("nsample") + .input(2) + .output(2) + .apply(knn_forward_cuda_parrots) + .done(); +#endif diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/knn_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/knn_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..b0875f8389ee91bfc93083da844ccd4f6be9fdf3 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/knn_pytorch.h @@ -0,0 +1,9 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#ifndef KNN_PYTORCH_H +#define KNN_PYTORCH_H +#include +using namespace at; + +void knn_forward(Tensor xyz_tensor, Tensor new_xyz_tensor, Tensor idx_tensor, + Tensor dist2_tensor, int b, int n, int m, int nsample); +#endif // KNN_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/masked_conv2d.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/masked_conv2d.cpp new file mode 100644 index 0000000000000000000000000000000000000000..5903925351fcb193b86c8b5f01b410e4fc0bbaf9 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/masked_conv2d.cpp @@ -0,0 +1,33 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void masked_im2col_forward_impl(const Tensor im, const Tensor mask_h_idx, + const Tensor mask_w_idx, Tensor col, + const int kernel_h, const int kernel_w, + const int pad_h, const int pad_w) { + DISPATCH_DEVICE_IMPL(masked_im2col_forward_impl, im, mask_h_idx, mask_w_idx, + col, kernel_h, kernel_w, pad_h, pad_w); +} + +void masked_col2im_forward_impl(const Tensor col, const Tensor mask_h_idx, + const Tensor mask_w_idx, Tensor im, int height, + int width, int channels) { + DISPATCH_DEVICE_IMPL(masked_col2im_forward_impl, col, mask_h_idx, mask_w_idx, + im, height, width, channels); +} + +void masked_im2col_forward(const Tensor im, const Tensor mask_h_idx, + const Tensor mask_w_idx, Tensor col, + const int kernel_h, const int kernel_w, + const int pad_h, const int pad_w) { + masked_im2col_forward_impl(im, mask_h_idx, mask_w_idx, col, kernel_h, + kernel_w, pad_h, pad_w); +} + +void masked_col2im_forward(const Tensor col, const Tensor mask_h_idx, + const Tensor mask_w_idx, Tensor im, int height, + int width, int channels) { + masked_col2im_forward_impl(col, mask_h_idx, mask_w_idx, im, height, width, + channels); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/masked_conv2d_parrots.cpp 
b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/masked_conv2d_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..39f19740c84b521cf16a2030fb01b07bda1e75e4 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/masked_conv2d_parrots.cpp @@ -0,0 +1,72 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include +#include +#include + +#include "masked_conv2d_pytorch.h" + +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void masked_im2col_forward_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + // im: (n, ic, h, w), kernel size (kh, kw) + // kernel: (oc, ic * kh * kw), col: (kh * kw * ic, ow * oh) + int kernel_h, kernel_w, pad_h, pad_w; + SSAttrs(attr) + .get("kernel_h", kernel_h) + .get("kernel_w", kernel_w) + .get("pad_h", pad_h) + .get("pad_w", pad_w) + .done(); + + const auto& im = buildATensor(ctx, ins[0]); + const auto& mask_h_idx = buildATensor(ctx, ins[1]); + const auto& mask_w_idx = buildATensor(ctx, ins[2]); + + auto col = buildATensor(ctx, outs[0]); + masked_im2col_forward_cuda(im, mask_h_idx, mask_w_idx, col, kernel_h, + kernel_w, pad_h, pad_w); +} + +void masked_col2im_forward_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + // im: (n, ic, h, w), kernel size (kh, kw) + // kernel: (oc, ic * kh * kh), col: (kh * kw * ic, ow * oh) + int height, width, channels; + SSAttrs(attr) + .get("height", height) + .get("width", width) + .get("channels", channels) + .done(); + + const auto& col = buildATensor(ctx, ins[0]); + const auto& mask_h_idx = buildATensor(ctx, ins[1]); + const auto& mask_w_idx = buildATensor(ctx, ins[2]); + + auto im = buildATensor(ctx, outs[0]); + masked_col2im_forward_cuda(col, mask_h_idx, mask_w_idx, im, height, width, + channels); +} + +PARROTS_EXTENSION_REGISTER(masked_im2col_forward) + .attr("kernel_h") + .attr("kernel_w") + 
.attr("pad_h") + .attr("pad_w") + .input(3) + .output(1) + .apply(masked_im2col_forward_cuda_parrots) + .done(); + +PARROTS_EXTENSION_REGISTER(masked_col2im_forward) + .attr("height") + .attr("width") + .attr("channels") + .input(3) + .output(1) + .apply(masked_col2im_forward_cuda_parrots) + .done(); +#endif diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/masked_conv2d_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/masked_conv2d_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..36d5643f6037bf05cfdcdb23a02151aab0c1d4b4 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/masked_conv2d_pytorch.h @@ -0,0 +1,15 @@ +// Copyright (c) OpenMMLab. All rights reserved +#ifndef MASKED_CONV2D_PYTORCH_H +#define MASKED_CONV2D_PYTORCH_H +#include +using namespace at; + +void masked_im2col_forward_cuda(const Tensor im, const Tensor mask_h_idx, + const Tensor mask_w_idx, Tensor col, + const int kernel_h, const int kernel_w, + const int pad_h, const int pad_w); + +void masked_col2im_forward_cuda(const Tensor col, const Tensor mask_h_idx, + const Tensor mask_w_idx, Tensor im, int height, + int width, int channels); +#endif // MASKED_CONV2D_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/min_area_polygons.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/min_area_polygons.cpp new file mode 100644 index 0000000000000000000000000000000000000000..8ff996dc8992b4c95633516054ecdba5913de8f3 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/min_area_polygons.cpp @@ -0,0 +1,11 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void min_area_polygons_impl(const Tensor pointsets, Tensor polygons) { + DISPATCH_DEVICE_IMPL(min_area_polygons_impl, pointsets, polygons); +} + +void min_area_polygons(const Tensor pointsets, Tensor polygons) { + min_area_polygons_impl(pointsets, polygons); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/min_area_polygons_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/min_area_polygons_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..d9e4ff4b3dd80746ca534cbf4f02ace966b363d8 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/min_area_polygons_parrots.cpp @@ -0,0 +1,26 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include +#include +#include + +#include "min_area_polygons_pytorch.h" + +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void min_area_polygons_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + auto pointsets = buildATensor(ctx, ins[0]); + + auto polygons = buildATensor(ctx, outs[0]); + min_area_polygons(pointsets, polygons); +} + +PARROTS_EXTENSION_REGISTER(min_area_polygons) + .input(1) + .output(1) + .apply(min_area_polygons_cuda_parrots) + .done(); + +#endif diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/min_area_polygons_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/min_area_polygons_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..1df27641882c6ae29028809f726c1a19b9a192cd --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/min_area_polygons_pytorch.h @@ -0,0 +1,9 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#ifndef MIN_AREA_POLYGONS_PYTORCH_H +#define MIN_AREA_POLYGONS_PYTORCH_H +#include +using namespace at; + +void min_area_polygons(const Tensor pointsets, Tensor polygons); + +#endif // MIN_AREA_POLYGONS_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/modulated_deform_conv.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/modulated_deform_conv.cpp new file mode 100644 index 0000000000000000000000000000000000000000..12b538a05e6fd98becccfddf8e79cba7abf96d93 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/modulated_deform_conv.cpp @@ -0,0 +1,237 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void modulated_deformable_im2col_impl( + const Tensor data_im, const Tensor data_offset, const Tensor data_mask, + const int batch_size, const int channels, const int height_im, + const int width_im, const int height_col, const int width_col, + const int kernel_h, const int kernel_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, const int dilation_h, + const int dilation_w, const int deformable_group, Tensor data_col) { + DISPATCH_DEVICE_IMPL(modulated_deformable_im2col_impl, data_im, data_offset, + data_mask, batch_size, channels, height_im, width_im, + height_col, width_col, kernel_h, kernel_w, pad_h, pad_w, + stride_h, stride_w, dilation_h, dilation_w, + deformable_group, data_col); +} + +void modulated_deformable_col2im_impl( + const Tensor data_col, const Tensor data_offset, const Tensor data_mask, + const int batch_size, const int channels, const int height_im, + const int width_im, const int height_col, const int width_col, + const int kernel_h, const int kernel_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, const int dilation_h, + const int dilation_w, const int deformable_group, Tensor grad_im) { + DISPATCH_DEVICE_IMPL(modulated_deformable_col2im_impl, 
data_col, data_offset, + data_mask, batch_size, channels, height_im, width_im, + height_col, width_col, kernel_h, kernel_w, pad_h, pad_w, + stride_h, stride_w, dilation_h, dilation_w, + deformable_group, grad_im); +} + +void modulated_deformable_col2im_coord_impl( + const Tensor data_col, const Tensor data_im, const Tensor data_offset, + const Tensor data_mask, const int batch_size, const int channels, + const int height_im, const int width_im, const int height_col, + const int width_col, const int kernel_h, const int kernel_w, + const int pad_h, const int pad_w, const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, const int deformable_group, + Tensor grad_offset, Tensor grad_mask) { + DISPATCH_DEVICE_IMPL(modulated_deformable_col2im_coord_impl, data_col, + data_im, data_offset, data_mask, batch_size, channels, + height_im, width_im, height_col, width_col, kernel_h, + kernel_w, pad_h, pad_w, stride_h, stride_w, dilation_h, + dilation_w, deformable_group, grad_offset, grad_mask); +} + +void modulated_deform_conv_forward( + Tensor input, Tensor weight, Tensor bias, Tensor ones, Tensor offset, + Tensor mask, Tensor output, Tensor columns, int kernel_h, int kernel_w, + const int stride_h, const int stride_w, const int pad_h, const int pad_w, + const int dilation_h, const int dilation_w, const int group, + const int deformable_group, const bool with_bias) { + at::DeviceGuard guard(input.device()); + + const int batch = input.size(0); + const int channels = input.size(1); + const int height = input.size(2); + const int width = input.size(3); + + const int channels_out = weight.size(0); + const int channels_kernel = weight.size(1); + const int kernel_h_ = weight.size(2); + const int kernel_w_ = weight.size(3); + + if (kernel_h_ != kernel_h || kernel_w_ != kernel_w) + AT_ERROR("Input shape and kernel shape won't match: (%d x %d vs %d x %d).", + kernel_h_, kernel_w, kernel_h_, kernel_w_); + if (channels != channels_kernel * group) + 
AT_ERROR("Input shape and kernel channels won't match: (%d vs %d).", + channels, channels_kernel * group); + + const int height_out = + (height + 2 * pad_h - (dilation_h * (kernel_h - 1) + 1)) / stride_h + 1; + const int width_out = + (width + 2 * pad_w - (dilation_w * (kernel_w - 1) + 1)) / stride_w + 1; + + if (ones.ndimension() != 2 || + ones.size(0) * ones.size(1) < height_out * width_out) { + // Resize plane and fill with ones... + ones = at::ones({height_out, width_out}, input.options()); + } + + // resize output + output = output.view({batch, channels_out, height_out, width_out}).zero_(); + // resize temporary columns + columns = + at::zeros({channels * kernel_h * kernel_w, 1 * height_out * width_out}, + input.options()); + + output = output.view({output.size(0), group, output.size(1) / group, + output.size(2), output.size(3)}); + + for (int b = 0; b < batch; b++) { + modulated_deformable_im2col_impl( + input[b], offset[b], mask[b], 1, channels, height, width, height_out, + width_out, kernel_h, kernel_w, pad_h, pad_w, stride_h, stride_w, + dilation_h, dilation_w, deformable_group, columns); + + // divide into group + weight = weight.view({group, weight.size(0) / group, weight.size(1), + weight.size(2), weight.size(3)}); + columns = columns.view({group, columns.size(0) / group, columns.size(1)}); + + for (int g = 0; g < group; g++) { + output[b][g] = output[b][g] + .flatten(1) + .addmm_(weight[g].flatten(1), columns[g]) + .view_as(output[b][g]); + } + + weight = weight.view({weight.size(0) * weight.size(1), weight.size(2), + weight.size(3), weight.size(4)}); + columns = + columns.view({columns.size(0) * columns.size(1), columns.size(2)}); + } + + output = output.view({output.size(0), output.size(1) * output.size(2), + output.size(3), output.size(4)}); + + if (with_bias) { + output += bias.view({1, bias.size(0), 1, 1}); + } +} + +void modulated_deform_conv_backward( + Tensor input, Tensor weight, Tensor bias, Tensor ones, Tensor offset, + Tensor mask, Tensor 
columns, Tensor grad_input, Tensor grad_weight, + Tensor grad_bias, Tensor grad_offset, Tensor grad_mask, Tensor grad_output, + int kernel_h, int kernel_w, int stride_h, int stride_w, int pad_h, + int pad_w, int dilation_h, int dilation_w, int group, int deformable_group, + const bool with_bias) { + at::DeviceGuard guard(input.device()); + + const int batch = input.size(0); + const int channels = input.size(1); + const int height = input.size(2); + const int width = input.size(3); + + const int channels_kernel = weight.size(1); + const int kernel_h_ = weight.size(2); + const int kernel_w_ = weight.size(3); + if (kernel_h_ != kernel_h || kernel_w_ != kernel_w) + AT_ERROR("Input shape and kernel shape won't match: (%d x %d vs %d x %d).", + kernel_h_, kernel_w, kernel_h_, kernel_w_); + if (channels != channels_kernel * group) + AT_ERROR("Input shape and kernel channels won't match: (%d vs %d).", + channels, channels_kernel * group); + + const int height_out = + (height + 2 * pad_h - (dilation_h * (kernel_h - 1) + 1)) / stride_h + 1; + const int width_out = + (width + 2 * pad_w - (dilation_w * (kernel_w - 1) + 1)) / stride_w + 1; + + if (ones.ndimension() != 2 || + ones.size(0) * ones.size(1) < height_out * width_out) { + // Resize plane and fill with ones... 
+ ones = at::ones({height_out, width_out}, input.options()); + } + + grad_input = grad_input.view({batch, channels, height, width}); + columns = at::zeros({channels * kernel_h * kernel_w, height_out * width_out}, + input.options()); + + grad_output = + grad_output.view({grad_output.size(0), group, grad_output.size(1) / group, + grad_output.size(2), grad_output.size(3)}); + + for (int b = 0; b < batch; b++) { + // divide int group + columns = columns.view({group, columns.size(0) / group, columns.size(1)}); + weight = weight.view({group, weight.size(0) / group, weight.size(1), + weight.size(2), weight.size(3)}); + + for (int g = 0; g < group; g++) { + columns[g].addmm_(weight[g].flatten(1).transpose(0, 1), + grad_output[b][g].flatten(1), 0.0f, 1.0f); + } + + columns = + columns.view({columns.size(0) * columns.size(1), columns.size(2)}); + weight = weight.view({weight.size(0) * weight.size(1), weight.size(2), + weight.size(3), weight.size(4)}); + + // gradient w.r.t. input coordinate data + modulated_deformable_col2im_coord_impl( + columns, input[b], offset[b], mask[b], 1, channels, height, width, + height_out, width_out, kernel_h, kernel_w, pad_h, pad_w, stride_h, + stride_w, dilation_h, dilation_w, deformable_group, grad_offset[b], + grad_mask[b]); + // gradient w.r.t. input data + modulated_deformable_col2im_impl( + columns, offset[b], mask[b], 1, channels, height, width, height_out, + width_out, kernel_h, kernel_w, pad_h, pad_w, stride_h, stride_w, + dilation_h, dilation_w, deformable_group, grad_input[b]); + + // gradient w.r.t. 
weight, dWeight should accumulate across the batch and + // group + modulated_deformable_im2col_impl( + input[b], offset[b], mask[b], 1, channels, height, width, height_out, + width_out, kernel_h, kernel_w, pad_h, pad_w, stride_h, stride_w, + dilation_h, dilation_w, deformable_group, columns); + + columns = columns.view({group, columns.size(0) / group, columns.size(1)}); + grad_weight = grad_weight.view({group, grad_weight.size(0) / group, + grad_weight.size(1), grad_weight.size(2), + grad_weight.size(3)}); + if (with_bias) + grad_bias = grad_bias.view({group, grad_bias.size(0) / group}); + + for (int g = 0; g < group; g++) { + grad_weight[g] = + grad_weight[g] + .flatten(1) + .addmm_(grad_output[b][g].flatten(1), columns[g].transpose(0, 1)) + .view_as(grad_weight[g]); + if (with_bias) { + grad_bias[g] = + grad_bias[g] + .view({-1, 1}) + .addmm_(grad_output[b][g].flatten(1), ones.view({-1, 1})) + .view(-1); + } + } + + columns = + columns.view({columns.size(0) * columns.size(1), columns.size(2)}); + grad_weight = grad_weight.view({grad_weight.size(0) * grad_weight.size(1), + grad_weight.size(2), grad_weight.size(3), + grad_weight.size(4)}); + if (with_bias) + grad_bias = grad_bias.view({grad_bias.size(0) * grad_bias.size(1)}); + } + grad_output = grad_output.view({grad_output.size(0) * grad_output.size(1), + grad_output.size(2), grad_output.size(3), + grad_output.size(4)}); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/modulated_deform_conv_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/modulated_deform_conv_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..2ef7efff6e473abd4ee94d21c8b8dc05ab34f1d9 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/modulated_deform_conv_parrots.cpp @@ -0,0 +1,199 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include +#include +#include + +#include "modulated_deform_conv_pytorch.h" + +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void modulated_deform_conv_forward_cuda_parrots( + CudaContext& ctx, const SSElement& attr, const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int kernel_h, kernel_w, stride_h, stride_w, pad_h, pad_w, dilation_h, + dilation_w, group, deformable_group, with_bias; + SSAttrs(attr) + .get("kernel_h", kernel_h) + .get("kernel_w", kernel_w) + .get("stride_h", stride_h) + .get("stride_w", stride_w) + .get("pad_h", pad_h) + .get("pad_w", pad_w) + .get("dilation_h", dilation_h) + .get("dilation_w", dilation_w) + .get("group", group) + .get("deformable_group", deformable_group) + .get("with_bias", with_bias) + .done(); + + const auto& input = buildATensor(ctx, ins[0]); + const auto& weight = buildATensor(ctx, ins[1]); + const auto& bias = buildATensor(ctx, ins[2]); + const auto& ones = buildATensor(ctx, ins[3]); + const auto& offset = buildATensor(ctx, ins[4]); + const auto& mask = buildATensor(ctx, ins[5]); + + auto output = buildATensor(ctx, outs[0]); + auto columns = buildATensor(ctx, outs[1]); + + modulated_deform_conv_forward(input, weight, bias, ones, offset, mask, output, + columns, kernel_h, kernel_w, stride_h, stride_w, + pad_h, pad_w, dilation_h, dilation_w, group, + deformable_group, with_bias); +} + +void modulated_deform_conv_backward_cuda_parrots( + CudaContext& ctx, const SSElement& attr, const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int kernel_h, kernel_w, stride_h, stride_w, pad_h, pad_w, dilation_h, + dilation_w, group, deformable_group, with_bias; + SSAttrs(attr) + .get("kernel_h", kernel_h) + .get("kernel_w", kernel_w) + .get("stride_h", stride_h) + .get("stride_w", stride_w) + .get("pad_h", pad_h) + .get("pad_w", pad_w) + .get("dilation_h", dilation_h) + .get("dilation_w", dilation_w) + .get("group", group) + .get("deformable_group", deformable_group) + 
.get("with_bias", with_bias) + .done(); + + const auto& input = buildATensor(ctx, ins[0]); + const auto& weight = buildATensor(ctx, ins[1]); + const auto& bias = buildATensor(ctx, ins[2]); + const auto& ones = buildATensor(ctx, ins[3]); + const auto& offset = buildATensor(ctx, ins[4]); + const auto& mask = buildATensor(ctx, ins[5]); + + auto columns = buildATensor(ctx, outs[0]); + auto grad_input = buildATensor(ctx, outs[1]); + auto grad_weight = buildATensor(ctx, outs[2]); + auto grad_bias = buildATensor(ctx, outs[3]); + auto grad_offset = buildATensor(ctx, outs[4]); + auto grad_mask = buildATensor(ctx, outs[5]); + auto grad_output = buildATensor(ctx, outs[6]); + modulated_deform_conv_backward( + input, weight, bias, ones, offset, mask, columns, grad_input, grad_weight, + grad_bias, grad_offset, grad_mask, grad_output, kernel_h, kernel_w, + stride_h, stride_w, pad_h, pad_w, dilation_h, dilation_w, group, + deformable_group, with_bias); +} +#endif + +void modulated_deform_conv_forward_cpu_parrots( + HostContext& ctx, const SSElement& attr, const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int kernel_h, kernel_w, stride_h, stride_w, pad_h, pad_w, dilation_h, + dilation_w, group, deformable_group, with_bias; + SSAttrs(attr) + .get("kernel_h", kernel_h) + .get("kernel_w", kernel_w) + .get("stride_h", stride_h) + .get("stride_w", stride_w) + .get("pad_h", pad_h) + .get("pad_w", pad_w) + .get("dilation_h", dilation_h) + .get("dilation_w", dilation_w) + .get("group", group) + .get("deformable_group", deformable_group) + .get("with_bias", with_bias) + .done(); + + const auto& input = buildATensor(ctx, ins[0]); + const auto& weight = buildATensor(ctx, ins[1]); + const auto& bias = buildATensor(ctx, ins[2]); + const auto& ones = buildATensor(ctx, ins[3]); + const auto& offset = buildATensor(ctx, ins[4]); + const auto& mask = buildATensor(ctx, ins[5]); + + auto output = buildATensor(ctx, outs[0]); + auto columns = buildATensor(ctx, outs[1]); + + 
modulated_deform_conv_forward(input, weight, bias, ones, offset, mask, output, + columns, kernel_h, kernel_w, stride_h, stride_w, + pad_h, pad_w, dilation_h, dilation_w, group, + deformable_group, with_bias); +} + +void modulated_deform_conv_backward_cpu_parrots( + HostContext& ctx, const SSElement& attr, const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int kernel_h, kernel_w, stride_h, stride_w, pad_h, pad_w, dilation_h, + dilation_w, group, deformable_group, with_bias; + SSAttrs(attr) + .get("kernel_h", kernel_h) + .get("kernel_w", kernel_w) + .get("stride_h", stride_h) + .get("stride_w", stride_w) + .get("pad_h", pad_h) + .get("pad_w", pad_w) + .get("dilation_h", dilation_h) + .get("dilation_w", dilation_w) + .get("group", group) + .get("deformable_group", deformable_group) + .get("with_bias", with_bias) + .done(); + + const auto& input = buildATensor(ctx, ins[0]); + const auto& weight = buildATensor(ctx, ins[1]); + const auto& bias = buildATensor(ctx, ins[2]); + const auto& ones = buildATensor(ctx, ins[3]); + const auto& offset = buildATensor(ctx, ins[4]); + const auto& mask = buildATensor(ctx, ins[5]); + + auto columns = buildATensor(ctx, outs[0]); + auto grad_input = buildATensor(ctx, outs[1]); + auto grad_weight = buildATensor(ctx, outs[2]); + auto grad_bias = buildATensor(ctx, outs[3]); + auto grad_offset = buildATensor(ctx, outs[4]); + auto grad_mask = buildATensor(ctx, outs[5]); + auto grad_output = buildATensor(ctx, outs[6]); + modulated_deform_conv_backward( + input, weight, bias, ones, offset, mask, columns, grad_input, grad_weight, + grad_bias, grad_offset, grad_mask, grad_output, kernel_h, kernel_w, + stride_h, stride_w, pad_h, pad_w, dilation_h, dilation_w, group, + deformable_group, with_bias); +} +PARROTS_EXTENSION_REGISTER(modulated_deform_conv_forward) + .attr("kernel_h") + .attr("kernel_w") + .attr("stride_h") + .attr("stride_w") + .attr("pad_h") + .attr("pad_w") + .attr("dilation_h") + .attr("dilation_w") + 
.attr("group") + .attr("deformable_group") + .attr("with_bias") + .input(6) + .output(2) + .apply(modulated_deform_conv_forward_cpu_parrots) +#ifdef MMCV_WITH_CUDA + .apply(modulated_deform_conv_forward_cuda_parrots) +#endif + .done(); + +PARROTS_EXTENSION_REGISTER(modulated_deform_conv_backward) + .attr("kernel_h") + .attr("kernel_w") + .attr("stride_h") + .attr("stride_w") + .attr("pad_h") + .attr("pad_w") + .attr("dilation_h") + .attr("dilation_w") + .attr("group") + .attr("deformable_group") + .attr("with_bias") + .input(6) + .output(7) + .apply(modulated_deform_conv_backward_cpu_parrots) +#ifdef MMCV_WITH_CUDA + .apply(modulated_deform_conv_backward_cuda_parrots) +#endif + .done(); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/modulated_deform_conv_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/modulated_deform_conv_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..12f6868612d5e7596378c4ce2e8fa25f1b9c0afc --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/modulated_deform_conv_pytorch.h @@ -0,0 +1,21 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#ifndef MODULATED_DEFORM_CONV_PYTORCH_H +#define MODULATED_DEFORM_CONV_PYTORCH_H +#include +using namespace at; + +void modulated_deform_conv_forward( + Tensor input, Tensor weight, Tensor bias, Tensor ones, Tensor offset, + Tensor mask, Tensor output, Tensor columns, int kernel_h, int kernel_w, + const int stride_h, const int stride_w, const int pad_h, const int pad_w, + const int dilation_h, const int dilation_w, const int group, + const int deformable_group, const bool with_bias); + +void modulated_deform_conv_backward( + Tensor input, Tensor weight, Tensor bias, Tensor ones, Tensor offset, + Tensor mask, Tensor columns, Tensor grad_input, Tensor grad_weight, + Tensor grad_bias, Tensor grad_offset, Tensor grad_mask, Tensor grad_output, + int kernel_h, int kernel_w, int stride_h, int stride_w, int pad_h, + int pad_w, int dilation_h, int dilation_w, int group, int deformable_group, + const bool with_bias); +#endif // MODULATED_DEFORM_CONV_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/ms_deform_attn.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/ms_deform_attn.cpp new file mode 100644 index 0000000000000000000000000000000000000000..25c8f6209b16c475ba181eea7c880eb27cca4082 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/ms_deform_attn.cpp @@ -0,0 +1,60 @@ +/*! +************************************************************************************************** +* Deformable DETR +* Copyright (c) 2020 SenseTime. All Rights Reserved. 
+* Licensed under the Apache License, Version 2.0 [see LICENSE for details] +************************************************************************************************** +* Modified from +*https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/tree/pytorch_1.0.0 +************************************************************************************************** +*/ + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +Tensor ms_deform_attn_impl_forward(const Tensor &value, + const Tensor &spatial_shapes, + const Tensor &level_start_index, + const Tensor &sampling_loc, + const Tensor &attn_weight, + const int im2col_step) { + return DISPATCH_DEVICE_IMPL(ms_deform_attn_impl_forward, value, + spatial_shapes, level_start_index, sampling_loc, + attn_weight, im2col_step); +} + +void ms_deform_attn_impl_backward( + const Tensor &value, const Tensor &spatial_shapes, + const Tensor &level_start_index, const Tensor &sampling_loc, + const Tensor &attn_weight, const Tensor &grad_output, Tensor &grad_value, + Tensor &grad_sampling_loc, Tensor &grad_attn_weight, + const int im2col_step) { + DISPATCH_DEVICE_IMPL(ms_deform_attn_impl_backward, value, spatial_shapes, + level_start_index, sampling_loc, attn_weight, + grad_output, grad_value, grad_sampling_loc, + grad_attn_weight, im2col_step); +} + +Tensor ms_deform_attn_forward(const Tensor &value, const Tensor &spatial_shapes, + const Tensor &level_start_index, + const Tensor &sampling_loc, + const Tensor &attn_weight, + const int im2col_step) { + at::DeviceGuard guard(value.device()); + return ms_deform_attn_impl_forward(value, spatial_shapes, level_start_index, + sampling_loc, attn_weight, im2col_step); +} + +void ms_deform_attn_backward(const Tensor &value, const Tensor &spatial_shapes, + const Tensor &level_start_index, + const Tensor &sampling_loc, + const Tensor &attn_weight, + const Tensor &grad_output, Tensor &grad_value, + Tensor &grad_sampling_loc, + Tensor &grad_attn_weight, const 
int im2col_step) { + at::DeviceGuard guard(value.device()); + ms_deform_attn_impl_backward(value, spatial_shapes, level_start_index, + sampling_loc, attn_weight, grad_output, + grad_value, grad_sampling_loc, grad_attn_weight, + im2col_step); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/ms_deform_attn_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/ms_deform_attn_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..a3ad786a8e08129fa84fa73b710637e6e23b2994 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/ms_deform_attn_parrots.cpp @@ -0,0 +1,69 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include + +#include +#include +#include +using namespace at; +using namespace parrots; + +Tensor ms_deform_attn_forward(const Tensor &value, const Tensor &spatial_shapes, + const Tensor &level_start_index, + const Tensor &sampling_loc, + const Tensor &attn_weight, const int im2col_step); + +void ms_deform_attn_backward(const Tensor &value, const Tensor &spatial_shapes, + const Tensor &level_start_index, + const Tensor &sampling_loc, + const Tensor &attn_weight, + const Tensor &grad_output, Tensor &grad_value, + Tensor &grad_sampling_loc, + Tensor &grad_attn_weight, const int im2col_step); + +void ms_deform_attn_forward_parrots(CudaContext &ctx, const SSElement &attr, + const OperatorBase::in_list_t &ins, + OperatorBase::out_list_t &outs) { + int im2col_step; + SSAttrs(attr).get("im2col_step", im2col_step).done(); + const auto &value = buildATensor(ctx, ins[0]); + const auto &spatial_shapes = buildATensor(ctx, ins[1]); + const auto &level_start_index = buildATensor(ctx, ins[2]); + const auto &sampling_loc = buildATensor(ctx, ins[3]); + const auto &attn_weight = buildATensor(ctx, ins[4]); + auto out = ms_deform_attn_forward(value, spatial_shapes, level_start_index, + sampling_loc, attn_weight, im2col_step); + updateDArray(ctx, out, outs[0]); +} + +void 
ms_deform_attn_backward_parrots(CudaContext &ctx, const SSElement &attr, + const OperatorBase::in_list_t &ins, + OperatorBase::out_list_t &outs) { + int im2col_step; + SSAttrs(attr).get("im2col_step", im2col_step).done(); + const auto &value = buildATensor(ctx, ins[0]); + const auto &spatial_shapes = buildATensor(ctx, ins[1]); + const auto &level_start_index = buildATensor(ctx, ins[2]); + const auto &sampling_loc = buildATensor(ctx, ins[3]); + const auto &attn_weight = buildATensor(ctx, ins[4]); + const auto &grad_output = buildATensor(ctx, ins[5]); + auto grad_value = buildATensor(ctx, outs[0]); + auto grad_sampling_loc = buildATensor(ctx, outs[1]); + auto grad_attn_weight = buildATensor(ctx, outs[2]); + ms_deform_attn_backward(value, spatial_shapes, level_start_index, + sampling_loc, attn_weight, grad_output, grad_value, + grad_sampling_loc, grad_attn_weight, im2col_step); +} + +PARROTS_EXTENSION_REGISTER(ms_deform_attn_forward) + .attr("im2col_step") + .input(5) + .output(1) + .apply(ms_deform_attn_forward_parrots) + .done(); + +PARROTS_EXTENSION_REGISTER(ms_deform_attn_backward) + .attr("im2col_step") + .input(6) + .output(3) + .apply(ms_deform_attn_backward_parrots) + .done(); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/nms.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/nms.cpp new file mode 100644 index 0000000000000000000000000000000000000000..199d8af236f5442fcdd53ce3dfd8d24aa67481bb --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/nms.cpp @@ -0,0 +1,33 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +Tensor nms_impl(Tensor boxes, Tensor scores, float iou_threshold, int offset) { + return DISPATCH_DEVICE_IMPL(nms_impl, boxes, scores, iou_threshold, offset); +} + +Tensor softnms_impl(Tensor boxes, Tensor scores, Tensor dets, + float iou_threshold, float sigma, float min_score, + int method, int offset) { + return DISPATCH_DEVICE_IMPL(softnms_impl, boxes, scores, dets, iou_threshold, + sigma, min_score, method, offset); +} + +std::vector<std::vector<int> > nms_match_impl(Tensor dets, + float iou_threshold) { + return DISPATCH_DEVICE_IMPL(nms_match_impl, dets, iou_threshold); +} + +Tensor nms(Tensor boxes, Tensor scores, float iou_threshold, int offset) { + return nms_impl(boxes, scores, iou_threshold, offset); +} + +Tensor softnms(Tensor boxes, Tensor scores, Tensor dets, float iou_threshold, + float sigma, float min_score, int method, int offset) { + return softnms_impl(boxes, scores, dets, iou_threshold, sigma, min_score, + method, offset); +} + +std::vector<std::vector<int> > nms_match(Tensor dets, float iou_threshold) { + return nms_match_impl(dets, iou_threshold); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/nms_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/nms_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..db8b5f16e9a276a9891f0a415276c334ebf0901f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/nms_parrots.cpp @@ -0,0 +1,140 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include +#include +#include + +#include "nms_pytorch.h" + +using namespace parrots; + +// Tensor nms(Tensor boxes, Tensor scores, float iou_threshold, int offset); +template +void nms_parrots(T& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + float iou_threshold; + int offset; + SSAttrs(attr) + .get("iou_threshold", iou_threshold) + .get("offset", offset) + .done(); + at::Tensor boxes, scores; + boxes = buildATensor(ctx, ins[0]); + scores = buildATensor(ctx, ins[1]); + auto out = nms(boxes, scores, iou_threshold, offset); + updateDArray(ctx, out, outs[0]); +} + +/*Tensor softnms(Tensor boxes, Tensor scores, Tensor dets, float iou_threshold, + * float sigma, float min_score, int method, int offset);*/ +template +void softnms_parrots(T& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + float iou_threshold, sigma, min_score; + int method, offset; + SSAttrs(attr) + .get("iou_threshold", iou_threshold) + .get("sigma", sigma) + .get("min_score", min_score) + .get("method", method) + .get("offset", offset) + .done(); + at::Tensor boxes, scores, dets; + boxes = buildATensor(ctx, ins[0]); + scores = buildATensor(ctx, ins[1]); + dets = buildATensor(ctx, ins[2]); + auto out = softnms(boxes, scores, dets, iou_threshold, sigma, min_score, + method, offset); + updateDArray(ctx, out, outs[0]); +} + +// std::vector > nms_match(Tensor dets, float iou_threshold); +template +void nms_match_parrots(T& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + float iou_threshold; + SSAttrs(attr).get("iou_threshold", iou_threshold).done(); + at::Tensor dets; + dets = buildATensor(ctx, ins[0]); + auto out = nms_match(dets, iou_threshold); + int n = out.size(), m = 0; + for (int i = 0; i < n; ++i) + if (m < out[i].size()) m = out[i].size(); + auto options = torch::TensorOptions().dtype(at::kInt); + auto tensor = 
torch::zeros({n, m}, options); + for (int i = 0; i < n; i++) + tensor.slice(0, i, i + 1) = + torch::from_blob(out[i].data(), {out[i].size()}, options); + updateDArray(ctx, tensor, outs[0]); +} + +/*Tensor nms_rotated(const Tensor dets, const Tensor scores, const Tensor order, + * const Tensor dets_sorted, const float iou_threshold, + * const int multi_label);*/ +template +void nms_rotated_parrots(T& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + float iou_threshold; + int multi_label; + SSAttrs(attr) + .get("iou_threshold", iou_threshold) + .get("multi_label", multi_label) + .done(); + at::Tensor dets, scores, order, dets_sorted; + dets = buildATensor(ctx, ins[0]); + scores = buildATensor(ctx, ins[1]); + order = buildATensor(ctx, ins[2]); + dets_sorted = buildATensor(ctx, ins[3]); + auto out = + nms_rotated(dets, scores, order, dets_sorted, iou_threshold, multi_label); + updateDArray(ctx, out, outs[0]); +} + +PARROTS_EXTENSION_REGISTER(nms) + .attr("iou_threshold") + .attr("offset") + .input(2) + .output(1) + .apply(nms_parrots) +#ifdef MMCV_WITH_CUDA + .apply(nms_parrots) +#endif + .done(); + +PARROTS_EXTENSION_REGISTER(softnms) + .attr("iou_threshold") + .attr("sigma") + .attr("min_score") + .attr("method") + .attr("offset") + .input(3) + .output(1) + .apply(softnms_parrots) +#ifdef MMCV_WITH_CUDA + .apply(softnms_parrots) +#endif + .done(); + +PARROTS_EXTENSION_REGISTER(nms_match) + .attr("iou_threshold") + .input(1) + .output(1) + .apply(nms_match_parrots) +#ifdef MMCV_WITH_CUDA + .apply(nms_match_parrots) +#endif + .done(); + +PARROTS_EXTENSION_REGISTER(nms_rotated) + .attr("multi_label") + .attr("iou_threshold") + .input(4) + .output(1) + .apply(nms_rotated_parrots) +#ifdef MMCV_WITH_CUDA + .apply(nms_rotated_parrots) +#endif + .done(); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/nms_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/nms_pytorch.h new file mode 100644 
index 0000000000000000000000000000000000000000..78c680e57c3089b44d29586175f56a5599560914 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/nms_pytorch.h @@ -0,0 +1,18 @@ +// Copyright (c) OpenMMLab. All rights reserved +#ifndef NMS_PYTORCH_H +#define NMS_PYTORCH_H +#include <torch/extension.h> + +at::Tensor nms(at::Tensor boxes, at::Tensor scores, float iou_threshold, + int offset); + +at::Tensor softnms(at::Tensor boxes, at::Tensor scores, at::Tensor dets, + float iou_threshold, float sigma, float min_score, + int method, int offset); + +std::vector<std::vector<int> > nms_match(at::Tensor dets, float iou_threshold); + +at::Tensor nms_rotated(const at::Tensor dets, const at::Tensor scores, + const at::Tensor order, const at::Tensor dets_sorted, + const float iou_threshold, const int multi_label); +#endif // NMS_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/nms_rotated.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/nms_rotated.cpp new file mode 100644 index 0000000000000000000000000000000000000000..e4ef676a9d6f94e5f60b7c9e1df8ce78eb6cbaa2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/nms_rotated.cpp @@ -0,0 +1,32 @@ +// Copyright (c) Facebook, Inc. and its affiliates. 
All Rights Reserved +// modified from +// https://github.com/facebookresearch/detectron2/blob/master/detectron2/layers/csrc/nms_rotated/nms_rotated.h +#include "pytorch_cpp_helper.hpp" + +Tensor nms_rotated_cpu(const Tensor dets, const Tensor scores, + const float iou_threshold); + +#ifdef MMCV_WITH_CUDA +Tensor nms_rotated_cuda(const Tensor dets, const Tensor scores, + const Tensor order, const Tensor dets_sorted, + const float iou_threshold, const int multi_label); +#endif + +// Interface for Python +// inline is needed to prevent multiple function definitions when this header is +// included by different cpps +Tensor nms_rotated(const Tensor dets, const Tensor scores, const Tensor order, + const Tensor dets_sorted, const float iou_threshold, + const int multi_label) { + assert(dets.device().is_cuda() == scores.device().is_cuda()); + if (dets.device().is_cuda()) { +#ifdef MMCV_WITH_CUDA + return nms_rotated_cuda(dets, scores, order, dets_sorted, iou_threshold, + multi_label); +#else + AT_ERROR("Not compiled with GPU support"); +#endif + } + + return nms_rotated_cpu(dets, scores, iou_threshold); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/pixel_group.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/pixel_group.cpp new file mode 100644 index 0000000000000000000000000000000000000000..2bf8c8bbf2061cacb9e0c2d33c8a635834407622 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/pixel_group.cpp @@ -0,0 +1,26 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +// It is modified from https://github.com/WenmuZhou/PAN.pytorch + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +std::vector<std::vector<float>> pixel_group_impl( + Tensor score, Tensor mask, Tensor embedding, Tensor kernel_label, + Tensor kernel_contour, int kernel_region_num, float dis_threshold) { + return DISPATCH_DEVICE_IMPL(pixel_group_impl, score, mask, embedding, + kernel_label, kernel_contour, kernel_region_num, + dis_threshold); +} + +std::vector<std::vector<float>> pixel_group( + Tensor score, Tensor mask, Tensor embedding, Tensor kernel_label, + Tensor kernel_contour, int kernel_region_num, float distance_threshold) { + score = score.contiguous(); + mask = mask.contiguous(); + embedding = embedding.contiguous(); + kernel_label = kernel_label.contiguous(); + kernel_contour = kernel_contour.contiguous(); + + return pixel_group_impl(score, mask, embedding, kernel_label, kernel_contour, + kernel_region_num, distance_threshold); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/pixel_group_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/pixel_group_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..bd863a4e1b341441b3700fe3931c9bb78c159ee6 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/pixel_group_parrots.cpp @@ -0,0 +1,54 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include +#include +#include + +#include "pixel_group_pytorch.h" + +using namespace parrots; +using namespace std; + +template +void pixel_group_parrots(T& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int kernel_region_num; + float distance_threshold; + SSAttrs(attr) + .get("kernel_region_num", kernel_region_num) + .get("distance_threshold", distance_threshold) + .done(); + at::Tensor score; + at::Tensor mask; + at::Tensor embedding; + at::Tensor kernel_label; + at::Tensor kernel_contour; + score = buildATensor(ctx, ins[0]); + mask = buildATensor(ctx, ins[1]); + embedding = buildATensor(ctx, ins[2]); + kernel_label = buildATensor(ctx, ins[3]); + kernel_contour = buildATensor(ctx, ins[4]); + auto out = pixel_group(score, mask, embedding, kernel_label, kernel_contour, + kernel_region_num, distance_threshold); + int n = out.size(); + std::vector out_tensor; + for (int i = 0; i < n; ++i) out_tensor.push_back(float(out[i].size())); + for (int i = 0; i < n; ++i) + out_tensor.insert(out_tensor.end(), out[i].begin(), out[i].end()); + auto options = torch::TensorOptions().dtype(at::kFloat); + auto tensor = torch::zeros({1, out_tensor.size()}, options); + tensor.slice(0, 0, 1) = + torch::from_blob(out_tensor.data(), {out_tensor.size()}, options); + updateDArray(ctx, tensor, outs[0]); +} + +PARROTS_EXTENSION_REGISTER(pixel_group) + .attr("kernel_region_num") + .attr("distance_threshold") + .input(5) + .output(1) + .apply(pixel_group_parrots) +#ifdef MMCV_WITH_CUDA + .apply(pixel_group_parrots) +#endif + .done(); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/pixel_group_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/pixel_group_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..1686ef3ee3647ada5fa37ded01415c37a4186f2d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/pixel_group_pytorch.h @@ -0,0 +1,11 @@ +// 
Copyright (c) OpenMMLab. All rights reserved +#ifndef PIXEL_GROUP_PYTORCH_H +#define PIXEL_GROUP_PYTORCH_H +#include <torch/extension.h> +using namespace at; + +std::vector<std::vector<float>> pixel_group( + Tensor score, Tensor mask, Tensor embedding, Tensor kernel_label, + Tensor kernel_contour, int kernel_region_num, float distance_threshold); + +#endif // PIXEL_GROUP_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/points_in_boxes.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/points_in_boxes.cpp new file mode 100644 index 0000000000000000000000000000000000000000..540da94038f6dea2dc10443905f289ddd131f1af --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/points_in_boxes.cpp @@ -0,0 +1,44 @@ +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void points_in_boxes_part_forward_impl(int batch_size, int boxes_num, + int pts_num, const Tensor boxes, + const Tensor pts, + Tensor box_idx_of_points) { + DISPATCH_DEVICE_IMPL(points_in_boxes_part_forward_impl, batch_size, boxes_num, + pts_num, boxes, pts, box_idx_of_points); +} + +void points_in_boxes_all_forward_impl(int batch_size, int boxes_num, + int pts_num, const Tensor boxes, + const Tensor pts, + Tensor box_idx_of_points) { + DISPATCH_DEVICE_IMPL(points_in_boxes_all_forward_impl, batch_size, boxes_num, + pts_num, boxes, pts, box_idx_of_points); +} + +void points_in_boxes_part_forward(Tensor boxes_tensor, Tensor pts_tensor, + Tensor box_idx_of_points_tensor) { + // params boxes: (B, N, 7) [x, y, z, x_size, y_size, z_size, rz] in LiDAR + // coordinate, z is the bottom center, each box params pts: (B, npoints, 3) + // [x, y, z] in LiDAR coordinate params boxes_idx_of_points: (B, npoints), + // default -1 + int batch_size = boxes_tensor.size(0); + int boxes_num = boxes_tensor.size(1); + int pts_num = pts_tensor.size(1); + points_in_boxes_part_forward_impl(batch_size, boxes_num, pts_num, + boxes_tensor, pts_tensor, + box_idx_of_points_tensor); +} + +void 
points_in_boxes_all_forward(Tensor boxes_tensor, Tensor pts_tensor, + Tensor box_idx_of_points_tensor) { + // params boxes: (B, N, 7) [x, y, z, x_size, y_size, z_size, rz] in LiDAR + // coordinate, z is the bottom center. params pts: (B, npoints, 3) [x, y, z] + // in LiDAR coordinate params boxes_idx_of_points: (B, npoints), default -1 + int batch_size = boxes_tensor.size(0); + int boxes_num = boxes_tensor.size(1); + int pts_num = pts_tensor.size(1); + points_in_boxes_all_forward_impl(batch_size, boxes_num, pts_num, boxes_tensor, + pts_tensor, box_idx_of_points_tensor); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/points_in_boxes_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/points_in_boxes_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..afd2b0eb2d6c84f0dc44229c08b6b764185365fb --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/points_in_boxes_parrots.cpp @@ -0,0 +1,64 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include +#include +#include + +#include "points_in_boxes_pytorch.h" + +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void points_in_boxes_part_forward_cuda_parrots( + CudaContext& ctx, const SSElement& attr, const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + auto boxes_tensor = buildATensor(ctx, ins[0]); + auto pts_tensor = buildATensor(ctx, ins[1]); + + auto box_idx_of_points_tensor = buildATensor(ctx, outs[0]); + + points_in_boxes_part_forward(boxes_tensor, pts_tensor, + box_idx_of_points_tensor); +} + +void points_in_boxes_all_forward_cuda_parrots( + CudaContext& ctx, const SSElement& attr, const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + auto boxes_tensor = buildATensor(ctx, ins[0]); + auto pts_tensor = buildATensor(ctx, ins[1]); + + auto box_idx_of_points_tensor = buildATensor(ctx, outs[0]); + + points_in_boxes_all_forward(boxes_tensor, pts_tensor, + box_idx_of_points_tensor); +} + +PARROTS_EXTENSION_REGISTER(points_in_boxes_part_forward) + .input(2) + .output(1) + .apply(points_in_boxes_part_forward_cuda_parrots) + .done(); + +PARROTS_EXTENSION_REGISTER(points_in_boxes_all_forward) + .input(2) + .output(1) + .apply(points_in_boxes_all_forward_cuda_parrots) + .done(); +#endif + +void points_in_boxes_forward_cpu_parrots(HostContext& ctx, + const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + auto boxes_tensor = buildATensor(ctx, ins[0]); + auto pts_tensor = buildATensor(ctx, ins[1]); + + auto pts_indices_tensor = buildATensor(ctx, outs[0]); + + points_in_boxes_cpu_forward(boxes_tensor, pts_tensor, pts_indices_tensor); +} + +PARROTS_EXTENSION_REGISTER(points_in_boxes_cpu_forward) + .input(2) + .output(1) + .apply(points_in_boxes_forward_cpu_parrots) + .done(); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/points_in_boxes_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/points_in_boxes_pytorch.h new file mode 
100644 index 0000000000000000000000000000000000000000..f3e465e3c785e5e78c020f61eaeaa23e59d1948a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/points_in_boxes_pytorch.h @@ -0,0 +1,16 @@ +// Copyright (c) OpenMMLab. All rights reserved +#ifndef POINTS_IN_BOXES_PYTORCH_H +#define POINTS_IN_BOXES_PYTORCH_H +#include +using namespace at; + +void points_in_boxes_part_forward(Tensor boxes_tensor, Tensor pts_tensor, + Tensor box_idx_of_points_tensor); + +void points_in_boxes_all_forward(Tensor boxes_tensor, Tensor pts_tensor, + Tensor box_idx_of_points_tensor); + +void points_in_boxes_cpu_forward(Tensor boxes_tensor, Tensor pts_tensor, + Tensor pts_indices_tensor); + +#endif // POINTS_IN_BOXES_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/points_in_polygons.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/points_in_polygons.cpp new file mode 100644 index 0000000000000000000000000000000000000000..75a93dcef33f23904c1218048e16beff65c230d1 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/points_in_polygons.cpp @@ -0,0 +1,15 @@ +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void points_in_polygons_forward_impl(const Tensor points, const Tensor polygons, + Tensor output, const int rows, + const int cols) { + DISPATCH_DEVICE_IMPL(points_in_polygons_forward_impl, points, polygons, + output, rows, cols); +} + +void points_in_polygons_forward(Tensor points, Tensor polygons, Tensor output) { + int rows = points.size(0); + int cols = polygons.size(0); + points_in_polygons_forward_impl(points, polygons, output, rows, cols); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/points_in_polygons_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/points_in_polygons_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..d52018e6451f52d0c10648cea2ee036b3214376d --- /dev/null +++ 
b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/points_in_polygons_parrots.cpp @@ -0,0 +1,28 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include +#include +#include + +#include "points_in_polygons_pytorch.h" + +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void points_in_polygons_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + auto points = buildATensor(ctx, ins[0]); + auto polygons = buildATensor(ctx, ins[1]); + + auto output = buildATensor(ctx, outs[0]); + + points_in_polygons_forward(points, polygons, output); +} + +PARROTS_EXTENSION_REGISTER(points_in_polygons_forward) + .input(2) + .output(1) + .apply(points_in_polygons_cuda_parrots) + .done(); + +#endif diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/points_in_polygons_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/points_in_polygons_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..042678143472b18c85ac6d1bdcd79cc97a4e7ab0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/points_in_polygons_pytorch.h @@ -0,0 +1,9 @@ +// Copyright (c) OpenMMLab. All rights reserved +#ifndef POINTS_IN_POLYGONS_PYTORCH_H +#define POINTS_IN_POLYGONS_PYTORCH_H +#include +using namespace at; + +void points_in_polygons_forward(Tensor points, Tensor polygons, Tensor output); + +#endif // POINTS_IN_POLYGONS_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/prroi_pool.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/prroi_pool.cpp new file mode 100644 index 0000000000000000000000000000000000000000..00db84a154bef7a7cee8d38ba6236d959849a3bc --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/prroi_pool.cpp @@ -0,0 +1,47 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void prroi_pool_forward_impl(Tensor input, Tensor rois, Tensor output, + int pooled_height, int pooled_width, + float spatial_scale) { + DISPATCH_DEVICE_IMPL(prroi_pool_forward_impl, input, rois, output, + pooled_height, pooled_width, spatial_scale); +} + +void prroi_pool_backward_impl(Tensor grad_output, Tensor rois, + Tensor grad_input, int pooled_height, + int pooled_width, float spatial_scale) { + DISPATCH_DEVICE_IMPL(prroi_pool_backward_impl, grad_output, rois, grad_input, + pooled_height, pooled_width, spatial_scale); +} + +void prroi_pool_coor_backward_impl(Tensor output, Tensor grad_output, + Tensor input, Tensor rois, Tensor grad_rois, + int pooled_height, int pooled_width, + float spatial_scale) { + DISPATCH_DEVICE_IMPL(prroi_pool_coor_backward_impl, output, grad_output, + input, rois, grad_rois, pooled_height, pooled_width, + spatial_scale); +} + +void prroi_pool_forward(Tensor input, Tensor rois, Tensor output, + int pooled_height, int pooled_width, + float spatial_scale) { + prroi_pool_forward_impl(input, rois, output, pooled_height, pooled_width, + spatial_scale); +} + +void prroi_pool_backward(Tensor grad_output, Tensor rois, Tensor grad_input, + int pooled_height, int pooled_width, + float spatial_scale) { + prroi_pool_backward_impl(grad_output, rois, grad_input, pooled_height, + pooled_width, spatial_scale); +} + +void prroi_pool_coor_backward(Tensor output, Tensor grad_output, Tensor input, + Tensor rois, Tensor grad_rois, int pooled_height, + int pooled_width, float spatial_scale) { + prroi_pool_coor_backward_impl(output, grad_output, input, rois, grad_rois, + pooled_height, pooled_width, spatial_scale); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/prroi_pool_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/prroi_pool_parrots.cpp new file mode 100644 index 
0000000000000000000000000000000000000000..4e82955818640f3276255f14cd1e7db232117773 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/prroi_pool_parrots.cpp @@ -0,0 +1,97 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include +#include +#include + +#include "prroi_pool_pytorch.h" + +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void prroi_pool_forward_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int pooled_height; + int pooled_width; + float spatial_scale; + SSAttrs(attr) + .get("pooled_height", pooled_height) + .get("pooled_width", pooled_width) + .get("spatial_scale", spatial_scale) + .done(); + + const auto& input = buildATensor(ctx, ins[0]); + const auto& rois = buildATensor(ctx, ins[1]); + auto output = buildATensor(ctx, outs[0]); + prroi_pool_forward(input, rois, output, pooled_height, pooled_width, + spatial_scale); +} + +void prroi_pool_backward_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int pooled_height; + int pooled_width; + float spatial_scale; + SSAttrs(attr) + .get("pooled_height", pooled_height) + .get("pooled_width", pooled_width) + .get("spatial_scale", spatial_scale) + .done(); + + const auto& grad_output = buildATensor(ctx, ins[0]); + const auto& rois = buildATensor(ctx, ins[1]); + auto grad_input = buildATensor(ctx, outs[0]); + prroi_pool_backward(grad_output, rois, grad_input, pooled_height, + pooled_width, spatial_scale); +} + +void prroi_pool_coor_backward_cuda_parrots(CudaContext& ctx, + const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int pooled_height; + int pooled_width; + float spatial_scale; + SSAttrs(attr) + .get("pooled_height", pooled_height) + .get("pooled_width", pooled_width) + .get("spatial_scale", spatial_scale) + .done(); + + const auto& output = buildATensor(ctx, ins[0]); + 
const auto& grad_output = buildATensor(ctx, ins[1]); + const auto& input = buildATensor(ctx, ins[2]); + const auto& rois = buildATensor(ctx, ins[3]); + auto grad_rois = buildATensor(ctx, outs[0]); + prroi_pool_coor_backward(output, grad_output, input, rois, grad_rois, + pooled_height, pooled_width, spatial_scale); +} + +PARROTS_EXTENSION_REGISTER(prroi_pool_forward) + .attr("pooled_height") + .attr("pooled_width") + .attr("spatial_scale") + .input(2) + .output(1) + .apply(prroi_pool_forward_cuda_parrots) + .done(); + +PARROTS_EXTENSION_REGISTER(prroi_pool_backward) + .attr("pooled_height") + .attr("pooled_width") + .attr("spatial_scale") + .input(2) + .output(1) + .apply(prroi_pool_backward_cuda_parrots) + .done(); + +PARROTS_EXTENSION_REGISTER(prroi_pool_coor_backward) + .attr("pooled_height") + .attr("pooled_width") + .attr("spatial_scale") + .input(4) + .output(1) + .apply(prroi_pool_coor_backward_cuda_parrots) + .done(); +#endif diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/prroi_pool_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/prroi_pool_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..451b01dd5d289cd6a4533f62f326b326cd89da16 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/prroi_pool_pytorch.h @@ -0,0 +1,19 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#ifndef PRROI_POOL_PYTORCH_H +#define PRROI_POOL_PYTORCH_H +#include +using namespace at; + +void prroi_pool_forward(Tensor input, Tensor rois, Tensor output, + int pooled_height, int pooled_width, + float spatial_scale); + +void prroi_pool_backward(Tensor grad_output, Tensor rois, Tensor grad_input, + int pooled_height, int pooled_width, + float spatial_scale); + +void prroi_pool_coor_backward(Tensor output, Tensor grad_output, Tensor input, + Tensor rois, Tensor grad_rois, int pooled_height, + int pooled_width, float spatial_scale); + +#endif // PRROI_POOL_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/psamask.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/psamask.cpp new file mode 100644 index 0000000000000000000000000000000000000000..6064c9ba5fd7ec9bcfef22b3abcc65ef50106d67 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/psamask.cpp @@ -0,0 +1,41 @@ +// Copyright (c) OpenMMLab. All rights reserved +// Modified from +// https://github.com/hszhao/semseg/blob/master/lib/psa/src +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void psamask_forward_impl(const int psa_type, const Tensor input, Tensor output, + const int num_, const int h_feature, + const int w_feature, const int h_mask, + const int w_mask, const int half_h_mask, + const int half_w_mask) { + DISPATCH_DEVICE_IMPL(psamask_forward_impl, psa_type, input, output, num_, + h_feature, w_feature, h_mask, w_mask, half_h_mask, + half_w_mask); +} + +void psamask_backward_impl(const int psa_type, const Tensor grad_output, + Tensor grad_input, const int num_, + const int h_feature, const int w_feature, + const int h_mask, const int w_mask, + const int half_h_mask, const int half_w_mask) { + DISPATCH_DEVICE_IMPL(psamask_backward_impl, psa_type, grad_output, grad_input, + num_, h_feature, w_feature, h_mask, w_mask, half_h_mask, + half_w_mask); +} + +void psamask_forward(const Tensor input, Tensor output, 
const int psa_type, + const int num_, const int h_feature, const int w_feature, + const int h_mask, const int w_mask, const int half_h_mask, + const int half_w_mask) { + psamask_forward_impl(psa_type, input, output, num_, h_feature, w_feature, + h_mask, w_mask, half_h_mask, half_w_mask); +} + +void psamask_backward(Tensor grad_output, const Tensor grad_input, + const int psa_type, const int num_, const int h_feature, + const int w_feature, const int h_mask, const int w_mask, + const int half_h_mask, const int half_w_mask) { + psamask_backward_impl(psa_type, grad_output, grad_input, num_, h_feature, + w_feature, h_mask, w_mask, half_h_mask, half_w_mask); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/psamask_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/psamask_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..f67102d02cc124a81d300aea4946c65155ede81d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/psamask_parrots.cpp @@ -0,0 +1,129 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include +#include +#include + +#include "psamask_pytorch.h" +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void psamask_forward_cuda_parrots(CudaContext &ctx, const SSElement &attr, + const OperatorBase::in_list_t &ins, + OperatorBase::out_list_t &outs) { + int psa_type, num_, h_feature, w_feature, h_mask, w_mask, half_h_mask, + half_w_mask; + SSAttrs(attr) + .get("psa_type", psa_type) + .get("num_", num_) + .get("h_feature", h_feature) + .get("w_feature", w_feature) + .get("h_mask", h_mask) + .get("w_mask", w_mask) + .get("half_h_mask", half_h_mask) + .get("half_w_mask", half_w_mask) + .done(); + const auto &input = buildATensor(ctx, ins[0]); + auto output = buildATensor(ctx, outs[0]); + psamask_forward_cuda(psa_type, input, output, num_, h_feature, w_feature, + h_mask, w_mask, half_h_mask, half_w_mask); +} + +void psamask_backward_cuda_parrots(CudaContext &ctx, const SSElement &attr, + const OperatorBase::in_list_t &ins, + OperatorBase::out_list_t &outs) { + int psa_type, num_, h_feature, w_feature, h_mask, w_mask, half_h_mask, + half_w_mask; + SSAttrs(attr) + .get("psa_type", psa_type) + .get("num_", num_) + .get("h_feature", h_feature) + .get("w_feature", w_feature) + .get("h_mask", h_mask) + .get("w_mask", w_mask) + .get("half_h_mask", half_h_mask) + .get("half_w_mask", half_w_mask) + .done(); + + const auto &grad_output = buildATensor(ctx, ins[0]); + auto grad_input = buildATensor(ctx, outs[0]); + psamask_backward_cuda(psa_type, grad_output, grad_input, num_, h_feature, + w_feature, h_mask, w_mask, half_h_mask, half_w_mask); +} +#endif + +void psamask_forward_cpu_parrots(HostContext &ctx, const SSElement &attr, + const OperatorBase::in_list_t &ins, + OperatorBase::out_list_t &outs) { + int psa_type, num_, h_feature, w_feature, h_mask, w_mask, half_h_mask, + half_w_mask; + SSAttrs(attr) + .get("psa_type", psa_type) + .get("num_", num_) + .get("h_feature", h_feature) + .get("w_feature", w_feature) + .get("h_mask", h_mask) + 
.get("w_mask", w_mask) + .get("half_h_mask", half_h_mask) + .get("half_w_mask", half_w_mask) + .done(); + const auto &input = buildATensor(ctx, ins[0]); + auto output = buildATensor(ctx, outs[0]); + psamask_forward_cpu(psa_type, input, output, num_, h_feature, w_feature, + h_mask, w_mask, half_h_mask, half_w_mask); +} + +void psamask_backward_cpu_parrots(HostContext &ctx, const SSElement &attr, + const OperatorBase::in_list_t &ins, + OperatorBase::out_list_t &outs) { + int psa_type, num_, h_feature, w_feature, h_mask, w_mask, half_h_mask, + half_w_mask; + SSAttrs(attr) + .get("psa_type", psa_type) + .get("num_", num_) + .get("h_feature", h_feature) + .get("w_feature", w_feature) + .get("h_mask", h_mask) + .get("w_mask", w_mask) + .get("half_h_mask", half_h_mask) + .get("half_w_mask", half_w_mask) + .done(); + + const auto &grad_output = buildATensor(ctx, ins[0]); + auto grad_input = buildATensor(ctx, outs[0]); + psamask_backward_cpu(psa_type, grad_output, grad_input, num_, h_feature, + w_feature, h_mask, w_mask, half_h_mask, half_w_mask); +} + +PARROTS_EXTENSION_REGISTER(psamask_forward) + .attr("psa_type") + .attr("num_") + .attr("h_feature") + .attr("w_feature") + .attr("h_mask") + .attr("w_mask") + .attr("half_h_mask") + .attr("half_w_mask") + .input(1) + .output(1) + .apply(psamask_forward_cpu_parrots) +#ifdef MMCV_WITH_CUDA + .apply(psamask_forward_cuda_parrots) +#endif + .done(); + +PARROTS_EXTENSION_REGISTER(psamask_backward) + .attr("psa_type") + .attr("num_") + .attr("h_feature") + .attr("w_feature") + .attr("h_mask") + .attr("w_mask") + .attr("half_h_mask") + .attr("half_w_mask") + .input(1) + .output(1) + .apply(psamask_backward_cpu_parrots) +#ifdef MMCV_WITH_CUDA + .apply(psamask_backward_cuda_parrots) +#endif + .done(); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/psamask_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/psamask_pytorch.h new file mode 100644 index 
0000000000000000000000000000000000000000..c3f0579efb8b8149f1840d0a20fc5ba91df74f06 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/psamask_pytorch.h @@ -0,0 +1,31 @@ +// Copyright (c) OpenMMLab. All rights reserved +#ifndef PSAMASK_PYTORCH_H +#define PSAMASK_PYTORCH_H +#include +using namespace at; + +#ifdef MMCV_WITH_CUDA +void psamask_forward_cuda(const int psa_type, const Tensor input, Tensor output, + const int num_, const int h_feature, + const int w_feature, const int h_mask, + const int w_mask, const int half_h_mask, + const int half_w_mask); + +void psamask_backward_cuda(const int psa_type, const Tensor grad_output, + Tensor grad_input, const int num_, + const int h_feature, const int w_feature, + const int h_mask, const int w_mask, + const int half_h_mask, const int half_w_mask); +#endif +void psamask_forward_cpu(const int psa_type, const Tensor input, Tensor output, + const int num_, const int h_feature, + const int w_feature, const int h_mask, + const int w_mask, const int half_h_mask, + const int half_w_mask); + +void psamask_backward_cpu(const int psa_type, const Tensor grad_output, + Tensor grad_input, const int num_, + const int h_feature, const int w_feature, + const int h_mask, const int w_mask, + const int half_h_mask, const int half_w_mask); +#endif // PSAMASK_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/riroi_align_rotated.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/riroi_align_rotated.cpp new file mode 100644 index 0000000000000000000000000000000000000000..81ffa9fd6dcd82117ca13ac83b88b5f023aca466 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/riroi_align_rotated.cpp @@ -0,0 +1,42 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void riroi_align_rotated_forward_impl(Tensor features, Tensor rois, + Tensor output, int pooled_height, + int pooled_width, float spatial_scale, + int num_samples, int num_orientations, + bool clockwise) { + DISPATCH_DEVICE_IMPL(riroi_align_rotated_forward_impl, features, rois, output, + pooled_height, pooled_width, spatial_scale, num_samples, + num_orientations, clockwise); +} + +void riroi_align_rotated_backward_impl(Tensor top_grad, Tensor rois, + Tensor bottom_grad, int pooled_height, + int pooled_width, float spatial_scale, + int num_samples, int num_orientations, + bool clockwise) { + DISPATCH_DEVICE_IMPL(riroi_align_rotated_backward_impl, top_grad, rois, + bottom_grad, pooled_height, pooled_width, spatial_scale, + num_samples, num_orientations, clockwise); +} + +void riroi_align_rotated_forward(Tensor features, Tensor rois, Tensor output, + int pooled_height, int pooled_width, + float spatial_scale, int num_samples, + int num_orientations, bool clockwise) { + riroi_align_rotated_forward_impl(features, rois, output, pooled_height, + pooled_width, spatial_scale, num_samples, + num_orientations, clockwise); +} + +void riroi_align_rotated_backward(Tensor top_grad, Tensor rois, + Tensor bottom_grad, int pooled_height, + int pooled_width, float spatial_scale, + int num_samples, int num_orientations, + bool clockwise) { + riroi_align_rotated_backward_impl(top_grad, rois, bottom_grad, pooled_height, + pooled_width, spatial_scale, num_samples, + num_orientations, clockwise); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/riroi_align_rotated_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/riroi_align_rotated_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..5eb340ce42cf0ed4ccbe66a4b97aaed55a13be8b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/riroi_align_rotated_parrots.cpp @@ 
-0,0 +1,86 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include +#include +#include + +#include "riroi_align_rotated_pytorch.h" +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void riroi_align_rotated_forward_cuda_parrots( + CudaContext& ctx, const SSElement& attr, const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int pooled_height; + int pooled_width; + float spatial_scale; + int sample_num; + int num_orientations; + bool clockwise; + SSAttrs(attr) + .get("pooled_height", pooled_height) + .get("pooled_width", pooled_width) + .get("spatial_scale", spatial_scale) + .get("num_samples", sample_num) + .get("num_orientations", num_orientations) + .get("clockwise", clockwise) + .done(); + + auto input = buildATensor(ctx, ins[0]); + auto rois = buildATensor(ctx, ins[1]); + auto output = buildATensor(ctx, outs[0]); + riroi_align_rotated_forward(input, rois, output, pooled_height, pooled_width, + spatial_scale, sample_num, num_orientations, + clockwise); +} + +void riroi_align_rotated_backward_cuda_parrots( + CudaContext& ctx, const SSElement& attr, const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int pooled_height; + int pooled_width; + float spatial_scale; + int sample_num; + int num_orientations; + bool clockwise; + SSAttrs(attr) + .get("pooled_height", pooled_height) + .get("pooled_width", pooled_width) + .get("spatial_scale", spatial_scale) + .get("num_samples", sample_num) + .get("num_orientations", num_orientations) + .get("clockwise", clockwise) + .done(); + + auto grad_output = buildATensor(ctx, ins[0]); + auto rois = buildATensor(ctx, ins[1]); + auto grad_input = buildATensor(ctx, outs[0]); + riroi_align_rotated_backward(grad_output, rois, grad_input, pooled_height, + pooled_width, spatial_scale, sample_num, + num_orientations, clockwise); +} + +PARROTS_EXTENSION_REGISTER(riroi_align_rotated_forward) + .attr("pooled_height") + .attr("pooled_width") + .attr("spatial_scale") + .attr("num_samples") + 
.attr("num_orientations") + .attr("clockwise") + .input(2) + .output(1) + .apply(riroi_align_rotated_forward_cuda_parrots) + .done(); + +PARROTS_EXTENSION_REGISTER(riroi_align_rotated_backward) + .attr("pooled_height") + .attr("pooled_width") + .attr("spatial_scale") + .attr("num_samples") + .attr("num_orientations") + .attr("clockwise") + .input(2) + .output(1) + .apply(riroi_align_rotated_backward_cuda_parrots) + .done(); + +#endif diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/riroi_align_rotated_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/riroi_align_rotated_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..49a30bffaffe059c98884332449c6af817036390 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/riroi_align_rotated_pytorch.h @@ -0,0 +1,18 @@ +// Copyright (c) OpenMMLab. All rights reserved +#ifndef RIROI_ALIGN_ROTATED_PYTORCH_H +#define RIROI_ALIGN_ROTATED_PYTORCH_H +#include +using namespace at; + +void riroi_align_rotated_forward(Tensor features, Tensor rois, Tensor output, + int pooled_height, int pooled_width, + float spatial_scale, int num_samples, + int num_orientations, bool clockwise); + +void riroi_align_rotated_backward(Tensor top_grad, Tensor rois, + Tensor bottom_grad, int pooled_height, + int pooled_width, float spatial_scale, + int num_samples, int num_orientations, + bool clockwise); + +#endif // RIROI_ALIGN_ROTATED_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roi_align.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roi_align.cpp new file mode 100644 index 0000000000000000000000000000000000000000..6e7077397d06ecd55af1e1060e64fe8c5ff08c94 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roi_align.cpp @@ -0,0 +1,41 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void roi_align_forward_impl(Tensor input, Tensor rois, Tensor output, + Tensor argmax_y, Tensor argmax_x, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + int pool_mode, bool aligned) { + DISPATCH_DEVICE_IMPL(roi_align_forward_impl, input, rois, output, argmax_y, + argmax_x, aligned_height, aligned_width, spatial_scale, + sampling_ratio, pool_mode, aligned); +} + +void roi_align_backward_impl(Tensor grad_output, Tensor rois, Tensor argmax_y, + Tensor argmax_x, Tensor grad_input, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + int pool_mode, bool aligned) { + DISPATCH_DEVICE_IMPL(roi_align_backward_impl, grad_output, rois, argmax_y, + argmax_x, grad_input, aligned_height, aligned_width, + spatial_scale, sampling_ratio, pool_mode, aligned); +} + +void roi_align_forward(Tensor input, Tensor rois, Tensor output, + Tensor argmax_y, Tensor argmax_x, int aligned_height, + int aligned_width, float spatial_scale, + int sampling_ratio, int pool_mode, bool aligned) { + roi_align_forward_impl(input, rois, output, argmax_y, argmax_x, + aligned_height, aligned_width, spatial_scale, + sampling_ratio, pool_mode, aligned); +} + +void roi_align_backward(Tensor grad_output, Tensor rois, Tensor argmax_y, + Tensor argmax_x, Tensor grad_input, int aligned_height, + int aligned_width, float spatial_scale, + int sampling_ratio, int pool_mode, bool aligned) { + roi_align_backward_impl(grad_output, rois, argmax_y, argmax_x, grad_input, + aligned_height, aligned_width, spatial_scale, + sampling_ratio, pool_mode, aligned); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roi_align_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roi_align_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..60abea092709427b0e62c101931911c2c1924cf1 --- /dev/null +++ 
b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roi_align_parrots.cpp @@ -0,0 +1,151 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include +#include +#include + +#include "roi_align_pytorch.h" +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void roi_align_forward_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int aligned_height; + int aligned_width; + float spatial_scale; + int sampling_ratio; + int pool_mode; + bool aligned; + SSAttrs(attr) + .get("aligned_height", aligned_height) + .get("aligned_width", aligned_width) + .get("spatial_scale", spatial_scale) + .get("sampling_ratio", sampling_ratio) + .get("pool_mode", pool_mode) + .get("aligned", aligned) + .done(); + + const auto& input = buildATensor(ctx, ins[0]); + const auto& rois = buildATensor(ctx, ins[1]); + auto output = buildATensor(ctx, outs[0]); + auto argmax_y = buildATensor(ctx, outs[1]); + auto argmax_x = buildATensor(ctx, outs[2]); + roi_align_forward_cuda(input, rois, output, argmax_y, argmax_x, + aligned_height, aligned_width, spatial_scale, + sampling_ratio, pool_mode, aligned); +} + +void roi_align_backward_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int aligned_height; + int aligned_width; + float spatial_scale; + int sampling_ratio; + int pool_mode; + bool aligned; + SSAttrs(attr) + .get("aligned_height", aligned_height) + .get("aligned_width", aligned_width) + .get("spatial_scale", spatial_scale) + .get("sampling_ratio", sampling_ratio) + .get("pool_mode", pool_mode) + .get("aligned", aligned) + .done(); + + const auto& grad_output = buildATensor(ctx, ins[0]); + const auto& rois = buildATensor(ctx, ins[1]); + const auto& argmax_y = buildATensor(ctx, ins[2]); + const auto& argmax_x = buildATensor(ctx, ins[3]); + auto grad_input = buildATensor(ctx, outs[0]); + roi_align_backward_cuda(grad_output, rois, argmax_y, 
argmax_x, grad_input, + aligned_height, aligned_width, spatial_scale, + sampling_ratio, pool_mode, aligned); +} +#endif + +void roi_align_forward_cpu_parrots(HostContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int aligned_height; + int aligned_width; + float spatial_scale; + int sampling_ratio; + int pool_mode; + bool aligned; + SSAttrs(attr) + .get("aligned_height", aligned_height) + .get("aligned_width", aligned_width) + .get("spatial_scale", spatial_scale) + .get("sampling_ratio", sampling_ratio) + .get("pool_mode", pool_mode) + .get("aligned", aligned) + .done(); + + const auto& input = buildATensor(ctx, ins[0]); + const auto& rois = buildATensor(ctx, ins[1]); + auto output = buildATensor(ctx, outs[0]); + auto argmax_y = buildATensor(ctx, outs[1]); + auto argmax_x = buildATensor(ctx, outs[2]); + roi_align_forward_cpu(input, rois, output, argmax_y, argmax_x, aligned_height, + aligned_width, spatial_scale, sampling_ratio, pool_mode, + aligned); +} + +void roi_align_backward_cpu_parrots(HostContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int aligned_height; + int aligned_width; + float spatial_scale; + int sampling_ratio; + int pool_mode; + bool aligned; + SSAttrs(attr) + .get("aligned_height", aligned_height) + .get("aligned_width", aligned_width) + .get("spatial_scale", spatial_scale) + .get("sampling_ratio", sampling_ratio) + .get("pool_mode", pool_mode) + .get("aligned", aligned) + .done(); + + const auto& grad_output = buildATensor(ctx, ins[0]); + const auto& rois = buildATensor(ctx, ins[1]); + const auto& argmax_y = buildATensor(ctx, ins[2]); + const auto& argmax_x = buildATensor(ctx, ins[3]); + auto grad_input = buildATensor(ctx, outs[0]); + roi_align_backward_cpu(grad_output, rois, argmax_y, argmax_x, grad_input, + aligned_height, aligned_width, spatial_scale, + sampling_ratio, pool_mode, aligned); +} + 
+PARROTS_EXTENSION_REGISTER(roi_align_forward) + .attr("aligned_height") + .attr("aligned_width") + .attr("spatial_scale") + .attr("sampling_ratio") + .attr("pool_mode") + .attr("aligned") + .input(2) + .output(3) + .apply(roi_align_forward_cpu_parrots) +#ifdef MMCV_WITH_CUDA + .apply(roi_align_forward_cuda_parrots) +#endif + .done(); + +PARROTS_EXTENSION_REGISTER(roi_align_backward) + .attr("aligned_height") + .attr("aligned_width") + .attr("spatial_scale") + .attr("sampling_ratio") + .attr("pool_mode") + .attr("aligned") + .input(4) + .output(1) + .apply(roi_align_backward_cpu_parrots) +#ifdef MMCV_WITH_CUDA + .apply(roi_align_backward_cuda_parrots) +#endif + .done(); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roi_align_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roi_align_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..4c60160984fd964663547c590025558780c8c62f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roi_align_pytorch.h @@ -0,0 +1,32 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#ifndef ROI_ALIGN_PYTORCH_H +#define ROI_ALIGN_PYTORCH_H +#include +using namespace at; + +#ifdef MMCV_WITH_CUDA +void roi_align_forward_cuda(Tensor input, Tensor rois, Tensor output, + Tensor argmax_y, Tensor argmax_x, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + int pool_mode, bool aligned); + +void roi_align_backward_cuda(Tensor grad_output, Tensor rois, Tensor argmax_y, + Tensor argmax_x, Tensor grad_input, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + int pool_mode, bool aligned); +#endif + +void roi_align_forward_cpu(Tensor input, Tensor rois, Tensor output, + Tensor argmax_y, Tensor argmax_x, int aligned_height, + int aligned_width, float spatial_scale, + int sampling_ratio, int pool_mode, bool aligned); + +void roi_align_backward_cpu(Tensor grad_output, Tensor rois, Tensor argmax_y, + Tensor argmax_x, Tensor grad_input, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + int pool_mode, bool aligned); + +#endif // ROI_ALIGN_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roi_align_rotated.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roi_align_rotated.cpp new file mode 100644 index 0000000000000000000000000000000000000000..5ef691ada07e599740906254369631189e5d6f51 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roi_align_rotated.cpp @@ -0,0 +1,41 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void roi_align_rotated_forward_impl(Tensor features, Tensor rois, Tensor output, + int aligned_height, int aligned_width, + float spatial_scale, int sample_ratio, + bool aligned, bool clockwise) { + DISPATCH_DEVICE_IMPL(roi_align_rotated_forward_impl, features, rois, output, + aligned_height, aligned_width, spatial_scale, + sample_ratio, aligned, clockwise); +} + +void roi_align_rotated_backward_impl(Tensor top_grad, Tensor rois, + Tensor bottom_grad, int aligned_height, + int aligned_width, float spatial_scale, + int sample_ratio, bool aligned, + bool clockwise) { + DISPATCH_DEVICE_IMPL(roi_align_rotated_backward_impl, top_grad, rois, + bottom_grad, aligned_height, aligned_width, + spatial_scale, sample_ratio, aligned, clockwise); +} + +void roi_align_rotated_forward(Tensor input, Tensor rois, Tensor output, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + bool aligned, bool clockwise) { + roi_align_rotated_forward_impl(input, rois, output, aligned_height, + aligned_width, spatial_scale, sampling_ratio, + aligned, clockwise); +} + +void roi_align_rotated_backward(Tensor top_grad, Tensor rois, + Tensor bottom_grad, int aligned_height, + int aligned_width, float spatial_scale, + int sampling_ratio, bool aligned, + bool clockwise) { + roi_align_rotated_backward_impl(top_grad, rois, bottom_grad, aligned_height, + aligned_width, spatial_scale, sampling_ratio, + aligned, clockwise); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roi_align_rotated_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roi_align_rotated_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..9386250a27b1db338bcc522c4acf9b29b05077db --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roi_align_rotated_parrots.cpp @@ -0,0 +1,147 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include +#include +#include + +#include "roi_align_rotated_pytorch.h" +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void roi_align_rotated_forward_cuda_parrots(CudaContext& ctx, + const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int pooled_height; + int pooled_width; + float spatial_scale; + int sampling_ratio; + bool aligned; + bool clockwise; + SSAttrs(attr) + .get("pooled_height", pooled_height) + .get("pooled_width", pooled_width) + .get("spatial_scale", spatial_scale) + .get("sampling_ratio", sampling_ratio) + .get("aligned", aligned) + .get("clockwise", clockwise) + .done(); + + const auto& input = buildATensor(ctx, ins[0]); + const auto& rois = buildATensor(ctx, ins[1]); + auto output = buildATensor(ctx, outs[0]); + roi_align_rotated_forward_cuda(input, rois, output, pooled_height, + pooled_width, spatial_scale, sampling_ratio, + aligned, clockwise); +} + +void roi_align_rotated_backward_cuda_parrots(CudaContext& ctx, + const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int pooled_height; + int pooled_width; + float spatial_scale; + int sampling_ratio; + bool aligned; + bool clockwise; + SSAttrs(attr) + .get("pooled_height", pooled_height) + .get("pooled_width", pooled_width) + .get("spatial_scale", spatial_scale) + .get("sampling_ratio", sampling_ratio) + .get("aligned", aligned) + .get("clockwise", clockwise) + .done(); + + const auto& grad_output = buildATensor(ctx, ins[0]); + const auto& rois = buildATensor(ctx, ins[1]); + auto grad_input = buildATensor(ctx, outs[0]); + roi_align_rotated_backward_cuda(grad_output, rois, grad_input, pooled_height, + pooled_width, spatial_scale, sampling_ratio, + aligned, clockwise); +} +#endif + +void roi_align_rotated_forward_cpu_parrots(HostContext& ctx, + const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int pooled_height; + int pooled_width; + 
float spatial_scale; + int sampling_ratio; + bool aligned; + bool clockwise; + SSAttrs(attr) + .get("pooled_height", pooled_height) + .get("pooled_width", pooled_width) + .get("spatial_scale", spatial_scale) + .get("sampling_ratio", sampling_ratio) + .get("aligned", aligned) + .get("clockwise", clockwise) + .done(); + + const auto& input = buildATensor(ctx, ins[0]); + const auto& rois = buildATensor(ctx, ins[1]); + auto output = buildATensor(ctx, outs[0]); + roi_align_rotated_forward_cpu(input, rois, output, pooled_height, + pooled_width, spatial_scale, sampling_ratio, + aligned, clockwise); +} + +void roi_align_rotated_backward_cpu_parrots(HostContext& ctx, + const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int pooled_height; + int pooled_width; + float spatial_scale; + int sampling_ratio; + bool aligned; + bool clockwise; + SSAttrs(attr) + .get("pooled_height", pooled_height) + .get("pooled_width", pooled_width) + .get("spatial_scale", spatial_scale) + .get("sampling_ratio", sampling_ratio) + .get("aligned", aligned) + .get("clockwise", clockwise) + .done(); + + const auto& grad_output = buildATensor(ctx, ins[0]); + const auto& rois = buildATensor(ctx, ins[1]); + auto grad_input = buildATensor(ctx, outs[0]); + roi_align_rotated_backward_cpu(grad_output, rois, grad_input, pooled_height, + pooled_width, spatial_scale, sampling_ratio, + aligned, clockwise); +} + +PARROTS_EXTENSION_REGISTER(roi_align_rotated_forward) + .attr("pooled_height") + .attr("pooled_width") + .attr("spatial_scale") + .attr("sampling_ratio") + .attr("aligned") + .attr("clockwise") + .input(2) + .output(1) + .apply(roi_align_rotated_forward_cpu_parrots) +#ifdef MMCV_WITH_CUDA + .apply(roi_align_rotated_forward_cuda_parrots) +#endif + .done(); + +PARROTS_EXTENSION_REGISTER(roi_align_rotated_backward) + .attr("pooled_height") + .attr("pooled_width") + .attr("spatial_scale") + .attr("sampling_ratio") + .attr("aligned") + .attr("clockwise") + 
.input(2) + .output(1) + .apply(roi_align_rotated_backward_cpu_parrots) +#ifdef MMCV_WITH_CUDA + .apply(roi_align_rotated_backward_cuda_parrots) +#endif + .done(); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roi_align_rotated_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roi_align_rotated_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..8136b56d133d4dfa32b0d1aa2a02425560dee0e0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roi_align_rotated_pytorch.h @@ -0,0 +1,31 @@ +// Copyright (c) OpenMMLab. All rights reserved +#ifndef ROI_ALIGN_ROTATED_PYTORCH_H +#define ROI_ALIGN_ROTATED_PYTORCH_H +#include +using namespace at; + +#ifdef MMCV_WITH_CUDA +void roi_align_rotated_forward_cuda(Tensor input, Tensor rois, Tensor output, + int pooled_height, int pooled_width, + float spatial_scale, int sampling_ratio, + bool aligned, bool clockwise); + +void roi_align_rotated_backward_cuda(Tensor grad_output, Tensor rois, + Tensor bottom_grad, int pooled_height, + int pooled_width, float spatial_scale, + int sampling_ratio, bool aligned, + bool clockwise); +#endif + +void roi_align_rotated_forward_cpu(Tensor input, Tensor rois, Tensor output, + int pooled_height, int pooled_width, + float spatial_scale, int sampling_ratio, + bool aligned, bool clockwise); + +void roi_align_rotated_backward_cpu(Tensor grad_output, Tensor rois, + Tensor bottom_grad, int pooled_height, + int pooled_width, float spatial_scale, + int sampling_ratio, bool aligned, + bool clockwise); + +#endif // ROI_ALIGN_ROTATED_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roi_pool.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roi_pool.cpp new file mode 100644 index 0000000000000000000000000000000000000000..bba90b806c5fe59d9e20a0b41a51df9922e91c3f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roi_pool.cpp @@ -0,0 +1,31 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void roi_pool_forward_impl(Tensor input, Tensor rois, Tensor output, + Tensor argmax, int pooled_height, int pooled_width, + float spatial_scale) { + DISPATCH_DEVICE_IMPL(roi_pool_forward_impl, input, rois, output, argmax, + pooled_height, pooled_width, spatial_scale); +} + +void roi_pool_backward_impl(Tensor grad_output, Tensor rois, Tensor argmax, + Tensor grad_input, int pooled_height, + int pooled_width, float spatial_scale) { + DISPATCH_DEVICE_IMPL(roi_pool_backward_impl, grad_output, rois, argmax, + grad_input, pooled_height, pooled_width, spatial_scale); +} + +void roi_pool_forward(Tensor input, Tensor rois, Tensor output, Tensor argmax, + int pooled_height, int pooled_width, + float spatial_scale) { + roi_pool_forward_impl(input, rois, output, argmax, pooled_height, + pooled_width, spatial_scale); +} + +void roi_pool_backward(Tensor grad_output, Tensor rois, Tensor argmax, + Tensor grad_input, int pooled_height, int pooled_width, + float spatial_scale) { + roi_pool_backward_impl(grad_output, rois, argmax, grad_input, pooled_height, + pooled_width, spatial_scale); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roi_pool_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roi_pool_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..0acde4a41e46ccac53c8b4bae80bd88fb2fde6d6 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roi_pool_parrots.cpp @@ -0,0 +1,67 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include +#include +#include + +#include "roi_pool_pytorch.h" +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void roi_pool_forward_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int pooled_height; + int pooled_width; + float spatial_scale; + SSAttrs(attr) + .get("pooled_height", pooled_height) + .get("pooled_width", pooled_width) + .get("spatial_scale", spatial_scale) + .done(); + + const auto& input = buildATensor(ctx, ins[0]); + const auto& rois = buildATensor(ctx, ins[1]); + auto output = buildATensor(ctx, outs[0]); + auto argmax = buildATensor(ctx, outs[1]); + roi_pool_forward_cuda(input, rois, output, argmax, pooled_height, + pooled_width, spatial_scale); +} + +void roi_pool_backward_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int pooled_height; + int pooled_width; + float spatial_scale; + SSAttrs(attr) + .get("pooled_height", pooled_height) + .get("pooled_width", pooled_width) + .get("spatial_scale", spatial_scale) + .done(); + + const auto& grad_output = buildATensor(ctx, ins[0]); + const auto& rois = buildATensor(ctx, ins[1]); + const auto& argmax = buildATensor(ctx, ins[2]); + auto grad_input = buildATensor(ctx, outs[0]); + roi_pool_backward_cuda(grad_output, rois, argmax, grad_input, pooled_height, + pooled_width, spatial_scale); +} + +PARROTS_EXTENSION_REGISTER(roi_pool_forward) + .attr("pooled_height") + .attr("pooled_width") + .attr("spatial_scale") + .input(2) + .output(2) + .apply(roi_pool_forward_cuda_parrots) + .done(); + +PARROTS_EXTENSION_REGISTER(roi_pool_backward) + .attr("pooled_height") + .attr("pooled_width") + .attr("spatial_scale") + .input(3) + .output(1) + .apply(roi_pool_backward_cuda_parrots) + .done(); +#endif diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roi_pool_pytorch.h 
b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roi_pool_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..d67a1502fe955fa469cc5f854687df88ee432756 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roi_pool_pytorch.h @@ -0,0 +1,16 @@ +// Copyright (c) OpenMMLab. All rights reserved +#ifndef ROI_POOL_PYTORCH_H +#define ROI_POOL_PYTORCH_H +#include +using namespace at; + +#ifdef MMCV_WITH_CUDA +void roi_pool_forward_cuda(Tensor input, Tensor rois, Tensor output, + Tensor argmax, int pooled_height, int pooled_width, + float spatial_scale); + +void roi_pool_backward_cuda(Tensor grad_output, Tensor rois, Tensor argmax, + Tensor grad_input, int pooled_height, + int pooled_width, float spatial_scale); +#endif +#endif // ROI_POOL_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roiaware_pool3d.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roiaware_pool3d.cpp new file mode 100644 index 0000000000000000000000000000000000000000..6cf9cf0945db4c0ce1774aed6d334b62f3e1a9e4 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roiaware_pool3d.cpp @@ -0,0 +1,72 @@ +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void roiaware_pool3d_forward_impl(int boxes_num, int pts_num, int channels, + int max_pts_each_voxel, int out_x, int out_y, + int out_z, const Tensor rois, + const Tensor pts, const Tensor pts_feature, + Tensor argmax, Tensor pts_idx_of_voxels, + Tensor pooled_features, int pool_method) { + DISPATCH_DEVICE_IMPL(roiaware_pool3d_forward_impl, boxes_num, pts_num, + channels, max_pts_each_voxel, out_x, out_y, out_z, rois, + pts, pts_feature, argmax, pts_idx_of_voxels, + pooled_features, pool_method); +} + +void roiaware_pool3d_backward_impl(int boxes_num, int out_x, int out_y, + int out_z, int channels, + int max_pts_each_voxel, + const Tensor pts_idx_of_voxels, + const Tensor argmax, const Tensor grad_out, + Tensor grad_in, int 
pool_method) { + DISPATCH_DEVICE_IMPL(roiaware_pool3d_backward_impl, boxes_num, out_x, out_y, + out_z, channels, max_pts_each_voxel, pts_idx_of_voxels, + argmax, grad_out, grad_in, pool_method); +} + +void roiaware_pool3d_forward(Tensor rois, Tensor pts, Tensor pts_feature, + Tensor argmax, Tensor pts_idx_of_voxels, + Tensor pooled_features, int pool_method) { + // params rois: (N, 7) [x, y, z, x_size, y_size, z_size, ry] in LiDAR + // coordinate + // params pts: (npoints, 3) [x, y, z] in LiDAR coordinate + // params pts_feature: (npoints, C) + // params argmax: (N, out_x, out_y, out_z, C) + // params pts_idx_of_voxels: (N, out_x, out_y, out_z, max_pts_each_voxel) + // params pooled_features: (N, out_x, out_y, out_z, C) + // params pool_method: 0: max_pool 1: avg_pool + int boxes_num = rois.size(0); + int pts_num = pts.size(0); + int channels = pts_feature.size(1); + int max_pts_each_voxel = pts_idx_of_voxels.size(4); // index 0 is the counter + int out_x = pts_idx_of_voxels.size(1); + int out_y = pts_idx_of_voxels.size(2); + int out_z = pts_idx_of_voxels.size(3); + assert((out_x < 256) && (out_y < 256) && + (out_z < 256)); // we encode index with 8bit + + roiaware_pool3d_forward_impl(boxes_num, pts_num, channels, max_pts_each_voxel, + out_x, out_y, out_z, rois, pts, pts_feature, + argmax, pts_idx_of_voxels, pooled_features, + pool_method); +} + +void roiaware_pool3d_backward(Tensor pts_idx_of_voxels, Tensor argmax, + Tensor grad_out, Tensor grad_in, + int pool_method) { + // params pts_idx_of_voxels: (N, out_x, out_y, out_z, max_pts_each_voxel) + // params argmax: (N, out_x, out_y, out_z, C) + // params grad_out: (N, out_x, out_y, out_z, C) + // params grad_in: (npoints, C), return value + // params pool_method: 0: max_pool 1: avg_pool + int boxes_num = pts_idx_of_voxels.size(0); + int out_x = pts_idx_of_voxels.size(1); + int out_y = pts_idx_of_voxels.size(2); + int out_z = pts_idx_of_voxels.size(3); + int max_pts_each_voxel = pts_idx_of_voxels.size(4); // index 0 
is the counter + int channels = grad_out.size(4); + + roiaware_pool3d_backward_impl(boxes_num, out_x, out_y, out_z, channels, + max_pts_each_voxel, pts_idx_of_voxels, argmax, + grad_out, grad_in, pool_method); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roiaware_pool3d_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roiaware_pool3d_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..771d920043869cd538377a9f9a7320dd67243c69 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roiaware_pool3d_parrots.cpp @@ -0,0 +1,58 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include +#include +#include + +#include "roiaware_pool3d_pytorch.h" + +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void roiaware_pool3d_forward_cuda_parrots(CudaContext& ctx, + const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int pool_method; + SSAttrs(attr).get("pool_method", pool_method).done(); + auto rois = buildATensor(ctx, ins[0]); + auto pts = buildATensor(ctx, ins[1]); + auto pts_feature = buildATensor(ctx, ins[2]); + + auto argmax = buildATensor(ctx, outs[0]); + auto pts_idx_of_voxels = buildATensor(ctx, outs[1]); + auto pooled_features = buildATensor(ctx, outs[2]); + + roiaware_pool3d_forward(rois, pts, pts_feature, argmax, pts_idx_of_voxels, + pooled_features, pool_method); +} + +void roiaware_pool3d_backward_cuda_parrots(CudaContext& ctx, + const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int pool_method; + SSAttrs(attr).get("pool_method", pool_method).done(); + auto pts_idx_of_voxels = buildATensor(ctx, ins[0]); + auto argmax = buildATensor(ctx, ins[1]); + auto grad_out = buildATensor(ctx, ins[2]); + + auto grad_in = buildATensor(ctx, outs[0]); + + roiaware_pool3d_backward(pts_idx_of_voxels, argmax, grad_out, grad_in, + pool_method); +} + +PARROTS_EXTENSION_REGISTER(roiaware_pool3d_forward) + 
.attr("pool_method") + .input(3) + .output(3) + .apply(roiaware_pool3d_forward_cuda_parrots) + .done(); + +PARROTS_EXTENSION_REGISTER(roiaware_pool3d_backward) + .attr("pool_method") + .input(3) + .output(1) + .apply(roiaware_pool3d_backward_cuda_parrots) + .done(); +#endif diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roiaware_pool3d_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roiaware_pool3d_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..0b4b0402afa573c2231a3667fec41632ed854ad2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roiaware_pool3d_pytorch.h @@ -0,0 +1,14 @@ +// Copyright (c) OpenMMLab. All rights reserved +#ifndef ROIAWARE_POOL3D_PYTORCH_H +#define ROIAWARE_POOL3D_PYTORCH_H +#include +using namespace at; + +void roiaware_pool3d_forward(Tensor rois, Tensor pts, Tensor pts_feature, + Tensor argmax, Tensor pts_idx_of_voxels, + Tensor pooled_features, int pool_method); + +void roiaware_pool3d_backward(Tensor pts_idx_of_voxels, Tensor argmax, + Tensor grad_out, Tensor grad_in, int pool_method); + +#endif // ROIAWARE_POOL3D_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roipoint_pool3d.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roipoint_pool3d.cpp new file mode 100644 index 0000000000000000000000000000000000000000..a10080b7c23abb3a31b6f764c972ea7917f52346 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roipoint_pool3d.cpp @@ -0,0 +1,39 @@ +/* +Modified from +https://github.com/open-mmlab/OpenPCDet/blob/master/pcdet/ops/roipoint_pool3d/src/roipoint_pool3d.cpp +Point cloud feature pooling +Written by Shaoshuai Shi +All Rights Reserved 2018. 
+*/ + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void roipoint_pool3d_forward_impl(int batch_size, int pts_num, int boxes_num, + int feature_in_len, int sampled_pts_num, + const Tensor xyz, const Tensor boxes3d, + const Tensor pts_feature, + Tensor pooled_features, + Tensor pooled_empty_flag) { + DISPATCH_DEVICE_IMPL(roipoint_pool3d_forward_impl, batch_size, pts_num, + boxes_num, feature_in_len, sampled_pts_num, xyz, boxes3d, + pts_feature, pooled_features, pooled_empty_flag); +} + +void roipoint_pool3d_forward(Tensor xyz, Tensor boxes3d, Tensor pts_feature, + Tensor pooled_features, Tensor pooled_empty_flag) { + // params xyz: (B, N, 3) + // params boxes3d: (B, M, 7) + // params pts_feature: (B, N, C) + // params pooled_features: (B, M, 512, 3+C) + // params pooled_empty_flag: (B, M) + int batch_size = xyz.size(0); + int pts_num = xyz.size(1); + int boxes_num = boxes3d.size(1); + int feature_in_len = pts_feature.size(2); + int sampled_pts_num = pooled_features.size(2); + + roipoint_pool3d_forward_impl(batch_size, pts_num, boxes_num, feature_in_len, + sampled_pts_num, xyz, boxes3d, pts_feature, + pooled_features, pooled_empty_flag); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roipoint_pool3d_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roipoint_pool3d_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..17f549849df4d433d5c7369f5f43715d1f88a56e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roipoint_pool3d_parrots.cpp @@ -0,0 +1,31 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include +#include +#include + +#include "roipoint_pool3d_pytorch.h" + +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void roipoint_pool3d_forward_cuda_parrots(CudaContext& ctx, + const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + auto xyz = buildATensor(ctx, ins[0]); + auto boxes3d = buildATensor(ctx, ins[1]); + auto pts_feature = buildATensor(ctx, ins[2]); + + auto pooled_features = buildATensor(ctx, outs[0]); + auto pooled_empty_flag = buildATensor(ctx, outs[1]); + + roipoint_pool3d_forward(xyz, boxes3d, pts_feature, pooled_features, + pooled_empty_flag); +} + +PARROTS_EXTENSION_REGISTER(roipoint_pool3d_forward) + .input(3) + .output(2) + .apply(roipoint_pool3d_forward_cuda_parrots) + .done(); +#endif diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roipoint_pool3d_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roipoint_pool3d_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..e5b61b0d9ab2d2ed6ea3db9947ae8dc1e0d96992 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/roipoint_pool3d_pytorch.h @@ -0,0 +1,10 @@ +// Copyright (c) OpenMMLab. All rights reserved +#ifndef ROIPOINT_POOL3D_PYTORCH_H +#define ROIPOINT_POOL3D_PYTORCH_H +#include +using namespace at; + +void roipoint_pool3d_forward(Tensor xyz, Tensor boxes3d, Tensor pts_feature, + Tensor pooled_features, Tensor pooled_empty_flag); + +#endif // ROIPOINT_POOL3D_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/rotated_feature_align.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/rotated_feature_align.cpp new file mode 100644 index 0000000000000000000000000000000000000000..71fe0c9a0a26003310a388d4edca6e79aa7b9026 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/rotated_feature_align.cpp @@ -0,0 +1,39 @@ +// Copyright (c) OpenMMLab. All rights reserved. 
+// Modified from +// https://github.com/SJTU-Thinklab-Det/r3det-on-mmdetection/blob/master/mmdet/ops/fr/src/feature_refine_cuda.cpp + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void rotated_feature_align_forward_impl(const Tensor features, + const Tensor best_bboxes, + const float spatial_scale, + const int points, Tensor output) { + DISPATCH_DEVICE_IMPL(rotated_feature_align_forward_impl, features, + best_bboxes, spatial_scale, points, output); +} + +void rotated_feature_align_backward_impl(const Tensor top_grad, + const Tensor best_bboxes, + const float spatial_scale, + const int points, Tensor bottom_grad) { + DISPATCH_DEVICE_IMPL(rotated_feature_align_backward_impl, top_grad, + best_bboxes, spatial_scale, points, bottom_grad); +} + +void rotated_feature_align_forward(const Tensor features, + const Tensor best_bboxes, Tensor output, + const float spatial_scale, + const int points) { + rotated_feature_align_forward_impl(features, best_bboxes, spatial_scale, + points, output); +} + +void rotated_feature_align_backward(const Tensor top_grad, + const Tensor best_bboxes, + Tensor bottom_grad, + const float spatial_scale, + const int points) { + rotated_feature_align_backward_impl(top_grad, best_bboxes, spatial_scale, + points, bottom_grad); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/rotated_feature_align_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/rotated_feature_align_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..d4efaf1d3a3d8f382047202defc7546b8af6c48f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/rotated_feature_align_parrots.cpp @@ -0,0 +1,99 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include +#include +#include + +#include "rotated_feature_align_pytorch.h" +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void rotated_feature_align_forward_cuda_parrots( + CudaContext& ctx, const SSElement& attr, const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + float spatial_scale; + int points; + SSAttrs(attr) + .get("spatial_scale", spatial_scale) + .get("points", points) + .done(); + + auto features = buildATensor(ctx, ins[0]); + auto best_bboxes = buildATensor(ctx, ins[1]); + auto output = buildATensor(ctx, outs[0]); + rotated_feature_align_forward(features, best_bboxes, output, spatial_scale, + points); +} + +void rotated_feature_align_backward_cuda_parrots( + CudaContext& ctx, const SSElement& attr, const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + float spatial_scale; + int points; + SSAttrs(attr) + .get("spatial_scale", spatial_scale) + .get("points", points) + .done(); + + auto grad_output = buildATensor(ctx, ins[0]); + auto best_bboxes = buildATensor(ctx, ins[1]); + auto grad_input = buildATensor(ctx, outs[0]); + rotated_feature_align_backward(grad_output, best_bboxes, grad_input, + spatial_scale, points); +} +#endif + +void rotated_feature_align_forward_cpu_parrots( + HostContext& ctx, const SSElement& attr, const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + float spatial_scale; + int points; + SSAttrs(attr) + .get("spatial_scale", spatial_scale) + .get("points", points) + .done(); + + auto features = buildATensor(ctx, ins[0]); + auto best_bboxes = buildATensor(ctx, ins[1]); + auto output = buildATensor(ctx, outs[0]); + rotated_feature_align_forward(features, best_bboxes, output, spatial_scale, + points); +} + +void rotated_feature_align_backward_cpu_parrots( + HostContext& ctx, const SSElement& attr, const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + float spatial_scale; + int points; + SSAttrs(attr) + .get("spatial_scale", 
spatial_scale) + .get("points", points) + .done(); + + auto grad_output = buildATensor(ctx, ins[0]); + auto best_bboxes = buildATensor(ctx, ins[1]); + auto grad_input = buildATensor(ctx, outs[0]); + rotated_feature_align_backward(grad_output, best_bboxes, grad_input, + spatial_scale, points); +} + +PARROTS_EXTENSION_REGISTER(rotated_feature_align_forward) + .attr("spatial_scale") + .attr("points") + .input(2) + .output(1) + .apply(rotated_feature_align_forward_cpu_parrots) +#ifdef MMCV_WITH_CUDA + .apply(rotated_feature_align_forward_cuda_parrots) +#endif + .done(); + +PARROTS_EXTENSION_REGISTER(rotated_feature_align_backward) + .attr("spatial_scale") + .attr("points") + .input(2) + .output(1) + .apply(rotated_feature_align_backward_cpu_parrots) +#ifdef MMCV_WITH_CUDA + .apply(rotated_feature_align_backward_cuda_parrots) +#endif + .done(); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/rotated_feature_align_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/rotated_feature_align_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..9a695ee5e3de4b2d8f77e93fb06986967f3a35d0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/rotated_feature_align_pytorch.h @@ -0,0 +1,17 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#ifndef ROTATED_FEATURE_ALIGN_PYTORCH_H +#define ROTATED_FEATURE_ALIGN_PYTORCH_H +#include +using namespace at; + +void rotated_feature_align_forward(const Tensor features, + const Tensor best_bboxes, Tensor output, + const float spatial_scale, const int points); + +void rotated_feature_align_backward(const Tensor top_grad, + const Tensor best_bboxes, + Tensor bottom_grad, + const float spatial_scale, + const int points); + +#endif // ROTATED_FEATURE_ALIGN_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/sync_bn.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/sync_bn.cpp new file mode 100644 index 0000000000000000000000000000000000000000..fd5a513273a7bbce2cf41c790706fe4801f4c414 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/sync_bn.cpp @@ -0,0 +1,69 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void sync_bn_forward_mean_impl(const Tensor input, Tensor mean) { + DISPATCH_DEVICE_IMPL(sync_bn_forward_mean_impl, input, mean); +} + +void sync_bn_forward_var_impl(const Tensor input, const Tensor mean, + Tensor var) { + DISPATCH_DEVICE_IMPL(sync_bn_forward_var_impl, input, mean, var); +} + +void sync_bn_forward_output_impl(const Tensor input, const Tensor mean, + const Tensor var, Tensor running_mean, + Tensor running_var, const Tensor weight, + const Tensor bias, Tensor norm, Tensor std, + Tensor output, float eps, float momentum, + int group_size) { + DISPATCH_DEVICE_IMPL(sync_bn_forward_output_impl, input, mean, var, + running_mean, running_var, weight, bias, norm, std, + output, eps, momentum, group_size); +} + +void sync_bn_backward_param_impl(const Tensor grad_output, const Tensor norm, + Tensor grad_weight, Tensor grad_bias) { + DISPATCH_DEVICE_IMPL(sync_bn_backward_param_impl, grad_output, norm, + grad_weight, grad_bias); +} + +void sync_bn_backward_data_impl(const Tensor grad_output, const Tensor 
weight, + const Tensor grad_weight, + const Tensor grad_bias, const Tensor norm, + const Tensor std, Tensor grad_input) { + DISPATCH_DEVICE_IMPL(sync_bn_backward_data_impl, grad_output, weight, + grad_weight, grad_bias, norm, std, grad_input); +} + +void sync_bn_forward_mean(const Tensor input, Tensor mean) { + sync_bn_forward_mean_impl(input, mean); +} + +void sync_bn_forward_var(const Tensor input, const Tensor mean, Tensor var) { + sync_bn_forward_var_impl(input, mean, var); +} + +void sync_bn_forward_output(const Tensor input, const Tensor mean, + const Tensor var, const Tensor weight, + const Tensor bias, Tensor running_mean, + Tensor running_var, Tensor norm, Tensor std, + Tensor output, float eps, float momentum, + int group_size) { + sync_bn_forward_output_impl(input, mean, var, running_mean, running_var, + weight, bias, norm, std, output, eps, momentum, + group_size); +} + +void sync_bn_backward_param(const Tensor grad_output, const Tensor norm, + Tensor grad_weight, Tensor grad_bias) { + sync_bn_backward_param_impl(grad_output, norm, grad_weight, grad_bias); +} + +void sync_bn_backward_data(const Tensor grad_output, const Tensor weight, + const Tensor grad_weight, const Tensor grad_bias, + const Tensor norm, const Tensor std, + Tensor grad_input) { + sync_bn_backward_data_impl(grad_output, weight, grad_weight, grad_bias, norm, + std, grad_input); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/sync_bn_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/sync_bn_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..0b1855abd1cca4bd0cd831c3b86e50f273779339 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/sync_bn_parrots.cpp @@ -0,0 +1,111 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include +#include +#include + +#include "sync_bn_pytorch.h" +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void sync_bn_forward_mean_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + const auto& input = buildATensor(ctx, ins[0]); + auto mean = buildATensor(ctx, outs[0]); + sync_bn_forward_mean_cuda(input, mean); +} + +void sync_bn_forward_var_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + const auto& input = buildATensor(ctx, ins[0]); + const auto& mean = buildATensor(ctx, ins[1]); + auto var = buildATensor(ctx, outs[0]); + sync_bn_forward_var_cuda(input, mean, var); +} + +void sync_bn_forward_output_cuda_parrots(CudaContext& ctx, + const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + size_t group_size; + float eps, momentum; + SSAttrs(attr) + .get("eps", eps) + .get("momentum", momentum) + .get("group_size", group_size) + .done(); + + const auto& input = buildATensor(ctx, ins[0]); + const auto& mean = buildATensor(ctx, ins[1]); + const auto& var = buildATensor(ctx, ins[2]); + const auto& weight = buildATensor(ctx, ins[3]); + const auto& bias = buildATensor(ctx, ins[4]); + auto running_mean = buildATensor(ctx, outs[0]); + auto running_var = buildATensor(ctx, outs[1]); + auto norm = buildATensor(ctx, outs[2]); + auto std = buildATensor(ctx, outs[3]); + auto output = buildATensor(ctx, outs[4]); + sync_bn_forward_output_cuda(input, mean, var, running_mean, running_var, + weight, bias, norm, std, output, eps, momentum, + group_size); +} + +void sync_bn_backward_param_cuda_parrots(CudaContext& ctx, + const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + const auto& grad_output = buildATensor(ctx, ins[0]); + const auto& norm = buildATensor(ctx, ins[1]); + auto grad_weight = buildATensor(ctx, 
outs[0]); + auto grad_bias = buildATensor(ctx, outs[1]); + sync_bn_backward_param_cuda(grad_output, norm, grad_weight, grad_bias); +} + +void sync_bn_backward_data_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + const auto& grad_output = buildATensor(ctx, ins[0]); + const auto& weight = buildATensor(ctx, ins[1]); + const auto& grad_weight = buildATensor(ctx, ins[2]); + const auto& grad_bias = buildATensor(ctx, ins[3]); + const auto& norm = buildATensor(ctx, ins[4]); + const auto& std = buildATensor(ctx, ins[5]); + auto grad_input = buildATensor(ctx, outs[0]); + sync_bn_backward_data_cuda(grad_output, weight, grad_weight, grad_bias, norm, + std, grad_input); +} + +PARROTS_EXTENSION_REGISTER(sync_bn_forward_mean) + .input(1) + .output(1) + .apply(sync_bn_forward_mean_cuda_parrots) + .done(); + +PARROTS_EXTENSION_REGISTER(sync_bn_forward_var) + .input(2) + .output(1) + .apply(sync_bn_forward_var_cuda_parrots) + .done(); + +PARROTS_EXTENSION_REGISTER(sync_bn_forward_output) + .attr("eps") + .attr("momentum") + .attr("group_size") + .input(5) + .output(5) + .apply(sync_bn_forward_output_cuda_parrots) + .done(); + +PARROTS_EXTENSION_REGISTER(sync_bn_backward_param) + .input(2) + .output(2) + .apply(sync_bn_backward_param_cuda_parrots) + .done(); + +PARROTS_EXTENSION_REGISTER(sync_bn_backward_data) + .input(6) + .output(1) + .apply(sync_bn_backward_data_cuda_parrots) + .done(); +#endif diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/sync_bn_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/sync_bn_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..6bd6a7fada22ed512489f74d69445042b9aaf84b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/sync_bn_pytorch.h @@ -0,0 +1,26 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#ifndef SYNC_BN_PYTORCH_H +#define SYNC_BN_PYTORCH_H +#include +using namespace at; + +void sync_bn_forward_mean_cuda(const Tensor input, Tensor mean); + +void sync_bn_forward_var_cuda(const Tensor input, const Tensor mean, + Tensor var); + +void sync_bn_forward_output_cuda(const Tensor input, const Tensor mean, + const Tensor var, Tensor running_mean, + Tensor running_var, const Tensor weight, + const Tensor bias, Tensor norm, Tensor std, + Tensor output, float eps, float momentum, + int group_size); + +void sync_bn_backward_param_cuda(const Tensor grad_output, const Tensor norm, + Tensor grad_weight, Tensor grad_bias); + +void sync_bn_backward_data_cuda(const Tensor grad_output, const Tensor weight, + const Tensor grad_weight, + const Tensor grad_bias, const Tensor norm, + const Tensor std, Tensor grad_input); +#endif // SYNC_BN_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/three_interpolate.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/three_interpolate.cpp new file mode 100644 index 0000000000000000000000000000000000000000..1e0ec71bb3d3fdb8416dcc62cfda926cc45c9977 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/three_interpolate.cpp @@ -0,0 +1,33 @@ +// Modified from +// https://github.com/sshaoshuai/Pointnet2.PyTorch/tree/master/pointnet2/src/interpolate.cpp + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void three_interpolate_forward_impl(int b, int c, int m, int n, + const Tensor points, const Tensor idx, + const Tensor weight, Tensor out) { + DISPATCH_DEVICE_IMPL(three_interpolate_forward_impl, b, c, m, n, points, idx, + weight, out); +} + +void three_interpolate_backward_impl(int b, int c, int n, int m, + const Tensor grad_out, const Tensor idx, + const Tensor weight, Tensor grad_points) { + DISPATCH_DEVICE_IMPL(three_interpolate_backward_impl, b, c, n, m, grad_out, + idx, weight, grad_points); +} + +void three_interpolate_forward(Tensor 
points_tensor, Tensor idx_tensor, + Tensor weight_tensor, Tensor out_tensor, int b, + int c, int m, int n) { + three_interpolate_forward_impl(b, c, m, n, points_tensor, idx_tensor, + weight_tensor, out_tensor); +} + +void three_interpolate_backward(Tensor grad_out_tensor, Tensor idx_tensor, + Tensor weight_tensor, Tensor grad_points_tensor, + int b, int c, int n, int m) { + three_interpolate_backward_impl(b, c, n, m, grad_out_tensor, idx_tensor, + weight_tensor, grad_points_tensor); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/three_interpolate_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/three_interpolate_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..a71a90fd1e6b0321e14665265430a31c2934cb51 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/three_interpolate_parrots.cpp @@ -0,0 +1,74 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include +#include +#include + +#include "three_interpolate_pytorch.h" + +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void three_interpolate_forward_cuda_parrots(CudaContext& ctx, + const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int b, c, m, n; + SSAttrs(attr) + .get("b", b) + .get("c", c) + .get("m", m) + .get("n", n) + .done(); + + auto points_tensor = buildATensor(ctx, ins[0]); + auto idx_tensor = buildATensor(ctx, ins[1]); + auto weight_tensor = buildATensor(ctx, ins[2]); + + auto out_tensor = buildATensor(ctx, outs[0]); + + three_interpolate_forward(points_tensor, idx_tensor, weight_tensor, + out_tensor, b, c, m, n); +} + +void three_interpolate_backward_cuda_parrots(CudaContext& ctx, + const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int b, c, n, m; + SSAttrs(attr) + .get("b", b) + .get("c", c) + .get("n", n) + .get("m", m) + .done(); + + auto grad_out_tensor = buildATensor(ctx, ins[0]); + auto idx_tensor = 
buildATensor(ctx, ins[1]); + auto weight_tensor = buildATensor(ctx, ins[2]); + + auto grad_points_tensor = buildATensor(ctx, outs[0]); + + three_interpolate_backward(grad_out_tensor, idx_tensor, weight_tensor, + grad_points_tensor, b, c, n, m); +} + +PARROTS_EXTENSION_REGISTER(three_interpolate_forward) + .attr("b") + .attr("c") + .attr("m") + .attr("n") + .input(3) + .output(1) + .apply(three_interpolate_forward_cuda_parrots) + .done(); + +PARROTS_EXTENSION_REGISTER(three_interpolate_backward) + .attr("b") + .attr("c") + .attr("n") + .attr("m") + .input(3) + .output(1) + .apply(three_interpolate_backward_cuda_parrots) + .done(); +#endif diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/three_interpolate_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/three_interpolate_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..464c6d90051529e2f2c694bfda9cb15f5998c9c5 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/three_interpolate_pytorch.h @@ -0,0 +1,14 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#ifndef THREE_INTERPOLATE_PYTORCH_H +#define THREE_INTERPOLATE_PYTORCH_H +#include +using namespace at; + +void three_interpolate_forward(Tensor points_tensor, Tensor idx_tensor, + Tensor weight_tensor, Tensor out_tensor, int b, + int c, int m, int n); + +void three_interpolate_backward(Tensor grad_out_tensor, Tensor idx_tensor, + Tensor weight_tensor, Tensor grad_points_tensor, + int b, int c, int n, int m); +#endif // THREE_INTERPOLATE_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/three_nn.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/three_nn.cpp new file mode 100644 index 0000000000000000000000000000000000000000..b629200c0727cdec5ca4e0abd8ac65baacaa31f9 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/three_nn.cpp @@ -0,0 +1,18 @@ +// Modified from +// https://github.com/sshaoshuai/Pointnet2.PyTorch/tree/master/pointnet2/src/interpolate.cpp + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void three_nn_forward_impl(int b, int n, int m, const Tensor unknown, + const Tensor known, Tensor dist2, Tensor idx) { + DISPATCH_DEVICE_IMPL(three_nn_forward_impl, b, n, m, unknown, known, dist2, + idx); +} + +void three_nn_forward(Tensor unknown_tensor, Tensor known_tensor, + Tensor dist2_tensor, Tensor idx_tensor, int b, int n, + int m) { + three_nn_forward_impl(b, n, m, unknown_tensor, known_tensor, dist2_tensor, + idx_tensor); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/three_nn_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/three_nn_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..c28c7d216cc6c2d4ab55de26b7b9d9e0197642b3 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/three_nn_parrots.cpp @@ -0,0 +1,35 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include +#include +#include + +#include "three_nn_pytorch.h" + +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void three_nn_forward_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int b, n, m; + SSAttrs(attr).get("b", b).get("n", n).get("m", m).done(); + + auto unknown_tensor = buildATensor(ctx, ins[0]); + auto known_tensor = buildATensor(ctx, ins[1]); + + auto dist2_tensor = buildATensor(ctx, outs[0]); + auto idx_tensor = buildATensor(ctx, outs[1]); + + three_nn_forward(unknown_tensor, known_tensor, dist2_tensor, idx_tensor, b, n, + m); +} + +PARROTS_EXTENSION_REGISTER(three_nn_forward) + .attr("b") + .attr("n") + .attr("m") + .input(2) + .output(2) + .apply(three_nn_forward_cuda_parrots) + .done(); +#endif diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/three_nn_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/three_nn_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..6574fba0912bd87425de995db5ddb6c7b715381d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/three_nn_pytorch.h @@ -0,0 +1,10 @@ +// Copyright (c) OpenMMLab. All rights reserved +#ifndef THREE_NN_PYTORCH_H +#define THREE_NN_PYTORCH_H +#include +using namespace at; + +void three_nn_forward(Tensor unknown_tensor, Tensor known_tensor, + Tensor dist2_tensor, Tensor idx_tensor, int b, int n, + int m); +#endif // THREE_NN_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/tin_shift.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/tin_shift.cpp new file mode 100644 index 0000000000000000000000000000000000000000..b03f587541f17cae3c3f03f5cb8747d4b0208efc --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/tin_shift.cpp @@ -0,0 +1,20 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void tin_shift_forward_impl(Tensor input, Tensor shift, Tensor output) { + DISPATCH_DEVICE_IMPL(tin_shift_forward_impl, input, shift, output); +} + +void tin_shift_backward_impl(Tensor grad_output, Tensor shift, + Tensor grad_input) { + DISPATCH_DEVICE_IMPL(tin_shift_backward_impl, grad_output, shift, grad_input); +} + +void tin_shift_forward(Tensor input, Tensor shift, Tensor output) { + tin_shift_forward_impl(input, shift, output); +} + +void tin_shift_backward(Tensor grad_output, Tensor shift, Tensor grad_input) { + tin_shift_backward_impl(grad_output, shift, grad_input); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/tin_shift_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/tin_shift_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..b0920928e73a0af9650726420396c6a481e1b2bd --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/tin_shift_parrots.cpp @@ -0,0 +1,39 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include +#include +#include + +#include "tin_shift_pytorch.h" +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void tin_shift_forward_cuda_parrots(CudaContext &ctx, const SSElement &attr, + const OperatorBase::in_list_t &ins, + OperatorBase::out_list_t &outs) { + const auto &input = buildATensor(ctx, ins[0]); + const auto &shift = buildATensor(ctx, ins[1]); + auto output = buildATensor(ctx, outs[0]); + tin_shift_forward_cuda(input, shift, output); +} + +void tin_shift_backward_cuda_parrots(CudaContext &ctx, const SSElement &attr, + const OperatorBase::in_list_t &ins, + OperatorBase::out_list_t &outs) { + const auto &grad_output = buildATensor(ctx, ins[0]); + const auto &shift = buildATensor(ctx, ins[1]); + auto grad_input = buildATensor(ctx, outs[0]); + tin_shift_backward_cuda(grad_output, shift, grad_input); +} + +PARROTS_EXTENSION_REGISTER(tin_shift_forward) + .input(2) + .output(1) + .apply(tin_shift_forward_cuda_parrots) + .done(); + +PARROTS_EXTENSION_REGISTER(tin_shift_backward) + .input(2) + .output(1) + .apply(tin_shift_backward_cuda_parrots) + .done(); +#endif diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/tin_shift_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/tin_shift_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..fe72383764cd0ed13fd8b74938027ea9db992d52 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/tin_shift_pytorch.h @@ -0,0 +1,11 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#ifndef TIN_SHIFT_PYTORCH_H +#define TIN_SHIFT_PYTORCH_H +#include +using namespace at; + +void tin_shift_forward_cuda(Tensor input, Tensor shift, Tensor output); + +void tin_shift_backward_cuda(Tensor grad_output, Tensor shift, + Tensor grad_input); +#endif // TIN_SHIFT_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/upfirdn2d.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/upfirdn2d.cpp new file mode 100644 index 0000000000000000000000000000000000000000..dd325bd7887a49b5f0ccd134604f24c0fd40fc10 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/upfirdn2d.cpp @@ -0,0 +1,118 @@ +// Modified from +// https://github.com/rosinality/stylegan2-pytorch/blob/master/op/upfirdn2d.cpp + +/* +Copyright (c) 2021, NVIDIA Corporation. All rights reserved. + +NVIDIA Source Code License for StyleGAN2 with Adaptive Discriminator +Augmentation (ADA) +======================================================================= + +1. Definitions + +"Licensor" means any person or entity that distributes its Work. + +"Software" means the original work of authorship made available under +this License. + +"Work" means the Software and any additions to or derivative works of +the Software that are made available under this License. + +The terms "reproduce," "reproduction," "derivative works," and +"distribution" have the meaning as provided under U.S. copyright law; +provided, however, that for the purposes of this License, derivative +works shall not include works that remain separable from, or merely +link (or bind by name) to the interfaces of, the Work. + +Works, including the Software, are "made available" under this License +by including in or with the Work either (a) a copyright notice +referencing the applicability of this License to the Work, or (b) a +copy of this License. + +2. License Grants + + 2.1 Copyright Grant. 
Subject to the terms and conditions of this + License, each Licensor grants to you a perpetual, worldwide, + non-exclusive, royalty-free, copyright license to reproduce, + prepare derivative works of, publicly display, publicly perform, + sublicense and distribute its Work and any resulting derivative + works in any form. + +3. Limitations + + 3.1 Redistribution. You may reproduce or distribute the Work only + if (a) you do so under this License, (b) you include a complete + copy of this License with your distribution, and (c) you retain + without modification any copyright, patent, trademark, or + attribution notices that are present in the Work. + + 3.2 Derivative Works. You may specify that additional or different + terms apply to the use, reproduction, and distribution of your + derivative works of the Work ("Your Terms") only if (a) Your Terms + provide that the use limitation in Section 3.3 applies to your + derivative works, and (b) you identify the specific derivative + works that are subject to Your Terms. Notwithstanding Your Terms, + this License (including the redistribution requirements in Section + 3.1) will continue to apply to the Work itself. + + 3.3 Use Limitation. The Work and any derivative works thereof only + may be used or intended for use non-commercially. Notwithstanding + the foregoing, NVIDIA and its affiliates may use the Work and any + derivative works commercially. As used herein, "non-commercially" + means for research or evaluation purposes only. + + 3.4 Patent Claims. If you bring or threaten to bring a patent claim + against any Licensor (including any claim, cross-claim or + counterclaim in a lawsuit) to enforce any patents that you allege + are infringed by any Work, then your rights under this License from + such Licensor (including the grant in Section 2.1) will terminate + immediately. + + 3.5 Trademarks. 
This License does not grant any rights to use any + Licensor’s or its affiliates’ names, logos, or trademarks, except + as necessary to reproduce the notices described in this License. + + 3.6 Termination. If you violate any term of this License, then your + rights under this License (including the grant in Section 2.1) will + terminate immediately. + +4. Disclaimer of Warranty. + +THE WORK IS PROVIDED "AS IS" WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WARRANTIES OR CONDITIONS OF +MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR +NON-INFRINGEMENT. YOU BEAR THE RISK OF UNDERTAKING ANY ACTIVITIES UNDER +THIS LICENSE. + +5. Limitation of Liability. + +EXCEPT AS PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL +THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE +SHALL ANY LICENSOR BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT, +INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF +OR RELATED TO THIS LICENSE, THE USE OR INABILITY TO USE THE WORK +(INCLUDING BUT NOT LIMITED TO LOSS OF GOODWILL, BUSINESS INTERRUPTION, +LOST PROFITS OR DATA, COMPUTER FAILURE OR MALFUNCTION, OR ANY OTHER +COMMERCIAL DAMAGES OR LOSSES), EVEN IF THE LICENSOR HAS BEEN ADVISED OF +THE POSSIBILITY OF SUCH DAMAGES. 
+ +======================================================================= +*/ + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +torch::Tensor upfirdn2d_op_impl(const torch::Tensor& input, + const torch::Tensor& kernel, int up_x, int up_y, + int down_x, int down_y, int pad_x0, int pad_x1, + int pad_y0, int pad_y1) { + return DISPATCH_DEVICE_IMPL(upfirdn2d_op_impl, input, kernel, up_x, up_y, + down_x, down_y, pad_x0, pad_x1, pad_y0, pad_y1); +} + +torch::Tensor upfirdn2d(const torch::Tensor& input, const torch::Tensor& kernel, + int up_x, int up_y, int down_x, int down_y, int pad_x0, + int pad_x1, int pad_y0, int pad_y1) { + return upfirdn2d_op_impl(input, kernel, up_x, up_y, down_x, down_y, pad_x0, + pad_x1, pad_y0, pad_y1); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/upfirdn2d_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/upfirdn2d_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..f0c50db5cdfca872a8231c26d6f578d0fdc171f5 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/upfirdn2d_parrots.cpp @@ -0,0 +1,47 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include + +#include +#include +#include +using namespace at; +using namespace parrots; + +torch::Tensor upfirdn2d(const Tensor &input, const Tensor &kernel, int up_x, + int up_y, int down_x, int down_y, int pad_x0, + int pad_x1, int pad_y0, int pad_y1); + +void upfirdn2d_parrots(CudaContext &ctx, const SSElement &attr, + const OperatorBase::in_list_t &ins, + OperatorBase::out_list_t &outs) { + int up_x, up_y, down_x, down_y, pad_x0, pad_x1, pad_y0, pad_y1; + const auto &input = buildATensor(ctx, ins[0]); + const auto &kernel = buildATensor(ctx, ins[1]); + SSAttrs(attr) + .get("up_x", up_x) + .get("up_y", up_y) + .get("down_x", down_x) + .get("down_y", down_y) + .get("pad_x0", pad_x0) + .get("pad_x1", pad_x1) + .get("pad_y0", pad_y0) + .get("pad_y1", pad_y1) + .done(); + auto out = upfirdn2d(input, kernel, up_x, up_y, down_x, down_y, pad_x0, + pad_x1, pad_y0, pad_y1); + updateDArray(ctx, out, outs[0]); +} + +PARROTS_EXTENSION_REGISTER(upfirdn2d) + .attr("up_x") + .attr("up_y") + .attr("down_x") + .attr("down_y") + .attr("pad_x0") + .attr("pad_x1") + .attr("pad_y0") + .attr("pad_y1") + .input(2) + .output(1) + .apply(upfirdn2d_parrots) + .done(); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/voxelization.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/voxelization.cpp new file mode 100644 index 0000000000000000000000000000000000000000..7946be6178ad5eae64958b4631c1cabec2a04eee --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/voxelization.cpp @@ -0,0 +1,74 @@ +// Copyright (c) OpenMMLab. All rights reserved. 
+#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +int hard_voxelize_forward_impl(const at::Tensor &points, at::Tensor &voxels, + at::Tensor &coors, + at::Tensor &num_points_per_voxel, + const std::vector voxel_size, + const std::vector coors_range, + const int max_points, const int max_voxels, + const int NDim = 3) { + return DISPATCH_DEVICE_IMPL(hard_voxelize_forward_impl, points, voxels, coors, + num_points_per_voxel, voxel_size, coors_range, + max_points, max_voxels, NDim); +} + +int nondeterministic_hard_voxelize_forward_impl( + const at::Tensor &points, at::Tensor &voxels, at::Tensor &coors, + at::Tensor &num_points_per_voxel, const std::vector voxel_size, + const std::vector coors_range, const int max_points, + const int max_voxels, const int NDim = 3) { + return DISPATCH_DEVICE_IMPL(nondeterministic_hard_voxelize_forward_impl, + points, voxels, coors, num_points_per_voxel, + voxel_size, coors_range, max_points, max_voxels, + NDim); +} + +void dynamic_voxelize_forward_impl(const at::Tensor &points, at::Tensor &coors, + const std::vector voxel_size, + const std::vector coors_range, + const int NDim = 3) { + DISPATCH_DEVICE_IMPL(dynamic_voxelize_forward_impl, points, coors, voxel_size, + coors_range, NDim); +} + +void hard_voxelize_forward(const at::Tensor &points, + const at::Tensor &voxel_size, + const at::Tensor &coors_range, at::Tensor &voxels, + at::Tensor &coors, at::Tensor &num_points_per_voxel, + at::Tensor &voxel_num, const int max_points, + const int max_voxels, const int NDim = 3, + const bool deterministic = true) { + int64_t *voxel_num_data = voxel_num.data_ptr(); + std::vector voxel_size_v( + voxel_size.data_ptr(), + voxel_size.data_ptr() + voxel_size.numel()); + std::vector coors_range_v( + coors_range.data_ptr(), + coors_range.data_ptr() + coors_range.numel()); + + if (deterministic) { + *voxel_num_data = hard_voxelize_forward_impl( + points, voxels, coors, num_points_per_voxel, voxel_size_v, + coors_range_v, 
max_points, max_voxels, NDim); + } else { + *voxel_num_data = nondeterministic_hard_voxelize_forward_impl( + points, voxels, coors, num_points_per_voxel, voxel_size_v, + coors_range_v, max_points, max_voxels, NDim); + } +} + +void dynamic_voxelize_forward(const at::Tensor &points, + const at::Tensor &voxel_size, + const at::Tensor &coors_range, at::Tensor &coors, + const int NDim = 3) { + std::vector voxel_size_v( + voxel_size.data_ptr(), + voxel_size.data_ptr() + voxel_size.numel()); + std::vector coors_range_v( + coors_range.data_ptr(), + coors_range.data_ptr() + coors_range.numel()); + dynamic_voxelize_forward_impl(points, coors, voxel_size_v, coors_range_v, + NDim); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/voxelization_parrots.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/voxelization_parrots.cpp new file mode 100644 index 0000000000000000000000000000000000000000..90e2a4445c217a49ecddf064455874b1be12a14f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/voxelization_parrots.cpp @@ -0,0 +1,113 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include +#include +#include + +#include "voxelization_pytorch.h" + +using namespace parrots; + +#ifdef MMCV_WITH_CUDA +void hard_voxelize_forward_cuda_parrots(CudaContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int max_points, max_voxels, NDim; + bool deterministic; + SSAttrs(attr) + .get("max_points", max_points) + .get("max_voxels", max_voxels) + .get("NDim", NDim) + .get("deterministic", deterministic) + .done(); + const auto& points = buildATensor(ctx, ins[0]); + const auto& voxel_size = buildATensor(ctx, ins[1]); + const auto& coors_range = buildATensor(ctx, ins[2]); + + auto voxels = buildATensor(ctx, outs[0]); + auto coors = buildATensor(ctx, outs[1]); + auto num_points_per_voxel = buildATensor(ctx, outs[2]); + auto voxel_num = buildATensor(ctx, outs[3]); + + hard_voxelize_forward(points, voxel_size, coors_range, voxels, coors, + num_points_per_voxel, voxel_num, max_points, max_voxels, + NDim, deterministic); +} + +void dynamic_voxelize_forward_cuda_parrots(CudaContext& ctx, + const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int NDim; + SSAttrs(attr).get("NDim", NDim).done(); + const auto& points = buildATensor(ctx, ins[0]); + const auto& voxel_size = buildATensor(ctx, ins[1]); + const auto& coors_range = buildATensor(ctx, ins[2]); + + auto coors = buildATensor(ctx, outs[0]); + + dynamic_voxelize_forward(points, voxel_size, coors_range, coors, NDim); +} +#endif + +void hard_voxelize_forward_cpu_parrots(HostContext& ctx, const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int max_points, max_voxels, NDim; + bool deterministic; + SSAttrs(attr) + .get("max_points", max_points) + .get("max_voxels", max_voxels) + .get("NDim", NDim) + .get("deterministic", deterministic) + .done(); + const auto& points = buildATensor(ctx, ins[0]); + const auto& voxel_size = buildATensor(ctx, ins[1]); 
+ const auto& coors_range = buildATensor(ctx, ins[2]); + + auto voxels = buildATensor(ctx, outs[0]); + auto coors = buildATensor(ctx, outs[1]); + auto num_points_per_voxel = buildATensor(ctx, outs[2]); + auto voxel_num = buildATensor(ctx, outs[3]); + + hard_voxelize_forward(points, voxel_size, coors_range, voxels, coors, + num_points_per_voxel, voxel_num, max_points, max_voxels, + NDim, deterministic); +} + +void dynamic_voxelize_forward_cpu_parrots(HostContext& ctx, + const SSElement& attr, + const OperatorBase::in_list_t& ins, + OperatorBase::out_list_t& outs) { + int NDim; + SSAttrs(attr).get("NDim", NDim).done(); + const auto& points = buildATensor(ctx, ins[0]); + const auto& voxel_size = buildATensor(ctx, ins[1]); + const auto& coors_range = buildATensor(ctx, ins[2]); + + auto coors = buildATensor(ctx, outs[0]); + + dynamic_voxelize_forward(points, voxel_size, coors_range, coors, NDim); +} + +PARROTS_EXTENSION_REGISTER(hard_voxelize_forward) + .attr("max_points") + .attr("max_voxels") + .attr("NDim") + .attr("deterministic") + .input(3) + .output(4) + .apply(hard_voxelize_forward_cpu_parrots) +#ifdef MMCV_WITH_CUDA + .apply(hard_voxelize_forward_cuda_parrots) +#endif + .done(); + +PARROTS_EXTENSION_REGISTER(dynamic_voxelize_forward) + .attr("NDim") + .input(3) + .output(1) + .apply(dynamic_voxelize_forward_cpu_parrots) +#ifdef MMCV_WITH_CUDA + .apply(dynamic_voxelize_forward_cuda_parrots) +#endif + .done(); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/voxelization_pytorch.h b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/voxelization_pytorch.h new file mode 100644 index 0000000000000000000000000000000000000000..0019d51912cb4b8077147e553925ab107bc216ce --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/parrots/voxelization_pytorch.h @@ -0,0 +1,20 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#ifndef VOXELIZATION_PYTORCH_H +#define VOXELIZATION_PYTORCH_H +#include +using namespace at; + +void hard_voxelize_forward(const at::Tensor &points, + const at::Tensor &voxel_size, + const at::Tensor &coors_range, at::Tensor &voxels, + at::Tensor &coors, at::Tensor &num_points_per_voxel, + at::Tensor &voxel_num, const int max_points, + const int max_voxels, const int NDim = 3, + const bool deterministic = true); + +void dynamic_voxelize_forward(const at::Tensor &points, + const at::Tensor &voxel_size, + const at::Tensor &coors_range, at::Tensor &coors, + const int NDim = 3); + +#endif // VOXELIZATION_PYTORCH_H diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/active_rotated_filter.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/active_rotated_filter.cpp new file mode 100644 index 0000000000000000000000000000000000000000..e1ead1f8e4700d019fff7b25034e2475087040c8 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/active_rotated_filter.cpp @@ -0,0 +1,28 @@ +// Copyright (c) OpenMMLab. All rights reserved. 
+// Modified from +// https://github.com/csuhan/s2anet/blob/master/mmdet/ops/orn/src/ActiveRotatingFilter.h + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void active_rotated_filter_forward_impl(const Tensor input, + const Tensor indices, Tensor output) { + DISPATCH_DEVICE_IMPL(active_rotated_filter_forward_impl, input, indices, + output); +} + +void active_rotated_filter_backward_impl(const Tensor grad_out, + const Tensor indices, Tensor grad_in) { + DISPATCH_DEVICE_IMPL(active_rotated_filter_backward_impl, grad_out, indices, + grad_in); +} + +void active_rotated_filter_forward(const Tensor input, const Tensor indices, + Tensor output) { + active_rotated_filter_forward_impl(input, indices, output); +} + +void active_rotated_filter_backward(const Tensor grad_out, const Tensor indices, + Tensor grad_in) { + active_rotated_filter_backward_impl(grad_out, indices, grad_in); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/assign_score_withk.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/assign_score_withk.cpp new file mode 100644 index 0000000000000000000000000000000000000000..9076277181c48c7c8f236cb9da79a83c5d38d47f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/assign_score_withk.cpp @@ -0,0 +1,42 @@ +// Modified from +// https://github.com/CVMI-Lab/PAConv/tree/main/scene_seg/lib/paconv_lib/src/gpu +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void assign_score_withk_forward_impl(int B, int N0, int N1, int M, int K, int O, + int aggregate, const Tensor& points, + const Tensor& centers, + const Tensor& scores, + const Tensor& knn_idx, Tensor& output) { + DISPATCH_DEVICE_IMPL(assign_score_withk_forward_impl, B, N0, N1, M, K, O, + aggregate, points, centers, scores, knn_idx, output); +} + +void assign_score_withk_backward_impl( + int B, int N0, int N1, int M, int K, int O, int aggregate, + const Tensor& grad_out, const Tensor& points, const Tensor& 
centers, + const Tensor& scores, const Tensor& knn_idx, Tensor& grad_points, + Tensor& grad_centers, Tensor& grad_scores) { + DISPATCH_DEVICE_IMPL(assign_score_withk_backward_impl, B, N0, N1, M, K, O, + aggregate, grad_out, points, centers, scores, knn_idx, + grad_points, grad_centers, grad_scores); +} + +void assign_score_withk_forward(const Tensor& points, const Tensor& centers, + const Tensor& scores, const Tensor& knn_idx, + Tensor& output, int B, int N0, int N1, int M, + int K, int O, int aggregate) { + assign_score_withk_forward_impl(B, N0, N1, M, K, O, aggregate, points, + centers, scores, knn_idx, output); +} + +void assign_score_withk_backward(const Tensor& grad_out, const Tensor& points, + const Tensor& centers, const Tensor& scores, + const Tensor& knn_idx, Tensor& grad_points, + Tensor& grad_centers, Tensor& grad_scores, + int B, int N0, int N1, int M, int K, int O, + int aggregate) { + assign_score_withk_backward_impl(B, N0, N1, M, K, O, aggregate, grad_out, + points, centers, scores, knn_idx, + grad_points, grad_centers, grad_scores); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/ball_query.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/ball_query.cpp new file mode 100644 index 0000000000000000000000000000000000000000..b0534db5ce136ed43e2f72a497281fb1968f41f9 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/ball_query.cpp @@ -0,0 +1,38 @@ +// Modified from +// https://github.com/sshaoshuai/Pointnet2.PyTorch/tree/master/pointnet2/src/ball_query.cpp + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void ball_query_forward_impl(int b, int n, int m, float min_radius, + float max_radius, int nsample, + const Tensor new_xyz, const Tensor xyz, + Tensor idx) { + DISPATCH_DEVICE_IMPL(ball_query_forward_impl, b, n, m, min_radius, max_radius, + nsample, new_xyz, xyz, idx); +} + +void ball_query_forward(Tensor new_xyz_tensor, Tensor xyz_tensor, + Tensor idx_tensor, int b, int n, 
int m, + float min_radius, float max_radius, int nsample) { + ball_query_forward_impl(b, n, m, min_radius, max_radius, nsample, + new_xyz_tensor, xyz_tensor, idx_tensor); +} + +void stack_ball_query_forward_impl(float max_radius, int nsample, + const Tensor new_xyz, + const Tensor new_xyz_batch_cnt, + const Tensor xyz, const Tensor xyz_batch_cnt, + Tensor idx) { + DISPATCH_DEVICE_IMPL(stack_ball_query_forward_impl, max_radius, nsample, + new_xyz, new_xyz_batch_cnt, xyz, xyz_batch_cnt, idx); +} + +void stack_ball_query_forward(Tensor new_xyz_tensor, Tensor new_xyz_batch_cnt, + Tensor xyz_tensor, Tensor xyz_batch_cnt, + Tensor idx_tensor, float max_radius, + int nsample) { + stack_ball_query_forward_impl(max_radius, nsample, new_xyz_tensor, + new_xyz_batch_cnt, xyz_tensor, xyz_batch_cnt, + idx_tensor); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/bbox_overlaps.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/bbox_overlaps.cpp new file mode 100644 index 0000000000000000000000000000000000000000..187216fb01a307906a6fff8d7c10fc4efa1b9b3a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/bbox_overlaps.cpp @@ -0,0 +1,14 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void bbox_overlaps_impl(const Tensor bboxes1, const Tensor bboxes2, Tensor ious, + const int mode, const bool aligned, const int offset) { + DISPATCH_DEVICE_IMPL(bbox_overlaps_impl, bboxes1, bboxes2, ious, mode, + aligned, offset); +} + +void bbox_overlaps(const Tensor bboxes1, const Tensor bboxes2, Tensor ious, + const int mode, const bool aligned, const int offset) { + bbox_overlaps_impl(bboxes1, bboxes2, ious, mode, aligned, offset); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/bezier_align.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/bezier_align.cpp new file mode 100644 index 0000000000000000000000000000000000000000..b8521d66cb2ec1dae27cb215f8a42c8a61709073 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/bezier_align.cpp @@ -0,0 +1,38 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void bezier_align_forward_impl(Tensor input, Tensor rois, Tensor output, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + bool aligned) { + DISPATCH_DEVICE_IMPL(bezier_align_forward_impl, input, rois, output, + aligned_height, aligned_width, spatial_scale, + sampling_ratio, aligned); +} + +void bezier_align_backward_impl(Tensor grad_output, Tensor rois, + Tensor grad_input, int aligned_height, + int aligned_width, float spatial_scale, + int sampling_ratio, bool aligned) { + DISPATCH_DEVICE_IMPL(bezier_align_backward_impl, grad_output, rois, + grad_input, aligned_height, aligned_width, spatial_scale, + sampling_ratio, aligned); +} + +void bezier_align_forward(Tensor input, Tensor rois, Tensor output, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + bool aligned) { + bezier_align_forward_impl(input, rois, output, aligned_height, aligned_width, + spatial_scale, sampling_ratio, 
aligned); +} + +void bezier_align_backward(Tensor grad_output, Tensor rois, Tensor grad_input, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + bool aligned) { + bezier_align_backward_impl(grad_output, rois, grad_input, aligned_height, + aligned_width, spatial_scale, sampling_ratio, + aligned); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/border_align.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/border_align.cpp new file mode 100644 index 0000000000000000000000000000000000000000..565de689913413ab106884365e6dc1edfa940de0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/border_align.cpp @@ -0,0 +1,30 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void border_align_forward_impl(const Tensor &input, const Tensor &boxes, + Tensor output, Tensor argmax_idx, + const int pool_size) { + DISPATCH_DEVICE_IMPL(border_align_forward_impl, input, boxes, output, + argmax_idx, pool_size); +} + +void border_align_backward_impl(const Tensor &grad_output, const Tensor &boxes, + const Tensor &argmax_idx, Tensor grad_input, + const int pool_size) { + DISPATCH_DEVICE_IMPL(border_align_backward_impl, grad_output, boxes, + argmax_idx, grad_input, pool_size); +} + +void border_align_forward(const Tensor &input, const Tensor &boxes, + Tensor output, Tensor argmax_idx, + const int pool_size) { + border_align_forward_impl(input, boxes, output, argmax_idx, pool_size); +} + +void border_align_backward(const Tensor &grad_output, const Tensor &boxes, + const Tensor &argmax_idx, Tensor grad_input, + const int pool_size) { + border_align_backward_impl(grad_output, boxes, argmax_idx, grad_input, + pool_size); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/box_iou_quadri.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/box_iou_quadri.cpp new file mode 100644 index 
0000000000000000000000000000000000000000..48c928106d955e983e824efc1682ad6c0c514791 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/box_iou_quadri.cpp @@ -0,0 +1,17 @@ +// Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void box_iou_quadri_impl(const Tensor boxes1, const Tensor boxes2, Tensor ious, + const int mode_flag, const bool aligned) { + DISPATCH_DEVICE_IMPL(box_iou_quadri_impl, boxes1, boxes2, ious, mode_flag, + aligned); +} + +// Interface for Python +// inline is needed to prevent multiple function definitions when this header is +// included by different cpps +void box_iou_quadri(const Tensor boxes1, const Tensor boxes2, Tensor ious, + const int mode_flag, const bool aligned) { + box_iou_quadri_impl(boxes1, boxes2, ious, mode_flag, aligned); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/box_iou_rotated.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/box_iou_rotated.cpp new file mode 100644 index 0000000000000000000000000000000000000000..a2a4e0953a5575f72c167bd668c6b6e758ebae87 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/box_iou_rotated.cpp @@ -0,0 +1,19 @@ +// Copyright (c) Facebook, Inc. and its affiliates. 
All Rights Reserved +// modified from +// https://github.com/facebookresearch/detectron2/blob/master/detectron2/layers/csrc/box_iou_rotated/box_iou_rotated.h +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void box_iou_rotated_impl(const Tensor boxes1, const Tensor boxes2, Tensor ious, + const int mode_flag, const bool aligned) { + DISPATCH_DEVICE_IMPL(box_iou_rotated_impl, boxes1, boxes2, ious, mode_flag, + aligned); +} + +// Interface for Python +// inline is needed to prevent multiple function definitions when this header is +// included by different cpps +void box_iou_rotated(const Tensor boxes1, const Tensor boxes2, Tensor ious, + const int mode_flag, const bool aligned) { + box_iou_rotated_impl(boxes1, boxes2, ious, mode_flag, aligned); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/carafe.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/carafe.cpp new file mode 100644 index 0000000000000000000000000000000000000000..a563aed94f04e32614e38062c4e7f4250c6dafe6 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/carafe.cpp @@ -0,0 +1,38 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void carafe_forward_impl(Tensor features, Tensor masks, Tensor rfeatures, + Tensor routput, Tensor rmasks, Tensor output, + int kernel_size, int group_size, int scale_factor) { + DISPATCH_DEVICE_IMPL(carafe_forward_impl, features, masks, rfeatures, routput, + rmasks, output, kernel_size, group_size, scale_factor); +} + +void carafe_backward_impl(Tensor top_grad, Tensor rfeatures, Tensor masks, + Tensor rtop_grad, Tensor rbottom_grad_hs, + Tensor rbottom_grad, Tensor rmask_grad, + Tensor bottom_grad, Tensor mask_grad, int kernel_size, + int group_size, int scale_factor) { + DISPATCH_DEVICE_IMPL(carafe_backward_impl, top_grad, rfeatures, masks, + rtop_grad, rbottom_grad_hs, rbottom_grad, rmask_grad, + bottom_grad, mask_grad, kernel_size, group_size, + scale_factor); +} + +void carafe_forward(Tensor features, Tensor masks, Tensor rfeatures, + Tensor routput, Tensor rmasks, Tensor output, + int kernel_size, int group_size, int scale_factor) { + carafe_forward_impl(features, masks, rfeatures, routput, rmasks, output, + kernel_size, group_size, scale_factor); +} + +void carafe_backward(Tensor top_grad, Tensor rfeatures, Tensor masks, + Tensor rtop_grad, Tensor rbottom_grad_hs, + Tensor rbottom_grad, Tensor rmask_grad, Tensor bottom_grad, + Tensor mask_grad, int kernel_size, int group_size, + int scale_factor) { + carafe_backward_impl(top_grad, rfeatures, masks, rtop_grad, rbottom_grad_hs, + rbottom_grad, rmask_grad, bottom_grad, mask_grad, + kernel_size, group_size, scale_factor); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/carafe_naive.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/carafe_naive.cpp new file mode 100644 index 0000000000000000000000000000000000000000..6e8917a61d93c7e6613566902cb00623ea89444e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/carafe_naive.cpp @@ -0,0 +1,32 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void carafe_naive_forward_impl(Tensor features, Tensor masks, Tensor output, + int kernel_size, int group_size, + int scale_factor) { + DISPATCH_DEVICE_IMPL(carafe_naive_forward_impl, features, masks, output, + kernel_size, group_size, scale_factor); +} + +void carafe_naive_backward_impl(Tensor top_grad, Tensor features, Tensor masks, + Tensor bottom_grad, Tensor mask_grad, + int kernel_size, int group_size, + int scale_factor) { + DISPATCH_DEVICE_IMPL(carafe_naive_backward_impl, top_grad, features, masks, + bottom_grad, mask_grad, kernel_size, group_size, + scale_factor); +} + +void carafe_naive_forward(Tensor features, Tensor masks, Tensor output, + int kernel_size, int group_size, int scale_factor) { + carafe_naive_forward_impl(features, masks, output, kernel_size, group_size, + scale_factor); +} + +void carafe_naive_backward(Tensor top_grad, Tensor features, Tensor masks, + Tensor bottom_grad, Tensor mask_grad, + int kernel_size, int group_size, int scale_factor) { + carafe_naive_backward_impl(top_grad, features, masks, bottom_grad, mask_grad, + kernel_size, group_size, scale_factor); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/chamfer_distance.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/chamfer_distance.cpp new file mode 100644 index 0000000000000000000000000000000000000000..dcff69893185d7cc52d8048d300b45ccfe0b3968 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/chamfer_distance.cpp @@ -0,0 +1,35 @@ +// Copyright (c) OpenMMLab. All rights reserved. 
+// Modified from +// https://github.com/chrdiller/pyTorchChamferDistance/blob/master/chamfer_distance/chamfer_distance.cpp + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void chamfer_distance_forward_impl(const Tensor xyz1, const Tensor xyz2, + const Tensor dist1, const Tensor dist2, + const Tensor idx1, const Tensor idx2) { + DISPATCH_DEVICE_IMPL(chamfer_distance_forward_impl, xyz1, xyz2, dist1, dist2, + idx1, idx2); +} + +void chamfer_distance_backward_impl(const Tensor xyz1, const Tensor xyz2, + Tensor idx1, Tensor idx2, Tensor graddist1, + Tensor graddist2, Tensor gradxyz1, + Tensor gradxyz2) { + DISPATCH_DEVICE_IMPL(chamfer_distance_backward_impl, xyz1, xyz2, idx1, idx2, + graddist1, graddist2, gradxyz1, gradxyz2); +} + +void chamfer_distance_forward(const Tensor xyz1, const Tensor xyz2, + const Tensor dist1, const Tensor dist2, + const Tensor idx1, const Tensor idx2) { + chamfer_distance_forward_impl(xyz1, xyz2, dist1, dist2, idx1, idx2); +} + +void chamfer_distance_backward(const Tensor xyz1, const Tensor xyz2, + Tensor idx1, Tensor idx2, Tensor graddist1, + Tensor graddist2, Tensor gradxyz1, + Tensor gradxyz2) { + chamfer_distance_backward_impl(xyz1, xyz2, idx1, idx2, graddist1, graddist2, + gradxyz1, gradxyz2); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/contour_expand.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/contour_expand.cpp new file mode 100644 index 0000000000000000000000000000000000000000..586c48ee44b6b7dbb24573b4a2d2ecf499a56d0b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/contour_expand.cpp @@ -0,0 +1,111 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +// It is modified from https://github.com/whai362/PSENet +#include +#include + +#include "pytorch_cpp_helper.hpp" + +using namespace std; + +class Point2d { + public: + int x; + int y; + + Point2d() : x(0), y(0) {} + Point2d(int _x, int _y) : x(_x), y(_y) {} +}; + +void kernel_dilate(const uint8_t *data, IntArrayRef data_shape, + const int *label_map, int &label_num, int &min_area, + vector> &text_line) { + std::vector area(label_num + 1); + int kernel_num = data_shape[0]; + int height = data_shape[1]; + int width = data_shape[2]; + + for (int x = 0; x < height; ++x) { + for (int y = 0; y < width; ++y) { + int label = label_map[x * width + y]; + if (label == 0) continue; + area[label] += 1; + } + } + + queue queue, next_queue; + for (int x = 0; x < height; ++x) { + vector row(width); + for (int y = 0; y < width; ++y) { + int label = label_map[x * width + y]; + if (label == 0) continue; + if (area[label] < min_area) continue; + + Point2d point(x, y); + queue.push(point); + row[y] = label; + } + text_line.emplace_back(row); + } + + int dx[] = {-1, 1, 0, 0}; + int dy[] = {0, 0, -1, 1}; + vector kernel_step(kernel_num); + std::for_each(kernel_step.begin(), kernel_step.end(), + [=](int &k) { return k * height * width; }); + + for (int kernel_id = kernel_num - 2; kernel_id >= 0; --kernel_id) { + while (!queue.empty()) { + Point2d point = queue.front(); + queue.pop(); + int x = point.x; + int y = point.y; + int label = text_line[x][y]; + + bool is_edge = true; + for (int d = 0; d < 4; ++d) { + int tmp_x = x + dx[d]; + int tmp_y = y + dy[d]; + + if (tmp_x < 0 || tmp_x >= height) continue; + if (tmp_y < 0 || tmp_y >= width) continue; + int kernel_value = data[kernel_step[kernel_id] + tmp_x * width + tmp_y]; + if (kernel_value == 0) continue; + if (text_line[tmp_x][tmp_y] > 0) continue; + + Point2d point(tmp_x, tmp_y); + queue.push(point); + text_line[tmp_x][tmp_y] = label; + is_edge = false; + } + + if (is_edge) { + next_queue.push(point); + } + } + 
swap(queue, next_queue); + } +} + +std::vector> contour_expand(Tensor kernel_mask, + Tensor internal_kernel_label, + int min_kernel_area, + int kernel_num) { + kernel_mask = kernel_mask.contiguous(); + internal_kernel_label = internal_kernel_label.contiguous(); + assert(kernel_mask.dim() == 3); + assert(internal_kernel_label.dim() == 2); + assert(kernel_mask.size(1) == internal_kernel_label.size(0)); + assert(kernel_mask.size(2) == internal_kernel_label.size(1)); + CHECK_CPU_INPUT(kernel_mask); + CHECK_CPU_INPUT(internal_kernel_label); + auto ptr_data = kernel_mask.data_ptr(); + IntArrayRef data_shape = kernel_mask.sizes(); + + auto data_label_map = internal_kernel_label.data_ptr(); + vector> text_line; + + kernel_dilate(ptr_data, data_shape, data_label_map, kernel_num, + min_kernel_area, text_line); + + return text_line; +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/convex_iou.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/convex_iou.cpp new file mode 100644 index 0000000000000000000000000000000000000000..79f2028b551c474453aff2f6633dd426194e4afd --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/convex_iou.cpp @@ -0,0 +1,23 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +// modified from +// https://github.com/SDL-GuoZonghao/BeyondBoundingBox/tree/main/mmdet/ops/iou/src +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void convex_iou_impl(const Tensor pointsets, const Tensor polygons, + Tensor ious) { + DISPATCH_DEVICE_IMPL(convex_iou_impl, pointsets, polygons, ious); +} + +void convex_iou(const Tensor pointsets, const Tensor polygons, Tensor ious) { + convex_iou_impl(pointsets, polygons, ious); +} + +void convex_giou_impl(const Tensor pointsets, const Tensor polygons, + Tensor output) { + DISPATCH_DEVICE_IMPL(convex_giou_impl, pointsets, polygons, output); +} + +void convex_giou(const Tensor pointsets, const Tensor polygons, Tensor output) { + convex_giou_impl(pointsets, polygons, output); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/correlation.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/correlation.cpp new file mode 100644 index 0000000000000000000000000000000000000000..f4adba2a0c17201476352c473f1c7117af020ab2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/correlation.cpp @@ -0,0 +1,47 @@ +// Copyright (c) OpenMMLab. All rights reserved. 
+#include + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void correlation_forward_impl(Tensor input1, Tensor input2, Tensor output, + int kH, int kW, int patchH, int patchW, int padH, + int padW, int dilationH, int dilationW, + int dilation_patchH, int dilation_patchW, int dH, + int dW) { + DISPATCH_DEVICE_IMPL(correlation_forward_impl, input1, input2, output, kH, kW, + patchH, patchW, padH, padW, dilationH, dilationW, + dilation_patchH, dilation_patchW, dH, dW); +} + +void correlation_backward_impl(Tensor grad_output, Tensor input1, Tensor input2, + Tensor grad_input1, Tensor grad_input2, int kH, + int kW, int patchH, int patchW, int padH, + int padW, int dilationH, int dilationW, + int dilation_patchH, int dilation_patchW, int dH, + int dW) { + DISPATCH_DEVICE_IMPL(correlation_backward_impl, grad_output, input1, input2, + grad_input1, grad_input2, kH, kW, patchH, patchW, padH, + padW, dilationH, dilationW, dilation_patchH, + dilation_patchW, dH, dW); +} + +void correlation_forward(Tensor input1, Tensor input2, Tensor output, int kH, + int kW, int patchH, int patchW, int padH, int padW, + int dilationH, int dilationW, int dilation_patchH, + int dilation_patchW, int dH, int dW) { + correlation_forward_impl(input1, input2, output, kH, kW, patchH, patchW, padH, + padW, dilationH, dilationW, dilation_patchH, + dilation_patchW, dH, dW); +} + +void correlation_backward(Tensor grad_output, Tensor input1, Tensor input2, + Tensor grad_input1, Tensor grad_input2, int kH, + int kW, int patchH, int patchW, int padH, int padW, + int dilationH, int dilationW, int dilation_patchH, + int dilation_patchW, int dH, int dW) { + correlation_backward_impl(grad_output, input1, input2, grad_input1, + grad_input2, kH, kW, patchH, patchW, padH, padW, + dilationH, dilationW, dilation_patchH, + dilation_patchW, dH, dW); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/active_rotated_filter.cpp 
b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/active_rotated_filter.cpp new file mode 100644 index 0000000000000000000000000000000000000000..aa5a8b3d517e9cec4cf953aa9f3de8e2fb17c3a3 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/active_rotated_filter.cpp @@ -0,0 +1,120 @@ +// Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved +// modified from +// https://github.com/csuhan/s2anet/blob/master/mmdet/ops/orn/src/cpu/ActiveRotatingFilter_cpu.cpp +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +template +void active_rotated_filter_forward_cpu_kernel( + const T* weightData, const int* indicesData, const int num_output_planes, + const int num_input_planes, const int num_orientations, const int kH, + const int kW, const int num_rotations, T* outputData) { + const int nEntry = num_orientations * kH * kW; + int i, j, l; + int k; + +#pragma omp parallel for private(i, j, l, k) + for (i = 0; i < num_output_planes; i++) { + for (j = 0; j < num_input_planes; j++) { + for (l = 0; l < nEntry; l++) { + int weightIndex = i * num_input_planes * nEntry + j * nEntry + l; + T val = *(weightData + weightIndex); + for (k = 0; k < num_rotations; k++) { + int index = (int)(*(indicesData + l * num_rotations + k)) - 1; + T* target = outputData + + i * (num_rotations * num_input_planes * nEntry) + + k * (num_input_planes * nEntry) + j * (nEntry) + index; + *target = val; + } + } + } + } +} + +template +void active_rotated_filter_backward_cpu_kernel( + const T* gradOutputData, const int* indicesData, + const int num_output_planes, const int num_input_planes, + const int num_orientations, const int kH, const int kW, + const int num_rotations, T* gradInputData) { + const int nEntry = num_orientations * kH * kW; + int i, j, l; + int k; + +#pragma omp parallel for private(i, j, l, k) + for (i = 0; i < num_output_planes; i++) { + for (j = 0; j < num_input_planes; j++) { + for (l = 0; l < nEntry; l++) { + int 
gradInputIndex = i * num_input_planes * nEntry + j * nEntry + l; + T* val = gradInputData + gradInputIndex; + *val = 0; + for (k = 0; k < num_rotations; k++) { + int index = (int)(*(indicesData + l * num_rotations + k)) - 1; + const T* target = + gradOutputData + i * (num_rotations * num_input_planes * nEntry) + + k * (num_input_planes * nEntry) + j * (nEntry) + index; + *val = *val + *target; + } + } + } + } +} + +void ActiveRotatedFilterForwardCPULauncher(const Tensor input, + const Tensor indices, + Tensor output) { + const int num_output_planes = input.size(0); + const int num_input_planes = input.size(1); + const int num_orientations = input.size(2); + const int kH = input.size(3); + const int kW = input.size(4); + const int num_rotations = indices.size(3); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + input.scalar_type(), "active_rotated_filter_forward_cpu_kernel", [&] { + active_rotated_filter_forward_cpu_kernel( + input.data_ptr(), indices.data_ptr(), + num_output_planes, num_input_planes, num_orientations, kH, kW, + num_rotations, output.data_ptr()); + }); +} + +void ActiveRotatedFilterBackwardCPULauncher(const Tensor grad_out, + const Tensor indices, + Tensor grad_in) { + const int num_orientations = indices.size(0); + const int kH = indices.size(1); + const int kW = indices.size(2); + const int num_rotations = indices.size(3); + const int num_output_planes = grad_out.size(0) / num_rotations; + const int num_input_planes = grad_out.size(1) / num_orientations; + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + grad_out.scalar_type(), "active_rotated_filter_backward_cpu_kernel", [&] { + active_rotated_filter_backward_cpu_kernel( + grad_out.data_ptr(), indices.data_ptr(), + num_output_planes, num_input_planes, num_orientations, kH, kW, + num_rotations, grad_in.data_ptr()); + }); +} + +void active_rotated_filter_forward_cpu(const Tensor input, const Tensor indices, + Tensor output) { + ActiveRotatedFilterForwardCPULauncher(input, indices, output); +} + +void 
active_rotated_filter_backward_cpu(const Tensor grad_out, + const Tensor indices, Tensor grad_in) { + ActiveRotatedFilterBackwardCPULauncher(grad_out, indices, grad_in); +} + +void active_rotated_filter_forward_impl(const Tensor input, + const Tensor indices, Tensor output); + +void active_rotated_filter_backward_impl(const Tensor grad_out, + const Tensor indices, Tensor grad_in); + +REGISTER_DEVICE_IMPL(active_rotated_filter_forward_impl, CPU, + active_rotated_filter_forward_cpu); +REGISTER_DEVICE_IMPL(active_rotated_filter_backward_impl, CPU, + active_rotated_filter_backward_cpu); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/bezier_align.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/bezier_align.cpp new file mode 100644 index 0000000000000000000000000000000000000000..7eb0e5b9402c9fb2077252a4d758e7ea6345e672 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/bezier_align.cpp @@ -0,0 +1,447 @@ +// Modified from +// https://github.com/facebookresearch/detectron2/tree/master/detectron2/layers/csrc/BezierAlign +// Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved +#include +#include + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +// implementation taken from Caffe2 +template +struct PreCalc { + int pos1; + int pos2; + int pos3; + int pos4; + T w1; + T w2; + T w3; + T w4; +}; + +template +T bezier_curve(const T p0, const T p1, const T p2, const T p3, const T u) { + return ((1. - u) * (1. - u) * (1. - u) * p0 + + 3. * u * (1. - u) * (1. - u) * p1 + 3. * u * u * (1. 
- u) * p2 + + u * u * u * p3); +} + +template +void pre_calc_for_bilinear_interpolate( + const int height, const int width, const int pooled_height, + const int pooled_width, const int iy_upper, const int ix_upper, T p0_x, + T p0_y, T p1_x, T p1_y, T p2_x, T p2_y, T p3_x, T p3_y, T p4_x, T p4_y, + T p5_x, T p5_y, T p6_x, T p6_y, T p7_x, T p7_y, T bin_size_h, T bin_size_w, + int roi_bin_grid_h, int roi_bin_grid_w, T offset, + std::vector> &pre_calc) { + int pre_calc_index = 0; + for (int ph = 0; ph < pooled_height; ph++) { + for (int pw = 0; pw < pooled_width; pw++) { + // compute the coords + const T u = pw / static_cast(pooled_width); + const T v = ph / static_cast(pooled_height); + const T x0 = bezier_curve(p0_x, p1_x, p2_x, p3_x, u); + const T y0 = bezier_curve(p0_y, p1_y, p2_y, p3_y, u); + const T x1 = bezier_curve(p4_x, p5_x, p6_x, p7_x, u); + const T y1 = bezier_curve(p4_y, p5_y, p6_y, p7_y, u); + const T x_center = x1 * v + x0 * (1. - v) - offset; + const T y_center = y1 * v + y0 * (1. 
- v) - offset; + for (int iy = 0; iy < iy_upper; iy++) { + const T yy = y_center - (T)0.5 * bin_size_h + + static_cast(iy + .5f) * bin_size_h / + static_cast(roi_bin_grid_h); // e.g., 0.5, 1.5 + for (int ix = 0; ix < ix_upper; ix++) { + const T xx = x_center - (T)0.5 * bin_size_w + + static_cast(ix + .5f) * bin_size_w / + static_cast(roi_bin_grid_w); + + T x = xx; + T y = yy; + // deal with: inverse elements are out of feature map boundary + if (y < -1.0 || y > height || x < -1.0 || x > width) { + // empty + PreCalc pc; + pc.pos1 = 0; + pc.pos2 = 0; + pc.pos3 = 0; + pc.pos4 = 0; + pc.w1 = 0; + pc.w2 = 0; + pc.w3 = 0; + pc.w4 = 0; + pre_calc[pre_calc_index] = pc; + pre_calc_index += 1; + continue; + } + + if (y <= 0) { + y = 0; + } + if (x <= 0) { + x = 0; + } + + int y_low = (int)y; + int x_low = (int)x; + int y_high; + int x_high; + + if (y_low >= height - 1) { + y_high = y_low = height - 1; + y = (T)y_low; + } else { + y_high = y_low + 1; + } + + if (x_low >= width - 1) { + x_high = x_low = width - 1; + x = (T)x_low; + } else { + x_high = x_low + 1; + } + + T ly = y - y_low; + T lx = x - x_low; + T hy = 1. - ly, hx = 1. 
- lx; + T w1 = hy * hx, w2 = hy * lx, w3 = ly * hx, w4 = ly * lx; + + // save weights and indices + PreCalc pc; + pc.pos1 = y_low * width + x_low; + pc.pos2 = y_low * width + x_high; + pc.pos3 = y_high * width + x_low; + pc.pos4 = y_high * width + x_high; + pc.w1 = w1; + pc.w2 = w2; + pc.w3 = w3; + pc.w4 = w4; + pre_calc[pre_calc_index] = pc; + + pre_calc_index += 1; + } + } + } + } +} + +template +void BezierAlignForward(const int nthreads, const T *input, const T *rois, + T *output, const int pooled_height, + const int pooled_width, const T &spatial_scale, + const int sampling_ratio, bool aligned, + const int channels, const int height, const int width) { + int n_rois = nthreads / channels / pooled_width / pooled_height; + // (n, c, ph, pw) is an element in the pooled output + // can be parallelized using omp + // #pragma omp parallel for num_threads(32) + for (int n = 0; n < n_rois; n++) { + int index_n = n * channels * pooled_width * pooled_height; + + // beziers have size Nx(1+8*2) = Nx17 + const T *offset_rois = rois + n * 17; + int roi_batch_ind = offset_rois[0]; + + T offset = aligned ? 
(T)0.5 : (T)0.0; + // Do not use rounding; this implementation detail is critical + T p0_x = offset_rois[1] * spatial_scale; + T p0_y = offset_rois[2] * spatial_scale; + T p1_x = offset_rois[3] * spatial_scale; + T p1_y = offset_rois[4] * spatial_scale; + T p2_x = offset_rois[5] * spatial_scale; + T p2_y = offset_rois[6] * spatial_scale; + T p3_x = offset_rois[7] * spatial_scale; + T p3_y = offset_rois[8] * spatial_scale; + T p4_x = offset_rois[15] * spatial_scale; + T p4_y = offset_rois[16] * spatial_scale; + T p5_x = offset_rois[13] * spatial_scale; + T p5_y = offset_rois[14] * spatial_scale; + T p6_x = offset_rois[11] * spatial_scale; + T p6_y = offset_rois[12] * spatial_scale; + T p7_x = offset_rois[9] * spatial_scale; + T p7_y = offset_rois[10] * spatial_scale; + + T roi_width = std::max(std::abs(p0_x - p3_x), std::abs(p4_x - p7_x)); + T roi_height = std::max(std::abs(p0_y - p3_y), std::abs(p4_y - p7_y)); + if (aligned) { + AT_ASSERTM(roi_width >= 0 && roi_height >= 0, + "Beziers in BezierAlign cannot have non-negative size!"); + } else { // for backward-compatibility only + roi_width = std::max(roi_width, (T)1.); + roi_height = std::max(roi_height, (T)1.); + } + T bin_size_h = static_cast(roi_height) / static_cast(pooled_height); + T bin_size_w = static_cast(roi_width) / static_cast(pooled_width); + + // We use roi_bin_grid to sample the grid and mimic integral + int roi_bin_grid_h = (sampling_ratio > 0) + ? sampling_ratio + : ceil(roi_height / pooled_height); // e.g., = 2 + int roi_bin_grid_w = + (sampling_ratio > 0) ? sampling_ratio : ceil(roi_width / pooled_width); + + // We do average (integral) pooling inside a bin + // When the grid is empty, output zeros == 0/1, instead of NaN. + const T count = std::max(roi_bin_grid_h * roi_bin_grid_w, 1); // e.g. 
= 4 + + // we want to precalculate indices and weights shared by all channels, + // this is the key point of optimization + std::vector> pre_calc(roi_bin_grid_h * roi_bin_grid_w * + pooled_width * pooled_height); + pre_calc_for_bilinear_interpolate( + height, width, pooled_height, pooled_width, roi_bin_grid_h, + roi_bin_grid_w, p0_x, p0_y, p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, + p4_y, p5_x, p5_y, p6_x, p6_y, p7_x, p7_y, bin_size_h, bin_size_w, + roi_bin_grid_h, roi_bin_grid_w, offset, pre_calc); + + for (int c = 0; c < channels; c++) { + int index_n_c = index_n + c * pooled_width * pooled_height; + const T *offset_input = + input + (roi_batch_ind * channels + c) * height * width; + int pre_calc_index = 0; + + for (int ph = 0; ph < pooled_height; ph++) { + for (int pw = 0; pw < pooled_width; pw++) { + int index = index_n_c + ph * pooled_width + pw; + + T output_val = 0.; + for (int iy = 0; iy < roi_bin_grid_h; iy++) { + for (int ix = 0; ix < roi_bin_grid_w; ix++) { + PreCalc pc = pre_calc[pre_calc_index]; + output_val += pc.w1 * offset_input[pc.pos1] + + pc.w2 * offset_input[pc.pos2] + + pc.w3 * offset_input[pc.pos3] + + pc.w4 * offset_input[pc.pos4]; + + pre_calc_index += 1; + } + } + output_val /= count; + + output[index] = output_val; + } // for pw + } // for ph + } // for c + } // for n +} + +template +void bilinear_interpolate_gradient(const int height, const int width, T y, T x, + T &w1, T &w2, T &w3, T &w4, int &x_low, + int &x_high, int &y_low, int &y_high, + const int index /* index for debug only*/) { + // deal with cases that inverse elements are out of feature map boundary + if (y < -1.0 || y > height || x < -1.0 || x > width) { + // empty + w1 = w2 = w3 = w4 = 0.; + x_low = x_high = y_low = y_high = -1; + return; + } + + if (y <= 0) y = 0; + if (x <= 0) x = 0; + + y_low = (int)y; + x_low = (int)x; + + if (y_low >= height - 1) { + y_high = y_low = height - 1; + y = (T)y_low; + } else { + y_high = y_low + 1; + } + + if (x_low >= width - 1) { + x_high 
= x_low = width - 1; + x = (T)x_low; + } else { + x_high = x_low + 1; + } + + T ly = y - y_low; + T lx = x - x_low; + T hy = 1. - ly, hx = 1. - lx; + + // reference in forward + // T v1 = input[y_low * width + x_low]; + // T v2 = input[y_low * width + x_high]; + // T v3 = input[y_high * width + x_low]; + // T v4 = input[y_high * width + x_high]; + // T val = (w1 * v1 + w2 * v2 + w3 * v3 + w4 * v4); + + w1 = hy * hx, w2 = hy * lx, w3 = ly * hx, w4 = ly * lx; +} + +template +inline void add(T *address, const T &val) { + *address += val; +} + +template +void BezierAlignBackward(const int nthreads, const T *grad_output, + const T *rois, T *grad_input, const int pooled_height, + const int pooled_width, const T &spatial_scale, + const int sampling_ratio, bool aligned, + const int channels, const int height, const int width, + const int n_stride, const int c_stride, + const int h_stride, const int w_stride) { + for (int index = 0; index < nthreads; index++) { + // (n, c, ph, pw) is an element in the pooled output + int pw = index % pooled_width; + int ph = (index / pooled_width) % pooled_height; + int c = (index / pooled_width / pooled_height) % channels; + int n = index / pooled_width / pooled_height / channels; + + const T *offset_rois = rois + n * 17; + int roi_batch_ind = offset_rois[0]; + + // Do not use rounding; this implementation detail is critical + T offset = aligned ? 
(T)0.5 : (T)0.0; + T p0_x = offset_rois[1] * spatial_scale; + T p0_y = offset_rois[2] * spatial_scale; + T p1_x = offset_rois[3] * spatial_scale; + T p1_y = offset_rois[4] * spatial_scale; + T p2_x = offset_rois[5] * spatial_scale; + T p2_y = offset_rois[6] * spatial_scale; + T p3_x = offset_rois[7] * spatial_scale; + T p3_y = offset_rois[8] * spatial_scale; + T p4_x = offset_rois[15] * spatial_scale; + T p4_y = offset_rois[16] * spatial_scale; + T p5_x = offset_rois[13] * spatial_scale; + T p5_y = offset_rois[14] * spatial_scale; + T p6_x = offset_rois[11] * spatial_scale; + T p6_y = offset_rois[12] * spatial_scale; + T p7_x = offset_rois[9] * spatial_scale; + T p7_y = offset_rois[10] * spatial_scale; + + // compute the coords + const T u = pw / static_cast(pooled_width); + const T v = ph / static_cast(pooled_height); + const T x0 = bezier_curve(p0_x, p1_x, p2_x, p3_x, u); + const T y0 = bezier_curve(p0_y, p1_y, p2_y, p3_y, u); + const T x1 = bezier_curve(p4_x, p5_x, p6_x, p7_x, u); + const T y1 = bezier_curve(p4_y, p5_y, p6_y, p7_y, u); + const T x_center = x1 * v + x0 * (1. - v) - offset; + const T y_center = y1 * v + y0 * (1. 
- v) - offset; + + T roi_width = std::max(std::abs(p0_x - p3_x), std::abs(p4_x - p7_x)); + T roi_height = std::max(std::abs(p0_y - p3_y), std::abs(p4_y - p7_y)); + if (aligned) { + AT_ASSERTM(roi_width >= 0 && roi_height >= 0, + "Beziers in BezierAlign do not have non-negative size!"); + } else { // for backward-compatibility only + roi_width = std::max(roi_width, (T)1.); + roi_height = std::max(roi_height, (T)1.); + } + T bin_size_h = static_cast(roi_height) / static_cast(pooled_height); + T bin_size_w = static_cast(roi_width) / static_cast(pooled_width); + + T *offset_grad_input = + grad_input + ((roi_batch_ind * channels + c) * height * width); + + int output_offset = n * n_stride + c * c_stride; + const T *offset_grad_output = grad_output + output_offset; + const T grad_output_this_bin = + offset_grad_output[ph * h_stride + pw * w_stride]; + + // We use roi_bin_grid to sample the grid and mimic integral + int roi_bin_grid_h = (sampling_ratio > 0) + ? sampling_ratio + : ceil(roi_height / pooled_height); // e.g., = 2 + int roi_bin_grid_w = + (sampling_ratio > 0) ? sampling_ratio : ceil(roi_width / pooled_width); + + // We do average (integral) pooling inside a bin + const T count = roi_bin_grid_h * roi_bin_grid_w; // e.g. 
= 4 + + for (int iy = 0; iy < roi_bin_grid_h; iy++) { + const T y = y_center - (T)0.5 * bin_size_h + + static_cast(iy + .5f) * bin_size_h / + static_cast(roi_bin_grid_h); // e.g., 0.5, 1.5 + for (int ix = 0; ix < roi_bin_grid_w; ix++) { + const T x = x_center - (T)0.5 * bin_size_w + + static_cast(ix + .5f) * bin_size_w / + static_cast(roi_bin_grid_w); + + T w1, w2, w3, w4; + int x_low, x_high, y_low, y_high; + + bilinear_interpolate_gradient(height, width, y, x, w1, w2, w3, w4, + x_low, x_high, y_low, y_high, index); + + T g1 = grad_output_this_bin * w1 / count; + T g2 = grad_output_this_bin * w2 / count; + T g3 = grad_output_this_bin * w3 / count; + T g4 = grad_output_this_bin * w4 / count; + + if (x_low >= 0 && x_high >= 0 && y_low >= 0 && y_high >= 0) { + // atomic add is not needed for now since it is single threaded + add(offset_grad_input + y_low * width + x_low, static_cast(g1)); + add(offset_grad_input + y_low * width + x_high, static_cast(g2)); + add(offset_grad_input + y_high * width + x_low, static_cast(g3)); + add(offset_grad_input + y_high * width + x_high, static_cast(g4)); + } // if + } // ix + } // iy + } // for +} // BezierAlignBackward + +void BezierAlignForwardCPULauncher(Tensor input, Tensor rois, Tensor output, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + bool aligned) { + int output_size = output.numel(); + int channels = input.size(1); + int height = input.size(2); + int width = input.size(3); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + input.scalar_type(), "BezierAlign_forward", [&] { + BezierAlignForward( + output_size, input.data_ptr(), rois.data_ptr(), + output.data_ptr(), aligned_height, aligned_width, + static_cast(spatial_scale), sampling_ratio, aligned, + channels, height, width); + }); +} + +void BezierAlignBackwardCPULauncher(Tensor grad_output, Tensor rois, + Tensor grad_input, int aligned_height, + int aligned_width, float spatial_scale, + int sampling_ratio, bool aligned) { + int 
output_size = grad_output.numel(); + int channels = grad_input.size(1); + int height = grad_input.size(2); + int width = grad_input.size(3); + + // get stride values to ensure indexing into gradients is correct. + int n_stride = grad_output.stride(0); + int c_stride = grad_output.stride(1); + int h_stride = grad_output.stride(2); + int w_stride = grad_output.stride(3); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + grad_output.scalar_type(), "BezierAlign_backward", [&] { + BezierAlignBackward( + output_size, grad_output.data_ptr(), + rois.data_ptr(), grad_input.data_ptr(), + aligned_height, aligned_width, static_cast(spatial_scale), + sampling_ratio, aligned, channels, height, width, n_stride, + c_stride, h_stride, w_stride); + }); +} + +void bezier_align_forward_impl(Tensor input, Tensor rois, Tensor output, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + bool aligned); + +void bezier_align_backward_impl(Tensor grad_output, Tensor rois, + Tensor grad_input, int aligned_height, + int aligned_width, float spatial_scale, + int sampling_ratio, bool aligned); + +REGISTER_DEVICE_IMPL(bezier_align_forward_impl, CPU, + BezierAlignForwardCPULauncher); +REGISTER_DEVICE_IMPL(bezier_align_backward_impl, CPU, + BezierAlignBackwardCPULauncher); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/box_iou_quadri.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/box_iou_quadri.cpp new file mode 100644 index 0000000000000000000000000000000000000000..211699ce2b832324848c4c6c5f7e5f90fcab97f2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/box_iou_quadri.cpp @@ -0,0 +1,36 @@ +// Copyright (c) Facebook, Inc. and its affiliates. 
All Rights Reserved +#include "box_iou_rotated_utils.hpp" +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +template +void box_iou_quadri_cpu_kernel(const Tensor boxes1, const Tensor boxes2, + Tensor ious, const int mode_flag, + const bool aligned) { + int output_size = ious.numel(); + auto num_boxes1 = boxes1.size(0); + auto num_boxes2 = boxes2.size(0); + + if (aligned) { + for (int i = 0; i < output_size; i++) { + ious[i] = single_box_iou_quadri(boxes1[i].data_ptr(), + boxes2[i].data_ptr(), mode_flag); + } + } else { + for (int i = 0; i < num_boxes1; i++) { + for (int j = 0; j < num_boxes2; j++) { + ious[i * num_boxes2 + j] = single_box_iou_quadri( + boxes1[i].data_ptr(), boxes2[j].data_ptr(), mode_flag); + } + } + } +} + +void box_iou_quadri_cpu(const Tensor boxes1, const Tensor boxes2, Tensor ious, + const int mode_flag, const bool aligned) { + box_iou_quadri_cpu_kernel(boxes1, boxes2, ious, mode_flag, aligned); +} + +void box_iou_quadri_impl(const Tensor boxes1, const Tensor boxes2, Tensor ious, + const int mode_flag, const bool aligned); +REGISTER_DEVICE_IMPL(box_iou_quadri_impl, CPU, box_iou_quadri_cpu); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/box_iou_rotated.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/box_iou_rotated.cpp new file mode 100644 index 0000000000000000000000000000000000000000..585d2c9fddd1566e4898c35ce6e1f4533cd1a236 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/box_iou_rotated.cpp @@ -0,0 +1,38 @@ +// Copyright (c) Facebook, Inc. and its affiliates. 
All Rights Reserved +// modified from +// https://github.com/facebookresearch/detectron2/blob/master/detectron2/layers/csrc/box_iou_rotated/box_iou_rotated_cpu.cpp +#include "box_iou_rotated_utils.hpp" +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +template +void box_iou_rotated_cpu_kernel(const Tensor boxes1, const Tensor boxes2, + Tensor ious, const int mode_flag, + const bool aligned) { + int output_size = ious.numel(); + auto num_boxes1 = boxes1.size(0); + auto num_boxes2 = boxes2.size(0); + + if (aligned) { + for (int i = 0; i < output_size; i++) { + ious[i] = single_box_iou_rotated(boxes1[i].data_ptr(), + boxes2[i].data_ptr(), mode_flag); + } + } else { + for (int i = 0; i < num_boxes1; i++) { + for (int j = 0; j < num_boxes2; j++) { + ious[i * num_boxes2 + j] = single_box_iou_rotated( + boxes1[i].data_ptr(), boxes2[j].data_ptr(), mode_flag); + } + } + } +} + +void box_iou_rotated_cpu(const Tensor boxes1, const Tensor boxes2, Tensor ious, + const int mode_flag, const bool aligned) { + box_iou_rotated_cpu_kernel(boxes1, boxes2, ious, mode_flag, aligned); +} + +void box_iou_rotated_impl(const Tensor boxes1, const Tensor boxes2, Tensor ious, + const int mode_flag, const bool aligned); +REGISTER_DEVICE_IMPL(box_iou_rotated_impl, CPU, box_iou_rotated_cpu); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/deform_conv.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/deform_conv.cpp new file mode 100644 index 0000000000000000000000000000000000000000..7ab67e78c7b5fb4468f47066935cb35b68525b54 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/deform_conv.cpp @@ -0,0 +1,408 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +template +T deformable_im2col_bilinear_cpu(const T *input, const int data_width, + const int height, const int width, T h, T w) { + if (h <= -1 || height <= h || w <= -1 || width <= w) { + return 0; + } + + int h_low = floor(h); + int w_low = floor(w); + int h_high = h_low + 1; + int w_high = w_low + 1; + + T lh = h - h_low; + T lw = w - w_low; + T hh = 1 - lh, hw = 1 - lw; + + T v1 = 0; + if (h_low >= 0 && w_low >= 0) v1 = input[h_low * data_width + w_low]; + T v2 = 0; + if (h_low >= 0 && w_high <= width - 1) + v2 = input[h_low * data_width + w_high]; + T v3 = 0; + if (h_high <= height - 1 && w_low >= 0) + v3 = input[h_high * data_width + w_low]; + T v4 = 0; + if (h_high <= height - 1 && w_high <= width - 1) + v4 = input[h_high * data_width + w_high]; + + T w1 = hh * hw, w2 = hh * lw, w3 = lh * hw, w4 = lh * lw; + + T val = (w1 * v1 + w2 * v2 + w3 * v3 + w4 * v4); + return val; +} + +template +T get_gradient_weight_cpu(T argmax_h, T argmax_w, const int h, const int w, + const int height, const int width) { + if (argmax_h <= -1 || argmax_h >= height || argmax_w <= -1 || + argmax_w >= width) { + // empty + return 0; + } + + int argmax_h_low = floor(argmax_h); + int argmax_w_low = floor(argmax_w); + int argmax_h_high = argmax_h_low + 1; + int argmax_w_high = argmax_w_low + 1; + + T weight = 0; + if (h == argmax_h_low && w == argmax_w_low) + weight = (h + 1 - argmax_h) * (w + 1 - argmax_w); + if (h == argmax_h_low && w == argmax_w_high) + weight = (h + 1 - argmax_h) * (argmax_w + 1 - w); + if (h == argmax_h_high && w == argmax_w_low) + weight = (argmax_h + 1 - h) * (w + 1 - argmax_w); + if (h == argmax_h_high && w == argmax_w_high) + weight = (argmax_h + 1 - h) * (argmax_w + 1 - w); + return weight; +} + +template +T get_coordinate_weight_cpu(T argmax_h, T argmax_w, const int height, + const int width, const T *im_data, + const int data_width, const int bp_dir) { + if 
(argmax_h <= -1 || argmax_h >= height || argmax_w <= -1 || + argmax_w >= width) { + // empty + return 0; + } + + int argmax_h_low = floor(argmax_h); + int argmax_w_low = floor(argmax_w); + int argmax_h_high = argmax_h_low + 1; + int argmax_w_high = argmax_w_low + 1; + + T weight = 0; + + if (bp_dir == 0) { + if (argmax_h_low >= 0 && argmax_w_low >= 0) + weight += -1 * (argmax_w_low + 1 - argmax_w) * + im_data[argmax_h_low * data_width + argmax_w_low]; + if (argmax_h_low >= 0 && argmax_w_high <= width - 1) + weight += -1 * (argmax_w - argmax_w_low) * + im_data[argmax_h_low * data_width + argmax_w_high]; + if (argmax_h_high <= height - 1 && argmax_w_low >= 0) + weight += (argmax_w_low + 1 - argmax_w) * + im_data[argmax_h_high * data_width + argmax_w_low]; + if (argmax_h_high <= height - 1 && argmax_w_high <= width - 1) + weight += (argmax_w - argmax_w_low) * + im_data[argmax_h_high * data_width + argmax_w_high]; + } else if (bp_dir == 1) { + if (argmax_h_low >= 0 && argmax_w_low >= 0) + weight += -1 * (argmax_h_low + 1 - argmax_h) * + im_data[argmax_h_low * data_width + argmax_w_low]; + if (argmax_h_low >= 0 && argmax_w_high <= width - 1) + weight += (argmax_h_low + 1 - argmax_h) * + im_data[argmax_h_low * data_width + argmax_w_high]; + if (argmax_h_high <= height - 1 && argmax_w_low >= 0) + weight += -1 * (argmax_h - argmax_h_low) * + im_data[argmax_h_high * data_width + argmax_w_low]; + if (argmax_h_high <= height - 1 && argmax_w_high <= width - 1) + weight += (argmax_h - argmax_h_low) * + im_data[argmax_h_high * data_width + argmax_w_high]; + } + + return weight; +} + +template +void deformable_im2col_cpu_kernel( + const int n, const T *data_im, const T *data_offset, const int height, + const int width, const int kernel_h, const int kernel_w, const int pad_h, + const int pad_w, const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, + const int channel_per_deformable_group, const int batch_size, + const int num_channels, const int 
deformable_group, const int height_col, + const int width_col, T *data_col) { + for (int index = 0; index < n; index++) { + // index index of output matrix + const int w_col = index % width_col; + const int h_col = (index / width_col) % height_col; + const int b_col = (index / width_col / height_col) % batch_size; + const int c_im = (index / width_col / height_col) / batch_size; + const int c_col = c_im * kernel_h * kernel_w; + + // compute deformable group index + const int deformable_group_index = c_im / channel_per_deformable_group; + + const int h_in = h_col * stride_h - pad_h; + const int w_in = w_col * stride_w - pad_w; + T *data_col_ptr = + data_col + + ((c_col * batch_size + b_col) * height_col + h_col) * width_col + w_col; + const T *data_im_ptr = + data_im + (b_col * num_channels + c_im) * height * width; + const T *data_offset_ptr = + data_offset + (b_col * deformable_group + deformable_group_index) * 2 * + kernel_h * kernel_w * height_col * width_col; + + for (int i = 0; i < kernel_h; ++i) { + for (int j = 0; j < kernel_w; ++j) { + const int data_offset_h_ptr = + ((2 * (i * kernel_w + j)) * height_col + h_col) * width_col + w_col; + const int data_offset_w_ptr = + ((2 * (i * kernel_w + j) + 1) * height_col + h_col) * width_col + + w_col; + const T offset_h = data_offset_ptr[data_offset_h_ptr]; + const T offset_w = data_offset_ptr[data_offset_w_ptr]; + T val = static_cast(0); + const T h_im = h_in + i * dilation_h + offset_h; + const T w_im = w_in + j * dilation_w + offset_w; + if (h_im > -1 && w_im > -1 && h_im < height && w_im < width) + val = deformable_im2col_bilinear_cpu(data_im_ptr, width, height, + width, h_im, w_im); + *data_col_ptr = val; + data_col_ptr += batch_size * height_col * width_col; + } + } + } +} + +template +void deformable_col2im_cpu_kernel( + const int n, const T *data_col, const T *data_offset, const int channels, + const int height, const int width, const int kernel_h, const int kernel_w, + const int pad_h, const int pad_w, const 
int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, + const int channel_per_deformable_group, const int batch_size, + const int deformable_group, const int height_col, const int width_col, + T *grad_im) { + for (int index = 0; index < n; index++) { + const int j = (index / width_col / height_col / batch_size) % kernel_w; + const int i = + (index / width_col / height_col / batch_size / kernel_w) % kernel_h; + const int c = + index / width_col / height_col / batch_size / kernel_w / kernel_h; + // compute the start and end of the output + + const int deformable_group_index = c / channel_per_deformable_group; + + int w_out = index % width_col; + int h_out = (index / width_col) % height_col; + int b = (index / width_col / height_col) % batch_size; + int w_in = w_out * stride_w - pad_w; + int h_in = h_out * stride_h - pad_h; + + const T *data_offset_ptr = + data_offset + (b * deformable_group + deformable_group_index) * 2 * + kernel_h * kernel_w * height_col * width_col; + const int data_offset_h_ptr = + ((2 * (i * kernel_w + j)) * height_col + h_out) * width_col + w_out; + const int data_offset_w_ptr = + ((2 * (i * kernel_w + j) + 1) * height_col + h_out) * width_col + w_out; + const T offset_h = data_offset_ptr[data_offset_h_ptr]; + const T offset_w = data_offset_ptr[data_offset_w_ptr]; + const T cur_inv_h_data = h_in + i * dilation_h + offset_h; + const T cur_inv_w_data = w_in + j * dilation_w + offset_w; + + const T cur_top_grad = data_col[index]; + const int cur_h = (int)cur_inv_h_data; + const int cur_w = (int)cur_inv_w_data; + for (int dy = -2; dy <= 2; dy++) { + for (int dx = -2; dx <= 2; dx++) { + if (cur_h + dy >= 0 && cur_h + dy < height && cur_w + dx >= 0 && + cur_w + dx < width && abs(cur_inv_h_data - (cur_h + dy)) < 1 && + abs(cur_inv_w_data - (cur_w + dx)) < 1) { + int cur_bottom_grad_pos = + ((b * channels + c) * height + cur_h + dy) * width + cur_w + dx; + T weight = + get_gradient_weight_cpu(cur_inv_h_data, cur_inv_w_data, + 
cur_h + dy, cur_w + dx, height, width); + *(grad_im + cur_bottom_grad_pos) += weight * cur_top_grad; + } + } + } + } +} + +template +void deformable_col2im_coord_cpu_kernel( + const int n, const T *data_col, const T *data_im, const T *data_offset, + const int channels, const int height, const int width, const int kernel_h, + const int kernel_w, const int pad_h, const int pad_w, const int stride_h, + const int stride_w, const int dilation_h, const int dilation_w, + const int channel_per_deformable_group, const int batch_size, + const int offset_channels, const int deformable_group, const int height_col, + const int width_col, T *grad_offset) { + for (int index = 0; index < n; index++) { + T val = 0; + int w = index % width_col; + int h = (index / width_col) % height_col; + int c = (index / width_col / height_col) % offset_channels; + int b = (index / width_col / height_col) / offset_channels; + // compute the start and end of the output + + const int deformable_group_index = c / (2 * kernel_h * kernel_w); + const int col_step = kernel_h * kernel_w; + int cnt = 0; + const T *data_col_ptr = data_col + deformable_group_index * + channel_per_deformable_group * + batch_size * width_col * height_col; + const T *data_im_ptr = + data_im + (b * deformable_group + deformable_group_index) * + channel_per_deformable_group / kernel_h / kernel_w * + height * width; + const T *data_offset_ptr = + data_offset + (b * deformable_group + deformable_group_index) * 2 * + kernel_h * kernel_w * height_col * width_col; + + const int offset_c = c - deformable_group_index * 2 * kernel_h * kernel_w; + + for (int col_c = (offset_c / 2); col_c < channel_per_deformable_group; + col_c += col_step) { + const int col_pos = + (((col_c * batch_size + b) * height_col) + h) * width_col + w; + const int bp_dir = offset_c % 2; + + int j = (col_pos / width_col / height_col / batch_size) % kernel_w; + int i = + (col_pos / width_col / height_col / batch_size / kernel_w) % kernel_h; + int w_out = col_pos % 
width_col; + int h_out = (col_pos / width_col) % height_col; + int w_in = w_out * stride_w - pad_w; + int h_in = h_out * stride_h - pad_h; + const int data_offset_h_ptr = + (((2 * (i * kernel_w + j)) * height_col + h_out) * width_col + w_out); + const int data_offset_w_ptr = + (((2 * (i * kernel_w + j) + 1) * height_col + h_out) * width_col + + w_out); + const T offset_h = data_offset_ptr[data_offset_h_ptr]; + const T offset_w = data_offset_ptr[data_offset_w_ptr]; + T inv_h = h_in + i * dilation_h + offset_h; + T inv_w = w_in + j * dilation_w + offset_w; + if (inv_h <= -1 || inv_w <= -1 || inv_h >= height || inv_w >= width) + inv_h = inv_w = -2; + const T weight = get_coordinate_weight_cpu( + inv_h, inv_w, height, width, data_im_ptr + cnt * height * width, + width, bp_dir); + val += weight * data_col_ptr[col_pos]; + cnt += 1; + } + + grad_offset[index] = val; + } +} + +void deformable_im2col_cpu(Tensor data_im, Tensor data_offset, + const int channels, const int height, + const int width, const int ksize_h, + const int ksize_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, + const int parallel_imgs, const int deformable_group, + Tensor data_col) { + int height_col = + (height + 2 * pad_h - (dilation_h * (ksize_h - 1) + 1)) / stride_h + 1; + int width_col = + (width + 2 * pad_w - (dilation_w * (ksize_w - 1) + 1)) / stride_w + 1; + int num_kernels = channels * height_col * width_col * parallel_imgs; + int channel_per_deformable_group = channels / deformable_group; + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + data_im.scalar_type(), "deformable_im2col_cpu", [&] { + deformable_im2col_cpu_kernel( + num_kernels, data_im.data_ptr(), + data_offset.data_ptr(), height, width, ksize_h, ksize_w, + pad_h, pad_w, stride_h, stride_w, dilation_h, dilation_w, + channel_per_deformable_group, parallel_imgs, channels, + deformable_group, height_col, width_col, + data_col.data_ptr()); + }); +} + +void 
deformable_col2im_cpu(Tensor data_col, Tensor data_offset, + const int channels, const int height, + const int width, const int ksize_h, + const int ksize_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, + const int parallel_imgs, const int deformable_group, + Tensor grad_im) { + // todo: make sure parallel_imgs is passed in correctly + int height_col = + (height + 2 * pad_h - (dilation_h * (ksize_h - 1) + 1)) / stride_h + 1; + int width_col = + (width + 2 * pad_w - (dilation_w * (ksize_w - 1) + 1)) / stride_w + 1; + int num_kernels = + channels * ksize_h * ksize_w * height_col * width_col * parallel_imgs; + int channel_per_deformable_group = channels / deformable_group; + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + data_col.scalar_type(), "deformable_col2im_gpu", ([&] { + const scalar_t *data_col_ = data_col.data_ptr(); + const scalar_t *data_offset_ = data_offset.data_ptr(); + scalar_t *grad_im_ = grad_im.data_ptr(); + + deformable_col2im_cpu_kernel( + num_kernels, data_col_, data_offset_, channels, height, width, + ksize_h, ksize_w, pad_h, pad_w, stride_h, stride_w, dilation_h, + dilation_w, channel_per_deformable_group, parallel_imgs, + deformable_group, height_col, width_col, grad_im_); + })); +} + +void deformable_col2im_coord_cpu( + Tensor data_col, Tensor data_im, Tensor data_offset, const int channels, + const int height, const int width, const int ksize_h, const int ksize_w, + const int pad_h, const int pad_w, const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, const int parallel_imgs, + const int deformable_group, Tensor grad_offset) { + int height_col = + (height + 2 * pad_h - (dilation_h * (ksize_h - 1) + 1)) / stride_h + 1; + int width_col = + (width + 2 * pad_w - (dilation_w * (ksize_w - 1) + 1)) / stride_w + 1; + int num_kernels = height_col * width_col * 2 * ksize_h * ksize_w * + deformable_group * parallel_imgs; + int 
channel_per_deformable_group = + channels * ksize_h * ksize_w / deformable_group; + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + data_col.scalar_type(), "deformable_col2im_coord_cpu", ([&] { + const scalar_t *data_col_ = data_col.data_ptr(); + const scalar_t *data_im_ = data_im.data_ptr(); + const scalar_t *data_offset_ = data_offset.data_ptr(); + scalar_t *grad_offset_ = grad_offset.data_ptr(); + + deformable_col2im_coord_cpu_kernel( + num_kernels, data_col_, data_im_, data_offset_, channels, height, + width, ksize_h, ksize_w, pad_h, pad_w, stride_h, stride_w, + dilation_h, dilation_w, channel_per_deformable_group, parallel_imgs, + 2 * ksize_h * ksize_w * deformable_group, deformable_group, + height_col, width_col, grad_offset_); + })); +} + +void deformable_im2col_impl(Tensor data_im, Tensor data_offset, + const int channels, const int height, + const int width, const int ksize_h, + const int ksize_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, + const int parallel_imgs, const int deformable_group, + Tensor data_col); + +void deformable_col2im_impl(Tensor data_col, Tensor data_offset, + const int channels, const int height, + const int width, const int ksize_h, + const int ksize_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, + const int parallel_imgs, const int deformable_group, + Tensor grad_im); + +void deformable_col2im_coord_impl( + Tensor data_col, Tensor data_im, Tensor data_offset, const int channels, + const int height, const int width, const int ksize_h, const int ksize_w, + const int pad_h, const int pad_w, const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, const int parallel_imgs, + const int deformable_group, Tensor grad_offset); + +REGISTER_DEVICE_IMPL(deformable_im2col_impl, CPU, deformable_im2col_cpu); +REGISTER_DEVICE_IMPL(deformable_col2im_impl, CPU, 
deformable_col2im_cpu); +REGISTER_DEVICE_IMPL(deformable_col2im_coord_impl, CPU, + deformable_col2im_coord_cpu); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/modulated_deform_conv.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/modulated_deform_conv.cpp new file mode 100644 index 0000000000000000000000000000000000000000..95390956450d062a37eaec98664aff11a8035587 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/modulated_deform_conv.cpp @@ -0,0 +1,436 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +template +T dmcn_im2col_bilinear_cpu(const T *input, const int data_width, + const int height, const int width, T h, T w) { + int h_low = floorf(h); + int w_low = floorf(w); + int h_high = h_low + 1; + int w_high = w_low + 1; + + T lh = h - h_low; + T lw = w - w_low; + T hh = 1 - lh, hw = 1 - lw; + + T v1 = 0; + if (h_low >= 0 && w_low >= 0) v1 = input[h_low * data_width + w_low]; + T v2 = 0; + if (h_low >= 0 && w_high <= width - 1) + v2 = input[h_low * data_width + w_high]; + T v3 = 0; + if (h_high <= height - 1 && w_low >= 0) + v3 = input[h_high * data_width + w_low]; + T v4 = 0; + if (h_high <= height - 1 && w_high <= width - 1) + v4 = input[h_high * data_width + w_high]; + + T w1 = hh * hw, w2 = hh * lw, w3 = lh * hw, w4 = lh * lw; + + T val = (w1 * v1 + w2 * v2 + w3 * v3 + w4 * v4); + return val; +} + +template +T dmcn_get_gradient_weight_cpu(T argmax_h, T argmax_w, const int h, const int w, + const int height, const int width) { + if (argmax_h <= -1 || argmax_h >= height || argmax_w <= -1 || + argmax_w >= width) { + // empty + return 0; + } + + int argmax_h_low = floorf(argmax_h); + int argmax_w_low = floorf(argmax_w); + int argmax_h_high = argmax_h_low + 1; + int argmax_w_high = argmax_w_low + 1; + + T weight = 0; + if (h == argmax_h_low && w == argmax_w_low) + weight = (h + 1 - argmax_h) * (w + 1 - argmax_w); + if (h == 
argmax_h_low && w == argmax_w_high) + weight = (h + 1 - argmax_h) * (argmax_w + 1 - w); + if (h == argmax_h_high && w == argmax_w_low) + weight = (argmax_h + 1 - h) * (w + 1 - argmax_w); + if (h == argmax_h_high && w == argmax_w_high) + weight = (argmax_h + 1 - h) * (argmax_w + 1 - w); + return weight; +} + +template +T dmcn_get_coordinate_weight_cpu(T argmax_h, T argmax_w, const int height, + const int width, const T *im_data, + const int data_width, const int bp_dir) { + if (argmax_h <= -1 || argmax_h >= height || argmax_w <= -1 || + argmax_w >= width) { + // empty + return 0; + } + + int argmax_h_low = floorf(argmax_h); + int argmax_w_low = floorf(argmax_w); + int argmax_h_high = argmax_h_low + 1; + int argmax_w_high = argmax_w_low + 1; + + T weight = 0; + + if (bp_dir == 0) { + if (argmax_h_low >= 0 && argmax_w_low >= 0) + weight += -1 * (argmax_w_low + 1 - argmax_w) * + im_data[argmax_h_low * data_width + argmax_w_low]; + if (argmax_h_low >= 0 && argmax_w_high <= width - 1) + weight += -1 * (argmax_w - argmax_w_low) * + im_data[argmax_h_low * data_width + argmax_w_high]; + if (argmax_h_high <= height - 1 && argmax_w_low >= 0) + weight += (argmax_w_low + 1 - argmax_w) * + im_data[argmax_h_high * data_width + argmax_w_low]; + if (argmax_h_high <= height - 1 && argmax_w_high <= width - 1) + weight += (argmax_w - argmax_w_low) * + im_data[argmax_h_high * data_width + argmax_w_high]; + } else if (bp_dir == 1) { + if (argmax_h_low >= 0 && argmax_w_low >= 0) + weight += -1 * (argmax_h_low + 1 - argmax_h) * + im_data[argmax_h_low * data_width + argmax_w_low]; + if (argmax_h_low >= 0 && argmax_w_high <= width - 1) + weight += (argmax_h_low + 1 - argmax_h) * + im_data[argmax_h_low * data_width + argmax_w_high]; + if (argmax_h_high <= height - 1 && argmax_w_low >= 0) + weight += -1 * (argmax_h - argmax_h_low) * + im_data[argmax_h_high * data_width + argmax_w_low]; + if (argmax_h_high <= height - 1 && argmax_w_high <= width - 1) + weight += (argmax_h - argmax_h_low) * + 
im_data[argmax_h_high * data_width + argmax_w_high]; + } + + return weight; +} + +template +void modulated_deformable_im2col_cpu_kernel( + const int n, const T *data_im, const T *data_offset, const T *data_mask, + const int height, const int width, const int kernel_h, const int kernel_w, + const int pad_h, const int pad_w, const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, + const int channel_per_deformable_group, const int batch_size, + const int num_channels, const int deformable_group, const int height_col, + const int width_col, T *data_col) { + for (int index = 0; index < n; index++) { + // index index of output matrix + const int w_col = index % width_col; + const int h_col = (index / width_col) % height_col; + const int b_col = (index / width_col / height_col) % batch_size; + const int c_im = (index / width_col / height_col) / batch_size; + const int c_col = c_im * kernel_h * kernel_w; + + // compute deformable group index + const int deformable_group_index = c_im / channel_per_deformable_group; + + const int h_in = h_col * stride_h - pad_h; + const int w_in = w_col * stride_w - pad_w; + + T *data_col_ptr = + data_col + + ((c_col * batch_size + b_col) * height_col + h_col) * width_col + w_col; + const T *data_im_ptr = + data_im + (b_col * num_channels + c_im) * height * width; + const T *data_offset_ptr = + data_offset + (b_col * deformable_group + deformable_group_index) * 2 * + kernel_h * kernel_w * height_col * width_col; + + const T *data_mask_ptr = + data_mask + (b_col * deformable_group + deformable_group_index) * + kernel_h * kernel_w * height_col * width_col; + + for (int i = 0; i < kernel_h; ++i) { + for (int j = 0; j < kernel_w; ++j) { + const int data_offset_h_ptr = + ((2 * (i * kernel_w + j)) * height_col + h_col) * width_col + w_col; + const int data_offset_w_ptr = + ((2 * (i * kernel_w + j) + 1) * height_col + h_col) * width_col + + w_col; + const int data_mask_hw_ptr = + ((i * kernel_w + j) * height_col + 
h_col) * width_col + w_col; + const T offset_h = data_offset_ptr[data_offset_h_ptr]; + const T offset_w = data_offset_ptr[data_offset_w_ptr]; + const T mask = data_mask_ptr[data_mask_hw_ptr]; + T val = static_cast(0); + const T h_im = h_in + i * dilation_h + offset_h; + const T w_im = w_in + j * dilation_w + offset_w; + if (h_im > -1 && w_im > -1 && h_im < height && w_im < width) + val = dmcn_im2col_bilinear_cpu(data_im_ptr, width, height, width, + h_im, w_im); + *data_col_ptr = val * mask; + data_col_ptr += batch_size * height_col * width_col; + } + } + } +} + +template +void modulated_deformable_col2im_cpu_kernel( + const int n, const T *data_col, const T *data_offset, const T *data_mask, + const int channels, const int height, const int width, const int kernel_h, + const int kernel_w, const int pad_h, const int pad_w, const int stride_h, + const int stride_w, const int dilation_h, const int dilation_w, + const int channel_per_deformable_group, const int batch_size, + const int deformable_group, const int height_col, const int width_col, + T *grad_im) { + for (int index = 0; index < n; index++) { + const int j = (index / width_col / height_col / batch_size) % kernel_w; + const int i = + (index / width_col / height_col / batch_size / kernel_w) % kernel_h; + const int c = + index / width_col / height_col / batch_size / kernel_w / kernel_h; + // compute the start and end of the output + + const int deformable_group_index = c / channel_per_deformable_group; + + int w_out = index % width_col; + int h_out = (index / width_col) % height_col; + int b = (index / width_col / height_col) % batch_size; + int w_in = w_out * stride_w - pad_w; + int h_in = h_out * stride_h - pad_h; + + const T *data_offset_ptr = + data_offset + (b * deformable_group + deformable_group_index) * 2 * + kernel_h * kernel_w * height_col * width_col; + const T *data_mask_ptr = + data_mask + (b * deformable_group + deformable_group_index) * kernel_h * + kernel_w * height_col * width_col; + const int 
data_offset_h_ptr = + ((2 * (i * kernel_w + j)) * height_col + h_out) * width_col + w_out; + const int data_offset_w_ptr = + ((2 * (i * kernel_w + j) + 1) * height_col + h_out) * width_col + w_out; + const int data_mask_hw_ptr = + ((i * kernel_w + j) * height_col + h_out) * width_col + w_out; + const T offset_h = data_offset_ptr[data_offset_h_ptr]; + const T offset_w = data_offset_ptr[data_offset_w_ptr]; + const T mask = data_mask_ptr[data_mask_hw_ptr]; + const T cur_inv_h_data = h_in + i * dilation_h + offset_h; + const T cur_inv_w_data = w_in + j * dilation_w + offset_w; + + const T cur_top_grad = data_col[index] * mask; + const int cur_h = (int)cur_inv_h_data; + const int cur_w = (int)cur_inv_w_data; + for (int dy = -2; dy <= 2; dy++) { + for (int dx = -2; dx <= 2; dx++) { + if (cur_h + dy >= 0 && cur_h + dy < height && cur_w + dx >= 0 && + cur_w + dx < width && abs(cur_inv_h_data - (cur_h + dy)) < 1 && + abs(cur_inv_w_data - (cur_w + dx)) < 1) { + int cur_bottom_grad_pos = + ((b * channels + c) * height + cur_h + dy) * width + cur_w + dx; + T weight = dmcn_get_gradient_weight_cpu(cur_inv_h_data, + cur_inv_w_data, cur_h + dy, + cur_w + dx, height, width); + *(grad_im + cur_bottom_grad_pos) += weight * cur_top_grad; + } + } + } + } +} +
// Backward pass w.r.t. the learned offsets (grad_offset) and masks
// (grad_mask). One iteration per offset channel element; accumulates the
// coordinate gradient over every column entry that used this offset.
// NOTE(review): template parameter list restored as `template <typename T>`
// (stripped during extraction).
template <typename T>
void modulated_deformable_col2im_coord_cpu_kernel(
    const int n, const T *data_col, const T *data_im, const T *data_offset,
    const T *data_mask, const int channels, const int height, const int width,
    const int kernel_h, const int kernel_w, const int pad_h, const int pad_w,
    const int stride_h, const int stride_w, const int dilation_h,
    const int dilation_w, const int channel_per_deformable_group,
    const int batch_size, const int offset_channels, const int deformable_group,
    const int height_col, const int width_col, T *grad_offset, T *grad_mask) {
  for (int index = 0; index < n; index++) {
    T val = 0, mval = 0;
    int w = index % width_col;
    int h = (index / width_col) % height_col;
    int c = (index / width_col / height_col) % offset_channels;
    int b = (index / width_col / height_col) / offset_channels;
    // compute the start and end of the output

    const int deformable_group_index = c / (2 * kernel_h * kernel_w);
    const int col_step = kernel_h * kernel_w;
    int cnt = 0;
    const T *data_col_ptr = data_col + deformable_group_index *
                                           channel_per_deformable_group *
                                           batch_size * width_col * height_col;
    const T *data_im_ptr =
        data_im + (b * deformable_group + deformable_group_index) *
                      channel_per_deformable_group / kernel_h / kernel_w *
                      height * width;
    const T *data_offset_ptr =
        data_offset + (b * deformable_group + deformable_group_index) * 2 *
                          kernel_h * kernel_w * height_col * width_col;
    const T *data_mask_ptr =
        data_mask + (b * deformable_group + deformable_group_index) * kernel_h *
                        kernel_w * height_col * width_col;

    const int offset_c = c - deformable_group_index * 2 * kernel_h * kernel_w;

    // Even offset channels are h-offsets, odd ones w-offsets (bp_dir below).
    for (int col_c = (offset_c / 2); col_c < channel_per_deformable_group;
         col_c += col_step) {
      const int col_pos =
          (((col_c * batch_size + b) * height_col) + h) * width_col + w;
      const int bp_dir = offset_c % 2;

      int j = (col_pos / width_col / height_col / batch_size) % kernel_w;
      int i =
          (col_pos / width_col / height_col / batch_size / kernel_w) % kernel_h;
      int w_out = col_pos % width_col;
      int h_out = (col_pos / width_col) % height_col;
      int w_in = w_out * stride_w - pad_w;
      int h_in = h_out * stride_h - pad_h;
      const int data_offset_h_ptr =
          (((2 * (i * kernel_w + j)) * height_col + h_out) * width_col + w_out);
      const int data_offset_w_ptr =
          (((2 * (i * kernel_w + j) + 1) * height_col + h_out) * width_col +
           w_out);
      const int data_mask_hw_ptr =
          (((i * kernel_w + j) * height_col + h_out) * width_col + w_out);
      const T offset_h = data_offset_ptr[data_offset_h_ptr];
      const T offset_w = data_offset_ptr[data_offset_w_ptr];
      const T mask = data_mask_ptr[data_mask_hw_ptr];
      T inv_h = h_in + i * dilation_h + offset_h;
      T inv_w = w_in + j * dilation_w + offset_w;
      // Out-of-image samples get a sentinel coordinate so the coordinate
      // weight below evaluates to zero.
      if (inv_h <= -1 || inv_w <= -1 || inv_h >= height || inv_w >= width)
        inv_h = inv_w = -2;
      else
        mval += data_col_ptr[col_pos] *
                dmcn_im2col_bilinear_cpu(data_im_ptr + cnt * height * width,
                                         width, height, width, inv_h, inv_w);
      const T weight = dmcn_get_coordinate_weight_cpu(
          inv_h, inv_w, height, width, data_im_ptr + cnt * height * width,
          width, bp_dir);
      val += weight * data_col_ptr[col_pos] * mask;
      cnt += 1;
    }
    // KERNEL_ASSIGN(grad_offset[index], offset_req, val);
    grad_offset[index] = val;
    // Each (h, w) offset pair shares one mask channel; write it once.
    if (offset_c % 2 == 0)
      // KERNEL_ASSIGN(grad_mask[(((b * deformable_group +
      // deformable_group_index) * kernel_h * kernel_w + offset_c / 2) *
      // height_col + h) * width_col + w], mask_req, mval);
      grad_mask[(((b * deformable_group + deformable_group_index) * kernel_h *
                      kernel_w +
                  offset_c / 2) *
                     height_col +
                 h) *
                    width_col +
                w] = mval;
  }
}

void modulated_deformable_im2col_cpu( + const Tensor data_im, const Tensor data_offset, const Tensor data_mask, + const int batch_size, const int channels, const int height_im, + const int width_im, const int height_col, const int width_col, + const int kernel_h, const int kernel_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, const int dilation_h, + const int dilation_w, const int deformable_group, Tensor data_col) { + // num_axes should be smaller than block size + const int channel_per_deformable_group = channels / deformable_group; + const int num_kernels = channels * batch_size * height_col * width_col; + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + data_im.scalar_type(), "modulated_deformable_im2col_cpu", ([&] { + const scalar_t *data_im_ = data_im.data_ptr(); + const scalar_t *data_offset_ = data_offset.data_ptr(); + const scalar_t *data_mask_ = data_mask.data_ptr(); + scalar_t *data_col_ = data_col.data_ptr(); + + modulated_deformable_im2col_cpu_kernel( + num_kernels, data_im_, data_offset_, data_mask_,
height_im, + width_im, kernel_h, kernel_w, pad_h, pad_w, stride_h, stride_w, + dilation_h, dilation_w, channel_per_deformable_group, batch_size, + channels, deformable_group, height_col, width_col, data_col_); + })); +} + +void modulated_deformable_col2im_cpu( + const Tensor data_col, const Tensor data_offset, const Tensor data_mask, + const int batch_size, const int channels, const int height_im, + const int width_im, const int height_col, const int width_col, + const int kernel_h, const int kernel_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, const int dilation_h, + const int dilation_w, const int deformable_group, Tensor grad_im) { + const int channel_per_deformable_group = channels / deformable_group; + const int num_kernels = + channels * kernel_h * kernel_w * batch_size * height_col * width_col; + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + data_col.scalar_type(), "modulated_deformable_col2im_cpu", ([&] { + const scalar_t *data_col_ = data_col.data_ptr(); + const scalar_t *data_offset_ = data_offset.data_ptr(); + const scalar_t *data_mask_ = data_mask.data_ptr(); + scalar_t *grad_im_ = grad_im.data_ptr(); + + modulated_deformable_col2im_cpu_kernel( + num_kernels, data_col_, data_offset_, data_mask_, channels, + height_im, width_im, kernel_h, kernel_w, pad_h, pad_w, stride_h, + stride_w, dilation_h, dilation_w, channel_per_deformable_group, + batch_size, deformable_group, height_col, width_col, grad_im_); + })); +} + +void modulated_deformable_col2im_coord_cpu( + const Tensor data_col, const Tensor data_im, const Tensor data_offset, + const Tensor data_mask, const int batch_size, const int channels, + const int height_im, const int width_im, const int height_col, + const int width_col, const int kernel_h, const int kernel_w, + const int pad_h, const int pad_w, const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, const int deformable_group, + Tensor grad_offset, Tensor grad_mask) { + const int 
num_kernels = batch_size * height_col * width_col * 2 * kernel_h * + kernel_w * deformable_group; + const int channel_per_deformable_group = + channels * kernel_h * kernel_w / deformable_group; + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + data_col.scalar_type(), "modulated_deformable_col2im_coord_cpu", ([&] { + const scalar_t *data_col_ = data_col.data_ptr(); + const scalar_t *data_im_ = data_im.data_ptr(); + const scalar_t *data_offset_ = data_offset.data_ptr(); + const scalar_t *data_mask_ = data_mask.data_ptr(); + scalar_t *grad_offset_ = grad_offset.data_ptr(); + scalar_t *grad_mask_ = grad_mask.data_ptr(); + + modulated_deformable_col2im_coord_cpu_kernel( + num_kernels, data_col_, data_im_, data_offset_, data_mask_, + channels, height_im, width_im, kernel_h, kernel_w, pad_h, pad_w, + stride_h, stride_w, dilation_h, dilation_w, + channel_per_deformable_group, batch_size, + 2 * kernel_h * kernel_w * deformable_group, deformable_group, + height_col, width_col, grad_offset_, grad_mask_); + })); +} + +void modulated_deformable_im2col_impl( + const Tensor data_im, const Tensor data_offset, const Tensor data_mask, + const int batch_size, const int channels, const int height_im, + const int width_im, const int height_col, const int width_col, + const int kernel_h, const int kernel_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, const int dilation_h, + const int dilation_w, const int deformable_group, Tensor data_col); + +void modulated_deformable_col2im_impl( + const Tensor data_col, const Tensor data_offset, const Tensor data_mask, + const int batch_size, const int channels, const int height_im, + const int width_im, const int height_col, const int width_col, + const int kernel_h, const int kernel_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, const int dilation_h, + const int dilation_w, const int deformable_group, Tensor grad_im); + +void modulated_deformable_col2im_coord_impl( + const Tensor data_col, 
const Tensor data_im, const Tensor data_offset, + const Tensor data_mask, const int batch_size, const int channels, + const int height_im, const int width_im, const int height_col, + const int width_col, const int kernel_h, const int kernel_w, + const int pad_h, const int pad_w, const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, const int deformable_group, + Tensor grad_offset, Tensor grad_mask); + +REGISTER_DEVICE_IMPL(modulated_deformable_im2col_impl, CPU, + modulated_deformable_im2col_cpu); +REGISTER_DEVICE_IMPL(modulated_deformable_col2im_impl, CPU, + modulated_deformable_col2im_cpu); +REGISTER_DEVICE_IMPL(modulated_deformable_col2im_coord_impl, CPU, + modulated_deformable_col2im_coord_cpu); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/nms.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/nms.cpp new file mode 100644 index 0000000000000000000000000000000000000000..53e9b9a8d82c405e8f923be06f78cda730c0f4ee --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/nms.cpp @@ -0,0 +1,230 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +Tensor nms_cpu(Tensor boxes, Tensor scores, float iou_threshold, int offset) { + if (boxes.numel() == 0) { + return at::empty({0}, boxes.options().dtype(at::kLong)); + } + auto x1_t = boxes.select(1, 0).contiguous(); + auto y1_t = boxes.select(1, 1).contiguous(); + auto x2_t = boxes.select(1, 2).contiguous(); + auto y2_t = boxes.select(1, 3).contiguous(); + + Tensor areas_t = (x2_t - x1_t + offset) * (y2_t - y1_t + offset); + + auto order_t = std::get<1>(scores.sort(0, /* descending=*/true)); + + auto nboxes = boxes.size(0); + Tensor select_t = at::ones({nboxes}, boxes.options().dtype(at::kBool)); + + auto select = select_t.data_ptr(); + auto order = order_t.data_ptr(); + auto x1 = x1_t.data_ptr(); + auto y1 = y1_t.data_ptr(); + auto x2 = x2_t.data_ptr(); + auto y2 = y2_t.data_ptr(); + auto areas = areas_t.data_ptr(); + + for (int64_t _i = 0; _i < nboxes; _i++) { + if (select[_i] == false) continue; + auto i = order[_i]; + auto ix1 = x1[i]; + auto iy1 = y1[i]; + auto ix2 = x2[i]; + auto iy2 = y2[i]; + auto iarea = areas[i]; + + for (int64_t _j = _i + 1; _j < nboxes; _j++) { + if (select[_j] == false) continue; + auto j = order[_j]; + auto xx1 = std::max(ix1, x1[j]); + auto yy1 = std::max(iy1, y1[j]); + auto xx2 = std::min(ix2, x2[j]); + auto yy2 = std::min(iy2, y2[j]); + + auto w = std::max(0.f, xx2 - xx1 + offset); + auto h = std::max(0.f, yy2 - yy1 + offset); + auto inter = w * h; + auto ovr = inter / (iarea + areas[j] - inter); + if (ovr > iou_threshold) select[_j] = false; + } + } + return order_t.masked_select(select_t); +} + +Tensor nms_impl(Tensor boxes, Tensor scores, float iou_threshold, int offset); +REGISTER_DEVICE_IMPL(nms_impl, CPU, nms_cpu); + +Tensor softnms_cpu(Tensor boxes, Tensor scores, Tensor dets, + float iou_threshold, float sigma, float min_score, + int method, int offset) { + if (boxes.numel() == 0) { + return at::empty({0}, 
boxes.options().dtype(at::kLong)); + } + + auto x1_t = boxes.select(1, 0).contiguous(); + auto y1_t = boxes.select(1, 1).contiguous(); + auto x2_t = boxes.select(1, 2).contiguous(); + auto y2_t = boxes.select(1, 3).contiguous(); + auto scores_t = scores.clone(); + + Tensor areas_t = (x2_t - x1_t + offset) * (y2_t - y1_t + offset); + + auto nboxes = boxes.size(0); + auto x1 = x1_t.data_ptr(); + auto y1 = y1_t.data_ptr(); + auto x2 = x2_t.data_ptr(); + auto y2 = y2_t.data_ptr(); + auto sc = scores_t.data_ptr(); + auto areas = areas_t.data_ptr(); + auto de = dets.data_ptr(); + + int64_t pos = 0; + Tensor inds_t = at::arange(nboxes, boxes.options().dtype(at::kLong)); + auto inds = inds_t.data_ptr(); + + for (int64_t i = 0; i < nboxes; i++) { + auto max_score = sc[i]; + auto max_pos = i; + + pos = i + 1; + // get max box + while (pos < nboxes) { + if (max_score < sc[pos]) { + max_score = sc[pos]; + max_pos = pos; + } + pos = pos + 1; + } + // swap + auto ix1 = de[i * 5 + 0] = x1[max_pos]; + auto iy1 = de[i * 5 + 1] = y1[max_pos]; + auto ix2 = de[i * 5 + 2] = x2[max_pos]; + auto iy2 = de[i * 5 + 3] = y2[max_pos]; + auto iscore = de[i * 5 + 4] = sc[max_pos]; + auto iarea = areas[max_pos]; + auto iind = inds[max_pos]; + x1[max_pos] = x1[i]; + y1[max_pos] = y1[i]; + x2[max_pos] = x2[i]; + y2[max_pos] = y2[i]; + sc[max_pos] = sc[i]; + areas[max_pos] = areas[i]; + inds[max_pos] = inds[i]; + x1[i] = ix1; + y1[i] = iy1; + x2[i] = ix2; + y2[i] = iy2; + sc[i] = iscore; + areas[i] = iarea; + inds[i] = iind; + + pos = i + 1; + while (pos < nboxes) { + auto xx1 = std::max(ix1, x1[pos]); + auto yy1 = std::max(iy1, y1[pos]); + auto xx2 = std::min(ix2, x2[pos]); + auto yy2 = std::min(iy2, y2[pos]); + + auto w = std::max(0.f, xx2 - xx1 + offset); + auto h = std::max(0.f, yy2 - yy1 + offset); + auto inter = w * h; + auto ovr = inter / (iarea + areas[pos] - inter); + + float weight = 1.; + if (method == 0) { + if (ovr >= iou_threshold) weight = 0; + } else if (method == 1) { + if (ovr >= 
iou_threshold) weight = 1 - ovr; + } else if (method == 2) { + weight = std::exp(-(ovr * ovr) / sigma); + } + sc[pos] *= weight; + // if box score falls below threshold, discard the box by + // swapping with last box update N + if (sc[pos] < min_score) { + x1[pos] = x1[nboxes - 1]; + y1[pos] = y1[nboxes - 1]; + x2[pos] = x2[nboxes - 1]; + y2[pos] = y2[nboxes - 1]; + sc[pos] = sc[nboxes - 1]; + areas[pos] = areas[nboxes - 1]; + inds[pos] = inds[nboxes - 1]; + nboxes = nboxes - 1; + pos = pos - 1; + } + pos = pos + 1; + } + } + return inds_t.slice(0, 0, nboxes); +} + +Tensor softnms_impl(Tensor boxes, Tensor scores, Tensor dets, + float iou_threshold, float sigma, float min_score, + int method, int offset); +REGISTER_DEVICE_IMPL(softnms_impl, CPU, softnms_cpu); + +std::vector > nms_match_cpu(Tensor dets, float iou_threshold) { + auto x1_t = dets.select(1, 0).contiguous(); + auto y1_t = dets.select(1, 1).contiguous(); + auto x2_t = dets.select(1, 2).contiguous(); + auto y2_t = dets.select(1, 3).contiguous(); + auto scores = dets.select(1, 4).contiguous(); + + at::Tensor areas_t = (x2_t - x1_t) * (y2_t - y1_t); + + auto order_t = std::get<1>(scores.sort(0, /* descending=*/true)); + + auto ndets = dets.size(0); + at::Tensor suppressed_t = + at::zeros({ndets}, dets.options().dtype(at::kByte).device(at::kCPU)); + + auto suppressed = suppressed_t.data_ptr(); + auto order = order_t.data_ptr(); + auto x1 = x1_t.data_ptr(); + auto y1 = y1_t.data_ptr(); + auto x2 = x2_t.data_ptr(); + auto y2 = y2_t.data_ptr(); + auto areas = areas_t.data_ptr(); + + std::vector keep; + std::vector > matched; + + for (int64_t _i = 0; _i < ndets; _i++) { + auto i = order[_i]; + if (suppressed[i] == 1) continue; + keep.push_back(i); + std::vector v_i; + auto ix1 = x1[i]; + auto iy1 = y1[i]; + auto ix2 = x2[i]; + auto iy2 = y2[i]; + auto iarea = areas[i]; + + for (int64_t _j = _i + 1; _j < ndets; _j++) { + auto j = order[_j]; + if (suppressed[j] == 1) continue; + auto xx1 = std::max(ix1, x1[j]); + 
auto yy1 = std::max(iy1, y1[j]); + auto xx2 = std::min(ix2, x2[j]); + auto yy2 = std::min(iy2, y2[j]); + + auto w = std::max(static_cast(0), xx2 - xx1); + auto h = std::max(static_cast(0), yy2 - yy1); + auto inter = w * h; + auto ovr = inter / (iarea + areas[j] - inter); + if (ovr >= iou_threshold) { + suppressed[j] = 1; + v_i.push_back(j); + } + } + matched.push_back(v_i); + } + for (size_t i = 0; i < keep.size(); i++) + matched[i].insert(matched[i].begin(), keep[i]); + return matched; +} + +std::vector > nms_match_impl(Tensor dets, float iou_threshold); +REGISTER_DEVICE_IMPL(nms_match_impl, CPU, nms_match_cpu); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/nms_quadri.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/nms_quadri.cpp new file mode 100644 index 0000000000000000000000000000000000000000..086df167eb49b1406c01129eea8783e393d7320f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/nms_quadri.cpp @@ -0,0 +1,64 @@ +// Copyright (c) Facebook, Inc. and its affiliates.
All Rights Reserved
#include "box_iou_rotated_utils.hpp"
#include "pytorch_cpp_helper.hpp"

// Greedy NMS over quadrilateral boxes (8 coordinates per box); the IoU
// computation is delegated to single_box_iou_quadri.
// NOTE(review): template parameter list and data_ptr template arguments
// restored (stripped during extraction).
template <typename scalar_t>
Tensor nms_quadri_cpu_kernel(const Tensor dets, const Tensor scores,
                             const float iou_threshold) {
  // nms_quadri_cpu_kernel is modified from torchvision's nms_cpu_kernel,
  // however, the code in this function is much shorter because
  // we delegate the IoU computation for quadri boxes to
  // the single_box_iou_quadri function in box_iou_rotated_utils.h
  AT_ASSERTM(!dets.is_cuda(), "dets must be a CPU tensor");
  AT_ASSERTM(!scores.is_cuda(), "scores must be a CPU tensor");
  AT_ASSERTM(dets.scalar_type() == scores.scalar_type(),
             "dets should have the same type as scores");

  if (dets.numel() == 0) {
    return at::empty({0}, dets.options().dtype(at::kLong));
  }

  auto order_t = std::get<1>(scores.sort(0, /* descending=*/true));

  auto ndets = dets.size(0);
  Tensor suppressed_t = at::zeros({ndets}, dets.options().dtype(at::kByte));
  Tensor keep_t = at::zeros({ndets}, dets.options().dtype(at::kLong));

  auto suppressed = suppressed_t.data_ptr<uint8_t>();
  auto keep = keep_t.data_ptr<int64_t>();
  auto order = order_t.data_ptr<int64_t>();

  int64_t num_to_keep = 0;

  for (int64_t _i = 0; _i < ndets; _i++) {
    auto i = order[_i];
    if (suppressed[i] == 1) {
      continue;
    }

    keep[num_to_keep++] = i;

    for (int64_t _j = _i + 1; _j < ndets; _j++) {
      auto j = order[_j];
      if (suppressed[j] == 1) {
        continue;
      }

      auto ovr = single_box_iou_quadri<scalar_t>(
          dets[i].data_ptr<scalar_t>(), dets[j].data_ptr<scalar_t>(), 0);
      if (ovr >= iou_threshold) {
        suppressed[j] = 1;
      }
    }
  }
  return keep_t.narrow(/*dim=*/0, /*start=*/0, /*length=*/num_to_keep);
}

// Dispatching wrapper over the templated kernel above.
Tensor nms_quadri_cpu(const Tensor dets, const Tensor scores,
                      const float iou_threshold) {
  auto result = at::empty({0}, dets.options());
  AT_DISPATCH_FLOATING_TYPES(dets.scalar_type(), "nms_quadri", [&] {
    result = nms_quadri_cpu_kernel<scalar_t>(dets, scores, iou_threshold);
  });
  return result;
}
diff --git 
a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/nms_rotated.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/nms_rotated.cpp new file mode 100644 index 0000000000000000000000000000000000000000..d2774c82654ef83d220ca81566cce8d25d02c275 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/nms_rotated.cpp @@ -0,0 +1,66 @@ +// Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved +// modified from +// https://github.com/facebookresearch/detectron2/blob/master/detectron2/layers/csrc/nms_rotated/nms_rotated_cpu.cpp +#include "box_iou_rotated_utils.hpp" +#include "pytorch_cpp_helper.hpp" + +template +Tensor nms_rotated_cpu_kernel(const Tensor dets, const Tensor scores, + const float iou_threshold) { + // nms_rotated_cpu_kernel is modified from torchvision's nms_cpu_kernel, + // however, the code in this function is much shorter because + // we delegate the IoU computation for rotated boxes to + // the single_box_iou_rotated function in box_iou_rotated_utils.h + AT_ASSERTM(!dets.is_cuda(), "dets must be a CPU tensor"); + AT_ASSERTM(!scores.is_cuda(), "scores must be a CPU tensor"); + AT_ASSERTM(dets.scalar_type() == scores.scalar_type(), + "dets should have the same type as scores"); + + if (dets.numel() == 0) { + return at::empty({0}, dets.options().dtype(at::kLong)); + } + + auto order_t = std::get<1>(scores.sort(0, /* descending=*/true)); + + auto ndets = dets.size(0); + Tensor suppressed_t = at::zeros({ndets}, dets.options().dtype(at::kByte)); + Tensor keep_t = at::zeros({ndets}, dets.options().dtype(at::kLong)); + + auto suppressed = suppressed_t.data_ptr(); + auto keep = keep_t.data_ptr(); + auto order = order_t.data_ptr(); + + int64_t num_to_keep = 0; + + for (int64_t _i = 0; _i < ndets; _i++) { + auto i = order[_i]; + if (suppressed[i] == 1) { + continue; + } + + keep[num_to_keep++] = i; + + for (int64_t _j = _i + 1; _j < ndets; _j++) { + auto j = order[_j]; + if (suppressed[j] == 1) { + continue; + } + + 
auto ovr = single_box_iou_rotated( + dets[i].data_ptr(), dets[j].data_ptr(), 0); + if (ovr >= iou_threshold) { + suppressed[j] = 1; + } + } + } + return keep_t.narrow(/*dim=*/0, /*start=*/0, /*length=*/num_to_keep); +} + +Tensor nms_rotated_cpu(const Tensor dets, const Tensor scores, + const float iou_threshold) { + auto result = at::empty({0}, dets.options()); + AT_DISPATCH_FLOATING_TYPES(dets.scalar_type(), "nms_rotated", [&] { + result = nms_rotated_cpu_kernel(dets, scores, iou_threshold); + }); + return result; +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/pixel_group.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/pixel_group.cpp new file mode 100644 index 0000000000000000000000000000000000000000..db06a224a075e641b8d7738fe3e7be3f71990fc7 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/pixel_group.cpp @@ -0,0 +1,126 @@ +// Copyright (c) OpenMMLab. All rights reserved +// It is modified from https://github.com/WenmuZhou/PAN.pytorch + +#include + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +std::vector> estimate_confidence(int32_t* label, + float* score, int label_num, + int height, int width) { + std::vector> point_vector; + for (int i = 0; i < label_num; i++) { + std::vector point; + point.push_back(0); + point.push_back(0); + point_vector.push_back(point); + } + for (int y = 0; y < height; y++) { + auto label_tmp = label + y * width; + auto score_tmp = score + y * width; + for (int x = 0; x < width; x++) { + auto l = label_tmp[x]; + if (l > 0) { + float confidence = score_tmp[x]; + point_vector[l].push_back(x); + point_vector[l].push_back(y); + point_vector[l][0] += confidence; + point_vector[l][1] += 1; + } + } + } + for (size_t l = 0; l < point_vector.size(); l++) + if (point_vector[l][1] > 0) { + point_vector[l][0] /= point_vector[l][1]; + } + return point_vector; +} +std::vector> pixel_group_cpu( + Tensor score, Tensor mask, Tensor embedding, Tensor 
kernel_label, + Tensor kernel_contour, int kernel_region_num, float dis_threshold) { + assert(score.dim() == 2); + assert(mask.dim() == 2); + assert(embedding.dim() == 3); + int height = score.size(0); + int width = score.size(1); + assert(height == mask.size(0) == embedding.size(1) == kernel_label.size(1)); + assert(width == mask.size(1) == embedding.size(2) == kernel_label.size(2)); + + auto threshold_square = dis_threshold * dis_threshold; + auto ptr_score = score.data_ptr(); + auto ptr_mask = mask.data_ptr(); + auto ptr_kernel_contour = kernel_contour.data_ptr(); + auto ptr_embedding = embedding.data_ptr(); + auto ptr_kernel_label = kernel_label.data_ptr(); + std::queue> contour_pixels; + auto embedding_dim = embedding.size(2); + std::vector> kernel_vector( + kernel_region_num, std::vector(embedding_dim + 1, 0)); + + Tensor text_label; + text_label = kernel_label.clone(); + auto ptr_text_label = text_label.data_ptr(); + + for (int i = 0; i < height; i++) { + auto ptr_embedding_tmp = ptr_embedding + i * width * embedding_dim; + auto ptr_kernel_label_tmp = ptr_kernel_label + i * width; + auto ptr_kernel_contour_tmp = ptr_kernel_contour + i * width; + + for (int j = 0, k = 0; j < width && k < width * embedding_dim; + j++, k += embedding_dim) { + int32_t label = ptr_kernel_label_tmp[j]; + if (label > 0) { + for (int d = 0; d < embedding_dim; d++) + kernel_vector[label][d] += ptr_embedding_tmp[k + d]; + kernel_vector[label][embedding_dim] += 1; + // kernel pixel number + if (ptr_kernel_contour_tmp[j]) { + contour_pixels.push(std::make_tuple(i, j, label)); + } + } + } + } + for (int i = 0; i < kernel_region_num; i++) { + for (int j = 0; j < embedding_dim; j++) { + kernel_vector[i][j] /= kernel_vector[i][embedding_dim]; + } + } + int dx[4] = {-1, 1, 0, 0}; + int dy[4] = {0, 0, -1, 1}; + while (!contour_pixels.empty()) { + auto query_pixel = contour_pixels.front(); + contour_pixels.pop(); + int y = std::get<0>(query_pixel); + int x = std::get<1>(query_pixel); + int32_t 
l = std::get<2>(query_pixel); + auto kernel_cv = kernel_vector[l]; + for (int idx = 0; idx < 4; idx++) { + int tmpy = y + dy[idx]; + int tmpx = x + dx[idx]; + auto ptr_text_label_tmp = ptr_text_label + tmpy * width; + if (tmpy < 0 || tmpy >= height || tmpx < 0 || tmpx >= width) continue; + if (!ptr_mask[tmpy * width + tmpx] || ptr_text_label_tmp[tmpx] > 0) + continue; + + float dis = 0; + auto ptr_embedding_tmp = ptr_embedding + tmpy * width * embedding_dim; + for (size_t i = 0; i < size_t(embedding_dim); i++) { + dis += + pow(kernel_cv[i] - ptr_embedding_tmp[tmpx * embedding_dim + i], 2); + // ignore further computing if dis is big enough + if (dis >= threshold_square) break; + } + if (dis >= threshold_square) continue; + contour_pixels.push(std::make_tuple(tmpy, tmpx, l)); + ptr_text_label_tmp[tmpx] = l; + } + } + + return estimate_confidence(ptr_text_label, ptr_score, kernel_region_num, + height, width); +} +std::vector> pixel_group_impl( + Tensor score, Tensor mask, Tensor embedding, Tensor kernel_label, + Tensor kernel_contour, int kernel_region_num, float dis_threshold); +REGISTER_DEVICE_IMPL(pixel_group_impl, CPU, pixel_group_cpu); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/points_in_boxes.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/points_in_boxes.cpp new file mode 100644 index 0000000000000000000000000000000000000000..c16baa4cca4c380db4ae25462f5074607f084214 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/points_in_boxes.cpp @@ -0,0 +1,53 @@ +#include "pytorch_cpp_helper.hpp" + +inline void lidar_to_local_coords_cpu(float shift_x, float shift_y, float rz, + float &local_x, float &local_y) { + float cosa = cos(-rz), sina = sin(-rz); + local_x = shift_x * cosa + shift_y * (-sina); + local_y = shift_x * sina + shift_y * cosa; +} + +inline int check_pt_in_box3d_cpu(const float *pt, const float *box3d, + float &local_x, float &local_y) { + // param pt: (x, y, z) + // param box3d: (cx, cy, 
cz, x_size, y_size, z_size, rz) in LiDAR coordinate, + // cz in the bottom center + float x = pt[0], y = pt[1], z = pt[2]; + float cx = box3d[0], cy = box3d[1], cz = box3d[2]; + float x_size = box3d[3], y_size = box3d[4], z_size = box3d[5], rz = box3d[6]; + cz += z_size / + 2.0; // shift to the center since cz in box3d is the bottom center + + if (fabsf(z - cz) > z_size / 2.0) return 0; + lidar_to_local_coords_cpu(x - cx, y - cy, rz, local_x, local_y); + float in_flag = (local_x > -x_size / 2.0) & (local_x < x_size / 2.0) & + (local_y > -y_size / 2.0) & (local_y < y_size / 2.0); + return in_flag; +} + +void points_in_boxes_cpu_forward(Tensor boxes_tensor, Tensor pts_tensor, + Tensor pts_indices_tensor) { + // params boxes: (N, 7) [x, y, z, x_size, y_size, z_size, rz] in LiDAR + // coordinate, z is the bottom center, each box DO NOT overlaps params pts: + // (npoints, 3) [x, y, z] in LiDAR coordinate params pts_indices: (N, npoints) + + CHECK_CONTIGUOUS(boxes_tensor); + CHECK_CONTIGUOUS(pts_tensor); + CHECK_CONTIGUOUS(pts_indices_tensor); + + int boxes_num = boxes_tensor.size(0); + int pts_num = pts_tensor.size(0); + + const float *boxes = boxes_tensor.data_ptr(); + const float *pts = pts_tensor.data_ptr(); + int *pts_indices = pts_indices_tensor.data_ptr(); + + float local_x = 0, local_y = 0; + for (int i = 0; i < boxes_num; i++) { + for (int j = 0; j < pts_num; j++) { + int cur_in_flag = + check_pt_in_box3d_cpu(pts + j * 3, boxes + i * 7, local_x, local_y); + pts_indices[i * pts_num + j] = cur_in_flag; + } + } +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/psamask.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/psamask.cpp new file mode 100644 index 0000000000000000000000000000000000000000..aa7fdcbdca908e3f037d75bcc6d7d9e68102d192 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/psamask.cpp @@ -0,0 +1,199 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +// Modified from +// https://github.com/hszhao/semseg/blob/master/lib/psa/src +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +#ifndef min +#define min(a, b) (((a) < (b)) ? (a) : (b)) +#endif +#ifndef max +#define max(a, b) (((a) > (b)) ? (a) : (b)) +#endif + +void psamask_collect_forward(const int num_, const int h_feature, + const int w_feature, const int h_mask, + const int w_mask, const int half_h_mask, + const int half_w_mask, const Tensor mask_data, + Tensor buffer_data) { + for (int n = 0; n < num_; n++) { + for (int h = 0; h < h_feature; h++) { + for (int w = 0; w < w_feature; w++) { + // effective mask region : [hstart, hend) x [wstart, wend) with + // mask-indexed + const int hstart = max(0, half_h_mask - h); + const int hend = min(h_mask, h_feature + half_h_mask - h); + const int wstart = max(0, half_w_mask - w); + const int wend = min(w_mask, w_feature + half_w_mask - w); + // (hidx, widx ) with mask-indexed + // (hidx + h - half_h_mask, widx + w - half_w_mask) with + // feature-indexed + for (int hidx = hstart; hidx < hend; hidx++) { + for (int widx = wstart; widx < wend; widx++) { + buffer_data.view({-1})[(n * h_feature * w_feature + + (hidx + h - half_h_mask) * w_feature + + (widx + w - half_w_mask)) * + h_feature * w_feature + + h * w_feature + w] = + mask_data.view( + {-1})[((n * h_mask * w_mask + hidx * w_mask + widx) * + h_feature + + h) * + w_feature + + w]; + } + } + } + } + } +} + +void psamask_distribute_forward(const int num_, const int h_feature, + const int w_feature, const int h_mask, + const int w_mask, const int half_h_mask, + const int half_w_mask, const Tensor mask_data, + Tensor buffer_data) { + for (int n = 0; n < num_; n++) { + for (int h = 0; h < h_feature; h++) { + for (int w = 0; w < w_feature; w++) { + // effective mask region : [hstart, hend) x [wstart, wend) with + // mask-indexed + const int hstart = max(0, half_h_mask - h); + const int hend = min(h_mask, h_feature + 
half_h_mask - h); + const int wstart = max(0, half_w_mask - w); + const int wend = min(w_mask, w_feature + half_w_mask - w); + // (hidx, widx ) with mask-indexed + // (hidx + h - half_h_mask, widx + w - half_w_mask) with + // feature-indexed + for (int hidx = hstart; hidx < hend; hidx++) { + for (int widx = wstart; widx < wend; widx++) { + buffer_data.view( + {-1})[(n * h_feature * w_feature + h * w_feature + w) * + h_feature * w_feature + + (hidx + h - half_h_mask) * w_feature + + (widx + w - half_w_mask)] = + mask_data.view( + {-1})[((n * h_mask * w_mask + hidx * w_mask + widx) * + h_feature + + h) * + w_feature + + w]; + } + } + } + } + } +} + +void psamask_collect_backward(const int num_, const int h_feature, + const int w_feature, const int h_mask, + const int w_mask, const int half_h_mask, + const int half_w_mask, const Tensor buffer_diff, + Tensor mask_diff) { + for (int n = 0; n < num_; n++) { + for (int h = 0; h < h_feature; h++) { + for (int w = 0; w < w_feature; w++) { + // effective mask region : [hstart, hend) x [wstart, wend) with + // mask-indexed + const int hstart = max(0, half_h_mask - h); + const int hend = min(h_mask, h_feature + half_h_mask - h); + const int wstart = max(0, half_w_mask - w); + const int wend = min(w_mask, w_feature + half_w_mask - w); + // (hidx, widx ) with mask-indexed + // (hidx + h - half_h_mask, widx + w - half_w_mask) with + // feature-indexed + for (int hidx = hstart; hidx < hend; hidx++) { + for (int widx = wstart; widx < wend; widx++) { + mask_diff.view({-1})[((n * h_mask * w_mask + hidx * w_mask + widx) * + h_feature + + h) * + w_feature + + w] = + buffer_diff.view({-1})[(n * h_feature * w_feature + + (hidx + h - half_h_mask) * w_feature + + (widx + w - half_w_mask)) * + h_feature * w_feature + + h * w_feature + w]; + } + } + } + } + } +} + +void psamask_distribute_backward(const int num_, const int h_feature, + const int w_feature, const int h_mask, + const int w_mask, const int half_h_mask, + const int half_w_mask, 
+ const Tensor buffer_diff, Tensor mask_diff) { + for (int n = 0; n < num_; n++) { + for (int h = 0; h < h_feature; h++) { + for (int w = 0; w < w_feature; w++) { + // effective mask region : [hstart, hend) x [wstart, wend) with + // mask-indexed + const int hstart = max(0, half_h_mask - h); + const int hend = min(h_mask, h_feature + half_h_mask - h); + const int wstart = max(0, half_w_mask - w); + const int wend = min(w_mask, w_feature + half_w_mask - w); + // (hidx, widx ) with mask-indexed + // (hidx + h - half_h_mask, widx + w - half_w_mask) with + // feature-indexed + for (int hidx = hstart; hidx < hend; hidx++) { + for (int widx = wstart; widx < wend; widx++) { + mask_diff.view({-1})[((n * h_mask * w_mask + hidx * w_mask + widx) * + h_feature + + h) * + w_feature + + w] = + buffer_diff.view( + {-1})[(n * h_feature * w_feature + h * w_feature + w) * + h_feature * w_feature + + (hidx + h - half_h_mask) * w_feature + + (widx + w - half_w_mask)]; + } + } + } + } + } +} + +void psamask_forward_cpu(const int psa_type, const Tensor input, Tensor output, + const int num_, const int h_feature, + const int w_feature, const int h_mask, + const int w_mask, const int half_h_mask, + const int half_w_mask) { + if (psa_type == 0) + psamask_collect_forward(num_, h_feature, w_feature, h_mask, w_mask, + half_h_mask, half_w_mask, input, output); + else + psamask_distribute_forward(num_, h_feature, w_feature, h_mask, w_mask, + half_h_mask, half_w_mask, input, output); +} + +void psamask_backward_cpu(const int psa_type, const Tensor grad_output, + Tensor grad_input, const int num_, + const int h_feature, const int w_feature, + const int h_mask, const int w_mask, + const int half_h_mask, const int half_w_mask) { + if (psa_type == 0) + psamask_collect_backward(num_, h_feature, w_feature, h_mask, w_mask, + half_h_mask, half_w_mask, grad_output, grad_input); + else + psamask_distribute_backward(num_, h_feature, w_feature, h_mask, w_mask, + half_h_mask, half_w_mask, grad_output, + 
grad_input); +} + +void psamask_forward_impl(const int psa_type, const Tensor input, Tensor output, + const int num_, const int h_feature, + const int w_feature, const int h_mask, + const int w_mask, const int half_h_mask, + const int half_w_mask); + +void psamask_backward_impl(const int psa_type, const Tensor grad_output, + Tensor grad_input, const int num_, + const int h_feature, const int w_feature, + const int h_mask, const int w_mask, + const int half_h_mask, const int half_w_mask); +REGISTER_DEVICE_IMPL(psamask_forward_impl, CPU, psamask_forward_cpu); +REGISTER_DEVICE_IMPL(psamask_backward_impl, CPU, psamask_backward_cpu); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/roi_align.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/roi_align.cpp new file mode 100644 index 0000000000000000000000000000000000000000..d545390645917aff7e5e8b42564fb83eb4e62ae7 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/roi_align.cpp @@ -0,0 +1,466 @@ +// Modified from +// https://github.com/facebookresearch/detectron2/tree/master/detectron2/layers/csrc/ROIAlign +// Copyright (c) Facebook, Inc. and its affiliates. 
All Rights Reserved +#include +#include + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +// implementation taken from Caffe2 +template +struct PreCalc { + int pos1; + int pos2; + int pos3; + int pos4; + T w1; + T w2; + T w3; + T w4; +}; + +template +void pre_calc_for_bilinear_interpolate( + const int height, const int width, const int pooled_height, + const int pooled_width, const int iy_upper, const int ix_upper, + T roi_start_h, T roi_start_w, T bin_size_h, T bin_size_w, + int roi_bin_grid_h, int roi_bin_grid_w, std::vector>& pre_calc) { + int pre_calc_index = 0; + for (int ph = 0; ph < pooled_height; ph++) { + for (int pw = 0; pw < pooled_width; pw++) { + for (int iy = 0; iy < iy_upper; iy++) { + const T yy = roi_start_h + ph * bin_size_h + + static_cast(iy + .5f) * bin_size_h / + static_cast(roi_bin_grid_h); // e.g., 0.5, 1.5 + for (int ix = 0; ix < ix_upper; ix++) { + const T xx = roi_start_w + pw * bin_size_w + + static_cast(ix + .5f) * bin_size_w / + static_cast(roi_bin_grid_w); + + T x = xx; + T y = yy; + // deal with: inverse elements are out of feature map boundary + if (y < -1.0 || y > height || x < -1.0 || x > width) { + // empty + PreCalc pc; + pc.pos1 = 0; + pc.pos2 = 0; + pc.pos3 = 0; + pc.pos4 = 0; + pc.w1 = 0; + pc.w2 = 0; + pc.w3 = 0; + pc.w4 = 0; + pre_calc[pre_calc_index] = pc; + pre_calc_index += 1; + continue; + } + + if (y <= 0) { + y = 0; + } + if (x <= 0) { + x = 0; + } + + int y_low = (int)y; + int x_low = (int)x; + int y_high; + int x_high; + + if (y_low >= height - 1) { + y_high = y_low = height - 1; + y = (T)y_low; + } else { + y_high = y_low + 1; + } + + if (x_low >= width - 1) { + x_high = x_low = width - 1; + x = (T)x_low; + } else { + x_high = x_low + 1; + } + + T ly = y - y_low; + T lx = x - x_low; + T hy = 1. - ly, hx = 1. 
- lx; + T w1 = hy * hx, w2 = hy * lx, w3 = ly * hx, w4 = ly * lx; + + // save weights and indices + PreCalc pc; + pc.pos1 = y_low * width + x_low; + pc.pos2 = y_low * width + x_high; + pc.pos3 = y_high * width + x_low; + pc.pos4 = y_high * width + x_high; + pc.w1 = w1; + pc.w2 = w2; + pc.w3 = w3; + pc.w4 = w4; + pre_calc[pre_calc_index] = pc; + + pre_calc_index += 1; + } + } + } + } +} + +template +void ROIAlignForward(const int nthreads, const T* input, const T* rois, + T* output, T* argmax_y, T* argmax_x, + const int pooled_height, const int pooled_width, + const T spatial_scale, const int sampling_ratio, + const int pool_mode, // 0 - max pool, 1 - avg pool + const bool aligned, const int channels, const int height, + const int width) { + int n_rois = nthreads / channels / pooled_width / pooled_height; + // (n, c, ph, pw) is an element in the pooled output + // can be parallelized using omp + // #pragma omp parallel for num_threads(32) + for (int n = 0; n < n_rois; n++) { + int index_n = n * channels * pooled_width * pooled_height; + + const T* offset_rois = rois + n * 5; + int roi_batch_ind = offset_rois[0]; + + // Do not use rounding; this implementation detail is critical + T offset = aligned ? 
(T)0.5 : (T)0.0; + T roi_start_w = offset_rois[1] * spatial_scale - offset; + T roi_start_h = offset_rois[2] * spatial_scale - offset; + T roi_end_w = offset_rois[3] * spatial_scale - offset; + T roi_end_h = offset_rois[4] * spatial_scale - offset; + + T roi_width = roi_end_w - roi_start_w; + T roi_height = roi_end_h - roi_start_h; + if (aligned) { + AT_ASSERTM(roi_width >= 0 && roi_height >= 0, + "ROIs in ROIAlign cannot have non-negative size!"); + } else { // for backward-compatibility only + roi_width = std::max(roi_width, (T)1.); + roi_height = std::max(roi_height, (T)1.); + } + T bin_size_h = static_cast(roi_height) / static_cast(pooled_height); + T bin_size_w = static_cast(roi_width) / static_cast(pooled_width); + + // We use roi_bin_grid to sample the grid and mimic integral + int roi_bin_grid_h = (sampling_ratio > 0) + ? sampling_ratio + : ceilf(roi_height / pooled_height); // e.g., = 2 + int roi_bin_grid_w = + (sampling_ratio > 0) ? sampling_ratio : ceilf(roi_width / pooled_width); + + // When the grid is empty, output zeros == 0/1, instead of NaN. + const T count = std::max(roi_bin_grid_h * roi_bin_grid_w, 1); // e.g. 
= 4 + + // we want to precalculate indices and weights shared by all channels, + // this is the key point of optimization + std::vector> pre_calc(roi_bin_grid_h * roi_bin_grid_w * + pooled_width * pooled_height); + pre_calc_for_bilinear_interpolate( + height, width, pooled_height, pooled_width, roi_bin_grid_h, + roi_bin_grid_w, roi_start_h, roi_start_w, bin_size_h, bin_size_w, + roi_bin_grid_h, roi_bin_grid_w, pre_calc); + + for (int c = 0; c < channels; c++) { + int index_n_c = index_n + c * pooled_width * pooled_height; + const T* offset_input = + input + (roi_batch_ind * channels + c) * height * width; + int pre_calc_index = 0; + + for (int ph = 0; ph < pooled_height; ph++) { + for (int pw = 0; pw < pooled_width; pw++) { + int index = index_n_c + ph * pooled_width + pw; + + T output_val = 0.; + T maxval = -10000; + T maxidx_y = -1.f, maxidx_x = -1.f; + for (int iy = 0; iy < roi_bin_grid_h; iy++) { + const T y = roi_start_h + ph * bin_size_h + + static_cast(iy + .5f) * bin_size_h / + static_cast(roi_bin_grid_h); + for (int ix = 0; ix < roi_bin_grid_w; ix++) { + const T x = roi_start_w + pw * bin_size_w + + static_cast(ix + .5f) * bin_size_w / + static_cast(roi_bin_grid_w); + PreCalc pc = pre_calc[pre_calc_index]; + T val = pc.w1 * offset_input[pc.pos1] + + pc.w2 * offset_input[pc.pos2] + + pc.w3 * offset_input[pc.pos3] + + pc.w4 * offset_input[pc.pos4]; + if (val > maxval) { + maxval = val; + maxidx_y = y; + maxidx_x = x; + } + output_val += val; + pre_calc_index += 1; + } + } + if (pool_mode == 0) { + // We do max pooling inside a bin + output[index] = maxval; + argmax_y[index] = maxidx_y; + argmax_x[index] = maxidx_x; + } else if (pool_mode == 1) { + // We do average (integral) pooling inside a bin + output[index] = output_val / count; + } // if + } // for pw + } // for ph + } // for c + } // for n +} + +template +void bilinear_interpolate_gradient(const int height, const int width, T y, T x, + T& w1, T& w2, T& w3, T& w4, int& x_low, + int& x_high, int& y_low, 
int& y_high, + const int index /* index for debug only*/) { + // deal with cases that inverse elements are out of feature map boundary + if (y < -1.0 || y > height || x < -1.0 || x > width) { + // empty + w1 = w2 = w3 = w4 = 0.; + x_low = x_high = y_low = y_high = -1; + return; + } + + if (y <= 0) y = 0; + if (x <= 0) x = 0; + + y_low = (int)y; + x_low = (int)x; + + if (y_low >= height - 1) { + y_high = y_low = height - 1; + y = (T)y_low; + } else { + y_high = y_low + 1; + } + + if (x_low >= width - 1) { + x_high = x_low = width - 1; + x = (T)x_low; + } else { + x_high = x_low + 1; + } + + T ly = y - y_low; + T lx = x - x_low; + T hy = 1. - ly, hx = 1. - lx; + + // reference in forward + // T v1 = input[y_low * width + x_low]; + // T v2 = input[y_low * width + x_high]; + // T v3 = input[y_high * width + x_low]; + // T v4 = input[y_high * width + x_high]; + // T val = (w1 * v1 + w2 * v2 + w3 * v3 + w4 * v4); + + w1 = hy * hx, w2 = hy * lx, w3 = ly * hx, w4 = ly * lx; + + return; +} + +template +inline void add(T* address, const T& val) { + *address += val; +} + +template +void ROIAlignBackward(const int nthreads, const T* grad_output, const T* rois, + const T* argmax_y, const T* argmax_x, T* grad_input, + const int pooled_height, const int pooled_width, + const T spatial_scale, const int sampling_ratio, + const int pool_mode, // 0 - max pool, 1 - avg pool + const bool aligned, const int channels, const int height, + const int width, const int n_stride, const int c_stride, + const int h_stride, const int w_stride) { + for (int index = 0; index < nthreads; index++) { + // (n, c, ph, pw) is an element in the pooled output + int pw = index % pooled_width; + int ph = (index / pooled_width) % pooled_height; + int c = (index / pooled_width / pooled_height) % channels; + int n = index / pooled_width / pooled_height / channels; + + const T* offset_rois = rois + n * 5; + int roi_batch_ind = offset_rois[0]; + + // Do not use rounding; this implementation detail is critical + T 
offset = aligned ? (T)0.5 : (T)0.0; + T roi_start_w = offset_rois[1] * spatial_scale - offset; + T roi_start_h = offset_rois[2] * spatial_scale - offset; + T roi_end_w = offset_rois[3] * spatial_scale - offset; + T roi_end_h = offset_rois[4] * spatial_scale - offset; + + T roi_width = roi_end_w - roi_start_w; + T roi_height = roi_end_h - roi_start_h; + if (aligned) { + AT_ASSERTM(roi_width >= 0 && roi_height >= 0, + "ROIs in ROIAlign do not have non-negative size!"); + } else { // for backward-compatibility only + roi_width = std::max(roi_width, (T)1.); + roi_height = std::max(roi_height, (T)1.); + } + T bin_size_h = static_cast(roi_height) / static_cast(pooled_height); + T bin_size_w = static_cast(roi_width) / static_cast(pooled_width); + + T* offset_grad_input = + grad_input + ((roi_batch_ind * channels + c) * height * width); + + int output_offset = n * n_stride + c * c_stride; + const T* offset_grad_output = grad_output + output_offset; + const T grad_output_this_bin = + offset_grad_output[ph * h_stride + pw * w_stride]; + + if (pool_mode == 0) { + // We do max pooling inside a bin + T y = argmax_y[index], x = argmax_x[index]; + if (y != -1.f) { + T w1, w2, w3, w4; + int x_low, x_high, y_low, y_high; + bilinear_interpolate_gradient(height, width, y, x, w1, w2, w3, w4, + x_low, x_high, y_low, y_high, index); + + T g1 = grad_output_this_bin * w1; + T g2 = grad_output_this_bin * w2; + T g3 = grad_output_this_bin * w3; + T g4 = grad_output_this_bin * w4; + + if (x_low >= 0 && x_high >= 0 && y_low >= 0 && y_high >= 0) { + // atomic add is not needed for now since it is single threaded + add(offset_grad_input + y_low * width + x_low, static_cast(g1)); + add(offset_grad_input + y_low * width + x_high, static_cast(g2)); + add(offset_grad_input + y_high * width + x_low, static_cast(g3)); + add(offset_grad_input + y_high * width + x_high, static_cast(g4)); + } // if + } // mode + } else if (pool_mode == 1) { + // We do average (integral) pooling inside a bin + // We use 
roi_bin_grid to sample the grid and mimic integral + int roi_bin_grid_h = + (sampling_ratio > 0) + ? sampling_ratio + : ceilf(roi_height / pooled_height); // e.g., = 2 + int roi_bin_grid_w = (sampling_ratio > 0) + ? sampling_ratio + : ceilf(roi_width / pooled_width); + + const T count = roi_bin_grid_h * roi_bin_grid_w; // e.g. = 4 + for (int iy = 0; iy < roi_bin_grid_h; iy++) { + const T y = roi_start_h + ph * bin_size_h + + static_cast(iy + .5f) * bin_size_h / + static_cast(roi_bin_grid_h); // e.g., 0.5, 1.5 + for (int ix = 0; ix < roi_bin_grid_w; ix++) { + const T x = roi_start_w + pw * bin_size_w + + static_cast(ix + .5f) * bin_size_w / + static_cast(roi_bin_grid_w); + + T w1, w2, w3, w4; + int x_low, x_high, y_low, y_high; + + bilinear_interpolate_gradient(height, width, y, x, w1, w2, w3, w4, + x_low, x_high, y_low, y_high, index); + + T g1 = grad_output_this_bin * w1 / count; + T g2 = grad_output_this_bin * w2 / count; + T g3 = grad_output_this_bin * w3 / count; + T g4 = grad_output_this_bin * w4 / count; + + if (x_low >= 0 && x_high >= 0 && y_low >= 0 && y_high >= 0) { + // atomic add is not needed for now since it is single threaded + add(offset_grad_input + y_low * width + x_low, static_cast(g1)); + add(offset_grad_input + y_low * width + x_high, static_cast(g2)); + add(offset_grad_input + y_high * width + x_low, static_cast(g3)); + add(offset_grad_input + y_high * width + x_high, + static_cast(g4)); + } // if + } // ix + } // iy + } // mode + } // for +} // ROIAlignBackward + +void ROIAlignForwardCPULauncher(Tensor input, Tensor rois, Tensor output, + Tensor argmax_y, Tensor argmax_x, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + int pool_mode, bool aligned) { + int output_size = output.numel(); + int channels = input.size(1); + int height = input.size(2); + int width = input.size(3); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + input.scalar_type(), "ROIAlign_forward", [&] { + ROIAlignForward( + output_size, 
input.data_ptr(), rois.data_ptr(), + output.data_ptr(), argmax_y.data_ptr(), + argmax_x.data_ptr(), aligned_height, aligned_width, + static_cast(spatial_scale), sampling_ratio, pool_mode, + aligned, channels, height, width); + }); +} + +void ROIAlignBackwardCPULauncher(Tensor grad_output, Tensor rois, + Tensor argmax_y, Tensor argmax_x, + Tensor grad_input, int aligned_height, + int aligned_width, float spatial_scale, + int sampling_ratio, int pool_mode, + bool aligned) { + int output_size = grad_output.numel(); + int channels = grad_input.size(1); + int height = grad_input.size(2); + int width = grad_input.size(3); + + // get stride values to ensure indexing into gradients is correct. + int n_stride = grad_output.stride(0); + int c_stride = grad_output.stride(1); + int h_stride = grad_output.stride(2); + int w_stride = grad_output.stride(3); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + grad_output.scalar_type(), "ROIAlign_backward", [&] { + ROIAlignBackward( + output_size, grad_output.data_ptr(), + rois.data_ptr(), argmax_y.data_ptr(), + argmax_x.data_ptr(), grad_input.data_ptr(), + aligned_height, aligned_width, static_cast(spatial_scale), + sampling_ratio, pool_mode, aligned, channels, height, width, + n_stride, c_stride, h_stride, w_stride); + }); +} + +void roi_align_forward_cpu(Tensor input, Tensor rois, Tensor output, + Tensor argmax_y, Tensor argmax_x, int aligned_height, + int aligned_width, float spatial_scale, + int sampling_ratio, int pool_mode, bool aligned) { + ROIAlignForwardCPULauncher(input, rois, output, argmax_y, argmax_x, + aligned_height, aligned_width, spatial_scale, + sampling_ratio, pool_mode, aligned); +} + +void roi_align_backward_cpu(Tensor grad_output, Tensor rois, Tensor argmax_y, + Tensor argmax_x, Tensor grad_input, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + int pool_mode, bool aligned) { + ROIAlignBackwardCPULauncher(grad_output, rois, argmax_y, argmax_x, grad_input, + aligned_height, 
aligned_width, spatial_scale, + sampling_ratio, pool_mode, aligned); +} + +void roi_align_forward_impl(Tensor input, Tensor rois, Tensor output, + Tensor argmax_y, Tensor argmax_x, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + int pool_mode, bool aligned); + +void roi_align_backward_impl(Tensor grad_output, Tensor rois, Tensor argmax_y, + Tensor argmax_x, Tensor grad_input, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + int pool_mode, bool aligned); + +REGISTER_DEVICE_IMPL(roi_align_forward_impl, CPU, roi_align_forward_cpu); +REGISTER_DEVICE_IMPL(roi_align_backward_impl, CPU, roi_align_backward_cpu); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/roi_align_rotated.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/roi_align_rotated.cpp new file mode 100644 index 0000000000000000000000000000000000000000..8c849de0cbc564a9a88cdbcd35b4acdb065f99a3 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/roi_align_rotated.cpp @@ -0,0 +1,455 @@ +// Modified from +// https://github.com/facebookresearch/detectron2/tree/master/detectron2/layers/csrc/ROIAlignRotated +// Copyright (c) Facebook, Inc. and its affiliates. 
All Rights Reserved +#include +#include + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +// implementation taken from Caffe2 +template +struct PreCalc { + int pos1; + int pos2; + int pos3; + int pos4; + T w1; + T w2; + T w3; + T w4; +}; + +template +void pre_calc_for_bilinear_interpolate( + const int height, const int width, const int pooled_height, + const int pooled_width, const int iy_upper, const int ix_upper, + T roi_start_h, T roi_start_w, T bin_size_h, T bin_size_w, + int roi_bin_grid_h, int roi_bin_grid_w, T roi_center_h, T roi_center_w, + T cos_theta, T sin_theta, std::vector>& pre_calc) { + int pre_calc_index = 0; + for (int ph = 0; ph < pooled_height; ph++) { + for (int pw = 0; pw < pooled_width; pw++) { + for (int iy = 0; iy < iy_upper; iy++) { + const T yy = roi_start_h + ph * bin_size_h + + static_cast(iy + .5f) * bin_size_h / + static_cast(roi_bin_grid_h); // e.g., 0.5, 1.5 + for (int ix = 0; ix < ix_upper; ix++) { + const T xx = roi_start_w + pw * bin_size_w + + static_cast(ix + .5f) * bin_size_w / + static_cast(roi_bin_grid_w); + + // Rotate by theta around the center and translate + // In image space, (y, x) is the order for Right Handed System, + // and this is essentially multiplying the point by a rotation matrix + // to rotate it counterclockwise through angle theta. 
+ T y = yy * cos_theta - xx * sin_theta + roi_center_h; + T x = yy * sin_theta + xx * cos_theta + roi_center_w; + // deal with: inverse elements are out of feature map boundary + if (y < -1.0 || y > height || x < -1.0 || x > width) { + // empty + PreCalc pc; + pc.pos1 = 0; + pc.pos2 = 0; + pc.pos3 = 0; + pc.pos4 = 0; + pc.w1 = 0; + pc.w2 = 0; + pc.w3 = 0; + pc.w4 = 0; + pre_calc[pre_calc_index] = pc; + pre_calc_index += 1; + continue; + } + + if (y < 0) { + y = 0; + } + if (x < 0) { + x = 0; + } + + int y_low = (int)y; + int x_low = (int)x; + int y_high; + int x_high; + + if (y_low >= height - 1) { + y_high = y_low = height - 1; + y = (T)y_low; + } else { + y_high = y_low + 1; + } + + if (x_low >= width - 1) { + x_high = x_low = width - 1; + x = (T)x_low; + } else { + x_high = x_low + 1; + } + + T ly = y - y_low; + T lx = x - x_low; + T hy = 1. - ly, hx = 1. - lx; + T w1 = hy * hx, w2 = hy * lx, w3 = ly * hx, w4 = ly * lx; + + // save weights and indices + PreCalc pc; + pc.pos1 = y_low * width + x_low; + pc.pos2 = y_low * width + x_high; + pc.pos3 = y_high * width + x_low; + pc.pos4 = y_high * width + x_high; + pc.w1 = w1; + pc.w2 = w2; + pc.w3 = w3; + pc.w4 = w4; + pre_calc[pre_calc_index] = pc; + + pre_calc_index += 1; + } + } + } + } +} + +template +void ROIAlignRotatedForward(const int nthreads, const T* input, + const T& spatial_scale, const bool aligned, + const bool clockwise, const int channels, + const int height, const int width, + const int pooled_height, const int pooled_width, + const int sampling_ratio, const T* rois, + T* output) { + int n_rois = nthreads / channels / pooled_width / pooled_height; + // (n, c, ph, pw) is an element in the pooled output + // can be parallelized using omp + // #pragma omp parallel for num_threads(32) + for (int n = 0; n < n_rois; n++) { + int index_n = n * channels * pooled_width * pooled_height; + + const T* current_roi = rois + n * 6; + int roi_batch_ind = current_roi[0]; + + // Do not use rounding; this 
implementation detail is critical + T offset = aligned ? (T)0.5 : (T)0.0; + T roi_center_w = current_roi[1] * spatial_scale - offset; + T roi_center_h = current_roi[2] * spatial_scale - offset; + T roi_width = current_roi[3] * spatial_scale; + T roi_height = current_roi[4] * spatial_scale; + T theta = current_roi[5]; + if (clockwise) { + theta = -theta; // If clockwise, the angle needs to be reversed. + } + T cos_theta = cos(theta); + T sin_theta = sin(theta); + + if (aligned) { + AT_ASSERTM(roi_width >= 0 && roi_height >= 0, + "ROIs in ROIAlignRotated do not have non-negative size!"); + } else { // for backward-compatibility only + roi_width = std::max(roi_width, (T)1.); + roi_height = std::max(roi_height, (T)1.); + } + + T bin_size_h = static_cast(roi_height) / static_cast(pooled_height); + T bin_size_w = static_cast(roi_width) / static_cast(pooled_width); + + // We use roi_bin_grid to sample the grid and mimic integral + int roi_bin_grid_h = (sampling_ratio > 0) + ? sampling_ratio + : ceilf(roi_height / pooled_height); // e.g., = 2 + int roi_bin_grid_w = + (sampling_ratio > 0) ? sampling_ratio : ceilf(roi_width / pooled_width); + + // We do average (integral) pooling inside a bin + const T count = std::max(roi_bin_grid_h * roi_bin_grid_w, 1); // e.g. = 4 + + // we want to precalculate indices and weights shared by all channels, + // this is the key point of optimization + std::vector> pre_calc(roi_bin_grid_h * roi_bin_grid_w * + pooled_width * pooled_height); + + // roi_start_h and roi_start_w are computed wrt the center of RoI (x, y). + // Appropriate translation needs to be applied after. 
+ T roi_start_h = -roi_height / 2.0; + T roi_start_w = -roi_width / 2.0; + + pre_calc_for_bilinear_interpolate( + height, width, pooled_height, pooled_width, roi_bin_grid_h, + roi_bin_grid_w, roi_start_h, roi_start_w, bin_size_h, bin_size_w, + roi_bin_grid_h, roi_bin_grid_w, roi_center_h, roi_center_w, cos_theta, + sin_theta, pre_calc); + + for (int c = 0; c < channels; c++) { + int index_n_c = index_n + c * pooled_width * pooled_height; + const T* offset_input = + input + (roi_batch_ind * channels + c) * height * width; + int pre_calc_index = 0; + + for (int ph = 0; ph < pooled_height; ph++) { + for (int pw = 0; pw < pooled_width; pw++) { + int index = index_n_c + ph * pooled_width + pw; + + T output_val = 0.; + for (int iy = 0; iy < roi_bin_grid_h; iy++) { + for (int ix = 0; ix < roi_bin_grid_w; ix++) { + PreCalc pc = pre_calc[pre_calc_index]; + output_val += pc.w1 * offset_input[pc.pos1] + + pc.w2 * offset_input[pc.pos2] + + pc.w3 * offset_input[pc.pos3] + + pc.w4 * offset_input[pc.pos4]; + + pre_calc_index += 1; + } + } + output_val /= count; + + output[index] = output_val; + } // for pw + } // for ph + } // for c + } // for n +} + +template +void bilinear_interpolate_gradient(const int height, const int width, T y, T x, + T& w1, T& w2, T& w3, T& w4, int& x_low, + int& x_high, int& y_low, int& y_high) { + // deal with cases that inverse elements are out of feature map boundary + if (y < -1.0 || y > height || x < -1.0 || x > width) { + // empty + w1 = w2 = w3 = w4 = 0.; + x_low = x_high = y_low = y_high = -1; + return; + } + + if (y < 0) { + y = 0; + } + + if (x < 0) { + x = 0; + } + + y_low = (int)y; + x_low = (int)x; + + if (y_low >= height - 1) { + y_high = y_low = height - 1; + y = (T)y_low; + } else { + y_high = y_low + 1; + } + + if (x_low >= width - 1) { + x_high = x_low = width - 1; + x = (T)x_low; + } else { + x_high = x_low + 1; + } + + T ly = y - y_low; + T lx = x - x_low; + T hy = 1. - ly, hx = 1. 
- lx; + + // reference in forward + // T v1 = input[y_low * width + x_low]; + // T v2 = input[y_low * width + x_high]; + // T v3 = input[y_high * width + x_low]; + // T v4 = input[y_high * width + x_high]; + // T val = (w1 * v1 + w2 * v2 + w3 * v3 + w4 * v4); + + w1 = hy * hx, w2 = hy * lx, w3 = ly * hx, w4 = ly * lx; + + return; +} + +template +inline void add(T* address, const T& val) { + *address += val; +} + +template +void ROIAlignRotatedBackward( + const int nthreads, + // may not be contiguous. should index using n_stride, etc + const T* grad_output, const T& spatial_scale, const bool aligned, + const bool clockwise, const int channels, const int height, const int width, + const int pooled_height, const int pooled_width, const int sampling_ratio, + T* grad_input, const T* rois, const int n_stride, const int c_stride, + const int h_stride, const int w_stride) { + for (int index = 0; index < nthreads; index++) { + // (n, c, ph, pw) is an element in the pooled output + int pw = index % pooled_width; + int ph = (index / pooled_width) % pooled_height; + int c = (index / pooled_width / pooled_height) % channels; + int n = index / pooled_width / pooled_height / channels; + + const T* current_roi = rois + n * 6; + int roi_batch_ind = current_roi[0]; + + // Do not use rounding; this implementation detail is critical + T offset = aligned ? (T)0.5 : (T)0.0; + T roi_center_w = current_roi[1] * spatial_scale - offset; + T roi_center_h = current_roi[2] * spatial_scale - offset; + T roi_width = current_roi[3] * spatial_scale; + T roi_height = current_roi[4] * spatial_scale; + T theta = current_roi[5]; + if (clockwise) { + theta = -theta; // If clockwise, the angle needs to be reversed. 
+ } + T cos_theta = cos(theta); + T sin_theta = sin(theta); + + if (aligned) { + AT_ASSERTM(roi_width >= 0 && roi_height >= 0, + "ROIs in ROIAlignRotated do not have non-negative size!"); + } else { // for backward-compatibility only + roi_width = std::max(roi_width, (T)1.); + roi_height = std::max(roi_height, (T)1.); + } + + T bin_size_h = static_cast(roi_height) / static_cast(pooled_height); + T bin_size_w = static_cast(roi_width) / static_cast(pooled_width); + + T* offset_grad_input = + grad_input + ((roi_batch_ind * channels + c) * height * width); + + int output_offset = n * n_stride + c * c_stride; + const T* offset_grad_output = grad_output + output_offset; + const T grad_output_this_bin = + offset_grad_output[ph * h_stride + pw * w_stride]; + + // We use roi_bin_grid to sample the grid and mimic integral + int roi_bin_grid_h = (sampling_ratio > 0) + ? sampling_ratio + : ceilf(roi_height / pooled_height); // e.g., = 2 + int roi_bin_grid_w = + (sampling_ratio > 0) ? sampling_ratio : ceilf(roi_width / pooled_width); + + // roi_start_h and roi_start_w are computed wrt the center of RoI (x, y). + // Appropriate translation needs to be applied after. + T roi_start_h = -roi_height / 2.0; + T roi_start_w = -roi_width / 2.0; + + // We do average (integral) pooling inside a bin + const T count = roi_bin_grid_h * roi_bin_grid_w; // e.g. 
= 4 + + for (int iy = 0; iy < roi_bin_grid_h; iy++) { + const T yy = roi_start_h + ph * bin_size_h + + static_cast(iy + .5f) * bin_size_h / + static_cast(roi_bin_grid_h); // e.g., 0.5, 1.5 + for (int ix = 0; ix < roi_bin_grid_w; ix++) { + const T xx = roi_start_w + pw * bin_size_w + + static_cast(ix + .5f) * bin_size_w / + static_cast(roi_bin_grid_w); + + // Rotate by theta around the center and translate + T y = yy * cos_theta - xx * sin_theta + roi_center_h; + T x = yy * sin_theta + xx * cos_theta + roi_center_w; + + T w1, w2, w3, w4; + int x_low, x_high, y_low, y_high; + + bilinear_interpolate_gradient(height, width, y, x, w1, w2, w3, w4, + x_low, x_high, y_low, y_high); + + T g1 = grad_output_this_bin * w1 / count; + T g2 = grad_output_this_bin * w2 / count; + T g3 = grad_output_this_bin * w3 / count; + T g4 = grad_output_this_bin * w4 / count; + + if (x_low >= 0 && x_high >= 0 && y_low >= 0 && y_high >= 0) { + // atomic add is not needed for now since it is single threaded + add(offset_grad_input + y_low * width + x_low, static_cast(g1)); + add(offset_grad_input + y_low * width + x_high, static_cast(g2)); + add(offset_grad_input + y_high * width + x_low, static_cast(g3)); + add(offset_grad_input + y_high * width + x_high, static_cast(g4)); + } // if + } // ix + } // iy + } // for +} // ROIAlignRotatedBackward + +void ROIAlignRotatedForwardCPULauncher(Tensor input, Tensor rois, Tensor output, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + bool aligned, bool clockwise) { + int output_size = output.numel(); + int channels = input.size(1); + int height = input.size(2); + int width = input.size(3); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + input.scalar_type(), "ROIAlignRotated_forward", [&] { + ROIAlignRotatedForward( + output_size, input.data_ptr(), + static_cast(spatial_scale), aligned, clockwise, channels, + height, width, aligned_height, aligned_width, sampling_ratio, + rois.data_ptr(), output.data_ptr()); + }); +} + 
+void ROIAlignRotatedBackwardCPULauncher(Tensor grad_output, Tensor rois, + Tensor grad_input, int aligned_height, + int aligned_width, float spatial_scale, + int sampling_ratio, bool aligned, + bool clockwise) { + int channels = grad_input.size(1); + int height = grad_input.size(2); + int width = grad_input.size(3); + + // get stride values to ensure indexing into gradients is correct. + int n_stride = grad_output.stride(0); + int c_stride = grad_output.stride(1); + int h_stride = grad_output.stride(2); + int w_stride = grad_output.stride(3); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + grad_output.scalar_type(), "ROIAlignRotated_backward", [&] { + ROIAlignRotatedBackward( + grad_output.numel(), grad_output.data_ptr(), + static_cast(spatial_scale), aligned, clockwise, channels, + height, width, aligned_height, aligned_width, sampling_ratio, + grad_input.data_ptr(), rois.data_ptr(), + n_stride, c_stride, h_stride, w_stride); + }); +} + +void roi_align_rotated_forward_cpu(Tensor input, Tensor rois, Tensor output, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + bool aligned, bool clockwise) { + ROIAlignRotatedForwardCPULauncher(input, rois, output, aligned_height, + aligned_width, spatial_scale, + sampling_ratio, aligned, clockwise); +} + +void roi_align_rotated_backward_cpu(Tensor top_grad, Tensor rois, + Tensor bottom_grad, int aligned_height, + int aligned_width, float spatial_scale, + int sampling_ratio, bool aligned, + bool clockwise) { + int size_rois = rois.size(1); + if (size_rois != 6) { + AT_ERROR("wrong roi size"); + } + ROIAlignRotatedBackwardCPULauncher( + top_grad, rois, bottom_grad, aligned_height, aligned_width, spatial_scale, + sampling_ratio, aligned, clockwise); +} + +void roi_align_rotated_forward_impl(Tensor input, Tensor rois, Tensor output, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + bool aligned, bool clockwise); + +void roi_align_rotated_backward_impl(Tensor 
top_grad, Tensor rois, + Tensor bottom_grad, int aligned_height, + int aligned_width, float spatial_scale, + int sampling_ratio, bool aligned, + bool clockwise); +REGISTER_DEVICE_IMPL(roi_align_rotated_forward_impl, CPU, + roi_align_rotated_forward_cpu); +REGISTER_DEVICE_IMPL(roi_align_rotated_backward_impl, CPU, + roi_align_rotated_backward_cpu); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/rotated_feature_align.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/rotated_feature_align.cpp new file mode 100644 index 0000000000000000000000000000000000000000..09dcdd33759aa03e619c629ef7ae052d0fe48f2b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/rotated_feature_align.cpp @@ -0,0 +1,262 @@ +// modified from +// https://github.com/SJTU-Thinklab-Det/r3det-on-mmdetection/blob/master/mmdet/ops/fr/src/feature_refine_kernel.cu +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +template +T bilinear_interpolate(const T* input, const int height, const int width, T y, + T x, const int index /* index for debug only*/) { + // deal with cases that inverse elements are out of feature map boundary + if (y < -1.0 || y > height || x < -1.0 || x > width) return 0; + + if (y <= 0) y = 0; + if (x <= 0) x = 0; + + int y_low = (int)y; + int x_low = (int)x; + int y_high; + int x_high; + + if (y_low >= height - 1) { + y_high = y_low = height - 1; + y = (T)y_low; + } else { + y_high = y_low + 1; + } + + if (x_low >= width - 1) { + x_high = x_low = width - 1; + x = (T)x_low; + } else { + x_high = x_low + 1; + } + + T ly = y - y_low; + T lx = x - x_low; + // do bilinear interpolation + T v1 = input[y_low * width + x_low]; + T v2 = input[y_low * width + x_high]; + T v3 = input[y_high * width + x_low]; + T v4 = input[y_high * width + x_high]; + const T v_low = fma(v2 - v1, lx, v1); + const T v_high = fma(v4 - v3, lx, v3); + const T val = fma(v_high - v_low, ly, v_low); + + return val; +} + +template +void 
rotated_feature_align_forward_cpu_kernel( + const int nthreads, const int points, const scalar_t* bottom_data, + const scalar_t* best_bboxes, const scalar_t spatial_scale, + const int channels, const int height, const int width, scalar_t* top_data) { + for (int index = 0; index < nthreads; index++) { + int w = index % width; + int h = (index / width) % height; + int c = (index / width / height) % channels; + int n = index / width / height / channels; + + const scalar_t* bbox_offset = + best_bboxes + ((n * height + h) * width + w) * 5; + scalar_t roi_y = bbox_offset[0] * spatial_scale; + scalar_t roi_x = bbox_offset[1] * spatial_scale; + + scalar_t px[5] = {roi_x, 0, 0, 0, 0}; + scalar_t py[5] = {roi_y, 0, 0, 0, 0}; + + if (points > 1) { + scalar_t roi_w = bbox_offset[2] * spatial_scale; + scalar_t roi_h = bbox_offset[3] * spatial_scale; + scalar_t roi_a = bbox_offset[4]; + + scalar_t w_2 = roi_w / 2, h_2 = roi_h / 2; + scalar_t cosa = cosf(roi_a), sina = sinf(roi_a); + scalar_t wx = cosa * w_2, wy = sina * w_2; + scalar_t hx = -sina * h_2, hy = cosa * h_2; + + px[1] = roi_x + wx + hx; + py[1] = roi_y + wy + hy; + px[2] = roi_x - wx + hx; + py[2] = roi_y - wy + hy; + px[3] = roi_x - wx - hx; + py[3] = roi_y - wy - hy; + px[4] = roi_x + wx - hx; + py[4] = roi_y + wy - hy; + } + + const scalar_t* offset_bottom_data = + bottom_data + (n * channels + c) * height * width; + + scalar_t output_val = bottom_data[index]; + for (int i = 0; i < points; i++) { + output_val += bilinear_interpolate(offset_bottom_data, height, + width, py[i], px[i], i); + } + top_data[index] = output_val; + } +} + +template +void bilinear_interpolate_gradient(const int height, const int width, T y, T x, + T& w1, T& w2, T& w3, T& w4, int& x_low, + int& x_high, int& y_low, int& y_high, + const int index) { + // deal with cases that inverse elements are out of feature map boundary + if (y < -1.0 || y > height || x < -1.0 || x > width) { + // empty + w1 = w2 = w3 = w4 = 0.; + x_low = x_high = y_low = 
y_high = -1; + return; + } + + if (y <= 0) y = 0; + if (x <= 0) x = 0; + + y_low = (int)y; + x_low = (int)x; + + if (y_low >= height - 1) { + y_high = y_low = height - 1; + y = (T)y_low; + } else { + y_high = y_low + 1; + } + + if (x_low >= width - 1) { + x_high = x_low = width - 1; + x = (T)x_low; + } else { + x_high = x_low + 1; + } + + T ly = y - y_low; + T lx = x - x_low; + T hy = 1. - ly, hx = 1. - lx; + + w1 = hy * hx, w2 = hy * lx, w3 = ly * hx, w4 = ly * lx; + + return; +} + +template +inline void valueAdd(scalar_t* address, scalar_t val) { + scalar_t old = *address; + *address = (old + val); +} + +template +void rotated_feature_align_backward_cpu_kernel( + const int nthreads, const int points, const scalar_t* top_diff, + const scalar_t* best_bboxes, const scalar_t spatial_scale, + const int channels, const int height, const int width, + scalar_t* bottom_diff) { + for (int index = 0; index < nthreads; index++) { + int w = index % width; + int h = (index / width) % height; + int c = (index / width / height) % channels; + int n = index / width / height / channels; + + const scalar_t* bbox_offset = + best_bboxes + ((n * height + h) * width + w) * 5; + scalar_t roi_y = bbox_offset[0] * spatial_scale; + scalar_t roi_x = bbox_offset[1] * spatial_scale; + + scalar_t px[5] = {roi_x, 0, 0, 0, 0}; + scalar_t py[5] = {roi_y, 0, 0, 0, 0}; + + if (points > 1) { + scalar_t roi_w = bbox_offset[2] * spatial_scale; + scalar_t roi_h = bbox_offset[3] * spatial_scale; + scalar_t roi_a = bbox_offset[4]; + + scalar_t w_2 = roi_w / 2, h_2 = roi_h / 2; + scalar_t cosa = cosf(roi_a), sina = sinf(roi_a); + scalar_t wx = cosa * w_2, wy = sina * w_2; + scalar_t hx = -sina * h_2, hy = cosa * h_2; + + px[1] = roi_x + wx + hx; + py[1] = roi_y + wy + hy; + px[2] = roi_x - wx + hx; + py[2] = roi_y - wy + hy; + px[3] = roi_x - wx - hx; + py[3] = roi_y - wy - hy; + px[4] = roi_x + wx - hx; + py[4] = roi_y + wy - hy; + } + + scalar_t* offset_bottom_diff = + bottom_diff + (n * channels + c) * 
height * width; + scalar_t value_top_diff = top_diff[index]; + + valueAdd(bottom_diff + index, value_top_diff); + for (int i = 0; i < points; i++) { + scalar_t w1, w2, w3, w4; + int x_low, x_high, y_low, y_high; + + bilinear_interpolate_gradient(height, width, py[i], px[i], w1, + w2, w3, w4, x_low, x_high, y_low, + y_high, i); + scalar_t g1 = value_top_diff * w1; + scalar_t g2 = value_top_diff * w2; + scalar_t g3 = value_top_diff * w3; + scalar_t g4 = value_top_diff * w4; + if (x_low >= 0 && x_high >= 0 && y_low >= 0 && y_high >= 0) { + valueAdd(offset_bottom_diff + y_low * width + x_low, g1); + valueAdd(offset_bottom_diff + y_low * width + x_high, g2); + valueAdd(offset_bottom_diff + y_high * width + x_low, g3); + valueAdd(offset_bottom_diff + y_high * width + x_high, g4); + } + } + } +} + +void rotated_feature_align_forward_cpu(const Tensor features, + const Tensor best_bboxes, + const float spatial_scale, + const int points, Tensor output) { + const int output_size = features.numel(); + AT_DISPATCH_FLOATING_TYPES( + features.scalar_type(), "rotated_feature_align_forward_cpu_kernel", [&] { + const scalar_t* bottom_data = features.data_ptr(); + const scalar_t* bboxes_data = best_bboxes.data_ptr(); + scalar_t* top_data = output.data_ptr(); + + rotated_feature_align_forward_cpu_kernel( + output_size, points, bottom_data, bboxes_data, + scalar_t(spatial_scale), features.size(1), features.size(2), + features.size(3), top_data); + }); +} + +void rotated_feature_align_backward_cpu(const Tensor top_grad, + const Tensor best_bboxes, + const float spatial_scale, + const int points, Tensor bottom_grad) { + const int output_size = top_grad.numel(); + AT_DISPATCH_FLOATING_TYPES( + top_grad.scalar_type(), "rotated_feature_align_backward_cpu_kernel", [&] { + const scalar_t* top_diff = top_grad.data_ptr(); + const scalar_t* bboxes_data = best_bboxes.data_ptr(); + scalar_t* bottom_diff = bottom_grad.data_ptr(); + + rotated_feature_align_backward_cpu_kernel( + output_size, points, 
top_diff, bboxes_data, scalar_t(spatial_scale), + top_grad.size(1), top_grad.size(2), top_grad.size(3), bottom_diff); + }); +} + +void rotated_feature_align_forward_impl(const Tensor features, + const Tensor best_bboxes, + const float spatial_scale, + const int points, Tensor output); + +void rotated_feature_align_backward_impl(const Tensor top_grad, + const Tensor best_bboxes, + const float spatial_scale, + const int points, Tensor bottom_grad); + +REGISTER_DEVICE_IMPL(rotated_feature_align_forward_impl, CPU, + rotated_feature_align_forward_cpu); + +REGISTER_DEVICE_IMPL(rotated_feature_align_backward_impl, CPU, + rotated_feature_align_backward_cpu); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/voxelization.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/voxelization.cpp new file mode 100644 index 0000000000000000000000000000000000000000..a21f849a0b90ebb489d26daadbbc48427d6dd502 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cpu/voxelization.cpp @@ -0,0 +1,186 @@ +// Copyright (c) OpenMMLab. All rights reserved. 
+#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +template +void dynamic_voxelize_forward_cpu_kernel( + const torch::TensorAccessor points, + torch::TensorAccessor coors, const std::vector voxel_size, + const std::vector coors_range, const std::vector grid_size, + const int num_points, const int num_features, const int NDim) { + const int ndim_minus_1 = NDim - 1; + bool failed = false; + // int coor[NDim]; + int* coor = new int[NDim](); + int c; + + for (int i = 0; i < num_points; ++i) { + failed = false; + for (int j = 0; j < NDim; ++j) { + c = floor((points[i][j] - coors_range[j]) / voxel_size[j]); + // necessary to rm points out of range + if ((c < 0 || c >= grid_size[j])) { + failed = true; + break; + } + coor[ndim_minus_1 - j] = c; + } + + // memcpy and memset will cause problem because of the memory distribution + // discontinuity of TensorAccessor, so here using loops to replace memcpy + // or memset + if (failed) { + for (int k = 0; k < NDim; ++k) { + coors[i][k] = -1; + } + } else { + for (int k = 0; k < NDim; ++k) { + coors[i][k] = coor[k]; + } + } + } + + delete[] coor; + return; +} + +template +void hard_voxelize_forward_cpu_kernel( + const torch::TensorAccessor points, + torch::TensorAccessor voxels, torch::TensorAccessor coors, + torch::TensorAccessor num_points_per_voxel, + torch::TensorAccessor coor_to_voxelidx, int& voxel_num, + const std::vector voxel_size, const std::vector coors_range, + const std::vector grid_size, const int max_points, + const int max_voxels, const int num_points, const int num_features, + const int NDim) { + // declare a temp coors + at::Tensor temp_coors = at::zeros( + {num_points, NDim}, at::TensorOptions().dtype(at::kInt).device(at::kCPU)); + + // First use dynamic voxelization to get coors, + // then check max points/voxels constraints + dynamic_voxelize_forward_cpu_kernel( + points, temp_coors.accessor(), voxel_size, coors_range, grid_size, + num_points, num_features, NDim); + + int voxelidx, 
num; + auto coor = temp_coors.accessor(); + + for (int i = 0; i < num_points; ++i) { + // T_int* coor = temp_coors.data_ptr() + i * NDim; + + if (coor[i][0] == -1) continue; + + voxelidx = coor_to_voxelidx[coor[i][0]][coor[i][1]][coor[i][2]]; + + // record voxel + if (voxelidx == -1) { + voxelidx = voxel_num; + if (max_voxels != -1 && voxel_num >= max_voxels) continue; + voxel_num += 1; + + coor_to_voxelidx[coor[i][0]][coor[i][1]][coor[i][2]] = voxelidx; + // memcpy will cause problem because of the memory distribution + // discontinuity of TensorAccessor, so here using loops to replace memcpy + for (int k = 0; k < NDim; ++k) { + coors[voxelidx][k] = coor[i][k]; + } + } + + // put points into voxel + num = num_points_per_voxel[voxelidx]; + if (max_points == -1 || num < max_points) { + // memcpy will cause problem because of the memory distribution + // discontinuity of TensorAccessor, so here using loops to replace memcpy + for (int k = 0; k < num_features; ++k) { + voxels[voxelidx][num][k] = points[i][k]; + } + num_points_per_voxel[voxelidx] += 1; + } + } + + return; +} + +void dynamic_voxelize_forward_cpu(const at::Tensor& points, at::Tensor& coors, + const std::vector voxel_size, + const std::vector coors_range, + const int NDim = 3) { + // check device + AT_ASSERTM(points.device().is_cpu(), "points must be a CPU tensor"); + + std::vector grid_size(NDim); + const int num_points = points.size(0); + const int num_features = points.size(1); + + for (int i = 0; i < NDim; ++i) { + grid_size[i] = + round((coors_range[NDim + i] - coors_range[i]) / voxel_size[i]); + } + + // coors, num_points_per_voxel, coor_to_voxelidx are int Tensor + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + points.scalar_type(), "dynamic_voxelize_forward_cpu_kernel", [&] { + dynamic_voxelize_forward_cpu_kernel( + points.accessor(), coors.accessor(), + voxel_size, coors_range, grid_size, num_points, num_features, NDim); + }); +} + +int hard_voxelize_forward_cpu(const at::Tensor& points, at::Tensor& 
voxels, + at::Tensor& coors, + at::Tensor& num_points_per_voxel, + const std::vector voxel_size, + const std::vector coors_range, + const int max_points, const int max_voxels, + const int NDim = 3) { + // current version tooks about 0.02s_0.03s for one frame on cpu + // check device + AT_ASSERTM(points.device().is_cpu(), "points must be a CPU tensor"); + + std::vector grid_size(NDim); + const int num_points = points.size(0); + const int num_features = points.size(1); + + for (int i = 0; i < NDim; ++i) { + grid_size[i] = + round((coors_range[NDim + i] - coors_range[i]) / voxel_size[i]); + } + + // coors, num_points_per_voxel, coor_to_voxelidx are int Tensor + // printf("cpu coor_to_voxelidx size: [%d, %d, %d]\n", grid_size[2], + // grid_size[1], grid_size[0]); + at::Tensor coor_to_voxelidx = + -at::ones({grid_size[2], grid_size[1], grid_size[0]}, coors.options()); + + int voxel_num = 0; + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + points.scalar_type(), "hard_voxelize_forward_cpu_kernel", [&] { + hard_voxelize_forward_cpu_kernel( + points.accessor(), voxels.accessor(), + coors.accessor(), num_points_per_voxel.accessor(), + coor_to_voxelidx.accessor(), voxel_num, voxel_size, + coors_range, grid_size, max_points, max_voxels, num_points, + num_features, NDim); + }); + + return voxel_num; +} + +int hard_voxelize_forward_impl(const at::Tensor& points, at::Tensor& voxels, + at::Tensor& coors, + at::Tensor& num_points_per_voxel, + const std::vector voxel_size, + const std::vector coors_range, + const int max_points, const int max_voxels, + const int NDim); + +void dynamic_voxelize_forward_impl(const at::Tensor& points, at::Tensor& coors, + const std::vector voxel_size, + const std::vector coors_range, + const int NDim); +REGISTER_DEVICE_IMPL(hard_voxelize_forward_impl, CPU, + hard_voxelize_forward_cpu); +REGISTER_DEVICE_IMPL(dynamic_voxelize_forward_impl, CPU, + dynamic_voxelize_forward_cpu); diff --git 
a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/active_rotated_filter_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/active_rotated_filter_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..27fffb9faeaa33eff201c0fcaf236866e5d10712 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/active_rotated_filter_cuda.cu @@ -0,0 +1,58 @@ +// Copyright (c) OpenMMLab. All rights reserved. +// Modified from +// https://github.com/csuhan/s2anet/blob/master/mmdet/ops/orn/src/cuda/ActiveRotatingFilter_cuda.cu +#include "active_rotated_filter_cuda_kernel.cuh" +#include "pytorch_cuda_helper.hpp" + +void ActiveRotatedFilterForwardCUDAKernelLauncher(const Tensor input, + const Tensor indices, + Tensor output) { + int num_output_planes = input.size(0); + int num_input_planes = input.size(1); + int num_orientations = input.size(2); + int kH = input.size(3); + int kW = input.size(4); + int num_rotations = indices.size(3); + int nEntry = num_orientations * kH * kW; + int output_size = input.numel(); + + at::cuda::CUDAGuard device_guard(input.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + input.scalar_type(), "active_rotated_filter_forward_cuda_kernel", [&] { + active_rotated_filter_forward_cuda_kernel + <<>>( + output_size, input.data_ptr(), + indices.data_ptr(), num_input_planes, num_output_planes, + num_orientations, num_rotations, nEntry, + output.data_ptr()); + }); + AT_CUDA_CHECK(cudaGetLastError()); +} + +void ActiveRotatedFilterBackwardCUDAKernelLauncher(const Tensor grad_out, + const Tensor indices, + Tensor grad_in) { + int num_orientations = indices.size(0); + int kH = indices.size(1); + int kW = indices.size(2); + int num_rotations = indices.size(3); + int num_output_planes = grad_out.size(0) / num_rotations; + int num_input_planes = grad_out.size(1) / num_orientations; + int nEntry = num_orientations * kH * kW; + int output_size = 
grad_in.numel(); + + at::cuda::CUDAGuard device_guard(indices.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + grad_out.scalar_type(), "active_rotated_filter_backward_cuda_kernel", + [&] { + active_rotated_filter_backward_cuda_kernel + <<>>( + output_size, grad_out.data_ptr(), + indices.data_ptr(), num_input_planes, num_output_planes, + num_orientations, num_rotations, nEntry, + grad_in.data_ptr()); + }); + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/assign_score_withk_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/assign_score_withk_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..bdb5fab9fc61ad19d9230cfdc26642dc7fe5972e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/assign_score_withk_cuda.cu @@ -0,0 +1,66 @@ +// Modified from +// https://github.com/CVMI-Lab/PAConv/tree/main/scene_seg/lib/paconv_lib/src/gpu +#include +#include + +#include "assign_score_withk_cuda_kernel.cuh" +#include "pytorch_cuda_helper.hpp" + +void AssignScoreWithKForwardCUDAKernelLauncher( + int B, int N0, int N1, int M, int K, int O, int aggregate, + const Tensor& points, const Tensor& centers, const Tensor& scores, + const Tensor& knn_idx, Tensor& output) { + at::cuda::CUDAGuard device_guard(points.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + dim3 blocks(GET_BLOCKS(B * O * N1 * K, THREADS_PER_BLOCK)); + dim3 threads(THREADS_PER_BLOCK); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + points.scalar_type(), "assign_score_withk_forward_cuda_kernel", [&] { + assign_score_withk_forward_cuda_kernel + <<>>( + B, N0, N1, M, K, O, aggregate, points.data_ptr(), + centers.data_ptr(), scores.data_ptr(), + knn_idx.data_ptr(), output.data_ptr()); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} + +void AssignScoreWithKBackwardCUDAKernelLauncher( + int B, int N0, int N1, int M, int K, 
int O, int aggregate, + const Tensor& grad_out, const Tensor& points, const Tensor& centers, + const Tensor& scores, const Tensor& knn_idx, Tensor& grad_points, + Tensor& grad_centers, Tensor& grad_scores) { + at::cuda::CUDAGuard device_guard(grad_out.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + dim3 blocks1(GET_BLOCKS(B * M * O, THREADS_PER_BLOCK)); + dim3 threads1(THREADS_PER_BLOCK); + dim3 blocks2(GET_BLOCKS(B * N1 * K * M, THREADS_PER_BLOCK)); + dim3 threads2(THREADS_PER_BLOCK); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + grad_out.scalar_type(), "assign_score_withk_points_backward_cuda_kernel", + [&] { + assign_score_withk_points_backward_cuda_kernel + <<>>( + B, N0, N1, M, K, O, aggregate, grad_out.data_ptr(), + scores.data_ptr(), knn_idx.data_ptr(), + grad_points.data_ptr(), + grad_centers.data_ptr()); + }); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + grad_out.scalar_type(), "assign_score_withk_scores_backward_cuda_kernel", + [&] { + assign_score_withk_scores_backward_cuda_kernel + <<>>( + B, N0, N1, M, K, O, aggregate, grad_out.data_ptr(), + points.data_ptr(), centers.data_ptr(), + knn_idx.data_ptr(), grad_scores.data_ptr()); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/ball_query_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/ball_query_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..c42c3e2ae6164dfc504c2794db1436607ec8445f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/ball_query_cuda.cu @@ -0,0 +1,38 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +// Modified from +// https://github.com/sshaoshuai/Pointnet2.PyTorch/tree/master/pointnet2/src/ball_query_gpu.cu + +#include +#include +#include + +#include "ball_query_cuda_kernel.cuh" +#include "pytorch_cuda_helper.hpp" + +void BallQueryForwardCUDAKernelLauncher(int b, int n, int m, float min_radius, + float max_radius, int nsample, + const Tensor new_xyz, const Tensor xyz, + Tensor idx) { + // new_xyz: (B, M, 3) + // xyz: (B, N, 3) + // output: + // idx: (B, M, nsample) + + at::cuda::CUDAGuard device_guard(new_xyz.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + // blockIdx.x(col), blockIdx.y(row) + dim3 blocks(GET_BLOCKS(m, THREADS_PER_BLOCK), b); + dim3 threads(THREADS_PER_BLOCK); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + new_xyz.scalar_type(), "ball_query_forward_cuda_kernel", [&] { + ball_query_forward_cuda_kernel + <<>>( + b, n, m, min_radius, max_radius, nsample, + new_xyz.data_ptr(), xyz.data_ptr(), + idx.data_ptr()); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/bbox_overlaps_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/bbox_overlaps_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..7dae535cfb4818d6cae445666378332db29bb9f0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/bbox_overlaps_cuda.cu @@ -0,0 +1,40 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "bbox_overlaps_cuda_kernel.cuh" +#include "pytorch_cuda_helper.hpp" + +// Disable fp16 on ROCm device +#ifndef MMCV_WITH_HIP +#if __CUDA_ARCH__ >= 530 +template <> +__global__ void bbox_overlaps_cuda_kernel( + const at::Half* bbox1, const at::Half* bbox2, at::Half* ious, + const int num_bbox1, const int num_bbox2, const int mode, + const bool aligned, const int offset) { + bbox_overlaps_cuda_kernel_half(reinterpret_cast(bbox1), + reinterpret_cast(bbox2), + reinterpret_cast<__half*>(ious), num_bbox1, + num_bbox2, mode, aligned, offset); +} + +#endif // __CUDA_ARCH__ >= 530 +#endif // MMCV_WITH_HIP + +void BBoxOverlapsCUDAKernelLauncher(const Tensor bboxes1, const Tensor bboxes2, + Tensor ious, const int mode, + const bool aligned, const int offset) { + int output_size = ious.numel(); + int num_bbox1 = bboxes1.size(0); + int num_bbox2 = bboxes2.size(0); + + at::cuda::CUDAGuard device_guard(bboxes1.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + bboxes1.scalar_type(), "bbox_overlaps_cuda_kernel", ([&] { + bbox_overlaps_cuda_kernel + <<>>( + bboxes1.data_ptr(), bboxes2.data_ptr(), + ious.data_ptr(), num_bbox1, num_bbox2, mode, aligned, + offset); + })); + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/bezier_align_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/bezier_align_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..b2786a84eb7dcbb6b06e950f6a1e80f8fcaebfb7 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/bezier_align_cuda.cu @@ -0,0 +1,53 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "bezier_align_cuda_kernel.cuh" +#include "pytorch_cuda_helper.hpp" + +void BezierAlignForwardCUDAKernelLauncher(Tensor input, Tensor rois, + Tensor output, int aligned_height, + int aligned_width, + float spatial_scale, + int sampling_ratio, bool aligned) { + int output_size = output.numel(); + int channels = input.size(1); + int height = input.size(2); + int width = input.size(3); + + at::cuda::CUDAGuard device_guard(input.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + input.scalar_type(), "bezier_align_forward_cuda_kernel", [&] { + bezier_align_forward_cuda_kernel + <<>>( + output_size, input.data_ptr(), + rois.data_ptr(), output.data_ptr(), + aligned_height, aligned_width, + static_cast(spatial_scale), sampling_ratio, aligned, + channels, height, width); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} + +void BezierAlignBackwardCUDAKernelLauncher( + Tensor grad_output, Tensor rois, Tensor grad_input, int aligned_height, + int aligned_width, float spatial_scale, int sampling_ratio, bool aligned) { + int output_size = grad_output.numel(); + int channels = grad_input.size(1); + int height = grad_input.size(2); + int width = grad_input.size(3); + + at::cuda::CUDAGuard device_guard(grad_output.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + grad_output.scalar_type(), "bezier_align_backward_cuda_kernel", [&] { + bezier_align_backward_cuda_kernel + <<>>( + output_size, grad_output.data_ptr(), + rois.data_ptr(), grad_input.data_ptr(), + aligned_height, aligned_width, + static_cast(spatial_scale), sampling_ratio, aligned, + channels, height, width); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/border_align_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/border_align_cuda.cu new file mode 100644 index 
0000000000000000000000000000000000000000..3aeefea5ddafa81da74f320ae7f166f4977787b4 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/border_align_cuda.cu @@ -0,0 +1,68 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include "border_align_cuda_kernel.cuh" +#include "pytorch_cuda_helper.hpp" + +void BorderAlignForwardCUDAKernelLauncher(const Tensor &input, + const Tensor &boxes, Tensor output, + Tensor argmax_idx, + const int pool_size) { + // shape assertion + AT_ASSERTM(input.ndimension() == 4, + "non-empty 4D(batch mode) tensor expected for input feature"); + AT_ASSERTM(boxes.ndimension() == 3, + "boxes must be 3D tensor with size of [B, H*W, 4]"); + + int batch_size = input.size(0); + int feat_channels = input.size(1); + int channels = feat_channels / 4; + int height = input.size(2); + int width = input.size(3); + // shape [N, box_size, 4] for boxes. (x1, y1, x2, y2) format + int box_size = boxes.size(1); + // shape [N, channels, box_size, 4] for output + int nthreads = batch_size * channels * box_size; + + at::cuda::CUDAGuard device_guard(input.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + dim3 block(128, 4); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + input.scalar_type(), "border_align_forward_cuda_kernel", [&] { + border_align_forward_cuda_kernel + <<>>( + nthreads, input.data_ptr(), + boxes.data_ptr(), output.data_ptr(), + argmax_idx.data_ptr(), channels, box_size, height, width, + pool_size); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} + +void BorderAlignBackwardCUDAKernelLauncher(const Tensor &grad_output, + const Tensor &boxes, + const Tensor &argmax_idx, + Tensor grad_input, + const int pool_size) { + int batch_size = grad_input.size(0); + int feat_channels = grad_input.size(1); + int channels = feat_channels / 4; + int height = grad_input.size(2); + int width = grad_input.size(3); + int box_size = boxes.size(1); + int nthreads = batch_size * channels * box_size; + + at::cuda::CUDAGuard 
device_guard(grad_output.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + dim3 block(128, 4); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + grad_output.scalar_type(), "border_align_backward_cuda_kernel", [&] { + border_align_backward_cuda_kernel + <<>>( + nthreads, grad_output.data_ptr(), + boxes.data_ptr(), argmax_idx.data_ptr(), + grad_input.data_ptr(), channels, box_size, height, + width, pool_size); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/box_iou_quadri_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/box_iou_quadri_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..25b6819a795354f015c421b612fd2ae130482e91 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/box_iou_quadri_cuda.cu @@ -0,0 +1,23 @@ +// Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved +#include "box_iou_quadri_cuda.cuh" +#include "pytorch_cuda_helper.hpp" + +void box_iou_quadri_cuda(const Tensor boxes1, const Tensor boxes2, Tensor ious, + const int mode_flag, const bool aligned) { + using scalar_t = float; + AT_ASSERTM(boxes1.is_cuda(), "boxes1 must be a CUDA tensor"); + AT_ASSERTM(boxes2.is_cuda(), "boxes2 must be a CUDA tensor"); + + int output_size = ious.numel(); + int num_boxes1 = boxes1.size(0); + int num_boxes2 = boxes2.size(0); + + at::cuda::CUDAGuard device_guard(boxes1.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + box_iou_quadri_cuda_kernel + <<>>( + num_boxes1, num_boxes2, boxes1.data_ptr(), + boxes2.data_ptr(), (scalar_t*)ious.data_ptr(), + mode_flag, aligned); + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/box_iou_rotated_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/box_iou_rotated_cuda.cu new file mode 100644 index 
0000000000000000000000000000000000000000..3c13e06237b208a48e2489ef8246c90ada78ef51 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/box_iou_rotated_cuda.cu @@ -0,0 +1,25 @@ +// Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved +// modified from +// https://github.com/facebookresearch/detectron2/blob/master/detectron2/layers/csrc/box_iou_rotated/box_iou_rotated_cuda.cu +#include "box_iou_rotated_cuda.cuh" +#include "pytorch_cuda_helper.hpp" + +void box_iou_rotated_cuda(const Tensor boxes1, const Tensor boxes2, Tensor ious, + const int mode_flag, const bool aligned) { + using scalar_t = float; + AT_ASSERTM(boxes1.is_cuda(), "boxes1 must be a CUDA tensor"); + AT_ASSERTM(boxes2.is_cuda(), "boxes2 must be a CUDA tensor"); + + int output_size = ious.numel(); + int num_boxes1 = boxes1.size(0); + int num_boxes2 = boxes2.size(0); + + at::cuda::CUDAGuard device_guard(boxes1.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + box_iou_rotated_cuda_kernel + <<>>( + num_boxes1, num_boxes2, boxes1.data_ptr(), + boxes2.data_ptr(), (scalar_t*)ious.data_ptr(), + mode_flag, aligned); + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/carafe_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/carafe_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..984e734f9ea5e15de2517d6a580dbe35a11c208b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/carafe_cuda.cu @@ -0,0 +1,180 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "carafe_cuda_kernel.cuh" +#include "pytorch_cuda_helper.hpp" + +void CARAFEForwardCUDAKernelLauncher(const Tensor features, const Tensor masks, + Tensor rfeatures, Tensor routput, + Tensor rmasks, Tensor output, + const int kernel_size, + const int group_size, + const int scale_factor) { + const int batch_size = output.size(0); + const int channels = output.size(1); + const int output_height = output.size(2); + const int output_width = output.size(3); + + const int input_height = features.size(2); + const int input_width = features.size(3); + + const int mask_channels = masks.size(1); + + rfeatures.resize_({batch_size, input_height, input_width, channels}); + routput.resize_({batch_size, output_height, output_width, channels}); + rmasks.resize_({batch_size, output_height, output_width, mask_channels}); + + // one warp per pixel + at::cuda::CUDAGuard device_guard(features.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + features.scalar_type(), "NCHW2NHWC_Feature", ([&] { + const scalar_t *bottom_data = features.data_ptr(); + scalar_t *top_data = rfeatures.data_ptr(); + const int dh = divideUP(channels, kTileDim); + const int dw = divideUP(input_height * input_width, kTileDim); + BatchTranspose2DCUDAKernel + <<>>( + batch_size, channels, input_height * input_width, dh, dw, + bottom_data, top_data); + })); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + features.scalar_type(), "NCHW2NHWC_Masks", ([&] { + const scalar_t *bottom_data = masks.data_ptr(); + scalar_t *top_data = rmasks.data_ptr(); + const int dh = divideUP(mask_channels, kTileDim); + const int dw = divideUP(output_height * output_width, kTileDim); + BatchTranspose2DCUDAKernel + <<>>( + batch_size, mask_channels, output_height * output_width, dh, dw, + bottom_data, top_data); + })); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + features.scalar_type(), "CARAFELaucherForward", ([&] { + const int num_kernels = + batch_size * 
output_height * output_width * THREADS_PER_PIXEL; + const scalar_t *bottom_data = rfeatures.data_ptr(); + const scalar_t *bottom_masks = rmasks.data_ptr(); + scalar_t *top_data = routput.data_ptr(); + + CARAFEForward<<>>( + num_kernels, bottom_data, bottom_masks, kernel_size, group_size, + scale_factor, channels, input_height, input_width, output_height, + output_width, mask_channels, top_data); + })); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + features.scalar_type(), "NHWC2NCHW", ([&] { + const scalar_t *bottom_data = routput.data_ptr(); + scalar_t *top_data = output.data_ptr(); + const int dh = divideUP(output_height * output_width, kTileDim); + const int dw = divideUP(channels, kTileDim); + BatchTranspose2DCUDAKernel + <<>>( + batch_size, output_height * output_width, channels, dh, dw, + bottom_data, top_data); + })); + + AT_CUDA_CHECK(cudaGetLastError()); +} + +void CARAFEBackwardCUDAKernelLauncher( + const Tensor top_grad, const Tensor rfeatures, const Tensor masks, + Tensor rtop_grad, Tensor rbottom_grad_hs, Tensor rbottom_grad, + Tensor rmask_grad, Tensor bottom_grad, Tensor mask_grad, + const int kernel_size, const int group_size, const int scale_factor) { + const int batch_size = top_grad.size(0); + const int channels = top_grad.size(1); + const int output_height = top_grad.size(2); + const int output_width = top_grad.size(3); + + const int input_height = bottom_grad.size(2); + const int input_width = bottom_grad.size(3); + + const int mask_channels = masks.size(1); + + rtop_grad.resize_({batch_size, output_height, output_width, channels}); + rbottom_grad.resize_({batch_size, input_height, input_width, channels}); + rbottom_grad_hs.resize_({batch_size, output_height, output_width, channels}); + rmask_grad.resize_({batch_size, output_height, output_width, mask_channels}); + + at::cuda::CUDAGuard device_guard(top_grad.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + top_grad.scalar_type(), 
"NCHW2NHWC_Top_Grad", ([&] { + const scalar_t *bottom_data = top_grad.data_ptr(); + scalar_t *top_data = rtop_grad.data_ptr(); + const int dh = divideUP(channels, kTileDim); + const int dw = divideUP(output_height * output_width, kTileDim); + BatchTranspose2DCUDAKernel + <<>>( + batch_size, channels, output_height * output_width, dh, dw, + bottom_data, top_data); + })); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + top_grad.scalar_type(), "CARAFELaucherBackward_Feature", ([&] { + const int num_kernels = + batch_size * output_height * output_width * THREADS_PER_PIXEL; + const scalar_t *top_diff = rtop_grad.data_ptr(); + const scalar_t *bottom_masks = masks.data_ptr(); + scalar_t *bottom_diff = rbottom_grad_hs.data_ptr(); + + CARAFEBackward_Feature + <<>>(num_kernels, top_diff, bottom_masks, kernel_size, + group_size, scale_factor, channels, input_height, + input_width, output_height, output_width, + mask_channels, bottom_diff); + })); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + top_grad.scalar_type(), "FeatureSum", ([&] { + const int num_kernels = + batch_size * input_height * input_width * THREADS_PER_PIXEL; + const scalar_t *bottom_diff_hs = rbottom_grad_hs.data_ptr(); + scalar_t *bottom_diff = rbottom_grad.data_ptr(); + + FeatureSum + <<>>(num_kernels, bottom_diff_hs, scale_factor, channels, + input_height, input_width, bottom_diff); + })); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + top_grad.scalar_type(), "NHWC2NCHW_Bottom_Grad", ([&] { + const scalar_t *bottom_data = rbottom_grad.data_ptr(); + scalar_t *top_data = bottom_grad.data_ptr(); + const int dh = divideUP(input_height * input_width, kTileDim); + const int dw = divideUP(channels, kTileDim); + BatchTranspose2DCUDAKernel + <<>>( + batch_size, input_height * input_width, channels, dh, dw, + bottom_data, top_data); + })); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + top_grad.scalar_type(), "CARAFELaucherBackward_Mask", ([&] { + const int num_kernels = batch_size * output_height * output_width * + mask_channels * 
WARP_SIZE; + const scalar_t *top_diff = rtop_grad.data_ptr(); + const scalar_t *bottom_data = rfeatures.data_ptr(); + scalar_t *mask_diff = rmask_grad.data_ptr(); + + CARAFEBackward_Mask + <<>>(num_kernels, top_diff, bottom_data, kernel_size, + group_size, scale_factor, channels, input_height, + input_width, output_height, output_width, + mask_channels, mask_diff); + })); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + top_grad.scalar_type(), "NHWC2NCHW_Mask_Grad", ([&] { + const scalar_t *bottom_data = rmask_grad.data_ptr(); + scalar_t *top_data = mask_grad.data_ptr(); + const int dh = divideUP(output_height * output_width, kTileDim); + const int dw = divideUP(mask_channels, kTileDim); + BatchTranspose2DCUDAKernel + <<>>( + batch_size, output_height * output_width, mask_channels, dh, dw, + bottom_data, top_data); + })); + + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/carafe_naive_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/carafe_naive_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..2fc5667686d225064bd14c2f2ad5d06b93bd5fca --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/carafe_naive_cuda.cu @@ -0,0 +1,52 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "carafe_naive_cuda_kernel.cuh" +#include "pytorch_cuda_helper.hpp" + +void CARAFENAIVEForwardCUDAKernelLauncher(const Tensor features, + const Tensor masks, Tensor output, + const int kernel_size, + const int group_size, + const int scale_factor) { + int output_size = output.numel(); + int channels = output.size(1); + int height = output.size(2); + int width = output.size(3); + + at::cuda::CUDAGuard device_guard(features.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + features.scalar_type(), "CARAFENAIVEForward", ([&] { + carafe_naive_forward_cuda_kernel + <<>>( + output_size, features.data_ptr(), + masks.data_ptr(), output.data_ptr(), + kernel_size, group_size, scale_factor, channels, height, width); + })); + + AT_CUDA_CHECK(cudaGetLastError()); +} + +void CARAFENAIVEBackwardCUDAKernelLauncher( + const Tensor top_grad, const Tensor features, const Tensor masks, + Tensor bottom_grad, Tensor mask_grad, const int kernel_size, + const int group_size, const int scale_factor) { + int output_size = top_grad.numel(); + int channels = top_grad.size(1); + int height = top_grad.size(2); + int width = top_grad.size(3); + + at::cuda::CUDAGuard device_guard(top_grad.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + top_grad.scalar_type(), "CARAFENAIVEBackward", ([&] { + carafe_naive_backward_cuda_kernel + <<>>( + output_size, top_grad.data_ptr(), + features.data_ptr(), masks.data_ptr(), + bottom_grad.data_ptr(), + mask_grad.data_ptr(), kernel_size, group_size, + scale_factor, channels, height, width); + })); + + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/chamfer_distance_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/chamfer_distance_cuda.cu new file mode 100644 index 
0000000000000000000000000000000000000000..6effa29ee7f9998f03461df5e0c251657aeccc39 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/chamfer_distance_cuda.cu @@ -0,0 +1,63 @@ +// Copyright (c) OpenMMLab. All rights reserved. +// Modified from +// https://github.com/chrdiller/pyTorchChamferDistance/blob/master/chamfer_distance/chamfer_distance.cpp +#include "chamfer_distance_cuda_kernel.cuh" +#include "pytorch_cuda_helper.hpp" + +void ChamferDistanceForwardCUDAKernelLauncher( + const Tensor xyz1, const Tensor xyz2, const Tensor dist1, + const Tensor dist2, const Tensor idx1, const Tensor idx2) { + int batch_size = xyz1.size(0); + int n = xyz1.size(1); + int m = xyz2.size(1); + + at::cuda::CUDAGuard device_guard(xyz1.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + xyz1.scalar_type(), "chamfer_distance_forward_cuda_kernel", [&] { + chamfer_distance_forward_cuda_kernel + <<>>( + batch_size, n, xyz1.data_ptr(), m, + xyz2.data_ptr(), dist1.data_ptr(), + idx1.data_ptr()); + }); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + xyz1.scalar_type(), "chamfer_distance_forward_cuda_kernel", [&] { + chamfer_distance_forward_cuda_kernel + <<>>( + batch_size, m, xyz2.data_ptr(), n, + xyz1.data_ptr(), dist2.data_ptr(), + idx2.data_ptr()); + }); + AT_CUDA_CHECK(cudaGetLastError()); +} + +void ChamferDistanceBackwardCUDAKernelLauncher( + const Tensor xyz1, const Tensor xyz2, Tensor idx1, Tensor idx2, + Tensor grad_dist1, Tensor grad_dist2, Tensor grad_xyz1, Tensor grad_xyz2) { + int batch_size = xyz1.size(0); + int n = xyz1.size(1); + int m = xyz2.size(1); + + at::cuda::CUDAGuard device_guard(xyz1.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + xyz1.scalar_type(), "chamfer_distance_backward_cuda_kernel", [&] { + chamfer_distance_backward_cuda_kernel + <<>>( + batch_size, m, xyz1.data_ptr(), n, + xyz2.data_ptr(), grad_dist1.data_ptr(), + 
idx1.data_ptr(), grad_xyz1.data_ptr(), + grad_xyz2.data_ptr()); + }); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + xyz1.scalar_type(), "chamfer_distance_backward_cuda_kernel", [&] { + chamfer_distance_backward_cuda_kernel + <<>>( + batch_size, n, xyz2.data_ptr(), m, + xyz1.data_ptr(), grad_dist2.data_ptr(), + idx2.data_ptr(), grad_xyz2.data_ptr(), + grad_xyz1.data_ptr()); + }); + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/convex_iou.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/convex_iou.cu new file mode 100644 index 0000000000000000000000000000000000000000..804f7ac3bae433173f2e71011fa5be2c2c81e761 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/convex_iou.cu @@ -0,0 +1,41 @@ +// Copyright (c) OpenMMLab. All rights reserved +// modified from +// https://github.com/SDL-GuoZonghao/BeyondBoundingBox/blob/main/mmdet/ops/iou/src/convex_iou_kernel.cu +#include "convex_iou_cuda_kernel.cuh" +#include "pytorch_cuda_helper.hpp" + +void ConvexIoUCUDAKernelLauncher(const Tensor pointsets, const Tensor polygons, + Tensor ious) { + int output_size = ious.numel(); + int num_pointsets = pointsets.size(0); + int num_polygons = polygons.size(0); + + at::cuda::CUDAGuard device_guard(pointsets.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + pointsets.scalar_type(), "convex_iou_cuda_kernel", ([&] { + convex_iou_cuda_kernel + <<>>( + num_pointsets, num_polygons, pointsets.data_ptr(), + polygons.data_ptr(), ious.data_ptr()); + })); + AT_CUDA_CHECK(cudaGetLastError()); +} + +void ConvexGIoUCUDAKernelLauncher(const Tensor pointsets, const Tensor polygons, + Tensor output) { + int output_size = output.numel(); + int num_pointsets = pointsets.size(0); + int num_polygons = polygons.size(0); + + at::cuda::CUDAGuard device_guard(pointsets.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + 
AT_DISPATCH_FLOATING_TYPES_AND_HALF( + pointsets.scalar_type(), "convex_giou_cuda_kernel", ([&] { + convex_giou_cuda_kernel + <<>>( + num_pointsets, num_polygons, pointsets.data_ptr(), + polygons.data_ptr(), output.data_ptr()); + })); + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/correlation_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/correlation_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..6a43cfc70dafd8050699eac05bfc9bd896f5ba2f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/correlation_cuda.cu @@ -0,0 +1,94 @@ +// Copyright (c) OpenMMLab. All rights reserved. +// Modified from +// https://github.com/ClementPinard/Pytorch-Correlation-extension/blob/master/Correlation_Module/correlation_cuda_kernel.cu +// Original licence: Under MIT License + +#include "correlation_cuda.cuh" +#include "pytorch_cuda_helper.hpp" + +void CorrelationForwardCUDAKernelLauncher(Tensor input1, Tensor input2, + Tensor output, int kH, int kW, + int patchH, int patchW, int padH, + int padW, int dilationH, + int dilationW, int dilation_patchH, + int dilation_patchW, int dH, int dW) { + const int batch_size = input1.size(0); + const int iH = input1.size(2); + const int iW = input1.size(3); + const int dilatedKH = (kH - 1) * dilationH + 1; + const int dilatedKW = (kW - 1) * dilationW + 1; + + const auto oH = (iH + 2 * padH - dilatedKH) / dH + 1; + const auto oW = (iW + 2 * padW - dilatedKW) / dW + 1; + + auto trInput1 = input1.permute({0, 2, 3, 1}).contiguous(); + auto trInput2 = input2.permute({0, 2, 3, 1}).contiguous(); + + const dim3 threads(WARP_SIZE, 4, 4); + const dim3 blocks(batch_size, (oH + 3) >> 2, (oW + 3) >> 2); + + at::cuda::CUDAGuard device_guard(input1.device()); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + input1.scalar_type(), "correlation_forward_cuda", ([&] { + TensorAcc4R trInput1_acc = + trInput1.packed_accessor32(); + 
TensorAcc4R trInput2_acc = + trInput2.packed_accessor32(); + TensorAcc5R output_acc = + output.packed_accessor32(); + + correlation_forward_cuda_kernel + <<>>( + trInput1_acc, trInput2_acc, output_acc, kH, kW, patchH, patchW, + padH, padW, dilationH, dilationW, dilation_patchH, + dilation_patchW, dH, dW, oH, oW); + })); +} + +void CorrelationBackwardCUDAKernelLauncher( + Tensor grad_output, Tensor input1, Tensor input2, Tensor grad_input1, + Tensor grad_input2, int kH, int kW, int patchH, int patchW, int padH, + int padW, int dilationH, int dilationW, int dilation_patchH, + int dilation_patchW, int dH, int dW) { + const int batch_size = input1.size(0); + const int iH = input1.size(2); + const int iW = input1.size(3); + const int C = input1.size(1); + + auto trInput1 = input1.permute({0, 2, 3, 1}).contiguous(); + auto trInput2 = input2.permute({0, 2, 3, 1}).contiguous(); + const dim3 blocks(batch_size, iH, iW); + const dim3 threads(THREADS_PER_BLOCK); + + at::cuda::CUDAGuard device_guard(input1.device()); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + input1.scalar_type(), "correlation_backward_cuda", ([&] { + const int grad_cache_size = patchH * patchW * sizeof(scalar_t); + TensorAcc4R input1_acc = + trInput1.packed_accessor32(); + TensorAcc4R input2_acc = + trInput2.packed_accessor32(); + TensorAcc4R grad_input1_acc = + grad_input1.packed_accessor32(); + TensorAcc4R grad_input2_acc = + grad_input2.packed_accessor32(); + TensorAcc5R grad_output_acc = + grad_output.packed_accessor32(); + + correlation_backward_cuda_kernel_input1 + <<>>( + grad_output_acc, input2_acc, grad_input1_acc, kH, kW, patchH, + patchW, padH, padW, dilationH, dilationW, dilation_patchH, + dilation_patchW, dH, dW); + + correlation_backward_cuda_kernel_input2 + <<>>( + grad_output_acc, input1_acc, grad_input2_acc, kH, kW, patchH, + patchW, padH, padW, dilationH, dilationW, dilation_patchH, + dilation_patchW, dH, dW); + })); +} diff --git 
a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/cudabind.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/cudabind.cpp new file mode 100644 index 0000000000000000000000000000000000000000..7171581491474bcc2b407dec8928fdcab2b01ea9 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/cudabind.cpp @@ -0,0 +1,1894 @@ +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void AssignScoreWithKForwardCUDAKernelLauncher( + int B, int N0, int N1, int M, int K, int O, int aggregate, + const Tensor& points, const Tensor& centers, const Tensor& scores, + const Tensor& knn_idx, Tensor& output); + +void AssignScoreWithKBackwardCUDAKernelLauncher( + int B, int N0, int N1, int M, int K, int O, int aggregate, + const Tensor& grad_out, const Tensor& points, const Tensor& centers, + const Tensor& scores, const Tensor& knn_idx, Tensor& grad_points, + Tensor& grad_centers, Tensor& grad_scores); + +void assign_score_withk_forward_cuda(int B, int N0, int N1, int M, int K, int O, + int aggregate, const Tensor& points, + const Tensor& centers, + const Tensor& scores, + const Tensor& knn_idx, Tensor& output) { + AssignScoreWithKForwardCUDAKernelLauncher( + B, N0, N1, M, K, O, aggregate, points, centers, scores, knn_idx, output); +}; + +void assign_score_withk_backward_cuda( + int B, int N0, int N1, int M, int K, int O, int aggregate, + const Tensor& grad_out, const Tensor& points, const Tensor& centers, + const Tensor& scores, const Tensor& knn_idx, Tensor& grad_points, + Tensor& grad_centers, Tensor& grad_scores) { + AssignScoreWithKBackwardCUDAKernelLauncher( + B, N0, N1, M, K, O, aggregate, grad_out, points, centers, scores, knn_idx, + grad_points, grad_centers, grad_scores); +}; + +void assign_score_withk_forward_impl(int B, int N0, int N1, int M, int K, int O, + int aggregate, const Tensor& points, + const Tensor& centers, + const Tensor& scores, + const Tensor& knn_idx, Tensor& output); + +void 
assign_score_withk_backward_impl( + int B, int N0, int N1, int M, int K, int O, int aggregate, + const Tensor& grad_out, const Tensor& points, const Tensor& centers, + const Tensor& scores, const Tensor& knn_idx, Tensor& grad_points, + Tensor& grad_centers, Tensor& grad_scores); + +REGISTER_DEVICE_IMPL(assign_score_withk_forward_impl, CUDA, + assign_score_withk_forward_cuda); +REGISTER_DEVICE_IMPL(assign_score_withk_backward_impl, CUDA, + assign_score_withk_backward_cuda); + +void BallQueryForwardCUDAKernelLauncher(int b, int n, int m, float min_radius, + float max_radius, int nsample, + const Tensor new_xyz, const Tensor xyz, + Tensor idx); + +void ball_query_forward_cuda(int b, int n, int m, float min_radius, + float max_radius, int nsample, + const Tensor new_xyz, const Tensor xyz, + Tensor idx) { + BallQueryForwardCUDAKernelLauncher(b, n, m, min_radius, max_radius, nsample, + new_xyz, xyz, idx); +}; + +void ball_query_forward_impl(int b, int n, int m, float min_radius, + float max_radius, int nsample, + const Tensor new_xyz, const Tensor xyz, + Tensor idx); +REGISTER_DEVICE_IMPL(ball_query_forward_impl, CUDA, ball_query_forward_cuda); + +void StackBallQueryForwardCUDAKernelLauncher(float max_radius, int nsample, + const Tensor new_xyz, + const Tensor new_xyz_batch_cnt, + const Tensor xyz, + const Tensor xyz_batch_cnt, + Tensor idx); + +void stack_ball_query_forward_cuda(float max_radius, int nsample, + const Tensor new_xyz, + const Tensor new_xyz_batch_cnt, + const Tensor xyz, const Tensor xyz_batch_cnt, + Tensor idx) { + StackBallQueryForwardCUDAKernelLauncher( + max_radius, nsample, new_xyz, new_xyz_batch_cnt, xyz, xyz_batch_cnt, idx); +}; + +void stack_ball_query_forward_impl(float max_radius, int nsample, + const Tensor new_xyz, + const Tensor new_xyz_batch_cnt, + const Tensor xyz, const Tensor xyz_batch_cnt, + Tensor idx); +REGISTER_DEVICE_IMPL(stack_ball_query_forward_impl, CUDA, + stack_ball_query_forward_cuda); + +void 
BBoxOverlapsCUDAKernelLauncher(const Tensor bboxes1, const Tensor bboxes2, + Tensor ious, const int mode, + const bool aligned, const int offset); + +void bbox_overlaps_cuda(const Tensor bboxes1, const Tensor bboxes2, Tensor ious, + const int mode, const bool aligned, const int offset) { + BBoxOverlapsCUDAKernelLauncher(bboxes1, bboxes2, ious, mode, aligned, offset); +} + +void bbox_overlaps_impl(const Tensor bboxes1, const Tensor bboxes2, Tensor ious, + const int mode, const bool aligned, const int offset); +REGISTER_DEVICE_IMPL(bbox_overlaps_impl, CUDA, bbox_overlaps_cuda); + +void BorderAlignForwardCUDAKernelLauncher(const Tensor& input, + const Tensor& boxes, Tensor output, + Tensor argmax_idx, + const int pool_size); + +void BorderAlignBackwardCUDAKernelLauncher(const Tensor& grad_output, + const Tensor& boxes, + const Tensor& argmax_idx, + Tensor grad_input, + const int pool_size); + +void border_align_forward_cuda(const Tensor& input, const Tensor& boxes, + Tensor output, Tensor argmax_idx, + const int pool_size) { + BorderAlignForwardCUDAKernelLauncher(input, boxes, output, argmax_idx, + pool_size); +} + +void border_align_backward_cuda(const Tensor& grad_output, const Tensor& boxes, + const Tensor& argmax_idx, Tensor grad_input, + const int pool_size) { + BorderAlignBackwardCUDAKernelLauncher(grad_output, boxes, argmax_idx, + grad_input, pool_size); +} + +void border_align_forward_impl(const Tensor& input, const Tensor& boxes, + Tensor output, Tensor argmax_idx, + const int pool_size); + +void border_align_backward_impl(const Tensor& grad_output, const Tensor& boxes, + const Tensor& argmax_idx, Tensor grad_input, + const int pool_size); + +REGISTER_DEVICE_IMPL(border_align_forward_impl, CUDA, + border_align_forward_cuda); +REGISTER_DEVICE_IMPL(border_align_backward_impl, CUDA, + border_align_backward_cuda); + +void box_iou_rotated_cuda(const Tensor boxes1, const Tensor boxes2, Tensor ious, + const int mode_flag, const bool aligned); + +void 
box_iou_rotated_impl(const Tensor boxes1, const Tensor boxes2, Tensor ious, + const int mode_flag, const bool aligned); +REGISTER_DEVICE_IMPL(box_iou_rotated_impl, CUDA, box_iou_rotated_cuda); + +void box_iou_quadri_cuda(const Tensor boxes1, const Tensor boxes2, Tensor ious, + const int mode_flag, const bool aligned); + +void box_iou_quadri_impl(const Tensor boxes1, const Tensor boxes2, Tensor ious, + const int mode_flag, const bool aligned); +REGISTER_DEVICE_IMPL(box_iou_quadri_impl, CUDA, box_iou_quadri_cuda); + +void CARAFEForwardCUDAKernelLauncher(const Tensor features, const Tensor masks, + Tensor rfeatures, Tensor routput, + Tensor rmasks, Tensor output, + const int kernel_size, + const int group_size, + const int scale_factor); + +void CARAFEBackwardCUDAKernelLauncher( + const Tensor top_grad, const Tensor rfeatures, const Tensor masks, + Tensor rtop_grad, Tensor rbottom_grad_hs, Tensor rbottom_grad, + Tensor rmask_grad, Tensor bottom_grad, Tensor mask_grad, + const int kernel_size, const int group_size, const int scale_factor); + +void carafe_forward_cuda(Tensor features, Tensor masks, Tensor rfeatures, + Tensor routput, Tensor rmasks, Tensor output, + int kernel_size, int group_size, int scale_factor) { + CARAFEForwardCUDAKernelLauncher(features, masks, rfeatures, routput, rmasks, + output, kernel_size, group_size, + scale_factor); +} + +void carafe_backward_cuda(Tensor top_grad, Tensor rfeatures, Tensor masks, + Tensor rtop_grad, Tensor rbottom_grad_hs, + Tensor rbottom_grad, Tensor rmask_grad, + Tensor bottom_grad, Tensor mask_grad, int kernel_size, + int group_size, int scale_factor) { + CARAFEBackwardCUDAKernelLauncher(top_grad, rfeatures, masks, rtop_grad, + rbottom_grad_hs, rbottom_grad, rmask_grad, + bottom_grad, mask_grad, kernel_size, + group_size, scale_factor); +} + +void carafe_forward_impl(Tensor features, Tensor masks, Tensor rfeatures, + Tensor routput, Tensor rmasks, Tensor output, + int kernel_size, int group_size, int scale_factor); + 
+void carafe_backward_impl(Tensor top_grad, Tensor rfeatures, Tensor masks, + Tensor rtop_grad, Tensor rbottom_grad_hs, + Tensor rbottom_grad, Tensor rmask_grad, + Tensor bottom_grad, Tensor mask_grad, int kernel_size, + int group_size, int scale_factor); + +REGISTER_DEVICE_IMPL(carafe_forward_impl, CUDA, carafe_forward_cuda); +REGISTER_DEVICE_IMPL(carafe_backward_impl, CUDA, carafe_backward_cuda); + +void CARAFENAIVEForwardCUDAKernelLauncher(const Tensor features, + const Tensor masks, Tensor output, + const int kernel_size, + const int group_size, + const int scale_factor); + +void CARAFENAIVEBackwardCUDAKernelLauncher( + const Tensor top_grad, const Tensor features, const Tensor masks, + Tensor bottom_grad, Tensor mask_grad, const int kernel_size, + const int group_size, const int scale_factor); + +void carafe_naive_forward_cuda(Tensor features, Tensor masks, Tensor output, + int kernel_size, int group_size, + int scale_factor) { + CARAFENAIVEForwardCUDAKernelLauncher(features, masks, output, kernel_size, + group_size, scale_factor); +} + +void carafe_naive_backward_cuda(Tensor top_grad, Tensor features, Tensor masks, + Tensor bottom_grad, Tensor mask_grad, + int kernel_size, int group_size, + int scale_factor) { + CARAFENAIVEBackwardCUDAKernelLauncher(top_grad, features, masks, bottom_grad, + mask_grad, kernel_size, group_size, + scale_factor); +} +void carafe_naive_forward_impl(Tensor features, Tensor masks, Tensor output, + int kernel_size, int group_size, + int scale_factor); + +void carafe_naive_backward_impl(Tensor top_grad, Tensor features, Tensor masks, + Tensor bottom_grad, Tensor mask_grad, + int kernel_size, int group_size, + int scale_factor); + +REGISTER_DEVICE_IMPL(carafe_naive_forward_impl, CUDA, + carafe_naive_forward_cuda); +REGISTER_DEVICE_IMPL(carafe_naive_backward_impl, CUDA, + carafe_naive_backward_cuda); + +void CorrelationForwardCUDAKernelLauncher(Tensor input1, Tensor input2, + Tensor output, int kH, int kW, + int patchH, int patchW, int 
padH, + int padW, int dilationH, + int dilationW, int dilation_patchH, + int dilation_patchW, int dH, int dW); + +void CorrelationBackwardCUDAKernelLauncher(Tensor grad_output, Tensor input1, + Tensor input2, Tensor grad_input1, + Tensor grad_input2, int kH, int kW, + int patchH, int patchW, int padH, + int padW, int dilationH, + int dilationW, int dilation_patchH, + int dilation_patchW, int dH, int dW); + +void correlation_forward_cuda(Tensor input1, Tensor input2, Tensor output, + int kH, int kW, int patchH, int patchW, int padH, + int padW, int dilationH, int dilationW, + int dilation_patchH, int dilation_patchW, int dH, + int dW) { + CorrelationForwardCUDAKernelLauncher( + input1, input2, output, kH, kW, patchH, patchW, padH, padW, dilationH, + dilationW, dilation_patchH, dilation_patchW, dH, dW); +} + +void correlation_backward_cuda(Tensor grad_output, Tensor input1, Tensor input2, + Tensor grad_input1, Tensor grad_input2, int kH, + int kW, int patchH, int patchW, int padH, + int padW, int dilationH, int dilationW, + int dilation_patchH, int dilation_patchW, int dH, + int dW) { + CorrelationBackwardCUDAKernelLauncher( + grad_output, input1, input2, grad_input1, grad_input2, kH, kW, patchH, + patchW, padH, padW, dilationH, dilationW, dilation_patchH, + dilation_patchW, dH, dW); +} + +void correlation_forward_impl(Tensor input1, Tensor input2, Tensor output, + int kH, int kW, int patchH, int patchW, int padH, + int padW, int dilationH, int dilationW, + int dilation_patchH, int dilation_patchW, int dH, + int dW); + +void correlation_backward_impl(Tensor grad_output, Tensor input1, Tensor input2, + Tensor grad_input1, Tensor grad_input2, int kH, + int kW, int patchH, int patchW, int padH, + int padW, int dilationH, int dilationW, + int dilation_patchH, int dilation_patchW, int dH, + int dW); + +REGISTER_DEVICE_IMPL(correlation_forward_impl, CUDA, correlation_forward_cuda); +REGISTER_DEVICE_IMPL(correlation_backward_impl, CUDA, + correlation_backward_cuda); + +void 
deformable_im2col_cuda(Tensor data_im, Tensor data_offset, + const int channels, const int height, + const int width, const int ksize_h, + const int ksize_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, + const int parallel_imgs, const int deformable_group, + Tensor data_col); + +void deformable_col2im_cuda(Tensor data_col, Tensor data_offset, + const int channels, const int height, + const int width, const int ksize_h, + const int ksize_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, + const int parallel_imgs, const int deformable_group, + Tensor grad_im); + +void deformable_col2im_coord_cuda( + Tensor data_col, Tensor data_im, Tensor data_offset, const int channels, + const int height, const int width, const int ksize_h, const int ksize_w, + const int pad_h, const int pad_w, const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, const int parallel_imgs, + const int deformable_group, Tensor grad_offset); + +void deformable_im2col_impl(Tensor data_im, Tensor data_offset, + const int channels, const int height, + const int width, const int ksize_h, + const int ksize_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, + const int parallel_imgs, const int deformable_group, + Tensor data_col); + +void deformable_col2im_impl(Tensor data_col, Tensor data_offset, + const int channels, const int height, + const int width, const int ksize_h, + const int ksize_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, + const int parallel_imgs, const int deformable_group, + Tensor grad_im); + +void deformable_col2im_coord_impl( + Tensor data_col, Tensor data_im, Tensor data_offset, const int channels, + const int height, const int width, const int ksize_h, 
const int ksize_w, + const int pad_h, const int pad_w, const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, const int parallel_imgs, + const int deformable_group, Tensor grad_offset); + +REGISTER_DEVICE_IMPL(deformable_im2col_impl, CUDA, deformable_im2col_cuda); +REGISTER_DEVICE_IMPL(deformable_col2im_impl, CUDA, deformable_col2im_cuda); +REGISTER_DEVICE_IMPL(deformable_col2im_coord_impl, CUDA, + deformable_col2im_coord_cuda); + +void DeformRoIPoolForwardCUDAKernelLauncher(Tensor input, Tensor rois, + Tensor offset, Tensor output, + int pooled_height, int pooled_width, + float spatial_scale, + int sampling_ratio, float gamma); + +void DeformRoIPoolBackwardCUDAKernelLauncher( + Tensor grad_output, Tensor input, Tensor rois, Tensor offset, + Tensor grad_input, Tensor grad_offset, int pooled_height, int pooled_width, + float spatial_scale, int sampling_ratio, float gamma); + +void deform_roi_pool_forward_cuda(Tensor input, Tensor rois, Tensor offset, + Tensor output, int pooled_height, + int pooled_width, float spatial_scale, + int sampling_ratio, float gamma) { + DeformRoIPoolForwardCUDAKernelLauncher(input, rois, offset, output, + pooled_height, pooled_width, + spatial_scale, sampling_ratio, gamma); +} + +void deform_roi_pool_backward_cuda(Tensor grad_output, Tensor input, + Tensor rois, Tensor offset, + Tensor grad_input, Tensor grad_offset, + int pooled_height, int pooled_width, + float spatial_scale, int sampling_ratio, + float gamma) { + DeformRoIPoolBackwardCUDAKernelLauncher( + grad_output, input, rois, offset, grad_input, grad_offset, pooled_height, + pooled_width, spatial_scale, sampling_ratio, gamma); +} + +void deform_roi_pool_forward_impl(Tensor input, Tensor rois, Tensor offset, + Tensor output, int pooled_height, + int pooled_width, float spatial_scale, + int sampling_ratio, float gamma); + +void deform_roi_pool_backward_impl(Tensor grad_output, Tensor input, + Tensor rois, Tensor offset, + Tensor grad_input, Tensor 
grad_offset, + int pooled_height, int pooled_width, + float spatial_scale, int sampling_ratio, + float gamma); + +REGISTER_DEVICE_IMPL(deform_roi_pool_forward_impl, CUDA, + deform_roi_pool_forward_cuda); +REGISTER_DEVICE_IMPL(deform_roi_pool_backward_impl, CUDA, + deform_roi_pool_backward_cuda); + +void SigmoidFocalLossForwardCUDAKernelLauncher(Tensor input, Tensor target, + Tensor weight, Tensor output, + const float gamma, + const float alpha); + +void SigmoidFocalLossBackwardCUDAKernelLauncher(Tensor input, Tensor target, + Tensor weight, + Tensor grad_input, + const float gamma, + const float alpha); + +void SoftmaxFocalLossForwardCUDAKernelLauncher(Tensor softmax, Tensor target, + Tensor weight, Tensor output, + const float gamma, + const float alpha); + +void SoftmaxFocalLossBackwardCUDAKernelLauncher(Tensor softmax, Tensor target, + Tensor weight, Tensor buff, + Tensor grad_input, + const float gamma, + const float alpha); + +void sigmoid_focal_loss_forward_cuda(Tensor input, Tensor target, Tensor weight, + Tensor output, float gamma, float alpha) { + SigmoidFocalLossForwardCUDAKernelLauncher(input, target, weight, output, + gamma, alpha); +} + +void sigmoid_focal_loss_backward_cuda(Tensor input, Tensor target, + Tensor weight, Tensor grad_input, + float gamma, float alpha) { + SigmoidFocalLossBackwardCUDAKernelLauncher(input, target, weight, grad_input, + gamma, alpha); +} + +void softmax_focal_loss_forward_cuda(Tensor input, Tensor target, Tensor weight, + Tensor output, float gamma, float alpha) { + SoftmaxFocalLossForwardCUDAKernelLauncher(input, target, weight, output, + gamma, alpha); +} + +void softmax_focal_loss_backward_cuda(Tensor input, Tensor target, + Tensor weight, Tensor buff, + Tensor grad_input, float gamma, + float alpha) { + SoftmaxFocalLossBackwardCUDAKernelLauncher(input, target, weight, buff, + grad_input, gamma, alpha); +} + +void sigmoid_focal_loss_forward_impl(Tensor input, Tensor target, Tensor weight, + Tensor output, float gamma, 
float alpha); + +void sigmoid_focal_loss_backward_impl(Tensor input, Tensor target, + Tensor weight, Tensor grad_input, + float gamma, float alpha); + +void softmax_focal_loss_forward_impl(Tensor input, Tensor target, Tensor weight, + Tensor output, float gamma, float alpha); + +void softmax_focal_loss_backward_impl(Tensor input, Tensor target, + Tensor weight, Tensor buff, + Tensor grad_input, float gamma, + float alpha); + +REGISTER_DEVICE_IMPL(sigmoid_focal_loss_forward_impl, CUDA, + sigmoid_focal_loss_forward_cuda); +REGISTER_DEVICE_IMPL(sigmoid_focal_loss_backward_impl, CUDA, + sigmoid_focal_loss_backward_cuda); +REGISTER_DEVICE_IMPL(softmax_focal_loss_forward_impl, CUDA, + softmax_focal_loss_forward_cuda); +REGISTER_DEVICE_IMPL(softmax_focal_loss_backward_impl, CUDA, + softmax_focal_loss_backward_cuda); + +void FurthestPointSamplingForwardCUDAKernelLauncher(int b, int n, int m, + const float* dataset, + float* temp, int* idxs); + +void FurthestPointSamplingWithDistForwardCUDAKernelLauncher( + int b, int n, int m, const float* dataset, float* temp, int* idxs); + +void furthest_point_sampling_forward_cuda(Tensor points_tensor, + Tensor temp_tensor, Tensor idx_tensor, + int b, int n, int m) { + const float* dataset = points_tensor.data_ptr(); + float* temp = temp_tensor.data_ptr(); + int* idxs = idx_tensor.data_ptr(); + FurthestPointSamplingForwardCUDAKernelLauncher(b, n, m, dataset, temp, idxs); +} + +void furthest_point_sampling_with_dist_forward_cuda(Tensor points_tensor, + Tensor temp_tensor, + Tensor idx_tensor, int b, + int n, int m) { + const float* dataset = points_tensor.data_ptr(); + float* temp = temp_tensor.data_ptr(); + int* idxs = idx_tensor.data_ptr(); + FurthestPointSamplingWithDistForwardCUDAKernelLauncher(b, n, m, dataset, temp, + idxs); +} + +void furthest_point_sampling_forward_impl(Tensor points_tensor, + Tensor temp_tensor, Tensor idx_tensor, + int b, int n, int m); + +void furthest_point_sampling_with_dist_forward_impl(Tensor 
points_tensor, + Tensor temp_tensor, + Tensor idx_tensor, int b, + int n, int m); + +REGISTER_DEVICE_IMPL(furthest_point_sampling_forward_impl, CUDA, + furthest_point_sampling_forward_cuda); +REGISTER_DEVICE_IMPL(furthest_point_sampling_with_dist_forward_impl, CUDA, + furthest_point_sampling_with_dist_forward_cuda); + +torch::Tensor fused_bias_leakyrelu_op(const torch::Tensor& input, + const torch::Tensor& bias, + const torch::Tensor& refer, int act, + int grad, float alpha, float scale); + +torch::Tensor fused_bias_leakyrelu_op_impl(const torch::Tensor& input, + const torch::Tensor& bias, + const torch::Tensor& refer, int act, + int grad, float alpha, float scale); +REGISTER_DEVICE_IMPL(fused_bias_leakyrelu_op_impl, CUDA, + fused_bias_leakyrelu_op); + +void GatherPointsForwardCUDAKernelLauncher(int b, int c, int n, int npoints, + const Tensor points, + const Tensor idx, Tensor out); + +void GatherPointsBackwardCUDAKernelLauncher(int b, int c, int n, int npoints, + const Tensor grad_out, + const Tensor idx, + Tensor grad_points); + +void gather_points_forward_cuda(int b, int c, int n, int npoints, + const Tensor points, const Tensor idx, + Tensor out) { + GatherPointsForwardCUDAKernelLauncher(b, c, n, npoints, points, idx, out); +}; + +void gather_points_backward_cuda(int b, int c, int n, int npoints, + const Tensor grad_out, const Tensor idx, + Tensor grad_points) { + GatherPointsBackwardCUDAKernelLauncher(b, c, n, npoints, grad_out, idx, + grad_points); +}; + +void gather_points_forward_impl(int b, int c, int n, int npoints, + const Tensor points, const Tensor idx, + Tensor out); + +void gather_points_backward_impl(int b, int c, int n, int npoints, + const Tensor grad_out, const Tensor idx, + Tensor grad_points); + +REGISTER_DEVICE_IMPL(gather_points_forward_impl, CUDA, + gather_points_forward_cuda); +REGISTER_DEVICE_IMPL(gather_points_backward_impl, CUDA, + gather_points_backward_cuda); + +void GroupPointsForwardCUDAKernelLauncher(int b, int c, int n, int 
npoints, + int nsample, const Tensor points, + const Tensor idx, Tensor out); + +void GroupPointsBackwardCUDAKernelLauncher(int b, int c, int n, int npoints, + int nsample, const Tensor grad_out, + const Tensor idx, + Tensor grad_points); + +void group_points_forward_cuda(int b, int c, int n, int npoints, int nsample, + const Tensor points, const Tensor idx, + Tensor out) { + GroupPointsForwardCUDAKernelLauncher(b, c, n, npoints, nsample, points, idx, + out); +}; + +void group_points_backward_cuda(int b, int c, int n, int npoints, int nsample, + const Tensor grad_out, const Tensor idx, + Tensor grad_points) { + GroupPointsBackwardCUDAKernelLauncher(b, c, n, npoints, nsample, grad_out, + idx, grad_points); +}; + +void group_points_forward_impl(int b, int c, int n, int npoints, int nsample, + const Tensor points, const Tensor idx, + Tensor out); + +void group_points_backward_impl(int b, int c, int n, int npoints, int nsample, + const Tensor grad_out, const Tensor idx, + Tensor grad_points); + +REGISTER_DEVICE_IMPL(group_points_forward_impl, CUDA, + group_points_forward_cuda); +REGISTER_DEVICE_IMPL(group_points_backward_impl, CUDA, + group_points_backward_cuda); + +void StackGroupPointsForwardCUDAKernelLauncher( + int b, int c, int m, int nsample, const Tensor features_tensor, + const Tensor features_batch_cnt_tensor, const Tensor idx_tensor, + const Tensor idx_batch_cnt_tensor, Tensor out_tensor); +void StackGroupPointsBackwardCUDAKernelLauncher( + int b, int c, int m, int n, int nsample, const Tensor grad_out_tensor, + const Tensor idx_tensor, const Tensor idx_batch_cnt_tensor, + const Tensor features_batch_cnt_tensor, Tensor grad_features_tensor); + +void stack_group_points_forward_cuda(int b, int c, int m, int nsample, + const Tensor features_tensor, + const Tensor features_batch_cnt_tensor, + const Tensor idx_tensor, + const Tensor idx_batch_cnt_tensor, + Tensor out_tensor) { + StackGroupPointsForwardCUDAKernelLauncher( + b, c, m, nsample, features_tensor, 
features_batch_cnt_tensor, idx_tensor, + idx_batch_cnt_tensor, out_tensor); +}; + +void stack_group_points_backward_cuda(int b, int c, int m, int n, int nsample, + const Tensor grad_out_tensor, + const Tensor idx_tensor, + const Tensor idx_batch_cnt_tensor, + const Tensor features_batch_cnt_tensor, + Tensor grad_features_tensor) { + StackGroupPointsBackwardCUDAKernelLauncher( + b, c, m, n, nsample, grad_out_tensor, idx_tensor, idx_batch_cnt_tensor, + features_batch_cnt_tensor, grad_features_tensor); +}; + +void stack_group_points_forward_impl(int b, int c, int m, int nsample, + const Tensor features_tensor, + const Tensor features_batch_cnt_tensor, + const Tensor idx_tensor, + const Tensor idx_batch_cnt_tensor, + Tensor out_tensor); + +void stack_group_points_backward_impl(int b, int c, int m, int n, int nsample, + const Tensor grad_out_tensor, + const Tensor idx_tensor, + const Tensor idx_batch_cnt_tensor, + const Tensor features_batch_cnt_tensor, + Tensor grad_features_tensor); + +REGISTER_DEVICE_IMPL(stack_group_points_forward_impl, CUDA, + stack_group_points_forward_cuda); +REGISTER_DEVICE_IMPL(stack_group_points_backward_impl, CUDA, + stack_group_points_backward_cuda); + +void IoU3DBoxesOverlapBevForwardCUDAKernelLauncher(const int num_a, + const Tensor boxes_a, + const int num_b, + const Tensor boxes_b, + Tensor ans_overlap); + +void IoU3DNMS3DForwardCUDAKernelLauncher(const Tensor boxes, Tensor& keep, + Tensor& keep_num, + float nms_overlap_thresh); + +void IoU3DNMS3DNormalForwardCUDAKernelLauncher(const Tensor boxes, Tensor& keep, + Tensor& keep_num, + float nms_overlap_thresh); + +void iou3d_boxes_overlap_bev_forward_cuda(const int num_a, const Tensor boxes_a, + const int num_b, const Tensor boxes_b, + Tensor ans_overlap) { + IoU3DBoxesOverlapBevForwardCUDAKernelLauncher(num_a, boxes_a, num_b, boxes_b, + ans_overlap); +}; + +void iou3d_nms3d_forward_cuda(const Tensor boxes, Tensor& keep, + Tensor& keep_num, float nms_overlap_thresh) { + 
IoU3DNMS3DForwardCUDAKernelLauncher(boxes, keep, keep_num, + nms_overlap_thresh); +}; + +void iou3d_nms3d_normal_forward_cuda(const Tensor boxes, Tensor& keep, + Tensor& keep_num, + float nms_overlap_thresh) { + IoU3DNMS3DNormalForwardCUDAKernelLauncher(boxes, keep, keep_num, + nms_overlap_thresh); +}; + +void iou3d_boxes_overlap_bev_forward_impl(const int num_a, const Tensor boxes_a, + const int num_b, const Tensor boxes_b, + Tensor ans_overlap); + +void iou3d_nms3d_forward_impl(const Tensor boxes, Tensor& keep, + Tensor& keep_num, float nms_overlap_thresh); + +void iou3d_nms3d_normal_forward_impl(const Tensor boxes, Tensor& keep, + Tensor& keep_num, + float nms_overlap_thresh); + +REGISTER_DEVICE_IMPL(iou3d_boxes_overlap_bev_forward_impl, CUDA, + iou3d_boxes_overlap_bev_forward_cuda); +REGISTER_DEVICE_IMPL(iou3d_nms3d_forward_impl, CUDA, iou3d_nms3d_forward_cuda); +REGISTER_DEVICE_IMPL(iou3d_nms3d_normal_forward_impl, CUDA, + iou3d_nms3d_normal_forward_cuda); + +void KNNForwardCUDAKernelLauncher(int b, int n, int m, int nsample, + const Tensor xyz, const Tensor new_xyz, + Tensor idx, Tensor dist2); + +void knn_forward_cuda(int b, int n, int m, int nsample, const Tensor xyz, + const Tensor new_xyz, Tensor idx, Tensor dist2) { + KNNForwardCUDAKernelLauncher(b, n, m, nsample, xyz, new_xyz, idx, dist2); +} + +void knn_forward_impl(int b, int n, int m, int nsample, const Tensor xyz, + const Tensor new_xyz, Tensor idx, Tensor dist2); +REGISTER_DEVICE_IMPL(knn_forward_impl, CUDA, knn_forward_cuda); + +void MaskedIm2colForwardCUDAKernelLauncher(const Tensor bottom_data, + const Tensor mask_h_idx, + const Tensor mask_w_idx, + Tensor top_data, const int kernel_h, + const int kernel_w, const int pad_h, + const int pad_w); + +void MaskedCol2imForwardCUDAKernelLauncher(const Tensor bottom_data, + const Tensor mask_h_idx, + const Tensor mask_w_idx, + Tensor top_data, const int height, + const int width, const int channels); + +void masked_im2col_forward_cuda(const Tensor im, 
const Tensor mask_h_idx, + const Tensor mask_w_idx, Tensor col, + const int kernel_h, const int kernel_w, + const int pad_h, const int pad_w) { + // im: (n, ic, h, w), kernel size (kh, kw) + // kernel: (oc, ic * kh * kw), col: (kh * kw * ic, ow * oh) + MaskedIm2colForwardCUDAKernelLauncher(im, mask_h_idx, mask_w_idx, col, + kernel_h, kernel_w, pad_h, pad_w); +} + +void masked_col2im_forward_cuda(const Tensor col, const Tensor mask_h_idx, + const Tensor mask_w_idx, Tensor im, int height, + int width, int channels) { + // im: (n, ic, h, w), kernel size (kh, kw) + // kernel: (oc, ic * kh * kh), col: (kh * kw * ic, ow * oh) + MaskedCol2imForwardCUDAKernelLauncher(col, mask_h_idx, mask_w_idx, im, height, + width, channels); +} + +void masked_im2col_forward_impl(const Tensor im, const Tensor mask_h_idx, + const Tensor mask_w_idx, Tensor col, + const int kernel_h, const int kernel_w, + const int pad_h, const int pad_w); + +void masked_col2im_forward_impl(const Tensor col, const Tensor mask_h_idx, + const Tensor mask_w_idx, Tensor im, int height, + int width, int channels); + +REGISTER_DEVICE_IMPL(masked_im2col_forward_impl, CUDA, + masked_im2col_forward_cuda); +REGISTER_DEVICE_IMPL(masked_col2im_forward_impl, CUDA, + masked_col2im_forward_cuda); + +void modulated_deformable_im2col_cuda( + const Tensor data_im, const Tensor data_offset, const Tensor data_mask, + const int batch_size, const int channels, const int height_im, + const int width_im, const int height_col, const int width_col, + const int kernel_h, const int kernel_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, const int dilation_h, + const int dilation_w, const int deformable_group, Tensor data_col); + +void modulated_deformable_col2im_cuda( + const Tensor data_col, const Tensor data_offset, const Tensor data_mask, + const int batch_size, const int channels, const int height_im, + const int width_im, const int height_col, const int width_col, + const int kernel_h, const int 
kernel_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, const int dilation_h, + const int dilation_w, const int deformable_group, Tensor grad_im); + +void modulated_deformable_col2im_coord_cuda( + const Tensor data_col, const Tensor data_im, const Tensor data_offset, + const Tensor data_mask, const int batch_size, const int channels, + const int height_im, const int width_im, const int height_col, + const int width_col, const int kernel_h, const int kernel_w, + const int pad_h, const int pad_w, const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, const int deformable_group, + Tensor grad_offset, Tensor grad_mask); + +void modulated_deformable_im2col_impl( + const Tensor data_im, const Tensor data_offset, const Tensor data_mask, + const int batch_size, const int channels, const int height_im, + const int width_im, const int height_col, const int width_col, + const int kernel_h, const int kernel_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, const int dilation_h, + const int dilation_w, const int deformable_group, Tensor data_col); + +void modulated_deformable_col2im_impl( + const Tensor data_col, const Tensor data_offset, const Tensor data_mask, + const int batch_size, const int channels, const int height_im, + const int width_im, const int height_col, const int width_col, + const int kernel_h, const int kernel_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, const int dilation_h, + const int dilation_w, const int deformable_group, Tensor grad_im); + +void modulated_deformable_col2im_coord_impl( + const Tensor data_col, const Tensor data_im, const Tensor data_offset, + const Tensor data_mask, const int batch_size, const int channels, + const int height_im, const int width_im, const int height_col, + const int width_col, const int kernel_h, const int kernel_w, + const int pad_h, const int pad_w, const int stride_h, const int stride_w, + const 
int dilation_h, const int dilation_w, const int deformable_group, + Tensor grad_offset, Tensor grad_mask); + +REGISTER_DEVICE_IMPL(modulated_deformable_im2col_impl, CUDA, + modulated_deformable_im2col_cuda); +REGISTER_DEVICE_IMPL(modulated_deformable_col2im_impl, CUDA, + modulated_deformable_col2im_cuda); +REGISTER_DEVICE_IMPL(modulated_deformable_col2im_coord_impl, CUDA, + modulated_deformable_col2im_coord_cuda); + +Tensor ms_deform_attn_cuda_forward(const Tensor& value, + const Tensor& spatial_shapes, + const Tensor& level_start_index, + const Tensor& sampling_loc, + const Tensor& attn_weight, + const int im2col_step); + +void ms_deform_attn_cuda_backward( + const Tensor& value, const Tensor& spatial_shapes, + const Tensor& level_start_index, const Tensor& sampling_loc, + const Tensor& attn_weight, const Tensor& grad_output, Tensor& grad_value, + Tensor& grad_sampling_loc, Tensor& grad_attn_weight, const int im2col_step); + +Tensor ms_deform_attn_impl_forward(const Tensor& value, + const Tensor& spatial_shapes, + const Tensor& level_start_index, + const Tensor& sampling_loc, + const Tensor& attn_weight, + const int im2col_step); + +void ms_deform_attn_impl_backward( + const Tensor& value, const Tensor& spatial_shapes, + const Tensor& level_start_index, const Tensor& sampling_loc, + const Tensor& attn_weight, const Tensor& grad_output, Tensor& grad_value, + Tensor& grad_sampling_loc, Tensor& grad_attn_weight, const int im2col_step); + +REGISTER_DEVICE_IMPL(ms_deform_attn_impl_forward, CUDA, + ms_deform_attn_cuda_forward); +REGISTER_DEVICE_IMPL(ms_deform_attn_impl_backward, CUDA, + ms_deform_attn_cuda_backward); + +Tensor NMSCUDAKernelLauncher(Tensor boxes, Tensor scores, float iou_threshold, + int offset); + +Tensor nms_cuda(Tensor boxes, Tensor scores, float iou_threshold, int offset) { + return NMSCUDAKernelLauncher(boxes, scores, iou_threshold, offset); +} + +Tensor nms_impl(Tensor boxes, Tensor scores, float iou_threshold, int offset); 
+REGISTER_DEVICE_IMPL(nms_impl, CUDA, nms_cuda); + +void PointsInBoxesPartForwardCUDAKernelLauncher(int batch_size, int boxes_num, + int pts_num, const Tensor boxes, + const Tensor pts, + Tensor box_idx_of_points); + +void PointsInBoxesAllForwardCUDAKernelLauncher(int batch_size, int boxes_num, + int pts_num, const Tensor boxes, + const Tensor pts, + Tensor box_idx_of_points); + +void points_in_boxes_part_forward_cuda(int batch_size, int boxes_num, + int pts_num, const Tensor boxes, + const Tensor pts, + Tensor box_idx_of_points) { + PointsInBoxesPartForwardCUDAKernelLauncher(batch_size, boxes_num, pts_num, + boxes, pts, box_idx_of_points); +}; + +void points_in_boxes_all_forward_cuda(int batch_size, int boxes_num, + int pts_num, const Tensor boxes, + const Tensor pts, + Tensor box_idx_of_points) { + PointsInBoxesAllForwardCUDAKernelLauncher(batch_size, boxes_num, pts_num, + boxes, pts, box_idx_of_points); +}; + +void points_in_boxes_part_forward_impl(int batch_size, int boxes_num, + int pts_num, const Tensor boxes, + const Tensor pts, + Tensor box_idx_of_points); + +void points_in_boxes_all_forward_impl(int batch_size, int boxes_num, + int pts_num, const Tensor boxes, + const Tensor pts, + Tensor box_idx_of_points); +REGISTER_DEVICE_IMPL(points_in_boxes_part_forward_impl, CUDA, + points_in_boxes_part_forward_cuda); +REGISTER_DEVICE_IMPL(points_in_boxes_all_forward_impl, CUDA, + points_in_boxes_all_forward_cuda); + +void PSAMaskForwardCUDAKernelLauncher(const int psa_type, const Tensor input, + Tensor output, const int num_, + const int h_feature, const int w_feature, + const int h_mask, const int w_mask, + const int half_h_mask, + const int half_w_mask); + +void PSAMaskBackwardCUDAKernelLauncher( + const int psa_type, const Tensor grad_output, Tensor grad_input, + const int num_, const int h_feature, const int w_feature, const int h_mask, + const int w_mask, const int half_h_mask, const int half_w_mask); + +void psamask_forward_cuda(const int psa_type, const 
Tensor input, Tensor output, + const int num_, const int h_feature, + const int w_feature, const int h_mask, + const int w_mask, const int half_h_mask, + const int half_w_mask) { + PSAMaskForwardCUDAKernelLauncher(psa_type, input, output, num_, h_feature, + w_feature, h_mask, w_mask, half_h_mask, + half_w_mask); +} + +void psamask_backward_cuda(const int psa_type, const Tensor grad_output, + Tensor grad_input, const int num_, + const int h_feature, const int w_feature, + const int h_mask, const int w_mask, + const int half_h_mask, const int half_w_mask) { + PSAMaskBackwardCUDAKernelLauncher(psa_type, grad_output, grad_input, num_, + h_feature, w_feature, h_mask, w_mask, + half_h_mask, half_w_mask); +} + +void psamask_forward_impl(const int psa_type, const Tensor input, Tensor output, + const int num_, const int h_feature, + const int w_feature, const int h_mask, + const int w_mask, const int half_h_mask, + const int half_w_mask); + +void psamask_backward_impl(const int psa_type, const Tensor grad_output, + Tensor grad_input, const int num_, + const int h_feature, const int w_feature, + const int h_mask, const int w_mask, + const int half_h_mask, const int half_w_mask); +REGISTER_DEVICE_IMPL(psamask_forward_impl, CUDA, psamask_forward_cuda); +REGISTER_DEVICE_IMPL(psamask_backward_impl, CUDA, psamask_backward_cuda); + +void ROIAlignForwardCUDAKernelLauncher(Tensor input, Tensor rois, Tensor output, + Tensor argmax_y, Tensor argmax_x, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + int pool_mode, bool aligned); + +void ROIAlignBackwardCUDAKernelLauncher(Tensor grad_output, Tensor rois, + Tensor argmax_y, Tensor argmax_x, + Tensor grad_input, int aligned_height, + int aligned_width, float spatial_scale, + int sampling_ratio, int pool_mode, + bool aligned); + +void roi_align_forward_cuda(Tensor input, Tensor rois, Tensor output, + Tensor argmax_y, Tensor argmax_x, + int aligned_height, int aligned_width, + float spatial_scale, int 
sampling_ratio, + int pool_mode, bool aligned) { + ROIAlignForwardCUDAKernelLauncher( + input, rois, output, argmax_y, argmax_x, aligned_height, aligned_width, + spatial_scale, sampling_ratio, pool_mode, aligned); +} + +void roi_align_backward_cuda(Tensor grad_output, Tensor rois, Tensor argmax_y, + Tensor argmax_x, Tensor grad_input, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + int pool_mode, bool aligned) { + ROIAlignBackwardCUDAKernelLauncher( + grad_output, rois, argmax_y, argmax_x, grad_input, aligned_height, + aligned_width, spatial_scale, sampling_ratio, pool_mode, aligned); +} + +void roi_align_forward_impl(Tensor input, Tensor rois, Tensor output, + Tensor argmax_y, Tensor argmax_x, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + int pool_mode, bool aligned); + +void roi_align_backward_impl(Tensor grad_output, Tensor rois, Tensor argmax_y, + Tensor argmax_x, Tensor grad_input, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + int pool_mode, bool aligned); + +REGISTER_DEVICE_IMPL(roi_align_forward_impl, CUDA, roi_align_forward_cuda); +REGISTER_DEVICE_IMPL(roi_align_backward_impl, CUDA, roi_align_backward_cuda); + +void ROIAlignRotatedForwardCUDAKernelLauncher( + const at::Tensor input, const at::Tensor rois, const float spatial_scale, + const int sampling_ratio, const bool aligned, const bool clockwise, + const int channels, const int height, const int width, const int num_rois, + const int pooled_height, const int pooled_width, at::Tensor output); + +void ROIAlignRotatedBackwardCUDAKernelLauncher( + const at::Tensor top_grad, const at::Tensor rois, const float spatial_scale, + const int sampling_ratio, const bool aligned, const bool clockwise, + const int channels, const int height, const int width, const int num_rois, + const int pooled_height, const int pooled_width, at::Tensor bottom_grad); + +void roi_align_rotated_forward_cuda(Tensor 
input, Tensor rois, Tensor output, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + bool aligned, bool clockwise) { + // Number of ROIs + int num_rois = rois.size(0); + int size_rois = rois.size(1); + + if (size_rois != 6) { + AT_ERROR("wrong roi size"); + } + + int num_channels = input.size(1); + int data_height = input.size(2); + int data_width = input.size(3); + ROIAlignRotatedForwardCUDAKernelLauncher( + input, rois, spatial_scale, sampling_ratio, aligned, clockwise, + num_channels, data_height, data_width, num_rois, aligned_height, + aligned_width, output); +} + +void roi_align_rotated_backward_cuda(Tensor top_grad, Tensor rois, + Tensor bottom_grad, int aligned_height, + int aligned_width, float spatial_scale, + int sampling_ratio, bool aligned, + bool clockwise) { + // Number of ROIs + int num_rois = rois.size(0); + int size_rois = rois.size(1); + if (size_rois != 6) { + AT_ERROR("wrong roi size"); + } + + int num_channels = bottom_grad.size(1); + int data_height = bottom_grad.size(2); + int data_width = bottom_grad.size(3); + ROIAlignRotatedBackwardCUDAKernelLauncher( + top_grad, rois, spatial_scale, sampling_ratio, aligned, clockwise, + num_channels, data_height, data_width, num_rois, aligned_height, + aligned_width, bottom_grad); +} + +void roi_align_rotated_forward_impl(Tensor input, Tensor rois, Tensor output, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + bool aligned, bool clockwise); + +void roi_align_rotated_backward_impl(Tensor top_grad, Tensor rois, + Tensor bottom_grad, int aligned_height, + int aligned_width, float spatial_scale, + int sampling_ratio, bool aligned, + bool clockwise); +REGISTER_DEVICE_IMPL(roi_align_rotated_forward_impl, CUDA, + roi_align_rotated_forward_cuda); +REGISTER_DEVICE_IMPL(roi_align_rotated_backward_impl, CUDA, + roi_align_rotated_backward_cuda); + +void RiROIAlignRotatedForwardCUDAKernelLauncher( + const at::Tensor features, const at::Tensor 
rois, const float spatial_scale, + const int num_samples, const bool clockwise, const int channels, + const int height, const int width, const int num_rois, + const int pooled_height, const int pooled_width, const int num_orientations, + at::Tensor output); + +void RiROIAlignRotatedBackwardCUDAKernelLauncher( + const at::Tensor top_grad, const at::Tensor rois, const float spatial_scale, + const int num_samples, const bool clockwise, const int channels, + const int height, const int width, const int num_rois, + const int pooled_height, const int pooled_width, const int num_orientations, + at::Tensor bottom_grad); + +void riroi_align_rotated_forward_cuda(Tensor features, Tensor rois, + Tensor output, int pooled_height, + int pooled_width, float spatial_scale, + int num_samples, int num_orientations, + bool clockwise) { + // Number of ROIs + int num_rois = rois.size(0); + int size_rois = rois.size(1); + if (size_rois != 6) { + AT_ERROR("wrong roi size"); + } + CHECK_CONTIGUOUS(features); + CHECK_CONTIGUOUS(rois); + int num_channels = features.size(1) / num_orientations; + int data_height = features.size(2); + int data_width = features.size(3); + RiROIAlignRotatedForwardCUDAKernelLauncher( + features, rois, spatial_scale, num_samples, clockwise, num_channels, + data_height, data_width, num_rois, pooled_height, pooled_width, + num_orientations, output); +} + +void riroi_align_rotated_backward_cuda(Tensor top_grad, Tensor rois, + Tensor bottom_grad, int pooled_height, + int pooled_width, float spatial_scale, + int num_samples, int num_orientations, + bool clockwise) { + // Number of ROIs + int num_rois = rois.size(0); + int size_rois = rois.size(1); + if (size_rois != 6) { + AT_ERROR("wrong roi size"); + } + CHECK_CONTIGUOUS(top_grad); + CHECK_CONTIGUOUS(rois); + int num_channels = bottom_grad.size(1) / num_orientations; + int data_height = bottom_grad.size(2); + int data_width = bottom_grad.size(3); + RiROIAlignRotatedBackwardCUDAKernelLauncher( + top_grad, rois, 
spatial_scale, num_samples, clockwise, num_channels, + data_height, data_width, num_rois, pooled_height, pooled_width, + num_orientations, bottom_grad); +} + +void riroi_align_rotated_forward_impl(Tensor features, Tensor rois, + Tensor output, int pooled_height, + int pooled_width, float spatial_scale, + int num_samples, int num_orientations, + bool clockwise); + +void riroi_align_rotated_backward_impl(Tensor top_grad, Tensor rois, + Tensor bottom_grad, int pooled_height, + int pooled_width, float spatial_scale, + int num_samples, int num_orientations, + bool clockwise); + +REGISTER_DEVICE_IMPL(riroi_align_rotated_forward_impl, CUDA, + riroi_align_rotated_forward_cuda); +REGISTER_DEVICE_IMPL(riroi_align_rotated_backward_impl, CUDA, + riroi_align_rotated_backward_cuda); + +void RoiawarePool3dForwardCUDAKernelLauncher( + int boxes_num, int pts_num, int channels, int max_pts_each_voxel, int out_x, + int out_y, int out_z, const Tensor rois, const Tensor pts, + const Tensor pts_feature, Tensor argmax, Tensor pts_idx_of_voxels, + Tensor pooled_features, int pool_method); + +void RoiawarePool3dBackwardCUDAKernelLauncher( + int boxes_num, int out_x, int out_y, int out_z, int channels, + int max_pts_each_voxel, const Tensor pts_idx_of_voxels, const Tensor argmax, + const Tensor grad_out, Tensor grad_in, int pool_method); + +void roiaware_pool3d_forward_cuda(int boxes_num, int pts_num, int channels, + int max_pts_each_voxel, int out_x, int out_y, + int out_z, const Tensor rois, + const Tensor pts, const Tensor pts_feature, + Tensor argmax, Tensor pts_idx_of_voxels, + Tensor pooled_features, int pool_method) { + RoiawarePool3dForwardCUDAKernelLauncher( + boxes_num, pts_num, channels, max_pts_each_voxel, out_x, out_y, out_z, + rois, pts, pts_feature, argmax, pts_idx_of_voxels, pooled_features, + pool_method); +}; + +void roiaware_pool3d_backward_cuda(int boxes_num, int out_x, int out_y, + int out_z, int channels, + int max_pts_each_voxel, + const Tensor pts_idx_of_voxels, + 
const Tensor argmax, const Tensor grad_out, + Tensor grad_in, int pool_method) { + RoiawarePool3dBackwardCUDAKernelLauncher( + boxes_num, out_x, out_y, out_z, channels, max_pts_each_voxel, + pts_idx_of_voxels, argmax, grad_out, grad_in, pool_method); +}; + +void roiaware_pool3d_forward_impl(int boxes_num, int pts_num, int channels, + int max_pts_each_voxel, int out_x, int out_y, + int out_z, const Tensor rois, + const Tensor pts, const Tensor pts_feature, + Tensor argmax, Tensor pts_idx_of_voxels, + Tensor pooled_features, int pool_method); + +void roiaware_pool3d_backward_impl(int boxes_num, int out_x, int out_y, + int out_z, int channels, + int max_pts_each_voxel, + const Tensor pts_idx_of_voxels, + const Tensor argmax, const Tensor grad_out, + Tensor grad_in, int pool_method); + +REGISTER_DEVICE_IMPL(roiaware_pool3d_forward_impl, CUDA, + roiaware_pool3d_forward_cuda); +REGISTER_DEVICE_IMPL(roiaware_pool3d_backward_impl, CUDA, + roiaware_pool3d_backward_cuda); + +void RoIPointPool3dForwardCUDAKernelLauncher( + int batch_size, int pts_num, int boxes_num, int feature_in_len, + int sampled_pts_num, const Tensor xyz, const Tensor boxes3d, + const Tensor pts_feature, Tensor pooled_features, Tensor pooled_empty_flag); + +void roipoint_pool3d_forward_cuda(int batch_size, int pts_num, int boxes_num, + int feature_in_len, int sampled_pts_num, + const Tensor xyz, const Tensor boxes3d, + const Tensor pts_feature, + Tensor pooled_features, + Tensor pooled_empty_flag) { + RoIPointPool3dForwardCUDAKernelLauncher( + batch_size, pts_num, boxes_num, feature_in_len, sampled_pts_num, xyz, + boxes3d, pts_feature, pooled_features, pooled_empty_flag); +}; + +void roipoint_pool3d_forward_impl(int batch_size, int pts_num, int boxes_num, + int feature_in_len, int sampled_pts_num, + const Tensor xyz, const Tensor boxes3d, + const Tensor pts_feature, + Tensor pooled_features, + Tensor pooled_empty_flag); +REGISTER_DEVICE_IMPL(roipoint_pool3d_forward_impl, CUDA, + 
roipoint_pool3d_forward_cuda); + +void ROIPoolForwardCUDAKernelLauncher(Tensor input, Tensor rois, Tensor output, + Tensor argmax, int pooled_height, + int pooled_width, float spatial_scale); + +void ROIPoolBackwardCUDAKernelLauncher(Tensor grad_output, Tensor rois, + Tensor argmax, Tensor grad_input, + int pooled_height, int pooled_width, + float spatial_scale); + +void roi_pool_forward_cuda(Tensor input, Tensor rois, Tensor output, + Tensor argmax, int pooled_height, int pooled_width, + float spatial_scale) { + ROIPoolForwardCUDAKernelLauncher(input, rois, output, argmax, pooled_height, + pooled_width, spatial_scale); +} + +void roi_pool_backward_cuda(Tensor grad_output, Tensor rois, Tensor argmax, + Tensor grad_input, int pooled_height, + int pooled_width, float spatial_scale) { + ROIPoolBackwardCUDAKernelLauncher(grad_output, rois, argmax, grad_input, + pooled_height, pooled_width, spatial_scale); +} + +void roi_pool_forward_impl(Tensor input, Tensor rois, Tensor output, + Tensor argmax, int pooled_height, int pooled_width, + float spatial_scale); +void roi_pool_backward_impl(Tensor grad_output, Tensor rois, Tensor argmax, + Tensor grad_input, int pooled_height, + int pooled_width, float spatial_scale); +REGISTER_DEVICE_IMPL(roi_pool_forward_impl, CUDA, roi_pool_forward_cuda); +REGISTER_DEVICE_IMPL(roi_pool_backward_impl, CUDA, roi_pool_backward_cuda); + +typedef enum { SUM = 0, MEAN = 1, MAX = 2 } reduce_t; + +std::vector DynamicPointToVoxelForwardCUDAKernelLauncher( + const at::Tensor& feats, const at::Tensor& coors, + const reduce_t reduce_type); + +void DynamicPointToVoxelBackwardCUDAKernelLauncher( + at::Tensor& grad_feats, const at::Tensor& grad_reduced_feats, + const at::Tensor& feats, const at::Tensor& reduced_feats, + const at::Tensor& coors_map, const at::Tensor& reduce_count, + const reduce_t reduce_type); + +std::vector dynamic_point_to_voxel_forward_cuda( + const torch::Tensor& feats, const torch::Tensor& coors, + const reduce_t reduce_type) { + 
return DynamicPointToVoxelForwardCUDAKernelLauncher(feats, coors, + reduce_type); +}; + +void dynamic_point_to_voxel_backward_cuda( + torch::Tensor& grad_feats, const torch::Tensor& grad_reduced_feats, + const torch::Tensor& feats, const torch::Tensor& reduced_feats, + const torch::Tensor& coors_idx, const torch::Tensor& reduce_count, + const reduce_t reduce_type) { + DynamicPointToVoxelBackwardCUDAKernelLauncher(grad_feats, grad_reduced_feats, + feats, reduced_feats, coors_idx, + reduce_count, reduce_type); +}; + +std::vector dynamic_point_to_voxel_forward_impl( + const torch::Tensor& feats, const torch::Tensor& coors, + const reduce_t reduce_type); + +void dynamic_point_to_voxel_backward_impl( + torch::Tensor& grad_feats, const torch::Tensor& grad_reduced_feats, + const torch::Tensor& feats, const torch::Tensor& reduced_feats, + const torch::Tensor& coors_idx, const torch::Tensor& reduce_count, + const reduce_t reduce_type); + +REGISTER_DEVICE_IMPL(dynamic_point_to_voxel_forward_impl, CUDA, + dynamic_point_to_voxel_forward_cuda); +REGISTER_DEVICE_IMPL(dynamic_point_to_voxel_backward_impl, CUDA, + dynamic_point_to_voxel_backward_cuda); + +void SyncBNForwardMeanCUDAKernelLauncher(const Tensor input, Tensor mean); + +void SyncBNForwardVarCUDAKernelLauncher(const Tensor input, const Tensor mean, + Tensor var); + +void SyncBNForwardOutputCUDAKernelLauncher( + const Tensor input, const Tensor mean, const Tensor var, + Tensor running_mean, Tensor running_var, const Tensor weight, + const Tensor bias, Tensor norm, Tensor std, Tensor output, float eps, + float momentum, int group_size); + +void SyncBNBackwardParamCUDAKernelLauncher(const Tensor grad_output, + const Tensor norm, + Tensor grad_weight, + Tensor grad_bias); + +void SyncBNBackwardDataCUDAKernelLauncher(const Tensor grad_output, + const Tensor weight, + const Tensor grad_weight, + const Tensor grad_bias, + const Tensor norm, const Tensor std, + Tensor grad_input); + +void sync_bn_forward_mean_cuda(const Tensor 
input, Tensor mean) { + SyncBNForwardMeanCUDAKernelLauncher(input, mean); +} + +void sync_bn_forward_var_cuda(const Tensor input, const Tensor mean, + Tensor var) { + SyncBNForwardVarCUDAKernelLauncher(input, mean, var); +} + +void sync_bn_forward_output_cuda(const Tensor input, const Tensor mean, + const Tensor var, Tensor running_mean, + Tensor running_var, const Tensor weight, + const Tensor bias, Tensor norm, Tensor std, + Tensor output, float eps, float momentum, + int group_size) { + SyncBNForwardOutputCUDAKernelLauncher(input, mean, var, running_mean, + running_var, weight, bias, norm, std, + output, eps, momentum, group_size); +} + +void sync_bn_backward_param_cuda(const Tensor grad_output, const Tensor norm, + Tensor grad_weight, Tensor grad_bias) { + SyncBNBackwardParamCUDAKernelLauncher(grad_output, norm, grad_weight, + grad_bias); +} + +void sync_bn_backward_data_cuda(const Tensor grad_output, const Tensor weight, + const Tensor grad_weight, + const Tensor grad_bias, const Tensor norm, + const Tensor std, Tensor grad_input) { + SyncBNBackwardDataCUDAKernelLauncher(grad_output, weight, grad_weight, + grad_bias, norm, std, grad_input); +} + +void sync_bn_forward_mean_impl(const Tensor input, Tensor mean); + +void sync_bn_forward_var_impl(const Tensor input, const Tensor mean, + Tensor var); + +void sync_bn_forward_output_impl(const Tensor input, const Tensor mean, + const Tensor var, Tensor running_mean, + Tensor running_var, const Tensor weight, + const Tensor bias, Tensor norm, Tensor std, + Tensor output, float eps, float momentum, + int group_size); + +void sync_bn_backward_param_impl(const Tensor grad_output, const Tensor norm, + Tensor grad_weight, Tensor grad_bias); + +void sync_bn_backward_data_impl(const Tensor grad_output, const Tensor weight, + const Tensor grad_weight, + const Tensor grad_bias, const Tensor norm, + const Tensor std, Tensor grad_input); + +REGISTER_DEVICE_IMPL(sync_bn_forward_mean_impl, CUDA, + sync_bn_forward_mean_cuda); 
+REGISTER_DEVICE_IMPL(sync_bn_forward_var_impl, CUDA, sync_bn_forward_var_cuda); +REGISTER_DEVICE_IMPL(sync_bn_forward_output_impl, CUDA, + sync_bn_forward_output_cuda); +REGISTER_DEVICE_IMPL(sync_bn_backward_param_impl, CUDA, + sync_bn_backward_param_cuda); +REGISTER_DEVICE_IMPL(sync_bn_backward_data_impl, CUDA, + sync_bn_backward_data_cuda); + +void ThreeInterpolateForwardCUDAKernelLauncher(int b, int c, int m, int n, + const Tensor points, + const Tensor idx, + const Tensor weight, Tensor out); + +void ThreeInterpolateBackwardCUDAKernelLauncher(int b, int c, int n, int m, + const Tensor grad_out, + const Tensor idx, + const Tensor weight, + Tensor grad_points); + +void three_interpolate_forward_cuda(int b, int c, int m, int n, + const Tensor points, const Tensor idx, + const Tensor weight, Tensor out) { + ThreeInterpolateForwardCUDAKernelLauncher(b, c, m, n, points, idx, weight, + out); +}; + +void three_interpolate_backward_cuda(int b, int c, int n, int m, + const Tensor grad_out, const Tensor idx, + const Tensor weight, Tensor grad_points) { + ThreeInterpolateBackwardCUDAKernelLauncher(b, c, n, m, grad_out, idx, weight, + grad_points); +}; + +void three_interpolate_forward_impl(int b, int c, int m, int n, + const Tensor points, const Tensor idx, + const Tensor weight, Tensor out); + +void three_interpolate_backward_impl(int b, int c, int n, int m, + const Tensor grad_out, const Tensor idx, + const Tensor weight, Tensor grad_points); +REGISTER_DEVICE_IMPL(three_interpolate_forward_impl, CUDA, + three_interpolate_forward_cuda); +REGISTER_DEVICE_IMPL(three_interpolate_backward_impl, CUDA, + three_interpolate_backward_cuda); + +void ThreeNNForwardCUDAKernelLauncher(int b, int n, int m, const Tensor unknown, + const Tensor known, Tensor dist2, + Tensor idx); + +void three_nn_forward_cuda(int b, int n, int m, const Tensor unknown, + const Tensor known, Tensor dist2, Tensor idx) { + ThreeNNForwardCUDAKernelLauncher(b, n, m, unknown, known, dist2, idx); +}; + +void 
three_nn_forward_impl(int b, int n, int m, const Tensor unknown, + const Tensor known, Tensor dist2, Tensor idx); +REGISTER_DEVICE_IMPL(three_nn_forward_impl, CUDA, three_nn_forward_cuda); + +void TINShiftForwardCUDAKernelLauncher(Tensor input, Tensor shift, + Tensor output); + +void TINShiftBackwardCUDAKernelLauncher(Tensor grad_output, Tensor shift, + Tensor grad_input); + +void tin_shift_forward_cuda(Tensor input, Tensor shift, Tensor output) { + TINShiftForwardCUDAKernelLauncher(input, shift, output); +} + +void tin_shift_backward_cuda(Tensor grad_output, Tensor shift, + Tensor grad_input) { + TINShiftBackwardCUDAKernelLauncher(grad_output, shift, grad_input); +} + +void tin_shift_forward_impl(Tensor input, Tensor shift, Tensor output); +void tin_shift_backward_impl(Tensor grad_output, Tensor shift, + Tensor grad_input); +REGISTER_DEVICE_IMPL(tin_shift_forward_impl, CUDA, tin_shift_forward_cuda); +REGISTER_DEVICE_IMPL(tin_shift_backward_impl, CUDA, tin_shift_backward_cuda); + +torch::Tensor upfirdn2d_op(const torch::Tensor& input, + const torch::Tensor& kernel, int up_x, int up_y, + int down_x, int down_y, int pad_x0, int pad_x1, + int pad_y0, int pad_y1); + +torch::Tensor upfirdn2d_op_impl(const torch::Tensor& input, + const torch::Tensor& kernel, int up_x, int up_y, + int down_x, int down_y, int pad_x0, int pad_x1, + int pad_y0, int pad_y1); +REGISTER_DEVICE_IMPL(upfirdn2d_op_impl, CUDA, upfirdn2d_op); + +int HardVoxelizeForwardCUDAKernelLauncher( + const at::Tensor& points, at::Tensor& voxels, at::Tensor& coors, + at::Tensor& num_points_per_voxel, const std::vector voxel_size, + const std::vector coors_range, const int max_points, + const int max_voxels, const int NDim = 3); + +int NondeterministicHardVoxelizeForwardCUDAKernelLauncher( + const at::Tensor& points, at::Tensor& voxels, at::Tensor& coors, + at::Tensor& num_points_per_voxel, const std::vector voxel_size, + const std::vector coors_range, const int max_points, + const int max_voxels, const int NDim 
= 3); + +void DynamicVoxelizeForwardCUDAKernelLauncher( + const at::Tensor& points, at::Tensor& coors, + const std::vector voxel_size, const std::vector coors_range, + const int NDim = 3); + +int hard_voxelize_forward_cuda(const at::Tensor& points, at::Tensor& voxels, + at::Tensor& coors, + at::Tensor& num_points_per_voxel, + const std::vector voxel_size, + const std::vector coors_range, + const int max_points, const int max_voxels, + const int NDim) { + return HardVoxelizeForwardCUDAKernelLauncher( + points, voxels, coors, num_points_per_voxel, voxel_size, coors_range, + max_points, max_voxels, NDim); +}; + +int nondeterministic_hard_voxelize_forward_cuda( + const at::Tensor& points, at::Tensor& voxels, at::Tensor& coors, + at::Tensor& num_points_per_voxel, const std::vector voxel_size, + const std::vector coors_range, const int max_points, + const int max_voxels, const int NDim) { + return NondeterministicHardVoxelizeForwardCUDAKernelLauncher( + points, voxels, coors, num_points_per_voxel, voxel_size, coors_range, + max_points, max_voxels, NDim); +}; + +void dynamic_voxelize_forward_cuda(const at::Tensor& points, at::Tensor& coors, + const std::vector voxel_size, + const std::vector coors_range, + const int NDim) { + DynamicVoxelizeForwardCUDAKernelLauncher(points, coors, voxel_size, + coors_range, NDim); +}; + +int hard_voxelize_forward_impl(const at::Tensor& points, at::Tensor& voxels, + at::Tensor& coors, + at::Tensor& num_points_per_voxel, + const std::vector voxel_size, + const std::vector coors_range, + const int max_points, const int max_voxels, + const int NDim); + +int nondeterministic_hard_voxelize_forward_impl( + const at::Tensor& points, at::Tensor& voxels, at::Tensor& coors, + at::Tensor& num_points_per_voxel, const std::vector voxel_size, + const std::vector coors_range, const int max_points, + const int max_voxels, const int NDim); + +void dynamic_voxelize_forward_impl(const at::Tensor& points, at::Tensor& coors, + const std::vector voxel_size, + 
const std::vector coors_range, + const int NDim); + +REGISTER_DEVICE_IMPL(hard_voxelize_forward_impl, CUDA, + hard_voxelize_forward_cuda); +REGISTER_DEVICE_IMPL(nondeterministic_hard_voxelize_forward_impl, CUDA, + nondeterministic_hard_voxelize_forward_cuda); +REGISTER_DEVICE_IMPL(dynamic_voxelize_forward_impl, CUDA, + dynamic_voxelize_forward_cuda); + +void RotatedFeatureAlignForwardCUDAKernelLauncher(const Tensor features, + const Tensor best_bboxes, + const float spatial_scale, + const int points, + Tensor output); + +void RotatedFeatureAlignBackwardCUDAKernelLauncher(const Tensor top_grad, + const Tensor best_bboxes, + const float spatial_scale, + const int points, + Tensor bottom_grad); + +void rotated_feature_align_forward_cuda(const Tensor features, + const Tensor best_bboxes, + const float spatial_scale, + const int points, Tensor output) { + RotatedFeatureAlignForwardCUDAKernelLauncher(features, best_bboxes, + spatial_scale, points, output); +}; + +void rotated_feature_align_backward_cuda(const Tensor top_grad, + const Tensor best_bboxes, + const float spatial_scale, + const int points, Tensor bottom_grad) { + RotatedFeatureAlignBackwardCUDAKernelLauncher( + top_grad, best_bboxes, spatial_scale, points, bottom_grad); +}; + +void rotated_feature_align_forward_impl(const Tensor features, + const Tensor best_bboxes, + const float spatial_scale, + const int points, Tensor output); + +void rotated_feature_align_backward_impl(const Tensor top_grad, + const Tensor best_bboxes, + const float spatial_scale, + const int points, Tensor bottom_grad); + +REGISTER_DEVICE_IMPL(rotated_feature_align_forward_impl, CUDA, + rotated_feature_align_forward_cuda); +REGISTER_DEVICE_IMPL(rotated_feature_align_backward_impl, CUDA, + rotated_feature_align_backward_cuda); + +void PointsInPolygonsForwardCUDAKernelLauncher(const at::Tensor points, + const at::Tensor polygons, + const int rows, const int cols, + at::Tensor output); + +void points_in_polygons_forward_cuda(const Tensor 
points, const Tensor polygons, + Tensor output, const int rows, + const int cols) { + PointsInPolygonsForwardCUDAKernelLauncher(points, polygons, rows, cols, + output); +}; + +void points_in_polygons_forward_impl(const Tensor points, const Tensor polygons, + Tensor output, const int rows, + const int cols); + +REGISTER_DEVICE_IMPL(points_in_polygons_forward_impl, CUDA, + points_in_polygons_forward_cuda); + +// torch::Tensor IndiceMaxpoolForwardCUDAKernelLauncher(torch::Tensor features, +// torch::Tensor indicePairs, +// torch::Tensor indiceNum, +// int64_t numAct); + +// torch::Tensor indice_maxpool_forward_cuda(torch::Tensor features, +// torch::Tensor indicePairs, +// torch::Tensor indiceNum, +// int64_t numAct) { +// return IndiceMaxpoolForwardCUDAKernelLauncher(features, indicePairs, +// indiceNum, numAct); +// }; + +// torch::Tensor indice_maxpool_forward_impl(torch::Tensor features, +// torch::Tensor indicePairs, +// torch::Tensor indiceNum, +// int64_t numAct); +// REGISTER_DEVICE_IMPL(indice_maxpool_forward_impl, CUDA, +// indice_maxpool_forward_cuda); + +// torch::Tensor IndiceMaxpoolBackwardCUDAKernelLauncher(torch::Tensor features, +// torch::Tensor outFeatures, +// torch::Tensor outGrad, +// torch::Tensor indicePairs, +// torch::Tensor indiceNum); + +// torch::Tensor indice_maxpool_backward_cuda(torch::Tensor features, +// torch::Tensor outFeatures, +// torch::Tensor outGrad, +// torch::Tensor indicePairs, +// torch::Tensor indiceNum) { +// return IndiceMaxpoolBackwardCUDAKernelLauncher(features, outFeatures, outGrad, +// indicePairs, indiceNum); +// }; + +// torch::Tensor indice_maxpool_backward_impl(torch::Tensor features, +// torch::Tensor outFeatures, +// torch::Tensor outGrad, +// torch::Tensor indicePairs, +// torch::Tensor indiceNum); + +// REGISTER_DEVICE_IMPL(indice_maxpool_backward_impl, CUDA, +// indice_maxpool_backward_cuda) + +// torch::Tensor IndiceConvForwardCUDAKernelLauncher( +// torch::Tensor features, torch::Tensor filters, 
torch::Tensor indicePairs, +// torch::Tensor indiceNum, int64_t numActOut, int64_t _inverse, +// int64_t _subM); + +// torch::Tensor indice_conv_forward_cuda(torch::Tensor features, +// torch::Tensor filters, +// torch::Tensor indicePairs, +// torch::Tensor indiceNum, +// int64_t numActOut, int64_t _inverse, +// int64_t _subM) { +// return IndiceConvForwardCUDAKernelLauncher( +// features, filters, indicePairs, indiceNum, numActOut, _inverse, _subM); +// }; + +// torch::Tensor indice_conv_forward_impl(torch::Tensor features, +// torch::Tensor filters, +// torch::Tensor indicePairs, +// torch::Tensor indiceNum, +// int64_t numActOut, int64_t _inverse, +// int64_t _subM); + +// REGISTER_DEVICE_IMPL(indice_conv_forward_impl, CUDA, indice_conv_forward_cuda); + +// std::vector IndiceConvBackwardCUDAKernelLauncher( +// torch::Tensor features, torch::Tensor filters, torch::Tensor outGrad, +// torch::Tensor indicePairs, torch::Tensor indiceNum, int64_t _inverse, +// int64_t _subM); + +// std::vector indice_conv_backward_cuda( +// torch::Tensor features, torch::Tensor filters, torch::Tensor outGrad, +// torch::Tensor indicePairs, torch::Tensor indiceNum, int64_t _inverse, +// int64_t _subM) { +// return IndiceConvBackwardCUDAKernelLauncher( +// features, filters, outGrad, indicePairs, indiceNum, _inverse, _subM); +// }; + +// std::vector indice_conv_backward_impl( +// torch::Tensor features, torch::Tensor filters, torch::Tensor outGrad, +// torch::Tensor indicePairs, torch::Tensor indiceNum, int64_t _inverse, +// int64_t _subM); + +// REGISTER_DEVICE_IMPL(indice_conv_backward_impl, CUDA, +// indice_conv_backward_cuda); + +// torch::Tensor FusedIndiceConvBatchnormCUDAKernelLauncher( +// torch::Tensor features, torch::Tensor filters, torch::Tensor bias, +// torch::Tensor indicePairs, torch::Tensor indiceNum, int64_t numActOut, +// int64_t _inverse, int64_t _subM); + +// torch::Tensor fused_indice_conv_batchnorm_forward_cuda( +// torch::Tensor features, torch::Tensor filters, 
torch::Tensor bias, +// torch::Tensor indicePairs, torch::Tensor indiceNum, int64_t numActOut, +// int64_t _inverse, int64_t _subM) { +// return FusedIndiceConvBatchnormCUDAKernelLauncher(features, filters, bias, +// indicePairs, indiceNum, +// numActOut, _inverse, _subM); +// }; + +// torch::Tensor fused_indice_conv_batchnorm_forward_impl( +// torch::Tensor features, torch::Tensor filters, torch::Tensor bias, +// torch::Tensor indicePairs, torch::Tensor indiceNum, int64_t numActOut, +// int64_t _inverse, int64_t _subM); + +// REGISTER_DEVICE_IMPL(fused_indice_conv_batchnorm_forward_impl, CUDA, +// fused_indice_conv_batchnorm_forward_cuda) + +void MinAreaPolygonsCUDAKernelLauncher(const Tensor pointsets, Tensor polygons); + +void min_area_polygons_cuda(const Tensor pointsets, Tensor polygons) { + MinAreaPolygonsCUDAKernelLauncher(pointsets, polygons); +} + +void min_area_polygons_impl(const Tensor pointsets, Tensor polygons); + +REGISTER_DEVICE_IMPL(min_area_polygons_impl, CUDA, min_area_polygons_cuda); + +void ActiveRotatedFilterForwardCUDAKernelLauncher(const Tensor input, + const Tensor indices, + Tensor output); + +void ActiveRotatedFilterBackwardCUDAKernelLauncher(const Tensor grad_out, + const Tensor indices, + Tensor grad_in); + +void active_rotated_filter_forward_cuda(const Tensor input, + const Tensor indices, Tensor output) { + ActiveRotatedFilterForwardCUDAKernelLauncher(input, indices, output); +}; + +void active_rotated_filter_backward_cuda(const Tensor grad_out, + const Tensor indices, Tensor grad_in) { + ActiveRotatedFilterBackwardCUDAKernelLauncher(grad_out, indices, grad_in); +}; + +void active_rotated_filter_forward_impl(const Tensor input, + const Tensor indices, Tensor output); + +void active_rotated_filter_backward_impl(const Tensor grad_out, + const Tensor indices, Tensor grad_in); + +REGISTER_DEVICE_IMPL(active_rotated_filter_forward_impl, CUDA, + active_rotated_filter_forward_cuda); +REGISTER_DEVICE_IMPL(active_rotated_filter_backward_impl, 
CUDA, + active_rotated_filter_backward_cuda); + +void ConvexIoUCUDAKernelLauncher(const Tensor pointsets, const Tensor polygons, + Tensor ious); + +void ConvexGIoUCUDAKernelLauncher(const Tensor pointsets, const Tensor polygons, + Tensor output); + +void convex_iou_cuda(const Tensor pointsets, const Tensor polygons, + Tensor ious) { + ConvexIoUCUDAKernelLauncher(pointsets, polygons, ious); +} + +void convex_giou_cuda(const Tensor pointsets, const Tensor polygons, + Tensor output) { + ConvexGIoUCUDAKernelLauncher(pointsets, polygons, output); +} + +void convex_iou_impl(const Tensor pointsets, const Tensor polygons, + Tensor ious); + +void convex_giou_impl(const Tensor pointsets, const Tensor polygons, + Tensor output); + +REGISTER_DEVICE_IMPL(convex_iou_impl, CUDA, convex_iou_cuda); +REGISTER_DEVICE_IMPL(convex_giou_impl, CUDA, convex_giou_cuda); + +Tensor DiffIoURotatedSortVerticesCUDAKernelLauncher(Tensor vertices, + Tensor mask, + Tensor num_valid); + +Tensor diff_iou_rotated_sort_vertices_forward_cuda(Tensor vertices, Tensor mask, + Tensor num_valid) { + return DiffIoURotatedSortVerticesCUDAKernelLauncher(vertices, mask, + num_valid); +} + +Tensor diff_iou_rotated_sort_vertices_forward_impl(Tensor vertices, Tensor mask, + Tensor num_valid); + +REGISTER_DEVICE_IMPL(diff_iou_rotated_sort_vertices_forward_impl, CUDA, + diff_iou_rotated_sort_vertices_forward_cuda); + +void ChamferDistanceForwardCUDAKernelLauncher( + const Tensor xyz1, const Tensor xyz2, const Tensor dist1, + const Tensor dist2, const Tensor idx1, const Tensor idx2); + +void ChamferDistanceBackwardCUDAKernelLauncher( + const Tensor xyz1, const Tensor xyz2, Tensor idx1, Tensor idx2, + Tensor grad_dist1, Tensor grad_dist2, Tensor grad_xyz1, Tensor grad_xyz2); + +void chamfer_distance_forward_cuda(const Tensor xyz1, const Tensor xyz2, + const Tensor dist1, const Tensor dist2, + const Tensor idx1, const Tensor idx2) { + ChamferDistanceForwardCUDAKernelLauncher(xyz1, xyz2, dist1, dist2, idx1, + idx2); +}; 
+ +void chamfer_distance_backward_cuda(const Tensor xyz1, const Tensor xyz2, + Tensor idx1, Tensor idx2, Tensor graddist1, + Tensor graddist2, Tensor gradxyz1, + Tensor gradxyz2) { + ChamferDistanceBackwardCUDAKernelLauncher(xyz1, xyz2, idx1, idx2, graddist1, + graddist2, gradxyz1, gradxyz2); +}; + +void chamfer_distance_forward_impl(const Tensor xyz1, const Tensor xyz2, + const Tensor dist1, const Tensor dist2, + const Tensor idx1, const Tensor idx2); + +void chamfer_distance_backward_impl(const Tensor xyz1, const Tensor xyz2, + Tensor idx1, Tensor idx2, Tensor graddist1, + Tensor graddist2, Tensor gradxyz1, + Tensor gradxyz2); + +REGISTER_DEVICE_IMPL(chamfer_distance_forward_impl, CUDA, + chamfer_distance_forward_cuda); +REGISTER_DEVICE_IMPL(chamfer_distance_backward_impl, CUDA, + chamfer_distance_backward_cuda); + +void PrROIPoolForwardCUDAKernelLauncher(Tensor input, Tensor rois, + Tensor output, int pooled_height, + int pooled_width, float spatial_scale); + +void PrROIPoolBackwardCUDAKernelLauncher(Tensor grad_output, Tensor rois, + Tensor grad_input, int pooled_height, + int pooled_width, float spatial_scale); + +void PrROIPoolCoorBackwardCUDAKernelLauncher( + Tensor output, Tensor grad_output, Tensor input, Tensor rois, + Tensor grad_rois, int pooled_height, int pooled_width, float spatial_scale); + +void prroi_pool_forward_cuda(Tensor input, Tensor rois, Tensor output, + int pooled_height, int pooled_width, + float spatial_scale) { + PrROIPoolForwardCUDAKernelLauncher(input, rois, output, pooled_height, + pooled_width, spatial_scale); +} + +void prroi_pool_backward_cuda(Tensor grad_output, Tensor rois, + Tensor grad_input, int pooled_height, + int pooled_width, float spatial_scale) { + PrROIPoolBackwardCUDAKernelLauncher(grad_output, rois, grad_input, + pooled_height, pooled_width, + spatial_scale); +} + +void prroi_pool_coor_backward_cuda(Tensor output, Tensor grad_output, + Tensor input, Tensor rois, Tensor grad_rois, + int pooled_height, int 
pooled_width, + float spatial_scale) { + PrROIPoolCoorBackwardCUDAKernelLauncher(output, grad_output, input, rois, + grad_rois, pooled_height, + pooled_width, spatial_scale); +} + +void prroi_pool_forward_impl(Tensor input, Tensor rois, Tensor output, + int pooled_height, int pooled_width, + float spatial_scale); +void prroi_pool_backward_impl(Tensor grad_output, Tensor rois, + Tensor grad_input, int pooled_height, + int pooled_width, float spatial_scale); +void prroi_pool_coor_backward_impl(Tensor output, Tensor grad_output, + Tensor input, Tensor rois, Tensor grad_rois, + int pooled_height, int pooled_width, + float spatial_scale); +REGISTER_DEVICE_IMPL(prroi_pool_forward_impl, CUDA, prroi_pool_forward_cuda); +REGISTER_DEVICE_IMPL(prroi_pool_backward_impl, CUDA, prroi_pool_backward_cuda); +REGISTER_DEVICE_IMPL(prroi_pool_coor_backward_impl, CUDA, + prroi_pool_coor_backward_cuda); + +void BezierAlignForwardCUDAKernelLauncher(Tensor input, Tensor rois, + Tensor output, int aligned_height, + int aligned_width, + float spatial_scale, + int sampling_ratio, bool aligned); + +void BezierAlignBackwardCUDAKernelLauncher( + Tensor grad_output, Tensor rois, Tensor grad_input, int aligned_height, + int aligned_width, float spatial_scale, int sampling_ratio, bool aligned); + +void bezier_align_forward_impl(Tensor input, Tensor rois, Tensor output, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + bool aligned); + +void bezier_align_backward_impl(Tensor grad_output, Tensor rois, + Tensor grad_input, int aligned_height, + int aligned_width, float spatial_scale, + int sampling_ratio, bool aligned); + +REGISTER_DEVICE_IMPL(bezier_align_forward_impl, CUDA, + BezierAlignForwardCUDAKernelLauncher); +REGISTER_DEVICE_IMPL(bezier_align_backward_impl, CUDA, + BezierAlignBackwardCUDAKernelLauncher); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/deform_conv_cuda.cu 
b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/deform_conv_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..05fc08b70be937411ed04c0dc80c40f5479c0d9e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/deform_conv_cuda.cu @@ -0,0 +1,105 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include "deform_conv_cuda_kernel.cuh" +#include "pytorch_cuda_helper.hpp" + +void deformable_im2col_cuda(Tensor data_im, Tensor data_offset, + const int channels, const int height, + const int width, const int ksize_h, + const int ksize_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, + const int parallel_imgs, const int deformable_group, + Tensor data_col) { + // num_axes should be smaller than block size + // todo: check parallel_imgs is correctly passed in + int height_col = + (height + 2 * pad_h - (dilation_h * (ksize_h - 1) + 1)) / stride_h + 1; + int width_col = + (width + 2 * pad_w - (dilation_w * (ksize_w - 1) + 1)) / stride_w + 1; + int num_kernels = channels * height_col * width_col * parallel_imgs; + int channel_per_deformable_group = channels / deformable_group; + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + data_im.scalar_type(), "deformable_im2col_gpu", ([&] { + const scalar_t *data_im_ = data_im.data_ptr(); + const scalar_t *data_offset_ = data_offset.data_ptr(); + scalar_t *data_col_ = data_col.data_ptr(); + + deformable_im2col_gpu_kernel<<>>( + num_kernels, data_im_, data_offset_, height, width, ksize_h, + ksize_w, pad_h, pad_w, stride_h, stride_w, dilation_h, dilation_w, + channel_per_deformable_group, parallel_imgs, channels, + deformable_group, height_col, width_col, data_col_); + })); + AT_CUDA_CHECK(cudaGetLastError()); +} + +void deformable_col2im_cuda(Tensor data_col, Tensor data_offset, + const int channels, const int height, + const int width, const int ksize_h, + const int ksize_w, const int pad_h, const int pad_w, + 
const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, + const int parallel_imgs, const int deformable_group, + Tensor grad_im) { + // todo: make sure parallel_imgs is passed in correctly + int height_col = + (height + 2 * pad_h - (dilation_h * (ksize_h - 1) + 1)) / stride_h + 1; + int width_col = + (width + 2 * pad_w - (dilation_w * (ksize_w - 1) + 1)) / stride_w + 1; + int num_kernels = + channels * ksize_h * ksize_w * height_col * width_col * parallel_imgs; + int channel_per_deformable_group = channels / deformable_group; + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + data_col.scalar_type(), "deformable_col2im_gpu", ([&] { + const scalar_t *data_col_ = data_col.data_ptr(); + const scalar_t *data_offset_ = data_offset.data_ptr(); + scalar_t *grad_im_ = grad_im.data_ptr(); + + deformable_col2im_gpu_kernel<<>>( + num_kernels, data_col_, data_offset_, channels, height, width, + ksize_h, ksize_w, pad_h, pad_w, stride_h, stride_w, dilation_h, + dilation_w, channel_per_deformable_group, parallel_imgs, + deformable_group, height_col, width_col, grad_im_); + })); + AT_CUDA_CHECK(cudaGetLastError()); +} + +void deformable_col2im_coord_cuda( + Tensor data_col, Tensor data_im, Tensor data_offset, const int channels, + const int height, const int width, const int ksize_h, const int ksize_w, + const int pad_h, const int pad_w, const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, const int parallel_imgs, + const int deformable_group, Tensor grad_offset) { + int height_col = + (height + 2 * pad_h - (dilation_h * (ksize_h - 1) + 1)) / stride_h + 1; + int width_col = + (width + 2 * pad_w - (dilation_w * (ksize_w - 1) + 1)) / stride_w + 1; + int num_kernels = height_col * width_col * 2 * ksize_h * ksize_w * + deformable_group * parallel_imgs; + int channel_per_deformable_group = + channels * ksize_h * ksize_w / deformable_group; + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + data_col.scalar_type(), 
"deformable_col2im_coord_gpu", ([&] { + const scalar_t *data_col_ = data_col.data_ptr(); + const scalar_t *data_im_ = data_im.data_ptr(); + const scalar_t *data_offset_ = data_offset.data_ptr(); + scalar_t *grad_offset_ = grad_offset.data_ptr(); + + deformable_col2im_coord_gpu_kernel<<< + GET_BLOCKS(num_kernels), THREADS_PER_BLOCK, 0, + at::cuda::getCurrentCUDAStream()>>>( + num_kernels, data_col_, data_im_, data_offset_, channels, height, + width, ksize_h, ksize_w, pad_h, pad_w, stride_h, stride_w, + dilation_h, dilation_w, channel_per_deformable_group, parallel_imgs, + 2 * ksize_h * ksize_w * deformable_group, deformable_group, + height_col, width_col, grad_offset_); + })); + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/deform_roi_pool_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/deform_roi_pool_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..d44399829e99f725e2c24418723ea14685819858 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/deform_roi_pool_cuda.cu @@ -0,0 +1,55 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "deform_roi_pool_cuda_kernel.cuh" +#include "pytorch_cuda_helper.hpp" + +void DeformRoIPoolForwardCUDAKernelLauncher(Tensor input, Tensor rois, + Tensor offset, Tensor output, + int pooled_height, int pooled_width, + float spatial_scale, + int sampling_ratio, float gamma) { + int output_size = output.numel(); + int channels = input.size(1); + int height = input.size(2); + int width = input.size(3); + + at::cuda::CUDAGuard device_guard(input.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + input.scalar_type(), "deform_roi_pool_forward_cuda_kernel", [&] { + deform_roi_pool_forward_cuda_kernel + <<>>( + output_size, input.data_ptr(), + rois.data_ptr(), offset.data_ptr(), + output.data_ptr(), pooled_height, pooled_width, + static_cast(spatial_scale), sampling_ratio, + static_cast(gamma), channels, height, width); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} + +void DeformRoIPoolBackwardCUDAKernelLauncher( + Tensor grad_output, Tensor input, Tensor rois, Tensor offset, + Tensor grad_input, Tensor grad_offset, int pooled_height, int pooled_width, + float spatial_scale, int sampling_ratio, float gamma) { + int output_size = grad_output.numel(); + int channels = grad_input.size(1); + int height = grad_input.size(2); + int width = grad_input.size(3); + + at::cuda::CUDAGuard device_guard(grad_output.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + grad_output.scalar_type(), "deform_roi_pool_backward_cuda_kernel", [&] { + deform_roi_pool_backward_cuda_kernel + <<>>( + output_size, grad_output.data_ptr(), + input.data_ptr(), rois.data_ptr(), + offset.data_ptr(), grad_input.data_ptr(), + grad_offset.data_ptr(), pooled_height, pooled_width, + static_cast(spatial_scale), sampling_ratio, + static_cast(gamma), channels, height, width); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git 
a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/diff_iou_rotated_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/diff_iou_rotated_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..62dbf5da357ac8f2178e53d21fd8f9d3339eca81 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/diff_iou_rotated_cuda.cu @@ -0,0 +1,35 @@ +// Copyright (c) OpenMMLab. All rights reserved +// Adapted from +// https://github.com/lilanxiao/Rotated_IoU/cuda_op/sort_vert_kernel.cu # noqa +#include "diff_iou_rotated_cuda_kernel.cuh" +#include "pytorch_cpp_helper.hpp" +#include "pytorch_cuda_helper.hpp" + +at::Tensor DiffIoURotatedSortVerticesCUDAKernelLauncher(at::Tensor vertices, + at::Tensor mask, + at::Tensor num_valid) { + at::cuda::CUDAGuard device_guard(vertices.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + CHECK_CONTIGUOUS(vertices); + CHECK_CONTIGUOUS(mask); + CHECK_CONTIGUOUS(num_valid); + CHECK_CUDA(vertices); + CHECK_CUDA(mask); + CHECK_CUDA(num_valid); + + int b = vertices.size(0); + int n = vertices.size(1); + int m = vertices.size(2); + at::Tensor idx = + torch::zeros({b, n, MAX_NUM_VERT_IDX}, + at::device(vertices.device()).dtype(at::ScalarType::Int)); + + diff_iou_rotated_sort_vertices_forward_cuda_kernel<<>>( + b, n, m, vertices.data_ptr(), mask.data_ptr(), + num_valid.data_ptr(), idx.data_ptr()); + AT_CUDA_CHECK(cudaGetLastError()); + + return idx; +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/focal_loss_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/focal_loss_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..1d5c433379f58539df361702b1f8d196262c9799 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/focal_loss_cuda.cu @@ -0,0 +1,111 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "pytorch_cuda_helper.hpp" +#include "sigmoid_focal_loss_cuda_kernel.cuh" +#include "softmax_focal_loss_cuda_kernel.cuh" + +void SigmoidFocalLossForwardCUDAKernelLauncher(Tensor input, Tensor target, + Tensor weight, Tensor output, + const float gamma, + const float alpha) { + int output_size = output.numel(); + int num_classes = input.size(1); + AT_ASSERTM(target.max().item() <= (int32_t)num_classes, + "target label should smaller or equal than num classes"); + at::cuda::CUDAGuard device_guard(input.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + input.scalar_type(), "sigmoid_focal_loss_forward_cuda_kernel", [&] { + sigmoid_focal_loss_forward_cuda_kernel + <<>>( + output_size, input.data_ptr(), + target.data_ptr(), weight.data_ptr(), + output.data_ptr(), gamma, alpha, num_classes); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} + +void SigmoidFocalLossBackwardCUDAKernelLauncher(Tensor input, Tensor target, + Tensor weight, + Tensor grad_input, + const float gamma, + const float alpha) { + int output_size = grad_input.numel(); + int num_classes = input.size(1); + + at::cuda::CUDAGuard device_guard(grad_input.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + input.scalar_type(), "sigmoid_focal_loss_backward_cuda_kernel", [&] { + sigmoid_focal_loss_backward_cuda_kernel + <<>>( + output_size, input.data_ptr(), + target.data_ptr(), weight.data_ptr(), + grad_input.data_ptr(), gamma, alpha, num_classes); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} + +void SoftmaxFocalLossForwardCUDAKernelLauncher(Tensor softmax, Tensor target, + Tensor weight, Tensor output, + const float gamma, + const float alpha) { + int output_size = output.numel(); + int num_classes = softmax.size(1); + + AT_ASSERTM(target.max().item() <= (int32_t)num_classes, + "target label should smaller or equal than num classes"); + at::cuda::CUDAGuard 
device_guard(softmax.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + softmax.scalar_type(), "softmax_focal_loss_forward_cuda_kernel", [&] { + softmax_focal_loss_forward_cuda_kernel + <<>>( + output_size, softmax.data_ptr(), + target.data_ptr(), weight.data_ptr(), + output.data_ptr(), gamma, alpha, num_classes); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} + +void SoftmaxFocalLossBackwardCUDAKernelLauncher(Tensor softmax, Tensor target, + Tensor weight, Tensor buff, + Tensor grad_input, + const float gamma, + const float alpha) { + int num_classes = softmax.size(1); + + int output_size = buff.numel(); + at::cuda::CUDAGuard device_guard(grad_input.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + grad_input.scalar_type(), + "softmax_focal_loss_backward_cuda1_" + "kernel", + [&] { + softmax_focal_loss_backward_cuda1_kernel + <<>>( + output_size, softmax.data_ptr(), + target.data_ptr(), weight.data_ptr(), + buff.data_ptr(), gamma, alpha, num_classes); + }); + + AT_CUDA_CHECK(cudaGetLastError()); + + output_size = grad_input.numel(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + grad_input.scalar_type(), + "softmax_focal_loss_backward_cuda2_" + "kernel", + [&] { + softmax_focal_loss_backward_cuda2_kernel + <<>>( + output_size, softmax.data_ptr(), + target.data_ptr(), buff.data_ptr(), + grad_input.data_ptr(), num_classes); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/furthest_point_sample_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/furthest_point_sample_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..6b1392733aeffef5cdf6895d25c5fc3a967ac600 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/furthest_point_sample_cuda.cu @@ -0,0 +1,146 @@ +// Modified from +// 
https://github.com/sshaoshuai/Pointnet2.PyTorch/tree/master/pointnet2/src/sampling_gpu.cu + +#include +#include + +#include "furthest_point_sample_cuda_kernel.cuh" +#include "pytorch_cuda_helper.hpp" + +inline int opt_n_threads(int work_size) { +#if defined(__ILUVATAR__) + const int pow_2 = std::log(static_cast(work_size)) / std::log(2.0); +#else + const int pow_2 = std::log(static_cast(work_size)) / std::log(2.0); +#endif + return std::max(std::min(1 << pow_2, 1024), 1); +} + +void FurthestPointSamplingForwardCUDAKernelLauncher(int b, int n, int m, + const float* dataset, + float* temp, int* idxs) { + // dataset: (B, N, 3) + // tmp: (B, N) + // output: + // idx: (B, M) + + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + unsigned int n_threads = opt_n_threads(n); + + switch (n_threads) { + case 1024: + furthest_point_sampling_forward_cuda_kernel<1024> + <<>>(b, n, m, dataset, temp, idxs); + break; + case 512: + furthest_point_sampling_forward_cuda_kernel<512> + <<>>(b, n, m, dataset, temp, idxs); + break; + case 256: + furthest_point_sampling_forward_cuda_kernel<256> + <<>>(b, n, m, dataset, temp, idxs); + break; + case 128: + furthest_point_sampling_forward_cuda_kernel<128> + <<>>(b, n, m, dataset, temp, idxs); + break; + case 64: + furthest_point_sampling_forward_cuda_kernel<64> + <<>>(b, n, m, dataset, temp, idxs); + break; + case 32: + furthest_point_sampling_forward_cuda_kernel<32> + <<>>(b, n, m, dataset, temp, idxs); + break; + case 16: + furthest_point_sampling_forward_cuda_kernel<16> + <<>>(b, n, m, dataset, temp, idxs); + break; + case 8: + furthest_point_sampling_forward_cuda_kernel<8> + <<>>(b, n, m, dataset, temp, idxs); + break; + case 4: + furthest_point_sampling_forward_cuda_kernel<4> + <<>>(b, n, m, dataset, temp, idxs); + break; + case 2: + furthest_point_sampling_forward_cuda_kernel<2> + <<>>(b, n, m, dataset, temp, idxs); + break; + case 1: + furthest_point_sampling_forward_cuda_kernel<1> + <<>>(b, n, m, dataset, temp, idxs); + 
break; + default: + furthest_point_sampling_forward_cuda_kernel<512> + <<>>(b, n, m, dataset, temp, idxs); + } + + AT_CUDA_CHECK(cudaGetLastError()); +} + +void FurthestPointSamplingWithDistForwardCUDAKernelLauncher( + int b, int n, int m, const float* dataset, float* temp, int* idxs) { + // dataset: (B, N, N) + // temp: (B, N) + // output: + // idx: (B, M) + + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + unsigned int n_threads = opt_n_threads(n); + + switch (n_threads) { + case 1024: + furthest_point_sampling_with_dist_forward_cuda_kernel<1024> + <<>>(b, n, m, dataset, temp, idxs); + break; + case 512: + furthest_point_sampling_with_dist_forward_cuda_kernel<512> + <<>>(b, n, m, dataset, temp, idxs); + break; + case 256: + furthest_point_sampling_with_dist_forward_cuda_kernel<256> + <<>>(b, n, m, dataset, temp, idxs); + break; + case 128: + furthest_point_sampling_with_dist_forward_cuda_kernel<128> + <<>>(b, n, m, dataset, temp, idxs); + break; + case 64: + furthest_point_sampling_with_dist_forward_cuda_kernel<64> + <<>>(b, n, m, dataset, temp, idxs); + break; + case 32: + furthest_point_sampling_with_dist_forward_cuda_kernel<32> + <<>>(b, n, m, dataset, temp, idxs); + break; + case 16: + furthest_point_sampling_with_dist_forward_cuda_kernel<16> + <<>>(b, n, m, dataset, temp, idxs); + break; + case 8: + furthest_point_sampling_with_dist_forward_cuda_kernel<8> + <<>>(b, n, m, dataset, temp, idxs); + break; + case 4: + furthest_point_sampling_with_dist_forward_cuda_kernel<4> + <<>>(b, n, m, dataset, temp, idxs); + break; + case 2: + furthest_point_sampling_with_dist_forward_cuda_kernel<2> + <<>>(b, n, m, dataset, temp, idxs); + break; + case 1: + furthest_point_sampling_with_dist_forward_cuda_kernel<1> + <<>>(b, n, m, dataset, temp, idxs); + break; + default: + furthest_point_sampling_with_dist_forward_cuda_kernel<512> + <<>>(b, n, m, dataset, temp, idxs); + } + + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git 
a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/fused_bias_leakyrelu_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/fused_bias_leakyrelu_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..911ea019aad65c8e51ca94c273cb5bbad70ae8db --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/fused_bias_leakyrelu_cuda.cu @@ -0,0 +1,109 @@ +// Modified from +// https://github.com/rosinality/stylegan2-pytorch/blob/master/op/fused_bias_act_kernel.cu +// Copyright (c) 2019, NVIDIA Corporation. All rights reserved. +// +// This work is made available under the Nvidia Source Code License-NC. +// To view a copy of this license, visit +// https://nvlabs.github.io/stylegan2/license.html + +#include +#include +#include +#include +#include +#include + +#include + +template +static __global__ void fused_bias_act_kernel( + scalar_t* out, const scalar_t* p_x, const scalar_t* p_b, + const scalar_t* p_ref, int act, int grad, scalar_t alpha, scalar_t scale, + int loop_x, int size_x, int step_b, int size_b, int use_bias, int use_ref) { + int xi = blockIdx.x * loop_x * blockDim.x + threadIdx.x; + + scalar_t zero = 0.0; + + for (int loop_idx = 0; loop_idx < loop_x && xi < size_x; + loop_idx++, xi += blockDim.x) { + scalar_t x = p_x[xi]; + + if (use_bias) { + x += p_b[(xi / step_b) % size_b]; + } + + scalar_t ref = use_ref ? p_ref[xi] : zero; + + scalar_t y; + + // act = 1: linear layer + // act = 3: leaky relu layer + // grad = 0: direct forward path + // grad = 1: first order deviation + // grad = 2: second order deviation + switch (act * 10 + grad) { + default: + case 10: + y = x; + break; + case 11: + y = x; + break; + case 12: + y = 0.0; + break; + + case 30: + y = (x > 0.0) ? x : x * alpha; + break; + case 31: + y = (ref > 0.0) ? 
x : x * alpha; + break; + case 32: + y = 0.0; + break; + } + + out[xi] = y * scale; + } +} + +torch::Tensor fused_bias_leakyrelu_op(const torch::Tensor& input, + const torch::Tensor& bias, + const torch::Tensor& refer, int act, + int grad, float alpha, float scale) { + int curDevice = -1; + cudaGetDevice(&curDevice); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(curDevice); + + auto x = input.contiguous(); + auto b = bias.contiguous(); + auto ref = refer.contiguous(); + + int use_bias = b.numel() ? 1 : 0; + int use_ref = ref.numel() ? 1 : 0; + + int size_x = x.numel(); + int size_b = b.numel(); + int step_b = 1; + + for (int i = 1 + 1; i < x.dim(); i++) { + step_b *= x.size(i); + } + + int loop_x = 4; + int block_size = 4 * 32; + int grid_size = (size_x - 1) / (loop_x * block_size) + 1; + + auto y = torch::empty_like(x); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + x.scalar_type(), "fused_bias_act_kernel", [&] { + fused_bias_act_kernel<<>>( + y.data_ptr(), x.data_ptr(), + b.data_ptr(), ref.data_ptr(), act, grad, alpha, + scale, loop_x, size_x, step_b, size_b, use_bias, use_ref); + }); + + return y; +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/gather_points_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/gather_points_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..fd0a7b5daf03510cfb7408ff82cfac760af92afb --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/gather_points_cuda.cu @@ -0,0 +1,58 @@ +#include +#include + +#include "gather_points_cuda_kernel.cuh" +#include "pytorch_cuda_helper.hpp" + +void GatherPointsForwardCUDAKernelLauncher(int b, int c, int n, int npoints, + const Tensor points, + const Tensor idx, Tensor out) { + // points: (B, C, N) + // idx: (B, npoints) + // output: + // out: (B, C, npoints) + + at::cuda::CUDAGuard device_guard(points.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + // blockIdx.x(col), blockIdx.y(row) 
+ dim3 blocks(GET_BLOCKS(npoints, THREADS_PER_BLOCK), c, b); + dim3 threads(THREADS_PER_BLOCK); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + points.scalar_type(), "gather_points_forward_cuda_kernel", [&] { + gather_points_forward_cuda_kernel + <<>>( + b, c, n, npoints, points.data_ptr(), + idx.data_ptr(), out.data_ptr()); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} + +void GatherPointsBackwardCUDAKernelLauncher(int b, int c, int n, int npoints, + const Tensor grad_out, + const Tensor idx, + Tensor grad_points) { + // grad_out: (B, C, npoints) + // idx: (B, npoints) + // output: + // grad_points: (B, C, N) + + at::cuda::CUDAGuard device_guard(grad_out.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + // blockIdx.x(col), blockIdx.y(row) + dim3 blocks(GET_BLOCKS(npoints, THREADS_PER_BLOCK), c, b); + dim3 threads(THREADS_PER_BLOCK); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + grad_out.scalar_type(), "gather_points_backward_cuda_kernel", [&] { + gather_points_backward_cuda_kernel + <<>>( + b, c, n, npoints, grad_out.data_ptr(), + idx.data_ptr(), grad_points.data_ptr()); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/group_points_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/group_points_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..42fc2bb67b13938b8994f1961ec2fbc41a30d2d8 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/group_points_cuda.cu @@ -0,0 +1,61 @@ +// Copyright (c) OpenMMLab. All rights reserved. 
+// Modified from +// https://github.com/sshaoshuai/Pointnet2.PyTorch/tree/master/pointnet2/src/group_points_gpu.cu +#include +#include + +#include "group_points_cuda_kernel.cuh" +#include "pytorch_cuda_helper.hpp" + +void GroupPointsForwardCUDAKernelLauncher(int b, int c, int n, int npoints, + int nsample, const Tensor points, + const Tensor idx, Tensor out) { + // points: (B, C, N) + // idx: (B, npoints, nsample) + // output: + // out: (B, C, npoints, nsample) + + at::cuda::CUDAGuard device_guard(points.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + // blockIdx.x(col), blockIdx.y(row) + dim3 blocks(GET_BLOCKS(npoints * nsample, THREADS_PER_BLOCK), c, b); + dim3 threads(THREADS_PER_BLOCK); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + points.scalar_type(), "group_points_forward_cuda_kernel", [&] { + group_points_forward_cuda_kernel + <<>>( + b, c, n, npoints, nsample, points.data_ptr(), + idx.data_ptr(), out.data_ptr()); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} + +void GroupPointsBackwardCUDAKernelLauncher(int b, int c, int n, int npoints, + int nsample, const Tensor grad_out, + const Tensor idx, + Tensor grad_points) { + // grad_out: (B, C, npoints, nsample) + // idx: (B, npoints, nsample) + // output: + // grad_points: (B, C, N) + + at::cuda::CUDAGuard device_guard(grad_out.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + // blockIdx.x(col), blockIdx.y(row) + dim3 blocks(GET_BLOCKS(npoints * nsample, THREADS_PER_BLOCK), c, b); + dim3 threads(THREADS_PER_BLOCK); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + grad_out.scalar_type(), "group_points_backward_cuda_kernel", [&] { + group_points_backward_cuda_kernel + <<>>( + b, c, n, npoints, nsample, grad_out.data_ptr(), + idx.data_ptr(), grad_points.data_ptr()); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/iou3d_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/iou3d_cuda.cu new 
file mode 100644 index 0000000000000000000000000000000000000000..bb1c5fc136d3e4952554dbc5cec29cd3ae5dd27b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/iou3d_cuda.cu @@ -0,0 +1,104 @@ +// Modified from +// https://github.com/open-mmlab/OpenPCDet/blob/master/pcdet/ops/iou3d_nms/src/iou3d_nms_kernel.cu + +/* +3D IoU Calculation and Rotated NMS(modified from 2D NMS written by others) +Written by Shaoshuai Shi +All Rights Reserved 2019-2020. +*/ + +#include + +#include "iou3d_cuda_kernel.cuh" +#include "nms_cuda_kernel.cuh" +#include "pytorch_cuda_helper.hpp" + +void IoU3DBoxesOverlapBevForwardCUDAKernelLauncher(const int num_a, + const Tensor boxes_a, + const int num_b, + const Tensor boxes_b, + Tensor ans_overlap) { + at::cuda::CUDAGuard device_guard(boxes_a.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + // blockIdx.x(col), blockIdx.y(row) + dim3 blocks(GET_BLOCKS(num_b, THREADS_PER_BLOCK_IOU3D), + GET_BLOCKS(num_a, THREADS_PER_BLOCK_IOU3D)); + dim3 threads(THREADS_PER_BLOCK_IOU3D, THREADS_PER_BLOCK_IOU3D); + + iou3d_boxes_overlap_bev_forward_cuda_kernel<<>>( + num_a, boxes_a.data_ptr(), num_b, boxes_b.data_ptr(), + ans_overlap.data_ptr()); + + AT_CUDA_CHECK(cudaGetLastError()); +} + +void IoU3DNMS3DForwardCUDAKernelLauncher(const Tensor boxes, Tensor& keep, + Tensor& keep_num, + float nms_overlap_thresh) { + using namespace at::indexing; + at::cuda::CUDAGuard device_guard(boxes.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + int boxes_num = boxes.size(0); + + const int col_blocks = + (boxes_num + THREADS_PER_BLOCK_NMS - 1) / THREADS_PER_BLOCK_NMS; + Tensor mask = + at::empty({boxes_num, col_blocks}, boxes.options().dtype(at::kLong)); + + dim3 blocks(GET_BLOCKS(boxes_num, THREADS_PER_BLOCK_NMS), + GET_BLOCKS(boxes_num, THREADS_PER_BLOCK_NMS)); + dim3 threads(THREADS_PER_BLOCK_NMS); + + iou3d_nms3d_forward_cuda_kernel<<>>( + boxes_num, nms_overlap_thresh, boxes.data_ptr(), + (unsigned 
long long*)mask.data_ptr()); + + at::Tensor keep_t = at::zeros( + {boxes_num}, boxes.options().dtype(at::kBool).device(at::kCUDA)); + gather_keep_from_mask<<<1, std::min(col_blocks, THREADS_PER_BLOCK), + col_blocks * sizeof(unsigned long long), stream>>>( + keep_t.data_ptr(), (unsigned long long*)mask.data_ptr(), + boxes_num); + + auto keep_data = keep_t.nonzero().index({Slice(), 0}); + keep_num.fill_(at::Scalar(keep_data.size(0))); + keep.index_put_({Slice(0, keep_data.size(0))}, keep_data); + AT_CUDA_CHECK(cudaGetLastError()); +} + +void IoU3DNMS3DNormalForwardCUDAKernelLauncher(const Tensor boxes, Tensor& keep, + Tensor& keep_num, + float nms_overlap_thresh) { + using namespace at::indexing; + at::cuda::CUDAGuard device_guard(boxes.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + int boxes_num = boxes.size(0); + + const int col_blocks = + (boxes_num + THREADS_PER_BLOCK_NMS - 1) / THREADS_PER_BLOCK_NMS; + Tensor mask = + at::empty({boxes_num, col_blocks}, boxes.options().dtype(at::kLong)); + + dim3 blocks(GET_BLOCKS(boxes_num, THREADS_PER_BLOCK_NMS), + GET_BLOCKS(boxes_num, THREADS_PER_BLOCK_NMS)); + dim3 threads(THREADS_PER_BLOCK_NMS); + + iou3d_nms3d_normal_forward_cuda_kernel<<>>( + boxes_num, nms_overlap_thresh, boxes.data_ptr(), + (unsigned long long*)mask.data_ptr()); + + at::Tensor keep_t = at::zeros( + {boxes_num}, boxes.options().dtype(at::kBool).device(at::kCUDA)); + gather_keep_from_mask<<<1, std::min(col_blocks, THREADS_PER_BLOCK), + col_blocks * sizeof(unsigned long long), stream>>>( + keep_t.data_ptr(), (unsigned long long*)mask.data_ptr(), + boxes_num); + + auto keep_data = keep_t.nonzero().index({Slice(), 0}); + keep_num.fill_(at::Scalar(keep_data.size(0))); + keep.index_put_({Slice(0, keep_data.size(0))}, keep_data); + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/knn_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/knn_cuda.cu new file mode 
100644 index 0000000000000000000000000000000000000000..e3351819779cc356cc21d7bb375082f71da2cb75 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/knn_cuda.cu @@ -0,0 +1,34 @@ +// Copyright (c) OpenMMLab. All rights reserved +// Modified from +// https://github.com/CVMI-Lab/PAConv/tree/main/scene_seg/lib/pointops/src/knnquery_heap + +#include +#include + +#include "knn_cuda_kernel.cuh" +#include "pytorch_cuda_helper.hpp" + +void KNNForwardCUDAKernelLauncher(int b, int n, int m, int nsample, + const Tensor xyz, const Tensor new_xyz, + Tensor idx, Tensor dist2) { + // param new_xyz: (B, m, 3) + // param xyz: (B, n, 3) + // param idx: (B, m, nsample) + + at::cuda::CUDAGuard device_guard(new_xyz.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + // blockIdx.x(col), blockIdx.y(row) + dim3 blocks(GET_BLOCKS(m, THREADS_PER_BLOCK), b); + dim3 threads(THREADS_PER_BLOCK); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + new_xyz.scalar_type(), "knn_forward_cuda_kernel", [&] { + knn_forward_cuda_kernel<<>>( + b, n, m, nsample, xyz.data_ptr(), + new_xyz.data_ptr(), idx.data_ptr(), + dist2.data_ptr()); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/masked_conv2d_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/masked_conv2d_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..022e18901580a415037d1d5942791b3ccafc30b9 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/masked_conv2d_cuda.cu @@ -0,0 +1,54 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "masked_conv2d_cuda_kernel.cuh" +#include "pytorch_cuda_helper.hpp" + +void MaskedIm2colForwardCUDAKernelLauncher(const Tensor bottom_data, + const Tensor mask_h_idx, + const Tensor mask_w_idx, + Tensor top_data, const int kernel_h, + const int kernel_w, const int pad_h, + const int pad_w) { + int channels = bottom_data.size(1); + int height = bottom_data.size(2); + int width = bottom_data.size(3); + int mask_cnt = mask_h_idx.size(0); + int output_size = mask_cnt * channels; + + at::cuda::CUDAGuard device_guard(bottom_data.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + bottom_data.scalar_type(), "MaskedIm2colLaucherForward", ([&] { + const scalar_t *bottom_data_ = bottom_data.data_ptr(); + const int64_t *mask_h_idx_ = mask_h_idx.data_ptr(); + const int64_t *mask_w_idx_ = mask_w_idx.data_ptr(); + scalar_t *top_data_ = top_data.data_ptr(); + MaskedIm2colForward + <<>>( + output_size, bottom_data_, height, width, kernel_h, kernel_w, + pad_h, pad_w, mask_h_idx_, mask_w_idx_, mask_cnt, top_data_); + })); + AT_CUDA_CHECK(cudaGetLastError()); +} + +void MaskedCol2imForwardCUDAKernelLauncher( + const Tensor bottom_data, const Tensor mask_h_idx, const Tensor mask_w_idx, + Tensor top_data, const int height, const int width, const int channels) { + int mask_cnt = mask_h_idx.size(0); + int output_size = mask_cnt * channels; + + at::cuda::CUDAGuard device_guard(bottom_data.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + bottom_data.scalar_type(), "MaskedCol2imLaucherForward", ([&] { + const scalar_t *bottom_data_ = bottom_data.data_ptr(); + const int64_t *mask_h_idx_ = mask_h_idx.data_ptr(); + const int64_t *mask_w_idx_ = mask_w_idx.data_ptr(); + scalar_t *top_data_ = top_data.data_ptr(); + + MaskedCol2imForward + <<>>( + output_size, bottom_data_, height, width, channels, mask_h_idx_, + mask_w_idx_, mask_cnt, top_data_); + 
})); + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/min_area_polygons.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/min_area_polygons.cu new file mode 100644 index 0000000000000000000000000000000000000000..9314f2dda6c89e1f35369b1b7ab9d290cf2ab295 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/min_area_polygons.cu @@ -0,0 +1,21 @@ +// Copyright (c) OpenMMLab. All rights reserved +// modified from +// https://github.com/SDL-GuoZonghao/BeyondBoundingBox/blob/main/mmdet/ops/minareabbox/src/minareabbox_kernel.cu +#include "min_area_polygons_cuda.cuh" +#include "pytorch_cuda_helper.hpp" + +void MinAreaPolygonsCUDAKernelLauncher(const Tensor pointsets, + Tensor polygons) { + int num_pointsets = pointsets.size(0); + const int output_size = polygons.numel(); + at::cuda::CUDAGuard device_guard(pointsets.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + pointsets.scalar_type(), "min_area_polygons_cuda_kernel", ([&] { + min_area_polygons_cuda_kernel + <<>>( + num_pointsets, pointsets.data_ptr(), + polygons.data_ptr()); + })); + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/modulated_deform_conv_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/modulated_deform_conv_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..2b52796e4fdfa2b8bf039fd66f0b16a3af8c84ee --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/modulated_deform_conv_cuda.cu @@ -0,0 +1,96 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "modulated_deform_conv_cuda_kernel.cuh" +#include "pytorch_cuda_helper.hpp" + +void modulated_deformable_im2col_cuda( + const Tensor data_im, const Tensor data_offset, const Tensor data_mask, + const int batch_size, const int channels, const int height_im, + const int width_im, const int height_col, const int width_col, + const int kernel_h, const int kernel_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, const int dilation_h, + const int dilation_w, const int deformable_group, Tensor data_col) { + // num_axes should be smaller than block size + const int channel_per_deformable_group = channels / deformable_group; + const int num_kernels = channels * batch_size * height_col * width_col; + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + data_im.scalar_type(), "modulated_deformable_im2col_gpu", ([&] { + const scalar_t *data_im_ = data_im.data_ptr(); + const scalar_t *data_offset_ = data_offset.data_ptr(); + const scalar_t *data_mask_ = data_mask.data_ptr(); + scalar_t *data_col_ = data_col.data_ptr(); + + modulated_deformable_im2col_gpu_kernel<<< + GET_BLOCKS(num_kernels), THREADS_PER_BLOCK, 0, + at::cuda::getCurrentCUDAStream()>>>( + num_kernels, data_im_, data_offset_, data_mask_, height_im, + width_im, kernel_h, kernel_w, pad_h, pad_w, stride_h, stride_w, + dilation_h, dilation_w, channel_per_deformable_group, batch_size, + channels, deformable_group, height_col, width_col, data_col_); + })); + AT_CUDA_CHECK(cudaGetLastError()); +} + +void modulated_deformable_col2im_cuda( + const Tensor data_col, const Tensor data_offset, const Tensor data_mask, + const int batch_size, const int channels, const int height_im, + const int width_im, const int height_col, const int width_col, + const int kernel_h, const int kernel_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, const int dilation_h, + const int dilation_w, const int deformable_group, Tensor grad_im) { + const int 
channel_per_deformable_group = channels / deformable_group; + const int num_kernels = + channels * kernel_h * kernel_w * batch_size * height_col * width_col; + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + data_col.scalar_type(), "modulated_deformable_col2im_gpu", ([&] { + const scalar_t *data_col_ = data_col.data_ptr(); + const scalar_t *data_offset_ = data_offset.data_ptr(); + const scalar_t *data_mask_ = data_mask.data_ptr(); + scalar_t *grad_im_ = grad_im.data_ptr(); + + modulated_deformable_col2im_gpu_kernel<<< + GET_BLOCKS(num_kernels), THREADS_PER_BLOCK, 0, + at::cuda::getCurrentCUDAStream()>>>( + num_kernels, data_col_, data_offset_, data_mask_, channels, + height_im, width_im, kernel_h, kernel_w, pad_h, pad_w, stride_h, + stride_w, dilation_h, dilation_w, channel_per_deformable_group, + batch_size, deformable_group, height_col, width_col, grad_im_); + })); + AT_CUDA_CHECK(cudaGetLastError()); +} + +void modulated_deformable_col2im_coord_cuda( + const Tensor data_col, const Tensor data_im, const Tensor data_offset, + const Tensor data_mask, const int batch_size, const int channels, + const int height_im, const int width_im, const int height_col, + const int width_col, const int kernel_h, const int kernel_w, + const int pad_h, const int pad_w, const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, const int deformable_group, + Tensor grad_offset, Tensor grad_mask) { + const int num_kernels = batch_size * height_col * width_col * 2 * kernel_h * + kernel_w * deformable_group; + const int channel_per_deformable_group = + channels * kernel_h * kernel_w / deformable_group; + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + data_col.scalar_type(), "modulated_deformable_col2im_coord_gpu", ([&] { + const scalar_t *data_col_ = data_col.data_ptr(); + const scalar_t *data_im_ = data_im.data_ptr(); + const scalar_t *data_offset_ = data_offset.data_ptr(); + const scalar_t *data_mask_ = data_mask.data_ptr(); + scalar_t *grad_offset_ = 
grad_offset.data_ptr(); + scalar_t *grad_mask_ = grad_mask.data_ptr(); + + modulated_deformable_col2im_coord_gpu_kernel<<< + GET_BLOCKS(num_kernels), THREADS_PER_BLOCK, 0, + at::cuda::getCurrentCUDAStream()>>>( + num_kernels, data_col_, data_im_, data_offset_, data_mask_, + channels, height_im, width_im, kernel_h, kernel_w, pad_h, pad_w, + stride_h, stride_w, dilation_h, dilation_w, + channel_per_deformable_group, batch_size, + 2 * kernel_h * kernel_w * deformable_group, deformable_group, + height_col, width_col, grad_offset_, grad_mask_); + })); + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/ms_deform_attn_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/ms_deform_attn_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..fd191ee9c99eb000dced9131abf551ce65c691d3 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/ms_deform_attn_cuda.cu @@ -0,0 +1,351 @@ +/*! +************************************************************************************************** +* Deformable DETR +* Copyright (c) 2020 SenseTime. All Rights Reserved. 
+* Licensed under the Apache License, Version 2.0 [see LICENSE for details] +************************************************************************************************** +* Modified from +*https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/tree/pytorch_1.0.0 +************************************************************************************************** +*/ + +#include +#include +#include +#include + +#include +#include + +#include "ms_deform_attn_cuda_kernel.cuh" + +template +void ms_deformable_im2col_cuda(cudaStream_t stream, const scalar_t *data_value, + const int64_t *data_spatial_shapes, + const int64_t *data_level_start_index, + const scalar_t *data_sampling_loc, + const scalar_t *data_attn_weight, + const int batch_size, const int spatial_size, + const int num_heads, const int channels, + const int num_levels, const int num_query, + const int num_point, scalar_t *data_col) { + const int num_kernels = batch_size * num_query * num_heads * channels; + const int num_actual_kernels = batch_size * num_query * num_heads * channels; + const int num_threads = THREADS_PER_BLOCK; + ms_deformable_im2col_gpu_kernel + <<>>( + num_kernels, data_value, data_spatial_shapes, data_level_start_index, + data_sampling_loc, data_attn_weight, batch_size, spatial_size, + num_heads, channels, num_levels, num_query, num_point, data_col); + + cudaError_t err = cudaGetLastError(); + if (err != cudaSuccess) { + printf("error in ms_deformable_im2col_cuda: %s\n", cudaGetErrorString(err)); + } +} + +template +void ms_deformable_col2im_cuda( + cudaStream_t stream, const scalar_t *grad_col, const scalar_t *data_value, + const int64_t *data_spatial_shapes, const int64_t *data_level_start_index, + const scalar_t *data_sampling_loc, const scalar_t *data_attn_weight, + const int batch_size, const int spatial_size, const int num_heads, + const int channels, const int num_levels, const int num_query, + const int num_point, scalar_t *grad_value, scalar_t *grad_sampling_loc, + 
scalar_t *grad_attn_weight) { + const int num_threads = + (channels > THREADS_PER_BLOCK) ? THREADS_PER_BLOCK : channels; + const int num_kernels = batch_size * num_query * num_heads * channels; + const int num_actual_kernels = batch_size * num_query * num_heads * channels; + if (channels > THREADS_PER_BLOCK) { + if ((channels & THREADS_PER_BLOCK - 1) == 0) { + ms_deformable_col2im_gpu_kernel_shm_reduce_v2_multi_blocks + <<>>( + num_kernels, grad_col, data_value, data_spatial_shapes, + data_level_start_index, data_sampling_loc, data_attn_weight, + batch_size, spatial_size, num_heads, channels, num_levels, + num_query, num_point, grad_value, grad_sampling_loc, + grad_attn_weight); + } else { + ms_deformable_col2im_gpu_kernel_gm + <<>>(num_kernels, grad_col, data_value, data_spatial_shapes, + data_level_start_index, data_sampling_loc, + data_attn_weight, batch_size, spatial_size, num_heads, + channels, num_levels, num_query, num_point, grad_value, + grad_sampling_loc, grad_attn_weight); + } + } else { + switch (channels) { + case 1: + ms_deformable_col2im_gpu_kernel_shm_blocksize_aware_reduce_v1 + <<>>(num_kernels, grad_col, data_value, data_spatial_shapes, + data_level_start_index, data_sampling_loc, + data_attn_weight, batch_size, spatial_size, num_heads, + channels, num_levels, num_query, num_point, grad_value, + grad_sampling_loc, grad_attn_weight); + break; + case 2: + ms_deformable_col2im_gpu_kernel_shm_blocksize_aware_reduce_v1 + <<>>(num_kernels, grad_col, data_value, data_spatial_shapes, + data_level_start_index, data_sampling_loc, + data_attn_weight, batch_size, spatial_size, num_heads, + channels, num_levels, num_query, num_point, grad_value, + grad_sampling_loc, grad_attn_weight); + break; + case 4: + ms_deformable_col2im_gpu_kernel_shm_blocksize_aware_reduce_v1 + <<>>(num_kernels, grad_col, data_value, data_spatial_shapes, + data_level_start_index, data_sampling_loc, + data_attn_weight, batch_size, spatial_size, num_heads, + channels, num_levels, 
num_query, num_point, grad_value, + grad_sampling_loc, grad_attn_weight); + break; + case 8: + ms_deformable_col2im_gpu_kernel_shm_blocksize_aware_reduce_v1 + <<>>(num_kernels, grad_col, data_value, data_spatial_shapes, + data_level_start_index, data_sampling_loc, + data_attn_weight, batch_size, spatial_size, num_heads, + channels, num_levels, num_query, num_point, grad_value, + grad_sampling_loc, grad_attn_weight); + break; + case 16: + ms_deformable_col2im_gpu_kernel_shm_blocksize_aware_reduce_v1 + <<>>(num_kernels, grad_col, data_value, data_spatial_shapes, + data_level_start_index, data_sampling_loc, + data_attn_weight, batch_size, spatial_size, num_heads, + channels, num_levels, num_query, num_point, grad_value, + grad_sampling_loc, grad_attn_weight); + break; + case 32: + ms_deformable_col2im_gpu_kernel_shm_blocksize_aware_reduce_v1 + <<>>(num_kernels, grad_col, data_value, data_spatial_shapes, + data_level_start_index, data_sampling_loc, + data_attn_weight, batch_size, spatial_size, num_heads, + channels, num_levels, num_query, num_point, grad_value, + grad_sampling_loc, grad_attn_weight); + break; + case 64: + ms_deformable_col2im_gpu_kernel_shm_blocksize_aware_reduce_v2 + <<>>(num_kernels, grad_col, data_value, data_spatial_shapes, + data_level_start_index, data_sampling_loc, + data_attn_weight, batch_size, spatial_size, num_heads, + channels, num_levels, num_query, num_point, grad_value, + grad_sampling_loc, grad_attn_weight); + break; + case 128: + ms_deformable_col2im_gpu_kernel_shm_blocksize_aware_reduce_v2 + <<>>(num_kernels, grad_col, data_value, data_spatial_shapes, + data_level_start_index, data_sampling_loc, + data_attn_weight, batch_size, spatial_size, num_heads, + channels, num_levels, num_query, num_point, grad_value, + grad_sampling_loc, grad_attn_weight); + break; + case 256: + ms_deformable_col2im_gpu_kernel_shm_blocksize_aware_reduce_v2 + <<>>(num_kernels, grad_col, data_value, data_spatial_shapes, + data_level_start_index, 
data_sampling_loc, + data_attn_weight, batch_size, spatial_size, num_heads, + channels, num_levels, num_query, num_point, grad_value, + grad_sampling_loc, grad_attn_weight); + break; + case 512: + ms_deformable_col2im_gpu_kernel_shm_blocksize_aware_reduce_v2 + <<>>(num_kernels, grad_col, data_value, data_spatial_shapes, + data_level_start_index, data_sampling_loc, + data_attn_weight, batch_size, spatial_size, num_heads, + channels, num_levels, num_query, num_point, grad_value, + grad_sampling_loc, grad_attn_weight); + break; + default: + if (channels < 64) { + ms_deformable_col2im_gpu_kernel_shm_reduce_v1 + <<>>( + num_kernels, grad_col, data_value, data_spatial_shapes, + data_level_start_index, data_sampling_loc, data_attn_weight, + batch_size, spatial_size, num_heads, channels, num_levels, + num_query, num_point, grad_value, grad_sampling_loc, + grad_attn_weight); + } else { + ms_deformable_col2im_gpu_kernel_shm_reduce_v2 + <<>>( + num_kernels, grad_col, data_value, data_spatial_shapes, + data_level_start_index, data_sampling_loc, data_attn_weight, + batch_size, spatial_size, num_heads, channels, num_levels, + num_query, num_point, grad_value, grad_sampling_loc, + grad_attn_weight); + } + } + } + cudaError_t err = cudaGetLastError(); + if (err != cudaSuccess) { + printf("error in ms_deformable_col2im_cuda: %s\n", cudaGetErrorString(err)); + } +} + +at::Tensor ms_deform_attn_cuda_forward(const at::Tensor &value, + const at::Tensor &spatial_shapes, + const at::Tensor &level_start_index, + const at::Tensor &sampling_loc, + const at::Tensor &attn_weight, + const int im2col_step) { + AT_ASSERTM(value.is_contiguous(), "value tensor has to be contiguous"); + AT_ASSERTM(spatial_shapes.is_contiguous(), + "spatial_shapes tensor has to be contiguous"); + AT_ASSERTM(level_start_index.is_contiguous(), + "level_start_index tensor has to be contiguous"); + AT_ASSERTM(sampling_loc.is_contiguous(), + "sampling_loc tensor has to be contiguous"); + 
AT_ASSERTM(attn_weight.is_contiguous(), + "attn_weight tensor has to be contiguous"); + + AT_ASSERTM(value.is_cuda(), "value must be a CUDA tensor"); + AT_ASSERTM(spatial_shapes.is_cuda(), "spatial_shapes must be a CUDA tensor"); + AT_ASSERTM(level_start_index.is_cuda(), + "level_start_index must be a CUDA tensor"); + AT_ASSERTM(sampling_loc.is_cuda(), "sampling_loc must be a CUDA tensor"); + AT_ASSERTM(attn_weight.is_cuda(), "attn_weight must be a CUDA tensor"); + + const int batch = value.size(0); + const int spatial_size = value.size(1); + const int num_heads = value.size(2); + const int channels = value.size(3); + + const int num_levels = spatial_shapes.size(0); + + const int num_query = sampling_loc.size(1); + const int num_point = sampling_loc.size(4); + + const int im2col_step_ = std::min(batch, im2col_step); + + AT_ASSERTM(batch % im2col_step_ == 0, "batch(%d) must divide im2col_step(%d)", + batch, im2col_step_); + + auto output = + at::zeros({batch, num_query, num_heads, channels}, value.options()); + + const int batch_n = im2col_step_; + auto output_n = output.view( + {batch / im2col_step_, batch_n, num_query, num_heads, channels}); + auto per_value_size = spatial_size * num_heads * channels; + auto per_sample_loc_size = num_query * num_heads * num_levels * num_point * 2; + auto per_attn_weight_size = num_query * num_heads * num_levels * num_point; + for (int n = 0; n < batch / im2col_step_; ++n) { + auto columns = output_n.select(0, n); + AT_DISPATCH_FLOATING_TYPES( + value.scalar_type(), "ms_deform_attn_forward_cuda", ([&] { + ms_deformable_im2col_cuda( + at::cuda::getCurrentCUDAStream(), + value.data_ptr() + n * im2col_step_ * per_value_size, + spatial_shapes.data_ptr(), + level_start_index.data_ptr(), + sampling_loc.data_ptr() + + n * im2col_step_ * per_sample_loc_size, + attn_weight.data_ptr() + + n * im2col_step_ * per_attn_weight_size, + batch_n, spatial_size, num_heads, channels, num_levels, num_query, + num_point, columns.data_ptr()); + })); + } 
+ + output = output.view({batch, num_query, num_heads * channels}); + + return output; +} + +void ms_deform_attn_cuda_backward( + const at::Tensor &value, const at::Tensor &spatial_shapes, + const at::Tensor &level_start_index, const at::Tensor &sampling_loc, + const at::Tensor &attn_weight, const at::Tensor &grad_output, + at::Tensor &grad_value, at::Tensor &grad_sampling_loc, + at::Tensor &grad_attn_weight, const int im2col_step) { + AT_ASSERTM(value.is_contiguous(), "value tensor has to be contiguous"); + AT_ASSERTM(spatial_shapes.is_contiguous(), + "spatial_shapes tensor has to be contiguous"); + AT_ASSERTM(level_start_index.is_contiguous(), + "level_start_index tensor has to be contiguous"); + AT_ASSERTM(sampling_loc.is_contiguous(), + "sampling_loc tensor has to be contiguous"); + AT_ASSERTM(attn_weight.is_contiguous(), + "attn_weight tensor has to be contiguous"); + AT_ASSERTM(grad_output.is_contiguous(), + "grad_output tensor has to be contiguous"); + + AT_ASSERTM(value.is_cuda(), "value must be a CUDA tensor"); + AT_ASSERTM(spatial_shapes.is_cuda(), "spatial_shapes must be a CUDA tensor"); + AT_ASSERTM(level_start_index.is_cuda(), + "level_start_index must be a CUDA tensor"); + AT_ASSERTM(sampling_loc.is_cuda(), "sampling_loc must be a CUDA tensor"); + AT_ASSERTM(attn_weight.is_cuda(), "attn_weight must be a CUDA tensor"); + AT_ASSERTM(grad_output.is_cuda(), "grad_output must be a CUDA tensor"); + + const int batch = value.size(0); + const int spatial_size = value.size(1); + const int num_heads = value.size(2); + const int channels = value.size(3); + + const int num_levels = spatial_shapes.size(0); + + const int num_query = sampling_loc.size(1); + const int num_point = sampling_loc.size(4); + + const int im2col_step_ = std::min(batch, im2col_step); + + AT_ASSERTM(batch % im2col_step_ == 0, "batch(%d) must divide im2col_step(%d)", + batch, im2col_step_); + + const int batch_n = im2col_step_; + auto per_value_size = spatial_size * num_heads * channels; + 
auto per_sample_loc_size = num_query * num_heads * num_levels * num_point * 2; + auto per_attn_weight_size = num_query * num_heads * num_levels * num_point; + auto grad_output_n = grad_output.view( + {batch / im2col_step_, batch_n, num_query, num_heads, channels}); + + for (int n = 0; n < batch / im2col_step_; ++n) { + auto grad_output_g = grad_output_n.select(0, n); + AT_DISPATCH_FLOATING_TYPES( + value.scalar_type(), "ms_deform_attn_backward_cuda", ([&] { + ms_deformable_col2im_cuda( + at::cuda::getCurrentCUDAStream(), + grad_output_g.data_ptr(), + value.data_ptr() + n * im2col_step_ * per_value_size, + spatial_shapes.data_ptr(), + level_start_index.data_ptr(), + sampling_loc.data_ptr() + + n * im2col_step_ * per_sample_loc_size, + attn_weight.data_ptr() + + n * im2col_step_ * per_attn_weight_size, + batch_n, spatial_size, num_heads, channels, num_levels, num_query, + num_point, + grad_value.data_ptr() + + n * im2col_step_ * per_value_size, + grad_sampling_loc.data_ptr() + + n * im2col_step_ * per_sample_loc_size, + grad_attn_weight.data_ptr() + + n * im2col_step_ * per_attn_weight_size); + })); + } +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/nms_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/nms_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..e7179c6ab15e9e0360b430cc24cae5203bd7cbf6 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/nms_cuda.cu @@ -0,0 +1,36 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "nms_cuda_kernel.cuh" +#include "pytorch_cuda_helper.hpp" + +Tensor NMSCUDAKernelLauncher(Tensor boxes, Tensor scores, float iou_threshold, + int offset) { + at::cuda::CUDAGuard device_guard(boxes.device()); + + if (boxes.numel() == 0) { + return at::empty({0}, boxes.options().dtype(at::kLong)); + } + auto order_t = std::get<1>(scores.sort(0, /*descending=*/true)); + auto boxes_sorted = boxes.index_select(0, order_t); + + int boxes_num = boxes.size(0); + const int col_blocks = (boxes_num + threadsPerBlock - 1) / threadsPerBlock; + const int col_blocks_alloc = GET_BLOCKS(boxes_num, threadsPerBlock); + Tensor mask = + at::empty({boxes_num, col_blocks}, boxes.options().dtype(at::kLong)); + dim3 blocks(col_blocks_alloc, col_blocks_alloc); + dim3 threads(threadsPerBlock); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + nms_cuda<<>>( + boxes_num, iou_threshold, offset, boxes_sorted.data_ptr(), + (unsigned long long*)mask.data_ptr()); + + // Filter the boxes which should be kept. + at::Tensor keep_t = at::zeros( + {boxes_num}, boxes.options().dtype(at::kBool).device(at::kCUDA)); + gather_keep_from_mask<<<1, std::min(col_blocks, THREADS_PER_BLOCK), + col_blocks * sizeof(unsigned long long), stream>>>( + keep_t.data_ptr(), (unsigned long long*)mask.data_ptr(), + boxes_num); + AT_CUDA_CHECK(cudaGetLastError()); + return order_t.masked_select(keep_t); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/nms_quadri_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/nms_quadri_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..15004b82179ab36408355cac4deef90de252b291 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/nms_quadri_cuda.cu @@ -0,0 +1,60 @@ +// Copyright (c) Facebook, Inc. and its affiliates. 
All Rights Reserved +#include "nms_quadri_cuda.cuh" +#include "pytorch_cuda_helper.hpp" + +Tensor nms_quadri_cuda(const Tensor dets, const Tensor scores, + const Tensor order_t, const Tensor dets_sorted, + float iou_threshold, const int multi_label) { + // using scalar_t = float; + AT_ASSERTM(dets.is_cuda(), "dets must be a CUDA tensor"); + AT_ASSERTM(scores.is_cuda(), "scores must be a CUDA tensor"); + at::cuda::CUDAGuard device_guard(dets.device()); + + int dets_num = dets.size(0); + + const int col_blocks = at::cuda::ATenCeilDiv(dets_num, threadsPerBlock); + + Tensor mask = + at::empty({dets_num * col_blocks}, dets.options().dtype(at::kLong)); + + dim3 blocks(col_blocks, col_blocks); + dim3 threads(threadsPerBlock); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + dets_sorted.scalar_type(), "nms_quadri_kernel_cuda", [&] { + nms_quadri_cuda_kernel<<>>( + dets_num, iou_threshold, dets_sorted.data_ptr(), + (unsigned long long*)mask.data_ptr(), multi_label); + }); + + Tensor mask_cpu = mask.to(at::kCPU); + unsigned long long* mask_host = + (unsigned long long*)mask_cpu.data_ptr(); + + std::vector remv(col_blocks); + memset(&remv[0], 0, sizeof(unsigned long long) * col_blocks); + + Tensor keep = + at::empty({dets_num}, dets.options().dtype(at::kLong).device(at::kCPU)); + int64_t* keep_out = keep.data_ptr(); + + int num_to_keep = 0; + for (int i = 0; i < dets_num; i++) { + int nblock = i / threadsPerBlock; + int inblock = i % threadsPerBlock; + + if (!(remv[nblock] & (1ULL << inblock))) { + keep_out[num_to_keep++] = i; + unsigned long long* p = mask_host + i * col_blocks; + for (int j = nblock; j < col_blocks; j++) { + remv[j] |= p[j]; + } + } + } + + AT_CUDA_CHECK(cudaGetLastError()); + return order_t.index( + {keep.narrow(/*dim=*/0, /*start=*/0, /*length=*/num_to_keep) + .to(order_t.device(), keep.scalar_type())}); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/nms_rotated_cuda.cu 
b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/nms_rotated_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..e1185f81cb2fd58d00a30d3fff5215af76f57a85 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/nms_rotated_cuda.cu @@ -0,0 +1,62 @@ +// Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved +// modified from +// https://github.com/facebookresearch/detectron2/blob/master/detectron2/layers/csrc/nms_rotated/nms_rotated_cuda.cu +#include "nms_rotated_cuda.cuh" +#include "pytorch_cuda_helper.hpp" + +Tensor nms_rotated_cuda(const Tensor dets, const Tensor scores, + const Tensor order_t, const Tensor dets_sorted, + float iou_threshold, const int multi_label) { + // using scalar_t = float; + AT_ASSERTM(dets.is_cuda(), "dets must be a CUDA tensor"); + AT_ASSERTM(scores.is_cuda(), "scores must be a CUDA tensor"); + at::cuda::CUDAGuard device_guard(dets.device()); + + int dets_num = dets.size(0); + + const int col_blocks = at::cuda::ATenCeilDiv(dets_num, threadsPerBlock); + + Tensor mask = + at::empty({dets_num * col_blocks}, dets.options().dtype(at::kLong)); + + dim3 blocks(col_blocks, col_blocks); + dim3 threads(threadsPerBlock); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + dets_sorted.scalar_type(), "nms_rotated_kernel_cuda", [&] { + nms_rotated_cuda_kernel<<>>( + dets_num, iou_threshold, dets_sorted.data_ptr(), + (unsigned long long*)mask.data_ptr(), multi_label); + }); + + Tensor mask_cpu = mask.to(at::kCPU); + unsigned long long* mask_host = + (unsigned long long*)mask_cpu.data_ptr(); + + std::vector remv(col_blocks); + memset(&remv[0], 0, sizeof(unsigned long long) * col_blocks); + + Tensor keep = + at::empty({dets_num}, dets.options().dtype(at::kLong).device(at::kCPU)); + int64_t* keep_out = keep.data_ptr(); + + int num_to_keep = 0; + for (int i = 0; i < dets_num; i++) { + int nblock = i / threadsPerBlock; + int inblock = i % 
threadsPerBlock; + + if (!(remv[nblock] & (1ULL << inblock))) { + keep_out[num_to_keep++] = i; + unsigned long long* p = mask_host + i * col_blocks; + for (int j = nblock; j < col_blocks; j++) { + remv[j] |= p[j]; + } + } + } + + AT_CUDA_CHECK(cudaGetLastError()); + return order_t.index( + {keep.narrow(/*dim=*/0, /*start=*/0, /*length=*/num_to_keep) + .to(order_t.device(), keep.scalar_type())}); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/points_in_boxes_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/points_in_boxes_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..3cc89d010a80126360fe42503a1754ef4a420afa --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/points_in_boxes_cuda.cu @@ -0,0 +1,62 @@ +// Modified from +// https://github.com/sshaoshuai/PCDet/blob/master/pcdet/ops/roiaware_pool3d/src/roiaware_pool3d_kernel.cu +// Written by Shaoshuai Shi +// All Rights Reserved 2019. + +#include + +#include "points_in_boxes_cuda_kernel.cuh" +#include "pytorch_cuda_helper.hpp" + +void PointsInBoxesPartForwardCUDAKernelLauncher(int batch_size, int boxes_num, + int pts_num, const Tensor boxes, + const Tensor pts, + Tensor box_idx_of_points) { + // params boxes: (B, N, 7) [x, y, z, x_size, y_size, z_size, rz] in LiDAR + // coordinate, z is + // the bottom center, each box DO NOT overlaps params pts: (B, npoints, 3) [x, + // y, z] in LiDAR coordinate params boxes_idx_of_points: (B, npoints), default + // -1 + + at::cuda::CUDAGuard device_guard(boxes.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + dim3 blocks(GET_BLOCKS(pts_num, THREADS_PER_BLOCK), batch_size); + dim3 threads(THREADS_PER_BLOCK); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + boxes.scalar_type(), "points_in_boxes_part_forward_cuda_kernel", [&] { + points_in_boxes_part_forward_cuda_kernel + <<>>( + batch_size, boxes_num, pts_num, boxes.data_ptr(), + pts.data_ptr(), 
box_idx_of_points.data_ptr()); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} + +void PointsInBoxesAllForwardCUDAKernelLauncher(int batch_size, int boxes_num, + int pts_num, const Tensor boxes, + const Tensor pts, + Tensor box_idx_of_points) { + // params boxes: (B, N, 7) [x, y, z, x_size, y_size, z_size, rz] in LiDAR + // coordinate, z is the bottom center, each box params pts: (B, npoints, 3) + // [x, y, z] in LiDAR coordinate params boxes_idx_of_points: (B, npoints), + // default -1 + + at::cuda::CUDAGuard device_guard(boxes.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + dim3 blocks(GET_BLOCKS(pts_num, THREADS_PER_BLOCK), batch_size); + dim3 threads(THREADS_PER_BLOCK); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + boxes.scalar_type(), "points_in_boxes_all_forward_cuda_kernel", [&] { + points_in_boxes_all_forward_cuda_kernel + <<>>( + batch_size, boxes_num, pts_num, boxes.data_ptr(), + pts.data_ptr(), box_idx_of_points.data_ptr()); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/points_in_polygons_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/points_in_polygons_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..6e7db9ddfd63e4bfb3ca150a83dde5a79fb1717e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/points_in_polygons_cuda.cu @@ -0,0 +1,28 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +// Modified from +// https://github.com/ming71/CUDA/blob/master/point_justify/points_justify_kernel.cu + +#include + +#include "points_in_polygons_cuda_kernel.cuh" +#include "pytorch_cuda_helper.hpp" + +void PointsInPolygonsForwardCUDAKernelLauncher(const at::Tensor points, + const at::Tensor polygons, + const int rows, const int cols, + at::Tensor output) { + const int output_size = rows * cols; + at::cuda::CUDAGuard device_guard(points.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + points.scalar_type(), "points_in_polygons_forward_cuda_kernel", ([&] { + const scalar_t *vertex1 = points.data_ptr(); + const scalar_t *vertex2 = polygons.data_ptr(); + scalar_t *inside_flag = output.data_ptr(); + + points_in_polygons_forward_cuda_kernel + <<>>( + output_size, vertex1, vertex2, rows, cols, inside_flag); + })); + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/prroi_pool_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/prroi_pool_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..e0636098b1d6fb6eef0c6a5ff334ddb43ae7855f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/prroi_pool_cuda.cu @@ -0,0 +1,65 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "prroi_pool_cuda_kernel.cuh" +#include "pytorch_cuda_helper.hpp" + +void PrROIPoolForwardCUDAKernelLauncher(Tensor input, Tensor rois, + Tensor output, int pooled_height, + int pooled_width, float spatial_scale) { + int output_size = output.numel(); + int channels = input.size(1); + int height = input.size(2); + int width = input.size(3); + + at::cuda::CUDAGuard device_guard(input.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + prroi_pool_forward_cuda_kernel + <<>>( + output_size, input.data_ptr(), rois.data_ptr(), + output.data_ptr(), pooled_height, pooled_width, + static_cast(spatial_scale), channels, height, width); + + AT_CUDA_CHECK(cudaGetLastError()); +} + +void PrROIPoolBackwardCUDAKernelLauncher(Tensor grad_output, Tensor rois, + Tensor grad_input, int pooled_height, + int pooled_width, + float spatial_scale) { + int output_size = grad_output.numel(); + int channels = grad_input.size(1); + int height = grad_input.size(2); + int width = grad_input.size(3); + + at::cuda::CUDAGuard device_guard(grad_output.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + prroi_pool_backward_cuda_kernel + <<>>( + output_size, grad_output.data_ptr(), rois.data_ptr(), + grad_input.data_ptr(), pooled_height, pooled_width, + static_cast(spatial_scale), channels, height, width); + + AT_CUDA_CHECK(cudaGetLastError()); +} + +void PrROIPoolCoorBackwardCUDAKernelLauncher(Tensor output, Tensor grad_output, + Tensor input, Tensor rois, + Tensor grad_rois, + int pooled_height, + int pooled_width, + float spatial_scale) { + int output_size = grad_output.numel(); + int channels = input.size(1); + int height = input.size(2); + int width = input.size(3); + + at::cuda::CUDAGuard device_guard(grad_output.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + prroi_pool_coor_backward_cuda_kernel + <<>>( + output_size, output.data_ptr(), grad_output.data_ptr(), + input.data_ptr(), rois.data_ptr(), + 
grad_rois.data_ptr(), pooled_height, pooled_width, + static_cast(spatial_scale), channels, height, width); + + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/psamask_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/psamask_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..a0bdfa60c2d3ba75d089d0bfa44648821aaf4fed --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/psamask_cuda.cu @@ -0,0 +1,60 @@ +// Copyright (c) OpenMMLab. All rights reserved +// Modified from +// https://github.com/hszhao/semseg/blob/master/lib/psa/src + +#include + +#include "psamask_cuda_kernel.cuh" +#include "pytorch_cuda_helper.hpp" + +void PSAMaskForwardCUDAKernelLauncher(const int psa_type, const Tensor input, + Tensor output, const int num_, + const int h_feature, const int w_feature, + const int h_mask, const int w_mask, + const int half_h_mask, + const int half_w_mask) { + int nthreads = num_ * h_feature * w_feature; + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + if (psa_type == 0) + AT_DISPATCH_FLOATING_TYPES( + input.scalar_type(), "psamask_collect_forward_cuda", [&] { + psamask_collect_forward_cuda<<>>( + nthreads, h_feature, w_feature, h_mask, w_mask, half_h_mask, + half_w_mask, input.data_ptr(), + output.data_ptr()); + }); + else + AT_DISPATCH_FLOATING_TYPES( + input.scalar_type(), "psamask_distribute_forward_cuda", [&] { + psamask_distribute_forward_cuda + <<>>( + nthreads, h_feature, w_feature, h_mask, w_mask, half_h_mask, + half_w_mask, input.data_ptr(), + output.data_ptr()); + }); +} + +void PSAMaskBackwardCUDAKernelLauncher( + const int psa_type, const Tensor grad_output, Tensor grad_input, + const int num_, const int h_feature, const int w_feature, const int h_mask, + const int w_mask, const int half_h_mask, const int half_w_mask) { + int nthreads = num_ * h_feature * w_feature; + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); 
+ if (psa_type == 0) + AT_DISPATCH_FLOATING_TYPES( + grad_input.scalar_type(), "psamask_collect_backward_cuda", [&] { + psamask_collect_backward_cuda<<>>( + nthreads, h_feature, w_feature, h_mask, w_mask, half_h_mask, + half_w_mask, grad_output.data_ptr(), + grad_input.data_ptr()); + }); + else + AT_DISPATCH_FLOATING_TYPES( + grad_input.scalar_type(), "psamask_distribute_backward_cuda", [&] { + psamask_distribute_backward_cuda + <<>>( + nthreads, h_feature, w_feature, h_mask, w_mask, half_h_mask, + half_w_mask, grad_output.data_ptr(), + grad_input.data_ptr()); + }); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/riroi_align_rotated_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/riroi_align_rotated_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..9829da731d6f5ad61ad2cde04a3b8511b5ca942c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/riroi_align_rotated_cuda.cu @@ -0,0 +1,53 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "pytorch_cuda_helper.hpp" +#include "riroi_align_rotated_cuda_kernel.cuh" + +void RiROIAlignRotatedForwardCUDAKernelLauncher( + const at::Tensor features, const at::Tensor rois, const float spatial_scale, + const int num_samples, const bool clockwise, const int channels, + const int height, const int width, const int num_rois, + const int pooled_height, const int pooled_width, const int num_orientations, + at::Tensor output) { + const int output_size = + num_rois * pooled_height * pooled_width * channels * num_orientations; + at::cuda::CUDAGuard device_guard(features.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + features.scalar_type(), "riroi_align_rotated_forward_cuda_kernel", ([&] { + const scalar_t *bottom_data = features.data_ptr(); + const scalar_t *rois_data = rois.data_ptr(); + scalar_t *top_data = output.data_ptr(); + + riroi_align_rotated_forward_cuda_kernel + <<>>( + output_size, bottom_data, rois_data, scalar_t(spatial_scale), + num_samples, clockwise, channels, height, width, pooled_height, + pooled_width, num_orientations, top_data); + })); + + AT_CUDA_CHECK(cudaGetLastError()); +} + +void RiROIAlignRotatedBackwardCUDAKernelLauncher( + const at::Tensor top_grad, const at::Tensor rois, const float spatial_scale, + const int num_samples, const bool clockwise, const int channels, + const int height, const int width, const int num_rois, + const int pooled_height, const int pooled_width, const int num_orientations, + at::Tensor bottom_grad) { + const int output_size = + num_rois * pooled_height * pooled_width * channels * num_orientations; + at::cuda::CUDAGuard device_guard(top_grad.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + top_grad.scalar_type(), "riroi_align_rotated_backward_cuda_kernel", ([&] { + const scalar_t *top_diff = top_grad.data_ptr(); + const scalar_t *rois_data = rois.data_ptr(); + 
scalar_t *bottom_diff = bottom_grad.data_ptr(); + riroi_align_rotated_backward_cuda_kernel + <<>>( + output_size, top_diff, rois_data, spatial_scale, num_samples, + clockwise, channels, height, width, pooled_height, pooled_width, + num_orientations, bottom_diff); + })); + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/roi_align_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/roi_align_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..3d4f7614e4bce44b77027c82d99cabbd571e608c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/roi_align_cuda.cu @@ -0,0 +1,58 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include "pytorch_cuda_helper.hpp" +#include "roi_align_cuda_kernel.cuh" + +void ROIAlignForwardCUDAKernelLauncher(Tensor input, Tensor rois, Tensor output, + Tensor argmax_y, Tensor argmax_x, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + int pool_mode, bool aligned) { + int output_size = output.numel(); + int channels = input.size(1); + int height = input.size(2); + int width = input.size(3); + + at::cuda::CUDAGuard device_guard(input.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + input.scalar_type(), "roi_align_forward_cuda_kernel", [&] { + roi_align_forward_cuda_kernel + <<>>( + output_size, input.data_ptr(), + rois.data_ptr(), output.data_ptr(), + argmax_y.data_ptr(), argmax_x.data_ptr(), + aligned_height, aligned_width, + static_cast(spatial_scale), sampling_ratio, pool_mode, + aligned, channels, height, width); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} + +void ROIAlignBackwardCUDAKernelLauncher(Tensor grad_output, Tensor rois, + Tensor argmax_y, Tensor argmax_x, + Tensor grad_input, int aligned_height, + int aligned_width, float spatial_scale, + int sampling_ratio, int pool_mode, + bool aligned) { + int output_size = 
grad_output.numel(); + int channels = grad_input.size(1); + int height = grad_input.size(2); + int width = grad_input.size(3); + + at::cuda::CUDAGuard device_guard(grad_output.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + grad_output.scalar_type(), "roi_align_backward_cuda_kernel", [&] { + roi_align_backward_cuda_kernel + <<>>( + output_size, grad_output.data_ptr(), + rois.data_ptr(), argmax_y.data_ptr(), + argmax_x.data_ptr(), grad_input.data_ptr(), + aligned_height, aligned_width, + static_cast(spatial_scale), sampling_ratio, pool_mode, + aligned, channels, height, width); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/roi_align_rotated_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/roi_align_rotated_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..c0fd987bb91d4c903c7e408190d7a31b906bae62 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/roi_align_rotated_cuda.cu @@ -0,0 +1,45 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "pytorch_cuda_helper.hpp" +#include "roi_align_rotated_cuda_kernel.cuh" + +void ROIAlignRotatedForwardCUDAKernelLauncher( + const at::Tensor input, const at::Tensor rois, const float spatial_scale, + const int sampling_ratio, const bool aligned, const bool clockwise, + const int channels, const int height, const int width, const int num_rois, + const int pooled_height, const int pooled_width, at::Tensor output) { + const int output_size = num_rois * pooled_height * pooled_width * channels; + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + input.scalar_type(), "ROIAlignRotatedLaucherForward", ([&] { + const scalar_t *bottom_data = input.data_ptr(); + const scalar_t *rois_data = rois.data_ptr(); + scalar_t *top_data = output.data_ptr(); + + roi_align_rotated_forward_cuda_kernel + <<>>( + output_size, bottom_data, rois_data, scalar_t(spatial_scale), + sampling_ratio, aligned, clockwise, channels, height, width, + pooled_height, pooled_width, top_data); + })); + + AT_CUDA_CHECK(cudaGetLastError()); +} + +void ROIAlignRotatedBackwardCUDAKernelLauncher( + const at::Tensor top_grad, const at::Tensor rois, const float spatial_scale, + const int sampling_ratio, const bool aligned, const bool clockwise, + const int channels, const int height, const int width, const int num_rois, + const int pooled_height, const int pooled_width, at::Tensor bottom_grad) { + const int output_size = num_rois * pooled_height * pooled_width * channels; + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + top_grad.scalar_type(), "ROIAlignLaucherBackward", ([&] { + const scalar_t *top_diff = top_grad.data_ptr(); + const scalar_t *rois_data = rois.data_ptr(); + scalar_t *bottom_diff = bottom_grad.data_ptr(); + roi_align_rotated_backward_cuda_kernel + <<>>( + output_size, top_diff, rois_data, spatial_scale, sampling_ratio, + aligned, clockwise, channels, height, width, pooled_height, + pooled_width, bottom_diff); + })); + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git 
a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/roi_pool_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/roi_pool_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..d9cdf3050964e9bd4fbb64f0650b138ccb51ac6d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/roi_pool_cuda.cu @@ -0,0 +1,50 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include "pytorch_cuda_helper.hpp" +#include "roi_pool_cuda_kernel.cuh" + +void ROIPoolForwardCUDAKernelLauncher(Tensor input, Tensor rois, Tensor output, + Tensor argmax, int pooled_height, + int pooled_width, float spatial_scale) { + int output_size = output.numel(); + int channels = input.size(1); + int height = input.size(2); + int width = input.size(3); + + at::cuda::CUDAGuard device_guard(input.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + input.scalar_type(), "roi_pool_forward_cuda_kernel", [&] { + roi_pool_forward_cuda_kernel + <<>>( + output_size, input.data_ptr(), + rois.data_ptr(), output.data_ptr(), + argmax.data_ptr(), pooled_height, pooled_width, + static_cast(spatial_scale), channels, height, width); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} + +void ROIPoolBackwardCUDAKernelLauncher(Tensor grad_output, Tensor rois, + Tensor argmax, Tensor grad_input, + int pooled_height, int pooled_width, + float spatial_scale) { + int output_size = grad_output.numel(); + int channels = grad_input.size(1); + int height = grad_input.size(2); + int width = grad_input.size(3); + + at::cuda::CUDAGuard device_guard(grad_output.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + grad_output.scalar_type(), "roi_pool_backward_cuda_kernel", [&] { + roi_pool_backward_cuda_kernel + <<>>( + output_size, grad_output.data_ptr(), + rois.data_ptr(), argmax.data_ptr(), + grad_input.data_ptr(), pooled_height, pooled_width, + channels, height, 
width); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/roiaware_pool3d_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/roiaware_pool3d_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..7d83755f4c89104a037cb7c16a59e6dd25f84e12 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/roiaware_pool3d_cuda.cu @@ -0,0 +1,118 @@ +// Modified from +// https://github.com/sshaoshuai/PCDet/blob/master/pcdet/ops/roiaware_pool3d/src/roiaware_pool3d_kernel.cu +// Written by Shaoshuai Shi +// All Rights Reserved 2019. + +#include + +#include "pytorch_cuda_helper.hpp" +#include "roiaware_pool3d_cuda_kernel.cuh" + +void RoiawarePool3dForwardCUDAKernelLauncher( + int boxes_num, int pts_num, int channels, int max_pts_each_voxel, int out_x, + int out_y, int out_z, const Tensor rois, const Tensor pts, + const Tensor pts_feature, Tensor argmax, Tensor pts_idx_of_voxels, + Tensor pooled_features, int pool_method) { + // params rois: (N, 7) [x, y, z, x_size, y_size, z_size, rz] in LiDAR + // coordinate params pts: (npoints, 3) [x, y, z] in LiDAR coordinate params + // pts_feature: (npoints, C) params argmax: (N, out_x, out_y, out_z, C) params + // pts_idx_of_voxels: (N, out_x, out_y, out_z, max_pts_each_voxel) params + // pooled_features: (N, out_x, out_y, out_z, C) params pool_method: 0: + // max_pool 1: avg_pool + + at::cuda::CUDAGuard device_guard(pts_feature.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + Tensor pts_mask = + -at::ones({boxes_num, pts_num}, pts_feature.options().dtype(at::kInt)); + + dim3 blocks_mask(GET_BLOCKS(pts_num, THREADS_PER_BLOCK), boxes_num); + dim3 threads(THREADS_PER_BLOCK); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + rois.scalar_type(), "generate_pts_mask_for_box3d", [&] { + generate_pts_mask_for_box3d + <<>>( + boxes_num, pts_num, out_x, out_y, out_z, + rois.data_ptr(), pts.data_ptr(), + 
pts_mask.data_ptr()); + }); + + AT_CUDA_CHECK(cudaGetLastError()); + + // TODO: Merge the collect and pool functions, SS + + dim3 blocks_collect(GET_BLOCKS(boxes_num, THREADS_PER_BLOCK)); + + AT_DISPATCH_INTEGRAL_TYPES( + pts_idx_of_voxels.scalar_type(), "collect_inside_pts_for_box3d", [&] { + collect_inside_pts_for_box3d + <<>>( + boxes_num, pts_num, max_pts_each_voxel, out_x, out_y, out_z, + pts_mask.data_ptr(), + pts_idx_of_voxels.data_ptr()); + }); + + AT_CUDA_CHECK(cudaGetLastError()); + + dim3 blocks_pool(GET_BLOCKS(out_x * out_y * out_z, THREADS_PER_BLOCK), + channels, boxes_num); + if (pool_method == 0) { + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + pts_feature.scalar_type(), "roiaware_maxpool3d", [&] { + roiaware_maxpool3d<<>>( + boxes_num, pts_num, channels, max_pts_each_voxel, out_x, out_y, + out_z, pts_feature.data_ptr(), + pts_idx_of_voxels.data_ptr(), + pooled_features.data_ptr(), argmax.data_ptr()); + }); + } else if (pool_method == 1) { + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + pts_feature.scalar_type(), "roiaware_avgpool3d", [&] { + roiaware_avgpool3d<<>>( + boxes_num, pts_num, channels, max_pts_each_voxel, out_x, out_y, + out_z, pts_feature.data_ptr(), + pts_idx_of_voxels.data_ptr(), + pooled_features.data_ptr()); + }); + } + + AT_CUDA_CHECK(cudaGetLastError()); +} + +void RoiawarePool3dBackwardCUDAKernelLauncher( + int boxes_num, int out_x, int out_y, int out_z, int channels, + int max_pts_each_voxel, const Tensor pts_idx_of_voxels, const Tensor argmax, + const Tensor grad_out, Tensor grad_in, int pool_method) { + // params pts_idx_of_voxels: (N, out_x, out_y, out_z, max_pts_each_voxel) + // params argmax: (N, out_x, out_y, out_z, C) + // params grad_out: (N, out_x, out_y, out_z, C) + // params grad_in: (npoints, C), return value + // params pool_method: 0: max_pool, 1: avg_pool + + at::cuda::CUDAGuard device_guard(grad_out.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + dim3 blocks(GET_BLOCKS(out_x * out_y * out_z, 
THREADS_PER_BLOCK), channels, + boxes_num); + dim3 threads(THREADS_PER_BLOCK); + + if (pool_method == 0) { + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + grad_in.scalar_type(), "roiaware_maxpool3d_backward", [&] { + roiaware_maxpool3d_backward<<>>( + boxes_num, channels, out_x, out_y, out_z, argmax.data_ptr(), + grad_out.data_ptr(), grad_in.data_ptr()); + }); + } else if (pool_method == 1) { + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + grad_in.scalar_type(), "roiaware_avgpool3d_backward", [&] { + roiaware_avgpool3d_backward<<>>( + boxes_num, channels, out_x, out_y, out_z, max_pts_each_voxel, + pts_idx_of_voxels.data_ptr(), grad_out.data_ptr(), + grad_in.data_ptr()); + }); + } + + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/roipoint_pool3d_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/roipoint_pool3d_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..af2098e8229ef29c08fe3c8d715863fe67cda06e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/roipoint_pool3d_cuda.cu @@ -0,0 +1,60 @@ +/* +Modified from +https://github.com/open-mmlab/OpenPCDet/blob/master/pcdet/ops/roipoint_pool3d/src/roipoint_pool3d_kernel.cu +Point cloud feature pooling +Written by Shaoshuai Shi +All Rights Reserved 2018. 
+*/ + +#include +#include + +#include "pytorch_cuda_helper.hpp" +#include "roipoint_pool3d_cuda_kernel.cuh" + +void RoIPointPool3dForwardCUDAKernelLauncher( + int batch_size, int pts_num, int boxes_num, int feature_in_len, + int sampled_pts_num, const Tensor xyz, const Tensor boxes3d, + const Tensor pts_feature, Tensor pooled_features, + Tensor pooled_empty_flag) { + Tensor pts_assign = at::empty({batch_size, pts_num, boxes_num}, + boxes3d.options().dtype(at::kInt)); + + at::cuda::CUDAGuard device_guard(xyz.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + // blockIdx.x(col), blockIdx.y(row) + dim3 blocks(GET_BLOCKS(pts_num, THREADS_PER_BLOCK), boxes_num, batch_size); + dim3 threads(THREADS_PER_BLOCK); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + xyz.scalar_type(), "assign_pts_to_box3d", [&] { + assign_pts_to_box3d<<>>( + batch_size, pts_num, boxes_num, xyz.data_ptr(), + boxes3d.data_ptr(), pts_assign.data_ptr()); + }); + + Tensor pts_idx = at::empty({batch_size, boxes_num, sampled_pts_num}, + boxes3d.options().dtype(at::kInt)); + + // blockIdx.x(col), blockIdx.y(row) + dim3 blocks2(GET_BLOCKS(boxes_num, THREADS_PER_BLOCK), batch_size); + + get_pooled_idx<<>>( + batch_size, pts_num, boxes_num, sampled_pts_num, + pts_assign.data_ptr(), pts_idx.data_ptr(), + pooled_empty_flag.data_ptr()); + + dim3 blocks_pool(GET_BLOCKS(sampled_pts_num, THREADS_PER_BLOCK), boxes_num, + batch_size); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + xyz.scalar_type(), "roipoint_pool3d_forward", [&] { + roipoint_pool3d_forward<<>>( + batch_size, pts_num, boxes_num, feature_in_len, sampled_pts_num, + xyz.data_ptr(), pts_idx.data_ptr(), + pts_feature.data_ptr(), + pooled_features.data_ptr(), + pooled_empty_flag.data_ptr()); + }); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/rotated_feature_align_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/rotated_feature_align_cuda.cu new file mode 100644 index 
0000000000000000000000000000000000000000..d172338ae76b7d1509b3011383d3ea95ee8d9527 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/rotated_feature_align_cuda.cu @@ -0,0 +1,53 @@ +// Copyright (c) OpenMMLab. All rights reserved. +// Modified from +// https://github.com/SJTU-Thinklab-Det/r3det-on-mmdetection/blob/master/mmdet/ops/fr/src/feature_refine_kernel.cu +#include "pytorch_cuda_helper.hpp" +#include "rotated_feature_align_cuda_kernel.cuh" + +void RotatedFeatureAlignForwardCUDAKernelLauncher(const Tensor features, + const Tensor best_bboxes, + const float spatial_scale, + const int points, + Tensor output) { + at::cuda::CUDAGuard device_guard(features.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + const int output_size = features.numel(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + features.scalar_type(), "rotated_feature_align_forward_cuda_kernel", + ([&] { + const scalar_t* bottom_data = features.data_ptr(); + const scalar_t* bboxes_data = best_bboxes.data_ptr(); + scalar_t* top_data = output.data_ptr(); + + rotated_feature_align_forward_kernel + <<>>( + output_size, points, bottom_data, bboxes_data, + scalar_t(spatial_scale), features.size(1), features.size(2), + features.size(3), top_data); + })); + AT_CUDA_CHECK(cudaGetLastError()); +} + +void RotatedFeatureAlignBackwardCUDAKernelLauncher(const Tensor top_grad, + const Tensor best_bboxes, + const float spatial_scale, + const int points, + Tensor bottom_grad) { + at::cuda::CUDAGuard device_guard(top_grad.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + const int output_size = top_grad.numel(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + top_grad.scalar_type(), "rotated_feature_align_backward_cuda_kernel", + ([&] { + const scalar_t* top_diff = top_grad.data_ptr(); + const scalar_t* bboxes_data = best_bboxes.data_ptr(); + scalar_t* bottom_diff = bottom_grad.data_ptr(); + + rotated_feature_align_backward_kernel + <<>>( + output_size, points, 
top_diff, bboxes_data, + scalar_t(spatial_scale), top_grad.size(1), top_grad.size(2), + top_grad.size(3), bottom_diff); + })); + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/scatter_points_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/scatter_points_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..cbc44651fc51a5392031e51355de242837242596 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/scatter_points_cuda.cu @@ -0,0 +1,132 @@ +// Copyright (c) OpenMMLab. All rights reserved. +#include +#include +#include + +#include "pytorch_cuda_helper.hpp" +#include "scatter_points_cuda_kernel.cuh" + +std::vector DynamicPointToVoxelForwardCUDAKernelLauncher( + const at::Tensor &feats, const at::Tensor &coors, + const reduce_t reduce_type) { + const int num_input = feats.size(0); + const int num_feats = feats.size(1); + + if (num_input == 0) + return {feats.clone().detach(), coors.clone().detach(), + coors.new_empty({0}, torch::kInt32), + coors.new_empty({0}, torch::kInt32)}; + + at::Tensor out_coors; + at::Tensor coors_map; + at::Tensor reduce_count; + + auto coors_clean = coors.masked_fill(coors.lt(0).any(-1, true), -1); + + std::tie(out_coors, coors_map, reduce_count) = + at::unique_dim(coors_clean, 0, true, true, true); + + if (out_coors[0][0].lt(0).item()) { + // the first element of out_coors (-1,-1,-1) and should be removed + out_coors = out_coors.slice(0, 1); + reduce_count = reduce_count.slice(0, 1); + coors_map = coors_map - 1; + } + + coors_map = coors_map.to(torch::kInt32); + reduce_count = reduce_count.to(torch::kInt32); + + auto reduced_feats = + at::empty({out_coors.size(0), num_feats}, feats.options()); + + at::cuda::CUDAGuard device_guard(feats.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + AT_DISPATCH_FLOATING_TYPES( + feats.scalar_type(), "feats_reduce_kernel", ([&] { + if (reduce_type == 
reduce_t::MAX) + reduced_feats.fill_(-std::numeric_limits::infinity()); + else + reduced_feats.fill_(static_cast(0)); + + dim3 blocks(std::min( + at::cuda::ATenCeilDiv(num_input, THREADS_PER_BLOCK), maxGridDim)); + dim3 threads(THREADS_PER_BLOCK); + feats_reduce_kernel<<>>( + feats.data_ptr(), coors_map.data_ptr(), + reduced_feats.data_ptr(), num_input, num_feats, + reduce_type); + if (reduce_type == reduce_t::MEAN) + reduced_feats /= reduce_count.unsqueeze(-1).to(reduced_feats.dtype()); + })); + + AT_CUDA_CHECK(cudaGetLastError()); + + return {reduced_feats, out_coors, coors_map, reduce_count}; +} + +void DynamicPointToVoxelBackwardCUDAKernelLauncher( + at::Tensor &grad_feats, const at::Tensor &grad_reduced_feats, + const at::Tensor &feats, const at::Tensor &reduced_feats, + const at::Tensor &coors_map, const at::Tensor &reduce_count, + const reduce_t reduce_type) { + const int num_input = feats.size(0); + const int num_reduced = reduced_feats.size(0); + const int num_feats = feats.size(1); + + grad_feats.fill_(0); + // copy voxel grad to points + + if (num_input == 0 || num_reduced == 0) return; + at::cuda::CUDAGuard device_guard(feats.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + if (reduce_type == reduce_t::MEAN || reduce_type == reduce_t::SUM) { + AT_DISPATCH_FLOATING_TYPES( + grad_reduced_feats.scalar_type(), "add_reduce_traceback_grad_kernel", + ([&] { + dim3 blocks(std::min( + at::cuda::ATenCeilDiv(num_input, THREADS_PER_BLOCK), maxGridDim)); + dim3 threads(THREADS_PER_BLOCK); + add_reduce_traceback_grad_kernel<<>>( + grad_feats.data_ptr(), + grad_reduced_feats.data_ptr(), + coors_map.data_ptr(), reduce_count.data_ptr(), + num_input, num_feats, reduce_type); + })); + + AT_CUDA_CHECK(cudaGetLastError()); + } else { + auto reduce_from = at::full({num_reduced, num_feats}, num_input, + coors_map.options().dtype(torch::kInt32)); + AT_DISPATCH_FLOATING_TYPES( + grad_reduced_feats.scalar_type(), + 
"max_reduce_traceback_scatter_idx_kernel", ([&] { + dim3 blocks(std::min( + at::cuda::ATenCeilDiv(num_input, THREADS_PER_BLOCK), maxGridDim)); + dim3 threads(THREADS_PER_BLOCK); + max_reduce_traceback_scatter_idx_kernel<<>>( + feats.data_ptr(), reduced_feats.data_ptr(), + reduce_from.data_ptr(), coors_map.data_ptr(), + num_input, num_feats); + })); + + AT_CUDA_CHECK(cudaGetLastError()); + + AT_DISPATCH_FLOATING_TYPES( + grad_reduced_feats.scalar_type(), + "max_reduce_traceback_scatter_idx_kernel", ([&] { + dim3 blocks( + std::min(at::cuda::ATenCeilDiv(num_reduced, THREADS_PER_BLOCK), + maxGridDim)); + dim3 threads(THREADS_PER_BLOCK); + max_reduce_scatter_grad_kernel<<>>( + grad_feats.data_ptr(), + grad_reduced_feats.data_ptr(), + reduce_from.data_ptr(), num_reduced, num_feats); + })); + + AT_CUDA_CHECK(cudaGetLastError()); + } +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/stack_ball_query_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/stack_ball_query_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..3095df5ee32070b340deec15f43d1fc093a2b282 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/stack_ball_query_cuda.cu @@ -0,0 +1,45 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +// Modified from +// https://github.com/sshaoshuai/Pointnet2.PyTorch/tree/master/pointnet2/src/ball_query_gpu.cu + +#include +#include +#include + +#include "pytorch_cuda_helper.hpp" +#include "stack_ball_query_cuda_kernel.cuh" +#define DIVUP(m, n) ((m) / (n) + ((m) % (n) > 0)) + +void StackBallQueryForwardCUDAKernelLauncher(float max_radius, int nsample, + const Tensor new_xyz, + const Tensor new_xyz_batch_cnt, + const Tensor xyz, + const Tensor xyz_batch_cnt, + Tensor idx) { + at::cuda::CUDAGuard device_guard(new_xyz.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + // const float *new_xyz_ptr = new_xyz.data_ptr(); + // const float *xyz_ptr = xyz.data_ptr(); + // const int *new_xyz_batch_cnt_ptr = new_xyz_batch_cnt.data_ptr(); + // const int *xyz_batch_cnt_ptr = xyz_batch_cnt.data_ptr(); + // int *idx_ptr = idx.data_ptr(); + + int B = xyz_batch_cnt.size(0); + int M = new_xyz.size(0); + + // blockIdx.x(col), blockIdx.y(row) + dim3 blocks(DIVUP(M, THREADS_PER_BLOCK)); + dim3 threads(THREADS_PER_BLOCK); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + new_xyz.scalar_type(), "stack_ball_query_forward_cuda_kernel", [&] { + stack_ball_query_forward_cuda_kernel + <<>>( + B, M, max_radius, nsample, new_xyz.data_ptr(), + new_xyz_batch_cnt.data_ptr(), xyz.data_ptr(), + xyz_batch_cnt.data_ptr(), idx.data_ptr()); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/stack_group_points_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/stack_group_points_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..9f903b02a6750e0352f06ad268c35775d694b0fc --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/stack_group_points_cuda.cu @@ -0,0 +1,62 @@ +// Copyright (c) OpenMMLab. All rights reserved. 
+// Modified from +// https://github.com/sshaoshuai/Pointnet2.PyTorch/tree/master/pointnet2/src/group_points_gpu.cu +#include +#include + +#include "pytorch_cuda_helper.hpp" +#include "stack_group_points_cuda_kernel.cuh" + +void StackGroupPointsForwardCUDAKernelLauncher( + int b, int c, int m, int nsample, const Tensor features_tensor, + const Tensor features_batch_cnt_tensor, const Tensor idx_tensor, + const Tensor idx_batch_cnt_tensor, Tensor out_tensor) { + // points: (B, C, N) + // idx: (B, npoints, nsample) + // output: + // out: (B, C, npoints, nsample) + at::cuda::CUDAGuard device_guard(features_tensor.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + dim3 blocks(DIVUP(m * c * nsample, THREADS_PER_BLOCK)); + dim3 threads(THREADS_PER_BLOCK); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + features_tensor.scalar_type(), "stack_group_points_forward_cuda_kernel", + [&] { + stack_group_points_forward_cuda_kernel + <<>>( + b, c, m, nsample, features_tensor.data_ptr(), + features_batch_cnt_tensor.data_ptr(), + idx_tensor.data_ptr(), + idx_batch_cnt_tensor.data_ptr(), + out_tensor.data_ptr()); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} + +void StackGroupPointsBackwardCUDAKernelLauncher( + int b, int c, int m, int n, int nsample, const Tensor grad_out_tensor, + const Tensor idx_tensor, const Tensor idx_batch_cnt_tensor, + const Tensor features_batch_cnt_tensor, Tensor grad_features_tensor) { + at::cuda::CUDAGuard device_guard(grad_features_tensor.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + dim3 blocks(DIVUP(m * c * nsample, THREADS_PER_BLOCK)); + dim3 threads(THREADS_PER_BLOCK); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + grad_features_tensor.scalar_type(), + "stack_group_points_backward_cuda_kernel", [&] { + stack_group_points_backward_cuda_kernel + <<>>( + b, c, m, n, nsample, grad_out_tensor.data_ptr(), + idx_tensor.data_ptr(), + idx_batch_cnt_tensor.data_ptr(), + features_batch_cnt_tensor.data_ptr(), + 
grad_features_tensor.data_ptr()); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/sync_bn_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/sync_bn_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..657c81701b7c114af700c4f8cf37094c705b9a94 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/sync_bn_cuda.cu @@ -0,0 +1,110 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include "pytorch_cuda_helper.hpp" +#include "sync_bn_cuda_kernel.cuh" + +void SyncBNForwardMeanCUDAKernelLauncher(const Tensor input, Tensor mean) { + int num = input.size(0); + int channels = input.size(1); + int spatial = input.size(2); + + at::cuda::CUDAGuard device_guard(input.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + input.scalar_type(), "sync_bn_forward_mean_cuda_kernel", [&] { + sync_bn_forward_mean_cuda_kernel + <<>>( + input.data_ptr(), mean.data_ptr(), num, + channels, spatial); + }); + AT_CUDA_CHECK(cudaGetLastError()); +} + +void SyncBNForwardVarCUDAKernelLauncher(const Tensor input, const Tensor mean, + Tensor var) { + int num = input.size(0); + int channels = input.size(1); + int spatial = input.size(2); + + at::cuda::CUDAGuard device_guard(input.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + input.scalar_type(), "sync_bn_forward_mean_cuda_kernel", [&] { + sync_bn_forward_var_cuda_kernel + <<>>( + input.data_ptr(), mean.data_ptr(), + var.data_ptr(), num, channels, spatial); + }); + AT_CUDA_CHECK(cudaGetLastError()); +} + +void SyncBNForwardOutputCUDAKernelLauncher( + const Tensor input, const Tensor mean, const Tensor var, + Tensor running_mean, Tensor running_var, const Tensor weight, + const Tensor bias, Tensor norm, Tensor std, Tensor output, float eps, + float momentum, int group_size) { + int num = input.size(0); 
+ int channels = input.size(1); + int spatial = input.size(2); + + at::cuda::CUDAGuard device_guard(input.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + input.scalar_type(), "sync_bn_forward_mean_cuda_kernel", [&] { + sync_bn_forward_output_cuda_kernel + <<>>( + input.data_ptr(), mean.data_ptr(), + var.data_ptr(), running_mean.data_ptr(), + running_var.data_ptr(), weight.data_ptr(), + bias.data_ptr(), norm.data_ptr(), + std.data_ptr(), output.data_ptr(), num, + channels, spatial, eps, momentum, group_size); + }); + AT_CUDA_CHECK(cudaGetLastError()); +} + +void SyncBNBackwardParamCUDAKernelLauncher(const Tensor grad_output, + const Tensor norm, + Tensor grad_weight, + Tensor grad_bias) { + int num = grad_output.size(0); + int channels = grad_output.size(1); + int spatial = grad_output.size(2); + + at::cuda::CUDAGuard device_guard(grad_output.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + grad_output.scalar_type(), "sync_bn_backward_param_cuda_kernel", [&] { + sync_bn_backward_param_cuda_kernel + <<>>( + grad_output.data_ptr(), norm.data_ptr(), + grad_weight.data_ptr(), grad_bias.data_ptr(), num, + channels, spatial); + }); + AT_CUDA_CHECK(cudaGetLastError()); +} + +void SyncBNBackwardDataCUDAKernelLauncher(const Tensor grad_output, + const Tensor weight, + const Tensor grad_weight, + const Tensor grad_bias, + const Tensor norm, const Tensor std, + Tensor grad_input) { + int output_size = grad_input.numel(); + int num = grad_input.size(0); + int channels = grad_input.size(1); + int spatial = grad_input.size(2); + + at::cuda::CUDAGuard device_guard(grad_input.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + grad_output.scalar_type(), "sync_bn_backward_data_cuda_kernel", [&] { + sync_bn_backward_data_cuda_kernel + <<>>( + output_size, grad_output.data_ptr(), + weight.data_ptr(), 
grad_weight.data_ptr(), + grad_bias.data_ptr(), norm.data_ptr(), + std.data_ptr(), grad_input.data_ptr(), num, + channels, spatial); + }); + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/three_interpolate_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/three_interpolate_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..56a5550066035efb96d1d8e46c5f1ecd3e36083b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/three_interpolate_cuda.cu @@ -0,0 +1,66 @@ +// Modified from +// https://github.com/sshaoshuai/Pointnet2.PyTorch/tree/master/pointnet2/src/interpolate_gpu.cu + +#include +#include +#include + +#include "pytorch_cuda_helper.hpp" +#include "three_interpolate_cuda_kernel.cuh" + +void ThreeInterpolateForwardCUDAKernelLauncher(int b, int c, int m, int n, + const Tensor points, + const Tensor idx, + const Tensor weight, + Tensor out) { + // points: (B, C, M) + // idx: (B, N, 3) + // weight: (B, N, 3) + // output: + // out: (B, C, N) + + at::cuda::CUDAGuard device_guard(points.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + // blockIdx.x(col), blockIdx.y(row) + dim3 blocks(GET_BLOCKS(n, THREADS_PER_BLOCK), c, b); + dim3 threads(THREADS_PER_BLOCK); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + points.scalar_type(), "three_interpolate_forward_cuda_kernel", [&] { + three_interpolate_forward_cuda_kernel + <<>>( + b, c, m, n, points.data_ptr(), idx.data_ptr(), + weight.data_ptr(), out.data_ptr()); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} + +void ThreeInterpolateBackwardCUDAKernelLauncher(int b, int c, int n, int m, + const Tensor grad_out, + const Tensor idx, + const Tensor weight, + Tensor grad_points) { + // grad_out: (B, C, N) + // weight: (B, N, 3) + // output: + // grad_points: (B, C, M) + + at::cuda::CUDAGuard device_guard(grad_out.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + // 
blockIdx.x(col), blockIdx.y(row) + dim3 blocks(GET_BLOCKS(n, THREADS_PER_BLOCK), c, b); + dim3 threads(THREADS_PER_BLOCK); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + grad_out.scalar_type(), "three_interpolate_backward_cuda_kernel", [&] { + three_interpolate_backward_cuda_kernel + <<>>( + b, c, n, m, grad_out.data_ptr(), idx.data_ptr(), + weight.data_ptr(), grad_points.data_ptr()); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/three_nn_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/three_nn_cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..91c68829b9f2c19f1a64def88475c0fedf40de9f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/three_nn_cuda.cu @@ -0,0 +1,35 @@ +// Modified from +// https://github.com/sshaoshuai/Pointnet2.PyTorch/tree/master/pointnet2/src/interpolate_gpu.cu + +#include +#include +#include + +#include "pytorch_cuda_helper.hpp" +#include "three_nn_cuda_kernel.cuh" + +void ThreeNNForwardCUDAKernelLauncher(int b, int n, int m, const Tensor unknown, + const Tensor known, Tensor dist2, + Tensor idx) { + // unknown: (B, N, 3) + // known: (B, M, 3) + // output: + // dist2: (B, N, 3) + // idx: (B, N, 3) + + at::cuda::CUDAGuard device_guard(unknown.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + // blockIdx.x(col), blockIdx.y(row) + dim3 blocks(GET_BLOCKS(n, THREADS_PER_BLOCK), b); + dim3 threads(THREADS_PER_BLOCK); + + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + unknown.scalar_type(), "three_nn_forward_cuda_kernel", [&] { + three_nn_forward_cuda_kernel<<>>( + b, n, m, unknown.data_ptr(), known.data_ptr(), + dist2.data_ptr(), idx.data_ptr()); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/tin_shift_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/tin_shift_cuda.cu new file mode 100644 index 
0000000000000000000000000000000000000000..19c85c76c9f53cb70314d4cdc1c1d2379322f30e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/tin_shift_cuda.cu @@ -0,0 +1,55 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include "pytorch_cuda_helper.hpp" +#include "pytorch_device_registry.hpp" +#include "tin_shift_cuda_kernel.cuh" + +void TINShiftForwardCUDAKernelLauncher(Tensor input, Tensor shift, + Tensor output) { + int output_size = output.numel(); + int batch_size = input.size(0); + int t_size = input.size(1); + int channels = input.size(2); + int hw_size = input.size(3); + int group_size = shift.size(1); + int group_channel = channels / group_size; + int num_kernels = batch_size * hw_size * channels; + + at::cuda::CUDAGuard device_guard(input.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + input.scalar_type(), "tin_shift_forward_cuda_kernel", [&] { + tin_shift_forward_cuda_kernel + <<>>( + output_size, input.data_ptr(), shift.data_ptr(), + output.data_ptr(), batch_size, channels, t_size, + hw_size, group_size, group_channel); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} + +void TINShiftBackwardCUDAKernelLauncher(Tensor grad_output, Tensor shift, + Tensor grad_input) { + int output_size = grad_output.numel(); + int batch_size = grad_output.size(0); + int t_size = grad_output.size(1); + int channels = grad_output.size(2); + int hw_size = grad_output.size(3); + int group_size = shift.size(1); + int group_channel = channels / group_size; + int num_kernels = batch_size * hw_size * channels; + + at::cuda::CUDAGuard device_guard(grad_output.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + AT_DISPATCH_FLOATING_TYPES_AND_HALF( + grad_output.scalar_type(), "tin_shift_backward_cuda_kernel", [&] { + tin_shift_backward_cuda_kernel + <<>>( + output_size, grad_output.data_ptr(), + shift.data_ptr(), grad_input.data_ptr(), + batch_size, channels, t_size, hw_size, 
group_size, + group_channel); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/upfirdn2d_kernel.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/upfirdn2d_kernel.cu new file mode 100644 index 0000000000000000000000000000000000000000..ea2f08820023cea60bdefe8aae56b0f303c72ffa --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/upfirdn2d_kernel.cu @@ -0,0 +1,370 @@ +// Modified from +// https://github.com/rosinality/stylegan2-pytorch/blob/master/op/upfirdn2d_kernel.cu +// Copyright (c) 2019, NVIDIA Corporation. All rights reserved. +// +// This work is made available under the Nvidia Source Code License-NC. +// To view a copy of this license, visit +// https://nvlabs.github.io/stylegan2/license.html + +#include +#include +#include +#include +#include +#include + +#include + +static __host__ __device__ __forceinline__ int floor_div(int a, int b) { + int c = a / b; + + if (c * b > a) { + c--; + } + + return c; +} + +struct UpFirDn2DKernelParams { + int up_x; + int up_y; + int down_x; + int down_y; + int pad_x0; + int pad_x1; + int pad_y0; + int pad_y1; + + int major_dim; + int in_h; + int in_w; + int minor_dim; + int kernel_h; + int kernel_w; + int out_h; + int out_w; + int loop_major; + int loop_x; +}; + +template +__global__ void upfirdn2d_kernel_large(scalar_t *out, const scalar_t *input, + const scalar_t *kernel, + const UpFirDn2DKernelParams p) { + int minor_idx = blockIdx.x * blockDim.x + threadIdx.x; + int out_y = minor_idx / p.minor_dim; + minor_idx -= out_y * p.minor_dim; + int out_x_base = blockIdx.y * p.loop_x * blockDim.y + threadIdx.y; + int major_idx_base = blockIdx.z * p.loop_major; + + if (out_x_base >= p.out_w || out_y >= p.out_h || + major_idx_base >= p.major_dim) { + return; + } + + int mid_y = out_y * p.down_y + p.up_y - 1 - p.pad_y0; + int in_y = min(max(floor_div(mid_y, p.up_y), 0), p.in_h); + int h = min(max(floor_div(mid_y + p.kernel_h, 
p.up_y), 0), p.in_h) - in_y; + int kernel_y = mid_y + p.kernel_h - (in_y + 1) * p.up_y; + + for (int loop_major = 0, major_idx = major_idx_base; + loop_major < p.loop_major && major_idx < p.major_dim; + loop_major++, major_idx++) { + for (int loop_x = 0, out_x = out_x_base; + loop_x < p.loop_x && out_x < p.out_w; loop_x++, out_x += blockDim.y) { + int mid_x = out_x * p.down_x + p.up_x - 1 - p.pad_x0; + int in_x = min(max(floor_div(mid_x, p.up_x), 0), p.in_w); + int w = min(max(floor_div(mid_x + p.kernel_w, p.up_x), 0), p.in_w) - in_x; + int kernel_x = mid_x + p.kernel_w - (in_x + 1) * p.up_x; + + const scalar_t *x_p = + &input[((major_idx * p.in_h + in_y) * p.in_w + in_x) * p.minor_dim + + minor_idx]; + const scalar_t *k_p = &kernel[kernel_y * p.kernel_w + kernel_x]; + int x_px = p.minor_dim; + int k_px = -p.up_x; + int x_py = p.in_w * p.minor_dim; + int k_py = -p.up_y * p.kernel_w; + + scalar_t v = 0.0f; + + for (int y = 0; y < h; y++) { + for (int x = 0; x < w; x++) { + v += static_cast(*x_p) * static_cast(*k_p); + x_p += x_px; + k_p += k_px; + } + + x_p += x_py - w * x_px; + k_p += k_py - w * k_px; + } + + out[((major_idx * p.out_h + out_y) * p.out_w + out_x) * p.minor_dim + + minor_idx] = v; + } + } +} + +template +__global__ void upfirdn2d_kernel(scalar_t *out, const scalar_t *input, + const scalar_t *kernel, + const UpFirDn2DKernelParams p) { + const int tile_in_h = ((tile_out_h - 1) * down_y + kernel_h - 1) / up_y + 1; + const int tile_in_w = ((tile_out_w - 1) * down_x + kernel_w - 1) / up_x + 1; + + __shared__ volatile float sk[kernel_h][kernel_w]; + __shared__ volatile float sx[tile_in_h][tile_in_w]; + + int minor_idx = blockIdx.x; + int tile_out_y = minor_idx / p.minor_dim; + minor_idx -= tile_out_y * p.minor_dim; + tile_out_y *= tile_out_h; + int tile_out_x_base = blockIdx.y * p.loop_x * tile_out_w; + int major_idx_base = blockIdx.z * p.loop_major; + + if (tile_out_x_base >= p.out_w | tile_out_y >= p.out_h | + major_idx_base >= p.major_dim) { + return; + 
} + + for (int tap_idx = threadIdx.x; tap_idx < kernel_h * kernel_w; + tap_idx += blockDim.x) { + int ky = tap_idx / kernel_w; + int kx = tap_idx - ky * kernel_w; + scalar_t v = 0.0; + + if (kx < p.kernel_w & ky < p.kernel_h) { + v = kernel[(p.kernel_h - 1 - ky) * p.kernel_w + (p.kernel_w - 1 - kx)]; + } + + sk[ky][kx] = v; + } + + for (int loop_major = 0, major_idx = major_idx_base; + loop_major < p.loop_major & major_idx < p.major_dim; + loop_major++, major_idx++) { + for (int loop_x = 0, tile_out_x = tile_out_x_base; + loop_x < p.loop_x & tile_out_x < p.out_w; + loop_x++, tile_out_x += tile_out_w) { + int tile_mid_x = tile_out_x * down_x + up_x - 1 - p.pad_x0; + int tile_mid_y = tile_out_y * down_y + up_y - 1 - p.pad_y0; + int tile_in_x = floor_div(tile_mid_x, up_x); + int tile_in_y = floor_div(tile_mid_y, up_y); + + __syncthreads(); + + for (int in_idx = threadIdx.x; in_idx < tile_in_h * tile_in_w; + in_idx += blockDim.x) { + int rel_in_y = in_idx / tile_in_w; + int rel_in_x = in_idx - rel_in_y * tile_in_w; + int in_x = rel_in_x + tile_in_x; + int in_y = rel_in_y + tile_in_y; + + scalar_t v = 0.0; + + if (in_x >= 0 & in_y >= 0 & in_x < p.in_w & in_y < p.in_h) { + v = input[((major_idx * p.in_h + in_y) * p.in_w + in_x) * + p.minor_dim + + minor_idx]; + } + + sx[rel_in_y][rel_in_x] = v; + } + + __syncthreads(); + for (int out_idx = threadIdx.x; out_idx < tile_out_h * tile_out_w; + out_idx += blockDim.x) { + int rel_out_y = out_idx / tile_out_w; + int rel_out_x = out_idx - rel_out_y * tile_out_w; + int out_x = rel_out_x + tile_out_x; + int out_y = rel_out_y + tile_out_y; + + int mid_x = tile_mid_x + rel_out_x * down_x; + int mid_y = tile_mid_y + rel_out_y * down_y; + int in_x = floor_div(mid_x, up_x); + int in_y = floor_div(mid_y, up_y); + int rel_in_x = in_x - tile_in_x; + int rel_in_y = in_y - tile_in_y; + int kernel_x = (in_x + 1) * up_x - mid_x - 1; + int kernel_y = (in_y + 1) * up_y - mid_y - 1; + + scalar_t v = 0.0; + +#pragma unroll + for (int y = 0; y < 
kernel_h / up_y; y++) +#pragma unroll + for (int x = 0; x < kernel_w / up_x; x++) + v += sx[rel_in_y + y][rel_in_x + x] * + sk[kernel_y + y * up_y][kernel_x + x * up_x]; + + if (out_x < p.out_w & out_y < p.out_h) { + out[((major_idx * p.out_h + out_y) * p.out_w + out_x) * p.minor_dim + + minor_idx] = v; + } + } + } + } +} + +torch::Tensor upfirdn2d_op(const torch::Tensor &input, + const torch::Tensor &kernel, int up_x, int up_y, + int down_x, int down_y, int pad_x0, int pad_x1, + int pad_y0, int pad_y1) { + int curDevice = -1; + cudaGetDevice(&curDevice); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(curDevice); + + UpFirDn2DKernelParams p; + + auto x = input.contiguous(); + auto k = kernel.contiguous(); + + p.major_dim = x.size(0); + p.in_h = x.size(1); + p.in_w = x.size(2); + p.minor_dim = x.size(3); + p.kernel_h = k.size(0); + p.kernel_w = k.size(1); + p.up_x = up_x; + p.up_y = up_y; + p.down_x = down_x; + p.down_y = down_y; + p.pad_x0 = pad_x0; + p.pad_x1 = pad_x1; + p.pad_y0 = pad_y0; + p.pad_y1 = pad_y1; + + p.out_h = (p.in_h * p.up_y + p.pad_y0 + p.pad_y1 - p.kernel_h + p.down_y) / + p.down_y; + p.out_w = (p.in_w * p.up_x + p.pad_x0 + p.pad_x1 - p.kernel_w + p.down_x) / + p.down_x; + + auto out = + at::empty({p.major_dim, p.out_h, p.out_w, p.minor_dim}, x.options()); + + int mode = -1; + + int tile_out_h = -1; + int tile_out_w = -1; + + if (p.up_x == 1 && p.up_y == 1 && p.down_x == 1 && p.down_y == 1 && + p.kernel_h <= 4 && p.kernel_w <= 4) { + mode = 1; + tile_out_h = 16; + tile_out_w = 64; + } + + if (p.up_x == 1 && p.up_y == 1 && p.down_x == 1 && p.down_y == 1 && + p.kernel_h <= 3 && p.kernel_w <= 3) { + mode = 2; + tile_out_h = 16; + tile_out_w = 64; + } + + if (p.up_x == 2 && p.up_y == 2 && p.down_x == 1 && p.down_y == 1 && + p.kernel_h <= 4 && p.kernel_w <= 4) { + mode = 3; + tile_out_h = 16; + tile_out_w = 64; + } + + if (p.up_x == 2 && p.up_y == 2 && p.down_x == 1 && p.down_y == 1 && + p.kernel_h <= 2 && p.kernel_w <= 2) { + mode = 4; + 
tile_out_h = 16; + tile_out_w = 64; + } + + if (p.up_x == 1 && p.up_y == 1 && p.down_x == 2 && p.down_y == 2 && + p.kernel_h <= 4 && p.kernel_w <= 4) { + mode = 5; + tile_out_h = 8; + tile_out_w = 32; + } + + if (p.up_x == 1 && p.up_y == 1 && p.down_x == 2 && p.down_y == 2 && + p.kernel_h <= 2 && p.kernel_w <= 2) { + mode = 6; + tile_out_h = 8; + tile_out_w = 32; + } + + dim3 block_size; + dim3 grid_size; + + if (tile_out_h > 0 && tile_out_w > 0) { + p.loop_major = (p.major_dim - 1) / 16384 + 1; + p.loop_x = 1; + block_size = dim3(32 * 8, 1, 1); + grid_size = dim3(((p.out_h - 1) / tile_out_h + 1) * p.minor_dim, + (p.out_w - 1) / (p.loop_x * tile_out_w) + 1, + (p.major_dim - 1) / p.loop_major + 1); + } else { + p.loop_major = (p.major_dim - 1) / 16384 + 1; + p.loop_x = 4; + block_size = dim3(4, 32, 1); + grid_size = dim3((p.out_h * p.minor_dim - 1) / block_size.x + 1, + (p.out_w - 1) / (p.loop_x * block_size.y) + 1, + (p.major_dim - 1) / p.loop_major + 1); + } + + AT_DISPATCH_FLOATING_TYPES_AND_HALF(x.scalar_type(), "upfirdn2d_cuda", [&] { + switch (mode) { + case 1: + upfirdn2d_kernel + <<>>(out.data_ptr(), + x.data_ptr(), + k.data_ptr(), p); + + break; + + case 2: + upfirdn2d_kernel + <<>>(out.data_ptr(), + x.data_ptr(), + k.data_ptr(), p); + + break; + + case 3: + upfirdn2d_kernel + <<>>(out.data_ptr(), + x.data_ptr(), + k.data_ptr(), p); + + break; + + case 4: + upfirdn2d_kernel + <<>>(out.data_ptr(), + x.data_ptr(), + k.data_ptr(), p); + + break; + + case 5: + upfirdn2d_kernel + <<>>(out.data_ptr(), + x.data_ptr(), + k.data_ptr(), p); + + break; + + case 6: + upfirdn2d_kernel + <<>>(out.data_ptr(), + x.data_ptr(), + k.data_ptr(), p); + + break; + + default: + upfirdn2d_kernel_large<<>>( + out.data_ptr(), x.data_ptr(), + k.data_ptr(), p); + } + }); + + return out; +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/voxelization_cuda.cu b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/voxelization_cuda.cu new file mode 100644 index 
0000000000000000000000000000000000000000..f4166b7b7a4fc7297f452636a991bbf91789dd85 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/cuda/voxelization_cuda.cu @@ -0,0 +1,286 @@ +// Copyright (c) OpenMMLab. All rights reserved. +#include +#include + +#include "pytorch_cuda_helper.hpp" +#include "voxelization_cuda_kernel.cuh" + +int HardVoxelizeForwardCUDAKernelLauncher( + const at::Tensor &points, at::Tensor &voxels, at::Tensor &coors, + at::Tensor &num_points_per_voxel, const std::vector voxel_size, + const std::vector coors_range, const int max_points, + const int max_voxels, const int NDim = 3) { + // current version tooks about 0.04s for one frame on cpu + // check device + + at::cuda::CUDAGuard device_guard(points.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + const int num_points = points.size(0); + const int num_features = points.size(1); + + const float voxel_x = voxel_size[0]; + const float voxel_y = voxel_size[1]; + const float voxel_z = voxel_size[2]; + const float coors_x_min = coors_range[0]; + const float coors_y_min = coors_range[1]; + const float coors_z_min = coors_range[2]; + const float coors_x_max = coors_range[3]; + const float coors_y_max = coors_range[4]; + const float coors_z_max = coors_range[5]; + + const int grid_x = round((coors_x_max - coors_x_min) / voxel_x); + const int grid_y = round((coors_y_max - coors_y_min) / voxel_y); + const int grid_z = round((coors_z_max - coors_z_min) / voxel_z); + + // map points to voxel coors + at::Tensor temp_coors = + at::zeros({num_points, NDim}, points.options().dtype(at::kInt)); + + dim3 grid(std::min(at::cuda::ATenCeilDiv(num_points, 512), 4096)); + dim3 block(512); + + // 1. 
link point to corresponding voxel coors + AT_DISPATCH_ALL_TYPES( + points.scalar_type(), "hard_voxelize_kernel", ([&] { + dynamic_voxelize_kernel<<>>( + points.contiguous().data_ptr(), + temp_coors.contiguous().data_ptr(), voxel_x, voxel_y, voxel_z, + coors_x_min, coors_y_min, coors_z_min, coors_x_max, coors_y_max, + coors_z_max, grid_x, grid_y, grid_z, num_points, num_features, + NDim); + })); + + AT_CUDA_CHECK(cudaGetLastError()); + + // 2. map point to the idx of the corresponding voxel, find duplicate coor + // create some temporary variables + auto point_to_pointidx = -at::ones( + { + num_points, + }, + points.options().dtype(at::kInt)); + auto point_to_voxelidx = -at::ones( + { + num_points, + }, + points.options().dtype(at::kInt)); + + dim3 map_grid(std::min(at::cuda::ATenCeilDiv(num_points, 512), 4096)); + dim3 map_block(512); + + AT_DISPATCH_ALL_TYPES( + temp_coors.scalar_type(), "determin_duplicate", ([&] { + point_to_voxelidx_kernel<<>>( + temp_coors.contiguous().data_ptr(), + point_to_voxelidx.contiguous().data_ptr(), + point_to_pointidx.contiguous().data_ptr(), max_points, + max_voxels, num_points, NDim); + })); + + AT_CUDA_CHECK(cudaGetLastError()); + + // 3. determine voxel num and voxel's coor index + // make the logic in the CUDA device could accelerate about 10 times + auto coor_to_voxelidx = -at::ones( + { + num_points, + }, + points.options().dtype(at::kInt)); + auto voxel_num = at::zeros( + { + 1, + }, + points.options().dtype(at::kInt)); // must be zero from the beginning + + AT_DISPATCH_ALL_TYPES(temp_coors.scalar_type(), "determin_duplicate", ([&] { + determin_voxel_num<<<1, 1, 0, stream>>>( + num_points_per_voxel.contiguous().data_ptr(), + point_to_voxelidx.contiguous().data_ptr(), + point_to_pointidx.contiguous().data_ptr(), + coor_to_voxelidx.contiguous().data_ptr(), + voxel_num.contiguous().data_ptr(), + max_points, max_voxels, num_points); + })); + + AT_CUDA_CHECK(cudaGetLastError()); + + // 4. 
copy point features to voxels + // Step 4 & 5 could be parallel + auto pts_output_size = num_points * num_features; + dim3 cp_grid(std::min(at::cuda::ATenCeilDiv(pts_output_size, 512), 4096)); + dim3 cp_block(512); + AT_DISPATCH_ALL_TYPES( + points.scalar_type(), "assign_point_to_voxel", ([&] { + assign_point_to_voxel<<>>( + pts_output_size, points.contiguous().data_ptr(), + point_to_voxelidx.contiguous().data_ptr(), + coor_to_voxelidx.contiguous().data_ptr(), + voxels.contiguous().data_ptr(), max_points, num_features, + num_points, NDim); + })); + // cudaDeviceSynchronize(); + // AT_CUDA_CHECK(cudaGetLastError()); + + // 5. copy coors of each voxels + auto coors_output_size = num_points * NDim; + dim3 coors_cp_grid( + std::min(at::cuda::ATenCeilDiv(coors_output_size, 512), 4096)); + dim3 coors_cp_block(512); + AT_DISPATCH_ALL_TYPES( + points.scalar_type(), "assign_point_to_voxel", ([&] { + assign_voxel_coors + <<>>( + coors_output_size, temp_coors.contiguous().data_ptr(), + point_to_voxelidx.contiguous().data_ptr(), + coor_to_voxelidx.contiguous().data_ptr(), + coors.contiguous().data_ptr(), num_points, NDim); + })); + + AT_CUDA_CHECK(cudaGetLastError()); + + auto voxel_num_cpu = voxel_num.to(at::kCPU); + int voxel_num_int = voxel_num_cpu.data_ptr()[0]; + + return voxel_num_int; +} + +int NondeterministicHardVoxelizeForwardCUDAKernelLauncher( + const at::Tensor &points, at::Tensor &voxels, at::Tensor &coors, + at::Tensor &num_points_per_voxel, const std::vector voxel_size, + const std::vector coors_range, const int max_points, + const int max_voxels, const int NDim = 3) { + at::cuda::CUDAGuard device_guard(points.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + const int num_points = points.size(0); + const int num_features = points.size(1); + + if (num_points == 0) return 0; + + dim3 blocks( + std::min(at::cuda::ATenCeilDiv(num_points, THREADS_PER_BLOCK), 4096)); + dim3 threads(THREADS_PER_BLOCK); + + const float voxel_x = voxel_size[0]; + 
const float voxel_y = voxel_size[1]; + const float voxel_z = voxel_size[2]; + const float coors_x_min = coors_range[0]; + const float coors_y_min = coors_range[1]; + const float coors_z_min = coors_range[2]; + const float coors_x_max = coors_range[3]; + const float coors_y_max = coors_range[4]; + const float coors_z_max = coors_range[5]; + + const int grid_x = round((coors_x_max - coors_x_min) / voxel_x); + const int grid_y = round((coors_y_max - coors_y_min) / voxel_y); + const int grid_z = round((coors_z_max - coors_z_min) / voxel_z); + + // map points to voxel coors + at::Tensor temp_coors = + at::zeros({num_points, NDim}, points.options().dtype(at::kInt)); + + // 1. link point to corresponding voxel coors + AT_DISPATCH_ALL_TYPES( + points.scalar_type(), "hard_voxelize_kernel", ([&] { + dynamic_voxelize_kernel<<>>( + points.contiguous().data_ptr(), + temp_coors.contiguous().data_ptr(), voxel_x, voxel_y, voxel_z, + coors_x_min, coors_y_min, coors_z_min, coors_x_max, coors_y_max, + coors_z_max, grid_x, grid_y, grid_z, num_points, num_features, + NDim); + })); + + at::Tensor coors_map; + at::Tensor reduce_count; + + auto coors_clean = temp_coors.masked_fill(temp_coors.lt(0).any(-1, true), -1); + + std::tie(temp_coors, coors_map, reduce_count) = + at::unique_dim(coors_clean, 0, true, true, false); + + if (temp_coors[0][0].lt(0).item()) { + // the first element of temp_coors is (-1,-1,-1) and should be removed + temp_coors = temp_coors.slice(0, 1); + coors_map = coors_map - 1; + } + + int num_coors = temp_coors.size(0); + temp_coors = temp_coors.to(at::kInt); + coors_map = coors_map.to(at::kInt); + + at::Tensor coors_count = at::zeros({1}, coors_map.options()); + at::Tensor coors_order = at::empty({num_coors}, coors_map.options()); + at::Tensor pts_id = at::zeros({num_points}, coors_map.options()); + reduce_count = at::zeros({num_coors}, coors_map.options()); + + AT_DISPATCH_ALL_TYPES( + points.scalar_type(), "get_assign_pos", ([&] { + 
nondeterministic_get_assign_pos<<>>( + num_points, coors_map.contiguous().data_ptr(), + pts_id.contiguous().data_ptr(), + coors_count.contiguous().data_ptr(), + reduce_count.contiguous().data_ptr(), + coors_order.contiguous().data_ptr()); + })); + + AT_DISPATCH_ALL_TYPES( + points.scalar_type(), "assign_point_to_voxel", ([&] { + nondeterministic_assign_point_voxel + <<>>( + num_points, points.contiguous().data_ptr(), + coors_map.contiguous().data_ptr(), + pts_id.contiguous().data_ptr(), + temp_coors.contiguous().data_ptr(), + reduce_count.contiguous().data_ptr(), + coors_order.contiguous().data_ptr(), + voxels.contiguous().data_ptr(), + coors.contiguous().data_ptr(), + num_points_per_voxel.contiguous().data_ptr(), + max_voxels, max_points, num_features, NDim); + })); + AT_CUDA_CHECK(cudaGetLastError()); + return max_voxels < num_coors ? max_voxels : num_coors; +} + +void DynamicVoxelizeForwardCUDAKernelLauncher( + const at::Tensor &points, at::Tensor &coors, + const std::vector voxel_size, const std::vector coors_range, + const int NDim = 3) { + // current version tooks about 0.04s for one frame on cpu + // check device + + at::cuda::CUDAGuard device_guard(points.device()); + cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + const int num_points = points.size(0); + const int num_features = points.size(1); + + const float voxel_x = voxel_size[0]; + const float voxel_y = voxel_size[1]; + const float voxel_z = voxel_size[2]; + const float coors_x_min = coors_range[0]; + const float coors_y_min = coors_range[1]; + const float coors_z_min = coors_range[2]; + const float coors_x_max = coors_range[3]; + const float coors_y_max = coors_range[4]; + const float coors_z_max = coors_range[5]; + + const int grid_x = round((coors_x_max - coors_x_min) / voxel_x); + const int grid_y = round((coors_y_max - coors_y_min) / voxel_y); + const int grid_z = round((coors_z_max - coors_z_min) / voxel_z); + + const int col_blocks = at::cuda::ATenCeilDiv(num_points, 
THREADS_PER_BLOCK); + dim3 blocks(col_blocks); + dim3 threads(THREADS_PER_BLOCK); + + AT_DISPATCH_ALL_TYPES(points.scalar_type(), "dynamic_voxelize_kernel", [&] { + dynamic_voxelize_kernel<<>>( + points.contiguous().data_ptr(), + coors.contiguous().data_ptr(), voxel_x, voxel_y, voxel_z, + coors_x_min, coors_y_min, coors_z_min, coors_x_max, coors_y_max, + coors_z_max, grid_x, grid_y, grid_z, num_points, num_features, NDim); + }); + + AT_CUDA_CHECK(cudaGetLastError()); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/deform_conv.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/deform_conv.cpp new file mode 100644 index 0000000000000000000000000000000000000000..86690b9394a4b758104009062f656dcfe0de178e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/deform_conv.cpp @@ -0,0 +1,517 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void deformable_im2col_impl(Tensor data_im, Tensor data_offset, + const int channels, const int height, + const int width, const int ksize_h, + const int ksize_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, + const int parallel_imgs, const int deformable_group, + Tensor data_col) { + DISPATCH_DEVICE_IMPL(deformable_im2col_impl, data_im, data_offset, channels, + height, width, ksize_h, ksize_w, pad_h, pad_w, stride_h, + stride_w, dilation_h, dilation_w, parallel_imgs, + deformable_group, data_col); +} + +void deformable_col2im_impl(Tensor data_col, Tensor data_offset, + const int channels, const int height, + const int width, const int ksize_h, + const int ksize_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, + const int parallel_imgs, const int deformable_group, + Tensor grad_im) { + DISPATCH_DEVICE_IMPL(deformable_col2im_impl, data_col, data_offset, channels, + 
height, width, ksize_h, ksize_w, pad_h, pad_w, stride_h, + stride_w, dilation_h, dilation_w, parallel_imgs, + deformable_group, grad_im); +} + +void deformable_col2im_coord_impl( + Tensor data_col, Tensor data_im, Tensor data_offset, const int channels, + const int height, const int width, const int ksize_h, const int ksize_w, + const int pad_h, const int pad_w, const int stride_h, const int stride_w, + const int dilation_h, const int dilation_w, const int parallel_imgs, + const int deformable_group, Tensor grad_offset) { + DISPATCH_DEVICE_IMPL(deformable_col2im_coord_impl, data_col, data_im, + data_offset, channels, height, width, ksize_h, ksize_w, + pad_h, pad_w, stride_h, stride_w, dilation_h, dilation_w, + parallel_imgs, deformable_group, grad_offset); +} + +void deform_conv_shape_check(at::Tensor input, at::Tensor offset, + at::Tensor *gradOutput, at::Tensor weight, int kH, + int kW, int dH, int dW, int padH, int padW, + int dilationH, int dilationW, int group, + int deformable_group) { + TORCH_CHECK( + weight.ndimension() == 4, + "4D weight tensor (nOutputPlane,nInputPlane,kH,kW) expected, but got: %s", + weight.ndimension()); + + TORCH_CHECK(weight.is_contiguous(), "weight tensor has to be contiguous"); + + TORCH_CHECK(kW > 0 && kH > 0, + "kernel size should be greater than zero, but got kH: %d kW: %d", + kH, kW); + + TORCH_CHECK((weight.size(2) == kH && weight.size(3) == kW), + "kernel size should be consistent with weight, ", + "but got kH: %d kW: %d weight.size(2): %d, weight.size(3): %d", + kH, kW, weight.size(2), weight.size(3)); + + TORCH_CHECK(dW > 0 && dH > 0, + "stride should be greater than zero, but got dH: %d dW: %d", dH, + dW); + + TORCH_CHECK( + dilationW > 0 && dilationH > 0, + "dilation should be greater than 0, but got dilationH: %d dilationW: %d", + dilationH, dilationW); + + int ndim = input.ndimension(); + int dimf = 0; + int dimh = 1; + int dimw = 2; + + if (ndim == 4) { + dimf++; + dimh++; + dimw++; + } + + TORCH_CHECK(ndim == 3 || ndim 
== 4, + "3D or 4D input tensor expected but got: %s", ndim); + + long nInputPlane = weight.size(1) * group; + long inputHeight = input.size(dimh); + long inputWidth = input.size(dimw); + long nOutputPlane = weight.size(0); + long outputHeight = + (inputHeight + 2 * padH - (dilationH * (kH - 1) + 1)) / dH + 1; + long outputWidth = + (inputWidth + 2 * padW - (dilationW * (kW - 1) + 1)) / dW + 1; + + TORCH_CHECK(nInputPlane % deformable_group == 0, + "input channels must divide deformable group size"); + + if (outputWidth < 1 || outputHeight < 1) + AT_ERROR( + "Given input size: (%ld x %ld x %ld). " + "Calculated output size: (%ld x %ld x %ld). Output size is too small", + nInputPlane, inputHeight, inputWidth, nOutputPlane, outputHeight, + outputWidth); + + TORCH_CHECK(input.size(1) == nInputPlane, + "invalid number of input planes, expected: %d, but got: %d", + nInputPlane, input.size(1)); + + TORCH_CHECK((inputHeight >= kH && inputWidth >= kW), + "input image is smaller than kernel"); + + TORCH_CHECK( + (offset.size(2) == outputHeight && offset.size(3) == outputWidth), + "invalid spatial size of offset, expected height: %d width: %d, but " + "got height: %d width: %d", + outputHeight, outputWidth, offset.size(2), offset.size(3)); + + TORCH_CHECK((offset.size(1) == deformable_group * 2 * kH * kW), + "invalid number of channels of offset"); + + if (gradOutput != NULL) { + TORCH_CHECK( + gradOutput->size(dimf) == nOutputPlane, + "invalid number of gradOutput planes, expected: %d, but got: %d", + nOutputPlane, gradOutput->size(dimf)); + + TORCH_CHECK( + (gradOutput->size(dimh) == outputHeight && + gradOutput->size(dimw) == outputWidth), + "invalid size of gradOutput, expected height: %d width: %d , but " + "got height: %d width: %d", + outputHeight, outputWidth, gradOutput->size(dimh), + gradOutput->size(dimw)); + } +} + +void deform_conv_forward(Tensor input, Tensor weight, Tensor offset, + Tensor output, Tensor columns, Tensor ones, int kW, + int kH, int dW, int dH, 
int padW, int padH, + int dilationW, int dilationH, int group, + int deformable_group, int im2col_step) { + if (input.device().is_cuda()) { +#ifdef MMCV_WITH_CUDA + CHECK_CUDA_INPUT(input); + CHECK_CUDA_INPUT(offset); + CHECK_CUDA_INPUT(weight); + CHECK_CUDA_INPUT(output); + CHECK_CUDA_INPUT(columns); + CHECK_CUDA_INPUT(ones); +#else + AT_ERROR("DeformConv is not compiled with GPU support"); +#endif + } else { + CHECK_CPU_INPUT(input); + CHECK_CPU_INPUT(offset); + CHECK_CPU_INPUT(weight); + CHECK_CPU_INPUT(output); + CHECK_CPU_INPUT(columns); + CHECK_CPU_INPUT(ones); + } + + deform_conv_shape_check(input, offset, NULL, weight, kH, kW, dH, dW, padH, + padW, dilationH, dilationW, group, deformable_group); + at::DeviceGuard guard(input.device()); + + int batch = 1; + if (input.ndimension() == 3) { + // Force batch + batch = 0; + input.unsqueeze_(0); + offset.unsqueeze_(0); + } + + // todo: assert batchsize dividable by im2col_step + + long batchSize = input.size(0); + long nInputPlane = input.size(1); + long inputHeight = input.size(2); + long inputWidth = input.size(3); + + long nOutputPlane = weight.size(0); + + long outputWidth = + (inputWidth + 2 * padW - (dilationW * (kW - 1) + 1)) / dW + 1; + long outputHeight = + (inputHeight + 2 * padH - (dilationH * (kH - 1) + 1)) / dH + 1; + + TORCH_CHECK((offset.size(0) == batchSize), "invalid batch size of offset"); + + output = output.view({batchSize / im2col_step, im2col_step, nOutputPlane, + outputHeight, outputWidth}); + columns = at::zeros( + {nInputPlane * kW * kH, im2col_step * outputHeight * outputWidth}, + input.options()); + + if (ones.ndimension() != 2 || + ones.size(0) * ones.size(1) < outputHeight * outputWidth) { + ones = at::ones({outputHeight, outputWidth}, input.options()); + } + + input = input.view({batchSize / im2col_step, im2col_step, nInputPlane, + inputHeight, inputWidth}); + offset = + offset.view({batchSize / im2col_step, im2col_step, + deformable_group * 2 * kH * kW, outputHeight, outputWidth}); + 
+ Tensor output_buffer = at::zeros({batchSize / im2col_step, nOutputPlane, + im2col_step * outputHeight, outputWidth}, + output.options()); + + output_buffer = output_buffer.view( + {output_buffer.size(0), group, output_buffer.size(1) / group, + output_buffer.size(2), output_buffer.size(3)}); + + for (int elt = 0; elt < batchSize / im2col_step; elt++) { + deformable_im2col_impl(input[elt], offset[elt], nInputPlane, inputHeight, + inputWidth, kH, kW, padH, padW, dH, dW, dilationH, + dilationW, im2col_step, deformable_group, columns); + + columns = columns.view({group, columns.size(0) / group, columns.size(1)}); + weight = weight.view({group, weight.size(0) / group, weight.size(1), + weight.size(2), weight.size(3)}); + + for (int g = 0; g < group; g++) { + output_buffer[elt][g] = output_buffer[elt][g] + .flatten(1) + .addmm_(weight[g].flatten(1), columns[g]) + .view_as(output_buffer[elt][g]); + } + columns = + columns.view({columns.size(0) * columns.size(1), columns.size(2)}); + weight = weight.view({weight.size(0) * weight.size(1), weight.size(2), + weight.size(3), weight.size(4)}); + } + + output_buffer = output_buffer.view( + {output_buffer.size(0), output_buffer.size(1) * output_buffer.size(2), + output_buffer.size(3), output_buffer.size(4)}); + + output_buffer = output_buffer.view({batchSize / im2col_step, nOutputPlane, + im2col_step, outputHeight, outputWidth}); + output_buffer.transpose_(1, 2); + output.copy_(output_buffer); + output = output.view({batchSize, nOutputPlane, outputHeight, outputWidth}); + + input = input.view({batchSize, nInputPlane, inputHeight, inputWidth}); + offset = offset.view( + {batchSize, deformable_group * 2 * kH * kW, outputHeight, outputWidth}); + + if (batch == 0) { + output = output.view({nOutputPlane, outputHeight, outputWidth}); + input = input.view({nInputPlane, inputHeight, inputWidth}); + offset = offset.view({offset.size(1), offset.size(2), offset.size(3)}); + } +} + +void deform_conv_backward_input(Tensor input, Tensor 
offset, Tensor gradOutput, + Tensor gradInput, Tensor gradOffset, + Tensor weight, Tensor columns, int kW, int kH, + int dW, int dH, int padW, int padH, + int dilationW, int dilationH, int group, + int deformable_group, int im2col_step) { + if (input.device().is_cuda()) { +#ifdef MMCV_WITH_CUDA + CHECK_CUDA_INPUT(input); + CHECK_CUDA_INPUT(offset); + CHECK_CUDA_INPUT(gradOutput); + CHECK_CUDA_INPUT(gradInput); + CHECK_CUDA_INPUT(gradOffset); + CHECK_CUDA_INPUT(weight); + CHECK_CUDA_INPUT(columns); +#else + AT_ERROR("DeformConv is not compiled with GPU support"); +#endif + } else { + CHECK_CPU_INPUT(input); + CHECK_CPU_INPUT(offset); + CHECK_CPU_INPUT(gradOutput); + CHECK_CPU_INPUT(gradInput); + CHECK_CPU_INPUT(gradOffset); + CHECK_CPU_INPUT(weight); + CHECK_CPU_INPUT(columns); + } + deform_conv_shape_check(input, offset, &gradOutput, weight, kH, kW, dH, dW, + padH, padW, dilationH, dilationW, group, + deformable_group); + + at::DeviceGuard guard(input.device()); + + int batch = 1; + if (input.ndimension() == 3) { + // Force batch + batch = 0; + input = input.view({1, input.size(0), input.size(1), input.size(2)}); + offset = offset.view({1, offset.size(0), offset.size(1), offset.size(2)}); + gradOutput = gradOutput.view( + {1, gradOutput.size(0), gradOutput.size(1), gradOutput.size(2)}); + } + + long batchSize = input.size(0); + long nInputPlane = input.size(1); + long inputHeight = input.size(2); + long inputWidth = input.size(3); + + long nOutputPlane = weight.size(0); + + long outputWidth = + (inputWidth + 2 * padW - (dilationW * (kW - 1) + 1)) / dW + 1; + long outputHeight = + (inputHeight + 2 * padH - (dilationH * (kH - 1) + 1)) / dH + 1; + + TORCH_CHECK((offset.size(0) == batchSize), 3, "invalid batch size of offset"); + gradInput = gradInput.view({batchSize, nInputPlane, inputHeight, inputWidth}); + columns = at::zeros( + {nInputPlane * kW * kH, im2col_step * outputHeight * outputWidth}, + input.options()); + + // change order of grad output + gradOutput = 
gradOutput.view({batchSize / im2col_step, im2col_step, + nOutputPlane, outputHeight, outputWidth}); + gradOutput.transpose_(1, 2); + + gradInput = gradInput.view({batchSize / im2col_step, im2col_step, nInputPlane, + inputHeight, inputWidth}); + input = input.view({batchSize / im2col_step, im2col_step, nInputPlane, + inputHeight, inputWidth}); + gradOffset = gradOffset.view({batchSize / im2col_step, im2col_step, + deformable_group * 2 * kH * kW, outputHeight, + outputWidth}); + offset = + offset.view({batchSize / im2col_step, im2col_step, + deformable_group * 2 * kH * kW, outputHeight, outputWidth}); + + for (int elt = 0; elt < batchSize / im2col_step; elt++) { + // divide into groups + columns = columns.view({group, columns.size(0) / group, columns.size(1)}); + weight = weight.view({group, weight.size(0) / group, weight.size(1), + weight.size(2), weight.size(3)}); + gradOutput = gradOutput.view( + {gradOutput.size(0), group, gradOutput.size(1) / group, + gradOutput.size(2), gradOutput.size(3), gradOutput.size(4)}); + + for (int g = 0; g < group; g++) { + columns[g] = columns[g].addmm_(weight[g].flatten(1).transpose(0, 1), + gradOutput[elt][g].flatten(1), 0.0f, 1.0f); + } + + columns = + columns.view({columns.size(0) * columns.size(1), columns.size(2)}); + gradOutput = gradOutput.view( + {gradOutput.size(0), gradOutput.size(1) * gradOutput.size(2), + gradOutput.size(3), gradOutput.size(4), gradOutput.size(5)}); + + deformable_col2im_coord_impl(columns, input[elt], offset[elt], nInputPlane, + inputHeight, inputWidth, kH, kW, padH, padW, + dH, dW, dilationH, dilationW, im2col_step, + deformable_group, gradOffset[elt]); + + deformable_col2im_impl(columns, offset[elt], nInputPlane, inputHeight, + inputWidth, kH, kW, padH, padW, dH, dW, dilationH, + dilationW, im2col_step, deformable_group, + gradInput[elt]); + + weight = weight.view({weight.size(0) * weight.size(1), weight.size(2), + weight.size(3), weight.size(4)}); + } + + gradOutput.transpose_(1, 2); + gradOutput = + 
gradOutput.view({batchSize, nOutputPlane, outputHeight, outputWidth}); + + gradInput = gradInput.view({batchSize, nInputPlane, inputHeight, inputWidth}); + input = input.view({batchSize, nInputPlane, inputHeight, inputWidth}); + gradOffset = gradOffset.view( + {batchSize, deformable_group * 2 * kH * kW, outputHeight, outputWidth}); + offset = offset.view( + {batchSize, deformable_group * 2 * kH * kW, outputHeight, outputWidth}); + + if (batch == 0) { + gradOutput = gradOutput.view({nOutputPlane, outputHeight, outputWidth}); + input = input.view({nInputPlane, inputHeight, inputWidth}); + gradInput = gradInput.view({nInputPlane, inputHeight, inputWidth}); + offset = offset.view({offset.size(1), offset.size(2), offset.size(3)}); + gradOffset = + gradOffset.view({offset.size(1), offset.size(2), offset.size(3)}); + } +} + +void deform_conv_backward_parameters(Tensor input, Tensor offset, + Tensor gradOutput, Tensor gradWeight, + Tensor columns, Tensor ones, int kW, + int kH, int dW, int dH, int padW, int padH, + int dilationW, int dilationH, int group, + int deformable_group, float scale, + int im2col_step) { + if (input.device().is_cuda()) { +#ifdef MMCV_WITH_CUDA + CHECK_CUDA_INPUT(input); + CHECK_CUDA_INPUT(offset); + CHECK_CUDA_INPUT(gradOutput); + CHECK_CUDA_INPUT(gradWeight); + CHECK_CUDA_INPUT(columns); + CHECK_CUDA_INPUT(ones); +#else + AT_ERROR("DeformConv is not compiled with GPU support"); +#endif + } else { + CHECK_CPU_INPUT(input); + CHECK_CPU_INPUT(offset); + CHECK_CPU_INPUT(gradOutput); + CHECK_CPU_INPUT(gradWeight); + CHECK_CPU_INPUT(columns); + CHECK_CPU_INPUT(ones); + } + + deform_conv_shape_check(input, offset, &gradOutput, gradWeight, kH, kW, dH, + dW, padH, padW, dilationH, dilationW, group, + deformable_group); + at::DeviceGuard guard(input.device()); + + int batch = 1; + + if (input.ndimension() == 3) { + // Force batch + batch = 0; + input = input.view( + at::IntList({1, input.size(0), input.size(1), input.size(2)})); + gradOutput = 
gradOutput.view( + {1, gradOutput.size(0), gradOutput.size(1), gradOutput.size(2)}); + } + + long batchSize = input.size(0); + long nInputPlane = input.size(1); + long inputHeight = input.size(2); + long inputWidth = input.size(3); + + long nOutputPlane = gradWeight.size(0); + + long outputWidth = + (inputWidth + 2 * padW - (dilationW * (kW - 1) + 1)) / dW + 1; + long outputHeight = + (inputHeight + 2 * padH - (dilationH * (kH - 1) + 1)) / dH + 1; + + TORCH_CHECK((offset.size(0) == batchSize), "invalid batch size of offset"); + + columns = at::zeros( + {nInputPlane * kW * kH, im2col_step * outputHeight * outputWidth}, + input.options()); + + gradOutput = gradOutput.view({batchSize / im2col_step, im2col_step, + nOutputPlane, outputHeight, outputWidth}); + gradOutput.transpose_(1, 2); + + Tensor gradOutputBuffer = at::zeros_like(gradOutput); + gradOutputBuffer = + gradOutputBuffer.view({batchSize / im2col_step, nOutputPlane, im2col_step, + outputHeight, outputWidth}); + gradOutputBuffer = gradOutputBuffer.contiguous(); + gradOutputBuffer.copy_(gradOutput); + gradOutputBuffer = + gradOutputBuffer.view({batchSize / im2col_step, nOutputPlane, + im2col_step * outputHeight, outputWidth}); + + gradOutput.transpose_(1, 2); + gradOutput = + gradOutput.view({batchSize, nOutputPlane, outputHeight, outputWidth}); + + input = input.view({batchSize / im2col_step, im2col_step, nInputPlane, + inputHeight, inputWidth}); + offset = + offset.view({batchSize / im2col_step, im2col_step, + deformable_group * 2 * kH * kW, outputHeight, outputWidth}); + + for (int elt = 0; elt < batchSize / im2col_step; elt++) { + deformable_im2col_impl(input[elt], offset[elt], nInputPlane, inputHeight, + inputWidth, kH, kW, padH, padW, dH, dW, dilationH, + dilationW, im2col_step, deformable_group, columns); + + // divide into group + gradOutputBuffer = gradOutputBuffer.view( + {gradOutputBuffer.size(0), group, gradOutputBuffer.size(1) / group, + gradOutputBuffer.size(2), gradOutputBuffer.size(3)}); + 
columns = columns.view({group, columns.size(0) / group, columns.size(1)}); + gradWeight = + gradWeight.view({group, gradWeight.size(0) / group, gradWeight.size(1), + gradWeight.size(2), gradWeight.size(3)}); + + for (int g = 0; g < group; g++) { + gradWeight[g] = gradWeight[g] + .flatten(1) + .addmm_(gradOutputBuffer[elt][g].flatten(1), + columns[g].transpose(1, 0), 1.0, scale) + .view_as(gradWeight[g]); + } + gradOutputBuffer = gradOutputBuffer.view( + {gradOutputBuffer.size(0), + gradOutputBuffer.size(1) * gradOutputBuffer.size(2), + gradOutputBuffer.size(3), gradOutputBuffer.size(4)}); + columns = + columns.view({columns.size(0) * columns.size(1), columns.size(2)}); + gradWeight = gradWeight.view({gradWeight.size(0) * gradWeight.size(1), + gradWeight.size(2), gradWeight.size(3), + gradWeight.size(4)}); + } + + input = input.view({batchSize, nInputPlane, inputHeight, inputWidth}); + offset = offset.view( + {batchSize, deformable_group * 2 * kH * kW, outputHeight, outputWidth}); + + if (batch == 0) { + gradOutput = gradOutput.view({nOutputPlane, outputHeight, outputWidth}); + input = input.view({nInputPlane, inputHeight, inputWidth}); + } +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/deform_roi_pool.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/deform_roi_pool.cpp new file mode 100644 index 0000000000000000000000000000000000000000..4fb78a96e74f7e97dff5212bb767eab743f2e73c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/deform_roi_pool.cpp @@ -0,0 +1,42 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void deform_roi_pool_forward_impl(Tensor input, Tensor rois, Tensor offset, + Tensor output, int pooled_height, + int pooled_width, float spatial_scale, + int sampling_ratio, float gamma) { + DISPATCH_DEVICE_IMPL(deform_roi_pool_forward_impl, input, rois, offset, + output, pooled_height, pooled_width, spatial_scale, + sampling_ratio, gamma); +} + +void deform_roi_pool_backward_impl(Tensor grad_output, Tensor input, + Tensor rois, Tensor offset, + Tensor grad_input, Tensor grad_offset, + int pooled_height, int pooled_width, + float spatial_scale, int sampling_ratio, + float gamma) { + DISPATCH_DEVICE_IMPL(deform_roi_pool_backward_impl, grad_output, input, rois, + offset, grad_input, grad_offset, pooled_height, + pooled_width, spatial_scale, sampling_ratio, gamma); +} + +void deform_roi_pool_forward(Tensor input, Tensor rois, Tensor offset, + Tensor output, int pooled_height, int pooled_width, + float spatial_scale, int sampling_ratio, + float gamma) { + deform_roi_pool_forward_impl(input, rois, offset, output, pooled_height, + pooled_width, spatial_scale, sampling_ratio, + gamma); +} + +void deform_roi_pool_backward(Tensor grad_output, Tensor input, Tensor rois, + Tensor offset, Tensor grad_input, + Tensor grad_offset, int pooled_height, + int pooled_width, float spatial_scale, + int sampling_ratio, float gamma) { + deform_roi_pool_backward_impl(grad_output, input, rois, offset, grad_input, + grad_offset, pooled_height, pooled_width, + spatial_scale, sampling_ratio, gamma); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/diff_iou_rotated.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/diff_iou_rotated.cpp new file mode 100644 index 0000000000000000000000000000000000000000..2361b7fbe5c86fa62a0fa78f39f6d018de108f8f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/diff_iou_rotated.cpp @@ -0,0 +1,14 @@ +// Copyright (c) 
OpenMMLab. All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +Tensor diff_iou_rotated_sort_vertices_forward_impl(Tensor vertices, Tensor mask, + Tensor num_valid) { + return DISPATCH_DEVICE_IMPL(diff_iou_rotated_sort_vertices_forward_impl, + vertices, mask, num_valid); +} + +Tensor diff_iou_rotated_sort_vertices_forward(Tensor vertices, Tensor mask, + Tensor num_valid) { + return diff_iou_rotated_sort_vertices_forward_impl(vertices, mask, num_valid); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/focal_loss.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/focal_loss.cpp new file mode 100644 index 0000000000000000000000000000000000000000..ed0e2186532d9d6d909f76d653283bbdc29eac11 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/focal_loss.cpp @@ -0,0 +1,53 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void sigmoid_focal_loss_forward_impl(Tensor input, Tensor target, Tensor weight, + Tensor output, float gamma, float alpha) { + DISPATCH_DEVICE_IMPL(sigmoid_focal_loss_forward_impl, input, target, weight, + output, gamma, alpha); +} + +void sigmoid_focal_loss_backward_impl(Tensor input, Tensor target, + Tensor weight, Tensor grad_input, + float gamma, float alpha) { + DISPATCH_DEVICE_IMPL(sigmoid_focal_loss_backward_impl, input, target, weight, + grad_input, gamma, alpha); +} + +void softmax_focal_loss_forward_impl(Tensor input, Tensor target, Tensor weight, + Tensor output, float gamma, float alpha) { + DISPATCH_DEVICE_IMPL(softmax_focal_loss_forward_impl, input, target, weight, + output, gamma, alpha); +} + +void softmax_focal_loss_backward_impl(Tensor input, Tensor target, + Tensor weight, Tensor buff, + Tensor grad_input, float gamma, + float alpha) { + DISPATCH_DEVICE_IMPL(softmax_focal_loss_backward_impl, input, target, weight, + buff, grad_input, gamma, alpha); +} + +void 
sigmoid_focal_loss_forward(Tensor input, Tensor target, Tensor weight, + Tensor output, float gamma, float alpha) { + sigmoid_focal_loss_forward_impl(input, target, weight, output, gamma, alpha); +} + +void sigmoid_focal_loss_backward(Tensor input, Tensor target, Tensor weight, + Tensor grad_input, float gamma, float alpha) { + sigmoid_focal_loss_backward_impl(input, target, weight, grad_input, gamma, + alpha); +} + +void softmax_focal_loss_forward(Tensor input, Tensor target, Tensor weight, + Tensor output, float gamma, float alpha) { + softmax_focal_loss_forward_impl(input, target, weight, output, gamma, alpha); +} + +void softmax_focal_loss_backward(Tensor input, Tensor target, Tensor weight, + Tensor buff, Tensor grad_input, float gamma, + float alpha) { + softmax_focal_loss_backward_impl(input, target, weight, buff, grad_input, + gamma, alpha); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/furthest_point_sample.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/furthest_point_sample.cpp new file mode 100644 index 0000000000000000000000000000000000000000..9c7098acdb5b8392a698803dd7c7d34a360df6ad --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/furthest_point_sample.cpp @@ -0,0 +1,34 @@ +// Modified from +// https://github.com/sshaoshuai/Pointnet2.PyTorch/tree/master/pointnet2/src/sampling.cpp + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void furthest_point_sampling_forward_impl(Tensor points_tensor, + Tensor temp_tensor, Tensor idx_tensor, + int b, int n, int m) { + DISPATCH_DEVICE_IMPL(furthest_point_sampling_forward_impl, points_tensor, + temp_tensor, idx_tensor, b, n, m); +} + +void furthest_point_sampling_with_dist_forward_impl(Tensor points_tensor, + Tensor temp_tensor, + Tensor idx_tensor, int b, + int n, int m) { + DISPATCH_DEVICE_IMPL(furthest_point_sampling_with_dist_forward_impl, + points_tensor, temp_tensor, idx_tensor, b, n, m); +} + +void 
furthest_point_sampling_forward(Tensor points_tensor, Tensor temp_tensor, + Tensor idx_tensor, int b, int n, int m) { + furthest_point_sampling_forward_impl(points_tensor, temp_tensor, idx_tensor, + b, n, m); +} + +void furthest_point_sampling_with_dist_forward(Tensor points_tensor, + Tensor temp_tensor, + Tensor idx_tensor, int b, int n, + int m) { + furthest_point_sampling_with_dist_forward_impl(points_tensor, temp_tensor, + idx_tensor, b, n, m); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/fused_bias_leakyrelu.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/fused_bias_leakyrelu.cpp new file mode 100644 index 0000000000000000000000000000000000000000..8d411c9d843f15174653aab4b24cbb3c37564073 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/fused_bias_leakyrelu.cpp @@ -0,0 +1,119 @@ +// Modified from +// https://github.com/rosinality/stylegan2-pytorch/blob/master/op/fused_bias_act.cpp + +/* +Copyright (c) 2021, NVIDIA Corporation. All rights reserved. + +NVIDIA Source Code License for StyleGAN2 with Adaptive Discriminator +Augmentation (ADA) +======================================================================= + +1. Definitions + +"Licensor" means any person or entity that distributes its Work. + +"Software" means the original work of authorship made available under +this License. + +"Work" means the Software and any additions to or derivative works of +the Software that are made available under this License. + +The terms "reproduce," "reproduction," "derivative works," and +"distribution" have the meaning as provided under U.S. copyright law; +provided, however, that for the purposes of this License, derivative +works shall not include works that remain separable from, or merely +link (or bind by name) to the interfaces of, the Work. 
+ +Works, including the Software, are "made available" under this License +by including in or with the Work either (a) a copyright notice +referencing the applicability of this License to the Work, or (b) a +copy of this License. + +2. License Grants + + 2.1 Copyright Grant. Subject to the terms and conditions of this + License, each Licensor grants to you a perpetual, worldwide, + non-exclusive, royalty-free, copyright license to reproduce, + prepare derivative works of, publicly display, publicly perform, + sublicense and distribute its Work and any resulting derivative + works in any form. + +3. Limitations + + 3.1 Redistribution. You may reproduce or distribute the Work only + if (a) you do so under this License, (b) you include a complete + copy of this License with your distribution, and (c) you retain + without modification any copyright, patent, trademark, or + attribution notices that are present in the Work. + + 3.2 Derivative Works. You may specify that additional or different + terms apply to the use, reproduction, and distribution of your + derivative works of the Work ("Your Terms") only if (a) Your Terms + provide that the use limitation in Section 3.3 applies to your + derivative works, and (b) you identify the specific derivative + works that are subject to Your Terms. Notwithstanding Your Terms, + this License (including the redistribution requirements in Section + 3.1) will continue to apply to the Work itself. + + 3.3 Use Limitation. The Work and any derivative works thereof only + may be used or intended for use non-commercially. Notwithstanding + the foregoing, NVIDIA and its affiliates may use the Work and any + derivative works commercially. As used herein, "non-commercially" + means for research or evaluation purposes only. + + 3.4 Patent Claims. 
If you bring or threaten to bring a patent claim + against any Licensor (including any claim, cross-claim or + counterclaim in a lawsuit) to enforce any patents that you allege + are infringed by any Work, then your rights under this License from + such Licensor (including the grant in Section 2.1) will terminate + immediately. + + 3.5 Trademarks. This License does not grant any rights to use any + Licensor’s or its affiliates’ names, logos, or trademarks, except + as necessary to reproduce the notices described in this License. + + 3.6 Termination. If you violate any term of this License, then your + rights under this License (including the grant in Section 2.1) will + terminate immediately. + +4. Disclaimer of Warranty. + +THE WORK IS PROVIDED "AS IS" WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WARRANTIES OR CONDITIONS OF +MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR +NON-INFRINGEMENT. YOU BEAR THE RISK OF UNDERTAKING ANY ACTIVITIES UNDER +THIS LICENSE. + +5. Limitation of Liability. + +EXCEPT AS PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL +THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE +SHALL ANY LICENSOR BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT, +INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF +OR RELATED TO THIS LICENSE, THE USE OR INABILITY TO USE THE WORK +(INCLUDING BUT NOT LIMITED TO LOSS OF GOODWILL, BUSINESS INTERRUPTION, +LOST PROFITS OR DATA, COMPUTER FAILURE OR MALFUNCTION, OR ANY OTHER +COMMERCIAL DAMAGES OR LOSSES), EVEN IF THE LICENSOR HAS BEEN ADVISED OF +THE POSSIBILITY OF SUCH DAMAGES. 
+ +======================================================================= +*/ + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +torch::Tensor fused_bias_leakyrelu_op_impl(const torch::Tensor& input, + const torch::Tensor& bias, + const torch::Tensor& refer, int act, + int grad, float alpha, float scale) { + return DISPATCH_DEVICE_IMPL(fused_bias_leakyrelu_op_impl, input, bias, refer, + act, grad, alpha, scale); +} + +torch::Tensor fused_bias_leakyrelu(const torch::Tensor& input, + const torch::Tensor& bias, + const torch::Tensor& refer, int act, + int grad, float alpha, float scale) { + return fused_bias_leakyrelu_op_impl(input, bias, refer, act, grad, alpha, + scale); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/fused_spconv_ops.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/fused_spconv_ops.cpp new file mode 100644 index 0000000000000000000000000000000000000000..54073a54ec5d335d2e2ed68c553eb1d6eb49557b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/fused_spconv_ops.cpp @@ -0,0 +1,34 @@ +// Copyright 2019 Yan Yan +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +torch::Tensor fused_indice_conv_batchnorm_forward_impl( + torch::Tensor features, torch::Tensor filters, torch::Tensor bias, + torch::Tensor indicePairs, torch::Tensor indiceNum, int64_t numActOut, + int64_t _inverse, int64_t _subM) { + return DISPATCH_DEVICE_IMPL(fused_indice_conv_batchnorm_forward_impl, + features, filters, bias, indicePairs, indiceNum, + numActOut, _inverse, _subM); +} + +torch::Tensor fused_indice_conv_batchnorm_forward( + torch::Tensor features, torch::Tensor filters, torch::Tensor bias, + torch::Tensor indicePairs, torch::Tensor indiceNum, int64_t numActOut, + int64_t _inverse, int64_t _subM) { + return fused_indice_conv_batchnorm_forward_impl(features, filters, bias, + indicePairs, indiceNum, + numActOut, _inverse, _subM); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/gather_points.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/gather_points.cpp new file mode 100644 index 0000000000000000000000000000000000000000..b8fb020022902bfbeb5ba940621d51859c616bdc --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/gather_points.cpp @@ -0,0 +1,30 @@ +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void gather_points_forward_impl(int b, int c, int n, int npoints, + const Tensor points, const Tensor idx, + Tensor out) { + DISPATCH_DEVICE_IMPL(gather_points_forward_impl, b, c, n, npoints, points, + idx, out); +} + +void gather_points_backward_impl(int b, int c, int n, int npoints, + const Tensor grad_out, const Tensor idx, + Tensor grad_points) { + DISPATCH_DEVICE_IMPL(gather_points_backward_impl, b, c, n, npoints, grad_out, + idx, grad_points); +} + +void gather_points_forward(Tensor points_tensor, Tensor idx_tensor, + Tensor out_tensor, int b, int c, int n, + int npoints) { + gather_points_forward_impl(b, c, n, npoints, points_tensor, idx_tensor, + out_tensor); +} + +void 
gather_points_backward(Tensor grad_out_tensor, Tensor idx_tensor, + Tensor grad_points_tensor, int b, int c, int n, + int npoints) { + gather_points_backward_impl(b, c, n, npoints, grad_out_tensor, idx_tensor, + grad_points_tensor); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/group_points.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/group_points.cpp new file mode 100644 index 0000000000000000000000000000000000000000..850deed9866a63604e8c1171dc6c485ffad62c72 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/group_points.cpp @@ -0,0 +1,76 @@ +// Copyright (c) OpenMMLab. All rights reserved. +// Modified from +// https://github.com/sshaoshuai/Pointnet2.PyTorch/tree/master/pointnet2/src/group_points.cpp + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void group_points_forward_impl(int b, int c, int n, int npoints, int nsample, + const Tensor points, const Tensor idx, + Tensor out) { + DISPATCH_DEVICE_IMPL(group_points_forward_impl, b, c, n, npoints, nsample, + points, idx, out); +} + +void group_points_backward_impl(int b, int c, int n, int npoints, int nsample, + const Tensor grad_out, const Tensor idx, + Tensor grad_points) { + DISPATCH_DEVICE_IMPL(group_points_backward_impl, b, c, n, npoints, nsample, + grad_out, idx, grad_points); +} + +void group_points_forward(Tensor points_tensor, Tensor idx_tensor, + Tensor out_tensor, int b, int c, int n, int npoints, + int nsample) { + DISPATCH_DEVICE_IMPL(group_points_forward_impl, b, c, n, npoints, nsample, + points_tensor, idx_tensor, out_tensor); +} + +void group_points_backward(Tensor grad_out_tensor, Tensor idx_tensor, + Tensor grad_points_tensor, int b, int c, int n, + int npoints, int nsample) { + group_points_backward_impl(b, c, n, npoints, nsample, grad_out_tensor, + idx_tensor, grad_points_tensor); +} + +void stack_group_points_backward_impl(int b, int c, int m, int n, int nsample, + const Tensor grad_out_tensor, + const 
Tensor idx_tensor, + const Tensor idx_batch_cnt_tensor, + const Tensor features_batch_cnt_tensor, + Tensor grad_features_tensor) { + DISPATCH_DEVICE_IMPL(stack_group_points_backward_impl, b, c, m, n, nsample, + grad_out_tensor, idx_tensor, idx_batch_cnt_tensor, + features_batch_cnt_tensor, grad_features_tensor); +} + +void stack_group_points_backward(Tensor grad_out_tensor, Tensor idx_tensor, + Tensor idx_batch_cnt_tensor, + Tensor features_batch_cnt_tensor, + Tensor grad_features_tensor, int b, int c, + int m, int n, int nsample) { + stack_group_points_backward_impl( + b, c, m, n, nsample, grad_out_tensor, idx_tensor, idx_batch_cnt_tensor, + features_batch_cnt_tensor, grad_features_tensor); +} + +void stack_group_points_forward_impl(int b, int c, int m, int nsample, + const Tensor features_tensor, + const Tensor features_batch_cnt_tensor, + const Tensor idx_tensor, + const Tensor idx_batch_cnt_tensor, + Tensor out_tensor) { + DISPATCH_DEVICE_IMPL(stack_group_points_forward_impl, b, c, m, nsample, + features_tensor, features_batch_cnt_tensor, idx_tensor, + idx_batch_cnt_tensor, out_tensor); +} + +void stack_group_points_forward(Tensor features_tensor, + Tensor features_batch_cnt_tensor, + Tensor idx_tensor, Tensor idx_batch_cnt_tensor, + Tensor out_tensor, int b, int c, int m, + int nsample) { + DISPATCH_DEVICE_IMPL(stack_group_points_forward_impl, b, c, m, nsample, + features_tensor, features_batch_cnt_tensor, idx_tensor, + idx_batch_cnt_tensor, out_tensor); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/info.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/info.cpp new file mode 100644 index 0000000000000000000000000000000000000000..a4cc41861128dc0a8f8ccd641f68044428c4dc2c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/info.cpp @@ -0,0 +1,65 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +// modified from +// https://github.com/facebookresearch/detectron2/blob/master/detectron2/layers/csrc/vision.cpp +#include "pytorch_cpp_helper.hpp" + +#ifdef MMCV_WITH_CUDA +#ifdef MMCV_WITH_HIP +#include <hip/hip_runtime_api.h> +int get_hiprt_version() { + int runtimeVersion; + hipRuntimeGetVersion(&runtimeVersion); + return runtimeVersion; +} +#else +#include <cuda_runtime_api.h> +int get_cudart_version() { return CUDART_VERSION; } +#endif +#endif + +std::string get_compiling_cuda_version() { +#ifdef MMCV_WITH_CUDA +#ifndef MMCV_WITH_HIP + std::ostringstream oss; + // copied from + // https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/cuda/detail/CUDAHooks.cpp#L231 + auto printCudaStyleVersion = [&](int v) { + oss << (v / 1000) << "." << (v / 10 % 100); + if (v % 10 != 0) { + oss << "." << (v % 10); + } + }; + printCudaStyleVersion(get_cudart_version()); + return oss.str(); +#else + std::ostringstream oss; + oss << get_hiprt_version(); + return oss.str(); +#endif +#else + return std::string("not available"); +#endif +} + +// similar to +// https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/Version.cpp +std::string get_compiler_version() { + std::ostringstream ss; +#if defined(__GNUC__) +#ifndef __clang__ + { ss << "GCC " << __GNUC__ << "." << __GNUC_MINOR__; } +#endif +#endif + +#if defined(__clang_major__) + { + ss << "clang " << __clang_major__ << "." << __clang_minor__ << "." 
+ << __clang_patchlevel__; + } +#endif + +#if defined(_MSC_VER) + { ss << "MSVC " << _MSC_FULL_VER; } +#endif + return ss.str(); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/iou3d.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/iou3d.cpp new file mode 100644 index 0000000000000000000000000000000000000000..a347c0ee96db9ceefd6168c3cce84bea243e7044 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/iou3d.cpp @@ -0,0 +1,66 @@ +// Modified from +// https://github.com/open-mmlab/OpenPCDet/blob/master/pcdet/ops/iou3d_nms/src/iou3d_nms.cpp + +/* +3D IoU Calculation and Rotated NMS(modified from 2D NMS written by others) +Written by Shaoshuai Shi +All Rights Reserved 2019-2020. +*/ + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +const int THREADS_PER_BLOCK_NMS = sizeof(unsigned long long) * 8; + +void iou3d_boxes_overlap_bev_forward_impl(const int num_a, const Tensor boxes_a, + const int num_b, const Tensor boxes_b, + Tensor ans_overlap) { + DISPATCH_DEVICE_IMPL(iou3d_boxes_overlap_bev_forward_impl, num_a, boxes_a, + num_b, boxes_b, ans_overlap); +} + +void iou3d_nms3d_forward_impl(const Tensor boxes, Tensor &keep, + Tensor &keep_num, float nms_overlap_thresh) { + DISPATCH_DEVICE_IMPL(iou3d_nms3d_forward_impl, boxes, keep, keep_num, + nms_overlap_thresh); +} + +void iou3d_nms3d_normal_forward_impl(const Tensor boxes, Tensor &keep, + Tensor &keep_num, + float nms_overlap_thresh) { + DISPATCH_DEVICE_IMPL(iou3d_nms3d_normal_forward_impl, boxes, keep, keep_num, + nms_overlap_thresh); +} + +void iou3d_boxes_overlap_bev_forward(Tensor boxes_a, Tensor boxes_b, + Tensor ans_overlap) { + // params boxes: (N, 7) [x, y, z, dx, dy, dz, heading] + // params boxes_b: (M, 5) + // params ans_overlap: (N, M) + int num_a = boxes_a.size(0); + int num_b = boxes_b.size(0); + + iou3d_boxes_overlap_bev_forward_impl(num_a, boxes_a, num_b, boxes_b, + ans_overlap); +} + +void iou3d_nms3d_forward(Tensor boxes, 
Tensor keep, Tensor keep_num, + float nms_overlap_thresh) { + // params boxes: (N, 7) [x, y, z, dx, dy, dz, heading] + // params keep: (N) + CHECK_CONTIGUOUS(boxes); + CHECK_CONTIGUOUS(keep); + + iou3d_nms3d_forward_impl(boxes, keep, keep_num, nms_overlap_thresh); +} + +void iou3d_nms3d_normal_forward(Tensor boxes, Tensor keep, Tensor keep_num, + float nms_overlap_thresh) { + // params boxes: (N, 7) [x, y, z, dx, dy, dz, heading] + // params keep: (N) + + CHECK_CONTIGUOUS(boxes); + CHECK_CONTIGUOUS(keep); + + iou3d_nms3d_normal_forward_impl(boxes, keep, keep_num, nms_overlap_thresh); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/knn.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/knn.cpp new file mode 100644 index 0000000000000000000000000000000000000000..b4be9428c59c0f04635891b954f4c73f7fb0536d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/knn.cpp @@ -0,0 +1,17 @@ +// Modified from +// https://github.com/CVMI-Lab/PAConv/tree/main/scene_seg/lib/pointops/src/knnquery_heap + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void knn_forward_impl(int b, int n, int m, int nsample, const Tensor xyz, + const Tensor new_xyz, Tensor idx, Tensor dist2) { + DISPATCH_DEVICE_IMPL(knn_forward_impl, b, n, m, nsample, xyz, new_xyz, idx, + dist2); +} + +void knn_forward(Tensor xyz_tensor, Tensor new_xyz_tensor, Tensor idx_tensor, + Tensor dist2_tensor, int b, int n, int m, int nsample) { + knn_forward_impl(b, n, m, nsample, xyz_tensor, new_xyz_tensor, idx_tensor, + dist2_tensor); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/masked_conv2d.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/masked_conv2d.cpp new file mode 100644 index 0000000000000000000000000000000000000000..5903925351fcb193b86c8b5f01b410e4fc0bbaf9 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/masked_conv2d.cpp @@ -0,0 +1,33 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void masked_im2col_forward_impl(const Tensor im, const Tensor mask_h_idx, + const Tensor mask_w_idx, Tensor col, + const int kernel_h, const int kernel_w, + const int pad_h, const int pad_w) { + DISPATCH_DEVICE_IMPL(masked_im2col_forward_impl, im, mask_h_idx, mask_w_idx, + col, kernel_h, kernel_w, pad_h, pad_w); +} + +void masked_col2im_forward_impl(const Tensor col, const Tensor mask_h_idx, + const Tensor mask_w_idx, Tensor im, int height, + int width, int channels) { + DISPATCH_DEVICE_IMPL(masked_col2im_forward_impl, col, mask_h_idx, mask_w_idx, + im, height, width, channels); +} + +void masked_im2col_forward(const Tensor im, const Tensor mask_h_idx, + const Tensor mask_w_idx, Tensor col, + const int kernel_h, const int kernel_w, + const int pad_h, const int pad_w) { + masked_im2col_forward_impl(im, mask_h_idx, mask_w_idx, col, kernel_h, + kernel_w, pad_h, pad_w); +} + +void masked_col2im_forward(const Tensor col, const Tensor mask_h_idx, + const Tensor mask_w_idx, Tensor im, int height, + int width, int channels) { + masked_col2im_forward_impl(col, mask_h_idx, mask_w_idx, im, height, width, + channels); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/min_area_polygons.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/min_area_polygons.cpp new file mode 100644 index 0000000000000000000000000000000000000000..8ff996dc8992b4c95633516054ecdba5913de8f3 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/min_area_polygons.cpp @@ -0,0 +1,11 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void min_area_polygons_impl(const Tensor pointsets, Tensor polygons) { + DISPATCH_DEVICE_IMPL(min_area_polygons_impl, pointsets, polygons); +} + +void min_area_polygons(const Tensor pointsets, Tensor polygons) { + min_area_polygons_impl(pointsets, polygons); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/bbox_overlaps_mlu.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/bbox_overlaps_mlu.cpp new file mode 100644 index 0000000000000000000000000000000000000000..82d55559c52047bfd82c3813c995e3d0ae0c24c0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/bbox_overlaps_mlu.cpp @@ -0,0 +1,100 @@ +/************************************************************************* + * Copyright (C) 2021 Cambricon. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
+ *************************************************************************/ + +#include "pytorch_device_registry.hpp" +#include "pytorch_mlu_helper.hpp" + +void KernelBBoxOverlaps(cnrtDim3_t k_dim, cnrtFunctionType_t k_type, + cnrtQueue_t queue, const cnrtDataType_t d_type, + const void *bbox1, const void *bbox2, void *ious, + const int32_t num_bbox1, const int32_t num_bbox2, + const int32_t mode, const bool aligned, + const int32_t offset); + +static void policyFunc(cnrtDim3_t *k_dim, cnrtFunctionType_t *k_type, + const int32_t batch_num_all) { + auto union_num = torch_mlu::getDeviceAttr(cnrtAttrClusterCount); + auto core_dim = torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster); + auto core_num = union_num * core_dim; + + // Union1 policyFunc + *k_type = CNRT_FUNC_TYPE_UNION1; + k_dim->x = core_dim; + auto need_core_num = PAD_UP(batch_num_all, core_dim); + k_dim->y = + (need_core_num < core_num) ? (need_core_num / core_dim) : union_num; + k_dim->z = 1; + + return; +} + +void BBoxOverlapsMLUKernelLauncher(const Tensor bboxes1, const Tensor bboxes2, + Tensor ious, const int32_t mode, + const bool aligned, const int32_t offset) { + // check dtype + TORCH_CHECK( + bboxes1.scalar_type() == at::kFloat || bboxes1.scalar_type() == at::kHalf, + "Data type of input should be Float or Half. 
But now input type is ", + bboxes1.scalar_type(), "."); + TORCH_CHECK(bboxes1.scalar_type() == bboxes2.scalar_type(), + "bboxes1's dtype should be the same with bboxes2's dtype."); + + // params check + TORCH_CHECK(bboxes1.dim() == 2, "bboxes1 should be a 2d tensor, got ", + bboxes1.dim(), "D"); + TORCH_CHECK(bboxes2.dim() == 2, "bboxes2 should be a 2d tensor, got ", + bboxes2.dim(), "D"); + + auto rows = bboxes1.size(0); + auto cols = bboxes2.size(0); + auto batch_num_all = rows; + + if (rows * cols == 0) { + // return if zero element + return; + } + + // calculate task dimension + cnrtDim3_t k_dim; + cnrtFunctionType_t k_type; + policyFunc(&k_dim, &k_type, batch_num_all); + + // get compute queue + cnrtQueue_t queue = torch_mlu::getCurQueue(); + + // get dtype of input + cnrtDataType_t d_type = torch_mlu::toCnrtDtype(bboxes1.dtype()); + + // get ptr of tensors + auto bboxes1_impl = torch_mlu::getMluTensorImpl(bboxes1); + auto bboxes1_ptr = bboxes1_impl->cnnlMalloc(); + auto bboxes2_impl = torch_mlu::getMluTensorImpl(bboxes2); + auto bboxes2_ptr = bboxes2_impl->cnnlMalloc(); + auto ious_impl = torch_mlu::getMluTensorImpl(ious); + auto ious_ptr = ious_impl->cnnlMalloc(); + + // launch kernel + CNLOG(INFO) << "Launch Kernel MLUUnion1BboxOverlapsKernel"; + CNLOG(INFO) << "kDim :[ " << k_dim.x << ", " << k_dim.y << ", " << k_dim.z + << " ]"; + KernelBBoxOverlaps(k_dim, k_type, queue, d_type, bboxes1_ptr, bboxes2_ptr, + ious_ptr, rows, cols, mode, aligned, offset); +} + +void bbox_overlaps_mlu(const Tensor bboxes1, const Tensor bboxes2, Tensor ious, + const int mode, const bool aligned, const int offset) { + BBoxOverlapsMLUKernelLauncher(bboxes1, bboxes2, ious, mode, aligned, offset); +} + +void bbox_overlaps_impl(const Tensor bboxes1, const Tensor bboxes2, Tensor ious, + const int mode, const bool aligned, const int offset); +REGISTER_DEVICE_IMPL(bbox_overlaps_impl, MLU, bbox_overlaps_mlu); diff --git 
a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/carafe_mlu.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/carafe_mlu.cpp new file mode 100644 index 0000000000000000000000000000000000000000..25e0b85d1245a669abbe5cd9e94402c3f9f7030e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/carafe_mlu.cpp @@ -0,0 +1,429 @@ +/************************************************************************* + * Copyright (C) 2022 Cambricon. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + *************************************************************************/ +#include "carafe_utils.hpp" +#include "pytorch_device_registry.hpp" +#include "pytorch_mlu_helper.hpp" + +void KernelCarafeForward(cnrtDim3_t k_dim, cnrtFunctionType_t k_type, + cnrtQueue_t queue, const cnrtDataType_t d_type, + const void *input, const void *mask, + const CarafeForwardParam ¶m, + const CarafeForwardBlockDim &block_dim, + const CarafeForwardGridDim &grid_dim, void *output); + +void KernelCarafeBackward(cnrtDim3_t k_dim, cnrtFunctionType_t k_type, + cnrtQueue_t queue, cnrtDataType_t dtype, + const void *input, const void *mask, + const void *grad_output, void *grad_input, + void *grad_mask, const int n, const int hi, + const int wi, const int c, const int k_up, + const int group, const int scale); + +// Get total NRAM usage and set strides of NRAM arrays. 
+static void getNramUsage(CarafeForwardParam *param, + CarafeForwardBlockDim *block_dim, int *nram_usage) { + // input_nram[blkDim_(Hi+Kh)-1, blkDim_(Wi+Kw)-1, blkDim_G, blkDim_Cg] + block_dim->Hi = CEIL_DIV(block_dim->Ho, param->scale_factor) + 1; + block_dim->Wi = CEIL_DIV(block_dim->Wo, param->scale_factor) + 1; + + param->input_nram_stride_g = PAD_UP(block_dim->Cg, param->align_size_NRAM); + param->input_nram_stride_w = param->input_nram_stride_g * block_dim->G; + param->input_nram_stride_h = + (block_dim->Wi + block_dim->Kw - 1) * param->input_nram_stride_w; + param->input_nram_size = + (block_dim->Hi + block_dim->Kh - 1) * param->input_nram_stride_h; + + // mask_nram[blkDim_Ho, blkDim_Wo, blkDim_G, blkDim_Kh, blkDim_Kw] + param->mask_nram_stride_kh = block_dim->Kw; + param->mask_nram_stride_g = block_dim->Kh * param->mask_nram_stride_kh; + param->mask_nram_stride_w = block_dim->G * param->mask_nram_stride_g; + param->mask_nram_stride_h = block_dim->Wo * param->mask_nram_stride_w; + param->mask_nram_size = + PAD_UP(block_dim->Ho * param->mask_nram_stride_h, param->align_size_NRAM); + + // output_nram[blkDim_Ho, blkDim_Wo, blkDim_(G*Cg)] + param->output_nram_stride_g = param->input_nram_stride_g; + param->output_nram_stride_w = + PAD_UP(param->input_nram_stride_w, param->align_size_NFU); + param->output_nram_stride_h = block_dim->Wo * param->output_nram_stride_w; + param->output_nram_size = block_dim->Ho * param->output_nram_stride_h; + + // sum_array[blkDim_(G*Cg)] + + // ensure the last mul_const on Cg does not exceed memory boundary + int sum_array_size_bang_mul_const = + (block_dim->G - 1) * param->input_nram_stride_g + + PAD_UP(param->input_nram_stride_g, param->align_size_NFU); + + int sum_array_size = + std::max(param->output_nram_stride_w, sum_array_size_bang_mul_const); + + *nram_usage = param->input_nram_size + param->mask_nram_size + + param->output_nram_size + sum_array_size; +} + +// Policy Function for Forward +static void 
genPolicyForward(CarafeForwardParam *param, + CarafeForwardBlockDim *block_dim, + CarafeForwardGridDim *grid_dim, cnrtDim3_t *k_dim, + cnrtFunctionType_t *k_type) { + // device info + auto core_dim = torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster); + auto cluster_num = torch_mlu::getDeviceAttr(cnrtAttrClusterCount); + auto core_num = core_dim * cluster_num; + + // maximum NRAM size as the number of + auto max_nram_size = + torch_mlu::getDeviceAttr(cnrtAttrNramSizePerMcore) / param->dtype_size; + + // determine grid and block dimensions + + // set initial values for block_dim and grid_dim + block_dim->Ho = param->Ho; + block_dim->Wo = param->Wo; + block_dim->Kh = param->kernel_size; + block_dim->Kw = param->kernel_size; + block_dim->G = param->group_size; + block_dim->Cg = param->Cg; + + grid_dim->Ho = 1; + grid_dim->Wo = 1; + grid_dim->Kh = 1; + grid_dim->Kw = 1; + grid_dim->G = 1; + grid_dim->Cg = 1; + + // decrease the block size to fit in the NRAM. + int nram_usage = 0; + while (true) { + getNramUsage(param, block_dim, &nram_usage); + + if (nram_usage > max_nram_size) { + // decrease Ho + // decrease block_Ho and block_Wo evenly + // so that the block is close to a square. 
+ if (block_dim->Ho > 1 && block_dim->Ho >= block_dim->Wo) { + grid_dim->Ho += 1; + block_dim->Ho = CEIL_DIV(param->Ho, grid_dim->Ho); + } else if (block_dim->Wo > 1 && block_dim->Wo > block_dim->Ho) { + // decrease Wo + grid_dim->Wo += 1; + block_dim->Wo = CEIL_DIV(param->Wo, grid_dim->Wo); + } else if (block_dim->Kh > 1) { + // decrease Kh + grid_dim->Kh += 1; + block_dim->Kh = CEIL_DIV(param->kernel_size, grid_dim->Kh); + // reset Hi, Wi to maximize NRAM usage + grid_dim->Ho = 1; + block_dim->Ho = param->Ho; + grid_dim->Wo = 1; + block_dim->Wo = param->Wo; + } else if (block_dim->Kw > 1) { + // decrease Kw + grid_dim->Kw += 1; + block_dim->Kw = CEIL_DIV(param->kernel_size, grid_dim->Kw); + // reset Kh + grid_dim->Kh = 1; + block_dim->Kh = param->kernel_size; + } else if (block_dim->G > 1) { + // decrease G + grid_dim->G += 1; + block_dim->G = CEIL_DIV(param->group_size, grid_dim->G); + // reset Kw + grid_dim->Kw = 1; + block_dim->Kw = param->kernel_size; + } else if (block_dim->Cg > 1) { + // decrease block_Cg + // This is done in the last since c is the continuous dim + // (input layout is NHWC) and large c can improve + // IO & compute efficiency. + grid_dim->Cg += 1; + block_dim->Cg = CEIL_DIV(param->Cg, grid_dim->Cg); + // reset G + grid_dim->G = 1; + block_dim->G = param->group_size; + } else { + // the block volume is one now, cannot decrease the block size anymore! + // this situation should not occur. 
+ break; + } + } else { + break; + } + } + + // define parameters depending on block_dim, grid_dim + param->block_Cg_NFU = PAD_UP(block_dim->Cg, param->align_size_NFU); + + // define host arrays' strides + + // input[N,H,W,G,Cg] + param->input_stride_g = param->Cg; + param->input_stride_w = param->Ci; + param->input_stride_h = param->Wi * param->input_stride_w; + param->input_stride_n = param->Hi * param->input_stride_h; + // mask[N,Ho,Wo,G,Kh,Kw] + param->mask_stride_kh = param->kernel_size; + param->mask_stride_g = param->kernel_size * param->mask_stride_kh; + param->mask_stride_w = param->group_size * param->mask_stride_g; + param->mask_stride_h = param->Wo * param->mask_stride_w; + param->mask_stride_n = param->Ho * param->mask_stride_h; + // output[N,Ho,Wo,G,Cg] + param->output_stride_g = param->Cg; + param->output_stride_w = param->Ci; + param->output_stride_h = param->Wo * param->output_stride_w; + param->output_stride_n = param->Ho * param->output_stride_h; + + param->job_num = + param->N * grid_dim->Ho * grid_dim->Wo * grid_dim->G * grid_dim->Cg; + + // determine task type and dims + *k_type = CNRT_FUNC_TYPE_BLOCK; + k_dim->x = std::min(param->job_num, static_cast<int>(core_num)); + k_dim->y = 1; + k_dim->z = 1; +} + +void CARAFEForwardMLUKernelLauncher(const Tensor input, const Tensor mask, + Tensor rinput, Tensor routput, Tensor rmask, + Tensor output, const int kernel_size, + const int group_size, + const int scale_factor) { + const int batch_size = output.size(0); + const int channels = output.size(1); + const int ho = output.size(2); + const int wo = output.size(3); + + // check tensor data type + TORCH_CHECK( + input.scalar_type() == at::kFloat || input.scalar_type() == at::kHalf, + "Data type of input should be Float or Half. 
But now input type is ", + input.scalar_type(), "."); + + TORCH_CHECK(mask.scalar_type() == input.scalar_type(), + "Data types of input and mask should be the same, but got ", + input.scalar_type(), " and ", mask.scalar_type()); + + // check number of dimensions + TORCH_CHECK(input.dim() == 4, "input should be a 4-D tensor, but has ", + input.dim(), "D."); + TORCH_CHECK(mask.dim() == 4, "mask should be a 4-D tensor, but has ", + input.dim(), "D."); + + // return fast on zero-element tensor + if (output.numel() == 0) { + output = at::zeros({batch_size, channels, ho, wo}, output.options()); + return; + } + + // set param + CarafeForwardParam param; + param.N = input.size(0); + param.Ci = input.size(1); + param.Hi = input.size(2); + param.Wi = input.size(3); + + param.kernel_size = kernel_size; + param.group_size = group_size; + param.scale_factor = scale_factor; + param.Cg = param.Ci / group_size; + param.dtype_size = input.itemsize(); + param.align_size_NRAM = NRAM_ALIGN_SIZE / param.dtype_size; + param.align_size_NFU = NFU_ALIGN_SIZE / param.dtype_size; + param.kernel_size_sq = param.kernel_size * param.kernel_size; + param.kernel_size_half = (param.kernel_size - 1) / 2; + param.Ho = param.Hi * param.scale_factor; + param.Wo = param.Wi * param.scale_factor; + + // generate policy + cnrtDim3_t k_dim; + cnrtFunctionType_t k_type; + CarafeForwardBlockDim block_dim; + CarafeForwardGridDim grid_dim; + + genPolicyForward(¶m, &block_dim, &grid_dim, &k_dim, &k_type); + + // convert NCHW to NHWC + auto memory_format_input_nhwc = + torch_mlu::cnnl::ops::get_channels_last_memory_format(input.dim()); + auto rinput_ = + torch_mlu::cnnl::ops::cnnl_contiguous(input, memory_format_input_nhwc); + + auto memory_format_mask_nhwc = + torch_mlu::cnnl::ops::get_channels_last_memory_format(mask.dim()); + auto rmask_ = + torch_mlu::cnnl::ops::cnnl_contiguous(mask, memory_format_mask_nhwc); + + auto memory_format_output_nhwc = + 
torch_mlu::cnnl::ops::get_channels_last_memory_format(output.dim()); + auto routput_ = + torch_mlu::cnnl::ops::cnnl_contiguous(output, memory_format_output_nhwc); + + // get ptr of tensors + auto input_impl = torch_mlu::getMluTensorImpl(rinput_); + auto input_ptr = input_impl->cnnlMalloc(); + auto mask_impl = torch_mlu::getMluTensorImpl(rmask_); + auto mask_ptr = mask_impl->cnnlMalloc(); + auto output_impl = torch_mlu::getMluTensorImpl(routput_); + auto output_ptr = output_impl->cnnlMalloc(); + + // get compute queue + auto queue = torch_mlu::getCurQueue(); + + // get dtype of input + cnrtDataType_t d_type = torch_mlu::toCnrtDtype(input.dtype()); + + // launch kernel + auto core_dim = torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster); + CNLOG(INFO) << "Launch Kernel KernelCarafeForward<<>>"; + + KernelCarafeForward(k_dim, k_type, queue, d_type, input_ptr, mask_ptr, param, + block_dim, grid_dim, output_ptr); + + // copy output from NHWC back into NCHW + rinput.copy_(rinput_); + output.copy_(routput_); +} + +// Policy Function for Backward +static void policyFuncBackward(cnrtDim3_t *k_dim, cnrtFunctionType_t *k_type) { + // set Union1 Job + *k_type = CNRT_FUNC_TYPE_UNION1; + k_dim->x = torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster); + k_dim->y = torch_mlu::getDeviceAttr(cnrtAttrClusterCount); + k_dim->z = 1; +} + +void CARAFEBackwardMLUKernelLauncher( + const Tensor grad_output, const Tensor rinput, const Tensor mask, + Tensor rgrad_output, Tensor rgrad_input_hs, Tensor rgrad_input, + Tensor rgrad_mask, Tensor grad_input, Tensor grad_mask, + const int kernel_size, const int group_size, const int scale_factor) { + const int batch_size = rinput.size(0); + const int channels = rinput.size(1); + const int hi = rinput.size(2); + const int wi = rinput.size(3); + + // data type check + TORCH_CHECK(grad_output.scalar_type() == at::kFloat || + grad_output.scalar_type() == at::kHalf, + "grad_output type should be Float or Half, got ", + grad_output.scalar_type()); + 
TORCH_CHECK(grad_output.scalar_type() == mask.scalar_type(), + "mask should have the same type as grad_output"); + + // dim check + TORCH_CHECK(grad_output.dim() == 4, "grad_output should be a 4d tensor, got ", + grad_output.dim(), "D"); + + // param check + TORCH_CHECK(kernel_size < 137, "kernel_size should be less than 137, got ", + kernel_size); + + // set task dimension + cnrtDim3_t k_dim; + cnrtFunctionType_t k_type; + policyFuncBackward(&k_dim, &k_type); + + // convert NCHW to NHWC + auto memory_format_input_nhwc = + torch_mlu::cnnl::ops::get_channels_last_memory_format(rinput.dim()); + auto rinput_ = + torch_mlu::cnnl::ops::cnnl_contiguous(rinput, memory_format_input_nhwc); + + auto memory_format_mask_nhwc = + torch_mlu::cnnl::ops::get_channels_last_memory_format(mask.dim()); + auto rmask_ = + torch_mlu::cnnl::ops::cnnl_contiguous(mask, memory_format_mask_nhwc); + + auto memory_format_grad_output_nhwc = + torch_mlu::cnnl::ops::get_channels_last_memory_format(grad_output.dim()); + auto rgrad_output_ = torch_mlu::cnnl::ops::cnnl_contiguous( + grad_output, memory_format_grad_output_nhwc); + + auto memory_format_grad_input_nhwc = + torch_mlu::cnnl::ops::get_channels_last_memory_format(grad_input.dim()); + auto rgrad_input_ = torch_mlu::cnnl::ops::cnnl_contiguous( + grad_input, memory_format_grad_input_nhwc) + .zero_(); + + auto memory_format_grad_mask_nhwc = + torch_mlu::cnnl::ops::get_channels_last_memory_format(grad_mask.dim()); + auto rgrad_mask_ = torch_mlu::cnnl::ops::cnnl_contiguous( + grad_mask, memory_format_grad_mask_nhwc); + + // get compute queue + auto queue = torch_mlu::getCurQueue(); + + // get ptr of tensors + auto input_impl = torch_mlu::getMluTensorImpl(rinput_); + auto input_ptr = input_impl->cnnlMalloc(); + auto mask_impl = torch_mlu::getMluTensorImpl(rmask_); + auto mask_ptr = mask_impl->cnnlMalloc(); + auto grad_output_impl = torch_mlu::getMluTensorImpl(rgrad_output_); + auto grad_output_ptr = grad_output_impl->cnnlMalloc(); + auto 
grad_input_impl = torch_mlu::getMluTensorImpl(rgrad_input_); + auto grad_input_ptr = grad_input_impl->cnnlMalloc(); + auto grad_mask_impl = torch_mlu::getMluTensorImpl(rgrad_mask_); + auto grad_mask_ptr = grad_mask_impl->cnnlMalloc(); + + // get dtype of grad_output + cnrtDataType_t d_type = torch_mlu::toCnrtDtype(grad_output.dtype()); + auto core_dim = torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster); + + CNLOG(INFO) << "Launch Kernel KernelCarafeBackward<<>>"; + + // launch kernel + KernelCarafeBackward(k_dim, k_type, queue, d_type, input_ptr, mask_ptr, + grad_output_ptr, grad_input_ptr, grad_mask_ptr, + batch_size, hi, wi, channels, kernel_size, group_size, + scale_factor); + + // copy output from NHWC back into NCHW + grad_input.copy_(rgrad_input_); + grad_mask.copy_(rgrad_mask_); +} + +void carafe_forward_mlu(Tensor features, Tensor masks, Tensor rfeatures, + Tensor routput, Tensor rmasks, Tensor output, + int kernel_size, int group_size, int scale_factor) { + CARAFEForwardMLUKernelLauncher(features, masks, rfeatures, routput, rmasks, + output, kernel_size, group_size, scale_factor); +} + +void carafe_backward_mlu(Tensor top_grad, Tensor rfeatures, Tensor masks, + Tensor rtop_grad, Tensor rbottom_grad_hs, + Tensor rbottom_grad, Tensor rmask_grad, + Tensor bottom_grad, Tensor mask_grad, int kernel_size, + int group_size, int scale_factor) { + CARAFEBackwardMLUKernelLauncher(top_grad, rfeatures, masks, rtop_grad, + rbottom_grad_hs, rbottom_grad, rmask_grad, + bottom_grad, mask_grad, kernel_size, + group_size, scale_factor); +} + +void carafe_forward_impl(Tensor features, Tensor masks, Tensor rfeatures, + Tensor routput, Tensor rmasks, Tensor output, + int kernel_size, int group_size, int scale_factor); + +void carafe_backward_impl(Tensor top_grad, Tensor rfeatures, Tensor masks, + Tensor rtop_grad, Tensor rbottom_grad_hs, + Tensor rbottom_grad, Tensor rmask_grad, + Tensor bottom_grad, Tensor mask_grad, int kernel_size, + int group_size, int scale_factor); + 
+REGISTER_DEVICE_IMPL(carafe_forward_impl, MLU, carafe_forward_mlu); +REGISTER_DEVICE_IMPL(carafe_backward_impl, MLU, carafe_backward_mlu); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/deform_roi_pool_mlu.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/deform_roi_pool_mlu.cpp new file mode 100644 index 0000000000000000000000000000000000000000..4d73cbbe593b4ec9e86af77ed96729d285df356f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/deform_roi_pool_mlu.cpp @@ -0,0 +1,343 @@ +/************************************************************************* + * Copyright (C) 2022 Cambricon. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
+ *************************************************************************/ +#include "pytorch_device_registry.hpp" +#include "pytorch_mlu_helper.hpp" + +void KernelDeformRoIPoolForward(cnrtDim3_t k_dim, cnrtFunctionType_t k_type, + cnrtQueue_t queue, cnrtDataType_t data_type, + const void *input, const void *rois, + const void *offset, void *output, + const int channels, const int height, + const int width, const int num_rois, + const int pooled_height, const int pooled_width, + const float spatial_scale, + const int sampling_ratio, const float gamma); + +void KernelDeformRoIPoolBackward( + cnrtDim3_t k_dim, cnrtFunctionType_t k_type, cnrtQueue_t queue, + cnrtDataType_t data_type, const void *grad_output, const void *input, + const void *rois, const void *offset, void *grad_input, void *grad_offset, + const int channels, const int height, const int width, const int num_rois, + const int pooled_height, const int pooled_width, const float spatial_scale, + const int sampling_ratio, const float gamma); + +// policy function for forward and backward +static void policyFunc(const int bin_num, cnrtDim3_t *k_dim, + cnrtFunctionType_t *k_type) { + const size_t cluster_limit = torch_mlu::getDeviceAttr(cnrtAttrClusterCount); + ; + const size_t core_limit = torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster); + const size_t bin_num_align = CEIL_ALIGN(bin_num, core_limit); + k_dim->x = core_limit; + k_dim->y = (bin_num_align / core_limit) > cluster_limit + ? cluster_limit + : (bin_num_align / core_limit); + k_dim->z = 1; + *k_type = CNRT_FUNC_TYPE_UNION1; +} + +void DeformRoIPoolForwardMLUKernelLauncher(Tensor input, Tensor rois, + Tensor offset, Tensor output, + int pooled_height, int pooled_width, + float spatial_scale, + int sampling_ratio, float gamma) { + // Check dtype. 
+ TORCH_CHECK( + input.scalar_type() == at::kFloat || input.scalar_type() == at::kHalf, + "input type should be Float or Half, got ", input.scalar_type()); + TORCH_CHECK(input.scalar_type() == rois.scalar_type(), + "rois should have the same type as input"); + + // Check shape. + TORCH_CHECK(input.dim() == 4, "input should be 4d tensor, got ", input.dim(), + "D."); + TORCH_CHECK(rois.dim() == 2, "rois should be 2d tensor, got ", rois.dim(), + "D."); + if (offset.defined() && offset.numel() > 0) { + TORCH_CHECK(input.scalar_type() == offset.scalar_type(), + "offset should have the same type as input"); + TORCH_CHECK(offset.dim() == 4, "offset should be 4d tensor, got ", + offset.dim(), "D."); + TORCH_CHECK( + (offset.size(0) == rois.size(0)), "offset.size(0) = ", offset.size(0), + "while rois.size(0)) = ", rois.size(0), ". They should be the same."); + TORCH_CHECK((offset.size(1) == 2), "offset.size(1) should be 2, ", + "but now offset.size(1) = ", offset.size(1), "."); + TORCH_CHECK((offset.size(2) == output.size(2)), + "offset.size(2) = ", offset.size(2), + "while output.size(2)) = ", output.size(2), + ". They should be the same."); + TORCH_CHECK((offset.size(3) == output.size(3)), + "offset.size(3) = ", offset.size(3), + "while output.size(3)) = ", output.size(3), + ". 
They should be the same."); + } + + TORCH_CHECK(spatial_scale > 0 && spatial_scale <= 1, + "spatial_scale should be within (0, 1], got ", spatial_scale, + "."); + + // compute kernel params + auto height = input.size(2); + auto width = input.size(3); + auto channels = input.size(1); + auto num_rois = output.size(0); + + if (output.numel() == 0) { + output = at::zeros({num_rois, channels, pooled_height, pooled_width}, + input.options()); + return; + } + + // zero element check + TORCH_CHECK(input.size(0) != 0, "input.size(0) should not be zero, got ", + input.size(0)); + TORCH_CHECK(rois.numel() != 0, "rois.numel() should not be zero, got ", + rois.numel()); + if (input.numel() == 0 || output.numel() == 0) { + return; + } + + // large tensor check + const size_t max_input_num = 2147483648; // 2^31, 2G num + TORCH_CHECK(input.numel() < max_input_num, + "input.numel() should be less than 2147483648, got ", + input.numel()); + TORCH_CHECK(rois.numel() < max_input_num, + "rois.numel() should be less than 2147483648, got ", + rois.numel()); + TORCH_CHECK(output.numel() < max_input_num, + "output.numel() should be less than 2147483648, got ", + output.numel()); + TORCH_CHECK(!offset.defined() || offset.numel() < max_input_num, + "offset.numel() should be less than 2147483648, got ", + offset.numel()); + + auto memory_format = + torch_mlu::cnnl::ops::get_channels_last_memory_format(input.dim()); + auto input_ = torch_mlu::cnnl::ops::cnnl_contiguous(input, memory_format); + + at::Tensor output_ = + at::empty({num_rois, channels, pooled_height, pooled_width}, + input.options(), memory_format); + + // calculate task dimension + cnrtDim3_t k_dim; + cnrtFunctionType_t k_type; + policyFunc(num_rois * pooled_height * pooled_width, &k_dim, &k_type); + + // get compute queue + auto queue = torch_mlu::getCurQueue(); + + // get ptr of tensors + auto input_impl = torch_mlu::getMluTensorImpl(input_); + auto input_ptr = input_impl->cnnlMalloc(); + auto rois_impl = 
torch_mlu::getMluTensorImpl(rois);
+  auto rois_ptr = rois_impl->cnnlMalloc();
+  auto offset_impl = torch_mlu::getMluTensorImpl(offset);
+  auto offset_ptr = offset_impl->cnnlMalloc();
+  auto output_impl = torch_mlu::getMluTensorImpl(output_);
+  auto output_ptr = output_impl->cnnlMalloc();
+
+  // get compute dtype of input
+  cnrtDataType_t data_type = torch_mlu::toCnrtDtype(input_.dtype());
+
+  // launch kernel
+  CNLOG(INFO) << "Launch Kernel MLUKernelDeformRoIPoolForward<<<" << k_dim.x
+              << ", " << k_dim.y << ", " << k_dim.z << ">>>";
+
+  KernelDeformRoIPoolForward(k_dim, k_type, queue, data_type, input_ptr,
+                             rois_ptr, offset_ptr, output_ptr, channels, height,
+                             width, num_rois, pooled_height, pooled_width,
+                             spatial_scale, sampling_ratio, gamma);
+
+  output.copy_(output_);
+}
+
+void DeformRoIPoolBackwardMLUKernelLauncher(
+    Tensor grad_output, Tensor input, Tensor rois, Tensor offset,
+    Tensor grad_input, Tensor grad_offset, int pooled_height, int pooled_width,
+    float spatial_scale, int sampling_ratio, float gamma) {
+  // Check dtype.
+  TORCH_CHECK(
+      input.scalar_type() == at::kFloat || input.scalar_type() == at::kHalf,
+      "input type should be Float or Half, got ", input.scalar_type());
+  TORCH_CHECK(input.scalar_type() == grad_output.scalar_type(),
+              "grad_output should have the same type as input");
+  TORCH_CHECK(input.scalar_type() == rois.scalar_type(),
+              "rois should have the same type as input");
+  TORCH_CHECK(input.scalar_type() == grad_input.scalar_type(),
+              "grad_input should have the same type as input");
+
+  // Check shape.
+ TORCH_CHECK(grad_output.dim() == 4, "grad_output should be 4d tensor, got ", + grad_output.dim(), "D."); + TORCH_CHECK(input.dim() == 4, "input should be 4d tensor, got ", input.dim(), + "D."); + TORCH_CHECK(rois.dim() == 2, "rois should be 2d tensor, got ", rois.dim(), + "D."); + if (offset.defined() && offset.numel() > 0) { + TORCH_CHECK(input.scalar_type() == offset.scalar_type(), + "offset should have the same type as input"); + TORCH_CHECK(offset.dim() == 4, "offset should be 4d tensor, got ", + offset.dim(), "D."); + TORCH_CHECK( + (offset.size(0) == rois.size(0)), "offset.size(0) = ", offset.size(0), + "while rois.size(0)) = ", rois.size(0), ". They should be the same."); + TORCH_CHECK((offset.size(1) == 2), "offset.size(1) should be 2, ", + "but now offset.size(1) = ", offset.size(1), "."); + TORCH_CHECK((offset.size(2) == grad_output.size(2)), + "offset.size(2) = ", offset.size(2), + "while grad_output.size(2)) = ", grad_output.size(2), + ". They should be the same."); + TORCH_CHECK((offset.size(3) == grad_output.size(3)), + "offset.size(3) = ", offset.size(3), + "while grad_output.size(3)) = ", grad_output.size(3), + ". They should be the same."); + } + + TORCH_CHECK(spatial_scale > 0 && spatial_scale <= 1, + "spatial_scale should be within (0, 1], got ", spatial_scale); + + // Check relationship between tensor. + TORCH_CHECK((grad_output.size(0) == rois.size(0)), + "grad_output.size(0) = ", grad_output.size(0), + "while rois.size(0)) = ", rois.size(0), + ". They should be the same."); + TORCH_CHECK((grad_output.size(1) == input.size(1)), + "grad_output.size(1) = ", grad_output.size(1), + "while input.size(1)) = ", input.size(1), + ". They should be the same."); + TORCH_CHECK((grad_output.size(2) == pooled_height), + "grad_output.size(2) = ", grad_output.size(2), + "while pooled_height = ", pooled_height, + ". 
They should be the same."); + TORCH_CHECK((grad_output.size(3) == pooled_width), + "grad_output.size(3) = ", grad_output.size(3), + "while pooled_width = ", pooled_width, + ". They should be the same."); + + // compute kernel params + auto batch = input.size(0); + auto channels = input.size(1); + auto height = input.size(2); + auto width = input.size(3); + auto num_rois = grad_output.size(0); + + // zero element check + TORCH_CHECK(input.size(0) != 0, "input.size(0) should not be zero, got ", + input.size(0)); + TORCH_CHECK(rois.numel() != 0, "rois.numel() should not be zero, got ", + rois.numel()); + if (input.numel() == 0 || grad_output.numel() == 0) { + return; + } + + // large tensor check + const size_t max_input_num = 2147483648; // 2^31, 2G num + TORCH_CHECK(input.numel() < max_input_num, + "input.numel() should be less than 2147483648, got ", + input.numel()); + TORCH_CHECK(rois.numel() < max_input_num, + "rois.numel() should be less than 2147483648, got ", + rois.numel()); + TORCH_CHECK(grad_output.numel() < max_input_num, + "grad_output.numel() should be less than 2147483648, got ", + grad_output.numel()); + TORCH_CHECK(!offset.defined() || offset.numel() < max_input_num, + "offset.numel() should be less than 2147483648, got ", + offset.numel()); + + auto memory_format = + torch_mlu::cnnl::ops::get_channels_last_memory_format(grad_output.dim()); + auto grad_output_ = + torch_mlu::cnnl::ops::cnnl_contiguous(grad_output, memory_format); + memory_format = + torch_mlu::cnnl::ops::get_channels_last_memory_format(input.dim()); + auto input_ = torch_mlu::cnnl::ops::cnnl_contiguous(input, memory_format); + at::Tensor grad_input_ = at::empty({batch, channels, height, width}, + input.options(), memory_format) + .zero_(); + + // calculate task dimension + cnrtDim3_t k_dim; + cnrtFunctionType_t k_type; + policyFunc(num_rois * pooled_height * pooled_width, &k_dim, &k_type); + + // get compute queue + auto queue = torch_mlu::getCurQueue(); + + // get ptr of tensors + 
auto grad_output_impl = torch_mlu::getMluTensorImpl(grad_output_);
+  auto grad_output_ptr = grad_output_impl->cnnlMalloc();
+  auto input_impl = torch_mlu::getMluTensorImpl(input_);
+  auto input_ptr = input_impl->cnnlMalloc();
+  auto rois_impl = torch_mlu::getMluTensorImpl(rois);
+  auto rois_ptr = rois_impl->cnnlMalloc();
+  auto offset_impl = torch_mlu::getMluTensorImpl(offset);
+  auto offset_ptr = offset_impl->cnnlMalloc();
+  auto grad_input_impl = torch_mlu::getMluTensorImpl(grad_input_);
+  auto grad_input_ptr = grad_input_impl->cnnlMalloc();
+  auto grad_offset_impl = torch_mlu::getMluTensorImpl(grad_offset);
+  auto grad_offset_ptr = grad_offset_impl->cnnlMalloc();
+
+  // get compute dtype of input
+  cnrtDataType_t data_type = torch_mlu::toCnrtDtype(input.dtype());
+
+  // launch kernel
+  CNLOG(INFO) << "Launch Kernel KernelDeformRoIPoolBackward<<<" << k_dim.x
+              << ", " << k_dim.y << ", " << k_dim.z << ">>>";
+
+  KernelDeformRoIPoolBackward(k_dim, k_type, queue, data_type, grad_output_ptr,
+                              input_ptr, rois_ptr, offset_ptr, grad_input_ptr,
+                              grad_offset_ptr, channels, height, width,
+                              num_rois, pooled_height, pooled_width,
+                              spatial_scale, sampling_ratio, gamma);
+
+  grad_input.copy_(grad_input_);
+}
+
+void deform_roi_pool_forward_mlu(Tensor input, Tensor rois, Tensor offset,
+                                 Tensor output, int pooled_height,
+                                 int pooled_width, float spatial_scale,
+                                 int sampling_ratio, float gamma) {
+  DeformRoIPoolForwardMLUKernelLauncher(input, rois, offset, output,
+                                        pooled_height, pooled_width,
+                                        spatial_scale, sampling_ratio, gamma);
+}
+
+void deform_roi_pool_backward_mlu(Tensor grad_output, Tensor input, Tensor rois,
+                                  Tensor offset, Tensor grad_input,
+                                  Tensor grad_offset, int pooled_height,
+                                  int pooled_width, float spatial_scale,
+                                  int sampling_ratio, float gamma) {
+  DeformRoIPoolBackwardMLUKernelLauncher(
+      grad_output, input, rois, offset, grad_input, grad_offset, pooled_height,
+      pooled_width, spatial_scale, sampling_ratio, gamma);
+}
+
+void
deform_roi_pool_forward_impl(Tensor input, Tensor rois, Tensor offset, + Tensor output, int pooled_height, + int pooled_width, float spatial_scale, + int sampling_ratio, float gamma); + +void deform_roi_pool_backward_impl(Tensor grad_output, Tensor input, + Tensor rois, Tensor offset, + Tensor grad_input, Tensor grad_offset, + int pooled_height, int pooled_width, + float spatial_scale, int sampling_ratio, + float gamma); + +REGISTER_DEVICE_IMPL(deform_roi_pool_forward_impl, MLU, + deform_roi_pool_forward_mlu); +REGISTER_DEVICE_IMPL(deform_roi_pool_backward_impl, MLU, + deform_roi_pool_backward_mlu); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/focal_loss_sigmoid_mlu.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/focal_loss_sigmoid_mlu.cpp new file mode 100644 index 0000000000000000000000000000000000000000..9242644c894c31b8d7ac2d719fc80d2b57bbdb96 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/focal_loss_sigmoid_mlu.cpp @@ -0,0 +1,332 @@ +/************************************************************************* + * Copyright (C) 2021 Cambricon. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
+ *************************************************************************/ +#include +#include + +#include "pytorch_device_registry.hpp" +#include "pytorch_mlu_helper.hpp" + +void KernelFocalLossSigmoidForward(cnrtDim3_t k_dim, cnrtFunctionType_t k_type, + cnrtQueue_t queue, + const cnrtDataType_t d_type, + const void *input, const void *target, + const void *weight, const int32_t N, + const int32_t C, const float alpha, + const float gamma, void *output); + +void KernelFocalLossSigmoidBackward(cnrtDim3_t k_dim, cnrtFunctionType_t k_type, + cnrtQueue_t queue, + const cnrtDataType_t d_type, + const void *input, const void *target, + const void *weight, const float gamma, + const float alpha, const int32_t dim_n, + const int32_t deal_n, const int32_t dim_c, + void *output); +// Policy Function for Forward +static void policyFuncForward(cnrtDim3_t *k_dim, cnrtFunctionType_t *k_type, + const Tensor &input, const Tensor &target, + const Tensor &weight) { + auto N = input.size(0); + auto C = input.size(1); + + const size_t nram_size = torch_mlu::getDeviceAttr(cnrtAttrNramSizePerMcore); + const size_t c_align_size = PAD_UP((C * input.itemsize()), NFU_ALIGN_SIZE); + const int split_target_num = 2; + const int split_pipeline_num = 6; + const int has_weight = weight.data_ptr() != nullptr; + const int target_data_width = target.scalar_type() == at::kLong + ? 
target.itemsize() / 2 + : target.itemsize(); + const int threshold_c = + PAD_DOWN((nram_size - split_target_num * sizeof(int)) / + (split_pipeline_num + has_weight), + NFU_ALIGN_SIZE) / + input.itemsize(); + + int n_seg = 1; + if (C <= threshold_c) { + int c_size = C * input.itemsize(); + int reservered_align_size = + (split_target_num + split_pipeline_num) * NFU_ALIGN_SIZE; + int wegiht_size = 0; + if (has_weight) { + c_size = c_align_size; + reservered_align_size = split_target_num * NFU_ALIGN_SIZE; + wegiht_size = c_align_size; + } + // n_seg * c_size * split_pipeline_num + n_seg * target.itemsize() * + // split_target_num + // + weight_size + reservered_align_size <= nram_size + n_seg = (nram_size - wegiht_size - reservered_align_size) / + (split_pipeline_num * c_size + split_target_num * sizeof(int32_t)); + } + auto seg_num = n_seg == 0 ? N : (N + n_seg - 1) / n_seg; + auto core_dim = torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster); + auto cluster_num = torch_mlu::getDeviceAttr(cnrtAttrClusterCount); + auto core_num = core_dim * cluster_num; + + k_dim->x = *k_type; + k_dim->y = + seg_num > core_num ? cluster_num : (seg_num + core_dim - 1) / core_dim; + k_dim->z = 1; +} + +// Policy Function for Backward +static void policyFuncBackward(cnrtDim3_t *k_dim, cnrtFunctionType_t *k_type) { + // set Union1 Job + *k_type = CNRT_FUNC_TYPE_UNION1; + k_dim->x = torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster); + k_dim->y = torch_mlu::getDeviceAttr(cnrtAttrClusterCount); + k_dim->z = 1; +} + +void SigmoidFocalLossForwardMLUKernelLauncher(Tensor input, Tensor target, + Tensor weight, Tensor output, + const float gamma, + const float alpha) { + // params check + TORCH_CHECK(gamma >= 0, "gamma should be greater than or equal to 0. ", + "But now gamma is ", gamma, "."); + + // check dtype + TORCH_CHECK( + input.scalar_type() == at::kFloat || input.scalar_type() == at::kHalf, + "Data type of input should be Float or Half. 
But now input type is ", + input.scalar_type(), "."); + + TORCH_CHECK( + (target.scalar_type() == at::kInt || target.scalar_type() == at::kLong), + "target type should be Int or Long. ", "But now target type is ", + target.scalar_type(), "."); + + if (weight.data_ptr() != nullptr) { + TORCH_CHECK(weight.scalar_type() == input.scalar_type(), + "Data types of input and weight should be the same. But now " + "input type is ", + input.scalar_type(), ", weight type is ", weight.scalar_type(), + "."); + } else { + CNLOG(INFO) << "weight is a empty tensor."; + } + + // return if zero-element + if (input.numel() == 0 || target.numel() == 0 || output.numel() == 0) { + return; + } + + // calculate task dimension + cnrtDim3_t k_dim; + cnrtFunctionType_t k_type = CNRT_FUNC_TYPE_UNION1; + policyFuncForward(&k_dim, &k_type, input, target, weight); + auto core_dim = torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster); + + // get compute queue + auto queue = torch_mlu::getCurQueue(); + + // get ptr of tensors + auto input_impl = torch_mlu::getMluTensorImpl(input); + auto input_ptr = input_impl->cnnlMalloc(); + auto target_impl = torch_mlu::getMluTensorImpl(target); + auto target_ptr = target_impl->cnnlMalloc(); + auto weight_impl = torch_mlu::getMluTensorImpl(weight); + auto weight_ptr = weight_impl->cnnlMalloc(); + auto output_impl = torch_mlu::getMluTensorImpl(output); + auto output_ptr = output_impl->cnnlMalloc(); + + // get dtype of input + cnrtDataType_t d_type = torch_mlu::toCnrtDtype(input.dtype()); + + CNLOG(INFO) << "Launch Kernel KernelFocalLossSigmoidForward<<>>"; + // launch kernel + KernelFocalLossSigmoidForward(k_dim, k_type, queue, d_type, input_ptr, + target_ptr, weight_ptr, input.size(0), + input.size(1), alpha, gamma, output_ptr); +} + +void getDealNAndThresholdC(const int compute_data_bytes, + const int target_data_bytes, const int total_c, + int *deal_n_ptr, int *threshold_c_ptr, + const bool has_weight, const bool is_half) { + /* NRAM partition: + * + * 
|-----------------ping pong--------------------| + * |input | pt | alpha_t | temp | output | target | flt_min | gamma | weight| + * + * split_pipeline_num is 5: including input, pt, alpha_t, temp, output. + */ + const int nram_split_num = 5; + const int nram_split_pingpong = 2; + const int max_nram_size = torch_mlu::getDeviceAttr(cnrtAttrNramSizePerMcore); + int32_t compute_align_size = NFU_ALIGN_SIZE; + if (is_half) { + compute_align_size += NFU_ALIGN_SIZE; + } + const int32_t compute_align_num = compute_align_size / compute_data_bytes; + // reservered_align_size: including input(ping pong), pt(ping pong), + // alpha_t(ping pong), temp(ping pong), + // output(ping pong), target(ping pong), + // flt_min and gamma. + const int reservered_align_size = + ((nram_split_num + 1) * nram_split_pingpong + 2) * compute_align_size; + int nram_pingpong_size = max_nram_size - reservered_align_size; + + int compute_c = total_c; + int threshold_c = 0; + if (has_weight) { + // reserved space for weight to align + nram_pingpong_size -= NFU_ALIGN_SIZE; + + // threshold_c * nram_split_pingpong * compute_data_bytes * nram_split_num + + // nram_split_pingpong * target_data_bytes + + // threshold_c * compute_data_bytes <= nram_pingpong_size + threshold_c = + (nram_pingpong_size - nram_split_pingpong * target_data_bytes) / + (compute_data_bytes * (nram_split_num * nram_split_pingpong + 1)); + threshold_c = PAD_DOWN(threshold_c, compute_align_num); + int weight_space = PAD_UP(total_c * compute_data_bytes, NFU_ALIGN_SIZE); + + // reserved space for weight + nram_pingpong_size -= weight_space; + compute_c = PAD_UP(total_c, compute_align_num); + } else { + // threshold_c * nram_split_pingpong * compute_data_bytes * nram_split_num + + // nram_split_pingpong * target_data_bytes <= nram_pingpong_size + threshold_c = + (nram_pingpong_size / nram_split_pingpong - target_data_bytes) / + (nram_split_num * compute_data_bytes); + } + // deal_n * compute_c * nram_split_pingpong * compute_data_bytes * 
+ // nram_split_num + deal_n * nram_split_pingpong * target_data_bytes <= + // nram_pingpong_size + *deal_n_ptr = + nram_pingpong_size / + ((nram_split_num * compute_c * compute_data_bytes + target_data_bytes) * + nram_split_pingpong); + *threshold_c_ptr = threshold_c; +} + +void SigmoidFocalLossBackwardMLUKernelLauncher(Tensor input, Tensor target, + Tensor weight, Tensor output, + const float gamma, + const float alpha) { + // params check + TORCH_CHECK(gamma >= 0, "gamma should be greater than or equal to 0. ", + "But now gamma is ", gamma, "."); + // check dtype + TORCH_CHECK( + input.scalar_type() == at::kFloat || input.scalar_type() == at::kHalf, + "Data type of input should be Float or Half. But now input type is ", + input.scalar_type(), "."); + + TORCH_CHECK( + (target.scalar_type() == at::kInt || target.scalar_type() == at::kLong), + "target type should be Int or Long. ", "But now target type is ", + target.scalar_type(), "."); + + bool has_weight = false; + if (weight.data_ptr() != nullptr) { + TORCH_CHECK(weight.scalar_type() == input.scalar_type(), + "Data types of input and weight should be the same. But now " + "input type is ", + input.scalar_type(), ", weight type is ", weight.scalar_type(), + "."); + has_weight = true; + } else { + CNLOG(INFO) << "weight is a empty tensor."; + } + + auto dim_c = input.size(1); + const int compute_data_bytes = sizeof(float); + // target supports only INT on MLU device while it keeps LONG on host side, + // so target.itemsize() / 2 + const int target_data_bytes = target.scalar_type() == at::kLong + ? 
(target.itemsize() / 2) + : target.itemsize(); + int deal_n = 0; + int threshold_c = 0; + bool is_half = false; + if (input.scalar_type() == at::kHalf) { + is_half = true; + } + // calculate deal_n and threshold_c + getDealNAndThresholdC(compute_data_bytes, target_data_bytes, dim_c, &deal_n, + &threshold_c, has_weight, is_half); + + // check C + TORCH_CHECK(threshold_c >= dim_c, + "input.size(1) should be in the range of [0, ", threshold_c, + "]. ", "But now input.size(1) is ", dim_c, "."); + + if (input.numel() == 0 || target.numel() == 0 || output.numel() == 0) { + // return if zero-element + return; + } + + // set task dimension + cnrtDim3_t k_dim; + cnrtFunctionType_t k_type; + policyFuncBackward(&k_dim, &k_type); + + // get compute queue + auto queue = torch_mlu::getCurQueue(); + + // get ptr of tensors + auto input_impl = torch_mlu::getMluTensorImpl(input); + auto input_ptr = input_impl->cnnlMalloc(); + auto target_impl = torch_mlu::getMluTensorImpl(target); + auto target_ptr = target_impl->cnnlMalloc(); + auto weight_impl = torch_mlu::getMluTensorImpl(weight); + auto weight_ptr = weight_impl->cnnlMalloc(); + auto output_impl = torch_mlu::getMluTensorImpl(output); + auto output_ptr = output_impl->cnnlMalloc(); + + // get dtype of input + cnrtDataType_t d_type = torch_mlu::toCnrtDtype(input.dtype()); + auto core_dim = torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster); + auto dim_n = input.size(0); + + CNLOG(INFO) << "Launch Kernel KernelFocalLossSigmoidBackward<<>>"; + + // launch kernel + KernelFocalLossSigmoidBackward(k_dim, k_type, queue, d_type, input_ptr, + target_ptr, weight_ptr, gamma, alpha, dim_n, + deal_n, dim_c, output_ptr); +} + +void sigmoid_focal_loss_forward_mlu(Tensor input, Tensor target, Tensor weight, + Tensor output, float gamma, float alpha) { + SigmoidFocalLossForwardMLUKernelLauncher(input, target, weight, output, gamma, + alpha); +} + +void sigmoid_focal_loss_backward_mlu(Tensor input, Tensor target, Tensor weight, + Tensor grad_input, 
float gamma, + float alpha) { + SigmoidFocalLossBackwardMLUKernelLauncher(input, target, weight, grad_input, + gamma, alpha); +} + +void sigmoid_focal_loss_forward_impl(Tensor input, Tensor target, Tensor weight, + Tensor output, float gamma, float alpha); + +void sigmoid_focal_loss_backward_impl(Tensor input, Tensor target, + Tensor weight, Tensor grad_input, + float gamma, float alpha); + +REGISTER_DEVICE_IMPL(sigmoid_focal_loss_forward_impl, MLU, + sigmoid_focal_loss_forward_mlu); +REGISTER_DEVICE_IMPL(sigmoid_focal_loss_backward_impl, MLU, + sigmoid_focal_loss_backward_mlu); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/iou3d_mlu.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/iou3d_mlu.cpp new file mode 100644 index 0000000000000000000000000000000000000000..5348d16e011e5387401857238a862009f38c46f6 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/iou3d_mlu.cpp @@ -0,0 +1,144 @@ +/************************************************************************* + * Copyright (C) 2022 Cambricon. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
+ *************************************************************************/ + +#include "pytorch_device_registry.hpp" +#include "pytorch_mlu_helper.hpp" + +void KernelIou3d(cnrtDim3_t k_dim, cnrtFunctionType_t k_type, cnrtQueue_t queue, + const cnrtDataType_t data_type_input, const void *boxes_dram, + const int input_box_num, const float iou_threshold, + void *workspace, void *output_size, void *output); + +int selectType(uint32_t use_job, int box_num_per_core) { + // the box_num_per_core should be at least 256, otherwise the real IO + // bandwidth would be very low + while (box_num_per_core < 256 && use_job >= 4) { + box_num_per_core *= 2; + use_job /= 2; + } + return use_job; +} +static cnnlStatus_t policyFunc(cnrtDim3_t *k_dim, cnrtFunctionType_t *k_type, + int &core_num_per_class, + const int input_box_num) { + uint32_t core_dim = torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster); + uint32_t job_limit = getJobLimitCapability(); + uint32_t core_number = job_limit; + + int box_num_per_core = (input_box_num + core_number - 1) / core_number; + int use_job = selectType(job_limit, box_num_per_core); + // initiate k_type as Union1 + k_dim->x = core_dim; + k_dim->y = 1; + k_dim->z = 1; + *k_type = CNRT_FUNC_TYPE_UNION1; + switch (job_limit) { + case CN_KERNEL_CLASS_BLOCK: + case CN_KERNEL_CLASS_UNION: + case CN_KERNEL_CLASS_UNION2: + case CN_KERNEL_CLASS_UNION4: + case CN_KERNEL_CLASS_UNION8: + case CN_KERNEL_CLASS_UNION16: { + if (use_job < 4) { + k_dim->x = 1; + *k_type = CNRT_FUNC_TYPE_BLOCK; + } else if (use_job == 4) { + k_dim->x = core_dim; + *k_type = CNRT_FUNC_TYPE_UNION1; + } else { + k_dim->x = use_job; + *k_type = (cnrtFunctionType_t)use_job; + } + }; break; + default: + LOG(WARNING) << "[cnnlNms_v2]: got unsupported job limit number." 
+ << " Use default CN_KERNEL_CLASS_UNION1 with UNION1 task."; + } + return CNNL_STATUS_SUCCESS; +} + +void IoU3DNMS3DMLUKernelLauncher(Tensor boxes, Tensor &keep, Tensor &keep_num, + float iou_threshold) { + // dimension parameters check + TORCH_CHECK(boxes.dim() == 2, "boxes should be a 2d tensor, got ", + boxes.dim(), "D"); + TORCH_CHECK(boxes.size(1) == 7, + "boxes should have 7 elements in dimension 1, got ", + boxes.size(1)); + + // data type check + TORCH_CHECK( + boxes.scalar_type() == at::kFloat || boxes.scalar_type() == at::kHalf, + "data type of boxes should be Float or Half, got ", boxes.scalar_type()); + + if (boxes.numel() == 0) { + return; + } + const size_t max_input_num = 2147483648; // 2^31, 2G num + TORCH_CHECK(boxes.numel() < max_input_num, + "boxes.numel() should be less than 2147483648, got ", + boxes.numel()); + int input_box_num = boxes.size(0); + + cnrtDataType_t data_type_input = torch_mlu::toCnrtDtype(boxes.dtype()); + cnrtDim3_t k_dim; + cnrtJobType_t k_type; + + int core_num_per_class; + policyFunc(&k_dim, &k_type, core_num_per_class, input_box_num); + + // transpose boxes (n, 7) to (7, n) for better performance + auto boxes_t = boxes.transpose(0, 1); + auto boxes_ = torch_mlu::cnnl::ops::cnnl_contiguous(boxes_t); + + auto output = at::empty({input_box_num}, boxes.options().dtype(at::kLong)); + auto output_size = at::empty({1}, boxes.options().dtype(at::kInt)); + + // workspace + const int info_num = 7; // x, y,z, dx, dy, dz,angle + size_t space_size = 0; + if (boxes.scalar_type() == at::kHalf) { + space_size = input_box_num * sizeof(int16_t) * info_num + + input_box_num * sizeof(float) + sizeof(float); + } else { + space_size = input_box_num * sizeof(float) * (info_num + 1) + sizeof(float); + } + + auto workspace = at::empty(space_size, boxes.options().dtype(at::kByte)); + + // get compute queue + auto queue = torch_mlu::getCurQueue(); + + auto boxes_impl = torch_mlu::getMluTensorImpl(boxes_); + auto boxes_ptr = 
boxes_impl->cnnlMalloc(); + auto workspace_impl = torch_mlu::getMluTensorImpl(workspace); + auto workspace_ptr = workspace_impl->cnnlMalloc(); + auto output_impl = torch_mlu::getMluTensorImpl(keep); + auto output_ptr = output_impl->cnnlMalloc(); + auto output_size_impl = torch_mlu::getMluTensorImpl(keep_num); + auto output_size_ptr = output_size_impl->cnnlMalloc(); + + uint32_t core_dim = torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster); + CNLOG(INFO) << "Launch Kernel KernelIou3d<<>>"; + KernelIou3d(k_dim, k_type, queue, data_type_input, boxes_ptr, input_box_num, + iou_threshold, workspace_ptr, output_size_ptr, output_ptr); +} + +void iou3d_nms3d_forward_mlu(const Tensor boxes, Tensor &keep, Tensor &keep_num, + float nms_overlap_thresh) { + IoU3DNMS3DMLUKernelLauncher(boxes, keep, keep_num, nms_overlap_thresh); +} + +void iou3d_nms3d_forward_impl(const Tensor boxes, Tensor &keep, + Tensor &keep_num, float nms_overlap_thresh); +REGISTER_DEVICE_IMPL(iou3d_nms3d_forward_impl, MLU, iou3d_nms3d_forward_mlu); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/masked_conv2d_mlu.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/masked_conv2d_mlu.cpp new file mode 100644 index 0000000000000000000000000000000000000000..e7842b3a13841b2caa27ded028ee103193822931 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/masked_conv2d_mlu.cpp @@ -0,0 +1,226 @@ +/************************************************************************* + * Copyright (C) 2022 Cambricon. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
+ * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + *************************************************************************/ +#include "pytorch_device_registry.hpp" +#include "pytorch_mlu_helper.hpp" + +void KernelMaskedIm2colForward( + cnrtDim3_t k_dim, cnrtFunctionType_t k_type, cnrtQueue_t queue, + cnrtDataType_t k_dtype, const void *im_ptr, const int height, + const int width, const int channels, const int kernel_h, const int kernel_w, + const int pad_h, const int pad_w, const void *mask_h_idx_ptr, + const void *mask_w_idx_ptr, const int mask_cnt, void *col_ptr); + +void KernelMaskedCol2imForward(cnrtDim3_t k_dim, cnrtFunctionType_t k_type, + cnrtQueue_t queue, cnrtDataType_t k_dtype, + const void *col_ptr, const int height, + const int width, const int channels, + const void *mask_h_idx_ptr, + const void *mask_w_idx_ptr, const int mask_cnt, + void *im_ptr); + +// policy function +static void policyFunc(const int mask_cnt, cnrtDim3_t *k_dim, + cnrtFunctionType_t *k_type) { + const size_t cluster_num = torch_mlu::getDeviceAttr(cnrtAttrClusterCount); + const size_t core_num = torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster); + const size_t task_dim = CEIL_ALIGN(mask_cnt, core_num); + k_dim->x = core_num; + k_dim->y = + (task_dim / core_num) > cluster_num ? cluster_num : (task_dim / core_num); + k_dim->z = 1; + *k_type = CNRT_FUNC_TYPE_UNION1; +} + +void MaskedIm2colForwardMLUKernelLauncher(const Tensor im, + const Tensor mask_h_idx, + const Tensor mask_w_idx, Tensor col, + const int kernel_h, + const int kernel_w, const int pad_h, + const int pad_w) { + // Check dtype. 
+ TORCH_CHECK(im.scalar_type() == at::kFloat || im.scalar_type() == at::kHalf, + "im type should be Float or Half, got ", im.scalar_type(), "."); + TORCH_CHECK(mask_h_idx.scalar_type() == at::kInt || + mask_h_idx.scalar_type() == at::kLong, + "mask_h_idx type should be Int or Long, got ", + mask_h_idx.scalar_type(), "."); + TORCH_CHECK(mask_w_idx.scalar_type() == at::kInt || + mask_w_idx.scalar_type() == at::kLong, + "mask_w_idx type should be Int or Long, got ", + mask_w_idx.scalar_type(), "."); + TORCH_CHECK(kernel_h > 0, "kernel_h should greater than 0, got ", kernel_h, + "."); + TORCH_CHECK(kernel_w > 0, "kernel_w should greater than 0, got ", kernel_w, + "."); + + // zero element check + TORCH_CHECK(im.numel() > 0, "im.numel should greater than zero, got ", + im.numel(), "."); + TORCH_CHECK(col.size(0) > 0, "col.size(0) should greater than zero, got ", + col.size(0), "."); + + // large tensor check + const size_t max_input_num = 2147483648; // 2^31, 2G num + TORCH_CHECK(im.numel() < max_input_num, + "im.numel() should be less than 2147483648, got ", im.numel(), + "."); + TORCH_CHECK(col.numel() < max_input_num, + "col.numel() should be less than 2147483648, got ", col.numel(), + "."); + + const int channels = im.size(1); + const int height = im.size(2); + const int width = im.size(3); + const int mask_cnt = mask_h_idx.size(0); + + // auto im_t = im.permute({0, 2, 3, 1}).contiguous(); + auto memory_format = + torch_mlu::cnnl::ops::get_channels_last_memory_format(im.dim()); + auto im_ = torch_mlu::cnnl::ops::cnnl_contiguous(im, memory_format); + auto col_ = + at::zeros({mask_cnt, kernel_h * kernel_w, channels}, col.options()); + // calculate task dimension + cnrtDim3_t k_dim; + cnrtFunctionType_t k_type; + policyFunc(mask_cnt, &k_dim, &k_type); + + // get compute queue + auto queue = torch_mlu::getCurQueue(); + // get ptr of tensors + auto im_impl = torch_mlu::getMluTensorImpl(im_); + auto im_ptr = im_impl->cnnlMalloc(); + auto mask_h_idx_impl = 
torch_mlu::getMluTensorImpl(mask_h_idx); + auto mask_h_idx_ptr = mask_h_idx_impl->cnnlMalloc(); + auto mask_w_idx_impl = torch_mlu::getMluTensorImpl(mask_w_idx); + auto mask_w_idx_ptr = mask_w_idx_impl->cnnlMalloc(); + auto col_impl = torch_mlu::getMluTensorImpl(col_); + auto col_ptr = col_impl->cnnlMalloc(); + + // get comput dtype of input + cnrtDataType_t data_type = torch_mlu::toCnrtDtype(im.dtype()); + + // launch kernel + CNLOG(INFO) << "Launch Kernel MLUKernelMaskedIm2colForward<<<" << k_dim.x + << ", " << k_dim.y << ", " << k_dim.z << ">>>"; + KernelMaskedIm2colForward(k_dim, k_type, queue, data_type, im_ptr, height, + width, channels, kernel_h, kernel_w, pad_h, pad_w, + mask_h_idx_ptr, mask_w_idx_ptr, mask_cnt, col_ptr); + + col.copy_(col_.permute({2, 1, 0}) + .reshape({channels * kernel_h * kernel_w, mask_cnt}) + .contiguous()); +} + +void MaskedCol2imForwardMLUKernelLauncher(const Tensor col, + const Tensor mask_h_idx, + const Tensor mask_w_idx, Tensor im, + const int height, const int width, + const int channels) { + // Check dtype. 
+ TORCH_CHECK(col.scalar_type() == at::kFloat || col.scalar_type() == at::kHalf, + "col type should be Float or Half, got ", col.scalar_type(), "."); + TORCH_CHECK(mask_h_idx.scalar_type() == at::kInt || + mask_h_idx.scalar_type() == at::kLong, + "mask_h_idx type should be Int or Long, got ", + mask_h_idx.scalar_type(), "."); + TORCH_CHECK(mask_w_idx.scalar_type() == at::kInt || + mask_w_idx.scalar_type() == at::kLong, + "mask_w_idx type should be Int or Long, got ", + mask_w_idx.scalar_type(), "."); + + // zero element check + TORCH_CHECK(im.numel() > 0, "im.numel should greater than zero, got ", + im.numel(), "."); + TORCH_CHECK(col.size(0) > 0, "col.size(0) should greater than zero, got ", + col.size(0), "."); + + // large tensor check + const size_t max_input_num = 2147483648; // 2^31, 2G num + TORCH_CHECK(im.numel() < max_input_num, + "im.numel() should be less than 2147483648, got ", im.numel(), + "."); + TORCH_CHECK(col.numel() < max_input_num, + "col.numel() should be less than 2147483648, got ", col.numel(), + "."); + + auto memory_format = + torch_mlu::cnnl::ops::get_channels_last_memory_format(im.dim()); + at::Tensor im_ = + at::empty({1, channels, height, width}, im.options(), memory_format) + .zero_(); + + auto col_t = torch_mlu::cnnl::ops::cnnl_contiguous(col.transpose(0, 1)); + + const int mask_cnt = mask_h_idx.size(0); + // calculate task dimension + cnrtDim3_t k_dim; + cnrtFunctionType_t k_type; + policyFunc(mask_cnt, &k_dim, &k_type); + + // get compute queue + auto queue = torch_mlu::getCurQueue(); + // get ptr of tensors + auto im_impl = torch_mlu::getMluTensorImpl(im_); + auto im_ptr = im_impl->cnnlMalloc(); + auto mask_h_idx_impl = torch_mlu::getMluTensorImpl(mask_h_idx); + auto mask_h_idx_ptr = mask_h_idx_impl->cnnlMalloc(); + auto mask_w_idx_impl = torch_mlu::getMluTensorImpl(mask_w_idx); + auto mask_w_idx_ptr = mask_w_idx_impl->cnnlMalloc(); + auto col_t_impl = torch_mlu::getMluTensorImpl(col_t); + auto col_t_ptr = col_t_impl->cnnlMalloc(); 
+ + // get comput dtype of input + cnrtDataType_t data_type = torch_mlu::toCnrtDtype(col.dtype()); + + // launch kernel + CNLOG(INFO) << "Launch Kernel MLUKernelMaskedCol2imForward<<<" << k_dim.x + << ", " << k_dim.y << ", " << k_dim.z << ">>>"; + + KernelMaskedCol2imForward(k_dim, k_type, queue, data_type, col_t_ptr, height, + width, channels, mask_h_idx_ptr, mask_w_idx_ptr, + mask_cnt, im_ptr); + + im.copy_(im_); +} + +void masked_im2col_forward_mlu(const Tensor im, const Tensor mask_h_idx, + const Tensor mask_w_idx, Tensor col, + const int kernel_h, const int kernel_w, + const int pad_h, const int pad_w) { + // im: (n, ic, h, w), kernel size (kh, kw) + // kernel: (oc, ic * kh * kw), col: (kh * kw * ic, ow * oh) + MaskedIm2colForwardMLUKernelLauncher(im, mask_h_idx, mask_w_idx, col, + kernel_h, kernel_w, pad_h, pad_w); +} + +void masked_col2im_forward_mlu(const Tensor col, const Tensor mask_h_idx, + const Tensor mask_w_idx, Tensor im, int height, + int width, int channels) { + // im: (n, ic, h, w), kernel size (kh, kw) + // kernel: (oc, ic * kh * kh), col: (kh * kw * ic, ow * oh) + MaskedCol2imForwardMLUKernelLauncher(col, mask_h_idx, mask_w_idx, im, height, + width, channels); +} + +void masked_im2col_forward_impl(const Tensor im, const Tensor mask_h_idx, + const Tensor mask_w_idx, Tensor col, + const int kernel_h, const int kernel_w, + const int pad_h, const int pad_w); + +void masked_col2im_forward_impl(const Tensor col, const Tensor mask_h_idx, + const Tensor mask_w_idx, Tensor im, int height, + int width, int channels); + +REGISTER_DEVICE_IMPL(masked_im2col_forward_impl, MLU, + masked_im2col_forward_mlu); +REGISTER_DEVICE_IMPL(masked_col2im_forward_impl, MLU, + masked_col2im_forward_mlu); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/ms_deform_attn_mlu.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/ms_deform_attn_mlu.cpp new file mode 100644 index 
0000000000000000000000000000000000000000..e93fd984aacf4352e5a5891e263cceb6d53bf5f7 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/ms_deform_attn_mlu.cpp @@ -0,0 +1,420 @@ +/************************************************************************* + * Copyright (C) 2022 by Cambricon. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + *************************************************************************/ +#include "pytorch_device_registry.hpp" +#include "pytorch_mlu_helper.hpp" + +#define MIN(a, b) (((a) < (b)) ? (a) : (b)) + +void KernelMsDeformAttnForward( + cnrtDim3_t k_dim, cnrtFunctionType_t k_type, cnrtQueue_t queue, + const cnrtDataType_t d_type, const char* data_value_gdram, + const char* data_spatial_shapes_gdram, + const char* data_level_start_index_gdram, + const char* data_sampling_loc_gdram, const char* data_attn_weight_gdram, + const int32_t batch_size, const int32_t num_keys, const int32_t num_heads, + const int32_t channels, const int32_t num_levels, const int32_t num_queries, + const int32_t num_points, char* data_col_gdram); +void KernelMsDeformAttnBackward( + cnrtDim3_t k_dim, cnrtFunctionType_t k_type, cnrtQueue_t queue, + const cnrtDataType_t d_type, const float* data_value, + const int32_t* spatial_shapes, const int32_t* data_level_start_index, + const float* data_sampling_loc, const float* data_attn_weight, + const float* grad_output, const int32_t batch_size, const int32_t num_keys, + const int32_t num_heads, const int32_t channels, const int32_t num_levels, + const int32_t 
num_queries, const int32_t num_points, float* grad_value, + float* grad_sampling_loc, float* grad_attn_weight); +// policy function +static void policyFuncForward(cnrtDim3_t* k_dim, cnrtFunctionType_t* k_type, + const int batch_size, const int num_queries, + const int num_heads) { + k_dim->x = torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster); + k_dim->y = + MIN((batch_size * num_queries * num_heads + k_dim->x - 1) / k_dim->x, + torch_mlu::getDeviceAttr(cnrtAttrClusterCount)); + k_dim->z = 1; +#if __BANG_ARCH__ == 520 + *k_type = CNRT_FUNC_TYPE_BLOCK; +#else + *k_type = CNRT_FUNC_TYPE_UNION1; +#endif +} + +// policy function for backward +static void policyFuncBackward(const int32_t batch_size, + const int32_t num_queries, + const int32_t num_heads, + const int32_t num_levels, + cnrtFunctionType_t* k_type, cnrtDim3_t* k_dim) { + size_t cluster_limit = torch_mlu::getDeviceAttr(cnrtAttrClusterCount); + size_t core_limit = torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster); + k_dim->x = core_limit; + int32_t total_num = batch_size * num_queries * num_heads * num_levels; + size_t total_num_align = CEIL_ALIGN(total_num, core_limit); + k_dim->y = (total_num_align / core_limit) > cluster_limit + ? 
cluster_limit + : (total_num_align / core_limit); + k_dim->z = 1; + *k_type = CNRT_FUNC_TYPE_UNION1; +} + +Tensor ms_deform_attn_mlu_forward(const Tensor& value, + const Tensor& spatial_shapes, + const Tensor& level_start_index, + const Tensor& sampling_loc, + const Tensor& attn_weight, + const int im2col_step) { + // check contiguous + AT_ASSERTM(value.is_contiguous(), "value tensor has to be contiguous"); + AT_ASSERTM(spatial_shapes.is_contiguous(), + "spatial_shapes tensor has to be contiguous"); + AT_ASSERTM(level_start_index.is_contiguous(), + "level_start_index tensor has to be contiguous"); + AT_ASSERTM(sampling_loc.is_contiguous(), + "sampling_loc tensor has to be contiguous"); + AT_ASSERTM(attn_weight.is_contiguous(), + "attn_weight tensor has to be contiguous"); + + // check datatype + TORCH_CHECK((value.scalar_type() == at::kFloat), + "value type should be Float, got ", value.scalar_type(), "."); + TORCH_CHECK((spatial_shapes.scalar_type() == at::kInt || + spatial_shapes.scalar_type() == at::kLong), + "spatial_shapes type should be Int, got ", + spatial_shapes.scalar_type(), "."); + TORCH_CHECK((level_start_index.scalar_type() == at::kInt || + level_start_index.scalar_type() == at::kLong), + "level_start_index type should be Int, got ", + level_start_index.scalar_type(), "."); + TORCH_CHECK((sampling_loc.scalar_type() == at::kFloat), + "sampling_loc type should be Float, got ", + sampling_loc.scalar_type(), "."); + TORCH_CHECK((attn_weight.scalar_type() == at::kFloat), + "attn_weight type should be Float, got ", + attn_weight.scalar_type(), "."); + + // check shape + TORCH_CHECK(value.dim() == 4, "value should be a 4d tensor, got ", + value.dim(), "D."); + TORCH_CHECK(spatial_shapes.dim() == 2, + "spatial_shapes should be a 2d tensor, got ", + spatial_shapes.dim(), "D."); + TORCH_CHECK(level_start_index.dim() == 1, + "level_start_index should be a 1d tensor, got ", + level_start_index.dim(), "D."); + TORCH_CHECK(sampling_loc.dim() == 6, + "sampling_loc 
should be a 6d tensor, got ", sampling_loc.dim(), + "D."); + TORCH_CHECK(attn_weight.dim() == 5, "attn_weight should be a 5d tensor, got ", + attn_weight.dim(), "D."); + + const int batch_size = value.size(0); + const int num_keys = value.size(1); + const int num_heads = value.size(2); + const int channels = value.size(3); + const int num_levels = spatial_shapes.size(0); + const int num_queries = sampling_loc.size(1); + const int num_points = sampling_loc.size(4); + + TORCH_CHECK(spatial_shapes.size(1) == 2, + "the 2nd dimensions of spatial_shapes should be 2, got ", + spatial_shapes.size(1), "."); + TORCH_CHECK(sampling_loc.size(5) == 2, + "the 6th dimensions of sampling_loc should be 2, got ", + sampling_loc.size(5), "."); + TORCH_CHECK((sampling_loc.size(0) == batch_size), + "the 1st dimensions of sampling_loc should be batch_size, ", + "but now the 1st dimension of sampling_loc is ", + sampling_loc.size(0), ", and batch_size is ", batch_size, "."); + TORCH_CHECK((attn_weight.size(0) == batch_size), + "the 1st dimensions of attn_weight should be batch_size, ", + "but now the 1st dimension of attn_weight is ", + attn_weight.size(0), ", and batch_size is ", batch_size, "."); + TORCH_CHECK((sampling_loc.size(2) == num_heads), + "the 3rd dimensions of sampling_loc should be num_heads, ", + "but now the 3rd dimension of sampling_loc is ", + sampling_loc.size(2), ", and num_heads is ", num_heads, "."); + TORCH_CHECK((attn_weight.size(2) == num_heads), + "the 3rd dimensions of attn_weight should be num_heads, ", + "but now the 3rd dimension of attn_weight is ", + attn_weight.size(2), ", and num_heads is ", num_heads, "."); + TORCH_CHECK((level_start_index.size(0) == num_levels), + "the 1st dimensions of level_start_index should be num_levels, ", + "but now the 1st dimension of level_start_index is ", + level_start_index.size(0), ", and num_levels is ", num_levels, + "."); + TORCH_CHECK((sampling_loc.size(3) == num_levels), + "the 4th dimensions of sampling_loc should 
be num_levels, ", + "but now the 4th dimension of sampling_loc is ", + sampling_loc.size(3), ", and num_levels is ", num_levels, "."); + TORCH_CHECK((attn_weight.size(3) == num_levels), + "the 4th dimensions of attn_weight should be num_levels, ", + "but now the 4th dimension of attn_weight is ", + attn_weight.size(3), ", and num_levels is ", num_levels, "."); + TORCH_CHECK((attn_weight.size(1) == num_queries), + "the 2nd dimensions of attn_weight should be num_queries, ", + "but now the 2nd dimension of attn_weight is ", + attn_weight.size(1), ", and num_queries is ", num_queries, "."); + TORCH_CHECK((attn_weight.size(4) == num_points), + "the 5th dimensions of attn_weight should be num_points, ", + "but now the 5th dimension of attn_weight is ", + attn_weight.size(4), ", and num_points is ", num_points, "."); + + auto output = at::zeros({batch_size, num_queries, num_heads, channels}, + value.options()); + + // large tensor check + const size_t max_input_size = 2147483648; + TORCH_CHECK(value.numel() < max_input_size, + "value element num should be less than 2^31, got ", value.numel(), + "."); + TORCH_CHECK(sampling_loc.numel() < max_input_size, + "sampling_loc element num should be less than 2^31, got ", + sampling_loc.numel(), "."); + TORCH_CHECK(output.numel() < max_input_size, + "output element num should be less than 2^31, got ", + output.numel(), "."); + + // check zero element + TORCH_CHECK(batch_size != 0, "batch_size should not be zero"); + TORCH_CHECK(num_heads != 0, "num_heads should not be zero"); + TORCH_CHECK(channels != 0, "channels should not be zero"); + TORCH_CHECK(num_queries != 0, "num_queries should not be zero"); + + if (num_keys == 0 || num_levels == 0 || num_points == 0) { + return output; + } + + // calculate task dimension + cnrtDim3_t k_dim; + cnrtFunctionType_t k_type; + policyFuncForward(&k_dim, &k_type, batch_size, num_queries, num_heads); + + // get compute queue + auto queue = torch_mlu::getCurQueue(); + + auto spatial_shapes_ = 
spatial_shapes.to(at::kInt); + auto level_start_index_ = level_start_index.to(at::kInt); + + // get ptr of tensors + auto value_impl = torch_mlu::getMluTensorImpl(value); + auto value_ptr = value_impl->cnnlMalloc(); + auto spatial_shapes_impl = torch_mlu::getMluTensorImpl(spatial_shapes_); + auto spatial_shapes_ptr = spatial_shapes_impl->cnnlMalloc(); + auto level_start_index_impl = torch_mlu::getMluTensorImpl(level_start_index_); + auto level_start_index_ptr = level_start_index_impl->cnnlMalloc(); + auto sampling_loc_impl = torch_mlu::getMluTensorImpl(sampling_loc); + auto sampling_loc_ptr = sampling_loc_impl->cnnlMalloc(); + auto attn_weight_impl = torch_mlu::getMluTensorImpl(attn_weight); + auto attn_weight_ptr = attn_weight_impl->cnnlMalloc(); + auto output_impl = torch_mlu::getMluTensorImpl(output); + auto output_ptr = output_impl->cnnlMalloc(); + + // get compute dtype of input + cnrtDataType_t data_type = torch_mlu::toCnrtDtype(value.dtype()); + + // launch kernel + CNLOG(INFO) << "Launch Kernel MLUKernelMsDeformAttnForward<<<" << k_dim.x + << ", " << k_dim.y << ", " << k_dim.z << ">>>"; + + KernelMsDeformAttnForward( + k_dim, k_type, queue, data_type, (char*)value_ptr, + (char*)spatial_shapes_ptr, (char*)level_start_index_ptr, + (char*)sampling_loc_ptr, (char*)attn_weight_ptr, batch_size, num_keys, + num_heads, channels, num_levels, num_queries, num_points, + (char*)output_ptr); + + output = output.view({batch_size, num_queries, num_heads * channels}); + return output; +} + +void ms_deform_attn_mlu_backward( + const Tensor& value, const Tensor& spatial_shapes, + const Tensor& level_start_index, const Tensor& sampling_loc, + const Tensor& attn_weight, const Tensor& grad_output, Tensor& grad_value, + Tensor& grad_sampling_loc, Tensor& grad_attn_weight, + const int im2col_step) { + // check contiguous + AT_ASSERTM(value.is_contiguous(), "value tensor has to be contiguous"); + AT_ASSERTM(spatial_shapes.is_contiguous(), + "spatial_shapes tensor has to be 
contiguous"); + AT_ASSERTM(level_start_index.is_contiguous(), + "level_start_index tensor has to be contiguous"); + AT_ASSERTM(sampling_loc.is_contiguous(), + "sampling_loc tensor has to be contiguous"); + AT_ASSERTM(attn_weight.is_contiguous(), + "attn_weight tensor has to be contiguous"); + AT_ASSERTM(grad_output.is_contiguous(), + "grad_output tensor has to be contiguous"); + + // check datatype + TORCH_CHECK((value.scalar_type() == at::kFloat), + "value type should be Float, got ", value.scalar_type(), "."); + TORCH_CHECK((spatial_shapes.scalar_type() == at::kInt || + spatial_shapes.scalar_type() == at::kLong), + "spatial_shapes type should be Int, got ", + spatial_shapes.scalar_type(), "."); + TORCH_CHECK((level_start_index.scalar_type() == at::kInt || + level_start_index.scalar_type() == at::kLong), + "level_start_index type should be Int, got ", + level_start_index.scalar_type(), "."); + TORCH_CHECK((sampling_loc.scalar_type() == at::kFloat), + "sampling_loc type should be Float, got ", + sampling_loc.scalar_type(), "."); + TORCH_CHECK((attn_weight.scalar_type() == at::kFloat), + "attn_weight type should be Float, got ", + attn_weight.scalar_type(), "."); + TORCH_CHECK((grad_output.scalar_type() == at::kFloat), + "grad_output type should be Float, got ", + grad_output.scalar_type(), "."); + + const int batch_size = value.size(0); + const int num_keys = value.size(1); + const int num_heads = value.size(2); + const int channels = value.size(3); + const int num_levels = spatial_shapes.size(0); + const int num_queries = sampling_loc.size(1); + const int num_points = sampling_loc.size(4); + // Check shape. 
+ TORCH_CHECK(spatial_shapes.size(1) == 2, + "the 2nd dimensions of spatial_shapes should be 2, got ", + spatial_shapes.size(1), "."); + + TORCH_CHECK((level_start_index.size(0) == num_levels), + "the 1st dimensions of level_start_index should be num_levels, ", + "but now the 1st dimension of level_start_index is ", + level_start_index.size(0), ", and num_levels is ", num_levels, + "."); + + TORCH_CHECK((sampling_loc.size(0) == batch_size), + "the 1st dimensions of sampling_loc should be batch_size, ", + "but now the 1st dimension of sampling_loc is ", + sampling_loc.size(0), ", and batch_size is ", batch_size, "."); + TORCH_CHECK((sampling_loc.size(2) == num_heads), + "the 3rd dimensions of sampling_loc should be num_heads, ", + "but now the 3rd dimension of sampling_loc is ", + sampling_loc.size(2), ", and num_heads is ", num_heads, "."); + TORCH_CHECK((sampling_loc.size(3) == num_levels), + "the 4th dimensions of sampling_loc should be num_levels, ", + "but now the 4th dimension of sampling_loc is ", + sampling_loc.size(3), ", and num_levels is ", num_levels, "."); + TORCH_CHECK(sampling_loc.size(5) == 2, + "the 6th dimensions of sampling_loc should be 2, got ", + sampling_loc.size(5), "."); + + TORCH_CHECK((attn_weight.size(0) == batch_size), + "the 1st dimensions of attn_weight should be batch_size, ", + "but now the 1st dimension of attn_weight is ", + attn_weight.size(0), ", and batch_size is ", batch_size, "."); + TORCH_CHECK((attn_weight.size(1) == num_queries), + "the 2nd dimensions of attn_weight should be num_queries, ", + "but now the 2nd dimension of attn_weight is ", + attn_weight.size(1), ", and num_queries is ", num_queries, "."); + + TORCH_CHECK((attn_weight.size(2) == num_heads), + "the 3rd dimensions of attn_weight should be num_heads, ", + "but now the 3rd dimension of attn_weight is ", + attn_weight.size(2), ", and num_heads is ", num_heads, "."); + TORCH_CHECK((attn_weight.size(3) == num_levels), + "the 4th dimensions of attn_weight should be 
num_levels, ", + "but now the 4th dimension of attn_weight is ", + attn_weight.size(3), ", and num_levels is ", num_levels, "."); + TORCH_CHECK((attn_weight.size(4) == num_points), + "the 5th dimensions of attn_weight should be num_points, ", + "but now the 5th dimension of attn_weight is ", + attn_weight.size(4), ", and num_points is ", num_points, "."); + + TORCH_CHECK((grad_output.size(0) == batch_size), + "the 1st dimensions of grad_output should be batch_size, ", + "but now the 1st dimension of grad_output is ", + grad_output.size(0), ", and batch_size is ", batch_size, "."); + TORCH_CHECK((grad_output.size(1) == num_queries), + "the 2nd dimensions of grad_output should be num_queries, ", + "but now the 2nd dimension of grad_output is ", + grad_output.size(1), ", and num_queries is ", num_queries, "."); + TORCH_CHECK( + (grad_output.size(2) == num_heads * channels), + "the 3rd dimensions of grad_output should be num_heads * channels, ", + "but now the 3rd dimension of grad_output is ", grad_output.size(2), + ", and num_heads * channels is ", num_heads * channels, "."); + + // check zero element + TORCH_CHECK(batch_size != 0, "The batch_size is zero."); + TORCH_CHECK(channels != 0, "The channels is zero."); + TORCH_CHECK(num_keys != 0, "The num_keys is zero."); + TORCH_CHECK(num_heads != 0, "The num_heads is zero."); + TORCH_CHECK(num_queries != 0, "The num_queries is zero."); + if (num_levels == 0 || num_points == 0) { + return; + } + + // calculate task dimension + cnrtDim3_t k_dim; + cnrtFunctionType_t k_type; + policyFuncBackward(batch_size, num_queries, num_heads, num_levels, &k_type, + &k_dim); + + // get compute queue + auto queue = torch_mlu::getCurQueue(); + + // get ptr of tensors + auto value_impl = torch_mlu::getMluTensorImpl(value); + auto value_ptr = value_impl->cnnlMalloc(); + auto spatial_shapes_impl = torch_mlu::getMluTensorImpl(spatial_shapes); + auto spatial_shapes_ptr = spatial_shapes_impl->cnnlMalloc(); + auto level_start_index_impl = 
torch_mlu::getMluTensorImpl(level_start_index); + auto level_start_index_ptr = level_start_index_impl->cnnlMalloc(); + auto sampling_loc_impl = torch_mlu::getMluTensorImpl(sampling_loc); + auto sampling_loc_ptr = sampling_loc_impl->cnnlMalloc(); + auto attn_weight_impl = torch_mlu::getMluTensorImpl(attn_weight); + auto attn_weight_ptr = attn_weight_impl->cnnlMalloc(); + auto grad_output_impl = torch_mlu::getMluTensorImpl(grad_output); + auto grad_output_ptr = grad_output_impl->cnnlMalloc(); + auto grad_value_impl = torch_mlu::getMluTensorImpl(grad_value); + auto grad_value_ptr = grad_value_impl->cnnlMalloc(); + auto grad_sampling_loc_impl = torch_mlu::getMluTensorImpl(grad_sampling_loc); + auto grad_sampling_loc_ptr = grad_sampling_loc_impl->cnnlMalloc(); + auto grad_attn_weight_impl = torch_mlu::getMluTensorImpl(grad_attn_weight); + auto grad_attn_weight_ptr = grad_attn_weight_impl->cnnlMalloc(); + + // get comput dtype of input + cnrtDataType_t data_type = torch_mlu::toCnrtDtype(value.dtype()); + + // launch kernel + CNLOG(INFO) << "Launch Kernel MLUKernelMsDeformAttnBackward<<<" << k_dim.x + << ", " << k_dim.y << ", " << k_dim.z << ">>>"; + + KernelMsDeformAttnBackward( + k_dim, k_type, queue, data_type, (float*)value_ptr, + (int32_t*)spatial_shapes_ptr, (int32_t*)level_start_index_ptr, + (float*)sampling_loc_ptr, (float*)attn_weight_ptr, + (float*)grad_output_ptr, batch_size, num_keys, num_heads, channels, + num_levels, num_queries, num_points, (float*)grad_value_ptr, + (float*)grad_sampling_loc_ptr, (float*)grad_attn_weight_ptr); +} + +Tensor ms_deform_attn_impl_forward(const Tensor& value, + const Tensor& spatial_shapes, + const Tensor& level_start_index, + const Tensor& sampling_loc, + const Tensor& attn_weight, + const int im2col_step); + +void ms_deform_attn_impl_backward( + const Tensor& value, const Tensor& spatial_shapes, + const Tensor& level_start_index, const Tensor& sampling_loc, + const Tensor& attn_weight, const Tensor& grad_output, Tensor& 
grad_value, + Tensor& grad_sampling_loc, Tensor& grad_attn_weight, const int im2col_step); + +REGISTER_DEVICE_IMPL(ms_deform_attn_impl_forward, MLU, + ms_deform_attn_mlu_forward); +REGISTER_DEVICE_IMPL(ms_deform_attn_impl_backward, MLU, + ms_deform_attn_mlu_backward); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/nms_mlu.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/nms_mlu.cpp new file mode 100644 index 0000000000000000000000000000000000000000..e2f4322a0257db384cf0e63763174c96c9915778 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/nms_mlu.cpp @@ -0,0 +1,156 @@ +/************************************************************************* + * Copyright (C) 2021 Cambricon. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
+ *************************************************************************/ + +#include "pytorch_device_registry.hpp" +#include "pytorch_mlu_helper.hpp" + +void KernelNms(cnrtDim3_t k_dim, cnrtFunctionType_t k_type, cnrtQueue_t queue, + const cnrtDataType_t data_type_input, const void *boxes_ptr, + const void *scores_ptr, const int input_num_boxes, + const int max_output_boxes, const float iou_threshold, + const float offset, void *workspace_ptr, void *output_size_ptr, + void *output_ptr); + +int selectUnionType(uint32_t use_job, int box_num_per_core) { + // the box_num_per_core should be at least 256, otherwise the real IO + // bandwidth would be very low + while (box_num_per_core < 256 && use_job >= 4) { + box_num_per_core *= 2; + use_job /= 2; + } + return use_job; +} + +static cnnlStatus_t policyFunc(cnrtDim3_t *k_dim, cnrtFunctionType_t *k_type, + int &core_num_per_class, + const int input_box_num) { + uint32_t core_dim = torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster); + uint32_t cluster_number = torch_mlu::getDeviceAttr(cnrtAttrClusterCount); + uint32_t job_limit = getJobLimitCapability(); + uint32_t core_number = job_limit; + + int box_num_per_core = (input_box_num + core_number - 1) / core_number; + int use_job = selectUnionType(job_limit, box_num_per_core); + // initiate k_type as Union1 + k_dim->x = core_dim; + k_dim->y = 1; + k_dim->z = 1; + *k_type = CNRT_FUNC_TYPE_UNION1; + switch (job_limit) { + case CN_KERNEL_CLASS_BLOCK: + case CN_KERNEL_CLASS_UNION: + case CN_KERNEL_CLASS_UNION2: + case CN_KERNEL_CLASS_UNION4: + case CN_KERNEL_CLASS_UNION8: + case CN_KERNEL_CLASS_UNION16: { + if (use_job < 4) { + k_dim->x = 1; + *k_type = CNRT_FUNC_TYPE_BLOCK; + } else if (use_job == 4) { + k_dim->x = core_dim; + *k_type = CNRT_FUNC_TYPE_UNION1; + } else { + k_dim->x = use_job; + *k_type = (cnrtFunctionType_t)use_job; + } + }; break; + default: + LOG(WARNING) << "[cnnlNms_v2]: got unsupported job limit number." 
+ << " Use default CN_KERNEL_CLASS_UNION1 with UNION1 task."; + } + return CNNL_STATUS_SUCCESS; +} + +Tensor NMSMLUKernelLauncher(Tensor boxes, Tensor scores, float iou_threshold, + int offset) { + // dimension parameters check + TORCH_CHECK(boxes.dim() == 2, "boxes should be a 2d tensor, got ", + boxes.dim(), "D"); + TORCH_CHECK(boxes.size(1) == 4, + "boxes should have 4 elements in dimension 1, got ", + boxes.size(1)); + TORCH_CHECK(scores.dim() == 1, "scores should be a 1d tensor, got ", + scores.dim(), "D"); + + // data type check + TORCH_CHECK(boxes.scalar_type() == scores.scalar_type(), + "boxes should have the same type as scores"); + TORCH_CHECK( + boxes.scalar_type() == at::kFloat || boxes.scalar_type() == at::kHalf, + "data type of boxes should be Float or Half, got ", boxes.scalar_type()); + + if (boxes.numel() == 0) { + return at::empty({0}, boxes.options().dtype(at::kLong)); + } + + int input_num_boxes = boxes.size(0); + int max_output_boxes = boxes.size(0); + + cnrtDataType_t data_type_input = torch_mlu::toCnrtDtype(boxes.dtype()); + cnrtDim3_t k_dim; + cnrtJobType_t k_type; + + int core_num_per_class; + policyFunc(&k_dim, &k_type, core_num_per_class, input_num_boxes); + + // transpose boxes (n, 4) to (4, n) for better performance + auto boxes_t = boxes.transpose(0, 1); + auto boxes_ = torch_mlu::cnnl::ops::cnnl_contiguous(boxes_t); + auto scores_ = torch_mlu::cnnl::ops::cnnl_contiguous(scores); + auto output = at::empty({max_output_boxes}, boxes.options().dtype(at::kLong)); + auto output_size = at::empty({1}, scores.options().dtype(at::kInt)); + + // workspace + const int info_num = 5; // x1, x2, y1, y2 and score + size_t space_size = 0; + if (boxes.scalar_type() == at::kHalf) { + space_size = input_num_boxes * sizeof(int16_t) * info_num + sizeof(float); + } else { + space_size = input_num_boxes * sizeof(float) * info_num + sizeof(float); + } +#if __BANG_ARCH__ > 370 + int cluster_num = getCoreNumOfJobLimitCapability() / + 
torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster); + space_size += cluster_number * sizeof(float) * 7; +#endif + auto workspace = at::empty(space_size, boxes.options().dtype(at::kByte)); + + // get compute queue + auto queue = torch_mlu::getCurQueue(); + + auto boxes_impl = torch_mlu::getMluTensorImpl(boxes_); + auto boxes_ptr = boxes_impl->cnnlMalloc(); + auto scores_impl = torch_mlu::getMluTensorImpl(scores_); + auto scores_ptr = scores_impl->cnnlMalloc(); + auto workspace_impl = torch_mlu::getMluTensorImpl(workspace); + auto workspace_ptr = workspace_impl->cnnlMalloc(); + auto output_impl = torch_mlu::getMluTensorImpl(output); + auto output_ptr = output_impl->cnnlMalloc(); + auto output_size_impl = torch_mlu::getMluTensorImpl(output_size); + auto output_size_ptr = output_size_impl->cnnlMalloc(); + + uint32_t core_dim = torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster); + CNLOG(INFO) << "Launch Kernel MLUUnionX NMS<<>>"; + KernelNms(k_dim, k_type, queue, data_type_input, boxes_ptr, scores_ptr, + input_num_boxes, max_output_boxes, iou_threshold, offset, + workspace_ptr, output_size_ptr, output_ptr); + int output_num = *static_cast(output_size.cpu().data_ptr()); + return output.slice(0, 0, output_num); +} + +Tensor nms_mlu(Tensor boxes, Tensor scores, float iou_threshold, int offset) { + return NMSMLUKernelLauncher(boxes, scores, iou_threshold, offset); +} + +Tensor nms_impl(Tensor boxes, Tensor scores, float iou_threshold, int offset); +REGISTER_DEVICE_IMPL(nms_impl, MLU, nms_mlu); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/psamask_mlu.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/psamask_mlu.cpp new file mode 100644 index 0000000000000000000000000000000000000000..87077b5c48e8751a3743b1ad068a76fa8651dc56 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/psamask_mlu.cpp @@ -0,0 +1,308 @@ +/************************************************************************* + * Copyright (C) 2022 Cambricon. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + *************************************************************************/ +#include + +#include "psamask_utils.hpp" +#include "pytorch_device_registry.hpp" +#include "pytorch_mlu_helper.hpp" + +#define COMPUTE_COUNT_ALIGN 64 + +void KernelPsamaskForward( + cnrtDim3_t k_dim, cnrtFunctionType_t k_type, cnrtQueue_t queue, + const void *x, void *y, const PsamaskType psa_type, + const DimPartitionType core_partition, + const DimPartitionType cluster_partition, const int batch, + const int h_feature, const int w_feature, const int h_mask, + const int w_mask, const int x_c, const int y_c, const int half_h_mask, + const int half_w_mask, const int n_per_core, const int h_per_core, + const int n_per_cluster, const int h_per_cluster, const int limit_n_seg, + const int limit_h_seg, const int limit_w_seg); + +void KernelPsamaskBackward( + cnrtDim3_t k_dim, cnrtFunctionType_t k_type, cnrtQueue_t queue, + const void *dy, void *dx, const PsamaskType psa_type, + const DimPartitionType core_partition, + const DimPartitionType cluster_partition, const int batch, + const int h_feature, const int w_feature, const int h_mask, + const int w_mask, const int dx_c, const int dy_c, const int half_h_mask, + const int half_w_mask, const int n_per_core, const int h_per_core, + const int n_per_cluster, const int h_per_cluster, const int limit_n_seg, + const int limit_h_seg, const int limit_w_seg); + +namespace { +void policyFunc(cnrtDim3_t *k_dim_ptr, cnrtFunctionType_t *f_type_ptr, + PartitionSeg 
*partition_ptr, const int n, const int h_feature) { + unsigned int core_dim = torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster); + unsigned int cluster_num = torch_mlu::getDeviceAttr(cnrtAttrClusterCount); + unsigned int use_cluster_num = cluster_num; + unsigned int use_core_num = core_dim; + + if (n >= cluster_num || n >= h_feature) { + partition_ptr->cluster_partition = PARTITION_N; + partition_ptr->n_per_cluster = (n + cluster_num - 1) / cluster_num; + partition_ptr->h_per_cluster = h_feature; + use_cluster_num = + (n + partition_ptr->n_per_cluster - 1) / partition_ptr->n_per_cluster; + } else { + partition_ptr->cluster_partition = PARTITION_H; + partition_ptr->h_per_cluster = (h_feature + cluster_num - 1) / cluster_num; + partition_ptr->n_per_cluster = n; + use_cluster_num = (h_feature + partition_ptr->h_per_cluster - 1) / + partition_ptr->h_per_cluster; + } + + if (partition_ptr->n_per_cluster >= core_dim || + partition_ptr->n_per_cluster >= partition_ptr->h_per_cluster) { + partition_ptr->core_partition = PARTITION_N; + partition_ptr->n_per_core = + (partition_ptr->n_per_cluster + core_dim - 1) / core_dim; + partition_ptr->h_per_core = partition_ptr->h_per_cluster; + use_core_num = + (partition_ptr->n_per_cluster + partition_ptr->n_per_core - 1) / + partition_ptr->n_per_core; + } else { + partition_ptr->core_partition = PARTITION_H; + partition_ptr->h_per_core = + (partition_ptr->h_per_cluster + core_dim - 1) / core_dim; + partition_ptr->n_per_core = partition_ptr->n_per_cluster; + use_core_num = + (partition_ptr->h_per_cluster + partition_ptr->h_per_core - 1) / + partition_ptr->h_per_core; + } + *k_dim_ptr = {core_dim, use_cluster_num, 1}; +} + +} // namespace + +bool findLimit(const int shape_core_n, const int shape_core_h, + const int shape_core_w, const int shape_core_ci, + const int shape_core_co, int *limit_n_seg_ptr, + int *limit_h_seg_ptr, int *limit_w_seg_ptr, const int psa_type) { + const bool need_temp = psa_type == 1; + const int input_bytes = 
sizeof(float); + int limit_n_seg = shape_core_n; + int limit_h_seg = shape_core_h; + int limit_w_seg = shape_core_w; + + const int max_nram_size = torch_mlu::getDeviceAttr(cnrtAttrNramSizePerMcore); + const int align_base_128 = NFU_ALIGN_SIZE / input_bytes; + const int align_base_64 = COMPUTE_COUNT_ALIGN / input_bytes; + const int align_co = CEIL_ALIGN(shape_core_co, align_base_64); + const int align_w = CEIL_ALIGN(shape_core_w, align_base_64); + const int align_hw = CEIL_ALIGN(shape_core_h * shape_core_w, align_base_64); + const int max_num = max_nram_size / input_bytes; + + int n_limit = + max_num / + (CEIL_ALIGN(shape_core_h * shape_core_w * shape_core_ci, align_base_128) + + align_hw * align_co * (1 + need_temp)); + if (n_limit > 0) { + n_limit = std::min(n_limit, shape_core_n); + limit_n_seg = n_limit; + } else { + int h_limit = + max_num / (CEIL_ALIGN(shape_core_w * shape_core_ci, align_base_128) + + align_w * align_co * (1 + need_temp)); + if (h_limit > 0) { + h_limit = std::min(h_limit, shape_core_h); + limit_h_seg = h_limit; + limit_n_seg = 1; + } else { + int w_limit = + max_num / (CEIL_ALIGN(shape_core_ci, align_base_128) + + CEIL_ALIGN(align_co, align_base_128) * (1 + need_temp)); + if (w_limit > 0 && w_limit >= (COMPUTE_COUNT_ALIGN / input_bytes)) { + w_limit = std::min(w_limit, shape_core_w); + w_limit = w_limit / (COMPUTE_COUNT_ALIGN / input_bytes) * + (COMPUTE_COUNT_ALIGN / input_bytes); + limit_w_seg = w_limit; + limit_h_seg = 1; + limit_n_seg = 1; + } else { + CNLOG(INFO) << "The size of input channel is too large."; + return false; + } + } + } + *limit_n_seg_ptr = limit_n_seg; + *limit_h_seg_ptr = limit_h_seg; + *limit_w_seg_ptr = limit_w_seg; + return true; +} + +void PSAMaskForwardMLUKernelLauncher(const int psa_type, const Tensor x, + Tensor y, const int num_, + const int h_feature, const int w_feature, + const int h_mask, const int w_mask, + const int half_h_mask, + const int half_w_mask) { + // params check + TORCH_CHECK(x.scalar_type() == 
at::kFloat, "x type should be Float, got ", + x.scalar_type()); + TORCH_CHECK(y.scalar_type() == x.scalar_type(), + "y should have the same type as x"); + TORCH_CHECK(x.dim() == 4, "x should be a 4d tensor, got ", x.dim(), "D"); + TORCH_CHECK(y.dim() == 4, "y should be a 4d tensor, got ", y.dim(), "D"); + + int x_c = x.size(1); + int y_c = y.size(1); + TORCH_CHECK(h_mask * w_mask == x_c, + "channel of x should be the same as h_mask * w_mask"); + TORCH_CHECK(h_feature * w_feature == y_c, + "channel of y should be the same as h_feature * w_feature"); + TORCH_CHECK(psa_type == 0 || psa_type == 1, + "psa_type only supports 'COLLECT' and 'DISTRIBUTE' currently"); + + if (x.numel() == 0) { + CNLOG(INFO) << "skip zero-element tensor"; + return; + } + + cnrtFunctionType_t k_type = CNRT_FUNC_TYPE_UNION1; + cnrtDim3_t k_dim; + PartitionSeg partition_info; + policyFunc(&k_dim, &k_type, &partition_info, num_, h_feature); + int n_limit_seg, h_limit_seg, w_limit_seg; + bool ret = + findLimit(partition_info.n_per_core, partition_info.h_per_core, w_feature, + x_c, y_c, &n_limit_seg, &h_limit_seg, &w_limit_seg, psa_type); + if (ret != true) { + return; + } + + auto memory_format = + torch_mlu::cnnl::ops::get_channels_last_memory_format(x.dim()); + auto x_tensor = torch_mlu::cnnl::ops::cnnl_contiguous(x, memory_format); + at::Tensor y_tmp = + at::empty({num_, y_c, h_feature, w_feature}, x.options(), memory_format); + + // get compute queue + auto queue = torch_mlu::getCurQueue(); + + // get ptr of tensors + auto x_impl = torch_mlu::getMluTensorImpl(x_tensor); + auto x_ptr = x_impl->cnnlMalloc(); + auto y_impl = torch_mlu::getMluTensorImpl(y_tmp); + auto y_ptr = y_impl->cnnlMalloc(); + + KernelPsamaskForward( + k_dim, k_type, queue, x_ptr, y_ptr, (PsamaskType)psa_type, + partition_info.core_partition, partition_info.cluster_partition, num_, + h_feature, w_feature, h_mask, w_mask, x_c, y_c, half_h_mask, half_w_mask, + partition_info.n_per_core, partition_info.h_per_core, + 
partition_info.n_per_cluster, partition_info.h_per_cluster, n_limit_seg, + h_limit_seg, w_limit_seg); + + y.copy_(y_tmp); +} + +void PSAMaskBackwardMLUKernelLauncher(const int psa_type, const Tensor dy, + Tensor dx, const int num_, + const int h_feature, const int w_feature, + const int h_mask, const int w_mask, + const int half_h_mask, + const int half_w_mask) { + // params check + TORCH_CHECK(dy.scalar_type() == at::kFloat, "dy type should be Float, got ", + dy.scalar_type()); + TORCH_CHECK(dx.scalar_type() == dy.scalar_type(), + "dx should have the same type as dy"); + TORCH_CHECK(dy.dim() == 4, "dy should be a 4d tensor, got ", dy.dim(), "D"); + TORCH_CHECK(dx.dim() == 4, "dx should be a 4d tensor, got ", dx.dim(), "D"); + + int dy_c = dy.size(1); + int dx_c = dx.size(1); + TORCH_CHECK(h_feature * w_feature == dy_c, + "channel of dy should be the same as h_feature * w_feature"); + TORCH_CHECK(h_mask * w_mask == dx_c, + "channel of dx should be the same as h_mask * w_mask"); + TORCH_CHECK(psa_type == 0 || psa_type == 1, + "psa_type only supports 'COLLECT' and 'DISTRIBUTE' currently"); + + if (dx.numel() == 0) { + CNLOG(INFO) << "skip zero-element tensor"; + return; + } + + cnrtFunctionType_t k_type = CNRT_FUNC_TYPE_UNION1; + cnrtDim3_t k_dim; + PartitionSeg partition_info; + policyFunc(&k_dim, &k_type, &partition_info, num_, h_feature); + int n_limit_seg, h_limit_seg, w_limit_seg; + bool ret = + findLimit(partition_info.n_per_core, partition_info.h_per_core, w_feature, + dx_c, dy_c, &n_limit_seg, &h_limit_seg, &w_limit_seg, psa_type); + if (ret != true) { + return; + } + + auto memory_format = + torch_mlu::cnnl::ops::get_channels_last_memory_format(dy.dim()); + auto dy_tensor = torch_mlu::cnnl::ops::cnnl_contiguous(dy, memory_format); + at::Tensor dx_tmp = at::empty({num_, dx_c, h_feature, w_feature}, + dy.options(), memory_format); + + // get compute queue + auto queue = torch_mlu::getCurQueue(); + + // get ptr of tensors + auto dx_impl = 
torch_mlu::getMluTensorImpl(dx_tmp); + auto dx_ptr = dx_impl->cnnlMalloc(); + auto dy_impl = torch_mlu::getMluTensorImpl(dy_tensor); + auto dy_ptr = dy_impl->cnnlMalloc(); + + KernelPsamaskBackward( + k_dim, k_type, queue, dy_ptr, dx_ptr, (PsamaskType)psa_type, + partition_info.core_partition, partition_info.cluster_partition, num_, + h_feature, w_feature, h_mask, w_mask, dx_c, dy_c, half_h_mask, + half_w_mask, partition_info.n_per_core, partition_info.h_per_core, + partition_info.n_per_cluster, partition_info.h_per_cluster, n_limit_seg, + h_limit_seg, w_limit_seg); + + dx.copy_(dx_tmp); +} + +void psamask_forward_mlu(const int psa_type, const Tensor input, Tensor output, + const int num_, const int h_feature, + const int w_feature, const int h_mask, + const int w_mask, const int half_h_mask, + const int half_w_mask) { + PSAMaskForwardMLUKernelLauncher(psa_type, input, output, num_, h_feature, + w_feature, h_mask, w_mask, half_h_mask, + half_w_mask); +} + +void psamask_backward_mlu(const int psa_type, const Tensor grad_output, + Tensor grad_input, const int num_, + const int h_feature, const int w_feature, + const int h_mask, const int w_mask, + const int half_h_mask, const int half_w_mask) { + PSAMaskBackwardMLUKernelLauncher(psa_type, grad_output, grad_input, num_, + h_feature, w_feature, h_mask, w_mask, + half_h_mask, half_w_mask); +} + +void psamask_forward_impl(const int psa_type, const Tensor input, Tensor output, + const int num_, const int h_feature, + const int w_feature, const int h_mask, + const int w_mask, const int half_h_mask, + const int half_w_mask); + +void psamask_backward_impl(const int psa_type, const Tensor grad_output, + Tensor grad_input, const int num_, + const int h_feature, const int w_feature, + const int h_mask, const int w_mask, + const int half_h_mask, const int half_w_mask); + +REGISTER_DEVICE_IMPL(psamask_forward_impl, MLU, psamask_forward_mlu); +REGISTER_DEVICE_IMPL(psamask_backward_impl, MLU, psamask_backward_mlu); diff --git 
a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/roi_align_mlu.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/roi_align_mlu.cpp new file mode 100644 index 0000000000000000000000000000000000000000..361bba25f61b4b46b5cd8be57650af174239d76e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/roi_align_mlu.cpp @@ -0,0 +1,206 @@ +/************************************************************************* + * Copyright (C) 2021 Cambricon. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + *************************************************************************/ +#include "pytorch_device_registry.hpp" +#include "pytorch_mlu_helper.hpp" + +void KernelRoiAlign(cnrtDim3_t k_dim, cnrtFunctionType_t k_type, + cnrtQueue_t queue, const cnrtDataType_t d_type, + const void *input, const void *rois, const int channels, + const bool aligned, const int pooled_height, + const int pooled_width, const int input_height, + const int input_width, const int sampling_ratio, + const float spatial_scale, const int num_rois, + void *output); + +void KernelRoiAlignBackward(cnrtDim3_t k_dim, cnrtFunctionType_t k_type, + cnrtQueue_t queue, const cnrtDataType_t dtype, + const void *grads, const void *boxes, + void *grads_image, const int boxes_num, + const int hi, const int wi, const int c, + const int no, const int ho, const int wo, + const float spatial_scale, const int sampling_ratio, + const bool aligned); + +void ROIAlignForwardMLUKernelLauncher(Tensor input, Tensor rois, Tensor output, + Tensor argmax_y, 
Tensor argmax_x, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + int pool_mode, bool aligned) { + // params check + TORCH_CHECK( + input.scalar_type() == at::kFloat || input.scalar_type() == at::kHalf, + "input type should be Float or Half, got ", input.scalar_type()); + TORCH_CHECK(rois.scalar_type() == input.scalar_type(), + "rois should have the same type as input"); + TORCH_CHECK(input.dim() == 4, "input should be a 4d tensor, got ", + input.dim(), "D"); + TORCH_CHECK(rois.dim() == 2, "rois should be a 2d tensor, got ", rois.dim(), + "D"); + TORCH_CHECK(pool_mode == 1, "pool_mode only supports 'avg' currently"); + + auto memory_format = + torch_mlu::cnnl::ops::get_channels_last_memory_format(input.dim()); + auto input_tensor = + torch_mlu::cnnl::ops::cnnl_contiguous(input, memory_format); + + auto num_rois = rois.size(0); + auto channels = input.size(1); + int height = input.size(2); + int width = input.size(3); + + if (output.numel() == 0) { + output = at::zeros({num_rois, channels, aligned_height, aligned_width}, + input.options()); + return; + } + + at::Tensor output_tmp = + at::empty({num_rois, channels, aligned_height, aligned_width}, + input.options(), memory_format); + + // get tensor impl + auto self_impl = torch_mlu::getMluTensorImpl(input_tensor); + auto rois_impl = torch_mlu::getMluTensorImpl(rois); + auto output_impl = torch_mlu::getMluTensorImpl(output_tmp); + + // get compute queue + auto queue = torch_mlu::getCurQueue(); + + // get the mlu ptr + auto self_ptr = self_impl->cnnlMalloc(); + auto rois_ptr = rois_impl->cnnlMalloc(); + auto output_ptr = output_impl->cnnlMalloc(); + + cnrtJobType_t k_type = CNRT_FUNC_TYPE_UNION1; + cnrtDim3_t k_dim; + k_dim.x = torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster); + k_dim.y = torch_mlu::getDeviceAttr(cnrtAttrClusterCount); + k_dim.z = 1; + cnrtDataType_t data_type = torch_mlu::toCnrtDtype(input.dtype()); + + KernelRoiAlign(k_dim, k_type, queue, data_type, self_ptr, 
rois_ptr, channels, aligned, aligned_height, aligned_width, height, width,
                       sampling_ratio, spatial_scale, num_rois, output_ptr);

  // Copy the channels-last staging tensor back into the caller's output.
  output.copy_(output_tmp);
}

// Round x up to the next power of two using the bit-smearing trick
// (valid for positive 32-bit values; returns 0 for x == 0).
static int nearestPower2(int x) {
  x--;
  x |= x >> 1;
  x |= x >> 2;
  x |= x >> 4;
  x |= x >> 8;
  x |= x >> 16;
  x++;
  return x;
}

// Launch the RoiAlign backward kernel on MLU: validate arguments, stage
// grad and grad_input in channels-last layout, size the launch grid from
// the number of boxes, run the kernel, then copy the result into grad_input.
// NOTE: argmax_y / argmax_x are not read here — only average pooling
// (pool_mode == 1) is supported by this launcher.
void ROIAlignBackwardMLUKernelLauncher(Tensor grad, Tensor rois,
                                       Tensor argmax_y, Tensor argmax_x,
                                       Tensor grad_input, int aligned_height,
                                       int aligned_width, float spatial_scale,
                                       int sampling_ratio, int pool_mode,
                                       bool aligned) {
  // params check
  TORCH_CHECK(
      grad.scalar_type() == at::kFloat || grad.scalar_type() == at::kHalf,
      "grad type should be Float or Half, got ", grad.scalar_type());
  TORCH_CHECK(rois.scalar_type() == grad.scalar_type(),
              "rois should have the same type as grad");
  TORCH_CHECK(grad.dim() == 4, "grad should be a 4d tensor, got ", grad.dim(),
              "D");
  TORCH_CHECK(rois.dim() == 2, "rois should be a 2d tensor, got ", rois.dim(),
              "D");
  TORCH_CHECK(pool_mode == 1, "pool_mode only supports 'avg' currently");

  int batch_size = grad_input.size(0);
  int channels = grad_input.size(1);
  int height = grad_input.size(2);
  int width = grad_input.size(3);
  auto memory_format =
      torch_mlu::cnnl::ops::get_channels_last_memory_format(grad.dim());
  auto grad_ = torch_mlu::cnnl::ops::cnnl_contiguous(grad, memory_format);
  // Zero-initialized accumulation buffer for the input gradient.
  auto grad_input_ = at::empty({batch_size, channels, height, width},
                               grad.options(), memory_format)
                         .zero_();

  int boxes_num = rois.size(0);
  int hi = grad.size(2);
  int wi = grad.size(3);
  int c = grad.size(1);

  int no = grad_input.size(0);
  int ho = grad_input.size(2);
  int wo = grad_input.size(3);

  // get tensor impl
  auto grad_impl = torch_mlu::getMluTensorImpl(grad_);
  auto grad_input_impl = torch_mlu::getMluTensorImpl(grad_input_);
  auto rois_impl = torch_mlu::getMluTensorImpl(rois);

  // get compute queue
  auto queue = torch_mlu::getCurQueue();

  // get the mlu ptr
  auto grad_ptr = grad_impl->cnnlMalloc();
  auto rois_ptr = rois_impl->cnnlMalloc();
  auto grad_input_ptr = grad_input_impl->cnnlMalloc();

  // Grid: x = cores per cluster, y = clusters needed for boxes_num rounded
  // up to a power of two, capped at the device's cluster count.
  cnrtJobType_t k_type = CNRT_FUNC_TYPE_UNION1;
  int need_core = nearestPower2(boxes_num);
  int union_number = torch_mlu::getDeviceAttr(cnrtAttrClusterCount);
  uint32_t dim_x = torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster);
  uint32_t dim_y = (need_core - 1) / dim_x + 1;
  dim_y = (dim_y > union_number) ? union_number : dim_y;
  cnrtDim3_t k_dim = {dim_x, dim_y, 1};
  cnrtDataType_t k_dtype = torch_mlu::toCnrtDtype(grad.dtype());

  KernelRoiAlignBackward(k_dim, k_type, queue, k_dtype, grad_ptr, rois_ptr,
                         grad_input_ptr, boxes_num, hi, wi, c, no, ho, wo,
                         spatial_scale, sampling_ratio, aligned);
  grad_input.copy_(grad_input_);
}

// Device-registry adapter: forwards to the forward launcher.
void roi_align_forward_mlu(Tensor input, Tensor rois, Tensor output,
                           Tensor argmax_y, Tensor argmax_x, int aligned_height,
                           int aligned_width, float spatial_scale,
                           int sampling_ratio, int pool_mode, bool aligned) {
  ROIAlignForwardMLUKernelLauncher(input, rois, output, argmax_y, argmax_x,
                                   aligned_height, aligned_width, spatial_scale,
                                   sampling_ratio, pool_mode, aligned);
}

// Device-registry adapter: forwards to the backward launcher.
void roi_align_backward_mlu(Tensor grad_output, Tensor rois, Tensor argmax_y,
                            Tensor argmax_x, Tensor grad_input,
                            int aligned_height, int aligned_width,
                            float spatial_scale, int sampling_ratio,
                            int pool_mode, bool aligned) {
  ROIAlignBackwardMLUKernelLauncher(
      grad_output, rois, argmax_y, argmax_x, grad_input, aligned_height,
      aligned_width, spatial_scale, sampling_ratio, pool_mode, aligned);
}

void roi_align_forward_impl(Tensor input, Tensor rois, Tensor output,
                            Tensor argmax_y, Tensor argmax_x,
                            int aligned_height, int aligned_width,
                            float spatial_scale, int sampling_ratio,
                            int pool_mode, bool aligned);

void roi_align_backward_impl(Tensor grad_output, Tensor rois, Tensor argmax_y,
                             Tensor argmax_x, Tensor grad_input,
                             int aligned_height, int aligned_width,
                             float spatial_scale, int sampling_ratio,
                             int pool_mode, bool aligned);

REGISTER_DEVICE_IMPL(roi_align_forward_impl, MLU, roi_align_forward_mlu);
REGISTER_DEVICE_IMPL(roi_align_backward_impl, MLU, roi_align_backward_mlu);
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/roi_align_rotated_mlu.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/roi_align_rotated_mlu.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..c3058c01f5d5476ee37abee4b79a5f5001f51c16
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/roi_align_rotated_mlu.cpp
@@ -0,0 +1,232 @@
/*************************************************************************
 * Copyright (C) 2022 by Cambricon.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
 * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
 * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
 * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
 * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
 * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 *************************************************************************/
#include "pytorch_device_registry.hpp"
#include "pytorch_mlu_helper.hpp"
#include "roi_align_rotated_utils.hpp"

namespace {

// Derive the launch grid from the total number of output bins:
// one UNION1 task, x = cores per cluster, y = clusters needed (capped at
// the device's cluster count).
void policyFunc(int bin_num, cnrtDim3_t *k_dim, cnrtFunctionType_t *k_type) {
  unsigned int core_num = torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster);
  unsigned int cluster_num = torch_mlu::getDeviceAttr(cnrtAttrClusterCount);
  *k_type = CNRT_FUNC_TYPE_UNION1;
  k_dim->x = core_num;
  unsigned int use_cluster = (bin_num + core_num - 1) / core_num;
  k_dim->y = use_cluster >
cluster_num : use_cluster;
  k_dim->z = 1;
}

}  // namespace

// MLU kernel entry points (implemented in the corresponding device sources).
void KernelRoiAlignRotatedForward(
    cnrtDim3_t k_dim, cnrtFunctionType_t k_type, cnrtQueue_t queue,
    const cnrtDataType_t d_type, const void *features, const void *rois,
    void *output, const int batch, const int height, const int width,
    const int channel, const int rois_num,
    const RoiAlignRotatedParams roiAlignRotatedParams);

void KernelRoiAlignRotatedBackward(
    cnrtDim3_t k_dim, cnrtFunctionType_t k_type, cnrtQueue_t queue,
    const cnrtDataType_t d_type, const void *top_grad, const void *rois,
    void *bottom_grad, const int batch, const int height, const int width,
    const int channel, const int rois_num,
    const RoiAlignRotatedParams roiAlignRotatedParams);

// Launch the rotated RoiAlign forward kernel on MLU: validate dtypes and
// shapes, bail out early on empty input, stage the input in channels-last
// layout, run the kernel into a staging tensor, then copy into output.
void ROIAlignRotatedForwardMLUKernelLauncher(Tensor input, Tensor rois,
                                             Tensor output, int pooled_height,
                                             int pooled_width,
                                             float spatial_scale,
                                             int sampling_ratio, bool aligned,
                                             bool clockwise) {
  TORCH_CHECK(((input.scalar_type() == output.scalar_type()) &&
               (output.scalar_type() == rois.scalar_type())),
              "data types of input, rois and output should be the same, ",
              "but now input type is ", input.scalar_type(), ", rois type is ",
              rois.scalar_type(), ", output type is ", output.scalar_type(),
              ".");
  TORCH_CHECK(
      (input.scalar_type() == at::kFloat || input.scalar_type() == at::kHalf),
      "input type should be Float or Half, got ", input.scalar_type(), ".");

  TORCH_CHECK(input.dim() == 4, "input should be a 4d tensor, got ",
              input.dim(), "D.");
  TORCH_CHECK(rois.dim() == 2, "rois should be a 2d tensor, got ", rois.dim(),
              "D.");
  TORCH_CHECK(output.dim() == 4, "output should be a 4d tensor, got ",
              output.dim(), "D.");

  TORCH_CHECK((rois.size(0) == output.size(0)),
              "the 1st dimensions of rois and output should be the same, ",
              "but now the 1st dimension of rois is ", rois.size(0),
              ", and output is ", output.size(0), ".");

  TORCH_CHECK((input.size(1) == output.size(1)),
              "the 2nd dimensions of input and output should be the same, ",
              "but now the 2nd dimension of input is ", input.size(1),
              ", and output is ", output.size(1), ".");

  int channel = input.size(1);
  int width = input.size(3);
  int height = input.size(2);
  int batch = input.size(0);
  int rois_nums = rois.size(0);
  cnrtDataType_t d_type = torch_mlu::toCnrtDtype(input.dtype());

  // return if zero-elements
  if (input.numel() == 0) {
    CNLOG(INFO) << "Skip the zero-elements case.";
    return;
  }

  RoiAlignRotatedParams roiAlignRotatedParams{pooled_height, pooled_width,
                                              sampling_ratio, spatial_scale,
                                              aligned, clockwise};
  cnrtDim3_t k_dim;
  cnrtFunctionType_t k_type;
  // One unit of kernel work per output bin.
  policyFunc(rois_nums * pooled_height * pooled_width, &k_dim, &k_type);

  auto memory_format =
      torch_mlu::cnnl::ops::get_channels_last_memory_format(input.dim());
  auto input_tensor =
      torch_mlu::cnnl::ops::cnnl_contiguous(input, memory_format);
  at::Tensor output_tmp =
      at::empty({rois_nums, channel, pooled_height, pooled_width},
                input.options(), memory_format);

  // get compute queue
  auto queue = torch_mlu::getCurQueue();

  // get ptr of tensors
  auto input_impl = torch_mlu::getMluTensorImpl(input_tensor);
  auto input_ptr = input_impl->cnnlMalloc();
  auto rois_impl = torch_mlu::getMluTensorImpl(rois);
  auto rois_ptr = rois_impl->cnnlMalloc();
  auto output_impl = torch_mlu::getMluTensorImpl(output_tmp);
  auto output_ptr = output_impl->cnnlMalloc();

  KernelRoiAlignRotatedForward(k_dim, k_type, queue, d_type, input_ptr,
                               rois_ptr, output_ptr, batch, height, width,
                               channel, rois_nums, roiAlignRotatedParams);
  output.copy_(output_tmp);
}

// Launch the rotated RoiAlign backward kernel on MLU (validation and body
// continue below this excerpt boundary).
void ROIAlignRotatedBackwardMLUKernelLauncher(
    Tensor top_grad, Tensor rois, Tensor bottom_grad, int pooled_height,
    int pooled_width, float spatial_scale, int sampling_ratio, bool aligned,
    bool clockwise) {
  TORCH_CHECK(((top_grad.scalar_type() == bottom_grad.scalar_type()) &&
               (bottom_grad.scalar_type() == rois.scalar_type())),
              "data 
types of top_grad, rois and bottom_grad should be ", + "the same, but now top_grad type is ", top_grad.scalar_type(), + ", rois type is ", rois.scalar_type(), ", bottom_grad type is ", + bottom_grad.scalar_type(), "."); + TORCH_CHECK((bottom_grad.scalar_type() == at::kFloat || + bottom_grad.scalar_type() == at::kHalf), + "Data type of bottom_grad should be Float ro Half, got ", + bottom_grad.scalar_type(), "."); + + TORCH_CHECK(bottom_grad.dim() == 4, "bottom_grad should be a 4d tensor, got ", + top_grad.dim(), "D."); + TORCH_CHECK(rois.dim() == 2, "rois should be a 2d tensor, got ", rois.dim(), + "D."); + TORCH_CHECK(top_grad.dim() == 4, "top_grad should be a 4d tensor, got ", + bottom_grad.dim(), "D."); + + TORCH_CHECK((rois.size(0) == top_grad.size(0)), + "the 1st dimensions of rois and top_grad should be the same, ", + "but now the 1st dimension of rois is ", rois.size(0), + ", and top_grad is ", top_grad.size(0), "."); + + TORCH_CHECK((bottom_grad.size(1) == top_grad.size(1)), + "the 2nd dimensions of bottom_grad and top_grad should be ", + "the same, but now the 2nd dimension of bottom_grad is ", + bottom_grad.size(1), ", and top_grad is ", top_grad.size(1), "."); + + int channel = bottom_grad.size(1); + int width = bottom_grad.size(3); + int height = bottom_grad.size(2); + int batch = bottom_grad.size(0); + int rois_nums = rois.size(0); + cnrtDataType_t d_type = torch_mlu::toCnrtDtype(bottom_grad.dtype()); + + // return if zero-elements + if (bottom_grad.numel() == 0) { + CNLOG(INFO) << "Skip the zero-elements case."; + return; + } + + RoiAlignRotatedParams roiAlignRotatedParams{pooled_height, pooled_width, + sampling_ratio, spatial_scale, + aligned, clockwise}; + cnrtDim3_t k_dim; + cnrtFunctionType_t k_type; + policyFunc(rois_nums * pooled_height * pooled_width, &k_dim, &k_type); + + auto memory_format = + torch_mlu::cnnl::ops::get_channels_last_memory_format(top_grad.dim()); + auto top_grad_tensor = + torch_mlu::cnnl::ops::cnnl_contiguous(top_grad, 
memory_format); + at::Tensor bottom_grad_tmp = at::empty({batch, channel, height, width}, + top_grad.options(), memory_format) + .zero_(); + + // get compute queue + auto queue = torch_mlu::getCurQueue(); + + // get ptr of tensors + auto bottom_grad_impl = torch_mlu::getMluTensorImpl(bottom_grad_tmp); + auto bottom_grad_ptr = bottom_grad_impl->cnnlMalloc(); + auto rois_impl = torch_mlu::getMluTensorImpl(rois); + auto rois_ptr = rois_impl->cnnlMalloc(); + auto top_grad_impl = torch_mlu::getMluTensorImpl(top_grad_tensor); + auto top_grad_ptr = top_grad_impl->cnnlMalloc(); + + KernelRoiAlignRotatedBackward(k_dim, k_type, queue, d_type, top_grad_ptr, + rois_ptr, bottom_grad_ptr, batch, height, width, + channel, rois_nums, roiAlignRotatedParams); + bottom_grad.copy_(bottom_grad_tmp); +} + +void roi_align_rotated_forward_mlu(Tensor input, Tensor rois, Tensor output, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + bool aligned, bool clockwise) { + ROIAlignRotatedForwardMLUKernelLauncher(input, rois, output, aligned_height, + aligned_width, spatial_scale, + sampling_ratio, aligned, clockwise); +} + +void roi_align_rotated_backward_mlu(Tensor top_grad, Tensor rois, + Tensor bottom_grad, int aligned_height, + int aligned_width, float spatial_scale, + int sampling_ratio, bool aligned, + bool clockwise) { + ROIAlignRotatedBackwardMLUKernelLauncher( + top_grad, rois, bottom_grad, aligned_height, aligned_width, spatial_scale, + sampling_ratio, aligned, clockwise); +} + +void roi_align_rotated_forward_impl(Tensor input, Tensor rois, Tensor output, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + bool aligned, bool clockwise); + +void roi_align_rotated_backward_impl(Tensor top_grad, Tensor rois, + Tensor bottom_grad, int aligned_height, + int aligned_width, float spatial_scale, + int sampling_ratio, bool aligned, + bool clockwise); + +REGISTER_DEVICE_IMPL(roi_align_rotated_forward_impl, MLU, + 
roi_align_rotated_forward_mlu); +REGISTER_DEVICE_IMPL(roi_align_rotated_backward_impl, MLU, + roi_align_rotated_backward_mlu); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/roi_pool_mlu.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/roi_pool_mlu.cpp new file mode 100644 index 0000000000000000000000000000000000000000..7db23957d2cbba8f496b9effd67a62f87cde39e5 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/roi_pool_mlu.cpp @@ -0,0 +1,275 @@ +/************************************************************************* + * Copyright (C) 2022 Cambricon. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
 *************************************************************************/
#include "pytorch_device_registry.hpp"
#include "pytorch_mlu_helper.hpp"

// MLU kernel entry points (implemented in the corresponding device sources).
void KernelRoiPoolForward(cnrtDim3_t k_dim, cnrtFunctionType_t k_type,
                          cnrtQueue_t queue, cnrtDataType_t data_type,
                          const void *input_data, const void *input_rois,
                          const int batch, const int channels, const int height,
                          const int width, const int pooled_height,
                          const int pooled_width, const int rois_num,
                          const float spatial_scale, void *output_data,
                          int *argmax);

void KernelRoiPoolBackward(cnrtDim3_t k_dim, cnrtFunctionType_t k_type,
                           cnrtQueue_t queue, cnrtDataType_t k_dtype,
                           const void *grad_output_ptr, const void *rois_ptr,
                           const int *argmax_ptr, void *grad_input_ptr,
                           const int box_num, const int pooled_height,
                           const int pooled_width, const int channels,
                           const int batch, const int height, const int width,
                           const float spatial_scale);

// policy function for forward: one UNION1 task, x = cores per cluster,
// y = clusters needed for bin_num bins (capped at the cluster count).
static void policyFuncForward(const int bin_num, cnrtDim3_t *k_dim,
                              cnrtFunctionType_t *k_type) {
  auto core_num = torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster);
  auto cluster_num = torch_mlu::getDeviceAttr(cnrtAttrClusterCount);
  *k_type = CNRT_FUNC_TYPE_UNION1;
  k_dim->x = core_num;
  unsigned int use_cluster = bin_num / core_num + (bin_num % core_num > 0);
  k_dim->y = use_cluster > cluster_num ? cluster_num : use_cluster;
  k_dim->z = 1;
}

// Launch the RoiPool forward kernel on MLU: validate dtypes and shapes,
// stage input/output/argmax in channels-last layout, run the kernel, then
// copy the staged results back into output and argmax.
void ROIPoolForwardMLUKernelLauncher(Tensor input, Tensor rois, Tensor output,
                                     Tensor argmax, int pooled_height,
                                     int pooled_width, float spatial_scale) {
  // Check dtype.
  TORCH_CHECK(
      input.scalar_type() == at::kFloat || input.scalar_type() == at::kHalf,
      "input type should be Float or Half, got ", input.scalar_type());
  TORCH_CHECK(input.scalar_type() == rois.scalar_type(),
              "rois should have the same type as input");

  // Check dtype relationship.
  TORCH_CHECK(
      argmax.scalar_type() == at::kLong || argmax.scalar_type() == at::kInt,
      "argmax type should be Int or Long, got ", argmax.scalar_type());

  // Check shape.
  TORCH_CHECK(input.dim() == 4, "input should be 4d tensor, got ", input.dim(),
              "D");
  TORCH_CHECK(rois.dim() == 2, "rois should be 2d tensor, got ", rois.dim(),
              "D");
  TORCH_CHECK(argmax.dim() == 4, "argmax should be 4d tensor, got ",
              argmax.dim(), "D");

  TORCH_CHECK(spatial_scale > 0 && spatial_scale <= 1,
              "spatial_scale should be within (0, 1], got ", spatial_scale);

  // compute kernel params
  auto batch = input.size(0);
  auto height = input.size(2);
  auto width = input.size(3);
  auto channels = input.size(1);
  auto rois_num = output.size(0);

  // NOTE(review): these two branches rebind the LOCAL Tensor handles and
  // return — the caller's output/argmax tensors are not resized by this;
  // confirm the intended semantics against the callers.
  if (output.numel() == 0) {
    output = at::zeros({rois_num, channels, pooled_height, pooled_width},
                       input.options());
    return;
  }
  if (argmax.numel() == 0) {
    argmax = at::zeros({rois_num, channels, pooled_height, pooled_width},
                       argmax.options());
    return;
  }

  // zero element check
  if (input.numel() == 0 || rois.numel() == 0 || output.numel() == 0 ||
      argmax.numel() == 0) {
    return;
  }

  auto memory_format =
      torch_mlu::cnnl::ops::get_channels_last_memory_format(input.dim());
  auto input_ = torch_mlu::cnnl::ops::cnnl_contiguous(input, memory_format);

  at::Tensor output_ =
      at::empty({rois_num, channels, pooled_height, pooled_width},
                input.options(), memory_format);
  at::Tensor argmax_ =
      at::empty({rois_num, channels, pooled_height, pooled_width},
                argmax.options(), memory_format);

  // calculate task dimension
  cnrtDim3_t k_dim;
  cnrtFunctionType_t k_type;
  policyFuncForward(rois_num * pooled_height * pooled_width, &k_dim, &k_type);

  // get compute queue
  auto queue = torch_mlu::getCurQueue();

  // get ptr of tensors
  auto input_impl = torch_mlu::getMluTensorImpl(input_);
  auto input_ptr = input_impl->cnnlMalloc();
  auto rois_impl = torch_mlu::getMluTensorImpl(rois);
  auto rois_ptr
= rois_impl->cnnlMalloc();
  auto output_impl = torch_mlu::getMluTensorImpl(output_);
  auto output_ptr = output_impl->cnnlMalloc();
  auto argmax_impl = torch_mlu::getMluTensorImpl(argmax_);
  auto argmax_ptr = argmax_impl->cnnlMalloc();

  // get comput dtype of input
  cnrtDataType_t data_type = torch_mlu::toCnrtDtype(input_.dtype());

  // launch kernel
  CNLOG(INFO) << "Launch Kernel MLUKernelRoiPoolForward<<<" << k_dim.x << ", "
              << k_dim.y << ", " << k_dim.z << ">>>";

  KernelRoiPoolForward(k_dim, k_type, queue, data_type, input_ptr, rois_ptr,
                       batch, channels, height, width, pooled_height,
                       pooled_width, rois_num, spatial_scale, output_ptr,
                       (int *)argmax_ptr);
  // Copy staged results back into the caller's tensors.
  output.copy_(output_);
  argmax.copy_(argmax_);
}

// policy function for backward: fixed full-device grid
// (cores per cluster x cluster count).
static void policyFuncBackward(cnrtDim3_t *k_dim, cnrtFunctionType_t *k_type) {
  *k_type = CNRT_FUNC_TYPE_UNION1;
  k_dim->x = torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster);
  k_dim->y = torch_mlu::getDeviceAttr(cnrtAttrClusterCount);
  k_dim->z = 1;
}

// Launch the RoiPool backward kernel on MLU: validate dtypes, shapes and
// cross-tensor size relationships, stage tensors in channels-last layout,
// scatter gradients via argmax, then copy into grad_input.
void ROIPoolBackwardMLUKernelLauncher(Tensor grad_output, Tensor rois,
                                      Tensor argmax, Tensor grad_input,
                                      int pooled_height, int pooled_width,
                                      float spatial_scale) {
  // Check dtype.
  TORCH_CHECK(
      argmax.scalar_type() == at::kLong || argmax.scalar_type() == at::kInt,
      "argmax type should be Int or Long, got ", argmax.scalar_type());
  // NOTE(review): "FLoat" typo in the message below retained verbatim.
  TORCH_CHECK((grad_output.scalar_type() == at::kFloat ||
               grad_output.scalar_type() == at::kHalf),
              "grad_output type should be FLoat or Half, got ",
              grad_output.scalar_type());

  // Check dtype relationship.
  TORCH_CHECK((rois.scalar_type() == grad_output.scalar_type()),
              "rois should have the same type as grad_output");

  // Check shape.
  TORCH_CHECK(grad_output.dim() == 4, "grad_output should be 4d tensor, got ",
              grad_output.dim(), "D");
  TORCH_CHECK(rois.dim() == 2, "rois should be 2d tensor, got ", rois.dim(),
              "D");
  TORCH_CHECK(argmax.dim() == 4, "argmax should be 4d tensor, got ",
              argmax.dim(), "D");

  TORCH_CHECK(spatial_scale > 0 && spatial_scale <= 1,
              "spatial_scale should be within (0, 1], got ", spatial_scale);

  // Check relationship between tensor.
  // Check the relationship of n.
  TORCH_CHECK(grad_output.size(0) == rois.size(0),
              "grad_output.size(0) = ", grad_output.size(0),
              ", while rois.size(0) = ", rois.size(0),
              ". They should be the same.");

  // Check the relationship of channels.
  TORCH_CHECK(grad_output.size(1) == argmax.size(1),
              "grad_output.size(1) = ", grad_output.size(1),
              ", while argmax.size(1) = ", argmax.size(1),
              ". They should be the same.");

  // Check the relationship of height and width.
  TORCH_CHECK(grad_output.size(2) == argmax.size(2),
              "argmax.size(2) = ", argmax.size(2),
              ", while grad_output.size(2) = ", grad_output.size(2),
              ". They should be the same.");
  TORCH_CHECK(grad_output.size(3) == argmax.size(3),
              "argmax.size(3) = ", argmax.size(3),
              ", while grad_output.size(3) = ", grad_output.size(3),
              ". They should be the same.");

  // Check zero element.
  if (grad_output.numel() == 0 || rois.numel() == 0 || argmax.numel() == 0 ||
      grad_input.numel() == 0) {
    // return if zero-element
    return;
  }

  auto memory_format =
      torch_mlu::cnnl::ops::get_channels_last_memory_format(grad_output.dim());
  auto grad_output_ =
      torch_mlu::cnnl::ops::cnnl_contiguous(grad_output, memory_format);
  auto argmax_ = torch_mlu::cnnl::ops::cnnl_contiguous(argmax, memory_format);

  int boxes_num = grad_output.size(0);
  int no = grad_input.size(0);
  int channels = grad_input.size(1);
  int height = grad_input.size(2);
  int width = grad_input.size(3);
  // Zero-initialized accumulation buffer for the input gradient.
  auto grad_input_ = at::empty({no, channels, height, width},
                               grad_input.options(), memory_format)
                         .zero_();

  // get tensor impl
  auto grad_output_impl = torch_mlu::getMluTensorImpl(grad_output_);
  auto rois_impl = torch_mlu::getMluTensorImpl(rois);
  auto argmax_impl = torch_mlu::getMluTensorImpl(argmax_);
  auto grad_input_impl = torch_mlu::getMluTensorImpl(grad_input_);

  // get compute queue
  auto queue = torch_mlu::getCurQueue();

  // get mlu ptr
  auto grad_output_ptr = grad_output_impl->cnnlMalloc();
  auto rois_ptr = rois_impl->cnnlMalloc();
  auto argmax_ptr = argmax_impl->cnnlMalloc();
  auto grad_input_ptr = grad_input_impl->cnnlMalloc();

  // calculate task dimension
  cnrtDataType_t k_dtype = torch_mlu::toCnrtDtype(grad_input.dtype());
  cnrtDim3_t k_dim;
  cnrtFunctionType_t k_type;
  policyFuncBackward(&k_dim, &k_type);

  CNLOG(INFO) << "Launch Kernel MLUKernelRoiPoolBackward<<<" << k_dim.x << ", "
              << k_dim.y << ", " << k_dim.z << ">>>";

  KernelRoiPoolBackward(k_dim, k_type, queue, k_dtype, grad_output_ptr,
                        rois_ptr, (int *)argmax_ptr, grad_input_ptr, boxes_num,
                        pooled_height, pooled_width, channels, no, height,
                        width, spatial_scale);

  grad_input.copy_(grad_input_);
}

// Device-registry adapter: forwards to the forward launcher.
void roi_pool_forward_mlu(Tensor input, Tensor rois, Tensor output,
                          Tensor argmax, int pooled_height, int pooled_width,
                          float spatial_scale) {
  ROIPoolForwardMLUKernelLauncher(input, rois, output, argmax, pooled_height,
                                  pooled_width, spatial_scale);
}

// Device-registry adapter: forwards to the backward launcher.
void roi_pool_backward_mlu(Tensor grad_output, Tensor rois, Tensor argmax,
                           Tensor grad_input, int pooled_height,
                           int pooled_width, float spatial_scale) {
  ROIPoolBackwardMLUKernelLauncher(grad_output, rois, argmax, grad_input,
                                   pooled_height, pooled_width, spatial_scale);
}

void roi_pool_forward_impl(Tensor input, Tensor rois, Tensor output,
                           Tensor argmax, int pooled_height, int pooled_width,
                           float spatial_scale);

void roi_pool_backward_impl(Tensor grad_output, Tensor rois, Tensor argmax,
                            Tensor grad_input, int pooled_height,
                            int pooled_width, float spatial_scale);

REGISTER_DEVICE_IMPL(roi_pool_forward_impl, MLU, roi_pool_forward_mlu);
REGISTER_DEVICE_IMPL(roi_pool_backward_impl, MLU, roi_pool_backward_mlu);
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/roiaware_pool3d_mlu.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/roiaware_pool3d_mlu.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..62cb2dc62e44ca3e48d37e86b1c0cb941a3e3b53
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/roiaware_pool3d_mlu.cpp
@@ -0,0 +1,399 @@
/*************************************************************************
 * Copyright (C) 2022 by Cambricon.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
 * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
 * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
 * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
 * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
 * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 *************************************************************************/
#include "pytorch_device_registry.hpp"
#include "pytorch_mlu_helper.hpp"

// MLU kernel entry points (implemented in the corresponding device sources).
void KernelPtsIdxOfVoxels(cnrtDim3_t k_dim, cnrtFunctionType_t k_type,
                          cnrtQueue_t queue, const cnrtDataType_t d_type,
                          const int pool_method, const int boxes_num,
                          const int pts_num, const int max_pts_each_voxel,
                          const int out_x, const int out_y, const int out_z,
                          const void *rois, const void *pts,
                          int *pts_idx_of_voxels);

void KernelRoiawarePool3dForward(
    cnrtDim3_t k_dim, cnrtFunctionType_t k_type, cnrtQueue_t queue,
    const cnrtDataType_t d_type, const int pool_method, const int boxes_num,
    const int pts_num, const int channels, const int max_pts_each_voxel,
    const int out_x, const int out_y, const int out_z, const void *pts_feature,
    const int *pts_idx_of_voxels, void *pooled_features, int *argmax);

// policy function: grid sized from the number of boxes
// (x = cores per cluster, y = clusters needed, capped at the cluster count).
static void kernelPtsIdxOfVoxelsPolicyFunc(const int boxes_num,
                                           cnrtDim3_t *k_dim,
                                           cnrtFunctionType_t *k_type) {
  unsigned int core_num = torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster);
  unsigned int cluster_num = torch_mlu::getDeviceAttr(cnrtAttrClusterCount);
  *k_type = CNRT_FUNC_TYPE_UNION1;
  k_dim->x = core_num;
  unsigned int use_cluster = (boxes_num + core_num - 1) / core_num;
  k_dim->y = use_cluster > cluster_num ? cluster_num : use_cluster;
  k_dim->z = 1;
}

// Same policy but sized from the total voxel count (boxes * out_x*out_y*out_z).
static void kernelRoiawarePool3dForwardPolicyFunc(
    const int boxes_num, const int out_x, const int out_y, const int out_z,
    cnrtDim3_t *k_dim, cnrtFunctionType_t *k_type) {
  unsigned int core_num = torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster);
  unsigned int cluster_num = torch_mlu::getDeviceAttr(cnrtAttrClusterCount);
  *k_type = CNRT_FUNC_TYPE_UNION1;
  k_dim->x = core_num;
  const int voxels_num = boxes_num * out_x * out_y * out_z;
  unsigned int use_cluster = (voxels_num + core_num - 1) / core_num;
  k_dim->y = use_cluster > cluster_num ? cluster_num : use_cluster;
  k_dim->z = 1;
}

// Launch the RoiawarePool3d forward pass on MLU: extensive argument
// validation, then two kernels — voxel point-index assignment followed by
// the pooling kernel (validation continues past this excerpt boundary).
void RoiawarePool3dForwardMLUKernelLauncher(
    const int pool_method, const int boxes_num, const int pts_num,
    const int channels, const int max_pts_each_voxel, const int out_x,
    const int out_y, const int out_z, const Tensor rois, const Tensor pts,
    const Tensor pts_feature, Tensor pts_idx_of_voxels, Tensor pooled_features,
    Tensor argmax) {
  // check datatype
  TORCH_CHECK(((pts.scalar_type() == rois.scalar_type()) &&
               (pts_feature.scalar_type() == rois.scalar_type()) &&
               (pooled_features.scalar_type() == rois.scalar_type())),
              "data types of rois, rois, pts_feature and pooled_features "
              "should be the same, ",
              "but now rois type is ", rois.scalar_type(), ", pts type is ",
              pts.scalar_type(), ", pts_feature type is ",
              pts_feature.scalar_type(), ", pooled_features type is ",
              pooled_features.scalar_type(), ".");
  TORCH_CHECK(
      (rois.scalar_type() == at::kFloat || rois.scalar_type() == at::kHalf),
      "rois type should be Float or Half, got ", rois.scalar_type(), ".");
  TORCH_CHECK((pts_idx_of_voxels.scalar_type() == at::kInt),
              "pts_idx_of_voxels type should be Int, got ",
              pts_idx_of_voxels.scalar_type(), ".");
  // check dim
  TORCH_CHECK(rois.dim() == 2, "rois should be a 2D tensor, got ", rois.dim(),
              "D.");
  TORCH_CHECK(pts.dim() == 2, "pts should be a 2D tensor, got ", pts.dim(),
              "D.");
  TORCH_CHECK(pts_feature.dim() == 2, "pts_feature should be a 2D tensor, got ",
              pts_feature.dim(), "D.");
  TORCH_CHECK(pts_idx_of_voxels.dim() == 5,
              "pts_idx_of_voxels should be a 5D tensor, got ",
              pts_idx_of_voxels.dim(), "D.");
  TORCH_CHECK(pooled_features.dim() == 5,
              "pooled_features should be a 5D tensor, got ",
              pooled_features.dim(), "D.");
  // check shape
  TORCH_CHECK(((rois.size(0) == boxes_num) && (rois.size(1) == 7)),
              "the dimensions of rois should be (boxes_num, 7), ", "but got (",
              rois.size(0), ", ", rois.size(1), ") .");
  TORCH_CHECK(((pts.size(0) == pts_num) &&
(pts.size(1) == 3)), + "the dimensions of pts should be (pts_num, 3), ", "but got (", + pts.size(0), ",", pts.size(1), ")."); + TORCH_CHECK( + ((pts_feature.size(0) == pts_num) && (pts_feature.size(1) == channels)), + "the dimensions of pts_feature should be (pts_num, channels), ", + "but got (", pts_feature.size(0), ",", pts_feature.size(1), ")."); + TORCH_CHECK(((pts_idx_of_voxels.size(0) == boxes_num) && + (pts_idx_of_voxels.size(1) == out_x) && + (pts_idx_of_voxels.size(2) == out_y) && + (pts_idx_of_voxels.size(3) == out_z) && + (pts_idx_of_voxels.size(4) == max_pts_each_voxel)), + "the dimensions of pts_idx_of_voxels should be (boxes_num, " + "out_x, out_y, out_z, max_pts_each_voxel), ", + "but got (", pts_idx_of_voxels.size(0), ",", + pts_idx_of_voxels.size(1), ",", pts_idx_of_voxels.size(2), ",", + pts_idx_of_voxels.size(3), ",", pts_idx_of_voxels.size(4), ")."); + TORCH_CHECK(((pooled_features.size(0) == boxes_num) && + (pooled_features.size(1) == out_x) && + (pooled_features.size(2) == out_y) && + (pooled_features.size(3) == out_z) && + (pooled_features.size(4) == channels)), + "the dimensions of pooled_features should be (boxes_num, out_x, " + "out_y, out_z, channels), ", + "but got (", pooled_features.size(0), ",", + pooled_features.size(1), ",", pooled_features.size(2), ",", + pooled_features.size(3), ",", pooled_features.size(4), ")."); + // check other params : pool_mothod + TORCH_CHECK(((pool_method == 0) || (pool_method == 1)), + "the num of pool_method should be 0(max) or 1(avg), ", "but got ", + pool_method, "."); + // check large tensor + const size_t max_input_size = 2147483648; + TORCH_CHECK(rois.numel() < max_input_size, + "rois element num should be less than 2^31, got ", rois.numel(), + "."); + TORCH_CHECK(pts.numel() < max_input_size, + "pts element num should be less than 2^31, got ", pts.numel(), + "."); + TORCH_CHECK(pts_feature.numel() < max_input_size, + "pts_feature element num should be less than 2^31, got ", + pts_feature.numel(), 
"."); + TORCH_CHECK(pts_idx_of_voxels.numel() < max_input_size, + "pts_idx_of_voxels element num should be less than 2^31, got ", + pts_idx_of_voxels.numel(), "."); + TORCH_CHECK(pooled_features.numel() < max_input_size, + "pooled_features element num should be less than 2^31, got ", + pooled_features.numel(), "."); + // check zero element + TORCH_CHECK(rois.numel() != 0, "rois.numel() should not be zero, got ", + rois.numel()); + TORCH_CHECK(pts.numel() != 0, "pts.numel() should not be zero, got ", + pts.numel()); + TORCH_CHECK(pts_feature.numel() != 0, + "pts_feature.numel() should not be zero, got ", + pts_feature.numel()); + TORCH_CHECK(pts_idx_of_voxels.numel() != 0, + "pts_idx_of_voxels.numel() should not be zero, got ", + pts_idx_of_voxels.numel()); + TORCH_CHECK(pooled_features.numel() != 0, + "pooled_features.numel() should not be zero, got ", + pooled_features.numel()); + if (pool_method == 0) { + // check datatype + TORCH_CHECK((argmax.scalar_type() == at::kInt), + "argmax type should be Int, got ", argmax.scalar_type(), "."); + // check dim + TORCH_CHECK(argmax.dim() == 5, "argmax should be a 5D tensor, got ", + argmax.dim(), "D."); + // check shape + TORCH_CHECK(((argmax.size(0) == boxes_num) && (argmax.size(1) == out_x) && + (argmax.size(2) == out_y) && (argmax.size(3) == out_z) && + (argmax.size(4) == channels)), + "the dimensions of argmax should be (boxes_num, out_x, out_y, " + "out_z, channels), ", + "but got (", argmax.size(0), ",", argmax.size(1), ",", + argmax.size(2), ",", argmax.size(3), ",", argmax.size(4), ")."); + // check large tensor + TORCH_CHECK(argmax.numel() < max_input_size, + "argmax element num should be less than 2^31, got ", + argmax.numel(), "."); + // check zero element + TORCH_CHECK(argmax.numel() != 0, "argmax.numel() should not be zero, got ", + argmax.numel()); + // when pool_method is 0, which is max pool, init argmax data value to -1 + argmax.fill_(static_cast(-1)); + } + // calculate task one dimension + cnrtDim3_t 
k1_dim; + cnrtFunctionType_t k1_type; + kernelPtsIdxOfVoxelsPolicyFunc(boxes_num, &k1_dim, &k1_type); + cnrtDim3_t k2_dim; + cnrtFunctionType_t k2_type; + kernelRoiawarePool3dForwardPolicyFunc(boxes_num, out_x, out_y, out_z, &k2_dim, + &k2_type); + // get compute queue + auto queue = torch_mlu::getCurQueue(); + // get ptr of tensors + auto rois_impl = torch_mlu::getMluTensorImpl(rois); + auto rois_ptr = rois_impl->cnnlMalloc(); + // transpose points [pts_num, 3] -> [3, pts_num] + auto pts_ = pts.permute({1, 0}).contiguous(); + auto pts_impl = torch_mlu::getMluTensorImpl(pts_); + auto pts_ptr = pts_impl->cnnlMalloc(); + // transpose points_features [pts_num, channels] -> [channels, pts_num] + auto pts_feature_ = pts_feature.permute({1, 0}).contiguous(); + auto pts_feature_impl = torch_mlu::getMluTensorImpl(pts_feature_); + auto pts_feature_ptr = pts_feature_impl->cnnlMalloc(); + auto pts_idx_of_voxels_impl = torch_mlu::getMluTensorImpl(pts_idx_of_voxels); + auto pts_idx_of_voxels_ptr = pts_idx_of_voxels_impl->cnnlMalloc(); + auto pooled_features_impl = torch_mlu::getMluTensorImpl(pooled_features); + auto pooled_features_ptr = pooled_features_impl->cnnlMalloc(); + auto argmax_impl = torch_mlu::getMluTensorImpl(argmax); + auto argmax_ptr = argmax_impl->cnnlMalloc(); + // get compute dtype of input + cnrtDataType_t data_type = torch_mlu::toCnrtDtype(rois.dtype()); + // launch kernel PtsIdxOfVoxels + CNLOG(INFO) << "Launch Kernel MLUKernel PtsIdxOfVoxels<<<" << k1_dim.x << ", " + << k1_dim.y << ", " << k1_dim.z << ">>>"; + KernelPtsIdxOfVoxels(k1_dim, k1_type, queue, data_type, pool_method, + boxes_num, pts_num, max_pts_each_voxel, out_x, out_y, + out_z, rois_ptr, pts_ptr, (int *)pts_idx_of_voxels_ptr); + // launch kernel RoiawarePool3dForward + CNLOG(INFO) << "Launch Kernel MLUKernel RoiawarePool3dForward<<<" << k2_dim.x + << ", " << k2_dim.y << ", " << k2_dim.z << ">>>"; + KernelRoiawarePool3dForward( + k2_dim, k2_type, queue, data_type, pool_method, boxes_num, 
pts_num, + channels, max_pts_each_voxel, out_x, out_y, out_z, pts_feature_ptr, + (int *)pts_idx_of_voxels_ptr, pooled_features_ptr, (int *)argmax_ptr); +} + +void roiaware_pool3d_forward_mlu(int boxes_num, int pts_num, int channels, + int max_pts_each_voxel, int out_x, int out_y, + int out_z, const Tensor rois, const Tensor pts, + const Tensor pts_feature, Tensor argmax, + Tensor pts_idx_of_voxels, + Tensor pooled_features, int pool_method) { + RoiawarePool3dForwardMLUKernelLauncher( + pool_method, boxes_num, pts_num, channels, max_pts_each_voxel, out_x, + out_y, out_z, rois, pts, pts_feature, pts_idx_of_voxels, pooled_features, + argmax); +} + +void roiaware_pool3d_forward_impl(int boxes_num, int pts_num, int channels, + int max_pts_each_voxel, int out_x, int out_y, + int out_z, const Tensor rois, + const Tensor pts, const Tensor pts_feature, + Tensor argmax, Tensor pts_idx_of_voxels, + Tensor pooled_features, int pool_method); + +REGISTER_DEVICE_IMPL(roiaware_pool3d_forward_impl, MLU, + roiaware_pool3d_forward_mlu); + +void KernelRoiawarePool3dBackward( + cnrtDim3_t k_dim, cnrtFunctionType_t k_type, cnrtQueue_t queue, + const cnrtDataType_t d_type, const int pool_method, const int boxes_num, + const int out_x, const int out_y, const int out_z, const int channels, + const int max_pts_each_voxel, const int *pts_idx_of_voxels, + const int *argmax, const void *grad_out, void *grad_in); + +static void kernelRoiawarePool3dBackwardPolicyFunc( + const int boxes_num, const int out_x, const int out_y, const int out_z, + cnrtDim3_t *k_dim, cnrtFunctionType_t *k_type) { + unsigned int core_num = torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster); + unsigned int cluster_num = torch_mlu::getDeviceAttr(cnrtAttrClusterCount); + *k_type = CNRT_FUNC_TYPE_UNION1; + k_dim->x = core_num; + const int voxels_num = boxes_num * out_x * out_y * out_z; + unsigned int use_cluster = (voxels_num + core_num - 1) / core_num; + k_dim->y = use_cluster > cluster_num ? 
cluster_num : use_cluster; + k_dim->z = 1; +} + +void RoiawarePool3dBackwardMLUKernelLauncher( + int pool_method, int boxes_num, int out_x, int out_y, int out_z, + int channels, int max_pts_each_voxel, const Tensor pts_idx_of_voxels, + const Tensor argmax, const Tensor grad_out, Tensor grad_in) { + // check datatype + TORCH_CHECK((pts_idx_of_voxels.scalar_type() == at::kInt), + "pts_idx_of_voxels type should be Int, got ", + pts_idx_of_voxels.scalar_type(), "."); + TORCH_CHECK((argmax.scalar_type() == at::kInt), + "argmax type should be Int, got ", argmax.scalar_type(), "."); + TORCH_CHECK((grad_out.scalar_type() == at::kFloat || + grad_out.scalar_type() == at::kHalf), + "grad_out type should be Float or Half, got ", + grad_out.scalar_type(), "."); + TORCH_CHECK((grad_out.scalar_type() == grad_in.scalar_type()), + "data types of grad_out, grad_in, should be the same, ", + "but now grad_out type is ", grad_out.scalar_type(), + ", grad_in type is ", grad_in.scalar_type(), "."); + // check dim + TORCH_CHECK(pts_idx_of_voxels.dim() == 5, + "pts_idx_of_voxels should be a 5D tensor, got ", + pts_idx_of_voxels.dim(), "D."); + TORCH_CHECK(argmax.dim() == 5, "argmax should be a 5D tensor, got ", + argmax.dim(), "D."); + TORCH_CHECK(grad_out.dim() == 5, "grad_out should be a 5D tensor, got ", + grad_out.dim(), "D."); + TORCH_CHECK(grad_in.dim() == 2, "grad_in should be a 2D tensor, got ", + grad_in.dim(), "D."); + // check shape + TORCH_CHECK(((pts_idx_of_voxels.size(0) == boxes_num) && + (pts_idx_of_voxels.size(1) == out_x) && + (pts_idx_of_voxels.size(2) == out_y) && + (pts_idx_of_voxels.size(3) == out_z) && + (pts_idx_of_voxels.size(4) == max_pts_each_voxel)), + "the dimensions of pts_idx_of_voxels should be (boxes_num, " + "out_x, out_y, out_z, max_pts_each_voxel), ", + "but got (", pts_idx_of_voxels.size(0), ",", + pts_idx_of_voxels.size(1), ",", pts_idx_of_voxels.size(2), ",", + pts_idx_of_voxels.size(3), ",", pts_idx_of_voxels.size(4), ")."); + 
TORCH_CHECK(((argmax.size(0) == boxes_num) && (argmax.size(1) == out_x) && + (argmax.size(2) == out_y) && (argmax.size(3) == out_z) && + (argmax.size(4) == channels)), + "the dimensions of argmax should be (boxes_num, out_x, out_y, " + "out_z, channels), ", + "but got (", argmax.size(0), ",", argmax.size(1), ",", + argmax.size(2), ",", argmax.size(3), ",", argmax.size(4), ")."); + TORCH_CHECK(((grad_out.size(0) == boxes_num) && (grad_out.size(1) == out_x) && + (grad_out.size(2) == out_y) && (grad_out.size(3) == out_z) && + (grad_out.size(4) == channels)), + "the dimensions of grad_out should be (boxes_num, out_x, " + "out_y, out_z, channels), ", + "but got (", grad_out.size(0), ",", grad_out.size(1), ",", + grad_out.size(2), ",", grad_out.size(3), ",", grad_out.size(4), + ")."); + TORCH_CHECK((grad_in.size(1) == channels), + "the 1st dimensions of grad_in should be channels, ", "but got ", + grad_in.size(1), "."); + // check other params : pool_mothod + TORCH_CHECK(((pool_method == 0) || (pool_method == 1)), + "the num of pool_method should be 0(max) or 1(avg), ", "but got ", + pool_method, "."); + // check large tensor + const size_t max_input_size = 2147483648; + TORCH_CHECK(pts_idx_of_voxels.numel() < max_input_size, + "pts_idx_of_voxels element num should be less than 2^31, got ", + pts_idx_of_voxels.numel(), "."); + TORCH_CHECK(argmax.numel() < max_input_size, + "argmax element num should be less than 2^31, got ", + argmax.numel(), "."); + TORCH_CHECK(grad_out.numel() < max_input_size, + "grad_out element num should be less than 2^31, got ", + grad_out.numel(), "."); + TORCH_CHECK(grad_in.numel() < max_input_size, + "grad_in element num should be less than 2^31, got ", + grad_in.numel(), "."); + // check zero element + TORCH_CHECK(pts_idx_of_voxels.numel() != 0, + "pts_idx_of_voxels.numel() should not be zero, got ", + pts_idx_of_voxels.numel()); + TORCH_CHECK(argmax.numel() != 0, "argmax.numel() should not be zero, got ", + argmax.numel()); + 
TORCH_CHECK(grad_out.numel() != 0, + "grad_out.numel() should not be zero, got ", grad_out.numel()); + TORCH_CHECK(grad_in.numel() != 0, "grad_in.numel() should not be zero, got ", + grad_in.numel()); + // calculate task one dimension + cnrtDim3_t k_dim; + cnrtFunctionType_t k_type; + kernelRoiawarePool3dBackwardPolicyFunc(boxes_num, out_x, out_y, out_z, &k_dim, + &k_type); + // get compute queue + auto queue = torch_mlu::getCurQueue(); + // transpose points_features [pts_num, channels] -> [channels, pts_num] + auto pts_idx_of_voxels_impl = torch_mlu::getMluTensorImpl(pts_idx_of_voxels); + auto pts_idx_of_voxels_ptr = pts_idx_of_voxels_impl->cnnlMalloc(); + auto argmax_impl = torch_mlu::getMluTensorImpl(argmax); + auto argmax_ptr = argmax_impl->cnnlMalloc(); + auto grad_out_impl = torch_mlu::getMluTensorImpl(grad_out); + auto grad_out_ptr = grad_out_impl->cnnlMalloc(); + auto grad_in_impl = torch_mlu::getMluTensorImpl(grad_in); + auto grad_in_ptr = grad_in_impl->cnnlMalloc(); + // get compute dtype of input + cnrtDataType_t data_type = torch_mlu::toCnrtDtype(grad_out.dtype()); + // launch kernel RoiawarePool3dForward + CNLOG(INFO) << "Launch Kernel MLUKernel RoiawarePool3dBackward<<<" << k_dim.x + << ", " << k_dim.y << ", " << k_dim.z << ">>>"; + KernelRoiawarePool3dBackward(k_dim, k_type, queue, data_type, pool_method, + boxes_num, out_x, out_y, out_z, channels, + max_pts_each_voxel, (int *)pts_idx_of_voxels_ptr, + (int *)argmax_ptr, grad_out_ptr, grad_in_ptr); +} + +void roiaware_pool3d_backward_mlu(int boxes_num, int out_x, int out_y, + int out_z, int channels, + int max_pts_each_voxel, + const Tensor pts_idx_of_voxels, + const Tensor argmax, const Tensor grad_out, + Tensor grad_in, int pool_method) { + RoiawarePool3dBackwardMLUKernelLauncher( + pool_method, boxes_num, out_x, out_y, out_z, channels, max_pts_each_voxel, + pts_idx_of_voxels, argmax, grad_out, grad_in); +} + +void roiaware_pool3d_backward_impl(int boxes_num, int out_x, int out_y, + int out_z, int 
channels, + int max_pts_each_voxel, + const Tensor pts_idx_of_voxels, + const Tensor argmax, const Tensor grad_out, + Tensor grad_in, int pool_method); + +REGISTER_DEVICE_IMPL(roiaware_pool3d_backward_impl, MLU, + roiaware_pool3d_backward_mlu); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/roipoint_pool3d_mlu.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/roipoint_pool3d_mlu.cpp new file mode 100644 index 0000000000000000000000000000000000000000..49dfe0ecad0c44c601736035d3e36590a95b27da --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/roipoint_pool3d_mlu.cpp @@ -0,0 +1,166 @@ +/************************************************************************* + * Copyright (C) 2022 by Cambricon. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
 *************************************************************************/
#include "pytorch_device_registry.hpp"
#include "pytorch_mlu_helper.hpp"

// MLU kernel entry point for the regular (boxes_num <= 10240) path; defined in
// the corresponding BANG/.mlu source.
void KernelRoiPointPool3dForward(cnrtDim3_t k_dim, cnrtFunctionType_t k_type,
                                 cnrtQueue_t queue, const cnrtDataType_t d_type,
                                 const int batch_size, const int pts_num,
                                 const int boxes_num, const int feature_in_len,
                                 const int sampled_pts_num, const void *xyz,
                                 const void *boxes3d, const void *pts_feature,
                                 void *pooled_features, int *pooled_empty_flag);

// MLU kernel entry point used when boxes_num exceeds the regular kernel's
// limit (see the dispatch in the launcher below).
void KernelRoiPointPool3dLargeBoxesNumForward(
    cnrtDim3_t k_dim, cnrtFunctionType_t k_type, cnrtQueue_t queue,
    const cnrtDataType_t d_type, const int batch_size, const int pts_num,
    const int boxes_num, const int feature_in_len, const int sampled_pts_num,
    const void *xyz, const void *boxes3d, const void *pts_feature,
    void *pooled_features, int *pooled_empty_flag);

// policy function
static void policyFuncForward(cnrtDim3_t *k_dim, cnrtFunctionType_t *k_type) {
  // start U1 task, occupy all available clusters
  k_dim->x = torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster);
  k_dim->y = torch_mlu::getDeviceAttr(cnrtAttrClusterCount);
  k_dim->z = 1;
  *k_type = CNRT_FUNC_TYPE_UNION1;
}

// Validates inputs, transposes xyz/pts_feature into the kernel's expected
// layouts, and launches the roipoint_pool3d forward kernel on MLU.
// Expected shapes (enforced below):
//   boxes3d:     (batch_size, boxes_num, 7)
//   pts_feature: (batch_size, pts_num, feature_in_len)
void RoIPointPool3dForwardMLUKernelLauncher(
    int batch_size, int pts_num, int boxes_num, int feature_in_len,
    int sampled_pts_num, const Tensor xyz, const Tensor boxes3d,
    const Tensor pts_feature, Tensor pooled_features,
    Tensor pooled_empty_flag) {
  // check datatype
  TORCH_CHECK(((xyz.scalar_type() == pooled_features.scalar_type()) &&
               (boxes3d.scalar_type() == pooled_features.scalar_type()) &&
               (pts_feature.scalar_type() == pooled_features.scalar_type())),
              "data types of xyz, boxes3d, pts_feature and pooled_features "
              "should be the same, ",
              "but now xyz type is ", xyz.scalar_type(), ", boxes3d type is ",
              boxes3d.scalar_type(), ", pts_feature type is ",
              pts_feature.scalar_type(), ", pooled_features type is ",
              pooled_features.scalar_type(), ".");
  TORCH_CHECK(
      (xyz.scalar_type() == at::kFloat || xyz.scalar_type() == at::kHalf),
      "xyz type should be Float or Half, got ", xyz.scalar_type(), ".");
  TORCH_CHECK((pooled_empty_flag.scalar_type() == at::kInt),
              "pooled_empty_flag type should be Int, got ",
              pooled_empty_flag.scalar_type(), ".");

  // check shape
  TORCH_CHECK(boxes3d.dim() == 3, "boxes3d should be a 3d tensor, got ",
              boxes3d.dim(), "D.");
  TORCH_CHECK(pts_feature.dim() == 3, "pts_feature should be a 3d tensor, got ",
              pts_feature.dim(), "D.");

  TORCH_CHECK(boxes3d.size(2) == 7,
              "the 3rd dimensions of boxes3d should be 7, got ",
              boxes3d.size(2), ".");
  TORCH_CHECK((boxes3d.size(0) == batch_size),
              "the 1st dimensions of boxes3d should be batch_size, ",
              "but now the 1st dimension of boxes3d is ", boxes3d.size(0),
              ", and batch_size is ", batch_size, ".");
  TORCH_CHECK((pts_feature.size(0) == batch_size),
              "the 1st dimensions of pts_feature should be batch_size, ",
              "but now the 1st dimension of pts_feature is ",
              pts_feature.size(0), ", and batch_size is ", batch_size, ".");
  TORCH_CHECK((pts_feature.size(1) == pts_num),
              "the 2nd dimensions of pts_feature should be pts_num, ",
              "but now the 2nd dimension of pts_feature is ",
              pts_feature.size(1), ", and pts_num is ", pts_num, ".");

  // check zero element: nothing to do when any operand is empty
  if (xyz.numel() == 0 || pts_feature.numel() == 0 || boxes3d.numel() == 0 ||
      pooled_features.numel() == 0 || pooled_empty_flag.numel() == 0) {
    return;
  }

  // large tensor check: element counts must fit the kernel's 32-bit indexing
  const size_t max_input_size = 2147483648;
  TORCH_CHECK(xyz.numel() < max_input_size,
              "xyz element num should be less than 2^31, got ", xyz.numel(),
              ".");
  TORCH_CHECK(boxes3d.numel() < max_input_size,
              "boxes3d element num should be less than 2^31, got ",
              boxes3d.numel(), ".");
  TORCH_CHECK(pts_feature.numel() < max_input_size,
              "pts_feature element num should be less than 2^31, got ",
              pts_feature.numel(), ".");

  // calculate task dimension
  cnrtDim3_t k_dim;
  cnrtFunctionType_t k_type;
  policyFuncForward(&k_dim, &k_type);

  // get compute queue
  auto queue = torch_mlu::getCurQueue();

  // get ptr of tensors
  // transpose points [B, N ,3] -> [3, B, N]
  auto xyz_ = xyz.permute({2, 0, 1}).contiguous();
  auto xyz_impl = torch_mlu::getMluTensorImpl(xyz_);
  auto xyz_ptr = xyz_impl->cnnlMalloc();
  // transpose point_features [B, N, C] -> [B, C, N]
  auto pts_feature_ = pts_feature.permute({0, 2, 1}).contiguous();
  auto pts_feature_impl = torch_mlu::getMluTensorImpl(pts_feature_);
  auto pts_feature_ptr = pts_feature_impl->cnnlMalloc();
  auto boxes3d_impl = torch_mlu::getMluTensorImpl(boxes3d);
  auto boxes3d_ptr = boxes3d_impl->cnnlMalloc();
  auto pooled_features_impl = torch_mlu::getMluTensorImpl(pooled_features);
  auto pooled_features_ptr = pooled_features_impl->cnnlMalloc();
  auto pooled_empty_flag_impl = torch_mlu::getMluTensorImpl(pooled_empty_flag);
  auto pooled_empty_flag_ptr = pooled_empty_flag_impl->cnnlMalloc();

  // get compute dtype of input
  cnrtDataType_t data_type = torch_mlu::toCnrtDtype(xyz_.dtype());

  // launch kernel: the regular kernel handles up to 10240 boxes; larger box
  // counts go through the LargeBoxesNum variant.
  if (boxes_num <= 10240) {
    CNLOG(INFO) << "Launch Kernel MLUKernelRoiPointPool3dForward<<<" << k_dim.x
                << ", " << k_dim.y << ", " << k_dim.z << ">>>";
    KernelRoiPointPool3dForward(
        k_dim, k_type, queue, data_type, batch_size, pts_num, boxes_num,
        feature_in_len, sampled_pts_num, xyz_ptr, boxes3d_ptr, pts_feature_ptr,
        pooled_features_ptr, (int *)pooled_empty_flag_ptr);
  } else {
    CNLOG(INFO)
        << "Launch Kernel MLUKernelRoiPointPool3dLargeBoxesNumForward<<<"
        << k_dim.x << ", " << k_dim.y << ", " << k_dim.z << ">>>";
    KernelRoiPointPool3dLargeBoxesNumForward(
        k_dim, k_type, queue, data_type, batch_size, pts_num, boxes_num,
        feature_in_len, sampled_pts_num, xyz_ptr, boxes3d_ptr, pts_feature_ptr,
        pooled_features_ptr, (int *)pooled_empty_flag_ptr);
  }
}

// Device-registry wrapper: forwards the op-level arguments to the launcher.
void roipoint_pool3d_forward_mlu(int batch_size, int pts_num, int boxes_num,
                                 int feature_in_len, int sampled_pts_num,
                                 const Tensor xyz, const Tensor boxes3d,
                                 const Tensor pts_feature,
                                 Tensor pooled_features,
                                 Tensor pooled_empty_flag) {
  RoIPointPool3dForwardMLUKernelLauncher(
      batch_size, pts_num, boxes_num, feature_in_len, sampled_pts_num, xyz,
      boxes3d, pts_feature, pooled_features, pooled_empty_flag);
}

// Dispatch hook declared by the generic op layer; bound to the MLU wrapper via
// REGISTER_DEVICE_IMPL below.
void roipoint_pool3d_forward_impl(int batch_size, int pts_num, int boxes_num,
                                  int feature_in_len, int sampled_pts_num,
                                  const Tensor xyz, const Tensor boxes3d,
                                  const Tensor pts_feature,
                                  Tensor pooled_features,
                                  Tensor pooled_empty_flag);

REGISTER_DEVICE_IMPL(roipoint_pool3d_forward_impl, MLU,
                     roipoint_pool3d_forward_mlu);
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/three_nn_mlu.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/three_nn_mlu.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..f407e3f63ae2d9473aae317c22d834b5caf4714c
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/three_nn_mlu.cpp
@@ -0,0 +1,100 @@
/*************************************************************************
 * Copyright (C) 2022 Cambricon.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
 * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
 * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
 * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
 * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
 * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 *************************************************************************/
#include "pytorch_device_registry.hpp"
#include "pytorch_mlu_helper.hpp"

// MLU kernel entry point (implemented in the corresponding BANG/.mlu source):
// for each of the n unknown points, writes its three nearest of the m known
// points into idx and the squared distances into dist2.
void KernelThreeNNForward(cnrtDim3_t k_dim, cnrtFunctionType_t k_type,
                          cnrtQueue_t queue, cnrtDataType_t data_type,
                          const void *unknown, const void *known, void *dist2,
                          int *idx, const int b, const int n, const int m);

// Validates inputs, transposes `known` into the kernel's expected layout, and
// launches the three_nn forward kernel on MLU.
// Expected shapes (enforced below): unknown (b, n, 3), known (b, m, 3).
void ThreeNNMLUKernelLauncher(int b, int n, int m, const Tensor unknown,
                              const Tensor known, Tensor dist2, Tensor idx) {
  // Check dtype.
  TORCH_CHECK(
      unknown.scalar_type() == at::kFloat || unknown.scalar_type() == at::kHalf,
      "unknown type should be Float or Half, got ", unknown.scalar_type(), ".");
  TORCH_CHECK(unknown.scalar_type() == known.scalar_type(),
              "known should have the same type as unknown.");
  TORCH_CHECK(unknown.scalar_type() == dist2.scalar_type(),
              "dist2 should have the same type as unknown.");
  TORCH_CHECK(idx.scalar_type() == at::kInt, "idx type should be Int.");

  // Check shape.
  TORCH_CHECK(unknown.dim() == 3, "unknown should be 3d tensor, got ",
              unknown.dim(), "D.");
  TORCH_CHECK(known.dim() == 3, "known should be 3d tensor, got ", known.dim(),
              "D.");
  TORCH_CHECK(unknown.size(0) == known.size(0),
              "known.dim0 should be equal to unknown.dim0, got ", known.size(0),
              ".");
  TORCH_CHECK(unknown.size(2) == 3, "unknown dim2 should be 3, got ",
              unknown.size(2), ".");
  TORCH_CHECK(known.size(2) == 3, "known dim2 should be 3, got ", known.size(2),
              ".");

  // zero element check: empty unknown is an error, empty known is a no-op
  TORCH_CHECK(unknown.numel() > 0,
              "unknown.numel should greater than zero, got ", unknown.numel(),
              ".");
  if (known.numel() == 0) {
    // return if known zero element
    return;
  }

  // large tensor check: element counts must fit the kernel's 32-bit indexing
  const size_t max_input_num = 2147483648;  // 2^31, 2G num
  TORCH_CHECK(unknown.numel() < max_input_num,
              "unknown.numel() should be less than 2147483648, got ",
              unknown.numel(), ".");
  TORCH_CHECK(known.numel() < max_input_num,
              "known.numel() should be less than 2147483648, got ",
              known.numel(), ".");

  // get compute queue
  auto queue = torch_mlu::getCurQueue();

  // get ptr of tensors; known is transposed [b, m, 3] -> [b, 3, m]
  auto unknown_impl = torch_mlu::getMluTensorImpl(unknown);
  auto unknown_ptr = unknown_impl->cnnlMalloc();
  auto known_t = known.permute({0, 2, 1}).contiguous();
  auto known_impl = torch_mlu::getMluTensorImpl(known_t);
  auto known_ptr = known_impl->cnnlMalloc();
  auto dist2_impl = torch_mlu::getMluTensorImpl(dist2);
  auto dist2_ptr = dist2_impl->cnnlMalloc();
  auto idx_impl = torch_mlu::getMluTensorImpl(idx);
  auto idx_ptr = idx_impl->cnnlMalloc();

  // U1 task over all clusters.
  // NOTE(review): declared as cnrtJobType_t here while sibling files use
  // cnrtFunctionType_t — presumably compatible types in CNRT; confirm.
  cnrtJobType_t k_type = CNRT_FUNC_TYPE_UNION1;
  cnrtDim3_t k_dim;
  k_dim.x = torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster);
  k_dim.y = torch_mlu::getDeviceAttr(cnrtAttrClusterCount);
  k_dim.z = 1;
  cnrtDataType_t data_type = torch_mlu::toCnrtDtype(unknown.dtype());

  // launch kernel
  CNLOG(INFO) << "Launch Kernel MLUKernelThreeNNForward<<<" << k_dim.x << ", "
              << k_dim.y << ", " << k_dim.z << ">>>.";

  KernelThreeNNForward(k_dim, k_type, queue, data_type, unknown_ptr, known_ptr,
                       dist2_ptr, (int *)idx_ptr, b, n, m);
}

// Device-registry wrapper: forwards the op-level arguments to the launcher.
void three_nn_forward_mlu(int b, int n, int m, const Tensor unknown,
                          const Tensor known, Tensor dist2, Tensor idx) {
  ThreeNNMLUKernelLauncher(b, n, m, unknown, known, dist2, idx);
}

// Dispatch hook declared by the generic op layer; bound to the MLU wrapper via
// REGISTER_DEVICE_IMPL below.
void three_nn_forward_impl(int b, int n, int m, const Tensor unknown,
                           const Tensor known, Tensor dist2, Tensor idx);

REGISTER_DEVICE_IMPL(three_nn_forward_impl, MLU, three_nn_forward_mlu);
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/tin_shift_mlu.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/tin_shift_mlu.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..728330795da89e944e037040f92e10be3634c406
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mlu/tin_shift_mlu.cpp
@@ -0,0 +1,203 @@
/*************************************************************************
 * Copyright (C) 2022 Cambricon.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
 * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
 * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
 * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
 * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
 * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 *************************************************************************/
#include "pytorch_device_registry.hpp"
#include "pytorch_mlu_helper.hpp"

// MLU kernel entry point for the TIN-shift forward pass (implemented in the
// corresponding BANG/.mlu source).
void KernelTinShiftForward(
    cnrtDim3_t k_dim, cnrtFunctionType_t k_type, cnrtQueue_t queue,
    const void *input, const void *shifts, void *output, const int batch_size,
    const int time_size, const int channel_size, const int hw_size,
    const int group_size, const int group_channel,
    const cnrtDataType_t data_dtype, const int channel_per_core,
    const int max_number_hw_per_core, const int max_length_per_core);

// MLU kernel entry point for the TIN-shift backward pass.
void KernelTinShiftBackward(
    cnrtDim3_t k_dim, cnrtFunctionType_t k_type, cnrtQueue_t queue,
    const void *grad_output, const void *shifts, void *grad_input,
    const int batch_size, const int time_size, const int channel_size,
    const int hw_size, const int group_size, const int group_channel,
    const cnrtDataType_t data_dtype, const int channel_per_core,
    const int max_number_hw_per_core, const int max_length_per_core);

// policy function
// Chooses launch dimensions and per-core tiling from the input's shape and the
// device's NRAM size: prefer whole channels per core (channel_per_core); when
// a channel does not fit, fall back to hw rows per core
// (max_number_hw_per_core), and when even one row does not fit, to raw
// elements per core (max_length_per_core). input is assumed (batch, time,
// channel, hw) as read below.
static void policyFunc(const Tensor &input, cnrtDim3_t *k_dim,
                       cnrtFunctionType_t *k_type, int *channel_per_core,
                       int *max_number_hw_per_core, int *max_length_per_core) {
  const int32_t cluster_limit = torch_mlu::getDeviceAttr(cnrtAttrClusterCount);
  const int32_t core_limit = torch_mlu::getDeviceAttr(cnrtAttrMcorePerCluster);
  auto nram_size = torch_mlu::getDeviceAttr(cnrtAttrNramSizePerMcore);
  const int core_num = core_limit * cluster_limit;
  const int batch_size = input.size(0);
  const int time_size = input.size(1);
  const int channel_size = input.size(2);
  const int hw_size = input.size(3);

  const size_t size_per_channel = time_size * hw_size * input.itemsize();
  *channel_per_core = nram_size / size_per_channel;
  int task_dim = 0;
  if (*channel_per_core == 0) {
    // A full channel does not fit in NRAM: tile by hw rows, or by elements.
    const size_t size_per_hw = hw_size * input.itemsize();
    *max_number_hw_per_core = nram_size / size_per_hw;
    if (*max_number_hw_per_core <= 0) {
      *max_length_per_core = nram_size / input.itemsize();
    }
    int tmp_max_number_hw_per_core =
        *max_number_hw_per_core > 0 ? *max_number_hw_per_core : 1;
    const int loop_time =
        (time_size / (tmp_max_number_hw_per_core)) +
        ((time_size % (tmp_max_number_hw_per_core)) > 0 ? 1 : 0);
    task_dim = batch_size * channel_size * loop_time < core_num
                   ? batch_size * channel_size * loop_time
                   : core_num;
  } else {
    task_dim = batch_size * channel_size < core_num ? batch_size * channel_size
                                                    : core_num;
  }

  k_dim->x = core_limit;
  k_dim->y = (task_dim / core_limit) > 0 ? (task_dim / core_limit) : 1;
  k_dim->z = 1;
  *k_type = CNRT_FUNC_TYPE_UNION1;
}

// Validates inputs and launches the TIN-shift forward kernel on MLU.
// input: (batch, time, channel, hw); shift: (batch, group_size).
void TINShiftForwardMLUKernelLauncher(Tensor input, Tensor shift,
                                      Tensor output) {
  // params check
  TORCH_CHECK(
      input.scalar_type() == at::kFloat || input.scalar_type() == at::kHalf,
      "input type should be Float or Half, got ", input.scalar_type(), ".");
  TORCH_CHECK(input.dim() == 4, "input should be a 4d tensor, got ",
              input.dim(), "d.");
  TORCH_CHECK(shift.dim() == 2, "shift should be a 2d tensor, got ",
              shift.dim(), "d.");
  TORCH_CHECK(
      input.size(0) == shift.size(0),
      "input batch size should be the same as shift's, input batch size is ",
      input.size(0), " and shift batch size is ", shift.size(0), ".");
  TORCH_CHECK(input.size(0) != 0, "Input batch size should not be zero.");
  TORCH_CHECK(input.size(3) != 0,
              "The last dim size of input should not be zero.");
  // Empty time dimension: nothing to shift.
  if (input.size(1) == 0) {
    return;
  }
  cnrtDim3_t k_dim;
  cnrtFunctionType_t k_type;
  int channel_per_core = 0;
  int max_number_hw_per_core = 0;
  int max_length_per_core = 0;
  policyFunc(input, &k_dim, &k_type, &channel_per_core, &max_number_hw_per_core,
             &max_length_per_core);

  const int batch_size = input.size(0);
  const int time_size = input.size(1);
  const int channel_size = input.size(2);
  const int hw_size = input.size(3);
  const int group_size = shift.size(1);
  int group_channel = channel_size / group_size;

  // get tensor impl
  auto input_impl = torch_mlu::getMluTensorImpl(input);
  auto shift_impl = torch_mlu::getMluTensorImpl(shift);
  auto output_impl = torch_mlu::getMluTensorImpl(output);

  // get compute queue
  auto queue = torch_mlu::getCurQueue();

  // get the mlu ptr
  auto input_ptr = input_impl->cnnlMalloc();
  auto shift_ptr = shift_impl->cnnlMalloc();
  auto output_ptr = output_impl->cnnlMalloc();

  cnrtDataType_t data_dtype = torch_mlu::toCnrtDtype(input.dtype());

  KernelTinShiftForward(k_dim, k_type, queue, input_ptr, shift_ptr, output_ptr,
                        batch_size, time_size, channel_size, hw_size,
                        group_size, group_channel, data_dtype, channel_per_core,
                        max_number_hw_per_core, max_length_per_core);
}

// Validates inputs and launches the TIN-shift backward kernel on MLU.
// grad_output: (batch, time, channel, hw); shift: (batch, group_size).
void TINShiftBackwardMLUKernelLauncher(Tensor grad_output, Tensor shift,
                                       Tensor grad_input) {
  // params check
  TORCH_CHECK(grad_output.scalar_type() == at::kFloat ||
                  grad_output.scalar_type() == at::kHalf,
              "grad_output type should be Float or Half, got ",
              grad_output.scalar_type(), ".");
  TORCH_CHECK(grad_output.dim() == 4, "grad_output should be a 4d tensor, got ",
              grad_output.dim(), "d.");
  TORCH_CHECK(shift.dim() == 2, "shift should be a 2d tensor, got ",
              shift.dim(), "d.");
  TORCH_CHECK(grad_output.size(0) == shift.size(0),
              "grad_output batch size should be the same as shift's, "
              "grad_output batch size is ",
              grad_output.size(0), ", shift batch size is ", shift.size(0),
              ".");
  TORCH_CHECK(grad_output.size(0) != 0,
              "grad_output batch size should not be zero.");
  TORCH_CHECK(grad_output.size(3) != 0,
              "The last dim size of grad_output should not be zero.");
  // Empty time dimension: nothing to shift.
  if (grad_output.size(1) == 0) {
    return;
  }
  cnrtDim3_t k_dim;
  cnrtFunctionType_t k_type;
  int channel_per_core = 0;
  int max_number_hw_per_core = 0;
  int max_length_per_core = 0;
  policyFunc(grad_output, &k_dim, &k_type, &channel_per_core,
             &max_number_hw_per_core, &max_length_per_core);

  const int batch_size = grad_output.size(0);
  const int time_size = grad_output.size(1);
  const int channel_size = grad_output.size(2);
  const int hw_size = grad_output.size(3);
  const int group_size = shift.size(1);
  int group_channel = channel_size / group_size;

  // get tensor impl
  auto grad_output_impl = torch_mlu::getMluTensorImpl(grad_output);
  auto shift_impl = torch_mlu::getMluTensorImpl(shift);
  auto grad_input_impl = torch_mlu::getMluTensorImpl(grad_input);

  // get compute queue
  auto queue = torch_mlu::getCurQueue();

  // get the mlu ptr
  auto grad_output_ptr = grad_output_impl->cnnlMalloc();
  auto shift_ptr = shift_impl->cnnlMalloc();
  auto grad_input_ptr = grad_input_impl->cnnlMalloc();

  cnrtDataType_t data_dtype = torch_mlu::toCnrtDtype(grad_output.dtype());

  KernelTinShiftBackward(k_dim, k_type, queue, grad_output_ptr, shift_ptr,
                         grad_input_ptr, batch_size, time_size, channel_size,
                         hw_size, group_size, group_channel, data_dtype,
                         channel_per_core, max_number_hw_per_core,
                         max_length_per_core);
}

// Device-registry wrappers: forward the op-level arguments to the launchers.
void tin_shift_forward_mlu(Tensor input, Tensor shift, Tensor output) {
  TINShiftForwardMLUKernelLauncher(input, shift, output);
}

void tin_shift_backward_mlu(Tensor grad_output, Tensor shift,
                            Tensor grad_input) {
  TINShiftBackwardMLUKernelLauncher(grad_output, shift, grad_input);
}

// Dispatch hooks declared by the generic op layer; bound to the MLU wrappers
// via REGISTER_DEVICE_IMPL below.
void tin_shift_forward_impl(Tensor input, Tensor shift, Tensor output);

void tin_shift_backward_impl(Tensor grad_output, Tensor shift,
                             Tensor grad_input);

REGISTER_DEVICE_IMPL(tin_shift_forward_impl, MLU, tin_shift_forward_mlu);
REGISTER_DEVICE_IMPL(tin_shift_backward_impl, MLU, tin_shift_backward_mlu);
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/modulated_deform_conv.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/modulated_deform_conv.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..12b538a05e6fd98becccfddf8e79cba7abf96d93
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/modulated_deform_conv.cpp
@@ -0,0 +1,237 @@
+// Copyright (c) OpenMMLab. All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void modulated_deformable_im2col_impl( + const Tensor data_im, const Tensor data_offset, const Tensor data_mask, + const int batch_size, const int channels, const int height_im, + const int width_im, const int height_col, const int width_col, + const int kernel_h, const int kernel_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, const int dilation_h, + const int dilation_w, const int deformable_group, Tensor data_col) { + DISPATCH_DEVICE_IMPL(modulated_deformable_im2col_impl, data_im, data_offset, + data_mask, batch_size, channels, height_im, width_im, + height_col, width_col, kernel_h, kernel_w, pad_h, pad_w, + stride_h, stride_w, dilation_h, dilation_w, + deformable_group, data_col); +} + +void modulated_deformable_col2im_impl( + const Tensor data_col, const Tensor data_offset, const Tensor data_mask, + const int batch_size, const int channels, const int height_im, + const int width_im, const int height_col, const int width_col, + const int kernel_h, const int kernel_w, const int pad_h, const int pad_w, + const int stride_h, const int stride_w, const int dilation_h, + const int dilation_w, const int deformable_group, Tensor grad_im) { + DISPATCH_DEVICE_IMPL(modulated_deformable_col2im_impl, data_col, data_offset, + data_mask, batch_size, channels, height_im, width_im, + height_col, width_col, kernel_h, kernel_w, pad_h, pad_w, + stride_h, stride_w, dilation_h, dilation_w, + deformable_group, grad_im); +} + +void modulated_deformable_col2im_coord_impl( + const Tensor data_col, const Tensor data_im, const Tensor data_offset, + const Tensor data_mask, const int batch_size, const int channels, + const int height_im, const int width_im, const int height_col, + const int width_col, const int kernel_h, const int kernel_w, + const int pad_h, const int pad_w, const int stride_h, const int stride_w, + const int 
dilation_h, const int dilation_w, const int deformable_group, + Tensor grad_offset, Tensor grad_mask) { + DISPATCH_DEVICE_IMPL(modulated_deformable_col2im_coord_impl, data_col, + data_im, data_offset, data_mask, batch_size, channels, + height_im, width_im, height_col, width_col, kernel_h, + kernel_w, pad_h, pad_w, stride_h, stride_w, dilation_h, + dilation_w, deformable_group, grad_offset, grad_mask); +} + +void modulated_deform_conv_forward( + Tensor input, Tensor weight, Tensor bias, Tensor ones, Tensor offset, + Tensor mask, Tensor output, Tensor columns, int kernel_h, int kernel_w, + const int stride_h, const int stride_w, const int pad_h, const int pad_w, + const int dilation_h, const int dilation_w, const int group, + const int deformable_group, const bool with_bias) { + at::DeviceGuard guard(input.device()); + + const int batch = input.size(0); + const int channels = input.size(1); + const int height = input.size(2); + const int width = input.size(3); + + const int channels_out = weight.size(0); + const int channels_kernel = weight.size(1); + const int kernel_h_ = weight.size(2); + const int kernel_w_ = weight.size(3); + + if (kernel_h_ != kernel_h || kernel_w_ != kernel_w) + AT_ERROR("Input shape and kernel shape won't match: (%d x %d vs %d x %d).", + kernel_h_, kernel_w, kernel_h_, kernel_w_); + if (channels != channels_kernel * group) + AT_ERROR("Input shape and kernel channels won't match: (%d vs %d).", + channels, channels_kernel * group); + + const int height_out = + (height + 2 * pad_h - (dilation_h * (kernel_h - 1) + 1)) / stride_h + 1; + const int width_out = + (width + 2 * pad_w - (dilation_w * (kernel_w - 1) + 1)) / stride_w + 1; + + if (ones.ndimension() != 2 || + ones.size(0) * ones.size(1) < height_out * width_out) { + // Resize plane and fill with ones... 
+ ones = at::ones({height_out, width_out}, input.options()); + } + + // resize output + output = output.view({batch, channels_out, height_out, width_out}).zero_(); + // resize temporary columns + columns = + at::zeros({channels * kernel_h * kernel_w, 1 * height_out * width_out}, + input.options()); + + output = output.view({output.size(0), group, output.size(1) / group, + output.size(2), output.size(3)}); + + for (int b = 0; b < batch; b++) { + modulated_deformable_im2col_impl( + input[b], offset[b], mask[b], 1, channels, height, width, height_out, + width_out, kernel_h, kernel_w, pad_h, pad_w, stride_h, stride_w, + dilation_h, dilation_w, deformable_group, columns); + + // divide into group + weight = weight.view({group, weight.size(0) / group, weight.size(1), + weight.size(2), weight.size(3)}); + columns = columns.view({group, columns.size(0) / group, columns.size(1)}); + + for (int g = 0; g < group; g++) { + output[b][g] = output[b][g] + .flatten(1) + .addmm_(weight[g].flatten(1), columns[g]) + .view_as(output[b][g]); + } + + weight = weight.view({weight.size(0) * weight.size(1), weight.size(2), + weight.size(3), weight.size(4)}); + columns = + columns.view({columns.size(0) * columns.size(1), columns.size(2)}); + } + + output = output.view({output.size(0), output.size(1) * output.size(2), + output.size(3), output.size(4)}); + + if (with_bias) { + output += bias.view({1, bias.size(0), 1, 1}); + } +} + +void modulated_deform_conv_backward( + Tensor input, Tensor weight, Tensor bias, Tensor ones, Tensor offset, + Tensor mask, Tensor columns, Tensor grad_input, Tensor grad_weight, + Tensor grad_bias, Tensor grad_offset, Tensor grad_mask, Tensor grad_output, + int kernel_h, int kernel_w, int stride_h, int stride_w, int pad_h, + int pad_w, int dilation_h, int dilation_w, int group, int deformable_group, + const bool with_bias) { + at::DeviceGuard guard(input.device()); + + const int batch = input.size(0); + const int channels = input.size(1); + const int height = 
input.size(2); + const int width = input.size(3); + + const int channels_kernel = weight.size(1); + const int kernel_h_ = weight.size(2); + const int kernel_w_ = weight.size(3); + if (kernel_h_ != kernel_h || kernel_w_ != kernel_w) + AT_ERROR("Input shape and kernel shape won't match: (%d x %d vs %d x %d).", + kernel_h_, kernel_w, kernel_h_, kernel_w_); + if (channels != channels_kernel * group) + AT_ERROR("Input shape and kernel channels won't match: (%d vs %d).", + channels, channels_kernel * group); + + const int height_out = + (height + 2 * pad_h - (dilation_h * (kernel_h - 1) + 1)) / stride_h + 1; + const int width_out = + (width + 2 * pad_w - (dilation_w * (kernel_w - 1) + 1)) / stride_w + 1; + + if (ones.ndimension() != 2 || + ones.size(0) * ones.size(1) < height_out * width_out) { + // Resize plane and fill with ones... + ones = at::ones({height_out, width_out}, input.options()); + } + + grad_input = grad_input.view({batch, channels, height, width}); + columns = at::zeros({channels * kernel_h * kernel_w, height_out * width_out}, + input.options()); + + grad_output = + grad_output.view({grad_output.size(0), group, grad_output.size(1) / group, + grad_output.size(2), grad_output.size(3)}); + + for (int b = 0; b < batch; b++) { + // divide int group + columns = columns.view({group, columns.size(0) / group, columns.size(1)}); + weight = weight.view({group, weight.size(0) / group, weight.size(1), + weight.size(2), weight.size(3)}); + + for (int g = 0; g < group; g++) { + columns[g].addmm_(weight[g].flatten(1).transpose(0, 1), + grad_output[b][g].flatten(1), 0.0f, 1.0f); + } + + columns = + columns.view({columns.size(0) * columns.size(1), columns.size(2)}); + weight = weight.view({weight.size(0) * weight.size(1), weight.size(2), + weight.size(3), weight.size(4)}); + + // gradient w.r.t. 
input coordinate data + modulated_deformable_col2im_coord_impl( + columns, input[b], offset[b], mask[b], 1, channels, height, width, + height_out, width_out, kernel_h, kernel_w, pad_h, pad_w, stride_h, + stride_w, dilation_h, dilation_w, deformable_group, grad_offset[b], + grad_mask[b]); + // gradient w.r.t. input data + modulated_deformable_col2im_impl( + columns, offset[b], mask[b], 1, channels, height, width, height_out, + width_out, kernel_h, kernel_w, pad_h, pad_w, stride_h, stride_w, + dilation_h, dilation_w, deformable_group, grad_input[b]); + + // gradient w.r.t. weight, dWeight should accumulate across the batch and + // group + modulated_deformable_im2col_impl( + input[b], offset[b], mask[b], 1, channels, height, width, height_out, + width_out, kernel_h, kernel_w, pad_h, pad_w, stride_h, stride_w, + dilation_h, dilation_w, deformable_group, columns); + + columns = columns.view({group, columns.size(0) / group, columns.size(1)}); + grad_weight = grad_weight.view({group, grad_weight.size(0) / group, + grad_weight.size(1), grad_weight.size(2), + grad_weight.size(3)}); + if (with_bias) + grad_bias = grad_bias.view({group, grad_bias.size(0) / group}); + + for (int g = 0; g < group; g++) { + grad_weight[g] = + grad_weight[g] + .flatten(1) + .addmm_(grad_output[b][g].flatten(1), columns[g].transpose(0, 1)) + .view_as(grad_weight[g]); + if (with_bias) { + grad_bias[g] = + grad_bias[g] + .view({-1, 1}) + .addmm_(grad_output[b][g].flatten(1), ones.view({-1, 1})) + .view(-1); + } + } + + columns = + columns.view({columns.size(0) * columns.size(1), columns.size(2)}); + grad_weight = grad_weight.view({grad_weight.size(0) * grad_weight.size(1), + grad_weight.size(2), grad_weight.size(3), + grad_weight.size(4)}); + if (with_bias) + grad_bias = grad_bias.view({grad_bias.size(0) * grad_bias.size(1)}); + } + grad_output = grad_output.view({grad_output.size(0) * grad_output.size(1), + grad_output.size(2), grad_output.size(3), + grad_output.size(4)}); +} diff --git 
a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mps/bbox_overlaps_mps.mm b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mps/bbox_overlaps_mps.mm new file mode 100644 index 0000000000000000000000000000000000000000..cad6a41a09a0d9dbf43ae473235c356b16a2eec8 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/mps/bbox_overlaps_mps.mm @@ -0,0 +1,99 @@
// Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
#include "pytorch_device_registry.hpp"

#include "MPSLibrary.h"
#include "MPSStream.h"
#include "MPSUtils.h"

using at::Tensor;

// Metal compute shader source, compiled at runtime into the "bbox_overlap"
// library. One thread computes one IoU/IoF entry.
// NOTE(review): the two #include targets inside the string were lost when this
// file was mangled (angle-bracketed text stripped) — presumably
// <metal_stdlib> — confirm against upstream before building.
const static std::string kSourceCode = R"(
#include
#include
using namespace metal;

kernel void bbox_overlap_mps_kernel(constant const float4* bboxes1,
                                    constant const float4* bboxes2,
                                    device float* ious,
                                    constant int& num_bbox1,
                                    constant int& num_bbox2,
                                    constant int& mode,
                                    constant bool& aligned,
                                    constant int& offset,
                                    uint index [[thread_position_in_grid]])
{
    int base1 = index;
    int base2 = index;
    if(!aligned){
        base1 = index / num_bbox2;
        base2 = index % num_bbox2;
    }

    const float f_offset = float(offset);

    const float4 b1 = bboxes1[base1];
    const float b1_area = (b1[2]-b1[0]+f_offset)*(b1[3]-b1[1]+f_offset);

    const float4 b2 = bboxes2[base2];
    const float b2_area = (b2[2]-b2[0]+f_offset)*(b2[3]-b2[1]+f_offset);

    const float2 left_top = fmax(b1.xy, b2.xy);
    const float2 right_bottom = fmin(b1.zw, b2.zw);
    const float2 wh = fmax(right_bottom - left_top + f_offset, 0.0f);
    const float interS = wh.x * wh.y;

    const float baseS =
        fmax(mode == 0 ? b1_area + b2_area - interS : b1_area, f_offset);
    ious[index] = interS / baseS;
}
)";

// Fetch (or compile+cache) the pipeline state, bind arguments, and dispatch
// one thread per element of `ious` on the current MPS stream.
void BBoxOverlapsMPSKernelLauncher(const Tensor bboxes1, const Tensor bboxes2, Tensor ious,
                                   const int mode, const bool aligned, const int offset) {
  // get stream
  auto stream = at::mps::getCurrentMPSStream();
  auto library_manager = MPSLibraryManager::getInstance();
  MPSLibrary* library;
  const static std::string kLibraryName = "bbox_overlap";
  // Libraries are cached by name; compile from source only on first use.
  // (createLibraryFromSouce is the helper's spelling, typo and all.)
  if (library_manager->hasLibrary(kLibraryName))
    library = library_manager->getLibrary(kLibraryName);
  else
    library = library_manager->createLibraryFromSouce(kLibraryName, kSourceCode);
  auto func_pso = library->getComputePipelineState("bbox_overlap_mps_kernel");

  // create command buffer and encoder
  MTLCommandBuffer_t command_buffer = stream->commandBuffer();
  MTLComputeCommandEncoder_t compute_encoder = [command_buffer computeCommandEncoder];

  // set pso and buffer
  int output_size = ious.numel();
  int num_bbox1 = bboxes1.size(0);
  int num_bbox2 = bboxes2.size(0);
  int num_elements = output_size;
  setMTLArgs(compute_encoder, func_pso, bboxes1, bboxes2, ious, num_bbox1, num_bbox2, mode, aligned,
             offset);

  // set grid size
  MTLSize grid_size = MTLSizeMake(num_elements, 1, 1);
  NSUInteger thread_group_size_x = func_pso.maxTotalThreadsPerThreadgroup;
  if (thread_group_size_x > num_elements) {
    thread_group_size_x = num_elements;
  }
  MTLSize thread_group_size = MTLSizeMake(thread_group_size_x, 1, 1);

  // encoding
  [compute_encoder dispatchThreads:grid_size threadsPerThreadgroup:thread_group_size];
  [compute_encoder endEncoding];

  // commit, not sure if flush is required
  stream->commit(false);
}

void bbox_overlaps_mps(const Tensor bboxes1, const Tensor bboxes2, Tensor ious, const int mode,
                       const bool aligned, const int offset) {
  BBoxOverlapsMPSKernelLauncher(bboxes1, bboxes2, ious, mode, aligned, offset);
}

void bbox_overlaps_impl(const Tensor bboxes1, const Tensor bboxes2, Tensor ious, const int mode,
                        const bool aligned, const int offset);
REGISTER_DEVICE_IMPL(bbox_overlaps_impl, MPS, bbox_overlaps_mps);
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/ms_deform_attn.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/ms_deform_attn.cpp new file mode 100644 index 0000000000000000000000000000000000000000..25c8f6209b16c475ba181eea7c880eb27cca4082 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/ms_deform_attn.cpp @@ -0,0 +1,60 @@
/*!
**************************************************************************************************
* Deformable DETR
* Copyright (c) 2020 SenseTime. All Rights Reserved.
* Licensed under the Apache License, Version 2.0 [see LICENSE for details]
**************************************************************************************************
* Modified from
*https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/tree/pytorch_1.0.0
**************************************************************************************************
*/

#include "pytorch_cpp_helper.hpp"
#include "pytorch_device_registry.hpp"

// Dispatch multi-scale deformable attention to the registered device impl.
Tensor ms_deform_attn_impl_forward(const Tensor &value,
                                   const Tensor &spatial_shapes,
                                   const Tensor &level_start_index,
                                   const Tensor &sampling_loc,
                                   const Tensor &attn_weight,
                                   const int im2col_step) {
  return DISPATCH_DEVICE_IMPL(ms_deform_attn_impl_forward, value,
                              spatial_shapes, level_start_index, sampling_loc,
                              attn_weight, im2col_step);
}

void ms_deform_attn_impl_backward(
    const Tensor &value, const Tensor &spatial_shapes,
    const Tensor &level_start_index, const Tensor &sampling_loc,
    const Tensor &attn_weight, const Tensor &grad_output, Tensor &grad_value,
    Tensor &grad_sampling_loc, Tensor &grad_attn_weight,
    const int im2col_step) {
  DISPATCH_DEVICE_IMPL(ms_deform_attn_impl_backward, value, spatial_shapes,
                       level_start_index, sampling_loc, attn_weight,
                       grad_output, grad_value, grad_sampling_loc,
                       grad_attn_weight, im2col_step);
}

// Python-facing entry points: pin the current device, then dispatch.
Tensor ms_deform_attn_forward(const Tensor &value, const Tensor &spatial_shapes,
                              const Tensor &level_start_index,
                              const Tensor &sampling_loc,
                              const Tensor &attn_weight,
                              const int im2col_step) {
  at::DeviceGuard guard(value.device());
  return ms_deform_attn_impl_forward(value, spatial_shapes, level_start_index,
                                     sampling_loc, attn_weight, im2col_step);
}

void ms_deform_attn_backward(const Tensor &value, const Tensor &spatial_shapes,
                             const Tensor &level_start_index,
                             const Tensor &sampling_loc,
                             const Tensor &attn_weight,
                             const Tensor &grad_output, Tensor &grad_value,
                             Tensor &grad_sampling_loc,
                             Tensor &grad_attn_weight, const int im2col_step) {
  at::DeviceGuard guard(value.device());
  ms_deform_attn_impl_backward(value, spatial_shapes, level_start_index,
                               sampling_loc, attn_weight, grad_output,
                               grad_value, grad_sampling_loc, grad_attn_weight,
                               im2col_step);
}
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/nms.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/nms.cpp new file mode 100644 index 0000000000000000000000000000000000000000..199d8af236f5442fcdd53ce3dfd8d24aa67481bb --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/nms.cpp @@ -0,0 +1,33 @@ 
// Copyright (c) OpenMMLab. 
All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +Tensor nms_impl(Tensor boxes, Tensor scores, float iou_threshold, int offset) { + return DISPATCH_DEVICE_IMPL(nms_impl, boxes, scores, iou_threshold, offset); +} + +Tensor softnms_impl(Tensor boxes, Tensor scores, Tensor dets, + float iou_threshold, float sigma, float min_score, + int method, int offset) { + return DISPATCH_DEVICE_IMPL(softnms_impl, boxes, scores, dets, iou_threshold, + sigma, min_score, method, offset); +} + +std::vector > nms_match_impl(Tensor dets, + float iou_threshold) { + return DISPATCH_DEVICE_IMPL(nms_match_impl, dets, iou_threshold); +} + +Tensor nms(Tensor boxes, Tensor scores, float iou_threshold, int offset) { + return nms_impl(boxes, scores, iou_threshold, offset); +} + +Tensor softnms(Tensor boxes, Tensor scores, Tensor dets, float iou_threshold, + float sigma, float min_score, int method, int offset) { + return softnms_impl(boxes, scores, dets, iou_threshold, sigma, min_score, + method, offset); +} + +std::vector > nms_match(Tensor dets, float iou_threshold) { + return nms_match_impl(dets, iou_threshold); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/nms_quadri.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/nms_quadri.cpp new file mode 100644 index 0000000000000000000000000000000000000000..b8baed951a6306d589e3609986f6fce1dd571067 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/nms_quadri.cpp @@ -0,0 +1,30 @@ +// Copyright (c) Facebook, Inc. and its affiliates. 
All Rights Reserved +#include "pytorch_cpp_helper.hpp" + +Tensor nms_quadri_cpu(const Tensor dets, const Tensor scores, + const float iou_threshold); + +#ifdef MMCV_WITH_CUDA +Tensor nms_quadri_cuda(const Tensor dets, const Tensor scores, + const Tensor order, const Tensor dets_sorted, + const float iou_threshold, const int multi_label); +#endif + +// Interface for Python +// inline is needed to prevent multiple function definitions when this header is +// included by different cpps +Tensor nms_quadri(const Tensor dets, const Tensor scores, const Tensor order, + const Tensor dets_sorted, const float iou_threshold, + const int multi_label) { + assert(dets.device().is_cuda() == scores.device().is_cuda()); + if (dets.device().is_cuda()) { +#ifdef MMCV_WITH_CUDA + return nms_quadri_cuda(dets, scores, order, dets_sorted, iou_threshold, + multi_label); +#else + AT_ERROR("Not compiled with GPU support"); +#endif + } + + return nms_quadri_cpu(dets, scores, iou_threshold); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/nms_rotated.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/nms_rotated.cpp new file mode 100644 index 0000000000000000000000000000000000000000..e4ef676a9d6f94e5f60b7c9e1df8ce78eb6cbaa2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/nms_rotated.cpp @@ -0,0 +1,32 @@ +// Copyright (c) Facebook, Inc. and its affiliates. 
All Rights Reserved +// modified from +// https://github.com/facebookresearch/detectron2/blob/master/detectron2/layers/csrc/nms_rotated/nms_rotated.h +#include "pytorch_cpp_helper.hpp" + +Tensor nms_rotated_cpu(const Tensor dets, const Tensor scores, + const float iou_threshold); + +#ifdef MMCV_WITH_CUDA +Tensor nms_rotated_cuda(const Tensor dets, const Tensor scores, + const Tensor order, const Tensor dets_sorted, + const float iou_threshold, const int multi_label); +#endif + +// Interface for Python +// inline is needed to prevent multiple function definitions when this header is +// included by different cpps +Tensor nms_rotated(const Tensor dets, const Tensor scores, const Tensor order, + const Tensor dets_sorted, const float iou_threshold, + const int multi_label) { + assert(dets.device().is_cuda() == scores.device().is_cuda()); + if (dets.device().is_cuda()) { +#ifdef MMCV_WITH_CUDA + return nms_rotated_cuda(dets, scores, order, dets_sorted, iou_threshold, + multi_label); +#else + AT_ERROR("Not compiled with GPU support"); +#endif + } + + return nms_rotated_cpu(dets, scores, iou_threshold); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/npu/deform_roi_pool.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/npu/deform_roi_pool.cpp new file mode 100644 index 0000000000000000000000000000000000000000..0e9f2ee7ac7189b80b18be14f92cced7089efc5c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/npu/deform_roi_pool.cpp @@ -0,0 +1,63 @@ +#include "pytorch_npu_helper.hpp" + +using namespace NPU_NAME_SPACE; +using namespace std; + +void deform_roi_pool_forward_impl(Tensor input, Tensor rois, Tensor offset, + Tensor output, int pooled_height, + int pooled_width, float spatial_scale, + int sampling_ratio, float gamma); + +void deform_roi_pool_backward_impl(Tensor grad_output, Tensor input, + Tensor rois, Tensor offset, + Tensor grad_input, Tensor grad_offset, + int pooled_height, int pooled_width, + float spatial_scale, int 
sampling_ratio, float gamma);

// Forward: launch the CANN "DeformableRoiPool" operator.
void deform_roi_pool_forward_npu(Tensor input, Tensor rois, Tensor offset,
                                 Tensor output, int pooled_height,
                                 int pooled_width, float spatial_scale,
                                 int sampling_ratio, float gamma) {
  // NOTE(review): c10::SmallVector's template arguments were stripped by the
  // text mangling here and below (likely c10::SmallVector<int64_t, N>) —
  // confirm against upstream before building.
  c10::SmallVector output_sizes = {pooled_height, pooled_width};
  at::IntArrayRef output_size = at::IntArrayRef(output_sizes);
  int64_t sampling_ratio_ = (int64_t)sampling_ratio;
  OpCommand cmd;
  cmd.Name("DeformableRoiPool")
      .Input(input)
      .Input(rois)
      .Input(offset)
      .Output(output)
      .Attr("spatial_scale", spatial_scale)
      .Attr("output_size", output_size)
      .Attr("sampling_ratio", sampling_ratio_)
      .Attr("gamma", gamma)
      .Run();
}

// Backward: launch the CANN "DeformableRoiPoolGrad" operator.
// NOTE(review): grad_input is bound as an Input and grad_output as an Output
// here, the reverse of the usual gradient-op convention; the attr is also
// spelled "sample_ratio" while the forward uses "sampling_ratio". Both may
// match the CANN op spec — confirm against upstream before relying on this.
void deform_roi_pool_backward_npu(Tensor grad_output, Tensor input, Tensor rois,
                                  Tensor offset, Tensor grad_input,
                                  Tensor grad_offset, int pooled_height,
                                  int pooled_width, float spatial_scale,
                                  int sampling_ratio, float gamma) {
  c10::SmallVector output_sizes = {pooled_height, pooled_width};
  at::IntArrayRef output_size = at::IntArrayRef(output_sizes);
  int64_t sampling_ratio_ = (int64_t)sampling_ratio;
  OpCommand cmd;
  cmd.Name("DeformableRoiPoolGrad")
      .Input(grad_input)
      .Input(input)
      .Input(rois)
      .Input(offset)
      .Output(grad_output)
      .Output(grad_offset)
      .Attr("output_size", output_size)
      .Attr("spatial_scale", spatial_scale)
      .Attr("sample_ratio", sampling_ratio_)
      .Attr("gamma", gamma)
      .Run();
}

REGISTER_NPU_IMPL(deform_roi_pool_forward_impl, deform_roi_pool_forward_npu);

REGISTER_NPU_IMPL(deform_roi_pool_backward_impl, deform_roi_pool_backward_npu);
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/npu/focal_loss_npu.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/npu/focal_loss_npu.cpp new file mode 100644 index 0000000000000000000000000000000000000000..c949bf9539bfab17fa187d9b93f02ccc0908a296 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/npu/focal_loss_npu.cpp @@ -0,0 +1,162 @@
#include "pytorch_npu_helper.hpp"

using namespace NPU_NAME_SPACE;
using namespace std;

// Sigmoid focal loss forward via the CANN "SigmoidFocalLoss" op.
void sigmoid_focal_loss_forward_npu(Tensor input, Tensor target, Tensor weight,
                                    Tensor output, float gamma, float alpha) {
  int64_t n_class = input.size(1);
  at::Tensor target_y = at::ones_like(input);
  if (n_class == 1) {
    // Binary case: the op expects 1 - target (reshaped to input's shape).
    target_y = at::reshape(target, input.sizes());
    target_y = at::mul(target_y, -1.0);
    target_y = at::add(target_y, 1.0);
  } else {
    // Multi-class case: one-hot encode the class indices.
    target_y = at_npu::native::NPUNativeFunctions::one_hot(target, n_class);
  }
  target_y =
      at_npu::native::NPUNativeFunctions::npu_dtype_cast(target_y, at::kInt);
  int64_t weight_size = weight.size(0);
  // Per-class weights broadcast to input's shape; all-ones when absent.
  at::Tensor weight_y = at::ones_like(input);
  if (weight_size > 0) {
    weight_y = at_npu::native::NPUNativeFunctions::npu_broadcast(weight,
                                                                 input.sizes());
  }
  OpCommand cmd;
  string reduction = "none";
  cmd.Name("SigmoidFocalLoss")
      .Input(input)
      .Input(target_y)
      .Input(weight_y)
      .Output(output)
      .Attr("gamma", gamma)
      .Attr("alpha", alpha)
      .Attr("reduction", reduction)
      .Run();
}

void sigmoid_focal_loss_forward_impl(Tensor input, Tensor target, Tensor weight,
                                     Tensor output, float gamma, float alpha);

// Sigmoid focal loss backward via "SigmoidFocalLossGrad".
// Note the target transform is the mirror of the forward: the binary branch
// passes target as-is, the multi-class branch passes 1 - one_hot(target).
void sigmoid_focal_loss_backward_npu(Tensor input, Tensor target, Tensor weight,
                                     Tensor grad_input, float gamma,
                                     float alpha) {
  int64_t n_class = input.size(1);
  at::Tensor target_y = at::ones_like(input);
  if (n_class == 1) {
    target_y = at::reshape(target, input.sizes());
  } else {
    target_y = at_npu::native::NPUNativeFunctions::one_hot(target, n_class);
    target_y = at::mul(target_y, -1.0);
    target_y = at::add(target_y, 1.0);
  }
  target_y =
      at_npu::native::NPUNativeFunctions::npu_dtype_cast(target_y, at::kInt);
  // Incoming upstream gradient is taken as all-ones.
  at::Tensor grad_up = at::ones_like(input);
  int64_t weight_size = weight.size(0);
  at::Tensor weight_y = at::ones_like(input);
  if (weight_size > 0) {
    weight_y = at_npu::native::NPUNativeFunctions::npu_broadcast(weight,
                                                                 input.sizes());
  }
  OpCommand cmd;
  string reduction = "none";
  cmd.Name("SigmoidFocalLossGrad")
      .Input(input)
      .Input(target_y)
      .Input(grad_up)
      .Input(weight_y)
      .Output(grad_input)
      .Attr("gamma", gamma)
      .Attr("alpha", alpha)
      .Attr("reduction", reduction)
      .Run();
}

void sigmoid_focal_loss_backward_impl(Tensor input, Tensor target,
                                      Tensor weight, Tensor grad_input,
                                      float gamma, float alpha);

// Softmax focal loss forward: run "SoftmaxFocalLoss" into a full-size scratch
// tensor, then slice out the (n_batch, 1) column the caller expects.
void softmax_focal_loss_forward_npu(Tensor input, Tensor target, Tensor weight,
                                    Tensor output, float gamma, float alpha) {
  int64_t n_class = input.size(1);
  at::Tensor target_y =
      at_npu::native::NPUNativeFunctions::one_hot(target, n_class);
  target_y =
      at_npu::native::NPUNativeFunctions::npu_dtype_cast(target_y, at::kInt);
  int64_t weight_size = weight.size(0);
  at::Tensor weight_y = at::ones_like(input);
  if (weight_size > 0) {
    weight_y = at_npu::native::NPUNativeFunctions::npu_broadcast(weight,
                                                                 input.sizes());
  }
  at::Tensor op_output = at::ones_like(input);
  OpCommand cmd;
  string reduction = "none";
  cmd.Name("SoftmaxFocalLoss")
      .Input(input)
      .Input(target_y)
      .Input(weight_y)
      .Output(op_output)
      .Attr("gamma", gamma)
      .Attr("alpha", alpha)
      .Attr("reduction", reduction)
      .Run();
  int64_t n_batch = input.size(0);
  // NOTE(review): SmallVector template arguments stripped in transit (see
  // note at top of this file).
  c10::SmallVector offsets = {0, 0};
  c10::SmallVector sizes = {n_batch, 1};
  at::IntArrayRef offset = at::IntArrayRef(offsets);
  at::IntArrayRef size = at::IntArrayRef(sizes);
  at_npu::native::NPUNativeFunctions::npu_slice_out(op_output, offset, size,
                                                    output);
}

// (Upstream declares the 4th parameter as grad_input here; kept verbatim.)
void softmax_focal_loss_forward_impl(Tensor input, Tensor target, Tensor weight,
                                     Tensor grad_input, float gamma,
                                     float alpha);

// Softmax focal loss backward via "SoftmaxFocalLossGrad".
// NOTE(review): `buff` is accepted but never used by this implementation.
void softmax_focal_loss_backward_npu(Tensor input, Tensor target, Tensor weight,
                                     Tensor buff, Tensor grad_input,
                                     float gamma, float alpha) {
  int64_t n_class = input.size(1);
  at::Tensor target_y =
      at_npu::native::NPUNativeFunctions::one_hot(target, n_class);
  target_y =
      at_npu::native::NPUNativeFunctions::npu_dtype_cast(target_y, at::kInt);
  at::Tensor grad_up = at::ones_like(input);
  int64_t weight_size = weight.size(0);
  at::Tensor weight_y = at::ones_like(input);
  if (weight_size > 0) {
    weight_y = at_npu::native::NPUNativeFunctions::npu_broadcast(weight,
                                                                 input.sizes());
  }
  OpCommand cmd;
  string reduction = "none";
  cmd.Name("SoftmaxFocalLossGrad")
      .Input(input)
      .Input(target_y)
      .Input(grad_up)
      .Input(weight_y)
      .Output(grad_input)
      .Attr("gamma", gamma)
      .Attr("alpha", alpha)
      .Attr("reduction", reduction)
      .Run();
}

void softmax_focal_loss_backward_impl(Tensor input, Tensor target,
                                      Tensor weight, Tensor buff,
                                      Tensor grad_input, float gamma,
                                      float alpha);

REGISTER_NPU_IMPL(sigmoid_focal_loss_forward_impl,
                  sigmoid_focal_loss_forward_npu);

REGISTER_NPU_IMPL(sigmoid_focal_loss_backward_impl,
                  sigmoid_focal_loss_backward_npu);

REGISTER_NPU_IMPL(softmax_focal_loss_forward_impl,
                  softmax_focal_loss_forward_npu);

REGISTER_NPU_IMPL(softmax_focal_loss_backward_impl,
                  softmax_focal_loss_backward_npu);
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/npu/fused_bias_leakyrelu_npu.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/npu/fused_bias_leakyrelu_npu.cpp new file mode 100644 index 0000000000000000000000000000000000000000..cd052b5868e4b301de5b5f75cc9d51f0abd611bd --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/npu/fused_bias_leakyrelu_npu.cpp @@ -0,0 +1,54 @@
#include "pytorch_npu_helper.hpp"

using namespace NPU_NAME_SPACE;
using namespace std;

Tensor fused_bias_leakyrelu_op_impl(const Tensor &input, const Tensor &bias,
                                    const Tensor &refer, int act, int grad,
                                    float alpha, float scale);

// Fused bias + leaky-ReLU: grad == 0 runs the forward op (bias reshaped so it
// broadcasts along dim 1), grad == 1 runs the gradient op. `act` is unused
// by this NPU implementation.
Tensor fused_bias_leakyrelu_npu(const Tensor &input, const Tensor &bias,
                                const Tensor &refer, int act, int grad,
                                float alpha, float scale) {
  at::Tensor py = at::empty_like(input);
  // forward
  if (grad == 0) {
    auto input_size = input.sizes();
    int input_length = input_size.size();
    // Collapse every dim except dim 1 to 1 so bias broadcasts over channels.
    // NOTE(review): SmallVector template arguments stripped in transit.
    c10::SmallVector input_size_tmp;
    input_size_tmp = array_to_small_vector(input_size);
    if (input_length > 1) {
      for (int i = 0; i < input_length; i++) {
        if (i != 1) {
          input_size_tmp[i] = 1;
        }
      }
    }
    at::Tensor bias_tmp = at::reshape(bias, input_size_tmp);
    at::Tensor bias_ = at_npu::native::NPUNativeFunctions::npu_broadcast(
        bias_tmp, input.sizes());
    OpCommand cmd;
    cmd.Name("FusedBiasLeakyRelu")
        .Input(input)
        .Input(bias_)
        .Output(py)
        .Attr("scale", scale)
        .Attr("negative_slope", alpha)
        .Run();
  }

  // backward
  if (grad == 1) {
    OpCommand cmd;
    cmd.Name("FusedBiasLeakyReluGrad")
        .Input(input)
        .Input(refer)
        .Output(py)
        .Attr("scale", scale)
        .Attr("negative_slope", alpha)
        .Run();
  }
  return py;
}

REGISTER_NPU_IMPL(fused_bias_leakyrelu_op_impl, fused_bias_leakyrelu_npu);
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/npu/nms_npu.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/npu/nms_npu.cpp new file mode 100644 index 0000000000000000000000000000000000000000..2f86893ea7846a8feffa1fd4c07e11d96ba76fd0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/npu/nms_npu.cpp @@ -0,0 +1,45 @@
#include "pytorch_npu_helper.hpp"

using namespace NPU_NAME_SPACE;
using namespace std;

// NMS via the CANN "NonMaxSuppressionV3" operator; returns the kept box
// indices as an int64 tensor.
Tensor nms_npu(Tensor boxes, Tensor scores, float iou_threshold, int offset) {
  // NOTE(review): "boxed_offest" (sic) is computed (boxes + offset) but the
  // op below is fed the raw `boxes` — confirm against upstream whether the
  // offset-adjusted boxes were intended.
  at::Tensor boxed_offest = at_npu::native::OpPreparation::ApplyTensor(boxes);
  at::Tensor ones_tensor =
      at_npu::native::OpPreparation::ApplyTensor(boxes).fill_(1);
  at::add_out(boxed_offest, boxes, ones_tensor, offset);
  // Scalar operands the op expects as device tensors.
  at::Tensor iou_threshold_y = at_npu::native::OpPreparation::ApplyTensor(
                                   {}, boxes.options().dtype(at::kFloat), boxes)
                                   .fill_(iou_threshold);
  at::Tensor scores_threshold_y =
      at_npu::native::OpPreparation::ApplyTensor(
          {}, boxes.options().dtype(at::kFloat), boxes)
          .fill_(0);
  at::Tensor max_outputsize_y = at_npu::native::OpPreparation::ApplyTensor(
                                    {}, boxes.options().dtype(at::kInt), boxes)
                                    .fill_(boxes.size(0));
  // NOTE(review): SmallVector template arguments stripped in transit.
  c10::SmallVector outputsize = {boxes.size(0)};
  // Output is pre-filled with -1; the op writes kept indices at the front.
  at::Tensor output = at_npu::native::OpPreparation::ApplyTensor(
                          outputsize, boxes.options().dtype(at::kInt), boxes)
                          .fill_(-1);
  OpCommand cmd;
  cmd.Name("NonMaxSuppressionV3")
      .Input(boxes)
      .Input(scores)
      .Input(max_outputsize_y)
      .Input(iou_threshold_y)
      .Input(scores_threshold_y)
      .Output(output)
      .Run();
  // Count entries > -1 to find how many boxes were kept, then trim.
  auto outputsizeBool = at::gt(output, -1);
  auto outputsizeInt = outputsizeBool.to(at::ScalarType::Int);
  auto countLen = at::sum(outputsizeInt, at::ScalarType::Int);
  at::Tensor actual_output = output.slice(0, 0, countLen.item().toLong());
  actual_output = at_npu::native::NPUNativeFunctions::npu_dtype_cast(
      actual_output, at::kLong);
  return actual_output;
}

Tensor nms_impl(Tensor boxes, Tensor scores, float iou_threshold, int offset);

REGISTER_NPU_IMPL(nms_impl, nms_npu);
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/npu/psa_mask_npu.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/npu/psa_mask_npu.cpp new file mode 100644 index 0000000000000000000000000000000000000000..44ddb5431f498917099b9160a4deb3a1d9210815 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/npu/psa_mask_npu.cpp @@ -0,0 +1,75 @@
#include "pytorch_npu_helper.hpp"

using namespace NPU_NAME_SPACE;
using namespace std;

// PSAMask forward via the CANN "PSAMask" op (definition continues past the
// end of this chunk; the widened int64 copies feed the op's attributes).
void psamask_forward_npu(const int psa_type, const Tensor x, Tensor y,
                         const int num, const int h_feature,
                         const int w_feature, const int h_mask,
                         const int w_mask, const int half_h_mask,
                         const int half_w_mask) {
  int64_t psa_type_i64 = psa_type;
  int64_t num_i64 = num;
  int64_t h_feature_i64 = h_feature;
  int64_t w_feature_i64 = w_feature;
  int64_t h_mask_i64 = h_mask;
  int64_t w_mask_i64 = w_mask;
  int64_t half_h_mask_i64 = half_h_mask;
  int64_t half_w_mask_i64 = half_w_mask;
  OpCommand cmd;
  cmd.Name("PSAMask") 
+ .Input(x) + .Output(y) + .Attr("psa_type", psa_type_i64) + .Attr("num", num_i64) + .Attr("h_feature", h_feature_i64) + .Attr("w_feature", w_feature_i64) + .Attr("h_mask", h_mask_i64) + .Attr("w_mask", w_mask_i64) + .Attr("half_h_mask", half_h_mask_i64) + .Attr("half_w_mask", half_w_mask_i64) + .Run(); +} + +void psamask_forward_impl(const int psa_type, const Tensor x, Tensor y, + const int num, const int h_feature, + const int w_feature, const int h_mask, + const int w_mask, const int half_h_mask, + const int half_w_mask); + +void psamask_backward_npu(const int psa_type, const Tensor y_grad, + Tensor x_grad, const int num, const int h_feature, + const int w_feature, const int h_mask, + const int w_mask, const int half_h_mask, + const int half_w_mask) { + int64_t psa_type_i64 = psa_type; + int64_t num_i64 = num; + int64_t h_feature_i64 = h_feature; + int64_t w_feature_i64 = w_feature; + int64_t h_mask_i64 = h_mask; + int64_t w_mask_i64 = w_mask; + int64_t half_h_mask_i64 = half_h_mask; + int64_t half_w_mask_i64 = half_w_mask; + OpCommand cmd; + cmd.Name("PSAMaskGrad") + .Input(y_grad) + .Output(x_grad) + .Attr("psa_type", psa_type_i64) + .Attr("num", num_i64) + .Attr("h_feature", h_feature_i64) + .Attr("w_feature", w_feature_i64) + .Attr("h_mask", h_mask_i64) + .Attr("w_mask", w_mask_i64) + .Attr("half_h_mask", half_h_mask_i64) + .Attr("half_w_mask", half_w_mask_i64) + .Run(); +} + +void psamask_backward_impl(const int psa_type, const Tensor y_grad, + Tensor x_grad, const int num, const int h_feature, + const int w_feature, const int h_mask, + const int w_mask, const int half_h_mask, + const int half_w_mask); + +REGISTER_NPU_IMPL(psamask_forward_impl, psamask_forward_npu); +REGISTER_NPU_IMPL(psamask_backward_impl, psamask_backward_npu); diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/npu/roi_pool_npu.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/npu/roi_pool_npu.cpp new file mode 100644 index 
0000000000000000000000000000000000000000..36bd9c7a806c0619b0224b5d1f600ed723458d43
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/npu/roi_pool_npu.cpp
@@ -0,0 +1,34 @@
+#include "pytorch_npu_helper.hpp"
+
+using namespace NPU_NAME_SPACE;
+using namespace std;
+
+void roi_pool_forward_npu(Tensor input, Tensor rois, Tensor output,
+                          Tensor argmax, int pooled_height, int pooled_width,
+                          float spatial_scale) {
+  int64_t pooled_height_64 = pooled_height;
+  int64_t pooled_width_64 = pooled_width;
+  int64_t pooled_channel = 1;
+  at::Tensor roi_actual_num = at_npu::native::OpPreparation::ApplyTensor(
+      {}, rois.options().dtype(at::kInt), rois);
+
+  OpCommand cmd;
+  cmd.Name("RoiPoolingWithArgMax")
+      .Input(input)
+      .Input(rois)
+      .Input(roi_actual_num)
+      .Output(output)
+      .Output(argmax)
+      .Attr("pooled_h", pooled_height_64)
+      .Attr("pooled_w", pooled_width_64)
+      .Attr("spatial_scale_h", spatial_scale)
+      .Attr("spatial_scale_w", spatial_scale)
+      .Attr("pool_channel", pooled_channel)
+      .Run();
+}
+
+void roi_pool_forward_impl(Tensor input, Tensor rois, Tensor output,
+                           Tensor argmax, int pooled_height, int pooled_width,
+                           float spatial_scale);
+
+REGISTER_NPU_IMPL(roi_pool_forward_impl, roi_pool_forward_npu);
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/pixel_group.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/pixel_group.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..2bf8c8bbf2061cacb9e0c2d33c8a635834407622
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/pixel_group.cpp
@@ -0,0 +1,26 @@
+// Copyright (c) OpenMMLab.
All rights reserved
+// It is modified from https://github.com/WenmuZhou/PAN.pytorch
+
+#include "pytorch_cpp_helper.hpp"
+#include "pytorch_device_registry.hpp"
+
+std::vector> pixel_group_impl(
+    Tensor score, Tensor mask, Tensor embedding, Tensor kernel_label,
+    Tensor kernel_contour, int kernel_region_num, float dis_threshold) {
+  return DISPATCH_DEVICE_IMPL(pixel_group_impl, score, mask, embedding,
+                              kernel_label, kernel_contour, kernel_region_num,
+                              dis_threshold);
+}
+
+std::vector> pixel_group(
+    Tensor score, Tensor mask, Tensor embedding, Tensor kernel_label,
+    Tensor kernel_contour, int kernel_region_num, float distance_threshold) {
+  score = score.contiguous();
+  mask = mask.contiguous();
+  embedding = embedding.contiguous();
+  kernel_label = kernel_label.contiguous();
+  kernel_contour = kernel_contour.contiguous();
+
+  return pixel_group_impl(score, mask, embedding, kernel_label, kernel_contour,
+                          kernel_region_num, distance_threshold);
+}
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/points_in_boxes.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/points_in_boxes.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..540da94038f6dea2dc10443905f289ddd131f1af
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/points_in_boxes.cpp
@@ -0,0 +1,44 @@
+#include "pytorch_cpp_helper.hpp"
+#include "pytorch_device_registry.hpp"
+
+void points_in_boxes_part_forward_impl(int batch_size, int boxes_num,
+                                       int pts_num, const Tensor boxes,
+                                       const Tensor pts,
+                                       Tensor box_idx_of_points) {
+  DISPATCH_DEVICE_IMPL(points_in_boxes_part_forward_impl, batch_size, boxes_num,
+                       pts_num, boxes, pts, box_idx_of_points);
+}
+
+void points_in_boxes_all_forward_impl(int batch_size, int boxes_num,
+                                      int pts_num, const Tensor boxes,
+                                      const Tensor pts,
+                                      Tensor box_idx_of_points) {
+  DISPATCH_DEVICE_IMPL(points_in_boxes_all_forward_impl, batch_size, boxes_num,
+                       pts_num, boxes, pts, box_idx_of_points);
+}
+
+void points_in_boxes_part_forward(Tensor boxes_tensor, Tensor pts_tensor,
+                                  Tensor box_idx_of_points_tensor) {
+  // params boxes: (B, N, 7) [x, y, z, x_size, y_size, z_size, rz] in LiDAR
+  // coordinate, z is the bottom center, each box params pts: (B, npoints, 3)
+  // [x, y, z] in LiDAR coordinate params boxes_idx_of_points: (B, npoints),
+  // default -1
+  int batch_size = boxes_tensor.size(0);
+  int boxes_num = boxes_tensor.size(1);
+  int pts_num = pts_tensor.size(1);
+  points_in_boxes_part_forward_impl(batch_size, boxes_num, pts_num,
+                                    boxes_tensor, pts_tensor,
+                                    box_idx_of_points_tensor);
+}
+
+void points_in_boxes_all_forward(Tensor boxes_tensor, Tensor pts_tensor,
+                                 Tensor box_idx_of_points_tensor) {
+  // params boxes: (B, N, 7) [x, y, z, x_size, y_size, z_size, rz] in LiDAR
+  // coordinate, z is the bottom center. params pts: (B, npoints, 3) [x, y, z]
+  // in LiDAR coordinate params boxes_idx_of_points: (B, npoints), default -1
+  int batch_size = boxes_tensor.size(0);
+  int boxes_num = boxes_tensor.size(1);
+  int pts_num = pts_tensor.size(1);
+  points_in_boxes_all_forward_impl(batch_size, boxes_num, pts_num, boxes_tensor,
+                                   pts_tensor, box_idx_of_points_tensor);
+}
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/points_in_polygons.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/points_in_polygons.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..75a93dcef33f23904c1218048e16beff65c230d1
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/points_in_polygons.cpp
@@ -0,0 +1,15 @@
+#include "pytorch_cpp_helper.hpp"
+#include "pytorch_device_registry.hpp"
+
+void points_in_polygons_forward_impl(const Tensor points, const Tensor polygons,
+                                     Tensor output, const int rows,
+                                     const int cols) {
+  DISPATCH_DEVICE_IMPL(points_in_polygons_forward_impl, points, polygons,
+                       output, rows, cols);
+}
+
+void points_in_polygons_forward(Tensor points, Tensor polygons, Tensor output) {
+  int rows = points.size(0);
+  int cols = polygons.size(0);
+  points_in_polygons_forward_impl(points, polygons, output, rows, cols);
+}
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/prroi_pool.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/prroi_pool.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..00db84a154bef7a7cee8d38ba6236d959849a3bc
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/prroi_pool.cpp
@@ -0,0 +1,47 @@
+// Copyright (c) OpenMMLab. All rights reserved
+#include "pytorch_cpp_helper.hpp"
+#include "pytorch_device_registry.hpp"
+
+void prroi_pool_forward_impl(Tensor input, Tensor rois, Tensor output,
+                             int pooled_height, int pooled_width,
+                             float spatial_scale) {
+  DISPATCH_DEVICE_IMPL(prroi_pool_forward_impl, input, rois, output,
+                       pooled_height, pooled_width, spatial_scale);
+}
+
+void prroi_pool_backward_impl(Tensor grad_output, Tensor rois,
+                              Tensor grad_input, int pooled_height,
+                              int pooled_width, float spatial_scale) {
+  DISPATCH_DEVICE_IMPL(prroi_pool_backward_impl, grad_output, rois, grad_input,
+                       pooled_height, pooled_width, spatial_scale);
+}
+
+void prroi_pool_coor_backward_impl(Tensor output, Tensor grad_output,
+                                   Tensor input, Tensor rois, Tensor grad_rois,
+                                   int pooled_height, int pooled_width,
+                                   float spatial_scale) {
+  DISPATCH_DEVICE_IMPL(prroi_pool_coor_backward_impl, output, grad_output,
+                       input, rois, grad_rois, pooled_height, pooled_width,
+                       spatial_scale);
+}
+
+void prroi_pool_forward(Tensor input, Tensor rois, Tensor output,
+                        int pooled_height, int pooled_width,
+                        float spatial_scale) {
+  prroi_pool_forward_impl(input, rois, output, pooled_height, pooled_width,
+                          spatial_scale);
+}
+
+void prroi_pool_backward(Tensor grad_output, Tensor rois, Tensor grad_input,
+                         int pooled_height, int pooled_width,
+                         float spatial_scale) {
+  prroi_pool_backward_impl(grad_output, rois, grad_input, pooled_height,
+                           pooled_width, spatial_scale);
+}
+
+void prroi_pool_coor_backward(Tensor output, Tensor grad_output, Tensor input,
+                              Tensor rois, Tensor grad_rois, int pooled_height,
+                              int pooled_width, float spatial_scale) {
+  prroi_pool_coor_backward_impl(output, grad_output, input, rois, grad_rois,
+                                pooled_height, pooled_width, spatial_scale);
+}
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/psamask.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/psamask.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..6064c9ba5fd7ec9bcfef22b3abcc65ef50106d67
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/psamask.cpp
@@ -0,0 +1,41 @@
+// Copyright (c) OpenMMLab. All rights reserved
+// Modified from
+// https://github.com/hszhao/semseg/blob/master/lib/psa/src
+#include "pytorch_cpp_helper.hpp"
+#include "pytorch_device_registry.hpp"
+
+void psamask_forward_impl(const int psa_type, const Tensor input, Tensor output,
+                          const int num_, const int h_feature,
+                          const int w_feature, const int h_mask,
+                          const int w_mask, const int half_h_mask,
+                          const int half_w_mask) {
+  DISPATCH_DEVICE_IMPL(psamask_forward_impl, psa_type, input, output, num_,
+                       h_feature, w_feature, h_mask, w_mask, half_h_mask,
+                       half_w_mask);
+}
+
+void psamask_backward_impl(const int psa_type, const Tensor grad_output,
+                           Tensor grad_input, const int num_,
+                           const int h_feature, const int w_feature,
+                           const int h_mask, const int w_mask,
+                           const int half_h_mask, const int half_w_mask) {
+  DISPATCH_DEVICE_IMPL(psamask_backward_impl, psa_type, grad_output, grad_input,
+                       num_, h_feature, w_feature, h_mask, w_mask, half_h_mask,
+                       half_w_mask);
+}
+
+void psamask_forward(const Tensor input, Tensor output, const int psa_type,
+                     const int num_, const int h_feature, const int w_feature,
+                     const int h_mask, const int w_mask, const int half_h_mask,
+                     const int half_w_mask) {
+  psamask_forward_impl(psa_type, input, output, num_, h_feature, w_feature,
+                       h_mask, w_mask, half_h_mask, half_w_mask);
+}
+
+void psamask_backward(Tensor grad_output, const Tensor grad_input,
+                      const int psa_type, const int num_, const int h_feature,
+                      const int w_feature, const int h_mask, const int w_mask,
+                      const int half_h_mask, const int half_w_mask) {
+  psamask_backward_impl(psa_type, grad_output, grad_input, num_, h_feature,
+                        w_feature, h_mask, w_mask, half_h_mask, half_w_mask);
+}
diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/pybind.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/pybind.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..4efd3fd849f47dc3f0eef4f5247d569a11f1fc68
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/pybind.cpp
@@ -0,0 +1,922 @@
+// Copyright (c) OpenMMLab. All rights reserved
+#include
+
+#include "pytorch_cpp_helper.hpp"
+
+std::string get_compiler_version();
+std::string get_compiling_cuda_version();
+
+void assign_score_withk_forward(const Tensor &points, const Tensor &centers,
+                                const Tensor &scores, const Tensor &knn_idx,
+                                Tensor &output, int B, int N0, int N1, int M,
+                                int K, int O, int aggregate);
+
+void assign_score_withk_backward(const Tensor &grad_out, const Tensor &points,
+                                 const Tensor &centers, const Tensor &scores,
+                                 const Tensor &knn_idx, Tensor &grad_points,
+                                 Tensor &grad_centers, Tensor &grad_scores,
+                                 int B, int N0, int N1, int M, int K, int O,
+                                 int aggregate);
+
+void carafe_naive_forward(Tensor features, Tensor masks, Tensor output,
+                          int kernel_size, int group_size, int scale_factor);
+
+void carafe_naive_backward(Tensor top_grad, Tensor features, Tensor masks,
+                           Tensor bottom_grad, Tensor mask_grad,
+                           int kernel_size, int group_size, int scale_factor);
+
+void carafe_forward(Tensor features, Tensor masks, Tensor rfeatures,
+                    Tensor routput, Tensor rmasks, Tensor output,
+                    int kernel_size, int group_size, int scale_factor);
+
+void carafe_backward(Tensor top_grad, Tensor rfeatures, Tensor masks,
+                     Tensor rtop_grad, Tensor rbottom_grad_hs,
+                     Tensor rbottom_grad, Tensor rmask_grad, Tensor bottom_grad,
+                     Tensor
mask_grad, int kernel_size, int group_size, + int scale_factor); + +void deform_conv_forward(Tensor input, Tensor weight, Tensor offset, + Tensor output, Tensor columns, Tensor ones, int kW, + int kH, int dW, int dH, int padW, int padH, + int dilationW, int dilationH, int group, + int deformable_group, int im2col_step); + +void deform_conv_backward_input(Tensor input, Tensor offset, Tensor gradOutput, + Tensor gradInput, Tensor gradOffset, + Tensor weight, Tensor columns, int kW, int kH, + int dW, int dH, int padW, int padH, + int dilationW, int dilationH, int group, + int deformable_group, int im2col_step); + +void deform_conv_backward_parameters(Tensor input, Tensor offset, + Tensor gradOutput, Tensor gradWeight, + Tensor columns, Tensor ones, int kW, + int kH, int dW, int dH, int padW, int padH, + int dilationW, int dilationH, int group, + int deformable_group, float scale, + int im2col_step); + +void deform_roi_pool_forward(Tensor input, Tensor rois, Tensor offset, + Tensor output, int pooled_height, int pooled_width, + float spatial_scale, int sampling_ratio, + float gamma); + +void deform_roi_pool_backward(Tensor grad_output, Tensor input, Tensor rois, + Tensor offset, Tensor grad_input, + Tensor grad_offset, int pooled_height, + int pooled_width, float spatial_scale, + int sampling_ratio, float gamma); + +void group_points_forward(Tensor points_tensor, Tensor idx_tensor, + Tensor out_tensor, int b, int c, int n, int npoints, + int nsample); + +void group_points_backward(Tensor grad_out_tensor, Tensor idx_tensor, + Tensor grad_points_tensor, int b, int c, int n, + int npoints, int nsample); + +void stack_group_points_forward(Tensor features_tensor, + Tensor features_batch_cnt_tensor, + Tensor idx_tensor, Tensor idx_batch_cnt_tensor, + Tensor out_tensor, int b, int c, int m, + int nsample); + +void stack_group_points_backward(Tensor grad_out_tensor, Tensor idx_tensor, + Tensor idx_batch_cnt_tensor, + Tensor features_batch_cnt_tensor, + Tensor 
grad_features_tensor, int b, int c, + int m, int n, int nsample); + +void roipoint_pool3d_forward(Tensor xyz, Tensor boxes3d, Tensor pts_feature, + Tensor pooled_features, Tensor pooled_empty_flag); + +void gather_points_forward(Tensor points_tensor, Tensor idx_tensor, + Tensor out_tensor, int b, int c, int n, int npoints); + +void gather_points_backward(Tensor grad_out_tensor, Tensor idx_tensor, + Tensor grad_points_tensor, int b, int c, int n, + int npoints); + +void sigmoid_focal_loss_forward(Tensor input, Tensor target, Tensor weight, + Tensor output, float gamma, float alpha); + +void sigmoid_focal_loss_backward(Tensor input, Tensor target, Tensor weight, + Tensor grad_input, float gamma, float alpha); + +void softmax_focal_loss_forward(Tensor input, Tensor target, Tensor weight, + Tensor output, float gamma, float alpha); + +void softmax_focal_loss_backward(Tensor input, Tensor target, Tensor weight, + Tensor buff, Tensor grad_input, float gamma, + float alpha); + +void three_interpolate_forward(Tensor points_tensor, Tensor idx_tensor, + Tensor weight_tensor, Tensor out_tensor, int b, + int c, int m, int n); + +void three_interpolate_backward(Tensor grad_out_tensor, Tensor idx_tensor, + Tensor weight_tensor, Tensor grad_points_tensor, + int b, int c, int n, int m); + +void three_nn_forward(Tensor unknown_tensor, Tensor known_tensor, + Tensor dist2_tensor, Tensor idx_tensor, int b, int n, + int m); + +void bbox_overlaps(const Tensor bboxes1, const Tensor bboxes2, Tensor ious, + const int mode, const bool aligned, const int offset); + +void knn_forward(Tensor xyz_tensor, Tensor new_xyz_tensor, Tensor idx_tensor, + Tensor dist2_tensor, int b, int n, int m, int nsample); + +void iou3d_boxes_overlap_bev_forward(Tensor boxes_a, Tensor boxes_b, + Tensor ans_overlap); + +void iou3d_nms3d_forward(Tensor boxes, Tensor keep, Tensor keep_num, + float nms_overlap_thresh); + +void iou3d_nms3d_normal_forward(Tensor boxes, Tensor keep, Tensor keep_num, + float 
nms_overlap_thresh); + +void furthest_point_sampling_forward(Tensor points_tensor, Tensor temp_tensor, + Tensor idx_tensor, int b, int n, int m); + +void furthest_point_sampling_with_dist_forward(Tensor points_tensor, + Tensor temp_tensor, + Tensor idx_tensor, int b, int n, + int m); + +void masked_im2col_forward(const Tensor im, const Tensor mask_h_idx, + const Tensor mask_w_idx, Tensor col, + const int kernel_h, const int kernel_w, + const int pad_h, const int pad_w); + +void masked_col2im_forward(const Tensor col, const Tensor mask_h_idx, + const Tensor mask_w_idx, Tensor im, int height, + int width, int channels); + +void modulated_deform_conv_forward( + Tensor input, Tensor weight, Tensor bias, Tensor ones, Tensor offset, + Tensor mask, Tensor output, Tensor columns, int kernel_h, int kernel_w, + const int stride_h, const int stride_w, const int pad_h, const int pad_w, + const int dilation_h, const int dilation_w, const int group, + const int deformable_group, const bool with_bias); + +void modulated_deform_conv_backward( + Tensor input, Tensor weight, Tensor bias, Tensor ones, Tensor offset, + Tensor mask, Tensor columns, Tensor grad_input, Tensor grad_weight, + Tensor grad_bias, Tensor grad_offset, Tensor grad_mask, Tensor grad_output, + int kernel_h, int kernel_w, int stride_h, int stride_w, int pad_h, + int pad_w, int dilation_h, int dilation_w, int group, int deformable_group, + const bool with_bias); + +Tensor ms_deform_attn_forward(const Tensor &value, const Tensor &spatial_shapes, + const Tensor &level_start_index, + const Tensor &sampling_loc, + const Tensor &attn_weight, const int im2col_step); + +void ms_deform_attn_backward(const Tensor &value, const Tensor &spatial_shapes, + const Tensor &level_start_index, + const Tensor &sampling_loc, + const Tensor &attn_weight, + const Tensor &grad_output, Tensor &grad_value, + Tensor &grad_sampling_loc, + Tensor &grad_attn_weight, const int im2col_step); + +Tensor nms(Tensor boxes, Tensor scores, float 
iou_threshold, int offset); + +Tensor softnms(Tensor boxes, Tensor scores, Tensor dets, float iou_threshold, + float sigma, float min_score, int method, int offset); + +std::vector> nms_match(Tensor dets, float iou_threshold); + +std::vector> pixel_group( + Tensor score, Tensor mask, Tensor embedding, Tensor kernel_label, + Tensor kernel_contour, int kernel_region_num, float distance_threshold); + +std::vector> contour_expand(Tensor kernel_mask, + Tensor internal_kernel_label, + int min_kernel_area, + int kernel_num); + +void roi_align_forward(Tensor input, Tensor rois, Tensor output, + Tensor argmax_y, Tensor argmax_x, int aligned_height, + int aligned_width, float spatial_scale, + int sampling_ratio, int pool_mode, bool aligned); + +void roi_align_backward(Tensor grad_output, Tensor rois, Tensor argmax_y, + Tensor argmax_x, Tensor grad_input, int aligned_height, + int aligned_width, float spatial_scale, + int sampling_ratio, int pool_mode, bool aligned); + +void roi_pool_forward(Tensor input, Tensor rois, Tensor output, Tensor argmax, + int pooled_height, int pooled_width, float spatial_scale); + +void roi_pool_backward(Tensor grad_output, Tensor rois, Tensor argmax, + Tensor grad_input, int pooled_height, int pooled_width, + float spatial_scale); + +void sync_bn_forward_mean(const Tensor input, Tensor mean); + +void sync_bn_forward_var(const Tensor input, const Tensor mean, Tensor var); + +void sync_bn_forward_output(const Tensor input, const Tensor mean, + const Tensor var, const Tensor weight, + const Tensor bias, Tensor running_mean, + Tensor running_var, Tensor norm, Tensor std, + Tensor output, float eps, float momentum, + int group_size); + +void sync_bn_backward_param(const Tensor grad_output, const Tensor norm, + Tensor grad_weight, Tensor grad_bias); + +void sync_bn_backward_data(const Tensor grad_output, const Tensor weight, + const Tensor grad_weight, const Tensor grad_bias, + const Tensor norm, const Tensor std, + Tensor grad_input); + +void 
psamask_forward(const Tensor input, Tensor output, const int psa_type, + const int num_, const int h_feature, const int w_feature, + const int h_mask, const int w_mask, const int half_h_mask, + const int half_w_mask); + +void psamask_backward(Tensor grad_output, const Tensor grad_input, + const int psa_type, const int num_, const int h_feature, + const int w_feature, const int h_mask, const int w_mask, + const int half_h_mask, const int half_w_mask); + +void tin_shift_forward(Tensor input, Tensor shift, Tensor output); + +void tin_shift_backward(Tensor grad_output, Tensor shift, Tensor grad_input); + +void ball_query_forward(Tensor new_xyz_tensor, Tensor xyz_tensor, + Tensor idx_tensor, int b, int n, int m, + float min_radius, float max_radius, int nsample); + +void stack_ball_query_forward(Tensor new_xyz_tensor, Tensor new_xyz_batch_cnt, + Tensor xyz_tensor, Tensor xyz_batch_cnt, + Tensor idx_tensor, float max_radius, int nsample); + +void prroi_pool_forward(Tensor input, Tensor rois, Tensor output, + int pooled_height, int pooled_width, + float spatial_scale); + +void prroi_pool_backward(Tensor grad_output, Tensor rois, Tensor grad_input, + int pooled_height, int pooled_width, + float spatial_scale); + +void prroi_pool_coor_backward(Tensor output, Tensor grad_output, Tensor input, + Tensor rois, Tensor grad_rois, int pooled_height, + int pooled_width, float spatial_scale); + +// template +// std::vector get_indice_pairs_forward( +// torch::Tensor indices, int64_t batchSize, +// std::vector outSpatialShape, std::vector spatialShape, +// std::vector kernelSize, std::vector stride, +// std::vector padding, std::vector dilation, +// std::vector outPadding, int64_t _subM, int64_t _transpose); + +// template +// std::vector get_indice_pairs_backward( +// Tensor indices, Tensor gridOut, int64_t batchSize, +// std::vector outSpatialShape, std::vector spatialShape, +// std::vector kernelSize, std::vector stride, +// std::vector padding, std::vector dilation, +// 
std::vector outPadding, int64_t _subM, int64_t _transpose); + +// Tensor indice_conv_forward(Tensor features, Tensor filters, Tensor indicePairs, +// Tensor indiceNum, int64_t numActOut, +// int64_t _inverse, int64_t _subM); + +// std::vector indice_conv_backward(Tensor features, Tensor filters, +// Tensor outGrad, Tensor indicePairs, +// Tensor indiceNum, int64_t _inverse, +// int64_t _subM); + +// Tensor fused_indice_conv_batchnorm_forward(Tensor features, Tensor filters, +// Tensor bias, Tensor indicePairs, +// Tensor indiceNum, int64_t numActOut, +// int64_t _inverse, int64_t _subM); + +// Tensor indice_maxpool_forward(Tensor features, Tensor indicePairs, +// Tensor indiceNum, int64_t numAct); + +// Tensor indice_maxpool_backward(Tensor features, Tensor outFeatures, +// Tensor outGrad, Tensor indicePairs, +// Tensor indiceNum); + +void box_iou_rotated(const Tensor boxes1, const Tensor boxes2, Tensor ious, + const int mode_flag, const bool aligned); + +Tensor nms_rotated(const Tensor dets, const Tensor scores, const Tensor order, + const Tensor dets_sorted, const float iou_threshold, + const int multi_label); + +Tensor upfirdn2d(const Tensor &input, const Tensor &kernel, int up_x, int up_y, + int down_x, int down_y, int pad_x0, int pad_x1, int pad_y0, + int pad_y1); + +Tensor fused_bias_leakyrelu(const Tensor &input, const Tensor &bias, + const Tensor &refer, int act, int grad, float alpha, + float scale); + +void roi_align_rotated_forward(Tensor input, Tensor rois, Tensor output, + int pooled_height, int pooled_width, + float spatial_scale, int sampling_ratio, + bool aligned, bool clockwise); + +void roi_align_rotated_backward(Tensor grad_output, Tensor rois, + Tensor grad_input, int pooled_height, + int pooled_width, float spatial_scale, + int sampling_ratio, bool aligned, + bool clockwise); + +std::vector dynamic_point_to_voxel_forward( + const torch::Tensor &feats, const torch::Tensor &coors, + const std::string &reduce_type); + +void 
dynamic_point_to_voxel_backward(torch::Tensor &grad_feats, + const torch::Tensor &grad_reduced_feats, + const torch::Tensor &feats, + const torch::Tensor &reduced_feats, + const torch::Tensor &coors_idx, + const torch::Tensor &reduce_count, + const std::string &reduce_type); + +void hard_voxelize_forward(const at::Tensor &points, + const at::Tensor &voxel_size, + const at::Tensor &coors_range, at::Tensor &voxels, + at::Tensor &coors, at::Tensor &num_points_per_voxel, + at::Tensor &voxel_num, const int max_points, + const int max_voxels, const int NDim, + const bool deterministic); + +void dynamic_voxelize_forward(const at::Tensor &points, + const at::Tensor &voxel_size, + const at::Tensor &coors_range, at::Tensor &coors, + const int NDim); + +void border_align_forward(const Tensor &input, const Tensor &boxes, + Tensor output, Tensor argmax_idx, + const int pool_size); + +void border_align_backward(const Tensor &grad_output, const Tensor &boxes, + const Tensor &argmax_idx, Tensor grad_input, + const int pool_size); + +void points_in_boxes_cpu_forward(Tensor boxes_tensor, Tensor pts_tensor, + Tensor pts_indices_tensor); + +void points_in_boxes_part_forward(Tensor boxes_tensor, Tensor pts_tensor, + Tensor box_idx_of_points_tensor); + +void points_in_boxes_all_forward(Tensor boxes_tensor, Tensor pts_tensor, + Tensor box_idx_of_points_tensor); + +void roiaware_pool3d_forward(Tensor rois, Tensor pts, Tensor pts_feature, + Tensor argmax, Tensor pts_idx_of_voxels, + Tensor pooled_features, int pool_method); + +void roiaware_pool3d_backward(Tensor pts_idx_of_voxels, Tensor argmax, + Tensor grad_out, Tensor grad_in, int pool_method); + +void correlation_forward(Tensor input1, Tensor input2, Tensor output, int kH, + int kW, int patchH, int patchW, int padH, int padW, + int dilationH, int dilationW, int dilation_patchH, + int dilation_patchW, int dH, int dW); + +void correlation_backward(Tensor grad_output, Tensor input1, Tensor input2, + Tensor grad_input1, Tensor 
grad_input2, int kH, + int kW, int patchH, int patchW, int padH, int padW, + int dilationH, int dilationW, int dilation_patchH, + int dilation_patchW, int dH, int dW); + +void rotated_feature_align_forward(const Tensor features, + const Tensor best_bboxes, Tensor output, + const float spatial_scale, const int points); + +void rotated_feature_align_backward(const Tensor top_grad, + const Tensor best_bboxes, + Tensor bottom_grad, + const float spatial_scale, + const int points); + +void riroi_align_rotated_forward(Tensor features, Tensor rois, Tensor output, + int pooled_height, int pooled_width, + float spatial_scale, int num_samples, + int num_orientations, bool clockwise); + +void riroi_align_rotated_backward(Tensor top_grad, Tensor rois, + Tensor bottom_grad, int pooled_height, + int pooled_width, float spatial_scale, + int num_samples, int num_orientations, + bool clockwise); + +void points_in_polygons_forward(Tensor points, Tensor polygons, Tensor output); + +void min_area_polygons(const Tensor pointsets, Tensor polygons); + +void active_rotated_filter_forward(const Tensor input, const Tensor indices, + Tensor output); + +void active_rotated_filter_backward(const Tensor grad_out, const Tensor indices, + Tensor grad_in); + +void convex_iou(const Tensor pointsets, const Tensor polygons, Tensor ious); + +void convex_giou(const Tensor pointsets, const Tensor polygons, Tensor output); + +at::Tensor diff_iou_rotated_sort_vertices_forward(at::Tensor vertices, + at::Tensor mask, + at::Tensor num_valid); + +void chamfer_distance_forward(const Tensor xyz1, const Tensor xyz2, + const Tensor dist1, const Tensor dist2, + const Tensor idx1, const Tensor idx); + +void chamfer_distance_backward(const Tensor xyz1, const Tensor xyz2, + Tensor idx1, Tensor idx2, Tensor graddist1, + Tensor graddist2, Tensor gradxyz1, + Tensor gradxyz2); + +void box_iou_quadri(const Tensor boxes1, const Tensor boxes2, Tensor ious, + const int mode_flag, const bool aligned); + +Tensor 
nms_quadri(const Tensor dets, const Tensor scores, const Tensor order, + const Tensor dets_sorted, const float iou_threshold, + const int multi_label); + +void bezier_align_forward(Tensor input, Tensor rois, Tensor output, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + bool aligned); + +void bezier_align_backward(Tensor grad_output, Tensor rois, Tensor grad_input, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + bool aligned); + +PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { + m.def("upfirdn2d", &upfirdn2d, "upfirdn2d (CUDA)", py::arg("input"), + py::arg("kernel"), py::arg("up_x"), py::arg("up_y"), py::arg("down_x"), + py::arg("down_y"), py::arg("pad_x0"), py::arg("pad_x1"), + py::arg("pad_y0"), py::arg("pad_y1")); + m.def("fused_bias_leakyrelu", &fused_bias_leakyrelu, + "fused_bias_leakyrelu (CUDA)", py::arg("input"), py::arg("bias"), + py::arg("empty"), py::arg("act"), py::arg("grad"), py::arg("alpha"), + py::arg("scale")); + m.def("gather_points_forward", &gather_points_forward, + "gather_points_forward", py::arg("points_tensor"), + py::arg("idx_tensor"), py::arg("out_tensor"), py::arg("b"), + py::arg("c"), py::arg("n"), py::arg("npoints")); + m.def("gather_points_backward", &gather_points_backward, + "gather_points_backward", py::arg("grad_out_tensor"), + py::arg("idx_tensor"), py::arg("grad_points_tensor"), py::arg("b"), + py::arg("c"), py::arg("n"), py::arg("npoints")); + m.def("get_compiler_version", &get_compiler_version, "get_compiler_version"); + m.def("get_compiling_cuda_version", &get_compiling_cuda_version, + "get_compiling_cuda_version"); + m.def("assign_score_withk_forward", &assign_score_withk_forward, + "assign_score_withk_forward", py::arg("points"), py::arg("centers"), + py::arg("scores"), py::arg("knn_idx"), py::arg("output"), py::arg("B"), + py::arg("N0"), py::arg("N1"), py::arg("M"), py::arg("K"), py::arg("O"), + py::arg("aggregate")); + 
// ---- pybind11 op registrations (body of PYBIND11_MODULE) ----
// Each m.def() exposes one C++/CUDA op declared earlier in this file to
// Python.  The second string is the docstring shown by help(); the py::arg()
// entries name the keyword arguments in the same order as the C++ parameters.
// NOTE(review): keyword names are assumed to match the C++ parameter order of
// each declared op — confirm against the declarations when changing either.

// Scored k-NN feature aggregation (assign_score_withk), backward pass.
m.def("assign_score_withk_backward", &assign_score_withk_backward,
      "assign_score_withk_backward", py::arg("grad_out"), py::arg("points"),
      py::arg("centers"), py::arg("scores"), py::arg("knn_idx"),
      py::arg("grad_points"), py::arg("grad_centers"), py::arg("grad_scores"),
      py::arg("B"), py::arg("N0"), py::arg("N1"), py::arg("M"), py::arg("K"),
      py::arg("O"), py::arg("aggregate"));
// k-nearest-neighbour search between two point sets (tensor args first).
m.def("knn_forward", &knn_forward, "knn_forward", py::arg("xyz_tensor"),
      py::arg("new_xyz_tensor"), py::arg("idx_tensor"),
      py::arg("dist2_tensor"), py::arg("b"), py::arg("n"), py::arg("m"),
      py::arg("nsample"));
// CARAFE content-aware upsampling: naive variant, forward and backward.
m.def("carafe_naive_forward", &carafe_naive_forward, "carafe_naive_forward",
      py::arg("features"), py::arg("masks"), py::arg("output"),
      py::arg("kernel_size"), py::arg("group_size"), py::arg("scale_factor"));
m.def("carafe_naive_backward", &carafe_naive_backward,
      "carafe_naive_backward", py::arg("top_grad"), py::arg("features"),
      py::arg("masks"), py::arg("bottom_grad"), py::arg("mask_grad"),
      py::arg("kernel_size"), py::arg("group_size"), py::arg("scale_factor"));
// CARAFE optimized variant: uses extra rearranged buffers (r* tensors).
m.def("carafe_forward", &carafe_forward, "carafe_forward",
      py::arg("features"), py::arg("masks"), py::arg("rfeatures"),
      py::arg("routput"), py::arg("rmasks"), py::arg("output"),
      py::arg("kernel_size"), py::arg("group_size"), py::arg("scale_factor"));
m.def("carafe_backward", &carafe_backward, "carafe_backward",
      py::arg("top_grad"), py::arg("rfeatures"), py::arg("masks"),
      py::arg("rtop_grad"), py::arg("rbottom_grad_hs"),
      py::arg("rbottom_grad"), py::arg("rmask_grad"), py::arg("bottom_grad"),
      py::arg("mask_grad"), py::arg("kernel_size"), py::arg("group_size"),
      py::arg("scale_factor"));
// Deformable convolution v1: forward plus separate input/parameter backward.
m.def("deform_conv_forward", &deform_conv_forward, "deform_conv_forward",
      py::arg("input"), py::arg("weight"), py::arg("offset"),
      py::arg("output"), py::arg("columns"), py::arg("ones"), py::arg("kW"),
      py::arg("kH"), py::arg("dW"), py::arg("dH"), py::arg("padW"),
      py::arg("padH"), py::arg("dilationW"), py::arg("dilationH"),
      py::arg("group"), py::arg("deformable_group"), py::arg("im2col_step"));
m.def("deform_conv_backward_input", &deform_conv_backward_input,
      "deform_conv_backward_input", py::arg("input"), py::arg("offset"),
      py::arg("gradOutput"), py::arg("gradInput"), py::arg("gradOffset"),
      py::arg("weight"), py::arg("columns"), py::arg("kW"), py::arg("kH"),
      py::arg("dW"), py::arg("dH"), py::arg("padW"), py::arg("padH"),
      py::arg("dilationW"), py::arg("dilationH"), py::arg("group"),
      py::arg("deformable_group"), py::arg("im2col_step"));
m.def("deform_conv_backward_parameters", &deform_conv_backward_parameters,
      "deform_conv_backward_parameters", py::arg("input"), py::arg("offset"),
      py::arg("gradOutput"), py::arg("gradWeight"), py::arg("columns"),
      py::arg("ones"), py::arg("kW"), py::arg("kH"), py::arg("dW"),
      py::arg("dH"), py::arg("padW"), py::arg("padH"), py::arg("dilationW"),
      py::arg("dilationH"), py::arg("group"), py::arg("deformable_group"),
      py::arg("scale"), py::arg("im2col_step"));
// Deformable RoI pooling.
m.def("deform_roi_pool_forward", &deform_roi_pool_forward,
      "deform roi pool forward", py::arg("input"), py::arg("rois"),
      py::arg("offset"), py::arg("output"), py::arg("pooled_height"),
      py::arg("pooled_width"), py::arg("spatial_scale"),
      py::arg("sampling_ratio"), py::arg("gamma"));
m.def("deform_roi_pool_backward", &deform_roi_pool_backward,
      "deform roi pool backward", py::arg("grad_output"), py::arg("input"),
      py::arg("rois"), py::arg("offset"), py::arg("grad_input"),
      py::arg("grad_offset"), py::arg("pooled_height"),
      py::arg("pooled_width"), py::arg("spatial_scale"),
      py::arg("sampling_ratio"), py::arg("gamma"));
// 3D point pooling inside RoI boxes.
m.def("roipoint_pool3d_forward", &roipoint_pool3d_forward,
      "roipoint_pool3d_forward", py::arg("xyz"), py::arg("boxes3d"),
      py::arg("pts_feature"), py::arg("pooled_features"),
      py::arg("pooled_empty_flag"));
// Focal loss (sigmoid and softmax formulations), forward and backward.
m.def("sigmoid_focal_loss_forward", &sigmoid_focal_loss_forward,
      "sigmoid_focal_loss_forward ", py::arg("input"), py::arg("target"),
      py::arg("weight"), py::arg("output"), py::arg("gamma"),
      py::arg("alpha"));
m.def("sigmoid_focal_loss_backward", &sigmoid_focal_loss_backward,
      "sigmoid_focal_loss_backward", py::arg("input"), py::arg("target"),
      py::arg("weight"), py::arg("grad_input"), py::arg("gamma"),
      py::arg("alpha"));
m.def("softmax_focal_loss_forward", &softmax_focal_loss_forward,
      "softmax_focal_loss_forward", py::arg("input"), py::arg("target"),
      py::arg("weight"), py::arg("output"), py::arg("gamma"),
      py::arg("alpha"));
m.def("softmax_focal_loss_backward", &softmax_focal_loss_backward,
      "softmax_focal_loss_backward", py::arg("input"), py::arg("target"),
      py::arg("weight"), py::arg("buff"), py::arg("grad_input"),
      py::arg("gamma"), py::arg("alpha"));
// Three-nearest-neighbour feature interpolation for point clouds.
m.def("three_interpolate_forward", &three_interpolate_forward,
      "three_interpolate_forward", py::arg("points_tensor"),
      py::arg("idx_tensor"), py::arg("weight_tensor"), py::arg("out_tensor"),
      py::arg("b"), py::arg("c"), py::arg("m"), py::arg("n"));
m.def("three_interpolate_backward", &three_interpolate_backward,
      "three_interpolate_backward", py::arg("grad_out_tensor"),
      py::arg("idx_tensor"), py::arg("weight_tensor"),
      py::arg("grad_points_tensor"), py::arg("b"), py::arg("c"), py::arg("n"),
      py::arg("m"));
m.def("three_nn_forward", &three_nn_forward, "three_nn_forward",
      py::arg("unknown_tensor"), py::arg("known_tensor"),
      py::arg("dist2_tensor"), py::arg("idx_tensor"), py::arg("b"),
      py::arg("n"), py::arg("m"));
// Pairwise/aligned IoU between 2D bounding boxes.
m.def("bbox_overlaps", &bbox_overlaps, "bbox_overlaps", py::arg("bboxes1"),
      py::arg("bboxes2"), py::arg("ious"), py::arg("mode"),
      py::arg("aligned"), py::arg("offset"));
// Gather point features by group indices.
m.def("group_points_forward", &group_points_forward, "group_points_forward",
      py::arg("points_tensor"), py::arg("idx_tensor"), py::arg("out_tensor"),
      py::arg("b"), py::arg("c"), py::arg("n"), py::arg("npoints"),
      py::arg("nsample"));
// ---- pybind11 op registrations (body of PYBIND11_MODULE), continued ----
// Fixes in this section:
//  * Removed a duplicated m.def("knn_forward", ...) that re-registered the
//    same &knn_forward already bound earlier in this module.  The duplicate
//    used legacy keyword names (b, n, m, nsample first) that no longer match
//    the C++ parameter order (tensors first), so keyword calls through it
//    could never bind correctly; it only shadowed the valid overload.
//  * Fixed the user-visible docstring typo "contour exapnd" -> "contour
//    expand".

m.def("group_points_backward", &group_points_backward,
      "group_points_backward", py::arg("grad_out_tensor"),
      py::arg("idx_tensor"), py::arg("grad_points_tensor"), py::arg("b"),
      py::arg("c"), py::arg("n"), py::arg("npoints"), py::arg("nsample"));
// Stacked-batch variants of group_points (per-sample counts instead of a
// fixed batch dimension).
m.def("stack_group_points_forward", &stack_group_points_forward,
      "stack_group_points_forward", py::arg("features_tensor"),
      py::arg("features_batch_cnt_tensor"), py::arg("idx_tensor"),
      py::arg("idx_batch_cnt_tensor"), py::arg("out_tensor"), py::arg("b"),
      py::arg("c"), py::arg("m"), py::arg("nsample"));
m.def("stack_group_points_backward", &stack_group_points_backward,
      "stack_group_points_backward", py::arg("grad_out_tensor"),
      py::arg("idx_tensor"), py::arg("idx_batch_cnt_tensor"),
      py::arg("features_batch_cnt_tensor"), py::arg("grad_features_tensor"),
      py::arg("b"), py::arg("c"), py::arg("m"), py::arg("n"),
      py::arg("nsample"));
// NOTE: a second, stale m.def("knn_forward", ...) registration was removed
// here; see the tensor-first registration earlier in this module.
// 3D IoU / NMS ops for bird's-eye-view and full 3D boxes.
m.def("iou3d_boxes_overlap_bev_forward", &iou3d_boxes_overlap_bev_forward,
      "iou3d_boxes_overlap_bev_forward", py::arg("boxes_a"),
      py::arg("boxes_b"), py::arg("ans_iou"));
m.def("iou3d_nms3d_forward", &iou3d_nms3d_forward, "iou3d_nms3d_forward",
      py::arg("boxes"), py::arg("keep"), py::arg("num_out"),
      py::arg("nms_overlap_thresh"));
m.def("iou3d_nms3d_normal_forward", &iou3d_nms3d_normal_forward,
      "iou3d_nms3d_normal_forward", py::arg("boxes"), py::arg("keep"),
      py::arg("num_out"), py::arg("nms_overlap_thresh"));
// Furthest-point sampling (by coordinates, or by a precomputed distance
// matrix).
m.def("furthest_point_sampling_forward", &furthest_point_sampling_forward,
      "furthest_point_sampling_forward", py::arg("points_tensor"),
      py::arg("temp_tensor"), py::arg("idx_tensor"), py::arg("b"),
      py::arg("n"), py::arg("m"));
m.def("furthest_point_sampling_with_dist_forward",
      &furthest_point_sampling_with_dist_forward,
      "furthest_point_sampling_with_dist_forward", py::arg("points_tensor"),
      py::arg("temp_tensor"), py::arg("idx_tensor"), py::arg("b"),
      py::arg("n"), py::arg("m"));
// Masked im2col / col2im used by masked convolution.
m.def("masked_im2col_forward", &masked_im2col_forward,
      "masked_im2col_forward", py::arg("im"), py::arg("mask_h_idx"),
      py::arg("mask_w_idx"), py::arg("col"), py::arg("kernel_h"),
      py::arg("kernel_w"), py::arg("pad_h"), py::arg("pad_w"));
m.def("masked_col2im_forward", &masked_col2im_forward,
      "masked_col2im_forward", py::arg("col"), py::arg("mask_h_idx"),
      py::arg("mask_w_idx"), py::arg("im"), py::arg("height"),
      py::arg("width"), py::arg("channels"));
// Modulated (v2) deformable convolution.
m.def("modulated_deform_conv_forward", &modulated_deform_conv_forward,
      "modulated deform conv forward", py::arg("input"), py::arg("weight"),
      py::arg("bias"), py::arg("ones"), py::arg("offset"), py::arg("mask"),
      py::arg("output"), py::arg("columns"), py::arg("kernel_h"),
      py::arg("kernel_w"), py::arg("stride_h"), py::arg("stride_w"),
      py::arg("pad_h"), py::arg("pad_w"), py::arg("dilation_h"),
      py::arg("dilation_w"), py::arg("group"), py::arg("deformable_group"),
      py::arg("with_bias"));
m.def("modulated_deform_conv_backward", &modulated_deform_conv_backward,
      "modulated deform conv backward", py::arg("input"), py::arg("weight"),
      py::arg("bias"), py::arg("ones"), py::arg("offset"), py::arg("mask"),
      py::arg("columns"), py::arg("grad_input"), py::arg("grad_weight"),
      py::arg("grad_bias"), py::arg("grad_offset"), py::arg("grad_mask"),
      py::arg("grad_output"), py::arg("kernel_h"), py::arg("kernel_w"),
      py::arg("stride_h"), py::arg("stride_w"), py::arg("pad_h"),
      py::arg("pad_w"), py::arg("dilation_h"), py::arg("dilation_w"),
      py::arg("group"), py::arg("deformable_group"), py::arg("with_bias"));
// Classic / soft / matched non-maximum suppression.
m.def("nms", &nms, "nms (CPU/CUDA) ", py::arg("boxes"), py::arg("scores"),
      py::arg("iou_threshold"), py::arg("offset"));
m.def("softnms", &softnms, "softnms (CPU) ", py::arg("boxes"),
      py::arg("scores"), py::arg("dets"), py::arg("iou_threshold"),
      py::arg("sigma"), py::arg("min_score"), py::arg("method"),
      py::arg("offset"));
m.def("nms_match", &nms_match, "nms_match (CPU) ", py::arg("dets"),
      py::arg("iou_threshold"));
// Text-detection post-processing ops.
m.def("pixel_group", &pixel_group, "pixel group (CPU) ", py::arg("score"),
      py::arg("mask"), py::arg("embedding"), py::arg("kernel_label"),
      py::arg("kernel_contour"), py::arg("kernel_region_label"),
      py::arg("distance_threshold"));
m.def("contour_expand", &contour_expand, "contour expand (CPU) ",
      py::arg("kernel_mask"), py::arg("internal_kernel_label"),
      py::arg("min_kernel_area"), py::arg("kernel_num"));
// RoI align / RoI pool.
m.def("roi_align_forward", &roi_align_forward, "roi_align forward",
      py::arg("input"), py::arg("rois"), py::arg("output"),
      py::arg("argmax_y"), py::arg("argmax_x"), py::arg("aligned_height"),
      py::arg("aligned_width"), py::arg("spatial_scale"),
      py::arg("sampling_ratio"), py::arg("pool_mode"), py::arg("aligned"));
m.def("roi_align_backward", &roi_align_backward, "roi_align backward",
      py::arg("grad_output"), py::arg("rois"), py::arg("argmax_y"),
      py::arg("argmax_x"), py::arg("grad_input"), py::arg("aligned_height"),
      py::arg("aligned_width"), py::arg("spatial_scale"),
      py::arg("sampling_ratio"), py::arg("pool_mode"), py::arg("aligned"));
m.def("roi_pool_forward", &roi_pool_forward, "roi_pool forward",
      py::arg("input"), py::arg("rois"), py::arg("output"), py::arg("argmax"),
      py::arg("pooled_height"), py::arg("pooled_width"),
      py::arg("spatial_scale"));
m.def("roi_pool_backward", &roi_pool_backward, "roi_pool backward",
      py::arg("grad_output"), py::arg("rois"), py::arg("argmax"),
      py::arg("grad_input"), py::arg("pooled_height"),
      py::arg("pooled_width"), py::arg("spatial_scale"));
// Synchronized batch-norm statistics.
m.def("sync_bn_forward_mean", &sync_bn_forward_mean, "sync_bn forward_mean",
      py::arg("input"), py::arg("mean"));
m.def("sync_bn_forward_var", &sync_bn_forward_var, "sync_bn forward_var",
      py::arg("input"), py::arg("mean"), py::arg("var"));
m.def("sync_bn_forward_output", &sync_bn_forward_output, + "sync_bn forward_output", py::arg("input"), py::arg("mean"), + py::arg("var"), py::arg("weight"), py::arg("bias"), + py::arg("running_mean"), py::arg("running_var"), py::arg("norm"), + py::arg("std"), py::arg("output"), py::arg("eps"), py::arg("momentum"), + py::arg("group_size")); + m.def("sync_bn_backward_param", &sync_bn_backward_param, + "sync_bn backward_param", py::arg("grad_output"), py::arg("norm"), + py::arg("grad_weight"), py::arg("grad_bias")); + m.def("sync_bn_backward_data", &sync_bn_backward_data, + "sync_bn backward_data", py::arg("grad_output"), py::arg("weight"), + py::arg("grad_weight"), py::arg("grad_bias"), py::arg("norm"), + py::arg("std"), py::arg("grad_input")); +// m.def("get_indice_pairs_2d_forward", &get_indice_pairs_forward<2>, +// "get_indice_pairs_2d_forward", py::arg("indices"), py::arg("batchSize"), +// py::arg("outSpatialShape"), py::arg("spatialShape"), +// py::arg("kernelSize"), py::arg("stride"), py::arg("padding"), +// py::arg("dilation"), py::arg("outPadding"), py::arg("_subM"), +// py::arg("_transpose")); +// m.def("get_indice_pairs_3d_forward", &get_indice_pairs_forward<3>, +// "get_indice_pairs_3d_forward", py::arg("indices"), py::arg("batchSize"), +// py::arg("outSpatialShape"), py::arg("spatialShape"), +// py::arg("kernelSize"), py::arg("stride"), py::arg("padding"), +// py::arg("dilation"), py::arg("outPadding"), py::arg("_subM"), +// py::arg("_transpose")); +// m.def("get_indice_pairs_4d_forward", &get_indice_pairs_forward<4>, +// "get_indice_pairs_4d_forward", py::arg("indices"), py::arg("batchSize"), +// py::arg("outSpatialShape"), py::arg("spatialShape"), +// py::arg("kernelSize"), py::arg("stride"), py::arg("padding"), +// py::arg("dilation"), py::arg("outPadding"), py::arg("_subM"), +// py::arg("_transpose")); +// m.def("get_indice_pairs_2d_backward", &get_indice_pairs_backward<2>, +// "get_indice_pairs_2d_backward", py::arg("indices"), py::arg("gridOut"), 
+// py::arg("batchSize"), py::arg("outSpatialShape"), +// py::arg("spatialShape"), py::arg("kernelSize"), py::arg("stride"), +// py::arg("padding"), py::arg("dilation"), py::arg("outPadding"), +// py::arg("_subM"), py::arg("_transpose")); +// m.def("get_indice_pairs_3d_backward", &get_indice_pairs_backward<3>, +// "get_indice_pairs_3d_backward", py::arg("indices"), py::arg("gridOut"), +// py::arg("batchSize"), py::arg("outSpatialShape"), +// py::arg("spatialShape"), py::arg("kernelSize"), py::arg("stride"), +// py::arg("padding"), py::arg("dilation"), py::arg("outPadding"), +// py::arg("_subM"), py::arg("_transpose")); +// m.def("indice_conv_forward", &indice_conv_forward, "indice_conv_forward", +// py::arg("features"), py::arg("filters"), py::arg("indicePairs"), +// py::arg("indiceNum"), py::arg("numActOut"), py::arg("_inverse"), +// py::arg("_subM")); +// m.def("indice_conv_backward", &indice_conv_backward, "indice_conv_backward", +// py::arg("features"), py::arg("filters"), py::arg("outGrad"), +// py::arg("indicePairs"), py::arg("indiceNum"), py::arg("_inverse"), +// py::arg("_subM")); +// m.def("fused_indice_conv_forward", &fused_indice_conv_batchnorm_forward, +// "fused_indice_conv_forward", py::arg("features"), py::arg("filters"), +// py::arg("bias"), py::arg("indicePairs"), py::arg("indiceNum"), +// py::arg("numActOut"), py::arg("_inverse"), py::arg("_subM")); +// m.def("indice_maxpool_forward", &indice_maxpool_forward, +// "indice_maxpool_forward", py::arg("features"), py::arg("indicePairs"), +// py::arg("indiceNum"), py::arg("numAct")); +// m.def("indice_maxpool_backward", &indice_maxpool_backward, +// "indice_maxpool_backward", py::arg("features"), py::arg("outFeatures"), +// py::arg("outGrad"), py::arg("indicePairs"), py::arg("indiceNum")); + m.def("psamask_forward", &psamask_forward, "PSAMASK forward (CPU/CUDA)", + py::arg("input"), py::arg("output"), py::arg("psa_type"), + py::arg("num_"), py::arg("h_feature"), py::arg("w_feature"), + 
py::arg("h_mask"), py::arg("w_mask"), py::arg("half_h_mask"), + py::arg("half_w_mask")); + m.def("psamask_backward", &psamask_backward, "PSAMASK backward (CPU/CUDA)", + py::arg("grad_output"), py::arg("grad_input"), py::arg("psa_type"), + py::arg("num_"), py::arg("h_feature"), py::arg("w_feature"), + py::arg("h_mask"), py::arg("w_mask"), py::arg("half_h_mask"), + py::arg("half_w_mask")); + m.def("tin_shift_forward", &tin_shift_forward, "tin_shift forward", + py::arg("input"), py::arg("shift"), py::arg("output")); + m.def("tin_shift_backward", &tin_shift_backward, "tin_shift backward", + py::arg("grad_output"), py::arg("shift"), py::arg("grad_input")); + m.def("box_iou_rotated", &box_iou_rotated, "IoU for rotated boxes", + py::arg("boxes1"), py::arg("boxes2"), py::arg("ious"), + py::arg("mode_flag"), py::arg("aligned")); + m.def("nms_rotated", &nms_rotated, "NMS for rotated boxes", py::arg("dets"), + py::arg("scores"), py::arg("order"), py::arg("dets_sorted"), + py::arg("iou_threshold"), py::arg("multi_label")); + m.def("ball_query_forward", &ball_query_forward, "ball_query_forward", + py::arg("new_xyz_tensor"), py::arg("xyz_tensor"), py::arg("idx_tensor"), + py::arg("b"), py::arg("n"), py::arg("m"), py::arg("min_radius"), + py::arg("max_radius"), py::arg("nsample")); + m.def("stack_ball_query_forward", &stack_ball_query_forward, + "stack_ball_query_forward", py::arg("new_xyz_tensor"), + py::arg("new_xyz_batch_cnt"), py::arg("xyz_tensor"), + py::arg("xyz_batch_cnt"), py::arg("idx_tensor"), py::arg("max_radius"), + py::arg("nsample")); + m.def("roi_align_rotated_forward", &roi_align_rotated_forward, + "roi_align_rotated forward", py::arg("input"), py::arg("rois"), + py::arg("output"), py::arg("pooled_height"), py::arg("pooled_width"), + py::arg("spatial_scale"), py::arg("sampling_ratio"), py::arg("aligned"), + py::arg("clockwise")); + m.def("roi_align_rotated_backward", &roi_align_rotated_backward, + "roi_align_rotated backward", py::arg("rois"), 
py::arg("grad_input"), + py::arg("grad_output"), py::arg("pooled_height"), + py::arg("pooled_width"), py::arg("spatial_scale"), + py::arg("sampling_ratio"), py::arg("aligned"), py::arg("clockwise")); + m.def("dynamic_point_to_voxel_forward", &dynamic_point_to_voxel_forward, + "dynamic_point_to_voxel_forward", py::arg("feats"), py::arg("coors"), + py::arg("reduce_type")); + m.def("dynamic_point_to_voxel_backward", &dynamic_point_to_voxel_backward, + "dynamic_point_to_voxel_backward", py::arg("grad_feats"), + py::arg("grad_reduced_feats"), py::arg("feats"), + py::arg("reduced_feats"), py::arg("coors_idx"), py::arg("reduce_count"), + py::arg("reduce_type")); + m.def("hard_voxelize_forward", &hard_voxelize_forward, + "hard_voxelize_forward", py::arg("points"), py::arg("voxel_size"), + py::arg("coors_range"), py::arg("voxels"), py::arg("coors"), + py::arg("num_points_per_voxel"), py::arg("voxel_num"), + py::arg("max_points"), py::arg("max_voxels"), py::arg("NDim"), + py::arg("deterministic")); + m.def("dynamic_voxelize_forward", &dynamic_voxelize_forward, + "dynamic_voxelize_forward", py::arg("points"), py::arg("voxel_size"), + py::arg("coors_range"), py::arg("coors"), py::arg("NDim")); + m.def("ms_deform_attn_forward", &ms_deform_attn_forward, + "forward function of multi-scale deformable attention", + py::arg("value"), py::arg("value_spatial_shapes"), + py::arg("value_level_start_index"), py::arg("sampling_locations"), + py::arg("attention_weights"), py::arg("im2col_step")); + m.def("ms_deform_attn_backward", &ms_deform_attn_backward, + "backward function of multi-scale deformable attention", + py::arg("value"), py::arg("value_spatial_shapes"), + py::arg("value_level_start_index"), py::arg("sampling_locations"), + py::arg("attention_weights"), py::arg("grad_output"), + py::arg("grad_value"), py::arg("grad_sampling_loc"), + py::arg("grad_attn_weight"), py::arg("im2col_step")); + m.def("border_align_forward", &border_align_forward, + "forward function of border_align", 
py::arg("input"), py::arg("boxes"), + py::arg("output"), py::arg("argmax_idx"), py::arg("pool_size")); + m.def("border_align_backward", &border_align_backward, + "backward function of border_align", py::arg("grad_output"), + py::arg("boxes"), py::arg("argmax_idx"), py::arg("grad_input"), + py::arg("pool_size")); + m.def("correlation_forward", &correlation_forward, "Correlation forward", + py::arg("input1"), py::arg("input2"), py::arg("output"), py::arg("kH"), + py::arg("kW"), py::arg("patchH"), py::arg("patchW"), py::arg("padH"), + py::arg("padW"), py::arg("dilationH"), py::arg("dilationW"), + py::arg("dilation_patchH"), py::arg("dilation_patchW"), py::arg("dH"), + py::arg("dW")); + m.def("correlation_backward", &correlation_backward, "Correlation backward", + py::arg("grad_output"), py::arg("input1"), py::arg("input2"), + py::arg("grad_input1"), py::arg("grad_input2"), py::arg("kH"), + py::arg("kW"), py::arg("patchH"), py::arg("patchW"), py::arg("padH"), + py::arg("padW"), py::arg("dilationH"), py::arg("dilationW"), + py::arg("dilation_patchH"), py::arg("dilation_patchW"), py::arg("dH"), + py::arg("dW")); + m.def("points_in_boxes_cpu_forward", &points_in_boxes_cpu_forward, + "points_in_boxes_cpu_forward", py::arg("boxes_tensor"), + py::arg("pts_tensor"), py::arg("pts_indices_tensor")); + m.def("points_in_boxes_part_forward", &points_in_boxes_part_forward, + "points_in_boxes_part_forward", py::arg("boxes_tensor"), + py::arg("pts_tensor"), py::arg("box_idx_of_points_tensor")); + m.def("points_in_boxes_all_forward", &points_in_boxes_all_forward, + "points_in_boxes_all_forward", py::arg("boxes_tensor"), + py::arg("pts_tensor"), py::arg("box_idx_of_points_tensor")); + m.def("roiaware_pool3d_forward", &roiaware_pool3d_forward, + "roiaware_pool3d_forward", py::arg("rois"), py::arg("pts"), + py::arg("pts_feature"), py::arg("argmax"), py::arg("pts_idx_of_voxels"), + py::arg("pooled_features"), py::arg("pool_method")); + m.def("roiaware_pool3d_backward", 
&roiaware_pool3d_backward, + "roiaware_pool3d_backward", py::arg("pts_idx_of_voxels"), + py::arg("argmax"), py::arg("grad_out"), py::arg("grad_in"), + py::arg("pool_method")); + m.def("rotated_feature_align_forward", &rotated_feature_align_forward, + "Feature Refine forward (CUDA)", py::arg("features"), + py::arg("best_bboxes"), py::arg("output"), py::arg("spatial_scale"), + py::arg("points")); + m.def("rotated_feature_align_backward", &rotated_feature_align_backward, + "Feature Refine backward (CUDA)", py::arg("top_grad"), + py::arg("best_bboxes"), py::arg("bottom_grad"), + py::arg("spatial_scale"), py::arg("points")); + m.def("riroi_align_rotated_forward", &riroi_align_rotated_forward, + "riroi_align_rotated forward", py::arg("features"), py::arg("rois"), + py::arg("output"), py::arg("pooled_height"), py::arg("pooled_width"), + py::arg("spatial_scale"), py::arg("num_samples"), + py::arg("num_orientations"), py::arg("clockwise")); + m.def("riroi_align_rotated_backward", &riroi_align_rotated_backward, + "riroi_align_rotated backward", py::arg("top_grad"), py::arg("rois"), + py::arg("bottom_grad"), py::arg("pooled_height"), + py::arg("pooled_width"), py::arg("spatial_scale"), + py::arg("num_samples"), py::arg("num_orientations"), + py::arg("clockwise")); + m.def("points_in_polygons_forward", &points_in_polygons_forward, + "points_in_polygons_forward", py::arg("points"), py::arg("polygons"), + py::arg("output")); + m.def("min_area_polygons", &min_area_polygons, "min_area_polygons", + py::arg("pointsets"), py::arg("polygons")); + m.def("active_rotated_filter_forward", &active_rotated_filter_forward, + "active_rotated_filter_forward", py::arg("input"), py::arg("indices"), + py::arg("output")); + m.def("active_rotated_filter_backward", &active_rotated_filter_backward, + "active_rotated_filter_backward", py::arg("grad_out"), + py::arg("indices"), py::arg("grad_in")); + m.def("convex_iou", &convex_iou, "convex_iou", py::arg("pointsets"), + py::arg("polygons"), 
py::arg("ious")); + m.def("convex_giou", &convex_giou, "convex_giou", py::arg("pointsets"), + py::arg("polygons"), py::arg("output")); + m.def("diff_iou_rotated_sort_vertices_forward", + &diff_iou_rotated_sort_vertices_forward, + "diff_iou_rotated_sort_vertices_forward", py::arg("vertices"), + py::arg("mask"), py::arg("num_valid")); + m.def("chamfer_distance_forward", &chamfer_distance_forward, + "chamfer_distance_forward", py::arg("xyz1"), py::arg("xyz2"), + py::arg("dist1"), py::arg("dist2"), py::arg("idx1"), py::arg("idx2")); + m.def("chamfer_distance_backward", &chamfer_distance_backward, + "chamfer_distance_backward", py::arg("xyz1"), py::arg("xyz2"), + py::arg("idx1"), py::arg("idx2"), py::arg("graddist1"), + py::arg("graddist2"), py::arg("gradxyz1"), py::arg("gradxyz2")); + m.def("prroi_pool_forward", &prroi_pool_forward, "prroi_pool forward", + py::arg("input"), py::arg("rois"), py::arg("output"), + py::arg("pooled_height"), py::arg("pooled_width"), + py::arg("spatial_scale")); + m.def("prroi_pool_backward", &prroi_pool_backward, "prroi_pool_backward", + py::arg("grad_output"), py::arg("rois"), py::arg("grad_input"), + py::arg("pooled_height"), py::arg("pooled_width"), + py::arg("spatial_scale")); + m.def("prroi_pool_coor_backward", &prroi_pool_coor_backward, + "prroi_pool_coor_backward", py::arg("output"), py::arg("grad_output"), + py::arg("input"), py::arg("rois"), py::arg("grad_rois"), + py::arg("pooled_height"), py::arg("pooled_width"), + py::arg("spatial_scale")); + m.def("box_iou_quadri", &box_iou_quadri, "IoU for quadrilateral boxes", + py::arg("boxes1"), py::arg("boxes2"), py::arg("ious"), + py::arg("mode_flag"), py::arg("aligned")); + m.def("nms_quadri", &nms_quadri, "NMS for quadrilateral boxes", + py::arg("dets"), py::arg("scores"), py::arg("order"), + py::arg("dets_sorted"), py::arg("iou_threshold"), + py::arg("multi_label")); + m.def("bezier_align_forward", &bezier_align_forward, "bezier_align forward", + py::arg("input"), py::arg("rois"), 
py::arg("output"), + py::arg("aligned_height"), py::arg("aligned_width"), + py::arg("spatial_scale"), py::arg("sampling_ratio"), + py::arg("aligned")); + m.def("bezier_align_backward", &bezier_align_backward, + "bezier_align backward", py::arg("grad_output"), py::arg("rois"), + py::arg("grad_input"), py::arg("aligned_height"), + py::arg("aligned_width"), py::arg("spatial_scale"), + py::arg("sampling_ratio"), py::arg("aligned")); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/riroi_align_rotated.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/riroi_align_rotated.cpp new file mode 100644 index 0000000000000000000000000000000000000000..81ffa9fd6dcd82117ca13ac83b88b5f023aca466 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/riroi_align_rotated.cpp @@ -0,0 +1,42 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void riroi_align_rotated_forward_impl(Tensor features, Tensor rois, + Tensor output, int pooled_height, + int pooled_width, float spatial_scale, + int num_samples, int num_orientations, + bool clockwise) { + DISPATCH_DEVICE_IMPL(riroi_align_rotated_forward_impl, features, rois, output, + pooled_height, pooled_width, spatial_scale, num_samples, + num_orientations, clockwise); +} + +void riroi_align_rotated_backward_impl(Tensor top_grad, Tensor rois, + Tensor bottom_grad, int pooled_height, + int pooled_width, float spatial_scale, + int num_samples, int num_orientations, + bool clockwise) { + DISPATCH_DEVICE_IMPL(riroi_align_rotated_backward_impl, top_grad, rois, + bottom_grad, pooled_height, pooled_width, spatial_scale, + num_samples, num_orientations, clockwise); +} + +void riroi_align_rotated_forward(Tensor features, Tensor rois, Tensor output, + int pooled_height, int pooled_width, + float spatial_scale, int num_samples, + int num_orientations, bool clockwise) { + riroi_align_rotated_forward_impl(features, rois, output, 
pooled_height, + pooled_width, spatial_scale, num_samples, + num_orientations, clockwise); +} + +void riroi_align_rotated_backward(Tensor top_grad, Tensor rois, + Tensor bottom_grad, int pooled_height, + int pooled_width, float spatial_scale, + int num_samples, int num_orientations, + bool clockwise) { + riroi_align_rotated_backward_impl(top_grad, rois, bottom_grad, pooled_height, + pooled_width, spatial_scale, num_samples, + num_orientations, clockwise); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/roi_align.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/roi_align.cpp new file mode 100644 index 0000000000000000000000000000000000000000..6e7077397d06ecd55af1e1060e64fe8c5ff08c94 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/roi_align.cpp @@ -0,0 +1,41 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void roi_align_forward_impl(Tensor input, Tensor rois, Tensor output, + Tensor argmax_y, Tensor argmax_x, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + int pool_mode, bool aligned) { + DISPATCH_DEVICE_IMPL(roi_align_forward_impl, input, rois, output, argmax_y, + argmax_x, aligned_height, aligned_width, spatial_scale, + sampling_ratio, pool_mode, aligned); +} + +void roi_align_backward_impl(Tensor grad_output, Tensor rois, Tensor argmax_y, + Tensor argmax_x, Tensor grad_input, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + int pool_mode, bool aligned) { + DISPATCH_DEVICE_IMPL(roi_align_backward_impl, grad_output, rois, argmax_y, + argmax_x, grad_input, aligned_height, aligned_width, + spatial_scale, sampling_ratio, pool_mode, aligned); +} + +void roi_align_forward(Tensor input, Tensor rois, Tensor output, + Tensor argmax_y, Tensor argmax_x, int aligned_height, + int aligned_width, float spatial_scale, + int sampling_ratio, int pool_mode, bool aligned) { + 
roi_align_forward_impl(input, rois, output, argmax_y, argmax_x, + aligned_height, aligned_width, spatial_scale, + sampling_ratio, pool_mode, aligned); +} + +void roi_align_backward(Tensor grad_output, Tensor rois, Tensor argmax_y, + Tensor argmax_x, Tensor grad_input, int aligned_height, + int aligned_width, float spatial_scale, + int sampling_ratio, int pool_mode, bool aligned) { + roi_align_backward_impl(grad_output, rois, argmax_y, argmax_x, grad_input, + aligned_height, aligned_width, spatial_scale, + sampling_ratio, pool_mode, aligned); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/roi_align_rotated.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/roi_align_rotated.cpp new file mode 100644 index 0000000000000000000000000000000000000000..77ea5ce70cff1724a6b012aee127ba256c7dd326 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/roi_align_rotated.cpp @@ -0,0 +1,41 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void roi_align_rotated_forward_impl(Tensor input, Tensor rois, Tensor output, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + bool aligned, bool clockwise) { + DISPATCH_DEVICE_IMPL(roi_align_rotated_forward_impl, input, rois, output, + aligned_height, aligned_width, spatial_scale, + sampling_ratio, aligned, clockwise); +} + +void roi_align_rotated_backward_impl(Tensor top_grad, Tensor rois, + Tensor bottom_grad, int aligned_height, + int aligned_width, float spatial_scale, + int sampling_ratio, bool aligned, + bool clockwise) { + DISPATCH_DEVICE_IMPL(roi_align_rotated_backward_impl, top_grad, rois, + bottom_grad, aligned_height, aligned_width, + spatial_scale, sampling_ratio, aligned, clockwise); +} + +void roi_align_rotated_forward(Tensor input, Tensor rois, Tensor output, + int aligned_height, int aligned_width, + float spatial_scale, int sampling_ratio, + bool aligned, bool 
clockwise) { + roi_align_rotated_forward_impl(input, rois, output, aligned_height, + aligned_width, spatial_scale, sampling_ratio, + aligned, clockwise); +} + +void roi_align_rotated_backward(Tensor top_grad, Tensor rois, + Tensor bottom_grad, int aligned_height, + int aligned_width, float spatial_scale, + int sampling_ratio, bool aligned, + bool clockwise) { + roi_align_rotated_backward_impl(top_grad, rois, bottom_grad, aligned_height, + aligned_width, spatial_scale, sampling_ratio, + aligned, clockwise); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/roi_pool.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/roi_pool.cpp new file mode 100644 index 0000000000000000000000000000000000000000..bba90b806c5fe59d9e20a0b41a51df9922e91c3f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/roi_pool.cpp @@ -0,0 +1,31 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void roi_pool_forward_impl(Tensor input, Tensor rois, Tensor output, + Tensor argmax, int pooled_height, int pooled_width, + float spatial_scale) { + DISPATCH_DEVICE_IMPL(roi_pool_forward_impl, input, rois, output, argmax, + pooled_height, pooled_width, spatial_scale); +} + +void roi_pool_backward_impl(Tensor grad_output, Tensor rois, Tensor argmax, + Tensor grad_input, int pooled_height, + int pooled_width, float spatial_scale) { + DISPATCH_DEVICE_IMPL(roi_pool_backward_impl, grad_output, rois, argmax, + grad_input, pooled_height, pooled_width, spatial_scale); +} + +void roi_pool_forward(Tensor input, Tensor rois, Tensor output, Tensor argmax, + int pooled_height, int pooled_width, + float spatial_scale) { + roi_pool_forward_impl(input, rois, output, argmax, pooled_height, + pooled_width, spatial_scale); +} + +void roi_pool_backward(Tensor grad_output, Tensor rois, Tensor argmax, + Tensor grad_input, int pooled_height, int pooled_width, + float spatial_scale) { + 
roi_pool_backward_impl(grad_output, rois, argmax, grad_input, pooled_height, + pooled_width, spatial_scale); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/roiaware_pool3d.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/roiaware_pool3d.cpp new file mode 100644 index 0000000000000000000000000000000000000000..6cf9cf0945db4c0ce1774aed6d334b62f3e1a9e4 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/roiaware_pool3d.cpp @@ -0,0 +1,72 @@ +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void roiaware_pool3d_forward_impl(int boxes_num, int pts_num, int channels, + int max_pts_each_voxel, int out_x, int out_y, + int out_z, const Tensor rois, + const Tensor pts, const Tensor pts_feature, + Tensor argmax, Tensor pts_idx_of_voxels, + Tensor pooled_features, int pool_method) { + DISPATCH_DEVICE_IMPL(roiaware_pool3d_forward_impl, boxes_num, pts_num, + channels, max_pts_each_voxel, out_x, out_y, out_z, rois, + pts, pts_feature, argmax, pts_idx_of_voxels, + pooled_features, pool_method); +} + +void roiaware_pool3d_backward_impl(int boxes_num, int out_x, int out_y, + int out_z, int channels, + int max_pts_each_voxel, + const Tensor pts_idx_of_voxels, + const Tensor argmax, const Tensor grad_out, + Tensor grad_in, int pool_method) { + DISPATCH_DEVICE_IMPL(roiaware_pool3d_backward_impl, boxes_num, out_x, out_y, + out_z, channels, max_pts_each_voxel, pts_idx_of_voxels, + argmax, grad_out, grad_in, pool_method); +} + +void roiaware_pool3d_forward(Tensor rois, Tensor pts, Tensor pts_feature, + Tensor argmax, Tensor pts_idx_of_voxels, + Tensor pooled_features, int pool_method) { + // params rois: (N, 7) [x, y, z, x_size, y_size, z_size, ry] in LiDAR + // coordinate + // params pts: (npoints, 3) [x, y, z] in LiDAR coordinate + // params pts_feature: (npoints, C) + // params argmax: (N, out_x, out_y, out_z, C) + // params pts_idx_of_voxels: (N, out_x, out_y, out_z, max_pts_each_voxel) + // params 
pooled_features: (N, out_x, out_y, out_z, C) + // params pool_method: 0: max_pool 1: avg_pool + int boxes_num = rois.size(0); + int pts_num = pts.size(0); + int channels = pts_feature.size(1); + int max_pts_each_voxel = pts_idx_of_voxels.size(4); // index 0 is the counter + int out_x = pts_idx_of_voxels.size(1); + int out_y = pts_idx_of_voxels.size(2); + int out_z = pts_idx_of_voxels.size(3); + assert((out_x < 256) && (out_y < 256) && + (out_z < 256)); // we encode index with 8bit + + roiaware_pool3d_forward_impl(boxes_num, pts_num, channels, max_pts_each_voxel, + out_x, out_y, out_z, rois, pts, pts_feature, + argmax, pts_idx_of_voxels, pooled_features, + pool_method); +} + +void roiaware_pool3d_backward(Tensor pts_idx_of_voxels, Tensor argmax, + Tensor grad_out, Tensor grad_in, + int pool_method) { + // params pts_idx_of_voxels: (N, out_x, out_y, out_z, max_pts_each_voxel) + // params argmax: (N, out_x, out_y, out_z, C) + // params grad_out: (N, out_x, out_y, out_z, C) + // params grad_in: (npoints, C), return value + // params pool_method: 0: max_pool 1: avg_pool + int boxes_num = pts_idx_of_voxels.size(0); + int out_x = pts_idx_of_voxels.size(1); + int out_y = pts_idx_of_voxels.size(2); + int out_z = pts_idx_of_voxels.size(3); + int max_pts_each_voxel = pts_idx_of_voxels.size(4); // index 0 is the counter + int channels = grad_out.size(4); + + roiaware_pool3d_backward_impl(boxes_num, out_x, out_y, out_z, channels, + max_pts_each_voxel, pts_idx_of_voxels, argmax, + grad_out, grad_in, pool_method); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/roipoint_pool3d.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/roipoint_pool3d.cpp new file mode 100644 index 0000000000000000000000000000000000000000..a10080b7c23abb3a31b6f764c972ea7917f52346 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/roipoint_pool3d.cpp @@ -0,0 +1,39 @@ +/* +Modified from 
+https://github.com/open-mmlab/OpenPCDet/blob/master/pcdet/ops/roipoint_pool3d/src/roipoint_pool3d.cpp +Point cloud feature pooling +Written by Shaoshuai Shi +All Rights Reserved 2018. +*/ + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void roipoint_pool3d_forward_impl(int batch_size, int pts_num, int boxes_num, + int feature_in_len, int sampled_pts_num, + const Tensor xyz, const Tensor boxes3d, + const Tensor pts_feature, + Tensor pooled_features, + Tensor pooled_empty_flag) { + DISPATCH_DEVICE_IMPL(roipoint_pool3d_forward_impl, batch_size, pts_num, + boxes_num, feature_in_len, sampled_pts_num, xyz, boxes3d, + pts_feature, pooled_features, pooled_empty_flag); +} + +void roipoint_pool3d_forward(Tensor xyz, Tensor boxes3d, Tensor pts_feature, + Tensor pooled_features, Tensor pooled_empty_flag) { + // params xyz: (B, N, 3) + // params boxes3d: (B, M, 7) + // params pts_feature: (B, N, C) + // params pooled_features: (B, M, 512, 3+C) + // params pooled_empty_flag: (B, M) + int batch_size = xyz.size(0); + int pts_num = xyz.size(1); + int boxes_num = boxes3d.size(1); + int feature_in_len = pts_feature.size(2); + int sampled_pts_num = pooled_features.size(2); + + roipoint_pool3d_forward_impl(batch_size, pts_num, boxes_num, feature_in_len, + sampled_pts_num, xyz, boxes3d, pts_feature, + pooled_features, pooled_empty_flag); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/rotated_feature_align.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/rotated_feature_align.cpp new file mode 100644 index 0000000000000000000000000000000000000000..71fe0c9a0a26003310a388d4edca6e79aa7b9026 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/rotated_feature_align.cpp @@ -0,0 +1,39 @@ +// Copyright (c) OpenMMLab. All rights reserved. 
+// Modified from +// https://github.com/SJTU-Thinklab-Det/r3det-on-mmdetection/blob/master/mmdet/ops/fr/src/feature_refine_cuda.cpp + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void rotated_feature_align_forward_impl(const Tensor features, + const Tensor best_bboxes, + const float spatial_scale, + const int points, Tensor output) { + DISPATCH_DEVICE_IMPL(rotated_feature_align_forward_impl, features, + best_bboxes, spatial_scale, points, output); +} + +void rotated_feature_align_backward_impl(const Tensor top_grad, + const Tensor best_bboxes, + const float spatial_scale, + const int points, Tensor bottom_grad) { + DISPATCH_DEVICE_IMPL(rotated_feature_align_backward_impl, top_grad, + best_bboxes, spatial_scale, points, bottom_grad); +} + +void rotated_feature_align_forward(const Tensor features, + const Tensor best_bboxes, Tensor output, + const float spatial_scale, + const int points) { + rotated_feature_align_forward_impl(features, best_bboxes, spatial_scale, + points, output); +} + +void rotated_feature_align_backward(const Tensor top_grad, + const Tensor best_bboxes, + Tensor bottom_grad, + const float spatial_scale, + const int points) { + rotated_feature_align_backward_impl(top_grad, best_bboxes, spatial_scale, + points, bottom_grad); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/scatter_points.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/scatter_points.cpp new file mode 100644 index 0000000000000000000000000000000000000000..0de8ebf64a3432db25b61a81fce305efc09195b8 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/scatter_points.cpp @@ -0,0 +1,53 @@ +// Copyright (c) OpenMMLab. All rights reserved. 
+#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +typedef enum { SUM = 0, MEAN = 1, MAX = 2 } reduce_t; + +std::vector dynamic_point_to_voxel_forward_impl( + const torch::Tensor &feats, const torch::Tensor &coors, + const reduce_t reduce_type) { + return DISPATCH_DEVICE_IMPL(dynamic_point_to_voxel_forward_impl, feats, coors, + reduce_type); +} + +void dynamic_point_to_voxel_backward_impl( + torch::Tensor &grad_feats, const torch::Tensor &grad_reduced_feats, + const torch::Tensor &feats, const torch::Tensor &reduced_feats, + const torch::Tensor &coors_idx, const torch::Tensor &reduce_count, + const reduce_t reduce_type) { + DISPATCH_DEVICE_IMPL(dynamic_point_to_voxel_backward_impl, grad_feats, + grad_reduced_feats, feats, reduced_feats, coors_idx, + reduce_count, reduce_type); +} + +inline reduce_t convert_reduce_type(const std::string &reduce_type) { + if (reduce_type == "max") + return reduce_t::MAX; + else if (reduce_type == "sum") + return reduce_t::SUM; + else if (reduce_type == "mean") + return reduce_t::MEAN; + else + TORCH_CHECK(false, "do not support reduce type " + reduce_type) + return reduce_t::SUM; +} + +std::vector dynamic_point_to_voxel_forward( + const torch::Tensor &feats, const torch::Tensor &coors, + const std::string &reduce_type) { + return dynamic_point_to_voxel_forward_impl(feats, coors, + convert_reduce_type(reduce_type)); +} + +void dynamic_point_to_voxel_backward(torch::Tensor &grad_feats, + const torch::Tensor &grad_reduced_feats, + const torch::Tensor &feats, + const torch::Tensor &reduced_feats, + const torch::Tensor &coors_idx, + const torch::Tensor &reduce_count, + const std::string &reduce_type) { + dynamic_point_to_voxel_backward_impl(grad_feats, grad_reduced_feats, feats, + reduced_feats, coors_idx, reduce_count, + convert_reduce_type(reduce_type)); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/sync_bn.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/sync_bn.cpp new file 
mode 100644 index 0000000000000000000000000000000000000000..fd5a513273a7bbce2cf41c790706fe4801f4c414 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/sync_bn.cpp @@ -0,0 +1,69 @@ +// Copyright (c) OpenMMLab. All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void sync_bn_forward_mean_impl(const Tensor input, Tensor mean) { + DISPATCH_DEVICE_IMPL(sync_bn_forward_mean_impl, input, mean); +} + +void sync_bn_forward_var_impl(const Tensor input, const Tensor mean, + Tensor var) { + DISPATCH_DEVICE_IMPL(sync_bn_forward_var_impl, input, mean, var); +} + +void sync_bn_forward_output_impl(const Tensor input, const Tensor mean, + const Tensor var, Tensor running_mean, + Tensor running_var, const Tensor weight, + const Tensor bias, Tensor norm, Tensor std, + Tensor output, float eps, float momentum, + int group_size) { + DISPATCH_DEVICE_IMPL(sync_bn_forward_output_impl, input, mean, var, + running_mean, running_var, weight, bias, norm, std, + output, eps, momentum, group_size); +} + +void sync_bn_backward_param_impl(const Tensor grad_output, const Tensor norm, + Tensor grad_weight, Tensor grad_bias) { + DISPATCH_DEVICE_IMPL(sync_bn_backward_param_impl, grad_output, norm, + grad_weight, grad_bias); +} + +void sync_bn_backward_data_impl(const Tensor grad_output, const Tensor weight, + const Tensor grad_weight, + const Tensor grad_bias, const Tensor norm, + const Tensor std, Tensor grad_input) { + DISPATCH_DEVICE_IMPL(sync_bn_backward_data_impl, grad_output, weight, + grad_weight, grad_bias, norm, std, grad_input); +} + +void sync_bn_forward_mean(const Tensor input, Tensor mean) { + sync_bn_forward_mean_impl(input, mean); +} + +void sync_bn_forward_var(const Tensor input, const Tensor mean, Tensor var) { + sync_bn_forward_var_impl(input, mean, var); +} + +void sync_bn_forward_output(const Tensor input, const Tensor mean, + const Tensor var, const Tensor weight, + const Tensor bias, Tensor running_mean, + 
Tensor running_var, Tensor norm, Tensor std, + Tensor output, float eps, float momentum, + int group_size) { + sync_bn_forward_output_impl(input, mean, var, running_mean, running_var, + weight, bias, norm, std, output, eps, momentum, + group_size); +} + +void sync_bn_backward_param(const Tensor grad_output, const Tensor norm, + Tensor grad_weight, Tensor grad_bias) { + sync_bn_backward_param_impl(grad_output, norm, grad_weight, grad_bias); +} + +void sync_bn_backward_data(const Tensor grad_output, const Tensor weight, + const Tensor grad_weight, const Tensor grad_bias, + const Tensor norm, const Tensor std, + Tensor grad_input) { + sync_bn_backward_data_impl(grad_output, weight, grad_weight, grad_bias, norm, + std, grad_input); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/three_interpolate.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/three_interpolate.cpp new file mode 100644 index 0000000000000000000000000000000000000000..1e0ec71bb3d3fdb8416dcc62cfda926cc45c9977 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/three_interpolate.cpp @@ -0,0 +1,33 @@ +// Modified from +// https://github.com/sshaoshuai/Pointnet2.PyTorch/tree/master/pointnet2/src/interpolate.cpp + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void three_interpolate_forward_impl(int b, int c, int m, int n, + const Tensor points, const Tensor idx, + const Tensor weight, Tensor out) { + DISPATCH_DEVICE_IMPL(three_interpolate_forward_impl, b, c, m, n, points, idx, + weight, out); +} + +void three_interpolate_backward_impl(int b, int c, int n, int m, + const Tensor grad_out, const Tensor idx, + const Tensor weight, Tensor grad_points) { + DISPATCH_DEVICE_IMPL(three_interpolate_backward_impl, b, c, n, m, grad_out, + idx, weight, grad_points); +} + +void three_interpolate_forward(Tensor points_tensor, Tensor idx_tensor, + Tensor weight_tensor, Tensor out_tensor, int b, + int c, int m, int n) { + 
three_interpolate_forward_impl(b, c, m, n, points_tensor, idx_tensor, + weight_tensor, out_tensor); +} + +void three_interpolate_backward(Tensor grad_out_tensor, Tensor idx_tensor, + Tensor weight_tensor, Tensor grad_points_tensor, + int b, int c, int n, int m) { + three_interpolate_backward_impl(b, c, n, m, grad_out_tensor, idx_tensor, + weight_tensor, grad_points_tensor); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/three_nn.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/three_nn.cpp new file mode 100644 index 0000000000000000000000000000000000000000..b629200c0727cdec5ca4e0abd8ac65baacaa31f9 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/three_nn.cpp @@ -0,0 +1,18 @@ +// Modified from +// https://github.com/sshaoshuai/Pointnet2.PyTorch/tree/master/pointnet2/src/interpolate.cpp + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void three_nn_forward_impl(int b, int n, int m, const Tensor unknown, + const Tensor known, Tensor dist2, Tensor idx) { + DISPATCH_DEVICE_IMPL(three_nn_forward_impl, b, n, m, unknown, known, dist2, + idx); +} + +void three_nn_forward(Tensor unknown_tensor, Tensor known_tensor, + Tensor dist2_tensor, Tensor idx_tensor, int b, int n, + int m) { + three_nn_forward_impl(b, n, m, unknown_tensor, known_tensor, dist2_tensor, + idx_tensor); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/tin_shift.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/tin_shift.cpp new file mode 100644 index 0000000000000000000000000000000000000000..b03f587541f17cae3c3f03f5cb8747d4b0208efc --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/tin_shift.cpp @@ -0,0 +1,20 @@ +// Copyright (c) OpenMMLab. 
All rights reserved +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +void tin_shift_forward_impl(Tensor input, Tensor shift, Tensor output) { + DISPATCH_DEVICE_IMPL(tin_shift_forward_impl, input, shift, output); +} + +void tin_shift_backward_impl(Tensor grad_output, Tensor shift, + Tensor grad_input) { + DISPATCH_DEVICE_IMPL(tin_shift_backward_impl, grad_output, shift, grad_input); +} + +void tin_shift_forward(Tensor input, Tensor shift, Tensor output) { + tin_shift_forward_impl(input, shift, output); +} + +void tin_shift_backward(Tensor grad_output, Tensor shift, Tensor grad_input) { + tin_shift_backward_impl(grad_output, shift, grad_input); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/upfirdn2d.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/upfirdn2d.cpp new file mode 100644 index 0000000000000000000000000000000000000000..dd325bd7887a49b5f0ccd134604f24c0fd40fc10 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/upfirdn2d.cpp @@ -0,0 +1,118 @@ +// Modified from +// https://github.com/rosinality/stylegan2-pytorch/blob/master/op/upfirdn2d.cpp + +/* +Copyright (c) 2021, NVIDIA Corporation. All rights reserved. + +NVIDIA Source Code License for StyleGAN2 with Adaptive Discriminator +Augmentation (ADA) +======================================================================= + +1. Definitions + +"Licensor" means any person or entity that distributes its Work. + +"Software" means the original work of authorship made available under +this License. + +"Work" means the Software and any additions to or derivative works of +the Software that are made available under this License. + +The terms "reproduce," "reproduction," "derivative works," and +"distribution" have the meaning as provided under U.S. 
copyright law; +provided, however, that for the purposes of this License, derivative +works shall not include works that remain separable from, or merely +link (or bind by name) to the interfaces of, the Work. + +Works, including the Software, are "made available" under this License +by including in or with the Work either (a) a copyright notice +referencing the applicability of this License to the Work, or (b) a +copy of this License. + +2. License Grants + + 2.1 Copyright Grant. Subject to the terms and conditions of this + License, each Licensor grants to you a perpetual, worldwide, + non-exclusive, royalty-free, copyright license to reproduce, + prepare derivative works of, publicly display, publicly perform, + sublicense and distribute its Work and any resulting derivative + works in any form. + +3. Limitations + + 3.1 Redistribution. You may reproduce or distribute the Work only + if (a) you do so under this License, (b) you include a complete + copy of this License with your distribution, and (c) you retain + without modification any copyright, patent, trademark, or + attribution notices that are present in the Work. + + 3.2 Derivative Works. You may specify that additional or different + terms apply to the use, reproduction, and distribution of your + derivative works of the Work ("Your Terms") only if (a) Your Terms + provide that the use limitation in Section 3.3 applies to your + derivative works, and (b) you identify the specific derivative + works that are subject to Your Terms. Notwithstanding Your Terms, + this License (including the redistribution requirements in Section + 3.1) will continue to apply to the Work itself. + + 3.3 Use Limitation. The Work and any derivative works thereof only + may be used or intended for use non-commercially. Notwithstanding + the foregoing, NVIDIA and its affiliates may use the Work and any + derivative works commercially. As used herein, "non-commercially" + means for research or evaluation purposes only. 
+ + 3.4 Patent Claims. If you bring or threaten to bring a patent claim + against any Licensor (including any claim, cross-claim or + counterclaim in a lawsuit) to enforce any patents that you allege + are infringed by any Work, then your rights under this License from + such Licensor (including the grant in Section 2.1) will terminate + immediately. + + 3.5 Trademarks. This License does not grant any rights to use any + Licensor’s or its affiliates’ names, logos, or trademarks, except + as necessary to reproduce the notices described in this License. + + 3.6 Termination. If you violate any term of this License, then your + rights under this License (including the grant in Section 2.1) will + terminate immediately. + +4. Disclaimer of Warranty. + +THE WORK IS PROVIDED "AS IS" WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WARRANTIES OR CONDITIONS OF +MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR +NON-INFRINGEMENT. YOU BEAR THE RISK OF UNDERTAKING ANY ACTIVITIES UNDER +THIS LICENSE. + +5. Limitation of Liability. + +EXCEPT AS PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL +THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE +SHALL ANY LICENSOR BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT, +INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF +OR RELATED TO THIS LICENSE, THE USE OR INABILITY TO USE THE WORK +(INCLUDING BUT NOT LIMITED TO LOSS OF GOODWILL, BUSINESS INTERRUPTION, +LOST PROFITS OR DATA, COMPUTER FAILURE OR MALFUNCTION, OR ANY OTHER +COMMERCIAL DAMAGES OR LOSSES), EVEN IF THE LICENSOR HAS BEEN ADVISED OF +THE POSSIBILITY OF SUCH DAMAGES. 
+ +======================================================================= +*/ + +#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +torch::Tensor upfirdn2d_op_impl(const torch::Tensor& input, + const torch::Tensor& kernel, int up_x, int up_y, + int down_x, int down_y, int pad_x0, int pad_x1, + int pad_y0, int pad_y1) { + return DISPATCH_DEVICE_IMPL(upfirdn2d_op_impl, input, kernel, up_x, up_y, + down_x, down_y, pad_x0, pad_x1, pad_y0, pad_y1); +} + +torch::Tensor upfirdn2d(const torch::Tensor& input, const torch::Tensor& kernel, + int up_x, int up_y, int down_x, int down_y, int pad_x0, + int pad_x1, int pad_y0, int pad_y1) { + return upfirdn2d_op_impl(input, kernel, up_x, up_y, down_x, down_y, pad_x0, + pad_x1, pad_y0, pad_y1); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/voxelization.cpp b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/voxelization.cpp new file mode 100644 index 0000000000000000000000000000000000000000..7946be6178ad5eae64958b4631c1cabec2a04eee --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/csrc/pytorch/voxelization.cpp @@ -0,0 +1,74 @@ +// Copyright (c) OpenMMLab. All rights reserved. 
+#include "pytorch_cpp_helper.hpp" +#include "pytorch_device_registry.hpp" + +int hard_voxelize_forward_impl(const at::Tensor &points, at::Tensor &voxels, + at::Tensor &coors, + at::Tensor &num_points_per_voxel, + const std::vector voxel_size, + const std::vector coors_range, + const int max_points, const int max_voxels, + const int NDim = 3) { + return DISPATCH_DEVICE_IMPL(hard_voxelize_forward_impl, points, voxels, coors, + num_points_per_voxel, voxel_size, coors_range, + max_points, max_voxels, NDim); +} + +int nondeterministic_hard_voxelize_forward_impl( + const at::Tensor &points, at::Tensor &voxels, at::Tensor &coors, + at::Tensor &num_points_per_voxel, const std::vector voxel_size, + const std::vector coors_range, const int max_points, + const int max_voxels, const int NDim = 3) { + return DISPATCH_DEVICE_IMPL(nondeterministic_hard_voxelize_forward_impl, + points, voxels, coors, num_points_per_voxel, + voxel_size, coors_range, max_points, max_voxels, + NDim); +} + +void dynamic_voxelize_forward_impl(const at::Tensor &points, at::Tensor &coors, + const std::vector voxel_size, + const std::vector coors_range, + const int NDim = 3) { + DISPATCH_DEVICE_IMPL(dynamic_voxelize_forward_impl, points, coors, voxel_size, + coors_range, NDim); +} + +void hard_voxelize_forward(const at::Tensor &points, + const at::Tensor &voxel_size, + const at::Tensor &coors_range, at::Tensor &voxels, + at::Tensor &coors, at::Tensor &num_points_per_voxel, + at::Tensor &voxel_num, const int max_points, + const int max_voxels, const int NDim = 3, + const bool deterministic = true) { + int64_t *voxel_num_data = voxel_num.data_ptr(); + std::vector voxel_size_v( + voxel_size.data_ptr(), + voxel_size.data_ptr() + voxel_size.numel()); + std::vector coors_range_v( + coors_range.data_ptr(), + coors_range.data_ptr() + coors_range.numel()); + + if (deterministic) { + *voxel_num_data = hard_voxelize_forward_impl( + points, voxels, coors, num_points_per_voxel, voxel_size_v, + coors_range_v, 
max_points, max_voxels, NDim); + } else { + *voxel_num_data = nondeterministic_hard_voxelize_forward_impl( + points, voxels, coors, num_points_per_voxel, voxel_size_v, + coors_range_v, max_points, max_voxels, NDim); + } +} + +void dynamic_voxelize_forward(const at::Tensor &points, + const at::Tensor &voxel_size, + const at::Tensor &coors_range, at::Tensor &coors, + const int NDim = 3) { + std::vector voxel_size_v( + voxel_size.data_ptr(), + voxel_size.data_ptr() + voxel_size.numel()); + std::vector coors_range_v( + coors_range.data_ptr(), + coors_range.data_ptr() + coors_range.numel()); + dynamic_voxelize_forward_impl(points, coors, voxel_size_v, coors_range_v, + NDim); +} diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/deform_conv.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/deform_conv.py new file mode 100644 index 0000000000000000000000000000000000000000..bc71b5c078afa4d102096e1f11629fd6b527a44c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/deform_conv.py @@ -0,0 +1,437 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from typing import Optional, Tuple, Union + +import torch +import torch.nn as nn +import torch.nn.functional as F +from mmengine.logging import print_log +from mmengine.registry import MODELS +from mmengine.utils import deprecated_api_warning +from torch import Tensor +from torch.autograd import Function +from torch.autograd.function import once_differentiable +from torch.nn.modules.utils import _pair, _single + +from ..utils import ext_loader +from .modulated_deform_conv import ModulatedDeformConv2dFunction + +ext_module = ext_loader.load_ext('_ext', [ + 'deform_conv_forward', 'deform_conv_backward_input', + 'deform_conv_backward_parameters' +]) + + +class DeformConv2dFunction(Function): + + @staticmethod + def symbolic(g, + input, + offset, + weight, + stride, + padding, + dilation, + groups, + deform_groups, + bias=False, + im2col_step=32): + return g.op( + 'mmcv::MMCVDeformConv2d', + input, + offset, + weight, + stride_i=stride, + padding_i=padding, + dilation_i=dilation, + groups_i=groups, + deform_groups_i=deform_groups, + bias_i=bias, + im2col_step_i=im2col_step) + + @staticmethod + def _npu_backward(ctx, grad_output): + input_tensor, weight, offset_out, offset_all, sort_index_for_npu_bp = \ + ctx.saved_tensors + grad_input, grad_weight, grad_offset_all, grad_bias = \ + torch.npu_deformable_conv2dbk( + input_tensor, grad_output, offset_out, weight, offset_all, + kernel_size=[weight.shape[3], weight.shape[2]], + stride=[1, 1, ctx.stride[0], ctx.stride[1]], + padding=[1, 1, ctx.padding[0], ctx.padding[1]], + dilation=[1, 1, ctx.dilation[0], ctx.dilation[1]], + groups=ctx.groups, deformable_groups=ctx.deform_groups, + modulated=True) + grad_offset = grad_offset_all.index_select(1, sort_index_for_npu_bp) + return grad_input, grad_offset, grad_weight, \ + None, None, None, None, None, None, None + + @staticmethod + def forward(ctx, + input: Tensor, + offset: Tensor, + weight: Tensor, + stride: Union[int, Tuple[int, ...]] = 1, + padding: Union[int, Tuple[int, 
...]] = 0, + dilation: Union[int, Tuple[int, ...]] = 1, + groups: int = 1, + deform_groups: int = 1, + bias: bool = False, + im2col_step: int = 32) -> Tensor: + if input is not None and input.dim() != 4: + raise ValueError( + f'Expected 4D tensor as input, got {input.dim()}D tensor \ + instead.') + assert bias is False, 'Only support bias is False.' + ctx.stride = _pair(stride) + ctx.padding = _pair(padding) + ctx.dilation = _pair(dilation) + ctx.groups = groups + ctx.deform_groups = deform_groups + ctx.im2col_step = im2col_step + ctx.device = input.device.type + + # When pytorch version >= 1.6.0, amp is adopted for fp16 mode; + # amp won't cast the type of model (float32), but "offset" is cast + # to float16 by nn.Conv2d automatically, leading to the type + # mismatch with input (when it is float32) or weight. + # The flag for whether to use fp16 or amp is the type of "offset", + # we cast weight and input to temporarily support fp16 and amp + # whatever the pytorch version is. + input = input.type_as(offset) + weight = weight.type_as(input) + if ctx.device == 'npu': + mask_shape, _ = torch.chunk(offset, 2, dim=1) + mask = torch.ones_like(mask_shape).to(input.device) + bias = input.new_empty(0) + output = ModulatedDeformConv2dFunction._npu_forward( + ctx, input, offset, mask, weight, bias) + return output + ctx.save_for_backward(input, offset, weight) + + output = input.new_empty( + DeformConv2dFunction._output_size(ctx, input, weight)) + + ctx.bufs_ = [input.new_empty(0), input.new_empty(0)] # columns, ones + + cur_im2col_step = min(ctx.im2col_step, input.size(0)) + assert (input.size(0) % cur_im2col_step + ) == 0, 'batch size must be divisible by im2col_step' + ext_module.deform_conv_forward( + input, + weight, + offset, + output, + ctx.bufs_[0], + ctx.bufs_[1], + kW=weight.size(3), + kH=weight.size(2), + dW=ctx.stride[1], + dH=ctx.stride[0], + padW=ctx.padding[1], + padH=ctx.padding[0], + dilationW=ctx.dilation[1], + dilationH=ctx.dilation[0], + 
group=ctx.groups, + deformable_group=ctx.deform_groups, + im2col_step=cur_im2col_step) + return output + + @staticmethod + @once_differentiable + def backward( + ctx, grad_output: Tensor + ) -> Tuple[Optional[Tensor], Optional[Tensor], Optional[Tensor], None, + None, None, None, None, None, None]: + if ctx.device == 'npu': + return DeformConv2dFunction._npu_backward(ctx, grad_output) + input, offset, weight = ctx.saved_tensors + + grad_input = grad_offset = grad_weight = None + + cur_im2col_step = min(ctx.im2col_step, input.size(0)) + assert (input.size(0) % cur_im2col_step + ) == 0, 'batch size must be divisible by im2col_step' + + grad_output = grad_output.contiguous() + if ctx.needs_input_grad[0] or ctx.needs_input_grad[1]: + grad_input = torch.zeros_like(input) + grad_offset = torch.zeros_like(offset) + ext_module.deform_conv_backward_input( + input, + offset, + grad_output, + grad_input, + grad_offset, + weight, + ctx.bufs_[0], + kW=weight.size(3), + kH=weight.size(2), + dW=ctx.stride[1], + dH=ctx.stride[0], + padW=ctx.padding[1], + padH=ctx.padding[0], + dilationW=ctx.dilation[1], + dilationH=ctx.dilation[0], + group=ctx.groups, + deformable_group=ctx.deform_groups, + im2col_step=cur_im2col_step) + + if ctx.needs_input_grad[2]: + grad_weight = torch.zeros_like(weight) + ext_module.deform_conv_backward_parameters( + input, + offset, + grad_output, + grad_weight, + ctx.bufs_[0], + ctx.bufs_[1], + kW=weight.size(3), + kH=weight.size(2), + dW=ctx.stride[1], + dH=ctx.stride[0], + padW=ctx.padding[1], + padH=ctx.padding[0], + dilationW=ctx.dilation[1], + dilationH=ctx.dilation[0], + group=ctx.groups, + deformable_group=ctx.deform_groups, + scale=1, + im2col_step=cur_im2col_step) + + return grad_input, grad_offset, grad_weight, \ + None, None, None, None, None, None, None + + @staticmethod + def _output_size(ctx, input, weight): + channels = weight.size(0) + output_size = (input.size(0), channels) + for d in range(input.dim() - 2): + in_size = input.size(d + 2) + 
class DeformConv2d(nn.Module):
    r"""Deformable 2D convolution.

    Applies a deformable 2D convolution over an input signal composed of
    several input planes. DeformConv2d was described in the paper
    `Deformable Convolutional Networks
    <https://arxiv.org/abs/1703.06211>`_

    Note:
        The argument ``im2col_step`` was added in version 1.3.17, which means
        number of samples processed by the ``im2col_cuda_kernel`` per call.
        It enables users to define ``batch_size`` and ``im2col_step`` more
        flexibly and solved `issue mmcv#1440
        <https://github.com/open-mmlab/mmcv/issues/1440>`_.

    Args:
        in_channels (int): Number of channels in the input image.
        out_channels (int): Number of channels produced by the convolution.
        kernel_size(int, tuple): Size of the convolving kernel.
        stride(int, tuple): Stride of the convolution. Default: 1.
        padding (int or tuple): Zero-padding added to both sides of the input.
            Default: 0.
        dilation (int or tuple): Spacing between kernel elements. Default: 1.
        groups (int): Number of blocked connections from input channels to
            output channels. Default: 1.
        deform_groups (int): Number of deformable group partitions.
        bias (bool): If True, adds a learnable bias to the output.
            Default: False.
        im2col_step (int): Number of samples processed by im2col_cuda_kernel
            per call. It will work when ``batch_size`` > ``im2col_step``, but
            ``batch_size`` must be divisible by ``im2col_step``. Default: 32.

            `New in version 1.3.17.`
    """

    @deprecated_api_warning({'deformable_groups': 'deform_groups'},
                            cls_name='DeformConv2d')
    def __init__(self,
                 in_channels: int,
                 out_channels: int,
                 kernel_size: Union[int, Tuple[int, ...]],
                 stride: Union[int, Tuple[int, ...]] = 1,
                 padding: Union[int, Tuple[int, ...]] = 0,
                 dilation: Union[int, Tuple[int, ...]] = 1,
                 groups: int = 1,
                 deform_groups: int = 1,
                 bias: bool = False,
                 im2col_step: int = 32) -> None:
        super().__init__()

        assert not bias, \
            f'bias={bias} is not supported in DeformConv2d.'
        # Fixed wording: the assertion fires when the channel count is NOT
        # divisible by ``groups`` (the old message said "cannot be divisible").
        assert in_channels % groups == 0, \
            f'in_channels {in_channels} is not divisible by groups {groups}'
        assert out_channels % groups == 0, \
            f'out_channels {out_channels} is not divisible by groups {groups}'

        self.in_channels = in_channels
        self.out_channels = out_channels
        self.kernel_size = _pair(kernel_size)
        self.stride = _pair(stride)
        self.padding = _pair(padding)
        self.dilation = _pair(dilation)
        self.groups = groups
        self.deform_groups = deform_groups
        self.im2col_step = im2col_step
        # enable compatibility with nn.Conv2d
        self.transposed = False
        self.output_padding = _single(0)

        # only weight, no bias
        self.weight = nn.Parameter(
            torch.Tensor(out_channels, in_channels // self.groups,
                         *self.kernel_size))

        self.reset_parameters()

    def reset_parameters(self):
        """Initialize ``self.weight`` with Kaiming uniform initialization.

        Uses the standard kaiming method described in `Delving deep into
        rectifiers: Surpassing human-level performance on ImageNet
        classification` - He, K. et al. (2015), with a uniform distribution.
        """
        nn.init.kaiming_uniform_(self.weight, nonlinearity='relu')

    def forward(self, x: Tensor, offset: Tensor) -> Tensor:
        """Deformable Convolutional forward function.

        Args:
            x (Tensor): Input feature, shape (B, C_in, H_in, W_in)
            offset (Tensor): Offset for deformable convolution, shape
                (B, deform_groups*kernel_size[0]*kernel_size[1]*2,
                H_out, W_out), H_out, W_out are equal to the output's.

                An offset is like `[y0, x0, y1, x1, y2, x2, ..., y8, x8]`.
                The spatial arrangement is like:

                .. code:: text

                    (x0, y0) (x1, y1) (x2, y2)
                    (x3, y3) (x4, y4) (x5, y5)
                    (x6, y6) (x7, y7) (x8, y8)

        Returns:
            Tensor: Output of the layer.
        """
        # To fix an assert error in deform_conv_cuda.cpp:128
        # input image is smaller than kernel
        input_pad = (x.size(2) < self.kernel_size[0]) or (x.size(3) <
                                                          self.kernel_size[1])
        if input_pad:
            # Pad on the bottom/right so the kernel fits at least once;
            # the padding is cropped off the output again below.
            pad_h = max(self.kernel_size[0] - x.size(2), 0)
            pad_w = max(self.kernel_size[1] - x.size(3), 0)
            x = F.pad(x, (0, pad_w, 0, pad_h), 'constant', 0).contiguous()
            offset = F.pad(offset, (0, pad_w, 0, pad_h), 'constant', 0)
            offset = offset.contiguous()
        out = deform_conv2d(x, offset, self.weight, self.stride, self.padding,
                            self.dilation, self.groups, self.deform_groups,
                            False, self.im2col_step)
        if input_pad:
            out = out[:, :, :out.size(2) - pad_h, :out.size(3) -
                      pad_w].contiguous()
        return out

    def __repr__(self):
        s = self.__class__.__name__
        s += f'(in_channels={self.in_channels},\n'
        s += f'out_channels={self.out_channels},\n'
        s += f'kernel_size={self.kernel_size},\n'
        s += f'stride={self.stride},\n'
        s += f'padding={self.padding},\n'
        s += f'dilation={self.dilation},\n'
        s += f'groups={self.groups},\n'
        s += f'deform_groups={self.deform_groups},\n'
        # bias is not supported in DeformConv2d.
        s += 'bias=False)'
        return s
    def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict,
                              missing_keys, unexpected_keys, error_msgs):
        """Load weights, migrating legacy (version < 2) checkpoint keys.

        Early checkpoints stored the offset branch under ``*_offset.*``
        instead of ``*.conv_offset.*``; those keys are renamed in place
        before delegating to ``nn.Module._load_from_state_dict``.
        """
        version = local_metadata.get('version', None)

        if version is None or version < 2:
            # the key is different in early versions
            # In version < 2, DeformConvPack loads previous benchmark models.
            if (prefix + 'conv_offset.weight' not in state_dict
                    and prefix[:-1] + '_offset.weight' in state_dict):
                state_dict[prefix + 'conv_offset.weight'] = state_dict.pop(
                    prefix[:-1] + '_offset.weight')
            if (prefix + 'conv_offset.bias' not in state_dict
                    and prefix[:-1] + '_offset.bias' in state_dict):
                state_dict[prefix +
                           'conv_offset.bias'] = state_dict.pop(prefix[:-1] +
                                                                '_offset.bias')

        if version is not None and version > 1:
            print_log(
                f'DeformConv2dPack {prefix.rstrip(".")} is upgraded to '
                'version 2.',
                logger='current')

        super()._load_from_state_dict(state_dict, prefix, local_metadata,
                                      strict, missing_keys, unexpected_keys,
                                      error_msgs)
class DeformRoIPoolFunction(Function):
    """Autograd function wrapping the ``deform_roi_pool`` extension kernels."""

    @staticmethod
    def symbolic(g, input, rois, offset, output_size, spatial_scale,
                 sampling_ratio, gamma):
        """ONNX symbolic export to the ``mmcv::MMCVDeformRoIPool`` op.

        ``offset`` is only passed through when it is not ``None``.
        """
        inputs = [input, rois]
        if offset is not None:
            inputs = [input, rois, offset]
        return g.op(
            'mmcv::MMCVDeformRoIPool',
            *inputs,
            pooled_height_i=output_size[0],
            pooled_width_i=output_size[1],
            spatial_scale_f=spatial_scale,
            sampling_ratio_f=sampling_ratio,
            gamma_f=gamma,
        )

    @staticmethod
    def forward(ctx,
                input: Tensor,
                rois: Tensor,
                offset: Optional[Tensor],
                output_size: Tuple[int, ...],
                spatial_scale: float = 1.0,
                sampling_ratio: int = 0,
                gamma: float = 0.1) -> Tensor:
        """Pool RoI features, optionally deformed by ``offset``.

        ``offset=None`` is encoded as an empty tensor, which the kernel
        treats as "no deformation".
        """
        if offset is None:
            offset = input.new_zeros(0)
        ctx.output_size = _pair(output_size)
        ctx.spatial_scale = float(spatial_scale)
        ctx.sampling_ratio = int(sampling_ratio)
        ctx.gamma = float(gamma)

        assert rois.size(1) == 5, 'RoI must be (idx, x1, y1, x2, y2)!'

        # One pooled (C, pooled_h, pooled_w) feature map per RoI.
        output_shape = (rois.size(0), input.size(1), ctx.output_size[0],
                        ctx.output_size[1])
        output = input.new_zeros(output_shape)

        ext_module.deform_roi_pool_forward(
            input,
            rois,
            offset,
            output,
            pooled_height=ctx.output_size[0],
            pooled_width=ctx.output_size[1],
            spatial_scale=ctx.spatial_scale,
            sampling_ratio=ctx.sampling_ratio,
            gamma=ctx.gamma)

        ctx.save_for_backward(input, rois, offset)
        return output

    @staticmethod
    @once_differentiable
    def backward(
            ctx, grad_output: Tensor
    ) -> Tuple[Tensor, None, Tensor, None, None, None, None]:
        """Backward pass; gradients flow to ``input`` and ``offset`` only."""
        input, rois, offset = ctx.saved_tensors
        grad_input = grad_output.new_zeros(input.shape)
        grad_offset = grad_output.new_zeros(offset.shape)

        ext_module.deform_roi_pool_backward(
            grad_output,
            input,
            rois,
            offset,
            grad_input,
            grad_offset,
            pooled_height=ctx.output_size[0],
            pooled_width=ctx.output_size[1],
            spatial_scale=ctx.spatial_scale,
            sampling_ratio=ctx.sampling_ratio,
            gamma=ctx.gamma)
        # An empty offset means no offset was supplied in forward,
        # so there is no offset gradient to return.
        if grad_offset.numel() == 0:
            grad_offset = None
        return grad_input, None, grad_offset, None, None, None, None
    def forward(self, input: Tensor, rois: Tensor) -> Tensor:  # type: ignore
        """Two-pass deformable RoI pooling.

        A first pooling pass without offsets produces features from which
        ``offset_fc`` predicts per-bin (y, x) offsets; a second pass applies
        those offsets.

        Args:
            input (Tensor): Feature map; channel count must equal
                ``self.output_channels``.
            rois (Tensor): RoIs of shape (num_rois, 5) as
                (batch_idx, x1, y1, x2, y2).

        Returns:
            Tensor: Pooled features of shape
            (num_rois, output_channels, *output_size).
        """
        assert input.size(1) == self.output_channels
        # First pass: plain pooling to condition the offset prediction on.
        x = deform_roi_pool(input, rois, None, self.output_size,
                            self.spatial_scale, self.sampling_ratio,
                            self.gamma)
        rois_num = rois.size(0)
        offset = self.offset_fc(x.view(rois_num, -1))
        # 2 channels: per-bin (y, x) displacement.
        offset = offset.view(rois_num, 2, self.output_size[0],
                             self.output_size[1])
        # Second pass: pooling deformed by the predicted offsets.
        return deform_roi_pool(input, rois, offset, self.output_size,
                               self.spatial_scale, self.sampling_ratio,
                               self.gamma)
    def forward(self, input: Tensor, rois: Tensor) -> Tensor:  # type: ignore
        """Modulated two-pass deformable RoI pooling.

        Like ``DeformRoIPoolPack.forward`` but additionally predicts a
        sigmoid modulation mask that scales the pooled output per bin.

        Args:
            input (Tensor): Feature map; channel count must equal
                ``self.output_channels``.
            rois (Tensor): RoIs of shape (num_rois, 5) as
                (batch_idx, x1, y1, x2, y2).

        Returns:
            Tensor: Mask-modulated pooled features of shape
            (num_rois, output_channels, *output_size).
        """
        assert input.size(1) == self.output_channels
        # First pass without offsets; conditions both branches below.
        x = deform_roi_pool(input, rois, None, self.output_size,
                            self.spatial_scale, self.sampling_ratio,
                            self.gamma)
        rois_num = rois.size(0)
        offset = self.offset_fc(x.view(rois_num, -1))
        offset = offset.view(rois_num, 2, self.output_size[0],
                             self.output_size[1])
        # Per-bin modulation scalar in (0, 1) from the sigmoid head.
        mask = self.mask_fc(x.view(rois_num, -1))
        mask = mask.view(rois_num, 1, self.output_size[0], self.output_size[1])
        # Second pass with predicted offsets, then modulate.
        d = deform_roi_pool(input, rois, offset, self.output_size,
                            self.spatial_scale, self.sampling_ratio,
                            self.gamma)
        return d * mask
class ConvTranspose2d_deprecated(ConvTranspose2d):
    """Deprecated re-export of the ``ConvTranspose2d`` wrapper.

    Kept only for backward compatibility of ``mmcv.ops`` imports; emits a
    ``DeprecationWarning`` on construction.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        message = ('Importing ConvTranspose2d wrapper from "mmcv.ops" will be '
                   'deprecated in the future. Please import them from '
                   '"mmcv.cnn" instead')
        warnings.warn(message, DeprecationWarning)
def box_intersection(corners1: Tensor,
                     corners2: Tensor,
                     eps: float = 1e-8) -> Tuple[Tensor, Tensor]:
    """Find intersection points of rectangles.

    Convention: if two edges are collinear, there is no intersection point.

    Args:
        corners1 (Tensor): (B, N, 4, 2) First batch of boxes.
        corners2 (Tensor): (B, N, 4, 2) Second batch of boxes.
        eps (float): Small stabilizer added to the denominator to avoid
            division by zero for (near-)parallel edge pairs. Default: 1e-8
            (the module-level ``EPSILON``), which preserves the previous
            behavior.

    Returns:
        Tuple:
            - Tensor: (B, N, 4, 4, 2) Intersections; entries for invalid
              edge pairs are zeroed out.
            - Tensor: (B, N, 4, 4) Valid intersections mask.
    """
    # build edges from corners
    # B, N, 4, 4: Batch, Box, edge, point (x1, y1, x2, y2)
    line1 = torch.cat([corners1, corners1[:, :, [1, 2, 3, 0], :]], dim=3)
    line2 = torch.cat([corners2, corners2[:, :, [1, 2, 3, 0], :]], dim=3)
    # duplicate data to pair each edges from the boxes
    # (B, N, 4, 4) -> (B, N, 4, 4, 4) : Batch, Box, edge1, edge2, point
    line1_ext = line1.unsqueeze(3)
    line2_ext = line2.unsqueeze(2)
    x1, y1, x2, y2 = line1_ext.split([1, 1, 1, 1], dim=-1)
    x3, y3, x4, y4 = line2_ext.split([1, 1, 1, 1], dim=-1)
    # math: https://en.wikipedia.org/wiki/Line%E2%80%93line_intersection
    numerator = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    denumerator_t = (x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)
    t = denumerator_t / numerator
    # parallel edges (numerator == 0) get t = -1 so they fail the mask test
    t[numerator == .0] = -1.
    mask_t = (t > 0) & (t < 1)  # intersection on line segment 1 (strict)
    denumerator_u = (x1 - x2) * (y1 - y3) - (y1 - y2) * (x1 - x3)
    u = -denumerator_u / numerator
    u[numerator == .0] = -1.
    mask_u = (u > 0) & (u < 1)  # intersection on line segment 2 (strict)
    mask = mask_t * mask_u
    # overwrite with eps-stabilized denominator; otherwise numerically
    # unstable (invalid entries are zeroed by the mask below anyway)
    t = denumerator_t / (numerator + eps)
    intersections = torch.stack([x1 + t * (x2 - x1), y1 + t * (y2 - y1)],
                                dim=-1)
    intersections = intersections * mask.float().unsqueeze(-1)
    return intersections, mask
def build_vertices(corners1: Tensor, corners2: Tensor, c1_in_2: Tensor,
                   c2_in_1: Tensor, intersections: Tensor,
                   valid_mask: Tensor) -> Tuple[Tensor, Tensor]:
    """Collect candidate vertices of the intersection polygon.

    Concatenates, per box pair, the 4 corners of each box and the 16
    candidate edge intersections; the returned mask marks which of those
    24 candidates are actually part of the intersection area.

    Args:
        corners1 (Tensor): (B, N, 4, 2) First batch of boxes.
        corners2 (Tensor): (B, N, 4, 2) Second batch of boxes.
        c1_in_2 (Tensor): (B, N, 4) True if i-th corner of box1 is in box2.
        c2_in_1 (Tensor): (B, N, 4) True if i-th corner of box2 is in box1.
        intersections (Tensor): (B, N, 4, 4, 2) Intersections.
        valid_mask (Tensor): (B, N, 4, 4) Valid intersections mask.

    Returns:
        Tuple:
            - Tensor: (B, N, 24, 2) Candidate vertices; only masked
              elements are valid.
            - Tensor: (B, N, 24) Mask of valid elements in vertices.
    """
    # Invalid intersections are zero-valued with zero gradient (they were
    # masked by multiplication), so carrying them along is harmless.
    batch, num_boxes = corners1.shape[0], corners1.shape[1]
    flat_inter = intersections.view([batch, num_boxes, -1, 2])
    flat_valid = valid_mask.view([batch, num_boxes, -1])
    # 4 corners of box1 + 4 corners of box2 + 16 edge-pair intersections
    vertices = torch.cat([corners1, corners2, flat_inter], dim=2)
    mask = torch.cat([c1_in_2, c2_in_1, flat_valid], dim=2)
    return vertices, mask
def calculate_area(idx_sorted: Tensor,
                   vertices: Tensor) -> Tuple[Tensor, Tensor]:
    """Calculate the intersection area via the shoelace formula.

    Args:
        idx_sorted (Tensor): (B, N, 9) Sorted vertex ids; the first vertex
            is duplicated at the end, closing the polygon.
        vertices (Tensor): (B, N, 24, 2) Vertices.

    Returns:
        Tuple:
            - Tensor (B, N): Area of intersection.
            - Tensor: (B, N, 9, 2) Vertices of polygon with zero padding.
    """
    # Expand indices over the (x, y) coordinate axis and gather the
    # polygon vertices in traversal order.
    gather_idx = idx_sorted.unsqueeze(-1).repeat([1, 1, 1, 2])
    selected = torch.gather(vertices, 2, gather_idx)
    xs = selected[..., 0]
    ys = selected[..., 1]
    # Shoelace formula over consecutive vertex pairs; padding entries are
    # zero-valued and contribute nothing.
    cross = xs[:, :, :-1] * ys[:, :, 1:] - ys[:, :, :-1] * xs[:, :, 1:]
    area = torch.abs(torch.sum(cross, dim=2)) / 2
    return area, selected
def diff_iou_rotated_2d(box1: Tensor, box2: Tensor) -> Tensor:
    """Calculate differentiable IoU of rotated 2d boxes.

    Args:
        box1 (Tensor): (B, N, 5) First box as (x, y, w, h, alpha).
        box2 (Tensor): (B, N, 5) Second box as (x, y, w, h, alpha).

    Returns:
        Tensor: (B, N) IoU.
    """
    # Rectangle areas are simply w * h, independent of rotation.
    area1 = box1[:, :, 2] * box1[:, :, 3]
    area2 = box2[:, :, 2] * box2[:, :, 3]
    # Exact polygon intersection of the rotated rectangles.
    overlap, _ = oriented_box_intersection_2d(box2corners(box1),
                                              box2corners(box2))  # (B, N)
    return overlap / (area1 + area2 - overlap)
+ """ + box1 = box3d1[..., [0, 1, 3, 4, 6]] # 2d box + box2 = box3d2[..., [0, 1, 3, 4, 6]] + corners1 = box2corners(box1) + corners2 = box2corners(box2) + intersection, _ = oriented_box_intersection_2d(corners1, corners2) + zmax1 = box3d1[..., 2] + box3d1[..., 5] * 0.5 + zmin1 = box3d1[..., 2] - box3d1[..., 5] * 0.5 + zmax2 = box3d2[..., 2] + box3d2[..., 5] * 0.5 + zmin2 = box3d2[..., 2] - box3d2[..., 5] * 0.5 + z_overlap = (torch.min(zmax1, zmax2) - + torch.max(zmin1, zmin2)).clamp_(min=0.) + intersection_3d = intersection * z_overlap + volume1 = box3d1[..., 3] * box3d1[..., 4] * box3d1[..., 5] + volume2 = box3d2[..., 3] * box3d2[..., 4] * box3d2[..., 5] + union_3d = volume1 + volume2 - intersection_3d + return intersection_3d / union_3d diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/focal_loss.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/focal_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..69aab7305205f1024dbbd2976517ae5ec3e7af9d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/focal_loss.py @@ -0,0 +1,208 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
class SigmoidFocalLossFunction(Function):
    """Autograd function wrapping the sigmoid focal loss extension kernels."""

    @staticmethod
    def forward(ctx,
                input: torch.Tensor,
                target: Union[torch.LongTensor, torch.cuda.LongTensor],
                gamma: float = 2.0,
                alpha: float = 0.25,
                weight: Optional[torch.Tensor] = None,
                reduction: str = 'mean') -> torch.Tensor:
        """Compute the sigmoid focal loss.

        Args:
            ctx: Autograd context.
            input (torch.Tensor): (num_samples, num_classes) logits.
            target: (num_samples,) long tensor of class indices.
            gamma (float): Focusing parameter. Default: 2.0.
            alpha (float): Balancing parameter. Default: 0.25.
            weight (torch.Tensor, optional): (num_classes,) per-class
                weights; an empty tensor signals "no weighting" to the
                kernel.
            reduction (str): One of 'none', 'mean', 'sum'.

        Returns:
            torch.Tensor: The (possibly reduced) loss.
        """
        assert target.dtype == torch.long
        assert input.dim() == 2
        assert target.dim() == 1
        assert input.size(0) == target.size(0)
        if weight is None:
            # empty tensor acts as the "no class weights" sentinel
            weight = input.new_empty(0)
        else:
            assert weight.dim() == 1
            assert input.size(1) == weight.size(0)
        # reduction is mapped to an integer code stored on ctx for backward
        ctx.reduction_dict = {'none': 0, 'mean': 1, 'sum': 2}
        assert reduction in ctx.reduction_dict.keys()

        ctx.gamma = float(gamma)
        ctx.alpha = float(alpha)
        ctx.reduction = ctx.reduction_dict[reduction]

        # per-element loss, same shape as the input logits
        output = input.new_zeros(input.size())

        ext_module.sigmoid_focal_loss_forward(
            input, target, weight, output, gamma=ctx.gamma, alpha=ctx.alpha)
        if ctx.reduction == ctx.reduction_dict['mean']:
            # mean over samples (not over samples * classes)
            output = output.sum() / input.size(0)
        elif ctx.reduction == ctx.reduction_dict['sum']:
            output = output.sum()
        ctx.save_for_backward(input, target, weight)
        return output

    @staticmethod
    @once_differentiable
    def backward(ctx, grad_output: torch.Tensor) -> tuple:
        """Backward pass; only the logits receive a gradient."""
        input, target, weight = ctx.saved_tensors

        grad_input = input.new_zeros(input.size())

        ext_module.sigmoid_focal_loss_backward(
            input,
            target,
            weight,
            grad_input,
            gamma=ctx.gamma,
            alpha=ctx.alpha)

        grad_input *= grad_output
        if ctx.reduction == ctx.reduction_dict['mean']:
            grad_input /= input.size(0)
        return grad_input, None, None, None, None, None
class SigmoidFocalLoss(nn.Module):
    """Module wrapper around :func:`sigmoid_focal_loss`."""

    def __init__(self,
                 gamma: float,
                 alpha: float,
                 weight: Optional[torch.Tensor] = None,
                 reduction: str = 'mean'):
        super().__init__()
        self.gamma = gamma
        self.alpha = alpha
        # registered as a buffer so it follows .to()/.cuda() and state_dict
        self.register_buffer('weight', weight)
        self.reduction = reduction

    def forward(
        self,
        input: torch.Tensor,
        target: Union[torch.LongTensor, torch.cuda.LongTensor],
    ) -> torch.Tensor:
        """Compute the sigmoid focal loss for the given logits and targets."""
        loss = sigmoid_focal_loss(input, target, self.gamma, self.alpha,
                                  self.weight, self.reduction)
        return loss

    def __repr__(self):
        return (f'{self.__class__.__name__}(gamma={self.gamma}, '
                f'alpha={self.alpha}, reduction={self.reduction})')
class SoftmaxFocalLoss(nn.Module):
    """Module wrapper around :func:`softmax_focal_loss`."""

    def __init__(self,
                 gamma: float,
                 alpha: float,
                 weight: Optional[torch.Tensor] = None,
                 reduction: str = 'mean'):
        super().__init__()
        self.gamma = gamma
        self.alpha = alpha
        # registered as a buffer so it follows .to()/.cuda() and state_dict
        self.register_buffer('weight', weight)
        self.reduction = reduction

    def forward(
        self,
        input: torch.Tensor,
        target: Union[torch.LongTensor, torch.cuda.LongTensor],
    ) -> torch.Tensor:
        """Compute the softmax focal loss for the given logits and targets."""
        loss = softmax_focal_loss(input, target, self.gamma, self.alpha,
                                  self.weight, self.reduction)
        return loss

    def __repr__(self):
        return (f'{self.__class__.__name__}(gamma={self.gamma}, '
                f'alpha={self.alpha}, reduction={self.reduction})')
class FurthestPointSampling(Function):
    """Iterative furthest point sampling (FPS).

    Selects ``num_points`` indices such that the chosen points are mutually
    far apart in Euclidean space.
    """

    @staticmethod
    def forward(ctx, points_xyz: torch.Tensor,
                num_points: int) -> torch.Tensor:
        """
        Args:
            points_xyz (torch.Tensor): (B, N, 3) where N > num_points.
            num_points (int): Number of points in the sampled set.

        Returns:
            torch.Tensor: (B, num_points) int32 indices of the sampled points.
        """
        assert points_xyz.is_contiguous()

        batch, num_src = points_xyz.size()[:2]
        sampled_idx = torch.cuda.IntTensor(batch, num_points)
        # Running minimum squared distance to the already-selected set;
        # initialised large so the kernel's first pick is unconstrained.
        min_dist = torch.cuda.FloatTensor(batch, num_src).fill_(1e10)

        ext_module.furthest_point_sampling_forward(
            points_xyz,
            min_dist,
            sampled_idx,
            b=batch,
            n=num_src,
            m=num_points,
        )
        if torch.__version__ != 'parrots':
            ctx.mark_non_differentiable(sampled_idx)
        return sampled_idx

    @staticmethod
    def backward(xyz, a=None):
        # Index sampling is non-differentiable; no gradient flows back.
        return None, None
+ """ + assert points_dist.is_contiguous() + + B, N, _ = points_dist.size() + output = points_dist.new_zeros([B, num_points], dtype=torch.int32) + temp = points_dist.new_zeros([B, N]).fill_(1e10) + + ext_module.furthest_point_sampling_with_dist_forward( + points_dist, temp, output, b=B, n=N, m=num_points) + if torch.__version__ != 'parrots': + ctx.mark_non_differentiable(output) + return output + + @staticmethod + def backward(xyz, a=None): + return None, None + + +furthest_point_sample = FurthestPointSampling.apply +furthest_point_sample_with_dist = FurthestPointSamplingWithDist.apply diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/fused_bias_leakyrelu.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/fused_bias_leakyrelu.py new file mode 100644 index 0000000000000000000000000000000000000000..e23617fb3af36234f1694e7c1210797d04b72113 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/fused_bias_leakyrelu.py @@ -0,0 +1,282 @@ +# modified from https://github.com/rosinality/stylegan2-pytorch/blob/master/op/fused_act.py # noqa:E501 + +# Copyright (c) 2021, NVIDIA Corporation. All rights reserved. +# NVIDIA Source Code License for StyleGAN2 with Adaptive Discriminator +# Augmentation (ADA) +# ======================================================================= + +# 1. Definitions + +# "Licensor" means any person or entity that distributes its Work. + +# "Software" means the original work of authorship made available under +# this License. + +# "Work" means the Software and any additions to or derivative works of +# the Software that are made available under this License. + +# The terms "reproduce," "reproduction," "derivative works," and +# "distribution" have the meaning as provided under U.S. copyright law; +# provided, however, that for the purposes of this License, derivative +# works shall not include works that remain separable from, or merely +# link (or bind by name) to the interfaces of, the Work. 
+ +# Works, including the Software, are "made available" under this License +# by including in or with the Work either (a) a copyright notice +# referencing the applicability of this License to the Work, or (b) a +# copy of this License. + +# 2. License Grants + +# 2.1 Copyright Grant. Subject to the terms and conditions of this +# License, each Licensor grants to you a perpetual, worldwide, +# non-exclusive, royalty-free, copyright license to reproduce, +# prepare derivative works of, publicly display, publicly perform, +# sublicense and distribute its Work and any resulting derivative +# works in any form. + +# 3. Limitations + +# 3.1 Redistribution. You may reproduce or distribute the Work only +# if (a) you do so under this License, (b) you include a complete +# copy of this License with your distribution, and (c) you retain +# without modification any copyright, patent, trademark, or +# attribution notices that are present in the Work. + +# 3.2 Derivative Works. You may specify that additional or different +# terms apply to the use, reproduction, and distribution of your +# derivative works of the Work ("Your Terms") only if (a) Your Terms +# provide that the use limitation in Section 3.3 applies to your +# derivative works, and (b) you identify the specific derivative +# works that are subject to Your Terms. Notwithstanding Your Terms, +# this License (including the redistribution requirements in Section +# 3.1) will continue to apply to the Work itself. + +# 3.3 Use Limitation. The Work and any derivative works thereof only +# may be used or intended for use non-commercially. Notwithstanding +# the foregoing, NVIDIA and its affiliates may use the Work and any +# derivative works commercially. As used herein, "non-commercially" +# means for research or evaluation purposes only. + +# 3.4 Patent Claims. 
If you bring or threaten to bring a patent claim +# against any Licensor (including any claim, cross-claim or +# counterclaim in a lawsuit) to enforce any patents that you allege +# are infringed by any Work, then your rights under this License from +# such Licensor (including the grant in Section 2.1) will terminate +# immediately. + +# 3.5 Trademarks. This License does not grant any rights to use any +# Licensor’s or its affiliates’ names, logos, or trademarks, except +# as necessary to reproduce the notices described in this License. + +# 3.6 Termination. If you violate any term of this License, then your +# rights under this License (including the grant in Section 2.1) will +# terminate immediately. + +# 4. Disclaimer of Warranty. + +# THE WORK IS PROVIDED "AS IS" WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WARRANTIES OR CONDITIONS OF +# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR +# NON-INFRINGEMENT. YOU BEAR THE RISK OF UNDERTAKING ANY ACTIVITIES UNDER +# THIS LICENSE. + +# 5. Limitation of Liability. + +# EXCEPT AS PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL +# THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE +# SHALL ANY LICENSOR BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT, +# INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF +# OR RELATED TO THIS LICENSE, THE USE OR INABILITY TO USE THE WORK +# (INCLUDING BUT NOT LIMITED TO LOSS OF GOODWILL, BUSINESS INTERRUPTION, +# LOST PROFITS OR DATA, COMPUTER FAILURE OR MALFUNCTION, OR ANY OTHER +# COMMERCIAL DAMAGES OR LOSSES), EVEN IF THE LICENSOR HAS BEEN ADVISED OF +# THE POSSIBILITY OF SUCH DAMAGES. 
+ +# ======================================================================= + +import torch +import torch.nn.functional as F +from torch import nn +from torch.autograd import Function + +from ..utils import ext_loader + +ext_module = ext_loader.load_ext('_ext', ['fused_bias_leakyrelu']) + + +class FusedBiasLeakyReLUFunctionBackward(Function): + """Calculate second order deviation. + + This function is to compute the second order deviation for the fused leaky + relu operation. + """ + + @staticmethod + def forward(ctx, grad_output: torch.Tensor, out: torch.Tensor, + negative_slope: float, scale: float) -> tuple: + ctx.save_for_backward(out) + ctx.negative_slope = negative_slope + ctx.scale = scale + + empty = grad_output.new_empty(0) + + grad_input = ext_module.fused_bias_leakyrelu( + grad_output, + empty, + out, + act=3, + grad=1, + alpha=negative_slope, + scale=scale) + + dim = [0] + + if grad_input.ndim > 2: + dim += list(range(2, grad_input.ndim)) + + grad_bias = grad_input.sum(dim).detach() + + return grad_input, grad_bias + + @staticmethod + def backward(ctx, gradgrad_input: torch.Tensor, + gradgrad_bias: nn.Parameter) -> tuple: + out, = ctx.saved_tensors + + # The second order deviation, in fact, contains two parts, while the + # the first part is zero. Thus, we direct consider the second part + # which is similar with the first order deviation in implementation. 
class FusedBiasLeakyReLUFunction(Function):
    """Autograd wrapper for the fused bias-add + leaky ReLU kernel."""

    @staticmethod
    def forward(ctx, input: torch.Tensor, bias: nn.Parameter,
                negative_slope: float, scale: float) -> torch.Tensor:
        # The kernel's third argument is a reference tensor that the forward
        # pass does not need; hand it an empty placeholder.
        placeholder = input.new_empty(0)
        out = ext_module.fused_bias_leakyrelu(
            input,
            bias,
            placeholder,
            act=3,
            grad=0,
            alpha=negative_slope,
            scale=scale)
        ctx.save_for_backward(out)
        ctx.negative_slope = negative_slope
        ctx.scale = scale
        return out

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor) -> tuple:
        out, = ctx.saved_tensors
        # Delegate to the dedicated backward Function so second-order
        # gradients (double backward) are also supported.
        grad_input, grad_bias = FusedBiasLeakyReLUFunctionBackward.apply(
            grad_output, out, ctx.negative_slope, ctx.scale)
        # negative_slope and scale are python scalars: no gradients.
        return grad_input, grad_bias, None, None
+ """ + + def __init__(self, + num_channels: int, + negative_slope: float = 0.2, + scale: float = 2**0.5): + super().__init__() + + self.bias = nn.Parameter(torch.zeros(num_channels)) + self.negative_slope = negative_slope + self.scale = scale + + def forward(self, input: torch.Tensor) -> torch.Tensor: + return fused_bias_leakyrelu(input, self.bias, self.negative_slope, + self.scale) + + +def fused_bias_leakyrelu(input: torch.Tensor, + bias: nn.Parameter, + negative_slope: float = 0.2, + scale: float = 2**0.5) -> torch.Tensor: + r"""Fused bias leaky ReLU function. + + This function is introduced in the StyleGAN2: + `Analyzing and Improving the Image Quality of StyleGAN + `_ + + The bias term comes from the convolution operation. In addition, to keep + the variance of the feature map or gradients unchanged, they also adopt a + scale similarly with Kaiming initialization. However, since the + :math:`1+{alpha}^2` is too small, we can just ignore it. Therefore, the + final scale is just :math:`\sqrt{2}`. Of course, you may change it with + your own scale. + + Args: + input (torch.Tensor): Input feature map. + bias (nn.Parameter): The bias from convolution operation. + negative_slope (float, optional): Same as nn.LeakyRelu. + Defaults to 0.2. + scale (float, optional): A scalar to adjust the variance of the feature + map. Defaults to 2**0.5. + + Returns: + torch.Tensor: Feature map after non-linear activation. 
+ """ + + if not input.is_cuda: + return bias_leakyrelu_ref(input, bias, negative_slope, scale) + + return FusedBiasLeakyReLUFunction.apply(input, bias.to(input.dtype), + negative_slope, scale) + + +def bias_leakyrelu_ref(x: torch.Tensor, + bias: nn.Parameter, + negative_slope: float = 0.2, + scale: float = 2**0.5) -> torch.Tensor: + + if bias is not None: + assert bias.ndim == 1 + assert bias.shape[0] == x.shape[1] + x = x + bias.reshape([-1 if i == 1 else 1 for i in range(x.ndim)]) + + x = F.leaky_relu(x, negative_slope) + if scale != 1: + x = x * scale + + return x diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/gather_points.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/gather_points.py new file mode 100644 index 0000000000000000000000000000000000000000..895bfab643ba5c9da218e398501c12a646b869e8 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/gather_points.py @@ -0,0 +1,59 @@ +from typing import Tuple + +import torch +from torch.autograd import Function + +from ..utils import ext_loader + +ext_module = ext_loader.load_ext( + '_ext', ['gather_points_forward', 'gather_points_backward']) + + +class GatherPoints(Function): + """Gather points with given index.""" + + @staticmethod + def forward(ctx, features: torch.Tensor, + indices: torch.Tensor) -> torch.Tensor: + """ + Args: + features (torch.Tensor): (B, C, N) features to gather. + indices (torch.Tensor): (B, M) where M is the number of points. + + Returns: + torch.Tensor: (B, C, M) where M is the number of points. 
+ """ + assert features.is_contiguous() + assert indices.is_contiguous() + + B, npoint = indices.size() + _, C, N = features.size() + output = features.new_zeros((B, C, npoint)) + + ext_module.gather_points_forward( + features, indices, output, b=B, c=C, n=N, npoints=npoint) + + ctx.for_backwards = (indices, C, N) + if torch.__version__ != 'parrots': + ctx.mark_non_differentiable(indices) + return output + + @staticmethod + def backward(ctx, grad_out: torch.Tensor) -> Tuple[torch.Tensor, None]: + idx, C, N = ctx.for_backwards + B, npoint = idx.size() + + grad_features = grad_out.new_zeros((B, C, N)) + grad_out_data = grad_out.data.contiguous() + ext_module.gather_points_backward( + grad_out_data, + idx, + grad_features.data, + b=B, + c=C, + n=N, + npoints=npoint) + return grad_features, None + + +gather_points = GatherPoints.apply diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/group_points.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/group_points.py new file mode 100644 index 0000000000000000000000000000000000000000..999728c22a4cc4aa3b368d1261b29a67e11d5523 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/group_points.py @@ -0,0 +1,299 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import Optional, Tuple, Union + +import torch +from torch import nn as nn +from torch.autograd import Function + +from ..utils import ext_loader +from .ball_query import ball_query +from .knn import knn + +ext_module = ext_loader.load_ext('_ext', [ + 'group_points_forward', 'group_points_backward', + 'stack_group_points_forward', 'stack_group_points_backward' +]) + + +class QueryAndGroup(nn.Module): + """Groups points with a ball query of radius. + + Args: + max_radius (float): The maximum radius of the balls. + If None is given, we will use kNN sampling instead of ball query. + sample_num (int): Maximum number of features to gather in the ball. + min_radius (float, optional): The minimum radius of the balls. + Default: 0. 
+ use_xyz (bool, optional): Whether to use xyz. + Default: True. + return_grouped_xyz (bool, optional): Whether to return grouped xyz. + Default: False. + normalize_xyz (bool, optional): Whether to normalize xyz. + Default: False. + uniform_sample (bool, optional): Whether to sample uniformly. + Default: False + return_unique_cnt (bool, optional): Whether to return the count of + unique samples. Default: False. + return_grouped_idx (bool, optional): Whether to return grouped idx. + Default: False. + """ + + def __init__(self, + max_radius: float, + sample_num: int, + min_radius: float = 0., + use_xyz: bool = True, + return_grouped_xyz: bool = False, + normalize_xyz: bool = False, + uniform_sample: bool = False, + return_unique_cnt: bool = False, + return_grouped_idx: bool = False): + super().__init__() + self.max_radius = max_radius + self.min_radius = min_radius + self.sample_num = sample_num + self.use_xyz = use_xyz + self.return_grouped_xyz = return_grouped_xyz + self.normalize_xyz = normalize_xyz + self.uniform_sample = uniform_sample + self.return_unique_cnt = return_unique_cnt + self.return_grouped_idx = return_grouped_idx + if self.return_unique_cnt: + assert self.uniform_sample, \ + 'uniform_sample should be True when ' \ + 'returning the count of unique samples' + if self.max_radius is None: + assert not self.normalize_xyz, \ + 'can not normalize grouped xyz when max_radius is None' + + def forward( + self, + points_xyz: torch.Tensor, + center_xyz: torch.Tensor, + features: Optional[torch.Tensor] = None, + ) -> Union[torch.Tensor, Tuple]: + """ + Args: + points_xyz (torch.Tensor): (B, N, 3) xyz coordinates of the + points. + center_xyz (torch.Tensor): (B, npoint, 3) coordinates of the + centriods. + features (torch.Tensor): (B, C, N) The features of grouped + points. + + Returns: + Tuple | torch.Tensor: (B, 3 + C, npoint, sample_num) Grouped + concatenated coordinates and features of points. 
+ """ + # if self.max_radius is None, we will perform kNN instead of ball query + # idx is of shape [B, npoint, sample_num] + if self.max_radius is None: + idx = knn(self.sample_num, points_xyz, center_xyz, False) + idx = idx.transpose(1, 2).contiguous() + else: + idx = ball_query(self.min_radius, self.max_radius, self.sample_num, + points_xyz, center_xyz) + + if self.uniform_sample: + unique_cnt = torch.zeros((idx.shape[0], idx.shape[1])) + for i_batch in range(idx.shape[0]): + for i_region in range(idx.shape[1]): + unique_ind = torch.unique(idx[i_batch, i_region, :]) + num_unique = unique_ind.shape[0] + unique_cnt[i_batch, i_region] = num_unique + sample_ind = torch.randint( + 0, + num_unique, (self.sample_num - num_unique, ), + dtype=torch.long) + all_ind = torch.cat((unique_ind, unique_ind[sample_ind])) + idx[i_batch, i_region, :] = all_ind + + xyz_trans = points_xyz.transpose(1, 2).contiguous() + # (B, 3, npoint, sample_num) + grouped_xyz = grouping_operation(xyz_trans, idx) + grouped_xyz_diff = grouped_xyz - \ + center_xyz.transpose(1, 2).unsqueeze(-1) # relative offsets + if self.normalize_xyz: + grouped_xyz_diff /= self.max_radius + + if features is not None: + grouped_features = grouping_operation(features, idx) + if self.use_xyz: + # (B, C + 3, npoint, sample_num) + new_features = torch.cat([grouped_xyz_diff, grouped_features], + dim=1) + else: + new_features = grouped_features + else: + assert (self.use_xyz + ), 'Cannot have not features and not use xyz as a feature!' + new_features = grouped_xyz_diff + + ret = [new_features] + if self.return_grouped_xyz: + ret.append(grouped_xyz) + if self.return_unique_cnt: + ret.append(unique_cnt) + if self.return_grouped_idx: + ret.append(idx) + if len(ret) == 1: + return ret[0] + else: + return tuple(ret) + + +class GroupAll(nn.Module): + """Group xyz with feature. + + Args: + use_xyz (bool): Whether to use xyz. 
+ """ + + def __init__(self, use_xyz: bool = True): + super().__init__() + self.use_xyz = use_xyz + + def forward(self, + xyz: torch.Tensor, + new_xyz: torch.Tensor, + features: Optional[torch.Tensor] = None) -> torch.Tensor: + """ + Args: + xyz (Tensor): (B, N, 3) xyz coordinates of the features. + new_xyz (Tensor): new xyz coordinates of the features. + features (Tensor): (B, C, N) features to group. + + Returns: + Tensor: (B, C + 3, 1, N) Grouped feature. + """ + grouped_xyz = xyz.transpose(1, 2).unsqueeze(2) + if features is not None: + grouped_features = features.unsqueeze(2) + if self.use_xyz: + # (B, 3 + C, 1, N) + new_features = torch.cat([grouped_xyz, grouped_features], + dim=1) + else: + new_features = grouped_features + else: + new_features = grouped_xyz + + return new_features + + +class GroupingOperation(Function): + """Group feature with given index.""" + + @staticmethod + def forward( + ctx, + features: torch.Tensor, + indices: torch.Tensor, + features_batch_cnt: Optional[torch.Tensor] = None, + indices_batch_cnt: Optional[torch.Tensor] = None) -> torch.Tensor: + """ + Args: + features (Tensor): Tensor of features to group, input shape is + (B, C, N) or stacked inputs (N1 + N2 ..., C). + indices (Tensor): The indices of features to group with, input + shape is (B, npoint, nsample) or stacked inputs + (M1 + M2 ..., nsample). + features_batch_cnt (Tensor, optional): Input features nums in + each batch, just like (N1, N2, ...). Defaults to None. + New in version 1.7.0. + indices_batch_cnt (Tensor, optional): Input indices nums in + each batch, just like (M1, M2, ...). Defaults to None. + New in version 1.7.0. + + Returns: + Tensor: Grouped features, the shape is (B, C, npoint, nsample) + or (M1 + M2 ..., C, nsample). 
+ """ + features = features.contiguous() + indices = indices.contiguous() + if features_batch_cnt is not None and indices_batch_cnt is not None: + assert features_batch_cnt.dtype == torch.int + assert indices_batch_cnt.dtype == torch.int + M, nsample = indices.size() + N, C = features.size() + B = indices_batch_cnt.shape[0] + output = features.new_zeros((M, C, nsample)) + ext_module.stack_group_points_forward( + features, + features_batch_cnt, + indices, + indices_batch_cnt, + output, + b=B, + m=M, + c=C, + nsample=nsample) + ctx.for_backwards = (B, N, indices, features_batch_cnt, + indices_batch_cnt) + else: + B, nfeatures, nsample = indices.size() + _, C, N = features.size() + output = features.new_zeros(B, C, nfeatures, nsample) + + ext_module.group_points_forward( + features, + indices, + output, + b=B, + c=C, + n=N, + npoints=nfeatures, + nsample=nsample) + + ctx.for_backwards = (indices, N) + return output + + @staticmethod + def backward(ctx, grad_out: torch.Tensor) -> Tuple: + """ + Args: + grad_out (Tensor): (B, C, npoint, nsample) tensor of the gradients + of the output from forward. + + Returns: + Tensor: (B, C, N) gradient of the features. 
+ """ + if len(ctx.for_backwards) != 5: + idx, N = ctx.for_backwards + + B, C, npoint, nsample = grad_out.size() + grad_features = grad_out.new_zeros(B, C, N) + + grad_out_data = grad_out.data.contiguous() + ext_module.group_points_backward( + grad_out_data, + idx, + grad_features.data, + b=B, + c=C, + n=N, + npoints=npoint, + nsample=nsample) + return grad_features, None + else: + B, N, idx, features_batch_cnt, idx_batch_cnt = ctx.for_backwards + + M, C, nsample = grad_out.size() + grad_features = grad_out.new_zeros(N, C) + + grad_out_data = grad_out.data.contiguous() + ext_module.stack_group_points_backward( + grad_out_data, + idx, + idx_batch_cnt, + features_batch_cnt, + grad_features.data, + b=B, + c=C, + m=M, + n=N, + nsample=nsample) + return grad_features, None, None, None + + +grouping_operation = GroupingOperation.apply diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/info.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/info.py new file mode 100644 index 0000000000000000000000000000000000000000..b24b981f8f513b3bf5c2300d37375acaded62e21 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/info.py @@ -0,0 +1,21 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import torch + +if torch.__version__ == 'parrots': + import parrots + + def get_compiler_version(): + return 'GCC ' + parrots.version.compiler + + def get_compiling_cuda_version(): + return parrots.version.cuda +else: + from ..utils import ext_loader + ext_module = ext_loader.load_ext( + '_ext', ['get_compiler_version', 'get_compiling_cuda_version']) + + def get_compiler_version(): + return ext_module.get_compiler_version() + + def get_compiling_cuda_version(): + return ext_module.get_compiling_cuda_version() diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/iou3d.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/iou3d.py new file mode 100644 index 0000000000000000000000000000000000000000..94e2057ad2530e25a53ad89a0d2d78ee75ca0483 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/iou3d.py @@ -0,0 +1,226 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import warnings +from typing import Optional + +import torch +from torch import Tensor + +from ..utils import ext_loader + +ext_module = ext_loader.load_ext('_ext', [ + 'iou3d_boxes_overlap_bev_forward', 'iou3d_nms3d_forward', + 'iou3d_nms3d_normal_forward' +]) + + +def boxes_overlap_bev(boxes_a: Tensor, boxes_b: Tensor) -> Tensor: + """Calculate boxes BEV overlap. + + Args: + boxes_a (torch.Tensor): Input boxes a with shape (M, 7). + boxes_b (torch.Tensor): Input boxes b with shape (N, 7). + + Returns: + torch.Tensor: BEV overlap result with shape (M, N). + """ + ans_overlap = boxes_a.new_zeros( + torch.Size((boxes_a.shape[0], boxes_b.shape[0]))) + ext_module.iou3d_boxes_overlap_bev_forward(boxes_a.contiguous(), + boxes_b.contiguous(), + ans_overlap) + + return ans_overlap + + +def boxes_iou3d(boxes_a: Tensor, boxes_b: Tensor) -> Tensor: + """Calculate boxes 3D IoU. + + Args: + boxes_a (torch.Tensor): Input boxes a with shape (M, 7). + boxes_b (torch.Tensor): Input boxes b with shape (N, 7). + + Returns: + torch.Tensor: 3D IoU result with shape (M, N). 
+ """ + assert boxes_a.shape[1] == boxes_b.shape[1] == 7,\ + 'Input boxes shape should be (N, 7)' + + boxes_a_height_max = (boxes_a[:, 2] + boxes_a[:, 5] / 2).view(-1, 1) + boxes_a_height_min = (boxes_a[:, 2] - boxes_a[:, 5] / 2).view(-1, 1) + boxes_b_height_max = (boxes_b[:, 2] + boxes_b[:, 5] / 2).view(1, -1) + boxes_b_height_min = (boxes_b[:, 2] - boxes_b[:, 5] / 2).view(1, -1) + + overlaps_bev = boxes_a.new_zeros( + torch.Size((boxes_a.shape[0], boxes_b.shape[0]))) + ext_module.iou3d_boxes_overlap_bev_forward(boxes_a.contiguous(), + boxes_b.contiguous(), + overlaps_bev) + + max_of_min = torch.max(boxes_a_height_min, boxes_b_height_min) + min_of_max = torch.min(boxes_a_height_max, boxes_b_height_max) + overlaps_h = torch.clamp(min_of_max - max_of_min, min=0) + overlaps_3d = overlaps_bev * overlaps_h + vol_a = (boxes_a[:, 3] * boxes_a[:, 4] * boxes_a[:, 5]).view(-1, 1) + vol_b = (boxes_b[:, 3] * boxes_b[:, 4] * boxes_b[:, 5]).view(1, -1) + iou3d = overlaps_3d / torch.clamp(vol_a + vol_b - overlaps_3d, min=1e-6) + return iou3d + + +def nms3d(boxes: Tensor, scores: Tensor, iou_threshold: float) -> Tensor: + """3D NMS function GPU implementation (for BEV boxes). + + Args: + boxes (torch.Tensor): Input boxes with the shape of (N, 7) + ([x, y, z, dx, dy, dz, heading]). + scores (torch.Tensor): Scores of boxes with the shape of (N). + iou_threshold (float): Overlap threshold of NMS. + + Returns: + torch.Tensor: Indexes after NMS. 
+ """ + assert boxes.size(1) == 7, 'Input boxes shape should be (N, 7)' + order = scores.sort(0, descending=True)[1] + boxes = boxes[order].contiguous() + + keep = boxes.new_zeros(boxes.size(0), dtype=torch.long) + num_out = boxes.new_zeros(size=(), dtype=torch.long) + ext_module.iou3d_nms3d_forward( + boxes, keep, num_out, nms_overlap_thresh=iou_threshold) + keep = order[keep[:num_out].to(boxes.device)].contiguous() + return keep + + +def nms3d_normal(boxes: Tensor, scores: Tensor, + iou_threshold: float) -> Tensor: + """Normal 3D NMS function GPU implementation. The overlap of two boxes for + IoU calculation is defined as the exact overlapping area of the two boxes + WITH their yaw angle set to 0. + + Args: + boxes (torch.Tensor): Input boxes with shape (N, 7). + ([x, y, z, dx, dy, dz, heading]). + scores (torch.Tensor): Scores of predicted boxes with shape (N). + iou_threshold (float): Overlap threshold of NMS. + + Returns: + torch.Tensor: Remaining indices with scores in descending order. + """ + assert boxes.shape[1] == 7, 'Input boxes shape should be (N, 7)' + order = scores.sort(0, descending=True)[1] + boxes = boxes[order].contiguous() + + keep = boxes.new_zeros(boxes.size(0), dtype=torch.long) + num_out = boxes.new_zeros(size=(), dtype=torch.long) + ext_module.iou3d_nms3d_normal_forward( + boxes, keep, num_out, nms_overlap_thresh=iou_threshold) + return order[keep[:num_out].to(boxes.device)].contiguous() + + +def _xyxyr2xywhr(boxes: Tensor) -> Tensor: + """Convert [x1, y1, x2, y2, heading] box to [x, y, dx, dy, heading] box. + + Args: + box (torch.Tensor): Input boxes with shape (N, 5). + + Returns: + torch.Tensor: Converted boxes with shape (N, 7). 
+ """ + warnings.warn( + 'This function is deprecated and will be removed in the future.', + DeprecationWarning) + return torch.stack( + ((boxes[:, 0] + boxes[:, 2]) / 2, (boxes[:, 1] + boxes[:, 3]) / 2, + boxes[:, 2] - boxes[:, 0], boxes[:, 3] - boxes[:, 1], boxes[:, 4]), + dim=-1) + + +def boxes_iou_bev(boxes_a: Tensor, boxes_b: Tensor) -> Tensor: + """Calculate boxes IoU in the Bird's Eye View. + + Args: + boxes_a (torch.Tensor): Input boxes a with shape (M, 5) + ([x1, y1, x2, y2, ry]). + boxes_b (torch.Tensor): Input boxes b with shape (N, 5) + ([x1, y1, x2, y2, ry]). + + Returns: + torch.Tensor: IoU result with shape (M, N). + """ + from .box_iou_rotated import box_iou_rotated + + warnings.warn( + '`iou3d.boxes_iou_bev` is deprecated and will be removed in' + ' the future. Please, use `box_iou_rotated.box_iou_rotated`.', + DeprecationWarning) + + return box_iou_rotated(_xyxyr2xywhr(boxes_a), _xyxyr2xywhr(boxes_b)) + + +def nms_bev(boxes: Tensor, + scores: Tensor, + thresh: float, + pre_max_size: Optional[int] = None, + post_max_size: Optional[int] = None) -> Tensor: + """NMS function GPU implementation (for BEV boxes). + + The overlap of two boxes for IoU calculation is defined as the exact + overlapping area of the two boxes. In this function, one can also + set ``pre_max_size`` and ``post_max_size``. + + Args: + boxes (torch.Tensor): Input boxes with the shape of (N, 5) + ([x1, y1, x2, y2, ry]). + scores (torch.Tensor): Scores of boxes with the shape of (N,). + thresh (float): Overlap threshold of NMS. + pre_max_size (int, optional): Max size of boxes before NMS. + Default: None. + post_max_size (int, optional): Max size of boxes after NMS. + Default: None. + + Returns: + torch.Tensor: Indexes after NMS. + """ + from .nms import nms_rotated + + warnings.warn( + '`iou3d.nms_bev` is deprecated and will be removed in' + ' the future. 
class KNN(Function):
    r"""Heap-based k-nearest-neighbour search (CUDA).

    Modified from `PAConv `_.

    Finds, for every query centre, the indices of its ``k`` closest points.
    """

    @staticmethod
    def forward(ctx,
                k: int,
                xyz: torch.Tensor,
                center_xyz: Optional[torch.Tensor] = None,
                transposed: bool = False) -> torch.Tensor:
        """
        Args:
            k (int): number of nearest neighbors; must lie in (0, 100).
            xyz (torch.Tensor): (B, N, 3) if transposed == False, else
                (B, 3, N). xyz coordinates of the features.
            center_xyz (torch.Tensor, optional): (B, npoint, 3) if transposed
                is False, else (B, 3, npoint). Centers of the knn query;
                defaults to ``xyz`` itself.
            transposed (bool, optional): whether the input tensors are
                channel-second. Should not be passed as a keyword when
                calling knn (=KNN.apply); supply it positionally.
                Default: False.

        Returns:
            torch.Tensor: (B, k, npoint) indices of the k nearest features.
        """
        assert (k > 0) & (k < 100), 'k should be in range(0, 100)'

        if center_xyz is None:
            center_xyz = xyz
        if transposed:
            xyz = xyz.transpose(2, 1).contiguous()
            center_xyz = center_xyz.transpose(2, 1).contiguous()

        assert xyz.is_contiguous()  # [B, N, 3]
        assert center_xyz.is_contiguous()  # [B, npoint, 3]

        query_device = center_xyz.get_device()
        assert query_device == xyz.get_device(), \
            'center_xyz and xyz should be put on the same device'
        # The extension launches on the current CUDA device; switch first.
        if torch.cuda.current_device() != query_device:
            torch.cuda.set_device(query_device)

        batch, npoint, _ = center_xyz.shape
        num_src = xyz.shape[1]

        idx = center_xyz.new_zeros((batch, npoint, k)).int()
        dist2 = center_xyz.new_zeros((batch, npoint, k)).float()

        ext_module.knn_forward(
            xyz, center_xyz, idx, dist2, b=batch, n=num_src, m=npoint,
            nsample=k)
        # Rearrange to [B, k, npoint] for downstream consumers.
        idx = idx.transpose(2, 1).contiguous()
        if torch.__version__ != 'parrots':
            ctx.mark_non_differentiable(idx)
        return idx

    @staticmethod
    def backward(ctx, a=None):
        # Index selection is non-differentiable.
        return None, None, None
class MaskedConv2dFunction(Function):
    """Autograd function that convolves only where ``mask > 0``.

    Masked positions are gathered into columns by ``masked_im2col_forward``,
    convolved as one dense GEMM, and scattered back into the output map by
    ``masked_col2im_forward``. Only stride 1 is supported; the backward pass
    is intentionally a no-op (all gradients are ``None``).
    """

    @staticmethod
    def symbolic(g, features, mask, weight, bias, padding, stride=1):
        # ONNX export: map onto the custom MMCVMaskedConv2d operator.
        return g.op(
            'mmcv::MMCVMaskedConv2d',
            features,
            mask,
            weight,
            bias,
            padding_i=padding,
            stride_i=stride)

    @staticmethod
    def forward(ctx,
                features: torch.Tensor,
                mask: torch.Tensor,
                weight: torch.nn.Parameter,
                bias: torch.nn.Parameter,
                padding: int = 0,
                stride: int = 1) -> torch.Tensor:
        """Convolve ``features`` at positions selected by ``mask``.

        Args:
            features (torch.Tensor): Input feature map of shape
                (1, C, H, W); only batch size 1 is supported.
            mask (torch.Tensor): Binary mask of shape (1, H, W).
            weight (torch.nn.Parameter): Conv weight (out_c, in_c, kh, kw).
            bias (torch.nn.Parameter): Conv bias of shape (out_c,).
            padding (int): Zero padding. Default: 0.
            stride (int): Must be 1. Default: 1.

        Returns:
            torch.Tensor: Output map of shape (1, out_c, out_h, out_w),
            zero everywhere the mask is not positive.
        """
        assert mask.dim() == 3 and mask.size(0) == 1
        assert features.dim() == 4 and features.size(0) == 1
        assert features.size()[2:] == mask.size()[1:]
        pad_h, pad_w = _pair(padding)
        stride_h, stride_w = _pair(stride)
        if stride_h != 1 or stride_w != 1:
            raise ValueError(
                'Stride could not only be 1 in masked_conv2d currently.')
        out_channel, in_channel, kernel_h, kernel_w = weight.size()

        if features.device.type == 'npu':
            # NPU path: run a dense convolution and zero out unmasked
            # positions afterwards.
            import torch_npu
            output = torch_npu.npu_conv2d(
                features,
                weight,
                bias,
                stride=(stride_h, stride_w),
                padding=(pad_h, pad_w),
                dilation=(1, 1),
                groups=1)
            if mask.size()[1:] != output.size()[2:]:
                raise ValueError(
                    'The mask is inconsistent with the shape of output_conv.')
            mask = mask > 0
            mask = mask.type(output.dtype)
            output = output * mask
            return output

        batch_size = features.size(0)
        # Standard conv output-size arithmetic (dilation fixed at 1).
        out_h = int(
            math.floor(
                torch.true_divide((features.size(2) + 2 * pad_h -
                                   (kernel_h - 1) - 1), stride_h) + 1))
        out_w = int(
            math.floor(
                torch.true_divide((features.size(3) + 2 * pad_w -
                                   (kernel_w - 1) - 1), stride_w) + 1))
        mask_inds = torch.nonzero(mask[0] > 0, as_tuple=False)
        output = features.new_zeros(batch_size, out_channel, out_h, out_w)
        if mask_inds.numel() > 0:
            mask_h_idx = mask_inds[:, 0].contiguous()
            mask_w_idx = mask_inds[:, 1].contiguous()
            data_col = features.new_zeros(in_channel * kernel_h * kernel_w,
                                          mask_inds.size(0))
            ext_module.masked_im2col_forward(
                features,
                mask_h_idx,
                mask_w_idx,
                data_col,
                kernel_h=kernel_h,
                kernel_w=kernel_w,
                pad_h=pad_h,
                pad_w=pad_w)
            # FIX: the legacy positional form
            # ``torch.addmm(1, bias[:, None], 1, mat1, mat2)`` (beta/alpha as
            # positionals) was removed from PyTorch; use the modern
            # ``addmm(input, mat1, mat2)`` signature, which computes the same
            # bias + weight @ columns.
            masked_output = torch.addmm(bias[:, None],
                                        weight.view(out_channel, -1), data_col)
            ext_module.masked_col2im_forward(
                masked_output,
                mask_h_idx,
                mask_w_idx,
                output,
                height=out_h,
                width=out_w,
                channels=out_channel)
        return output

    @staticmethod
    @once_differentiable
    def backward(ctx, grad_output: torch.Tensor) -> tuple:
        # Backward is deliberately unimplemented: one None per forward input.
        return (None, ) * 5


masked_conv2d = MaskedConv2dFunction.apply
+ """ + + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: Union[int, Tuple[int, ...]], + stride: int = 1, + padding: int = 0, + dilation: int = 1, + groups: int = 1, + bias: bool = True): + super().__init__(in_channels, out_channels, kernel_size, stride, + padding, dilation, groups, bias) + + def forward(self, + input: torch.Tensor, + mask: Optional[torch.Tensor] = None) -> torch.Tensor: + if mask is None: # fallback to the normal Conv2d + return super().forward(input) + else: + return masked_conv2d(input, mask, self.weight, self.bias, + self.padding) diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/merge_cells.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/merge_cells.py new file mode 100644 index 0000000000000000000000000000000000000000..19c3fe6582bc04390819b1da9b2620548b462836 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/merge_cells.py @@ -0,0 +1,166 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import math +from abc import abstractmethod +from typing import Optional + +import torch +import torch.nn as nn +import torch.nn.functional as F + +from ..cnn import ConvModule + + +class BaseMergeCell(nn.Module): + """The basic class for cells used in NAS-FPN and NAS-FCOS. + + BaseMergeCell takes 2 inputs. After applying convolution + on them, they are resized to the target size. Then, + they go through binary_op, which depends on the type of cell. + If with_out_conv is True, the result of output will go through + another convolution layer. + + Args: + fused_channels (int): number of input channels in out_conv layer. + out_channels (int): number of output channels in out_conv layer. + with_out_conv (bool): Whether to use out_conv layer + out_conv_cfg (dict): Config dict for convolution layer, which should + contain "groups", "kernel_size", "padding", "bias" to build + out_conv layer. + out_norm_cfg (dict): Config dict for normalization layer in out_conv. 
class BaseMergeCell(nn.Module):
    """Base cell for feature fusion in NAS-FPN and NAS-FCOS.

    The two inputs are (optionally) convolved, resized to a common target
    size, merged by the subclass-specific ``_binary_op`` and finally
    (optionally) passed through an output ConvModule.

    Args:
        fused_channels (int): number of input channels of the out_conv layer.
        out_channels (int): number of output channels of the out_conv layer.
        with_out_conv (bool): whether to apply out_conv on the fused map.
        out_conv_cfg (dict): conv settings ("groups", "kernel_size",
            "padding", "bias") used to build the out_conv layer.
        out_norm_cfg (dict): normalization settings for out_conv.
        out_conv_order (tuple): order of conv/norm/activation in out_conv.
        with_input1_conv (bool): whether to convolve input1 first.
        with_input2_conv (bool): whether to convolve input2 first.
        input_conv_cfg (dict): conv type for the two input convs.
            Default: None, which means plain conv2d.
        input_norm_cfg (dict): normalization for the input convs.
            Default: None.
        upsample_mode (str): interpolation used when growing a map to the
            target size; one of ['nearest', 'bilinear']. Default: 'nearest'.
    """

    def __init__(self,
                 fused_channels: Optional[int] = 256,
                 out_channels: Optional[int] = 256,
                 with_out_conv: bool = True,
                 out_conv_cfg: dict = dict(
                     groups=1, kernel_size=3, padding=1, bias=True),
                 out_norm_cfg: Optional[dict] = None,
                 out_conv_order: tuple = ('act', 'conv', 'norm'),
                 with_input1_conv: bool = False,
                 with_input2_conv: bool = False,
                 input_conv_cfg: Optional[dict] = None,
                 input_norm_cfg: Optional[dict] = None,
                 upsample_mode: str = 'nearest'):
        super().__init__()
        assert upsample_mode in ['nearest', 'bilinear']
        self.with_out_conv = with_out_conv
        self.with_input1_conv = with_input1_conv
        self.with_input2_conv = with_input2_conv
        self.upsample_mode = upsample_mode

        if self.with_out_conv:
            self.out_conv = ConvModule(
                fused_channels,  # type: ignore
                out_channels,  # type: ignore
                **out_conv_cfg,
                norm_cfg=out_norm_cfg,
                order=out_conv_order)

        # Disabled input convs degrade to identity (empty nn.Sequential).
        self.input1_conv = self._build_input_conv(
            out_channels, input_conv_cfg,
            input_norm_cfg) if with_input1_conv else nn.Sequential()
        self.input2_conv = self._build_input_conv(
            out_channels, input_conv_cfg,
            input_norm_cfg) if with_input2_conv else nn.Sequential()

    def _build_input_conv(self, channel, conv_cfg, norm_cfg):
        # 3x3 channel-preserving conv applied to one of the inputs.
        return ConvModule(
            channel,
            channel,
            3,
            padding=1,
            conv_cfg=conv_cfg,
            norm_cfg=norm_cfg,
            bias=True)

    @abstractmethod
    def _binary_op(self, x1, x2):
        pass

    def _resize(self, x, size):
        current = x.shape[-2:]
        if current == size:
            return x
        if current < size:
            # Smaller than target: upsample with the configured mode.
            return F.interpolate(x, size=size, mode=self.upsample_mode)
        # Larger than target: downsample via max pooling. Pad first so both
        # spatial dims become exact multiples of the target size.
        if x.shape[-2] % size[-2] != 0 or x.shape[-1] % size[-1] != 0:
            h, w = x.shape[-2:]
            target_h, target_w = size
            pad_h = math.ceil(h / target_h) * target_h - h
            pad_w = math.ceil(w / target_w) * target_w - w
            pad_l = pad_w // 2
            pad_t = pad_h // 2
            padding = (pad_l, pad_w - pad_l, pad_t, pad_h - pad_t)
            x = F.pad(x, padding, mode='constant', value=0.0)
        kernel_size = (x.shape[-2] // size[-2], x.shape[-1] // size[-1])
        return F.max_pool2d(x, kernel_size=kernel_size, stride=kernel_size)

    def forward(self,
                x1: torch.Tensor,
                x2: torch.Tensor,
                out_size: Optional[tuple] = None) -> torch.Tensor:
        assert x1.shape[:2] == x2.shape[:2]
        assert out_size is None or len(out_size) == 2
        if out_size is None:  # default to the larger of the two maps
            out_size = max(x1.size()[2:], x2.size()[2:])

        x1 = self.input1_conv(x1)
        x2 = self.input2_conv(x2)

        x1 = self._resize(x1, out_size)
        x2 = self._resize(x2, out_size)

        fused = self._binary_op(x1, x2)
        if self.with_out_conv:
            fused = self.out_conv(fused)
        return fused


class SumCell(BaseMergeCell):
    """Merge cell that fuses the two inputs by element-wise addition."""

    def __init__(self, in_channels: int, out_channels: int, **kwargs):
        super().__init__(in_channels, out_channels, **kwargs)

    def _binary_op(self, x1, x2):
        return x1 + x2


class ConcatCell(BaseMergeCell):
    """Merge cell that fuses the two inputs by channel concatenation."""

    def __init__(self, in_channels: int, out_channels: int, **kwargs):
        # out_conv sees both inputs stacked, hence twice the channels.
        super().__init__(in_channels * 2, out_channels, **kwargs)

    def _binary_op(self, x1, x2):
        return torch.cat([x1, x2], dim=1)


class GlobalPoolingCell(BaseMergeCell):
    """Merge cell that gates x1 by a global attention vector from x2."""

    def __init__(self,
                 in_channels: Optional[int] = None,
                 out_channels: Optional[int] = None,
                 **kwargs):
        super().__init__(in_channels, out_channels, **kwargs)
        self.global_pool = nn.AdaptiveAvgPool2d((1, 1))

    def _binary_op(self, x1, x2):
        x2_att = self.global_pool(x2).sigmoid()
        return x2 + x2_att * x1


def min_area_polygons(pointsets: torch.Tensor) -> torch.Tensor:
    """Find the smallest polygons that surround all points in the point sets.

    Args:
        pointsets (Tensor): point sets with shape (N, 18).

    Returns:
        torch.Tensor: the smallest polygons with shape (N, 8).
    """
    polygons = pointsets.new_zeros((pointsets.size(0), 8))
    ext_module.min_area_polygons(pointsets, polygons)
    return polygons
class ModulatedDeformConv2dFunction(Function):
    """Autograd function for modulated deformable convolution (DCNv2).

    Learned offsets shift the sampling grid per output position and a
    per-sample mask modulates each sampled value. CUDA/CPU execution goes
    through the compiled ``_ext`` kernels; Ascend NPU devices take a
    dedicated ``torch.npu_deformable_conv2d`` path.
    """

    @staticmethod
    def symbolic(g, input, offset, mask, weight, bias, stride, padding,
                 dilation, groups, deform_groups):
        # ONNX export hook; bias is optional and appended only when present.
        input_tensors = [input, offset, mask, weight]
        if bias is not None:
            input_tensors.append(bias)
        return g.op(
            'mmcv::MMCVModulatedDeformConv2d',
            *input_tensors,
            stride_i=stride,
            padding_i=padding,
            dilation_i=dilation,
            groups_i=groups,
            deform_groups_i=deform_groups)

    @staticmethod
    def _calculate_sort_index(kernel_h, kernel_w, deformable_group):
        # The NPU kernel expects offset channels grouped as (y..., x...)
        # while mmcv interleaves them; build the forward permutation and
        # its inverse for the backward pass.
        split_num = deformable_group * 2 * kernel_h * kernel_w
        sort_index = list(range(split_num))
        sort_index_fp = (sort_index[1::2] + sort_index[::2])
        sort_index_bp_dict = {i: idx for idx, i in enumerate(sort_index_fp)}
        sort_index_bp = [sort_index_bp_dict[i] for i in sort_index]
        sort_index_fp = torch.IntTensor(sort_index_fp)
        sort_index_bp = torch.IntTensor(sort_index_bp)
        sort_index_fp = sort_index_fp.npu()
        sort_index_bp = sort_index_bp.npu()
        return sort_index_fp, sort_index_bp

    @staticmethod
    def _npu_forward(ctx, input_tensor, offset, mask, weight, bias):
        _, _, kernel_h, kernel_w = weight.shape
        conv2d_bias = bias if len(bias) > 0 else None
        sort_index_fp, sort_index_bp = \
            ModulatedDeformConv2dFunction._calculate_sort_index(
                kernel_w, kernel_h, ctx.deform_groups)
        select_offset = offset.index_select(1, sort_index_fp)
        # NPU kernel consumes offsets and mask as one concatenated tensor.
        offset_all = torch.cat([select_offset, mask], dim=1)
        output, offset_out = torch.npu_deformable_conv2d(
            input_tensor,
            weight,
            offset_all,
            conv2d_bias,
            kernel_size=[kernel_w, kernel_h],
            stride=[1, 1, ctx.stride[0], ctx.stride[1]],
            padding=[1, 1, ctx.padding[0], ctx.padding[1]],
            dilation=[1, 1, ctx.dilation[0], ctx.dilation[1]],
            groups=ctx.groups,
            deformable_groups=ctx.deform_groups,
            modulated=True)
        if weight.requires_grad or mask.requires_grad or offset.requires_grad \
                or input_tensor.requires_grad:
            ctx.save_for_backward(input_tensor, weight, offset_out, offset_all,
                                  sort_index_bp)
        return output

    @staticmethod
    def _npu_backward(ctx, grad_output):
        input_tensor, weight, offset_out, offset_all, sort_index_bp = \
            ctx.saved_tensors
        grad_input, grad_weight, grad_offset_all, grad_bias = \
            torch.npu_deformable_conv2dbk(
                input_tensor, grad_output, offset_out, weight, offset_all,
                kernel_size=[weight.shape[3], weight.shape[2]],
                stride=[1, 1, ctx.stride[0], ctx.stride[1]],
                padding=[1, 1, ctx.padding[0], ctx.padding[1]],
                dilation=[1, 1, ctx.dilation[0], ctx.dilation[1]],
                groups=ctx.groups, deformable_groups=ctx.deform_groups,
                modulated=True)
        # Undo the channel permutation applied in the forward pass and
        # split the combined gradient back into offset and mask parts.
        grad_offset = grad_offset_all.index_select(1, sort_index_bp)
        grad_mask = grad_offset_all[:, grad_offset.shape[1]:, :, :]
        if not ctx.with_bias:
            grad_bias = None
        return (grad_input, grad_offset, grad_mask, grad_weight, grad_bias,
                None, None, None, None, None, None, None, None)

    @staticmethod
    def forward(ctx,
                input: torch.Tensor,
                offset: torch.Tensor,
                mask: torch.Tensor,
                weight: nn.Parameter,
                bias: Optional[nn.Parameter] = None,
                stride: int = 1,
                padding: int = 0,
                dilation: int = 1,
                groups: int = 1,
                deform_groups: int = 1) -> torch.Tensor:
        if input is not None and input.dim() != 4:
            raise ValueError(
                f'Expected 4D tensor as input, got {input.dim()}D tensor '
                'instead.')
        ctx.stride = _pair(stride)
        ctx.padding = _pair(padding)
        ctx.dilation = _pair(dilation)
        ctx.groups = groups
        ctx.deform_groups = deform_groups
        ctx.with_bias = bias is not None
        ctx.device = input.device.type
        if not ctx.with_bias:
            bias = input.new_empty(0)  # fake tensor
        # When pytorch version >= 1.6.0, amp is adopted for fp16 mode;
        # amp won't cast the type of model (float32), but "offset" is cast
        # to float16 by nn.Conv2d automatically, leading to the type
        # mismatch with input (when it is float32) or weight.
        # The flag for whether to use fp16 or amp is the type of "offset",
        # so we cast weight and input to match it regardless of version.
        input = input.type_as(offset)
        weight = weight.type_as(input)
        bias = bias.type_as(input)  # type: ignore
        mask = mask.type_as(input)
        if ctx.device == 'npu':
            output = ModulatedDeformConv2dFunction._npu_forward(
                ctx, input, offset, mask, weight, bias)
            return output
        ctx.save_for_backward(input, offset, mask, weight, bias)
        output = input.new_empty(
            ModulatedDeformConv2dFunction._output_size(ctx, input, weight))
        ctx._bufs = [input.new_empty(0), input.new_empty(0)]
        ext_module.modulated_deform_conv_forward(
            input,
            weight,
            bias,
            ctx._bufs[0],
            offset,
            mask,
            output,
            ctx._bufs[1],
            kernel_h=weight.size(2),
            kernel_w=weight.size(3),
            stride_h=ctx.stride[0],
            stride_w=ctx.stride[1],
            pad_h=ctx.padding[0],
            pad_w=ctx.padding[1],
            dilation_h=ctx.dilation[0],
            dilation_w=ctx.dilation[1],
            group=ctx.groups,
            deformable_group=ctx.deform_groups,
            with_bias=ctx.with_bias)
        return output

    @staticmethod
    @once_differentiable
    def backward(ctx, grad_output: torch.Tensor) -> tuple:
        if ctx.device == 'npu':
            return ModulatedDeformConv2dFunction._npu_backward(
                ctx, grad_output)
        input, offset, mask, weight, bias = ctx.saved_tensors
        grad_input = torch.zeros_like(input)
        grad_offset = torch.zeros_like(offset)
        grad_mask = torch.zeros_like(mask)
        grad_weight = torch.zeros_like(weight)
        grad_bias = torch.zeros_like(bias)
        grad_output = grad_output.contiguous()
        ext_module.modulated_deform_conv_backward(
            input,
            weight,
            bias,
            ctx._bufs[0],
            offset,
            mask,
            ctx._bufs[1],
            grad_input,
            grad_weight,
            grad_bias,
            grad_offset,
            grad_mask,
            grad_output,
            kernel_h=weight.size(2),
            kernel_w=weight.size(3),
            stride_h=ctx.stride[0],
            stride_w=ctx.stride[1],
            pad_h=ctx.padding[0],
            pad_w=ctx.padding[1],
            dilation_h=ctx.dilation[0],
            dilation_w=ctx.dilation[1],
            group=ctx.groups,
            deformable_group=ctx.deform_groups,
            with_bias=ctx.with_bias)
        if not ctx.with_bias:
            grad_bias = None

        return (grad_input, grad_offset, grad_mask, grad_weight, grad_bias,
                None, None, None, None, None)

    @staticmethod
    def _output_size(ctx, input, weight):
        # Standard convolution output-shape arithmetic per spatial dim.
        channels = weight.size(0)
        output_size = (input.size(0), channels)
        for dim in range(input.dim() - 2):
            in_size = input.size(dim + 2)
            pad = ctx.padding[dim]
            kernel = ctx.dilation[dim] * (weight.size(dim + 2) - 1) + 1
            stride_ = ctx.stride[dim]
            output_size += ((in_size + (2 * pad) - kernel) // stride_ + 1, )
        if min(output_size) <= 0:
            raise ValueError(
                'convolution input is too small (output would be ' +
                'x'.join(map(str, output_size)) + ')')
        return output_size


modulated_deform_conv2d = ModulatedDeformConv2dFunction.apply
class ModulatedDeformConv2d(nn.Module):
    """Modulated deformable convolution module (DCNv2).

    Holds the convolution weight/bias; the offset and mask tensors are
    supplied by the caller at forward time (see
    ``ModulatedDeformConv2dPack`` for a variant that predicts them).
    """

    @deprecated_api_warning({'deformable_groups': 'deform_groups'},
                            cls_name='ModulatedDeformConv2d')
    def __init__(self,
                 in_channels: int,
                 out_channels: int,
                 kernel_size: Union[int, Tuple[int]],
                 stride: int = 1,
                 padding: int = 0,
                 dilation: int = 1,
                 groups: int = 1,
                 deform_groups: int = 1,
                 bias: Union[bool, str] = True):
        super().__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.kernel_size = _pair(kernel_size)
        self.stride = _pair(stride)
        self.padding = _pair(padding)
        self.dilation = _pair(dilation)
        self.groups = groups
        self.deform_groups = deform_groups
        # Attributes kept for API compatibility with nn.Conv2d.
        self.transposed = False
        self.output_padding = _single(0)

        self.weight = nn.Parameter(
            torch.Tensor(out_channels, in_channels // groups,
                         *self.kernel_size))
        if bias:
            self.bias = nn.Parameter(torch.Tensor(out_channels))
        else:
            self.register_parameter('bias', None)
        self.init_weights()

    def init_weights(self):
        # Uniform init following torch Conv2d's fan-in heuristic.
        fan_in = self.in_channels
        for k in self.kernel_size:
            fan_in *= k
        stdv = 1. / math.sqrt(fan_in)
        self.weight.data.uniform_(-stdv, stdv)
        if self.bias is not None:
            self.bias.data.zero_()

    def forward(self, x: torch.Tensor, offset: torch.Tensor,
                mask: torch.Tensor) -> torch.Tensor:
        return modulated_deform_conv2d(x, offset, mask, self.weight, self.bias,
                                       self.stride, self.padding,
                                       self.dilation, self.groups,
                                       self.deform_groups)
+ """ + + _version = 2 + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + self.conv_offset = nn.Conv2d( + self.in_channels, + self.deform_groups * 3 * self.kernel_size[0] * self.kernel_size[1], + kernel_size=self.kernel_size, + stride=self.stride, + padding=self.padding, + dilation=self.dilation, + bias=True) + self.init_weights() + + def init_weights(self) -> None: + super().init_weights() + if hasattr(self, 'conv_offset'): + self.conv_offset.weight.data.zero_() + self.conv_offset.bias.data.zero_() + + def forward(self, x: torch.Tensor) -> torch.Tensor: # type: ignore + out = self.conv_offset(x) + o1, o2, mask = torch.chunk(out, 3, dim=1) + offset = torch.cat((o1, o2), dim=1) + mask = torch.sigmoid(mask) + return modulated_deform_conv2d(x, offset, mask, self.weight, self.bias, + self.stride, self.padding, + self.dilation, self.groups, + self.deform_groups) + + def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict, + missing_keys, unexpected_keys, error_msgs): + version = local_metadata.get('version', None) + + if version is None or version < 2: + # the key is different in early versions + # In version < 2, ModulatedDeformConvPack + # loads previous benchmark models. 
+ if (prefix + 'conv_offset.weight' not in state_dict + and prefix[:-1] + '_offset.weight' in state_dict): + state_dict[prefix + 'conv_offset.weight'] = state_dict.pop( + prefix[:-1] + '_offset.weight') + if (prefix + 'conv_offset.bias' not in state_dict + and prefix[:-1] + '_offset.bias' in state_dict): + state_dict[prefix + + 'conv_offset.bias'] = state_dict.pop(prefix[:-1] + + '_offset.bias') + + if version is not None and version > 1: + print_log( + f'ModulatedDeformConvPack {prefix.rstrip(".")} is upgraded to ' + 'version 2.', + logger='current') + + super()._load_from_state_dict(state_dict, prefix, local_metadata, + strict, missing_keys, unexpected_keys, + error_msgs) diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/multi_scale_deform_attn.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/multi_scale_deform_attn.py new file mode 100644 index 0000000000000000000000000000000000000000..c1d415621a7c73c7f22acaf76f45c03742e32ce0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/multi_scale_deform_attn.py @@ -0,0 +1,369 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
class MultiScaleDeformableAttnFunction(Function):
    """Autograd function wrapping the compiled multi-scale deformable
    attention kernels (GPU/MLU)."""

    @staticmethod
    def forward(ctx, value: torch.Tensor, value_spatial_shapes: torch.Tensor,
                value_level_start_index: torch.Tensor,
                sampling_locations: torch.Tensor,
                attention_weights: torch.Tensor,
                im2col_step: torch.Tensor) -> torch.Tensor:
        """GPU/MLU version of multi-scale deformable attention.

        Args:
            value (torch.Tensor): The value has shape
                (bs, num_keys, num_heads, embed_dims//num_heads)
            value_spatial_shapes (torch.Tensor): Spatial shape of
                each feature map, has shape (num_levels, 2),
                last dimension 2 represent (h, w)
            value_level_start_index (torch.Tensor): Start index of each
                level's keys inside ``value``.
            sampling_locations (torch.Tensor): The location of sampling
                points, has shape
                (bs, num_queries, num_heads, num_levels, num_points, 2),
                the last dimension 2 represent (x, y).
            attention_weights (torch.Tensor): Sampling-point weights, with
                shape (bs, num_queries, num_heads, num_levels, num_points).
            im2col_step (torch.Tensor): The step used in image to column.

        Returns:
            torch.Tensor: has shape (bs, num_queries, embed_dims)
        """
        ctx.im2col_step = im2col_step
        output = ext_module.ms_deform_attn_forward(
            value,
            value_spatial_shapes,
            value_level_start_index,
            sampling_locations,
            attention_weights,
            im2col_step=ctx.im2col_step)
        ctx.save_for_backward(value, value_spatial_shapes,
                              value_level_start_index, sampling_locations,
                              attention_weights)
        return output

    @staticmethod
    @once_differentiable
    def backward(ctx, grad_output: torch.Tensor) -> tuple:
        """GPU/MLU version of the backward pass.

        Args:
            grad_output (torch.Tensor): Gradient of the forward output.

        Returns:
            tuple[Tensor]: Gradients of the forward inputs (None for the
            non-differentiable shape/index/step arguments).
        """
        value, value_spatial_shapes, value_level_start_index, \
            sampling_locations, attention_weights = ctx.saved_tensors
        grad_value = torch.zeros_like(value)
        grad_sampling_loc = torch.zeros_like(sampling_locations)
        grad_attn_weight = torch.zeros_like(attention_weights)

        ext_module.ms_deform_attn_backward(
            value,
            value_spatial_shapes,
            value_level_start_index,
            sampling_locations,
            attention_weights,
            grad_output.contiguous(),
            grad_value,
            grad_sampling_loc,
            grad_attn_weight,
            im2col_step=ctx.im2col_step)

        return grad_value, None, None, \
            grad_sampling_loc, grad_attn_weight, None
def multi_scale_deformable_attn_pytorch(
        value: torch.Tensor, value_spatial_shapes: torch.Tensor,
        sampling_locations: torch.Tensor,
        attention_weights: torch.Tensor) -> torch.Tensor:
    """Pure-PyTorch (CPU-friendly) multi-scale deformable attention.

    Args:
        value (torch.Tensor): The value has shape
            (bs, num_keys, num_heads, embed_dims//num_heads)
        value_spatial_shapes (torch.Tensor): Spatial shape of
            each feature map, has shape (num_levels, 2),
            last dimension 2 represent (h, w)
        sampling_locations (torch.Tensor): The location of sampling points,
            has shape
            (bs, num_queries, num_heads, num_levels, num_points, 2),
            the last dimension 2 represent (x, y).
        attention_weights (torch.Tensor): The weight of sampling points used
            when calculate the attention, has shape
            (bs, num_queries, num_heads, num_levels, num_points).

    Returns:
        torch.Tensor: has shape (bs, num_queries, embed_dims)
    """
    bs, _, num_heads, embed_dims = value.shape
    _, num_queries, num_heads, num_levels, num_points, _ = \
        sampling_locations.shape
    # Split the flattened keys back into one chunk per pyramid level.
    value_list = value.split([H_ * W_ for H_, W_ in value_spatial_shapes],
                             dim=1)
    # Map normalized [0, 1] locations into grid_sample's [-1, 1] range.
    sampling_grids = 2 * sampling_locations - 1
    sampled_per_level = []
    for level, (H_, W_) in enumerate(value_spatial_shapes):
        # (bs, H_*W_, num_heads, dim) -> (bs*num_heads, dim, H_, W_)
        value_l_ = value_list[level].flatten(2).transpose(1, 2).reshape(
            bs * num_heads, embed_dims, H_, W_)
        # (bs, queries, heads, points, 2) -> (bs*heads, queries, points, 2)
        sampling_grid_l_ = sampling_grids[:, :, :,
                                          level].transpose(1, 2).flatten(0, 1)
        # Bilinearly sample each head's feature map at its query points:
        # (bs*num_heads, embed_dims, num_queries, num_points)
        sampled_per_level.append(
            F.grid_sample(
                value_l_,
                sampling_grid_l_,
                mode='bilinear',
                padding_mode='zeros',
                align_corners=False))
    # (bs, queries, heads, levels, points) ->
    # (bs*heads, 1, queries, levels*points)
    attention_weights = attention_weights.transpose(1, 2).reshape(
        bs * num_heads, 1, num_queries, num_levels * num_points)
    # Weighted sum over all sampled points of all levels.
    output = (torch.stack(sampled_per_level, dim=-2).flatten(-2) *
              attention_weights).sum(-1).view(bs, num_heads * embed_dims,
                                              num_queries)
    return output.transpose(1, 2).contiguous()
+ """ + + def __init__(self, + embed_dims: int = 256, + num_heads: int = 8, + num_levels: int = 4, + num_points: int = 4, + im2col_step: int = 64, + dropout: float = 0.1, + batch_first: bool = False, + norm_cfg: Optional[dict] = None, + init_cfg: Optional[mmengine.ConfigDict] = None, + value_proj_ratio: float = 1.0): + super().__init__(init_cfg) + if embed_dims % num_heads != 0: + raise ValueError(f'embed_dims must be divisible by num_heads, ' + f'but got {embed_dims} and {num_heads}') + dim_per_head = embed_dims // num_heads + self.norm_cfg = norm_cfg + self.dropout = nn.Dropout(dropout) + self.batch_first = batch_first + + # you'd better set dim_per_head to a power of 2 + # which is more efficient in the CUDA implementation + def _is_power_of_2(n): + if (not isinstance(n, int)) or (n < 0): + raise ValueError( + 'invalid input for _is_power_of_2: {} (type: {})'.format( + n, type(n))) + return (n & (n - 1) == 0) and n != 0 + + if not _is_power_of_2(dim_per_head): + warnings.warn( + "You'd better set embed_dims in " + 'MultiScaleDeformAttention to make ' + 'the dimension of each attention head a power of 2 ' + 'which is more efficient in our CUDA implementation.') + + self.im2col_step = im2col_step + self.embed_dims = embed_dims + self.num_levels = num_levels + self.num_heads = num_heads + self.num_points = num_points + self.sampling_offsets = nn.Linear( + embed_dims, num_heads * num_levels * num_points * 2) + self.attention_weights = nn.Linear(embed_dims, + num_heads * num_levels * num_points) + value_proj_size = int(embed_dims * value_proj_ratio) + self.value_proj = nn.Linear(embed_dims, value_proj_size) + self.output_proj = nn.Linear(value_proj_size, embed_dims) + self.init_weights() + + def init_weights(self) -> None: + """Default initialization for Parameters of Module.""" + constant_init(self.sampling_offsets, 0.) 
+ device = next(self.parameters()).device + thetas = torch.arange( + self.num_heads, dtype=torch.float32, + device=device) * (2.0 * math.pi / self.num_heads) + grid_init = torch.stack([thetas.cos(), thetas.sin()], -1) + grid_init = (grid_init / + grid_init.abs().max(-1, keepdim=True)[0]).view( + self.num_heads, 1, 1, + 2).repeat(1, self.num_levels, self.num_points, 1) + for i in range(self.num_points): + grid_init[:, :, i, :] *= i + 1 + + self.sampling_offsets.bias.data = grid_init.view(-1) + constant_init(self.attention_weights, val=0., bias=0.) + xavier_init(self.value_proj, distribution='uniform', bias=0.) + xavier_init(self.output_proj, distribution='uniform', bias=0.) + self._is_init = True + + @no_type_check + @deprecated_api_warning({'residual': 'identity'}, + cls_name='MultiScaleDeformableAttention') + def forward(self, + query: torch.Tensor, + key: Optional[torch.Tensor] = None, + value: Optional[torch.Tensor] = None, + identity: Optional[torch.Tensor] = None, + query_pos: Optional[torch.Tensor] = None, + key_padding_mask: Optional[torch.Tensor] = None, + reference_points: Optional[torch.Tensor] = None, + spatial_shapes: Optional[torch.Tensor] = None, + level_start_index: Optional[torch.Tensor] = None, + **kwargs) -> torch.Tensor: + """Forward Function of MultiScaleDeformAttention. + + Args: + query (torch.Tensor): Query of Transformer with shape + (num_query, bs, embed_dims). + key (torch.Tensor): The key tensor with shape + `(num_key, bs, embed_dims)`. + value (torch.Tensor): The value tensor with shape + `(num_key, bs, embed_dims)`. + identity (torch.Tensor): The tensor used for addition, with the + same shape as `query`. Default None. If None, + `query` will be used. + query_pos (torch.Tensor): The positional encoding for `query`. + Default: None. + key_padding_mask (torch.Tensor): ByteTensor for `query`, with + shape [bs, num_key]. 
+ reference_points (torch.Tensor): The normalized reference + points with shape (bs, num_query, num_levels, 2), + all elements is range in [0, 1], top-left (0,0), + bottom-right (1, 1), including padding area. + or (N, Length_{query}, num_levels, 4), add + additional two dimensions is (w, h) to + form reference boxes. + spatial_shapes (torch.Tensor): Spatial shape of features in + different levels. With shape (num_levels, 2), + last dimension represents (h, w). + level_start_index (torch.Tensor): The start index of each level. + A tensor has shape ``(num_levels, )`` and can be represented + as [0, h_0*w_0, h_0*w_0+h_1*w_1, ...]. + + Returns: + torch.Tensor: forwarded results with shape + [num_query, bs, embed_dims]. + """ + + if value is None: + value = query + + if identity is None: + identity = query + if query_pos is not None: + query = query + query_pos + if not self.batch_first: + # change to (bs, num_query ,embed_dims) + query = query.permute(1, 0, 2) + value = value.permute(1, 0, 2) + + bs, num_query, _ = query.shape + bs, num_value, _ = value.shape + assert (spatial_shapes[:, 0] * spatial_shapes[:, 1]).sum() == num_value + + value = self.value_proj(value) + if key_padding_mask is not None: + value = value.masked_fill(key_padding_mask[..., None], 0.0) + value = value.view(bs, num_value, self.num_heads, -1) + sampling_offsets = self.sampling_offsets(query).view( + bs, num_query, self.num_heads, self.num_levels, self.num_points, 2) + attention_weights = self.attention_weights(query).view( + bs, num_query, self.num_heads, self.num_levels * self.num_points) + attention_weights = attention_weights.softmax(-1) + + attention_weights = attention_weights.view(bs, num_query, + self.num_heads, + self.num_levels, + self.num_points) + if reference_points.shape[-1] == 2: + offset_normalizer = torch.stack( + [spatial_shapes[..., 1], spatial_shapes[..., 0]], -1) + sampling_locations = reference_points[:, :, None, :, None, :] \ + + sampling_offsets \ + / 
offset_normalizer[None, None, None, :, None, :] + elif reference_points.shape[-1] == 4: + sampling_locations = reference_points[:, :, None, :, None, :2] \ + + sampling_offsets / self.num_points \ + * reference_points[:, :, None, :, None, 2:] \ + * 0.5 + else: + raise ValueError( + f'Last dim of reference_points must be' + f' 2 or 4, but get {reference_points.shape[-1]} instead.') + if ((IS_CUDA_AVAILABLE and value.is_cuda) + or (IS_MLU_AVAILABLE and value.is_mlu)): + output = MultiScaleDeformableAttnFunction.apply( + value, spatial_shapes, level_start_index, sampling_locations, + attention_weights, self.im2col_step) + else: + output = multi_scale_deformable_attn_pytorch( + value, spatial_shapes, sampling_locations, attention_weights) + + output = self.output_proj(output) + + if not self.batch_first: + # (num_query, bs ,embed_dims) + output = output.permute(1, 0, 2) + + return self.dropout(output) + identity diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/nms.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/nms.py new file mode 100644 index 0000000000000000000000000000000000000000..9d09bf601a8a019597f6a23af73a677b015b94a2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/nms.py @@ -0,0 +1,471 @@ +from typing import Any, Dict, List, Optional, Tuple, Union + +import numpy as np +import torch +from mmengine.utils import deprecated_api_warning +from torch import Tensor + +from ..utils import ext_loader + +ext_module = ext_loader.load_ext( + '_ext', ['nms', 'softnms', 'nms_match', 'nms_rotated', 'nms_quadri']) + + +# This function is modified from: https://github.com/pytorch/vision/ +class NMSop(torch.autograd.Function): + + @staticmethod + def forward(ctx: Any, bboxes: Tensor, scores: Tensor, iou_threshold: float, + offset: int, score_threshold: float, max_num: int) -> Tensor: + is_filtering_by_score = score_threshold > 0 + if is_filtering_by_score: + valid_mask = scores > score_threshold + bboxes, scores = bboxes[valid_mask], scores[valid_mask] + valid_inds = 
# This function is modified from: https://github.com/pytorch/vision/
class NMSop(torch.autograd.Function):

    @staticmethod
    def forward(ctx: Any, bboxes: Tensor, scores: Tensor, iou_threshold: float,
                offset: int, score_threshold: float, max_num: int) -> Tensor:
        # Optionally pre-filter boxes by score before NMS; the kept indices
        # are mapped back to the original (unfiltered) indexing afterwards.
        is_filtering_by_score = score_threshold > 0
        if is_filtering_by_score:
            valid_mask = scores > score_threshold
            bboxes, scores = bboxes[valid_mask], scores[valid_mask]
            valid_inds = torch.nonzero(
                valid_mask, as_tuple=False).squeeze(dim=1)

        inds = ext_module.nms(
            bboxes, scores, iou_threshold=float(iou_threshold), offset=offset)

        if max_num > 0:
            inds = inds[:max_num]
        if is_filtering_by_score:
            inds = valid_inds[inds]
        return inds


class SoftNMSop(torch.autograd.Function):

    @staticmethod
    def forward(ctx: Any, boxes: Tensor, scores: Tensor, iou_threshold: float,
                sigma: float, min_score: float, method: int,
                offset: int) -> Tuple[Tensor, Tensor]:
        # Soft-NMS only has a CPU kernel, hence the explicit .cpu() calls.
        dets = boxes.new_empty((boxes.size(0), 5), device='cpu')
        inds = ext_module.softnms(
            boxes.cpu(),
            scores.cpu(),
            dets.cpu(),
            iou_threshold=float(iou_threshold),
            sigma=float(sigma),
            min_score=float(min_score),
            method=int(method),
            offset=int(offset))
        return dets, inds

    @staticmethod
    def symbolic(g, boxes, scores, iou_threshold, sigma, min_score, method,
                 offset):
        # ONNX export: emit a custom mmcv op (requires torch >= 1.7).
        from packaging import version
        assert version.parse(torch.__version__) >= version.parse('1.7.0')
        nms_out = g.op(
            'mmcv::SoftNonMaxSuppression',
            boxes,
            scores,
            iou_threshold_f=float(iou_threshold),
            sigma_f=float(sigma),
            min_score_f=float(min_score),
            method_i=int(method),
            offset_i=int(offset),
            outputs=2)
        return nms_out


array_like_type = Union[Tensor, np.ndarray]


@deprecated_api_warning({'iou_thr': 'iou_threshold'})
def nms(boxes: array_like_type,
        scores: array_like_type,
        iou_threshold: float,
        offset: int = 0,
        score_threshold: float = 0,
        max_num: int = -1) -> Tuple[array_like_type, array_like_type]:
    """Dispatch to either CPU or GPU NMS implementations.

    The input can be either torch tensor or numpy array. GPU NMS will be used
    if the input is gpu tensor, otherwise CPU NMS
    will be used. The returned type will always be the same as inputs.

    Arguments:
        boxes (torch.Tensor or np.ndarray): boxes in shape (N, 4).
        scores (torch.Tensor or np.ndarray): scores in shape (N, ).
        iou_threshold (float): IoU threshold for NMS.
        offset (int, 0 or 1): boxes' width or height is (x2 - x1 + offset).
        score_threshold (float): score threshold for NMS.
        max_num (int): maximum number of boxes after NMS.

    Returns:
        tuple: kept dets (boxes and scores) and indice, which always have
        the same data type as the input.

    Example:
        >>> boxes = np.array([[49.1, 32.4, 51.0, 35.9],
        >>>                   [49.3, 32.9, 51.0, 35.3],
        >>>                   [49.2, 31.8, 51.0, 35.4],
        >>>                   [35.1, 11.5, 39.1, 15.7],
        >>>                   [35.6, 11.8, 39.3, 14.2],
        >>>                   [35.3, 11.5, 39.9, 14.5],
        >>>                   [35.2, 11.7, 39.7, 15.7]], dtype=np.float32)
        >>> scores = np.array([0.9, 0.9, 0.5, 0.5, 0.5, 0.4, 0.3],\
               dtype=np.float32)
        >>> iou_threshold = 0.6
        >>> dets, inds = nms(boxes, scores, iou_threshold)
        >>> assert len(inds) == len(dets) == 3
    """
    assert isinstance(boxes, (Tensor, np.ndarray))
    assert isinstance(scores, (Tensor, np.ndarray))
    # Remember the input container type so the outputs can match it.
    is_numpy = False
    if isinstance(boxes, np.ndarray):
        is_numpy = True
        boxes = torch.from_numpy(boxes)
    if isinstance(scores, np.ndarray):
        scores = torch.from_numpy(scores)
    assert boxes.size(1) == 4
    assert boxes.size(0) == scores.size(0)
    assert offset in (0, 1)

    inds = NMSop.apply(boxes, scores, iou_threshold, offset, score_threshold,
                       max_num)
    dets = torch.cat((boxes[inds], scores[inds].reshape(-1, 1)), dim=1)
    if is_numpy:
        dets = dets.cpu().numpy()
        inds = inds.cpu().numpy()
    return dets, inds


@deprecated_api_warning({'iou_thr': 'iou_threshold'})
def soft_nms(boxes: array_like_type,
             scores: array_like_type,
             iou_threshold: float = 0.3,
             sigma: float = 0.5,
             min_score: float = 1e-3,
             method: str = 'linear',
             offset: int = 0) -> Tuple[array_like_type, array_like_type]:
    """Dispatch to only CPU Soft NMS implementations.

    The input can be either a torch tensor or numpy array.
    The returned type will always be the same as inputs.

    Args:
        boxes (torch.Tensor or np.ndarray): boxes in shape (N, 4).
        scores (torch.Tensor or np.ndarray): scores in shape (N, ).
        iou_threshold (float): IoU threshold for NMS.
        sigma (float): hyperparameter for gaussian method
        min_score (float): score filter threshold
        method (str): either 'linear' or 'gaussian'
        offset (int, 0 or 1): boxes' width or height is (x2 - x1 + offset).

    Returns:
        tuple: kept dets (boxes and scores) and indice, which always have
        the same data type as the input.

    Example:
        >>> boxes = np.array([[4., 3., 5., 3.],
        >>>                   [4., 3., 5., 4.],
        >>>                   [3., 1., 3., 1.],
        >>>                   [3., 1., 3., 1.],
        >>>                   [3., 1., 3., 1.],
        >>>                   [3., 1., 3., 1.]], dtype=np.float32)
        >>> scores = np.array([0.9, 0.9, 0.5, 0.5, 0.4, 0.0], dtype=np.float32)
        >>> iou_threshold = 0.6
        >>> dets, inds = soft_nms(boxes, scores, iou_threshold, sigma=0.5)
        >>> assert len(inds) == len(dets) == 5
    """

    assert isinstance(boxes, (Tensor, np.ndarray))
    assert isinstance(scores, (Tensor, np.ndarray))
    is_numpy = False
    if isinstance(boxes, np.ndarray):
        is_numpy = True
        boxes = torch.from_numpy(boxes)
    if isinstance(scores, np.ndarray):
        scores = torch.from_numpy(scores)
    assert boxes.size(1) == 4
    assert boxes.size(0) == scores.size(0)
    assert offset in (0, 1)
    method_dict = {'naive': 0, 'linear': 1, 'gaussian': 2}
    assert method in method_dict.keys()

    # parrots has a different extension-call convention than stock PyTorch.
    if torch.__version__ == 'parrots':
        dets = boxes.new_empty((boxes.size(0), 5), device='cpu')
        indata_list = [boxes.cpu(), scores.cpu(), dets.cpu()]
        indata_dict = {
            'iou_threshold': float(iou_threshold),
            'sigma': float(sigma),
            'min_score': min_score,
            'method': method_dict[method],
            'offset': int(offset)
        }
        inds = ext_module.softnms(*indata_list, **indata_dict)
    else:
        dets, inds = SoftNMSop.apply(boxes.cpu(), scores.cpu(),
                                     float(iou_threshold), float(sigma),
                                     float(min_score), method_dict[method],
                                     int(offset))

    # The kernel writes at most inds.size(0) rows; trim the buffer.
    dets = dets[:inds.size(0)]

    if is_numpy:
        dets = dets.cpu().numpy()
        inds = inds.cpu().numpy()
        return dets, inds
    else:
        return dets.to(device=boxes.device), inds.to(device=boxes.device)
def batched_nms(boxes: Tensor,
                scores: Tensor,
                idxs: Tensor,
                nms_cfg: Optional[Dict],
                class_agnostic: bool = False) -> Tuple[Tensor, Tensor]:
    r"""Performs non-maximum suppression in a batched fashion.

    Modified from `torchvision/ops/boxes.py#L39
    `_.
    In order to perform NMS independently per class, we add an offset to all
    the boxes. The offset is dependent only on the class idx, and is large
    enough so that boxes from different classes do not overlap.

    Note:
        In v1.4.1 and later, ``batched_nms`` supports skipping the NMS and
        returns sorted raw results when `nms_cfg` is None.

    Args:
        boxes (torch.Tensor): boxes in shape (N, 4) or (N, 5).
        scores (torch.Tensor): scores in shape (N, ).
        idxs (torch.Tensor): each index value correspond to a bbox cluster,
            and NMS will not be applied between elements of different idxs,
            shape (N, ).
        nms_cfg (dict | optional): Supports skipping the nms when `nms_cfg`
            is None, otherwise it should specify nms type and other
            parameters like `iou_thr`. Possible keys includes the following.

            - iou_threshold (float): IoU threshold used for NMS.
            - split_thr (float): threshold number of boxes. In some cases the
              number of boxes is large (e.g., 200k). To avoid OOM during
              training, the users could set `split_thr` to a small value.
              If the number of boxes is greater than the threshold, it will
              perform NMS on each group of boxes separately and sequentially.
              Defaults to 10000.
        class_agnostic (bool): if true, nms is class agnostic,
            i.e. IoU thresholding happens over all boxes,
            regardless of the predicted class. Defaults to False.

    Returns:
        tuple: kept dets and indice.

        - boxes (Tensor): Bboxes with score after nms, has shape
          (num_bboxes, 5). last dimension 5 arrange as
          (x1, y1, x2, y2, score)
        - keep (Tensor): The indices of remaining boxes in input
          boxes.
    """
    # skip nms when nms_cfg is None
    if nms_cfg is None:
        scores, inds = scores.sort(descending=True)
        boxes = boxes[inds]
        return torch.cat([boxes, scores[:, None]], -1), inds

    # Copy before popping keys so the caller's dict is not mutated.
    nms_cfg_ = nms_cfg.copy()
    class_agnostic = nms_cfg_.pop('class_agnostic', class_agnostic)
    if class_agnostic:
        boxes_for_nms = boxes
    else:
        # When using rotated boxes, only apply offsets on center.
        if boxes.size(-1) == 5:
            # Strictly, the maximum coordinates of the rotating box
            # (x,y,w,h,a) should be calculated by polygon coordinates.
            # But the conversion from rotated box to polygon will
            # slow down the speed.
            # So we use max(x,y) + max(w,h) as max coordinate
            # which is larger than polygon max coordinate
            # max(x1, y1, x2, y2,x3, y3, x4, y4)
            max_coordinate = boxes[..., :2].max() + boxes[..., 2:4].max()
            offsets = idxs.to(boxes) * (
                max_coordinate + torch.tensor(1).to(boxes))
            boxes_ctr_for_nms = boxes[..., :2] + offsets[:, None]
            boxes_for_nms = torch.cat([boxes_ctr_for_nms, boxes[..., 2:5]],
                                      dim=-1)
        else:
            max_coordinate = boxes.max()
            offsets = idxs.to(boxes) * (
                max_coordinate + torch.tensor(1).to(boxes))
            boxes_for_nms = boxes + offsets[:, None]

    nms_type = nms_cfg_.pop('type', 'nms')
    # NOTE(review): eval() resolves the op name ('nms', 'soft_nms', ...)
    # from this module's namespace; a user-supplied nms_cfg['type'] is
    # evaluated verbatim, so configs must come from trusted sources.
    nms_op = eval(nms_type)

    split_thr = nms_cfg_.pop('split_thr', 10000)
    # Won't split to multiple nms nodes when exporting to onnx
    if boxes_for_nms.shape[0] < split_thr:
        dets, keep = nms_op(boxes_for_nms, scores, **nms_cfg_)
        boxes = boxes[keep]

        # This assumes `dets` has arbitrary dimensions where
        # the last dimension is score.
        # Currently it supports bounding boxes [x1, y1, x2, y2, score] or
        # rotated boxes [cx, cy, w, h, angle_radian, score].

        scores = dets[:, -1]
    else:
        # Too many boxes: run NMS per class sequentially to bound memory.
        max_num = nms_cfg_.pop('max_num', -1)
        total_mask = scores.new_zeros(scores.size(), dtype=torch.bool)
        # Some type of nms would reweight the score, such as SoftNMS
        scores_after_nms = scores.new_zeros(scores.size())
        for id in torch.unique(idxs):
            mask = (idxs == id).nonzero(as_tuple=False).view(-1)
            dets, keep = nms_op(boxes_for_nms[mask], scores[mask], **nms_cfg_)
            total_mask[mask[keep]] = True
            scores_after_nms[mask[keep]] = dets[:, -1]
        keep = total_mask.nonzero(as_tuple=False).view(-1)

        scores, inds = scores_after_nms[keep].sort(descending=True)
        keep = keep[inds]
        boxes = boxes[keep]

        if max_num > 0:
            keep = keep[:max_num]
            boxes = boxes[:max_num]
            scores = scores[:max_num]

    boxes = torch.cat([boxes, scores[:, None]], -1)
    return boxes, keep
+ """ + if dets.shape[0] == 0: + matched = [] + else: + assert dets.shape[-1] == 5, 'inputs dets.shape should be (N, 5), ' \ + f'but get {dets.shape}' + if isinstance(dets, Tensor): + dets_t = dets.detach().cpu() + else: + dets_t = torch.from_numpy(dets) + indata_list = [dets_t] + indata_dict = {'iou_threshold': float(iou_threshold)} + matched = ext_module.nms_match(*indata_list, **indata_dict) + if torch.__version__ == 'parrots': + matched = matched.tolist() # type: ignore + + if isinstance(dets, Tensor): + return [dets.new_tensor(m, dtype=torch.long) for m in matched] + else: + return [np.array(m, dtype=int) for m in matched] + + +def nms_rotated(dets: Tensor, + scores: Tensor, + iou_threshold: float, + labels: Optional[Tensor] = None, + clockwise: bool = True) -> Tuple[Tensor, Tensor]: + """Performs non-maximum suppression (NMS) on the rotated boxes according to + their intersection-over-union (IoU). + + Rotated NMS iteratively removes lower scoring rotated boxes which have an + IoU greater than iou_threshold with another (higher scoring) rotated box. + + Args: + dets (torch.Tensor): Rotated boxes in shape (N, 5). + They are expected to be in + (x_ctr, y_ctr, width, height, angle_radian) format. + scores (torch.Tensor): scores in shape (N, ). + iou_threshold (float): IoU thresh for NMS. + labels (torch.Tensor, optional): boxes' label in shape (N,). + clockwise (bool): flag indicating whether the positive angular + orientation is clockwise. default True. + `New in version 1.4.3.` + + Returns: + tuple: kept dets(boxes and scores) and indice, which is always the + same data type as the input. 
+ """ + if dets.shape[0] == 0: + return dets, None + if not clockwise: + flip_mat = dets.new_ones(dets.shape[-1]) + flip_mat[-1] = -1 + dets_cw = dets * flip_mat + else: + dets_cw = dets + multi_label = labels is not None + if multi_label: + dets_wl = torch.cat((dets_cw, labels.unsqueeze(1)), 1) # type: ignore + else: + dets_wl = dets_cw + _, order = scores.sort(0, descending=True) + dets_sorted = dets_wl.index_select(0, order) + + if torch.__version__ == 'parrots': + keep_inds = ext_module.nms_rotated( + dets_wl, + scores, + order, + dets_sorted, + iou_threshold=iou_threshold, + multi_label=multi_label) + else: + keep_inds = ext_module.nms_rotated(dets_wl, scores, order, dets_sorted, + iou_threshold, multi_label) + dets = torch.cat((dets[keep_inds], scores[keep_inds].reshape(-1, 1)), + dim=1) + return dets, keep_inds + + +def nms_quadri(dets: Tensor, + scores: Tensor, + iou_threshold: float, + labels: Optional[Tensor] = None) -> Tuple[Tensor, Tensor]: + """Performs non-maximum suppression (NMS) on the quadrilateral boxes + according to their intersection-over-union (IoU). + + Quadri NMS iteratively removes lower scoring quadrilateral boxes + which have an IoU greater than iou_threshold with another (higher + scoring) quadrilateral box. + + Args: + dets (torch.Tensor): Quadri boxes in shape (N, 8). + They are expected to be in + (x1, y1, ..., x4, y4) format. + scores (torch.Tensor): scores in shape (N, ). + iou_threshold (float): IoU thresh for NMS. + labels (torch.Tensor, optional): boxes' label in shape (N,). + + Returns: + tuple: kept dets(boxes and scores) and indice, which is always the + same data type as the input. 
+ """ + if dets.shape[0] == 0: + return dets, None + + multi_label = labels is not None + if multi_label: + dets_with_lables = \ + torch.cat((dets, labels.unsqueeze(1)), 1) # type: ignore + else: + dets_with_lables = dets + _, order = scores.sort(0, descending=True) + dets_sorted = dets_with_lables.index_select(0, order) + + keep_inds = ext_module.nms_quadri(dets_with_lables, scores, order, + dets_sorted, iou_threshold, multi_label) + dets = torch.cat((dets[keep_inds], scores[keep_inds].reshape(-1, 1)), + dim=1) + return dets, keep_inds diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/pixel_group.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/pixel_group.py new file mode 100644 index 0000000000000000000000000000000000000000..cf73e326da8f46bf899b84955d0b911dd3f65014 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/pixel_group.py @@ -0,0 +1,86 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import List, Union + +import numpy as np +import torch +from torch import Tensor + +from ..utils import ext_loader + +ext_module = ext_loader.load_ext('_ext', ['pixel_group']) + + +def pixel_group( + score: Union[np.ndarray, Tensor], + mask: Union[np.ndarray, Tensor], + embedding: Union[np.ndarray, Tensor], + kernel_label: Union[np.ndarray, Tensor], + kernel_contour: Union[np.ndarray, Tensor], + kernel_region_num: int, + distance_threshold: float, +) -> List[List[float]]: + """Group pixels into text instances, which is widely used text detection + methods. + + Arguments: + score (np.array or torch.Tensor): The foreground score with size hxw. + mask (np.array or Tensor): The foreground mask with size hxw. + embedding (np.array or torch.Tensor): The embedding with size hxwxc to + distinguish instances. + kernel_label (np.array or torch.Tensor): The instance kernel index with + size hxw. + kernel_contour (np.array or torch.Tensor): The kernel contour with + size hxw. + kernel_region_num (int): The instance kernel region number. 
def pixel_group(
    score: Union[np.ndarray, Tensor],
    mask: Union[np.ndarray, Tensor],
    embedding: Union[np.ndarray, Tensor],
    kernel_label: Union[np.ndarray, Tensor],
    kernel_contour: Union[np.ndarray, Tensor],
    kernel_region_num: int,
    distance_threshold: float,
) -> List[List[float]]:
    """Group pixels into text instances, which is widely used text detection
    methods.

    Arguments:
        score (np.array or torch.Tensor): The foreground score with size hxw.
        mask (np.array or Tensor): The foreground mask with size hxw.
        embedding (np.array or torch.Tensor): The embedding with size hxwxc to
            distinguish instances.
        kernel_label (np.array or torch.Tensor): The instance kernel index with
            size hxw.
        kernel_contour (np.array or torch.Tensor): The kernel contour with
            size hxw.
        kernel_region_num (int): The instance kernel region number.
        distance_threshold (float): The embedding distance threshold between
            kernel and pixel in one instance.

    Returns:
        list[list[float]]: The instance coordinates and attributes list. Each
        element consists of averaged confidence, pixel number, and coordinates
        (x_i, y_i for all pixels) in order.
    """
    assert isinstance(score, (torch.Tensor, np.ndarray))
    assert isinstance(mask, (torch.Tensor, np.ndarray))
    assert isinstance(embedding, (torch.Tensor, np.ndarray))
    assert isinstance(kernel_label, (torch.Tensor, np.ndarray))
    assert isinstance(kernel_contour, (torch.Tensor, np.ndarray))
    assert isinstance(kernel_region_num, int)
    assert isinstance(distance_threshold, float)

    # Normalize all array inputs to torch tensors for the extension call.
    if isinstance(score, np.ndarray):
        score = torch.from_numpy(score)
    if isinstance(mask, np.ndarray):
        mask = torch.from_numpy(mask)
    if isinstance(embedding, np.ndarray):
        embedding = torch.from_numpy(embedding)
    if isinstance(kernel_label, np.ndarray):
        kernel_label = torch.from_numpy(kernel_label)
    if isinstance(kernel_contour, np.ndarray):
        kernel_contour = torch.from_numpy(kernel_contour)

    if torch.__version__ == 'parrots':
        label = ext_module.pixel_group(
            score,
            mask,
            embedding,
            kernel_label,
            kernel_contour,
            kernel_region_num=kernel_region_num,
            distance_threshold=distance_threshold)
        label = label.tolist()
        label = label[0]
        # The flat result starts with per-region pixel counts, followed by
        # each region's pixel data; slice it back into per-region lists.
        list_index = kernel_region_num
        pixel_assignment = []
        for x in range(kernel_region_num):
            pixel_assignment.append(
                np.array(
                    label[list_index:list_index + int(label[x])],
                    # Fix: `np.float` was removed in NumPy 1.24; np.float64
                    # is the dtype the deprecated alias resolved to.
                    dtype=np.float64))
            list_index = list_index + int(label[x])
    else:
        pixel_assignment = ext_module.pixel_group(score, mask, embedding,
                                                  kernel_label, kernel_contour,
                                                  kernel_region_num,
                                                  distance_threshold)
    return pixel_assignment
# Modified from https://github.com/facebookresearch/detectron2/tree/master/projects/PointRend  # noqa

from typing import Tuple, Union

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import Tensor
from torch.nn.modules.utils import _pair


def bilinear_grid_sample(im: Tensor,
                         grid: Tensor,
                         align_corners: bool = False) -> Tensor:
    """Given an input and a flow-field grid, computes the output using input
    values and pixel locations from grid. Supported only bilinear interpolation
    method to sample the input pixels.

    Args:
        im (torch.Tensor): Input feature map, shape (N, C, H, W)
        grid (torch.Tensor): Point coordinates, shape (N, Hg, Wg, 2)
        align_corners (bool): If set to True, the extrema (-1 and 1) are
            considered as referring to the center points of the input's
            corner pixels. If set to False, they are instead considered as
            referring to the corner points of the input's corner pixels,
            making the sampling more resolution agnostic.

    Returns:
        torch.Tensor: A tensor with sampled points, shape (N, C, Hg, Wg)
    """
    n, c, h, w = im.shape
    gn, gh, gw, _ = grid.shape
    assert n == gn

    x = grid[:, :, :, 0]
    y = grid[:, :, :, 1]

    # Map normalized [-1, 1] grid coordinates to pixel coordinates.
    if align_corners:
        x = ((x + 1) / 2) * (w - 1)
        y = ((y + 1) / 2) * (h - 1)
    else:
        x = ((x + 1) * w - 1) / 2
        y = ((y + 1) * h - 1) / 2

    x = x.view(n, -1)
    y = y.view(n, -1)

    # Integer corners of the bilinear cell containing each sample point.
    x0 = torch.floor(x).long()
    y0 = torch.floor(y).long()
    x1 = x0 + 1
    y1 = y0 + 1

    # Bilinear weights for the four corners.
    wa = ((x1 - x) * (y1 - y)).unsqueeze(1)
    wb = ((x1 - x) * (y - y0)).unsqueeze(1)
    wc = ((x - x0) * (y1 - y)).unsqueeze(1)
    wd = ((x - x0) * (y - y0)).unsqueeze(1)

    # Apply default for grid_sample function zero padding
    im_padded = F.pad(im, pad=[1, 1, 1, 1], mode='constant', value=0)
    padded_h = h + 2
    padded_w = w + 2
    # save points positions after padding
    x0, x1, y0, y1 = x0 + 1, x1 + 1, y0 + 1, y1 + 1

    # Clip coordinates to padded image size.
    # Fix: the original used torch.where(cond, torch.tensor(...), t), which
    # builds the scalar tensors on the default device and raises a device
    # mismatch when `im`/`grid` live on CUDA. clamp() is device-safe and
    # behaves identically.
    x0 = x0.clamp(min=0, max=padded_w - 1)
    x1 = x1.clamp(min=0, max=padded_w - 1)
    y0 = y0.clamp(min=0, max=padded_h - 1)
    y1 = y1.clamp(min=0, max=padded_h - 1)

    im_padded = im_padded.view(n, c, -1)

    # Flattened indices of the four corners, broadcast over channels.
    x0_y0 = (x0 + y0 * padded_w).unsqueeze(1).expand(-1, c, -1)
    x0_y1 = (x0 + y1 * padded_w).unsqueeze(1).expand(-1, c, -1)
    x1_y0 = (x1 + y0 * padded_w).unsqueeze(1).expand(-1, c, -1)
    x1_y1 = (x1 + y1 * padded_w).unsqueeze(1).expand(-1, c, -1)

    Ia = torch.gather(im_padded, 2, x0_y0)
    Ib = torch.gather(im_padded, 2, x0_y1)
    Ic = torch.gather(im_padded, 2, x1_y0)
    Id = torch.gather(im_padded, 2, x1_y1)

    return (Ia * wa + Ib * wb + Ic * wc + Id * wd).reshape(n, c, gh, gw)
"""Normalize input grid from [-1, 1] to [0, 1] + + Args: + grid (torch.Tensor): The grid to be normalize, range [-1, 1]. + + Returns: + torch.Tensor: Normalized grid, range [0, 1]. + """ + + return (grid + 1.0) / 2.0 + + +def denormalize(grid: Tensor) -> Tensor: + """Denormalize input grid from range [0, 1] to [-1, 1] + + Args: + grid (torch.Tensor): The grid to be denormalize, range [0, 1]. + + Returns: + torch.Tensor: Denormalized grid, range [-1, 1]. + """ + + return grid * 2.0 - 1.0 + + +def generate_grid(num_grid: int, size: Tuple[int, int], + device: torch.device) -> Tensor: + """Generate regular square grid of points in [0, 1] x [0, 1] coordinate + space. + + Args: + num_grid (int): The number of grids to sample, one for each region. + size (tuple[int, int]): The side size of the regular grid. + device (torch.device): Desired device of returned tensor. + + Returns: + torch.Tensor: A tensor of shape (num_grid, size[0]*size[1], 2) that + contains coordinates for the regular grids. + """ + + affine_trans = torch.tensor([[[1., 0., 0.], [0., 1., 0.]]], device=device) + grid = F.affine_grid( + affine_trans, torch.Size((1, 1, *size)), align_corners=False) + grid = normalize(grid) + return grid.view(1, -1, 2).expand(num_grid, -1, -1) + + +def rel_roi_point_to_abs_img_point(rois: Tensor, + rel_roi_points: Tensor) -> Tensor: + """Convert roi based relative point coordinates to image based absolute + point coordinates. 
+ + Args: + rois (torch.Tensor): RoIs or BBoxes, shape (N, 4) or (N, 5) + rel_roi_points (torch.Tensor): Point coordinates inside RoI, relative + to RoI, location, range (0, 1), shape (N, P, 2) + Returns: + torch.Tensor: Image based absolute point coordinates, shape (N, P, 2) + """ + + with torch.no_grad(): + assert rel_roi_points.size(0) == rois.size(0) + assert rois.dim() == 2 + assert rel_roi_points.dim() == 3 + assert rel_roi_points.size(2) == 2 + # remove batch idx + if rois.size(1) == 5: + rois = rois[:, 1:] + abs_img_points = rel_roi_points.clone() + # To avoid an error during exporting to onnx use independent + # variables instead inplace computation + xs = abs_img_points[:, :, 0] * (rois[:, None, 2] - rois[:, None, 0]) + ys = abs_img_points[:, :, 1] * (rois[:, None, 3] - rois[:, None, 1]) + xs += rois[:, None, 0] + ys += rois[:, None, 1] + abs_img_points = torch.stack([xs, ys], dim=2) + return abs_img_points + + +def get_shape_from_feature_map(x: Tensor) -> Tensor: + """Get spatial resolution of input feature map considering exporting to + onnx mode. + + Args: + x (torch.Tensor): Input tensor, shape (N, C, H, W) + + Returns: + torch.Tensor: Spatial resolution (width, height), shape (1, 1, 2) + """ + img_shape = torch.tensor(x.shape[2:]).flip(0).view(1, 1, + 2).to(x.device).float() + return img_shape + + +def abs_img_point_to_rel_img_point(abs_img_points: Tensor, + img: Union[tuple, Tensor], + spatial_scale: float = 1.) -> Tensor: + """Convert image based absolute point coordinates to image based relative + coordinates for sampling. + + Args: + abs_img_points (torch.Tensor): Image based absolute point coordinates, + shape (N, P, 2) + img (tuple or torch.Tensor): (height, width) of image or feature map. + spatial_scale (float, optional): Scale points by this factor. + Default: 1. + + Returns: + Tensor: Image based relative point coordinates for sampling, shape + (N, P, 2). 
+ """ + + assert (isinstance(img, tuple) and len(img) == 2) or \ + (isinstance(img, torch.Tensor) and len(img.shape) == 4) + + if isinstance(img, tuple): + h, w = img + scale = torch.tensor([w, h], + dtype=torch.float, + device=abs_img_points.device) + scale = scale.view(1, 1, 2) + else: + scale = get_shape_from_feature_map(img) + + return abs_img_points / scale * spatial_scale + + +def rel_roi_point_to_rel_img_point(rois: Tensor, + rel_roi_points: Tensor, + img: Union[tuple, Tensor], + spatial_scale: float = 1.) -> Tensor: + """Convert roi based relative point coordinates to image based absolute + point coordinates. + + Args: + rois (torch.Tensor): RoIs or BBoxes, shape (N, 4) or (N, 5) + rel_roi_points (torch.Tensor): Point coordinates inside RoI, relative + to RoI, location, range (0, 1), shape (N, P, 2) + img (tuple or torch.Tensor): (height, width) of image or feature map. + spatial_scale (float, optional): Scale points by this factor. + Default: 1. + + Returns: + torch.Tensor: Image based relative point coordinates for sampling, + shape (N, P, 2). + """ + + abs_img_point = rel_roi_point_to_abs_img_point(rois, rel_roi_points) + rel_img_point = abs_img_point_to_rel_img_point(abs_img_point, img, + spatial_scale) + + return rel_img_point + + +def point_sample(input: Tensor, + points: Tensor, + align_corners: bool = False, + **kwargs) -> Tensor: + """A wrapper around :func:`grid_sample` to support 3D point_coords tensors + Unlike :func:`torch.nn.functional.grid_sample` it assumes point_coords to + lie inside ``[0, 1] x [0, 1]`` square. + + Args: + input (torch.Tensor): Feature map, shape (N, C, H, W). + points (torch.Tensor): Image based absolute point coordinates + (normalized), range [0, 1] x [0, 1], shape (N, P, 2) or + (N, Hgrid, Wgrid, 2). + align_corners (bool, optional): Whether align_corners. + Default: False + + Returns: + torch.Tensor: Features of `point` on `input`, shape (N, C, P) or + (N, C, Hgrid, Wgrid). 
+ """ + + add_dim = False + if points.dim() == 3: + add_dim = True + points = points.unsqueeze(2) + output = F.grid_sample( + input, denormalize(points), align_corners=align_corners, **kwargs) + if add_dim: + output = output.squeeze(3) + return output + + +class SimpleRoIAlign(nn.Module): + + def __init__(self, + output_size: Tuple[int], + spatial_scale: float, + aligned: bool = True) -> None: + """Simple RoI align in PointRend, faster than standard RoIAlign. + + Args: + output_size (tuple[int]): h, w + spatial_scale (float): scale the input boxes by this number + aligned (bool): if False, use the legacy implementation in + MMDetection, align_corners=True will be used in F.grid_sample. + If True, align the results more perfectly. + """ + + super().__init__() + self.output_size = _pair(output_size) + self.spatial_scale = float(spatial_scale) + # to be consistent with other RoI ops + self.use_torchvision = False + self.aligned = aligned + + def forward(self, features: Tensor, rois: Tensor) -> Tensor: + num_imgs = features.size(0) + num_rois = rois.size(0) + rel_roi_points = generate_grid( + num_rois, self.output_size, device=rois.device) + + point_feats = [] + for batch_ind in range(num_imgs): + # unravel batch dim + feat = features[batch_ind].unsqueeze(0) + inds = (rois[:, 0].long() == batch_ind) + if inds.any(): + rel_img_points = rel_roi_point_to_rel_img_point( + rois[inds], rel_roi_points[inds], feat, + self.spatial_scale).unsqueeze(0) + point_feat = point_sample( + feat, rel_img_points, align_corners=not self.aligned) + point_feat = point_feat.squeeze(0).transpose(0, 1) + point_feats.append(point_feat) + + point_feats_t = torch.cat(point_feats, dim=0) + + channels = features.size(1) + roi_feats = point_feats_t.reshape(num_rois, channels, + *self.output_size) + + return roi_feats + + def __repr__(self) -> str: + format_str = self.__class__.__name__ + format_str += '(output_size={}, spatial_scale={}'.format( + self.output_size, self.spatial_scale) + return 
def points_in_boxes_part(points: Tensor, boxes: Tensor) -> Tensor:
    """Find the box in which each point is (CUDA).

    Args:
        points (torch.Tensor): [B, M, 3], [x, y, z] in LiDAR/DEPTH coordinate.
        boxes (torch.Tensor): [B, T, 7],
            num_valid_boxes <= T, [x, y, z, x_size, y_size, z_size, rz] in
            LiDAR/DEPTH coordinate, (x, y, z) is the bottom center.

    Returns:
        torch.Tensor: Return the box indices of points with the shape of
        (B, M). Default background = -1.
    """
    assert points.shape[0] == boxes.shape[0], \
        'Points and boxes should have the same batch size, ' \
        f'but got {points.shape[0]} and {boxes.shape[0]}'
    assert boxes.shape[2] == 7, \
        'boxes dimension should be 7, ' \
        f'but got unexpected shape {boxes.shape[2]}'
    assert points.shape[2] == 3, \
        'points dimension should be 3, ' \
        f'but got unexpected shape {points.shape[2]}'

    batch_size, num_points = points.shape[0], points.shape[1]
    # -1 marks points that fall inside no box.
    box_idxs_of_pts = points.new_full((batch_size, num_points),
                                      -1,
                                      dtype=torch.int)

    # If 'points'/'boxes' live on a CUDA device other than the current one,
    # the op creates temporaries on the current device and returns wrong
    # results, so switch the current device to match the inputs first.
    # See https://github.com/open-mmlab/mmdetection3d/issues/305
    points_device = points.get_device()
    assert points_device == boxes.get_device(), \
        'Points and boxes should be put on the same device'
    if torch.cuda.current_device() != points_device:
        torch.cuda.set_device(points_device)

    ext_module.points_in_boxes_part_forward(boxes.contiguous(),
                                            points.contiguous(),
                                            box_idxs_of_pts)

    return box_idxs_of_pts


def points_in_boxes_cpu(points: Tensor, boxes: Tensor) -> Tensor:
    """Find all boxes in which each point is (CPU). The CPU version of
    :meth:`points_in_boxes_all`.

    Args:
        points (torch.Tensor): [B, M, 3], [x, y, z] in
            LiDAR/DEPTH coordinate
        boxes (torch.Tensor): [B, T, 7],
            num_valid_boxes <= T, [x, y, z, x_size, y_size, z_size, rz],
            (x, y, z) is the bottom center.

    Returns:
        torch.Tensor: Return the box indices of points with the shape of
        (B, M, T). Default background = 0.
    """
    assert points.shape[0] == boxes.shape[0], \
        'Points and boxes should have the same batch size, ' \
        f'but got {points.shape[0]} and {boxes.shape[0]}'
    assert boxes.shape[2] == 7, \
        'boxes dimension should be 7, ' \
        f'but got unexpected shape {boxes.shape[2]}'
    assert points.shape[2] == 3, \
        'points dimension should be 3, ' \
        f'but got unexpected shape {points.shape[2]}'

    batch_size, num_points = points.shape[0], points.shape[1]
    num_boxes = boxes.shape[1]

    # The C++ op fills a (T, M) indicator per sample; transpose to (M, T).
    point_indices = points.new_zeros((batch_size, num_boxes, num_points),
                                     dtype=torch.int)
    for sample_idx in range(batch_size):
        ext_module.points_in_boxes_cpu_forward(
            boxes[sample_idx].float().contiguous(),
            points[sample_idx].float().contiguous(),
            point_indices[sample_idx])

    return point_indices.transpose(1, 2)
def points_in_boxes_all(points: Tensor, boxes: Tensor) -> Tensor:
    """Find all boxes in which each point is (CUDA).

    Args:
        points (torch.Tensor): [B, M, 3], [x, y, z] in LiDAR/DEPTH coordinate
        boxes (torch.Tensor): [B, T, 7],
            num_valid_boxes <= T, [x, y, z, x_size, y_size, z_size, rz],
            (x, y, z) is the bottom center.

    Returns:
        torch.Tensor: Return the box indices of points with the shape of
        (B, M, T). Default background = 0.
    """
    assert boxes.shape[0] == points.shape[0], \
        'Points and boxes should have the same batch size, ' \
        f'but got {points.shape[0]} and {boxes.shape[0]}'
    # BUGFIX: the message above previously interpolated boxes.shape[0] twice,
    # so a batch-size mismatch reported the wrong values.
    assert boxes.shape[2] == 7, \
        'boxes dimension should be 7, ' \
        f'but got unexpected shape {boxes.shape[2]}'
    assert points.shape[2] == 3, \
        'points dimension should be 3, ' \
        f'but got unexpected shape {points.shape[2]}'
    batch_size, num_points, _ = points.shape
    num_boxes = boxes.shape[1]

    # 0 (background) everywhere until the op marks containing boxes.
    box_idxs_of_pts = points.new_zeros((batch_size, num_points, num_boxes),
                                       dtype=torch.int)

    # If the inputs live on a CUDA device other than the current one, the op
    # creates temporaries on the current device and produces wrong results,
    # so align the current device with the inputs first (same workaround as
    # points_in_boxes_part; see mmdetection3d issue #305).
    points_device = points.get_device()
    assert points_device == boxes.get_device(), \
        'Points and boxes should be put on the same device'
    if torch.cuda.current_device() != points_device:
        torch.cuda.set_device(points_device)

    ext_module.points_in_boxes_all_forward(boxes.contiguous(),
                                           points.contiguous(),
                                           box_idxs_of_pts)

    return box_idxs_of_pts
def points_in_polygons(points: Tensor, polygons: Tensor) -> Tensor:
    """Judging whether points are inside polygons, which is used in the ATSS
    assignment for the rotated boxes.

    It should be noted that when the point is just at the polygon boundary,
    the judgment will be inaccurate, but the effect on assignment is limited.

    Args:
        points (torch.Tensor): It has shape (B, 2), indicating (x, y).
            B means the number of predicted points.
        polygons (torch.Tensor): It has shape (M, 8), indicating
            (x1, y1, x2, y2, x3, y3, x4, y4). M means the number of
            ground truth polygons.

    Returns:
        torch.Tensor: Return the result with the shape of (B, M),
        1 indicates that the point is inside the polygon,
        0 indicates that the point is outside the polygon.
    """
    assert points.shape[1] == 2, \
        'points dimension should be 2, ' \
        f'but got unexpected shape {points.shape[1]}'
    assert polygons.shape[1] == 8, \
        'polygons dimension should be 8, ' \
        f'but got unexpected shape {polygons.shape[1]}'
    num_points, num_polygons = points.shape[0], polygons.shape[0]
    # The CUDA op writes 0/1 flags into a float (B, M) output buffer.
    output = torch.zeros(num_points, num_polygons,
                         dtype=torch.float32).cuda()
    ext_module.points_in_polygons_forward(points.contiguous(),
                                          polygons.contiguous(), output)
    return output
def calc_square_dist(point_feat_a: Tensor,
                     point_feat_b: Tensor,
                     norm: bool = True) -> Tensor:
    """Calculating square distance between a and b.

    Args:
        point_feat_a (torch.Tensor): (B, N, C) Feature vector of each point.
        point_feat_b (torch.Tensor): (B, M, C) Feature vector of each point.
        norm (bool, optional): Whether to normalize the distance.
            Default: True.

    Returns:
        torch.Tensor: (B, N, M) distance between each point pair.
    """
    num_channel = point_feat_a.shape[-1]
    pairwise = torch.cdist(point_feat_a, point_feat_b)
    # NOTE: with norm=True the *euclidean* distance divided by the channel
    # count is returned, not the squared distance (preserved behaviour).
    return pairwise / num_channel if norm else torch.square(pairwise)


def get_sampler_cls(sampler_type: str) -> nn.Module:
    """Get the type and mode of points sampler.

    Args:
        sampler_type (str): The type of points sampler.
            The valid value are "D-FPS", "F-FPS", or "FS".

    Returns:
        class: Points sampler type.
    """
    sampler_mappings = {
        'D-FPS': DFPSSampler,
        'F-FPS': FFPSSampler,
        'FS': FSSampler,
    }
    try:
        return sampler_mappings[sampler_type]
    except KeyError:
        raise KeyError(
            f'Supported `sampler_type` are {sampler_mappings.keys()}, but got \
                {sampler_type}')


class PointsSampler(nn.Module):
    """Points sampling.

    Args:
        num_point (list[int]): Number of sample points.
        fps_mod_list (list[str], optional): Type of FPS method, valid mod
            ['F-FPS', 'D-FPS', 'FS'], Default: ['D-FPS'].
            F-FPS: using feature distances for FPS.
            D-FPS: using Euclidean distances of points for FPS.
            FS: using F-FPS and D-FPS simultaneously.
        fps_sample_range_list (list[int], optional):
            Range of points to apply FPS. Default: [-1].
    """

    def __init__(self,
                 num_point: List[int],
                 fps_mod_list: List[str] = ['D-FPS'],
                 fps_sample_range_list: List[int] = [-1]) -> None:
        super().__init__()
        # One sampler per FPS mode, so all three lists must line up.
        assert len(num_point) == len(fps_mod_list) == len(
            fps_sample_range_list)
        self.num_point = num_point
        self.fps_sample_range_list = fps_sample_range_list
        self.samplers = nn.ModuleList(
            get_sampler_cls(mode)() for mode in fps_mod_list)
        self.fp16_enabled = False

    def forward(self, points_xyz: Tensor, features: Tensor) -> Tensor:
        """
        Args:
            points_xyz (torch.Tensor): (B, N, 3) xyz coordinates of
                the points.
            features (torch.Tensor): (B, C, N) features of the points.

        Returns:
            torch.Tensor: (B, npoint, sample_num) Indices of sampled points.
        """
        # The CUDA FPS kernels only accept fp32 inputs.
        if points_xyz.dtype == torch.half:
            points_xyz = points_xyz.to(torch.float32)
        if features is not None and features.dtype == torch.half:
            features = features.to(torch.float32)

        indices = []
        slice_start = 0
        for sample_range, sampler, npoint in zip(self.fps_sample_range_list,
                                                 self.samplers,
                                                 self.num_point):
            assert sample_range < points_xyz.shape[1]

            # A range of -1 means "everything after the previous slice".
            slice_end = None if sample_range == -1 else sample_range
            xyz_slice = points_xyz[:, slice_start:slice_end]
            if features is not None:
                feat_slice = features[:, :, slice_start:slice_end]
            else:
                feat_slice = None

            fps_idx = sampler(xyz_slice.contiguous(), feat_slice, npoint)

            # Re-offset indices back into the full point cloud.
            indices.append(fps_idx + slice_start)
            slice_start = sample_range

        return torch.cat(indices, dim=1)


class DFPSSampler(nn.Module):
    """Using Euclidean distances of points for FPS."""

    def __init__(self) -> None:
        super().__init__()

    def forward(self, points: Tensor, features: Tensor, npoint: int) -> Tensor:
        """Sampling points with D-FPS."""
        return furthest_point_sample(points.contiguous(), npoint)


class FFPSSampler(nn.Module):
    """Using feature distances for FPS."""

    def __init__(self) -> None:
        super().__init__()

    def forward(self, points: Tensor, features: Tensor, npoint: int) -> Tensor:
        """Sampling points with F-FPS."""
        assert features is not None, \
            'feature input to FFPS_Sampler should not be None'
        # FPS in the joint (xyz ++ feature) space.
        features_for_fps = torch.cat([points, features.transpose(1, 2)], dim=2)
        features_dist = calc_square_dist(
            features_for_fps, features_for_fps, norm=False)
        return furthest_point_sample_with_dist(features_dist, npoint)


class FSSampler(nn.Module):
    """Using F-FPS and D-FPS simultaneously."""

    def __init__(self) -> None:
        super().__init__()

    def forward(self, points: Tensor, features: Tensor, npoint: int) -> Tensor:
        """Sampling points with FS_Sampling."""
        assert features is not None, \
            'feature input to FS_Sampler should not be None'
        # Concatenate npoint F-FPS picks with npoint D-FPS picks.
        fps_idx_ffps = FFPSSampler()(points, features, npoint)
        fps_idx_dfps = DFPSSampler()(points, features, npoint)
        return torch.cat([fps_idx_ffps, fps_idx_dfps], dim=1)
class PrRoIPoolFunction(Function):
    """Autograd function for Precise RoI Pooling (forward/backward via the
    compiled ``_ext`` module)."""

    @staticmethod
    def symbolic(g, features, rois, output_size, spatial_scale):
        """ONNX export: map onto the custom ``mmcv::PrRoIPool`` op."""
        return g.op(
            'mmcv::PrRoIPool',
            features,
            rois,
            pooled_height_i=int(output_size[0]),
            pooled_width_i=int(output_size[1]),
            spatial_scale_f=float(spatial_scale))

    @staticmethod
    def forward(ctx,
                features: torch.Tensor,
                rois: torch.Tensor,
                output_size: Tuple,
                spatial_scale: float = 1.0) -> torch.Tensor:
        """Pool each RoI into a fixed (pooled_height, pooled_width) grid.

        Args:
            features (torch.Tensor): (N, C, H, W) feature map, float32 only.
            rois (torch.Tensor): (n, 5) boxes, float32 only; column 0 is the
                batch index.
            output_size (tuple): (pooled_height, pooled_width).
            spatial_scale (float): scale applied to the RoIs. Default: 1.0.

        Raises:
            ValueError: if either input is not float32.
        """
        if features.dtype != torch.float32 or rois.dtype != torch.float32:
            # BUGFIX: ``dtype`` is an attribute, not a method; calling it
            # raised TypeError and masked this ValueError. Also restore the
            # missing space between the two message fragments.
            raise ValueError('Precise RoI Pooling only takes float input, '
                             f'got {features.dtype} for features and '
                             f'{rois.dtype} for rois.')

        pooled_height = int(output_size[0])
        pooled_width = int(output_size[1])
        spatial_scale = float(spatial_scale)

        features = features.contiguous()
        rois = rois.contiguous()
        output_shape = (rois.size(0), features.size(1), pooled_height,
                        pooled_width)
        output = features.new_zeros(output_shape)
        params = (pooled_height, pooled_width, spatial_scale)

        ext_module.prroi_pool_forward(
            features,
            rois,
            output,
            pooled_height=params[0],
            pooled_width=params[1],
            spatial_scale=params[2])
        ctx.params = params
        # everything here is contiguous.
        ctx.save_for_backward(features, rois, output)

        return output

    @staticmethod
    @once_differentiable
    def backward(
        ctx, grad_output: torch.Tensor
    ) -> Tuple[torch.Tensor, torch.Tensor, None, None, None]:
        """Compute gradients w.r.t. both the features and the RoI
        coordinates (PrRoIPool is differentiable in the box coordinates)."""
        features, rois, output = ctx.saved_tensors
        grad_input = grad_output.new_zeros(*features.shape)
        grad_coor = grad_output.new_zeros(*rois.shape)

        if features.requires_grad or TORCH_VERSION == 'parrots':
            grad_output = grad_output.contiguous()
            ext_module.prroi_pool_backward(
                grad_output,
                rois,
                grad_input,
                pooled_height=ctx.params[0],
                pooled_width=ctx.params[1],
                spatial_scale=ctx.params[2])
        if rois.requires_grad or TORCH_VERSION == 'parrots':
            grad_output = grad_output.contiguous()
            ext_module.prroi_pool_coor_backward(
                output,
                grad_output,
                features,
                rois,
                grad_coor,
                pooled_height=ctx.params[0],
                pooled_width=ctx.params[1],
                spatial_scale=ctx.params[2])

        return grad_input, grad_coor, None, None, None


prroi_pool = PrRoIPoolFunction.apply


class PrRoIPool(nn.Module):
    """The operation of precision RoI pooling. The implementation of
    PrRoIPool is modified from https://github.com/vacancy/PreciseRoIPooling/

    Precise RoI Pooling (PrRoIPool) is an integration-based (bilinear
    interpolation) average pooling method for RoI Pooling. It avoids any
    quantization and has a continuous gradient on bounding box coordinates.
    It is:

    1. different from the original RoI Pooling proposed in Fast R-CNN. PrRoI
    Pooling uses average pooling instead of max pooling for each bin and has
    a continuous gradient on bounding box coordinates. That is, one can take
    the derivatives of some loss function w.r.t the coordinates of each RoI
    and optimize the RoI coordinates.
    2. different from the RoI Align proposed in Mask R-CNN. PrRoI Pooling
    uses a full integration-based average pooling instead of sampling a
    constant number of points. This makes the gradient w.r.t. the
    coordinates continuous.

    Args:
        output_size (Union[int, tuple]): h, w.
        spatial_scale (float, optional): scale the input boxes by this
            number. Defaults to 1.0.
    """

    def __init__(self,
                 output_size: Union[int, tuple],
                 spatial_scale: float = 1.0):
        super().__init__()

        self.output_size = _pair(output_size)
        self.spatial_scale = float(spatial_scale)

    def forward(self, features: torch.Tensor,
                rois: torch.Tensor) -> torch.Tensor:
        """Forward function.

        Args:
            features (torch.Tensor): The feature map.
            rois (torch.Tensor): The RoI bboxes in [tl_x, tl_y, br_x, br_y]
                format.

        Returns:
            torch.Tensor: The pooled results.
        """
        return prroi_pool(features, rois, self.output_size,
                          self.spatial_scale)

    def __repr__(self):
        s = self.__class__.__name__
        s += f'(output_size={self.output_size}, '
        s += f'spatial_scale={self.spatial_scale})'
        return s
class PSAMaskFunction(Function):
    """Autograd function for the point-wise spatial attention (PSA) mask op,
    backed by the compiled ``_ext`` module."""

    @staticmethod
    def symbolic(g, input, psa_type, mask_size):
        """ONNX export: map onto the custom ``mmcv::MMCVPSAMask`` op."""
        return g.op(
            'mmcv::MMCVPSAMask',
            input,
            psa_type_i=psa_type,
            mask_size_i=mask_size)

    @staticmethod
    def forward(ctx, input: torch.Tensor, psa_type: str,
                mask_size: int) -> torch.Tensor:
        ctx.psa_type = psa_type
        ctx.mask_size = _pair(mask_size)
        ctx.save_for_backward(input)

        h_mask, w_mask = ctx.mask_size
        batch_size, channels, h_feature, w_feature = input.size()
        # Each channel corresponds to one mask position.
        assert channels == h_mask * w_mask
        output = input.new_zeros(
            (batch_size, h_feature * w_feature, h_feature, w_feature))

        ext_module.psamask_forward(
            input,
            output,
            psa_type=psa_type,
            num_=batch_size,
            h_feature=h_feature,
            w_feature=w_feature,
            h_mask=h_mask,
            w_mask=w_mask,
            half_h_mask=(h_mask - 1) // 2,
            half_w_mask=(w_mask - 1) // 2)
        return output

    @staticmethod
    def backward(
        ctx, grad_output: torch.Tensor
    ) -> Tuple[torch.Tensor, None, None, None]:
        input = ctx.saved_tensors[0]
        psa_type = ctx.psa_type
        h_mask, w_mask = ctx.mask_size
        batch_size, channels, h_feature, w_feature = input.size()
        grad_input = grad_output.new_zeros(
            (batch_size, channels, h_feature, w_feature))
        ext_module.psamask_backward(
            grad_output,
            grad_input,
            psa_type=psa_type,
            num_=batch_size,
            h_feature=h_feature,
            w_feature=w_feature,
            h_mask=h_mask,
            w_mask=w_mask,
            half_h_mask=(h_mask - 1) // 2,
            half_w_mask=(w_mask - 1) // 2)
        # Only the input tensor receives a gradient.
        return grad_input, None, None, None


psa_mask = PSAMaskFunction.apply


class PSAMask(nn.Module):
    """Module wrapper for the PSA mask op.

    Args:
        psa_type (str): Either ``'collect'`` or ``'distribute'``; encoded as
            0/1 for the compiled op.
        mask_size (tuple, optional): (h, w) of the attention mask.
    """

    def __init__(self, psa_type: str, mask_size: Optional[tuple] = None):
        super().__init__()
        assert psa_type in ['collect', 'distribute']
        # 'collect' -> 0, 'distribute' -> 1 for the C extension.
        self.psa_type_enum = 0 if psa_type == 'collect' else 1
        self.mask_size = mask_size
        self.psa_type = psa_type

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        return psa_mask(input, self.psa_type_enum, self.mask_size)

    def __repr__(self):
        s = self.__class__.__name__
        s += f'(psa_type={self.psa_type}, '
        s += f'mask_size={self.mask_size})'
        return s
class RiRoIAlignRotatedFunction(Function):
    """Autograd function for rotation-invariant RoI align on rotated
    proposals, backed by the compiled ``_ext`` module."""

    @staticmethod
    def forward(ctx: Any,
                features: torch.Tensor,
                rois: torch.Tensor,
                out_size: Union[int, tuple],
                spatial_scale: float,
                num_samples: int = 0,
                num_orientations: int = 8,
                clockwise: bool = False) -> torch.Tensor:
        """Pool each rotated RoI into a fixed (out_h, out_w) grid.

        Args:
            features (torch.Tensor): (N, C, H, W) feature map.
            rois (torch.Tensor): (n, 6) boxes decoded as (batch_index,
                center_x, center_y, w, h, angle), angle in radian.
            out_size (int | tuple): output h (== w) or (h, w).
            spatial_scale (float): scale applied to the RoIs.
            num_samples (int): sampling points per output bin; 0 samples
                densely. Default: 0.
            num_orientations (int): number of oriented channels. Default: 8.
            clockwise (bool): angle direction convention. Default: False.

        Raises:
            TypeError: if ``out_size`` is neither int nor tuple of ints.
        """
        if isinstance(out_size, int):
            out_h = out_size
            out_w = out_size
        elif is_tuple_of(out_size, int):
            assert len(out_size) == 2
            out_h, out_w = out_size
        else:
            raise TypeError(
                f'"out_size" should be an integer or tuple of integers,'
                f' but got {out_size}')
        ctx.spatial_scale = spatial_scale
        ctx.num_samples = num_samples
        ctx.num_orientations = num_orientations
        ctx.clockwise = clockwise
        ctx.save_for_backward(rois)
        ctx.feature_size = features.size()

        batch_size, num_channels, _, _ = features.size()
        num_rois = rois.size(0)

        output = features.new_zeros(num_rois, num_channels, out_h, out_w)

        ext_module.riroi_align_rotated_forward(
            features,
            rois,
            output,
            pooled_height=out_h,
            pooled_width=out_w,
            spatial_scale=spatial_scale,
            num_samples=num_samples,
            num_orientations=num_orientations,
            clockwise=clockwise)
        return output

    @staticmethod
    def backward(
        ctx: Any, grad_output: torch.Tensor
    ) -> Optional[Tuple[torch.Tensor, None, None, None, None, None, None]]:
        """Propagate gradients to the features only (RoIs get no grad)."""
        feature_size = ctx.feature_size
        spatial_scale = ctx.spatial_scale
        num_orientations = ctx.num_orientations
        clockwise = ctx.clockwise
        num_samples = ctx.num_samples
        rois = ctx.saved_tensors[0]
        assert feature_size is not None
        batch_size, num_channels, feature_h, feature_w = feature_size

        out_w = grad_output.size(3)
        out_h = grad_output.size(2)

        # BUGFIX: the gradient tuple was returned unconditionally, which made
        # the trailing ``return None`` unreachable; the tuple must only be
        # returned when a feature gradient was actually requested, matching
        # the Optional[...] return annotation.
        if ctx.needs_input_grad[0]:
            grad_input = rois.new_zeros(batch_size, num_channels, feature_h,
                                        feature_w)
            ext_module.riroi_align_rotated_backward(
                grad_output.contiguous(),
                rois,
                grad_input,
                pooled_height=out_h,
                pooled_width=out_w,
                spatial_scale=spatial_scale,
                num_samples=num_samples,
                num_orientations=num_orientations,
                clockwise=clockwise)
            return grad_input, None, None, None, None, None, None
        return None


riroi_align_rotated = RiRoIAlignRotatedFunction.apply
+ """ + + def __init__(self, + out_size: tuple, + spatial_scale: float, + num_samples: int = 0, + num_orientations: int = 8, + clockwise: bool = False): + super().__init__() + + self.out_size = out_size + self.spatial_scale = float(spatial_scale) + self.num_samples = int(num_samples) + self.num_orientations = int(num_orientations) + self.clockwise = clockwise + + def forward(self, features: torch.Tensor, + rois: torch.Tensor) -> torch.Tensor: + return RiRoIAlignRotatedFunction.apply(features, rois, self.out_size, + self.spatial_scale, + self.num_samples, + self.num_orientations, + self.clockwise) diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/roi_align.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/roi_align.py new file mode 100644 index 0000000000000000000000000000000000000000..de2bed204df1b9ed00a379147086b7ca5123a1e3 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/roi_align.py @@ -0,0 +1,221 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import Any + +import torch +import torch.nn as nn +from mmengine.utils import deprecated_api_warning +from torch.autograd import Function +from torch.autograd.function import once_differentiable +from torch.nn.modules.utils import _pair + +from ..utils import ext_loader + +ext_module = ext_loader.load_ext('_ext', + ['roi_align_forward', 'roi_align_backward']) + + +class RoIAlignFunction(Function): + + @staticmethod + def symbolic(g, input, rois, output_size, spatial_scale, sampling_ratio, + pool_mode, aligned): + from torch.onnx import TensorProtoDataType + from torch.onnx.symbolic_opset9 import sub + + def _select(g, self, dim, index): + return g.op('Gather', self, index, axis_i=dim) + + # batch_indices = rois[:, 0].long() + batch_indices = _select( + g, rois, 1, + g.op('Constant', value_t=torch.tensor([0], dtype=torch.long))) + batch_indices = g.op('Squeeze', batch_indices, axes_i=[1]) + batch_indices = g.op( + 'Cast', batch_indices, to_i=TensorProtoDataType.INT64) + # rois = rois[:, 1:] + rois = 
class RoIAlignFunction(Function):
    """Autograd function for RoI align, backed by the compiled ``_ext``
    module (with an ONNX ``RoiAlign`` export path)."""

    @staticmethod
    def symbolic(g, input, rois, output_size, spatial_scale, sampling_ratio,
                 pool_mode, aligned):
        """ONNX export: translate onto the standard ``RoiAlign`` operator."""
        from torch.onnx import TensorProtoDataType
        from torch.onnx.symbolic_opset9 import sub

        def _select(g, self, dim, index):
            return g.op('Gather', self, index, axis_i=dim)

        # batch_indices = rois[:, 0].long()
        batch_indices = _select(
            g, rois, 1,
            g.op('Constant', value_t=torch.tensor([0], dtype=torch.long)))
        batch_indices = g.op('Squeeze', batch_indices, axes_i=[1])
        batch_indices = g.op(
            'Cast', batch_indices, to_i=TensorProtoDataType.INT64)
        # rois = rois[:, 1:]
        rois = _select(
            g, rois, 1,
            g.op(
                'Constant',
                value_t=torch.tensor([1, 2, 3, 4], dtype=torch.long)))

        if aligned:
            # rois -= 0.5/spatial_scale
            aligned_offset = g.op(
                'Constant',
                value_t=torch.tensor([0.5 / spatial_scale],
                                     dtype=torch.float32))
            rois = sub(g, rois, aligned_offset)
        # roi align
        return g.op(
            'RoiAlign',
            input,
            rois,
            batch_indices,
            output_height_i=output_size[0],
            output_width_i=output_size[1],
            spatial_scale_f=spatial_scale,
            sampling_ratio_i=max(0, sampling_ratio),
            mode_s=pool_mode)

    @staticmethod
    def forward(ctx: Any,
                input: torch.Tensor,
                rois: torch.Tensor,
                output_size: int,
                spatial_scale: float = 1.0,
                sampling_ratio: int = 0,
                pool_mode: str = 'avg',
                aligned: bool = True) -> torch.Tensor:
        """Pool each RoI from ``input`` into an (out_h, out_w) grid.

        Args:
            input (torch.Tensor): (N, C, H, W) feature map.
            rois (torch.Tensor): (n, 5) boxes; column 0 is the batch index.
            output_size (int | tuple): output h/w.
            spatial_scale (float): scale applied to the RoIs. Default: 1.0.
            sampling_ratio (int): samples per bin; 0 samples densely.
            pool_mode (str): 'avg' or 'max' pooling per bin.
            aligned (bool): subtract 0.5 for exact bilinear alignment.
        """
        ctx.output_size = _pair(output_size)
        ctx.spatial_scale = spatial_scale
        ctx.sampling_ratio = sampling_ratio
        assert pool_mode in ('max', 'avg')
        # The extension encodes pooling mode as 0 (max) / 1 (avg).
        ctx.pool_mode = 0 if pool_mode == 'max' else 1
        ctx.aligned = aligned
        ctx.input_shape = input.size()

        assert rois.size(1) == 5, 'RoI must be (idx, x1, y1, x2, y2)!'

        out_h, out_w = ctx.output_size
        output = input.new_zeros((rois.size(0), input.size(1), out_h, out_w))
        if ctx.pool_mode == 0:
            # Max pooling needs argmax maps for the backward pass.
            argmax_y = input.new_zeros(output.shape)
            argmax_x = input.new_zeros(output.shape)
        else:
            argmax_y = input.new_zeros(0)
            argmax_x = input.new_zeros(0)

        ext_module.roi_align_forward(
            input,
            rois,
            output,
            argmax_y,
            argmax_x,
            aligned_height=out_h,
            aligned_width=out_w,
            spatial_scale=ctx.spatial_scale,
            sampling_ratio=ctx.sampling_ratio,
            pool_mode=ctx.pool_mode,
            aligned=ctx.aligned)

        ctx.save_for_backward(rois, argmax_y, argmax_x)
        return output

    @staticmethod
    @once_differentiable
    def backward(ctx: Any, grad_output: torch.Tensor) -> tuple:
        """Propagate gradients to the input feature map only."""
        rois, argmax_y, argmax_x = ctx.saved_tensors
        grad_input = grad_output.new_zeros(ctx.input_shape)
        # complex head architecture may cause grad_output uncontiguous.
        grad_output = grad_output.contiguous()
        ext_module.roi_align_backward(
            grad_output,
            rois,
            argmax_y,
            argmax_x,
            grad_input,
            aligned_height=ctx.output_size[0],
            aligned_width=ctx.output_size[1],
            spatial_scale=ctx.spatial_scale,
            sampling_ratio=ctx.sampling_ratio,
            pool_mode=ctx.pool_mode,
            aligned=ctx.aligned)
        return grad_input, None, None, None, None, None, None


roi_align = RoIAlignFunction.apply
class RoIAlign(nn.Module):
    """RoI align pooling layer.

    Args:
        output_size (tuple): h, w
        spatial_scale (float): scale the input boxes by this number
        sampling_ratio (int): number of inputs samples to take for each
            output sample. 0 to take samples densely for current models.
        pool_mode (str, 'avg' or 'max'): pooling mode in each bin.
        aligned (bool): if False, use the legacy implementation in
            MMDetection. If True, align the results more perfectly.
        use_torchvision (bool): whether to use roi_align from torchvision.

    Note:
        The implementation of RoIAlign when aligned=True is modified from
        https://github.com/facebookresearch/detectron2/

        The meaning of aligned=True:

        Given a continuous coordinate c, its two neighboring pixel
        indices (in our pixel model) are computed by floor(c - 0.5) and
        ceil(c - 0.5). For example, c=1.3 has pixel neighbors with discrete
        indices [0] and [1] (which are sampled from the underlying signal
        at continuous coordinates 0.5 and 1.5). But the original roi_align
        (aligned=False) does not subtract the 0.5 when computing
        neighboring pixel indices and therefore it uses pixels with a
        slightly incorrect alignment (relative to our pixel model) when
        performing bilinear interpolation.

        With `aligned=True`,
        we first appropriately scale the ROI and then shift it by -0.5
        prior to calling roi_align. This produces the correct neighbors;

        The difference does not make a difference to the model's
        performance if ROIAlign is used together with conv layers.
    """

    @deprecated_api_warning(
        {
            'out_size': 'output_size',
            'sample_num': 'sampling_ratio'
        },
        cls_name='RoIAlign')
    def __init__(self,
                 output_size: tuple,
                 spatial_scale: float = 1.0,
                 sampling_ratio: int = 0,
                 pool_mode: str = 'avg',
                 aligned: bool = True,
                 use_torchvision: bool = False):
        super().__init__()
        self.output_size = _pair(output_size)
        self.spatial_scale = float(spatial_scale)
        self.sampling_ratio = int(sampling_ratio)
        self.pool_mode = pool_mode
        self.aligned = aligned
        self.use_torchvision = use_torchvision

    def forward(self, input: torch.Tensor, rois: torch.Tensor) -> torch.Tensor:
        """
        Args:
            input: NCHW images
            rois: Bx5 boxes. First column is the index into N.\
                The other 4 columns are xyxy.
        """
        if not self.use_torchvision:
            return roi_align(input, rois, self.output_size,
                             self.spatial_scale, self.sampling_ratio,
                             self.pool_mode, self.aligned)

        from torchvision.ops import roi_align as tv_roi_align
        if 'aligned' in tv_roi_align.__code__.co_varnames:
            # Recent torchvision supports the ``aligned`` flag natively.
            return tv_roi_align(input, rois, self.output_size,
                                self.spatial_scale, self.sampling_ratio,
                                self.aligned)
        if self.aligned:
            # NOTE(review): this shifts ``rois`` in place, mutating the
            # caller's tensor — behaviour preserved from the original.
            rois -= rois.new_tensor([0.] + [0.5 / self.spatial_scale] * 4)
        return tv_roi_align(input, rois, self.output_size,
                            self.spatial_scale, self.sampling_ratio)

    def __repr__(self):
        s = self.__class__.__name__
        s += f'(output_size={self.output_size}, '
        s += f'spatial_scale={self.spatial_scale}, '
        s += f'sampling_ratio={self.sampling_ratio}, '
        s += f'pool_mode={self.pool_mode}, '
        s += f'aligned={self.aligned}, '
        s += f'use_torchvision={self.use_torchvision})'
        return s
class RoIAlignRotatedFunction(Function):
    """Autograd function for RoI align on rotated proposals, backed by the
    compiled ``_ext`` module (with a custom ONNX export)."""

    @staticmethod
    def symbolic(g, input, rois, output_size, spatial_scale, sampling_ratio,
                 aligned, clockwise):
        """ONNX export: map onto the custom ``mmcv::MMCVRoIAlignRotated``
        op.

        Raises:
            TypeError: if ``output_size`` is neither int nor tuple of ints.
        """
        if isinstance(output_size, int):
            out_h = output_size
            out_w = output_size
        elif isinstance(output_size, tuple):
            assert len(output_size) == 2
            assert isinstance(output_size[0], int)
            assert isinstance(output_size[1], int)
            out_h, out_w = output_size
        else:
            raise TypeError(
                '"output_size" must be an integer or tuple of integers')
        return g.op(
            'mmcv::MMCVRoIAlignRotated',
            input,
            rois,
            output_height_i=out_h,
            # BUGFIX: the exported width was previously set to ``out_h``,
            # so non-square output sizes exported a wrong width.
            output_width_i=out_w,
            spatial_scale_f=spatial_scale,
            sampling_ratio_i=sampling_ratio,
            aligned_i=aligned,
            clockwise_i=clockwise)

    @staticmethod
    def forward(ctx: Any,
                input: torch.Tensor,
                rois: torch.Tensor,
                output_size: Union[int, tuple],
                spatial_scale: float,
                sampling_ratio: int = 0,
                aligned: bool = True,
                clockwise: bool = False) -> torch.Tensor:
        """Pool each rotated RoI into a fixed (out_h, out_w) grid.

        Args:
            input (torch.Tensor): (N, C, H, W) feature map.
            rois (torch.Tensor): (n, 6) boxes decoded as (batch_index,
                center_x, center_y, w, h, angle), angle in radian.
            output_size (int | tuple): output h (== w) or (h, w).
            spatial_scale (float): scale applied to the RoIs.
            sampling_ratio (int): samples per bin; 0 samples densely.
            aligned (bool): subtract 0.5 for exact bilinear alignment.
            clockwise (bool): angle direction convention. Default: False.
        """
        ctx.output_size = _pair(output_size)
        ctx.spatial_scale = spatial_scale
        ctx.sampling_ratio = sampling_ratio
        ctx.aligned = aligned
        ctx.clockwise = clockwise
        ctx.save_for_backward(rois)
        ctx.feature_size = input.size()

        batch_size, num_channels, data_height, data_width = input.size()
        num_rois = rois.size(0)

        output = input.new_zeros(num_rois, num_channels, ctx.output_size[0],
                                 ctx.output_size[1])
        ext_module.roi_align_rotated_forward(
            input,
            rois,
            output,
            pooled_height=ctx.output_size[0],
            pooled_width=ctx.output_size[1],
            spatial_scale=ctx.spatial_scale,
            sampling_ratio=ctx.sampling_ratio,
            aligned=ctx.aligned,
            clockwise=ctx.clockwise)
        return output

    @staticmethod
    def backward(
        ctx: Any, grad_output: torch.Tensor
    ) -> Tuple[Optional[torch.Tensor], Optional[torch.Tensor], None, None,
               None, None, None]:
        """Propagate gradients to the input feature map only (RoIs get no
        gradient)."""
        feature_size = ctx.feature_size
        rois = ctx.saved_tensors[0]
        assert feature_size is not None
        batch_size, num_channels, data_height, data_width = feature_size

        out_w = grad_output.size(3)
        out_h = grad_output.size(2)

        grad_input = grad_rois = None

        if ctx.needs_input_grad[0]:
            grad_input = rois.new_zeros(batch_size, num_channels, data_height,
                                        data_width)
            ext_module.roi_align_rotated_backward(
                grad_output.contiguous(),
                rois,
                grad_input,
                pooled_height=out_h,
                pooled_width=out_w,
                spatial_scale=ctx.spatial_scale,
                sampling_ratio=ctx.sampling_ratio,
                aligned=ctx.aligned,
                clockwise=ctx.clockwise)
        return grad_input, grad_rois, None, None, None, None, None


roi_align_rotated = RoIAlignRotatedFunction.apply
+ + Note: + The implementation of RoIAlign when aligned=True is modified from + https://github.com/facebookresearch/detectron2/ + + The meaning of aligned=True: + + Given a continuous coordinate c, its two neighboring pixel + indices (in our pixel model) are computed by floor(c - 0.5) and + ceil(c - 0.5). For example, c=1.3 has pixel neighbors with discrete + indices [0] and [1] (which are sampled from the underlying signal + at continuous coordinates 0.5 and 1.5). But the original roi_align + (aligned=False) does not subtract the 0.5 when computing + neighboring pixel indices and therefore it uses pixels with a + slightly incorrect alignment (relative to our pixel model) when + performing bilinear interpolation. + + With `aligned=True`, + we first appropriately scale the ROI and then shift it by -0.5 + prior to calling roi_align. This produces the correct neighbors; + + The difference does not make a difference to the model's + performance if ROIAlign is used together with conv layers. 
+ """ + + @deprecated_api_warning( + { + 'out_size': 'output_size', + 'sample_num': 'sampling_ratio' + }, + cls_name='RoIAlignRotated') + def __init__(self, + output_size: Union[int, tuple], + spatial_scale: float, + sampling_ratio: int = 0, + aligned: bool = True, + clockwise: bool = False): + super().__init__() + + self.output_size = _pair(output_size) + self.spatial_scale = float(spatial_scale) + self.sampling_ratio = int(sampling_ratio) + self.aligned = aligned + self.clockwise = clockwise + + def forward(self, input: torch.Tensor, rois: torch.Tensor) -> torch.Tensor: + return RoIAlignRotatedFunction.apply(input, rois, self.output_size, + self.spatial_scale, + self.sampling_ratio, self.aligned, + self.clockwise) + + def __repr__(self): + s = self.__class__.__name__ + s += f'(output_size={self.output_size}, ' + s += f'spatial_scale={self.spatial_scale}, ' + s += f'sampling_ratio={self.sampling_ratio}, ' + s += f'aligned={self.aligned}, ' + s += f'clockwise={self.clockwise})' + return s diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/roi_pool.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/roi_pool.py new file mode 100644 index 0000000000000000000000000000000000000000..e295b6a0c16b893688be3a574c6ce423df3399e4 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/roi_pool.py @@ -0,0 +1,96 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Any, Tuple, Union

import torch
import torch.nn as nn
from torch.autograd import Function
from torch.autograd.function import once_differentiable
from torch.nn.modules.utils import _pair

from ..utils import ext_loader

ext_module = ext_loader.load_ext('_ext',
                                 ['roi_pool_forward', 'roi_pool_backward'])


class RoIPoolFunction(Function):
    """Autograd function wrapping the compiled ``roi_pool`` kernels."""

    @staticmethod
    def symbolic(g, input, rois, output_size, spatial_scale):
        # RoI max pooling maps directly onto the standard ONNX node.
        return g.op(
            'MaxRoiPool',
            input,
            rois,
            pooled_shape_i=output_size,
            spatial_scale_f=spatial_scale)

    @staticmethod
    def forward(ctx: Any,
                input: torch.Tensor,
                rois: torch.Tensor,
                output_size: Union[int, tuple],
                spatial_scale: float = 1.0) -> torch.Tensor:
        """Max-pool the features inside each RoI to a fixed spatial size."""
        pooled_h, pooled_w = _pair(output_size)
        ctx.output_size = (pooled_h, pooled_w)
        ctx.spatial_scale = spatial_scale
        ctx.input_shape = input.size()

        assert rois.size(1) == 5, 'RoI must be (idx, x1, y1, x2, y2)!'

        out_shape = (rois.size(0), input.size(1), pooled_h, pooled_w)
        output = input.new_zeros(out_shape)
        # Per output cell, argmax records which input element won the max;
        # backward routes gradients through it.
        argmax = input.new_zeros(out_shape, dtype=torch.int)

        ext_module.roi_pool_forward(
            input,
            rois,
            output,
            argmax,
            pooled_height=pooled_h,
            pooled_width=pooled_w,
            spatial_scale=spatial_scale)

        ctx.save_for_backward(rois, argmax)
        return output

    @staticmethod
    @once_differentiable
    def backward(
            ctx: Any, grad_output: torch.Tensor
    ) -> Tuple[torch.Tensor, None, None, None]:
        """Scatter ``grad_output`` back to the argmax positions."""
        rois, argmax = ctx.saved_tensors
        grad_input = grad_output.new_zeros(ctx.input_shape)

        ext_module.roi_pool_backward(
            grad_output,
            rois,
            argmax,
            grad_input,
            pooled_height=ctx.output_size[0],
            pooled_width=ctx.output_size[1],
            spatial_scale=ctx.spatial_scale)

        return grad_input, None, None, None


roi_pool = RoIPoolFunction.apply


class RoIPool(nn.Module):
    """RoI max pooling layer.

    Args:
        output_size (int | tuple): Output spatial size (h, w).
        spatial_scale (float): Scale factor mapping RoI coordinates onto
            the feature map. Default: 1.0.
    """

    def __init__(self,
                 output_size: Union[int, tuple],
                 spatial_scale: float = 1.0):
        super().__init__()
        self.output_size = _pair(output_size)
        self.spatial_scale = float(spatial_scale)

    def forward(self, input: torch.Tensor, rois: torch.Tensor) -> torch.Tensor:
        return roi_pool(input, rois, self.output_size, self.spatial_scale)

    def __repr__(self):
        return (f'{self.__class__.__name__}'
                f'(output_size={self.output_size}, '
                f'spatial_scale={self.spatial_scale})')
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Any, Tuple, Union

import mmengine
import torch
from torch import nn as nn
from torch.autograd import Function

from ..utils import ext_loader

ext_module = ext_loader.load_ext(
    '_ext', ['roiaware_pool3d_forward', 'roiaware_pool3d_backward'])


class RoIAwarePool3d(nn.Module):
    """Encode the geometry-specific features of each 3D proposal.

    Please refer to `PartA2 `_ for more
    details.

    Args:
        out_size (int or tuple): The size of output features. n or
            [n1, n2, n3].
        max_pts_per_voxel (int, optional): The maximum number of points per
            voxel. Default: 128.
        mode (str, optional): Pooling method of RoIAware, 'max' or 'avg'.
            Default: 'max'.
    """

    def __init__(self,
                 out_size: Union[int, tuple],
                 max_pts_per_voxel: int = 128,
                 mode: str = 'max'):
        super().__init__()
        self.out_size = out_size
        self.max_pts_per_voxel = max_pts_per_voxel
        assert mode in ['max', 'avg']
        # The extension kernel identifies the pooling method by an integer.
        self.mode = {'max': 0, 'avg': 1}[mode]

    def forward(self, rois: torch.Tensor, pts: torch.Tensor,
                pts_feature: torch.Tensor) -> torch.Tensor:
        """Pool point features inside each 3D RoI.

        Args:
            rois (torch.Tensor): [N, 7], in LiDAR coordinate,
                (x, y, z) is the bottom center of rois.
            pts (torch.Tensor): [npoints, 3], coordinates of input points.
            pts_feature (torch.Tensor): [npoints, C], features of input
                points.

        Returns:
            torch.Tensor: Pooled features whose shape is
            [N, out_x, out_y, out_z, C].
        """
        return RoIAwarePool3dFunction.apply(rois, pts, pts_feature,
                                            self.out_size,
                                            self.max_pts_per_voxel, self.mode)


class RoIAwarePool3dFunction(Function):
    """Autograd function behind :class:`RoIAwarePool3d`."""

    @staticmethod
    def forward(ctx: Any, rois: torch.Tensor, pts: torch.Tensor,
                pts_feature: torch.Tensor, out_size: Union[int, tuple],
                max_pts_per_voxel: int, mode: int) -> torch.Tensor:
        """Voxelize each RoI and pool the point features it contains.

        Args:
            rois (torch.Tensor): [N, 7], in LiDAR coordinate,
                (x, y, z) is the bottom center of rois.
            pts (torch.Tensor): [npoints, 3], coordinates of input points.
            pts_feature (torch.Tensor): [npoints, C], features of input
                points.
            out_size (int or tuple): The size of output features. n or
                [n1, n2, n3].
            max_pts_per_voxel (int): The maximum number of points per voxel.
            mode (int): Pooling method, 0 (max pool) or 1 (average pool).

        Returns:
            torch.Tensor: Pooled features whose shape is
            [N, out_x, out_y, out_z, C].
        """
        if isinstance(out_size, int):
            out_x = out_y = out_z = out_size
        else:
            assert len(out_size) == 3
            assert mmengine.is_tuple_of(out_size, int)
            out_x, out_y, out_z = out_size

        num_rois = rois.shape[0]
        num_channels = pts_feature.shape[-1]
        num_pts = pts.shape[0]

        voxel_grid = (num_rois, out_x, out_y, out_z)
        pooled_features = pts_feature.new_zeros(voxel_grid + (num_channels, ))
        # argmax backs up the winning point per voxel cell for max pooling.
        argmax = pts_feature.new_zeros(
            voxel_grid + (num_channels, ), dtype=torch.int)
        pts_idx_of_voxels = pts_feature.new_zeros(
            voxel_grid + (max_pts_per_voxel, ), dtype=torch.int)

        ext_module.roiaware_pool3d_forward(
            rois,
            pts,
            pts_feature,
            argmax,
            pts_idx_of_voxels,
            pooled_features,
            pool_method=mode)

        # Stash everything backward needs; only pts_feature gets a gradient.
        ctx.roiaware_pool3d_for_backward = (pts_idx_of_voxels, argmax, mode,
                                            num_pts, num_channels)
        return pooled_features

    @staticmethod
    def backward(
        ctx: Any, grad_out: torch.Tensor
    ) -> Tuple[None, None, torch.Tensor, None, None, None]:
        """Route gradients back to the point features."""
        (pts_idx_of_voxels, argmax, mode, num_pts,
         num_channels) = ctx.roiaware_pool3d_for_backward

        grad_in = grad_out.new_zeros((num_pts, num_channels))
        ext_module.roiaware_pool3d_backward(
            pts_idx_of_voxels,
            argmax,
            grad_out.contiguous(),
            grad_in,
            pool_method=mode)

        return None, None, grad_in, None, None, None
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Any, Tuple

import torch
from torch import nn as nn
from torch.autograd import Function

from ..utils import ext_loader

ext_module = ext_loader.load_ext('_ext', ['roipoint_pool3d_forward'])


class RoIPointPool3d(nn.Module):
    """Encode the geometry-specific features of each 3D proposal.

    Please refer to `Paper of PartA2 `_
    for more details.

    Args:
        num_sampled_points (int, optional): Number of samples in each roi.
            Default: 512.
    """

    def __init__(self, num_sampled_points: int = 512):
        super().__init__()
        self.num_sampled_points = num_sampled_points

    def forward(self, points: torch.Tensor, point_features: torch.Tensor,
                boxes3d: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        """Pool a fixed number of points (with features) inside each box.

        Args:
            points (torch.Tensor): Input points whose shape is (B, N, 3).
            point_features (torch.Tensor): Features of input points whose
                shape is (B, N, C).
            boxes3d (torch.Tensor): Input bounding boxes whose shape is
                (B, M, 7).

        Returns:
            tuple[torch.Tensor, torch.Tensor]: A tuple contains two elements.
            The first one is the pooled features whose shape is
            (B, M, 512, 3 + C). The second is an empty flag whose shape is
            (B, M).
        """
        # FIX: return annotation was ``Tuple[torch.Tensor]`` although two
        # tensors are returned; corrected to a two-element tuple type.
        return RoIPointPool3dFunction.apply(points, point_features, boxes3d,
                                            self.num_sampled_points)


class RoIPointPool3dFunction(Function):
    """Autograd function behind :class:`RoIPointPool3d` (forward only)."""

    @staticmethod
    def forward(
            ctx: Any,
            points: torch.Tensor,
            point_features: torch.Tensor,
            boxes3d: torch.Tensor,
            num_sampled_points: int = 512
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        """Sample up to ``num_sampled_points`` points per box.

        Args:
            points (torch.Tensor): Input points whose shape is (B, N, 3).
            point_features (torch.Tensor): Features of input points whose
                shape is (B, N, C).
            boxes3d (torch.Tensor): Input bounding boxes whose shape is
                (B, M, 7).
            num_sampled_points (int, optional): The num of sampled points.
                Default: 512.

        Returns:
            tuple[torch.Tensor, torch.Tensor]: A tuple contains two elements.
            The first one is the pooled features whose shape is
            (B, M, 512, 3 + C). The second is an empty flag whose shape is
            (B, M).
        """
        assert len(points.shape) == 3 and points.shape[2] == 3
        batch_size, boxes_num, feature_len = points.shape[0], boxes3d.shape[
            1], point_features.shape[2]
        pooled_boxes3d = boxes3d.view(batch_size, -1, 7)
        # Output buffers are filled in place by the extension kernel.
        pooled_features = point_features.new_zeros(
            (batch_size, boxes_num, num_sampled_points, 3 + feature_len))
        # 1 marks boxes that contain no points at all.
        pooled_empty_flag = point_features.new_zeros(
            (batch_size, boxes_num)).int()

        ext_module.roipoint_pool3d_forward(points.contiguous(),
                                           pooled_boxes3d.contiguous(),
                                           point_features.contiguous(),
                                           pooled_features, pooled_empty_flag)

        return pooled_features, pooled_empty_flag

    @staticmethod
    def backward(ctx: Any, grad_out: torch.Tensor) -> torch.Tensor:
        # This op is not differentiable; make that explicit.
        raise NotImplementedError
+ """ + + @staticmethod + def symbolic(g, features, best_rbboxes, spatial_scale, points): + assert points in [1, 5] + return g.op( + 'mmcv::MMCVRotatedFeatureAlign', + features, + best_rbboxes, + spatial_scale_f=spatial_scale, + points_i=points) + + @staticmethod + def forward(ctx: Any, features: torch.Tensor, best_rbboxes: torch.Tensor, + spatial_scale: float, points: int) -> torch.Tensor: + """ + Args: + features (torch.Tensor): Input features with shape [N,C,H,W]. + best_rbboxes (torch.Tensor): Refined rotate anchors with + shape [N,H,W,5]. Coordinate format (cx,cx,h,w,a). + spatial_scale (float): The scale of feature map size and + input image size. + points (int, optional): The number of sample points. + Only 1 and 5 are supported. Defaults to 1. + + Returns: + torch.Tensor: Refined features with shape [N,C,H,W]. + """ + ctx.spatial_scale = spatial_scale + ctx.points = points + ctx.save_for_backward(best_rbboxes) + assert points in [1, 5] + output = torch.zeros_like(features) + ext_module.rotated_feature_align_forward( + features, + best_rbboxes, + output, + spatial_scale=spatial_scale, + points=points) + return output + + @staticmethod + @once_differentiable + def backward(ctx: Any, grad_output: torch.Tensor) -> tuple: + """ + Args: + grad_output (torch.Tensor): The gradient of output features + with shape [N,C,H,W]. + + Returns: + torch.Tensor: The gradient of input features with shape [N,C,H,W]. 
+ """ + best_rbboxes = ctx.saved_tensors[0] + points = ctx.points + spatial_scale = ctx.spatial_scale + grad_input = None + if ctx.needs_input_grad[0]: + grad_input = torch.zeros_like(grad_output) + ext_module.rotated_feature_align_backward( + grad_output.contiguous(), + best_rbboxes, + grad_input, + spatial_scale=spatial_scale, + points=points) + return grad_input, None, None, None + + +def rotated_feature_align(features: torch.Tensor, + best_rbboxes: torch.Tensor, + spatial_scale: float = 1 / 8, + points: int = 1) -> torch.Tensor: + return RotatedFeatureAlignFunction.apply(features, best_rbboxes, + spatial_scale, points) diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/saconv.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/saconv.py new file mode 100644 index 0000000000000000000000000000000000000000..f932884073e0cc9a428d41f66b8aec0112b9e5ff --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/ops/saconv.py @@ -0,0 +1,149 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch +import torch.nn as nn +import torch.nn.functional as F +from mmengine.model import constant_init +from mmengine.registry import MODELS +from mmengine.utils import digit_version +from mmengine.utils.dl_utils import TORCH_VERSION + +from mmcv.cnn import ConvAWS2d +from mmcv.ops.deform_conv import deform_conv2d + + +@MODELS.register_module(name='SAC') +class SAConv2d(ConvAWS2d): + """SAC (Switchable Atrous Convolution) + + This is an implementation of `DetectoRS: Detecting Objects with Recursive + Feature Pyramid and Switchable Atrous Convolution + `_. + + Args: + in_channels (int): Number of channels in the input image + out_channels (int): Number of channels produced by the convolution + kernel_size (int or tuple): Size of the convolving kernel + stride (int or tuple, optional): Stride of the convolution. Default: 1 + padding (int or tuple, optional): Zero-padding added to both sides of + the input. 
# Copyright (c) OpenMMLab. All rights reserved.
import torch
import torch.nn as nn
import torch.nn.functional as F
from mmengine.model import constant_init
from mmengine.registry import MODELS
from mmengine.utils import digit_version
from mmengine.utils.dl_utils import TORCH_VERSION

from mmcv.cnn import ConvAWS2d
from mmcv.ops.deform_conv import deform_conv2d


@MODELS.register_module(name='SAC')
class SAConv2d(ConvAWS2d):
    """SAC (Switchable Atrous Convolution)

    This is an implementation of `DetectoRS: Detecting Objects with Recursive
    Feature Pyramid and Switchable Atrous Convolution
    `_.

    Args:
        in_channels (int): Number of channels in the input image
        out_channels (int): Number of channels produced by the convolution
        kernel_size (int or tuple): Size of the convolving kernel
        stride (int or tuple, optional): Stride of the convolution. Default: 1
        padding (int or tuple, optional): Zero-padding added to both sides of
            the input. Default: 0
        padding_mode (string, optional): ``'zeros'``, ``'reflect'``,
            ``'replicate'`` or ``'circular'``. Default: ``'zeros'``
        dilation (int or tuple, optional): Spacing between kernel elements.
            Default: 1
        groups (int, optional): Number of blocked connections from input
            channels to output channels. Default: 1
        bias (bool, optional): If ``True``, adds a learnable bias to the
            output. Default: ``True``
        use_deform: If ``True``, replace convolution with deformable
            convolution. Default: ``False``.
    """

    def __init__(self,
                 in_channels,
                 out_channels,
                 kernel_size,
                 stride=1,
                 padding=0,
                 dilation=1,
                 groups=1,
                 bias=True,
                 use_deform=False):
        super().__init__(
            in_channels,
            out_channels,
            kernel_size,
            stride=stride,
            padding=padding,
            dilation=dilation,
            groups=groups,
            bias=bias)
        self.use_deform = use_deform
        # 1-channel map that blends the small- and large-dilation branches.
        self.switch = nn.Conv2d(
            self.in_channels, 1, kernel_size=1, stride=stride, bias=True)
        # Learnable delta added to the shared weight for the large branch.
        self.weight_diff = nn.Parameter(torch.Tensor(self.weight.size()))
        # 1x1 convs for global (pre/post) context injection.
        self.pre_context = nn.Conv2d(
            self.in_channels, self.in_channels, kernel_size=1, bias=True)
        self.post_context = nn.Conv2d(
            self.out_channels, self.out_channels, kernel_size=1, bias=True)
        if self.use_deform:
            # 18 = 2 offsets per position of a 3x3 kernel.
            self.offset_s = nn.Conv2d(
                self.in_channels,
                18,
                kernel_size=3,
                padding=1,
                stride=stride,
                bias=True)
            self.offset_l = nn.Conv2d(
                self.in_channels,
                18,
                kernel_size=3,
                padding=1,
                stride=stride,
                bias=True)
        self.init_weights()

    def init_weights(self):
        # Zero weights / unit bias: the switch initially outputs 1, so the
        # layer starts by following the small-dilation branch
        # (presumably — depends on constant_init semantics; confirm).
        constant_init(self.switch, 0, bias=1)
        # Zero delta: both branches share the same weight at start.
        self.weight_diff.data.zero_()
        constant_init(self.pre_context, 0)
        constant_init(self.post_context, 0)
        if self.use_deform:
            constant_init(self.offset_s, 0)
            constant_init(self.offset_l, 0)

    def forward(self, x):
        # pre-context: add a globally pooled context vector to every pixel.
        avg_x = F.adaptive_avg_pool2d(x, output_size=1)
        avg_x = self.pre_context(avg_x)
        avg_x = avg_x.expand_as(x)
        x = x + avg_x
        # switch: computed from a 5x5-average-pooled (reflect-padded) input.
        avg_x = F.pad(x, pad=(2, 2, 2, 2), mode='reflect')
        avg_x = F.avg_pool2d(avg_x, kernel_size=5, stride=1, padding=0)
        switch = self.switch(avg_x)
        # sac: run two branches with the (AWS-standardized) shared weight.
        weight = self._get_weight(self.weight)
        zero_bias = torch.zeros(
            self.out_channels, device=weight.device, dtype=weight.dtype)

        # Branch 1: original dilation (optionally deformable).
        if self.use_deform:
            offset = self.offset_s(avg_x)
            out_s = deform_conv2d(x, offset, weight, self.stride, self.padding,
                                  self.dilation, self.groups, 1)
        else:
            if (TORCH_VERSION == 'parrots'
                    or digit_version(TORCH_VERSION) < digit_version('1.5.0')):
                out_s = super().conv2d_forward(x, weight)
            elif digit_version(TORCH_VERSION) >= digit_version('1.8.0'):
                # bias is a required argument of _conv_forward in torch 1.8.0
                out_s = super()._conv_forward(x, weight, zero_bias)
            else:
                out_s = super()._conv_forward(x, weight)
        # Branch 2: temporarily triple padding/dilation for a larger
        # receptive field; restored after both branches are blended below.
        ori_p = self.padding
        ori_d = self.dilation
        self.padding = tuple(3 * p for p in self.padding)
        self.dilation = tuple(3 * d for d in self.dilation)
        weight = weight + self.weight_diff
        if self.use_deform:
            offset = self.offset_l(avg_x)
            out_l = deform_conv2d(x, offset, weight, self.stride, self.padding,
                                  self.dilation, self.groups, 1)
        else:
            if (TORCH_VERSION == 'parrots'
                    or digit_version(TORCH_VERSION) < digit_version('1.5.0')):
                out_l = super().conv2d_forward(x, weight)
            elif digit_version(TORCH_VERSION) >= digit_version('1.8.0'):
                # bias is a required argument of _conv_forward in torch 1.8.0
                out_l = super()._conv_forward(x, weight, zero_bias)
            else:
                out_l = super()._conv_forward(x, weight)

        # Pixel-wise blend of the two branches, then restore conv state.
        out = switch * out_s + (1 - switch) * out_l
        self.padding = ori_p
        self.dilation = ori_d
        # post-context: add a globally pooled context vector to the output.
        avg_x = F.adaptive_avg_pool2d(out, output_size=1)
        avg_x = self.post_context(avg_x)
        avg_x = avg_x.expand_as(out)
        out = out + avg_x
        return out
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Any, List, Optional, Tuple

import torch
import torch.nn.functional as F
from torch import nn
from torch.autograd import Function

from ..utils import ext_loader

ext_module = ext_loader.load_ext(
    '_ext',
    ['dynamic_point_to_voxel_forward', 'dynamic_point_to_voxel_backward'])


class _DynamicScatter(Function):
    """Autograd function reducing point features into voxels."""

    @staticmethod
    def forward(ctx: Any,
                feats: torch.Tensor,
                coors: torch.Tensor,
                reduce_type: str = 'max') -> Tuple[torch.Tensor, torch.Tensor]:
        """convert kitti points(N, >=3) to voxels.

        Args:
            feats (torch.Tensor): [N, C]. Points features to be reduced
                into voxels.
            coors (torch.Tensor): [N, ndim]. Corresponding voxel coordinates
                (specifically multi-dim voxel index) of each points.
            reduce_type (str, optional): Reduce op. support 'max', 'sum' and
                'mean'. Default: 'max'.

        Returns:
            tuple[torch.Tensor]: A tuple contains two elements. The first one
            is the voxel features with shape [M, C] which are respectively
            reduced from input features that share the same voxel coordinates.
            The second is voxel coordinates with shape [M, ndim].
        """
        results = ext_module.dynamic_point_to_voxel_forward(
            feats, coors, reduce_type)
        # point2voxel_map / voxel_points_count are needed to undo the
        # reduction in backward.
        (voxel_feats, voxel_coors, point2voxel_map,
         voxel_points_count) = results
        ctx.reduce_type = reduce_type
        ctx.save_for_backward(feats, voxel_feats, point2voxel_map,
                              voxel_points_count)
        # Voxel coordinates are integer indices; no gradient flows to them.
        ctx.mark_non_differentiable(voxel_coors)
        return voxel_feats, voxel_coors

    @staticmethod
    def backward(ctx: Any,
                 grad_voxel_feats: torch.Tensor,
                 grad_voxel_coors: Optional[torch.Tensor] = None) -> tuple:
        """Scatter voxel gradients back to the contributing points."""
        (feats, voxel_feats, point2voxel_map,
         voxel_points_count) = ctx.saved_tensors
        grad_feats = torch.zeros_like(feats)
        # TODO: whether to use index put or use cuda_backward
        # To use index put, need point to voxel index
        ext_module.dynamic_point_to_voxel_backward(
            grad_feats, grad_voxel_feats.contiguous(), feats, voxel_feats,
            point2voxel_map, voxel_points_count, ctx.reduce_type)
        return grad_feats, None, None


dynamic_scatter = _DynamicScatter.apply


class DynamicScatter(nn.Module):
    """Scatters points into voxels, used in the voxel encoder with dynamic
    voxelization.

    Note:
        The CPU and GPU implementation get the same output, but have numerical
        difference after summation and division (e.g., 5e-7).

    Args:
        voxel_size (list): list [x, y, z] size of three dimension.
        point_cloud_range (list): The coordinate range of points, [x_min,
            y_min, z_min, x_max, y_max, z_max].
        average_points (bool): whether to use avg pooling to scatter points
            into voxel.
    """

    def __init__(self, voxel_size: List, point_cloud_range: List,
                 average_points: bool):
        super().__init__()

        self.voxel_size = voxel_size
        self.point_cloud_range = point_cloud_range
        self.average_points = average_points

    def forward_single(
            self, points: torch.Tensor,
            coors: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        """Scatters points into voxels.

        Args:
            points (torch.Tensor): Points to be reduced into voxels.
            coors (torch.Tensor): Corresponding voxel coordinates (specifically
                multi-dim voxel index) of each points.

        Returns:
            tuple[torch.Tensor]: A tuple contains two elements. The first one
            is the voxel features with shape [M, C] which are respectively
            reduced from input features that share the same voxel coordinates.
            The second is voxel coordinates with shape [M, ndim].
        """
        reduce = 'mean' if self.average_points else 'max'
        return dynamic_scatter(points.contiguous(), coors.contiguous(), reduce)

    def forward(self, points: torch.Tensor,
                coors: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        """Scatters points/features into voxels.

        Args:
            points (torch.Tensor): Points to be reduced into voxels.
            coors (torch.Tensor): Corresponding voxel coordinates (specifically
                multi-dim voxel index) of each points.

        Returns:
            tuple[torch.Tensor]: A tuple contains two elements. The first one
            is the voxel features with shape [M, C] which are respectively
            reduced from input features that share the same voxel coordinates.
            The second is voxel coordinates with shape [M, ndim].
        """
        # 3-dim coords mean a single sample; otherwise the first column is
        # the batch index and each sample is processed separately.
        if coors.size(-1) == 3:
            return self.forward_single(points, coors)
        else:
            batch_size = coors[-1, 0] + 1
            voxels, voxel_coors = [], []
            for i in range(batch_size):
                inds = torch.where(coors[:, 0] == i)
                voxel, voxel_coor = self.forward_single(
                    points[inds], coors[inds][:, 1:])
                # Re-prepend the batch index that forward_single stripped.
                coor_pad = F.pad(voxel_coor, (1, 0), mode='constant', value=i)
                voxel_coors.append(coor_pad)
                voxels.append(voxel)
            features = torch.cat(voxels, dim=0)
            feature_coors = torch.cat(voxel_coors, dim=0)

            return features, feature_coors

    def __repr__(self):
        s = self.__class__.__name__ + '('
        s += 'voxel_size=' + str(self.voxel_size)
        s += ', point_cloud_range=' + str(self.point_cloud_range)
        s += ', average_points=' + str(self.average_points)
        s += ')'
        return s
+from typing import Optional + +import torch +import torch.distributed as dist +import torch.nn.functional as F +from mmengine.registry import MODELS +from torch.autograd import Function +from torch.autograd.function import once_differentiable +from torch.nn.modules.module import Module +from torch.nn.parameter import Parameter + +from ..utils import ext_loader + +ext_module = ext_loader.load_ext('_ext', [ + 'sync_bn_forward_mean', 'sync_bn_forward_var', 'sync_bn_forward_output', + 'sync_bn_backward_param', 'sync_bn_backward_data' +]) + + +class SyncBatchNormFunction(Function): + + @staticmethod + def symbolic(g, input, running_mean, running_var, weight, bias, momentum, + eps, group, group_size, stats_mode): + return g.op( + 'mmcv::MMCVSyncBatchNorm', + input, + running_mean, + running_var, + weight, + bias, + momentum_f=momentum, + eps_f=eps, + group_i=group, + group_size_i=group_size, + stats_mode=stats_mode) + + @staticmethod + def forward(self, input: torch.Tensor, running_mean: torch.Tensor, + running_var: torch.Tensor, weight: torch.Tensor, + bias: torch.Tensor, momentum: float, eps: float, group: int, + group_size: int, stats_mode: str) -> torch.Tensor: + self.momentum = momentum + self.eps = eps + self.group = group + self.group_size = group_size + self.stats_mode = stats_mode + + assert isinstance( + input, (torch.HalfTensor, torch.FloatTensor, + torch.cuda.HalfTensor, torch.cuda.FloatTensor)), \ + f'only support Half or Float Tensor, but {input.type()}' + output = torch.zeros_like(input) + input3d = input.flatten(start_dim=2) + output3d = output.view_as(input3d) + num_channels = input3d.size(1) + + # ensure mean/var/norm/std are initialized as zeros + # ``torch.empty()`` does not guarantee that + mean = torch.zeros( + num_channels, dtype=torch.float, device=input3d.device) + var = torch.zeros( + num_channels, dtype=torch.float, device=input3d.device) + norm = torch.zeros_like( + input3d, dtype=torch.float, device=input3d.device) + std = torch.zeros( + 
num_channels, dtype=torch.float, device=input3d.device) + + batch_size = input3d.size(0) + if batch_size > 0: + ext_module.sync_bn_forward_mean(input3d, mean) + batch_flag = torch.ones([1], device=mean.device, dtype=mean.dtype) + else: + # skip updating mean and leave it as zeros when the input is empty + batch_flag = torch.zeros([1], device=mean.device, dtype=mean.dtype) + + # synchronize mean and the batch flag + vec = torch.cat([mean, batch_flag]) + if self.stats_mode == 'N': + vec *= batch_size + if self.group_size > 1: + dist.all_reduce(vec, group=self.group) + total_batch = vec[-1].detach() + mean = vec[:num_channels] + + if self.stats_mode == 'default': + mean = mean / self.group_size + elif self.stats_mode == 'N': + mean = mean / total_batch.clamp(min=1) + else: + raise NotImplementedError + + # leave var as zeros when the input is empty + if batch_size > 0: + ext_module.sync_bn_forward_var(input3d, mean, var) + + if self.stats_mode == 'N': + var *= batch_size + if self.group_size > 1: + dist.all_reduce(var, group=self.group) + + if self.stats_mode == 'default': + var /= self.group_size + elif self.stats_mode == 'N': + var /= total_batch.clamp(min=1) + else: + raise NotImplementedError + + # if the total batch size over all the ranks is zero, + # we should not update the statistics in the current batch + update_flag = total_batch.clamp(max=1) + momentum = update_flag * self.momentum + ext_module.sync_bn_forward_output( + input3d, + mean, + var, + weight, + bias, + running_mean, + running_var, + norm, + std, + output3d, + eps=self.eps, + momentum=momentum, + group_size=self.group_size) + self.save_for_backward(norm, std, weight) + return output + + @staticmethod + @once_differentiable + def backward(self, grad_output: torch.Tensor) -> tuple: + norm, std, weight = self.saved_tensors + grad_weight = torch.zeros_like(weight) + grad_bias = torch.zeros_like(weight) + grad_input = torch.zeros_like(grad_output) + grad_output3d = grad_output.flatten(start_dim=2) + 
@MODELS.register_module(name='MMSyncBN')
class SyncBatchNorm(Module):
    """Synchronized Batch Normalization.

    Args:
        num_features (int): number of features/channels in input tensor
        eps (float, optional): a value added to the denominator for numerical
            stability. Defaults to 1e-5.
        momentum (float, optional): the value used for the running_mean and
            running_var computation. Defaults to 0.1.
        affine (bool, optional): whether to use learnable affine parameters.
            Defaults to True.
        track_running_stats (bool, optional): whether to track the running
            mean and variance during training. When set to False, this
            module does not track such statistics, and initializes statistics
            buffers ``running_mean`` and ``running_var`` as ``None``. When
            these buffers are ``None``, this module always uses batch
            statistics in both training and eval modes. Defaults to True.
        group (int, optional): synchronization of stats happen within
            each process group individually. By default it is synchronization
            across the whole world. Defaults to None.
        stats_mode (str, optional): The statistical mode. Available options
            includes ``'default'`` and ``'N'``. Defaults to 'default'.
            When ``stats_mode=='default'``, it computes the overall statistics
            using those from each worker with equal weight, i.e., the
            statistics are synchronized and simply divided by ``group``. This
            mode will produce inaccurate statistics when empty tensors occur.
            When ``stats_mode=='N'``, it computes the overall statistics using
            the total number of batches in each worker ignoring the number of
            group, i.e., the statistics are synchronized and then divided by
            the total batch ``N``. This mode is beneficial when empty tensors
            occur during training, as it averages the total mean by the real
            number of batch.
    """

    def __init__(self,
                 num_features: int,
                 eps: float = 1e-5,
                 momentum: float = 0.1,
                 affine: bool = True,
                 track_running_stats: bool = True,
                 group: Optional[int] = None,
                 stats_mode: str = 'default'):
        super().__init__()
        self.num_features = num_features
        self.eps = eps
        self.momentum = momentum
        self.affine = affine
        self.track_running_stats = track_running_stats
        # Default to synchronizing statistics across the whole world.
        group = dist.group.WORLD if group is None else group
        self.group = group
        self.group_size = dist.get_world_size(group)
        assert stats_mode in ['default', 'N'], \
            f'"stats_mode" only accepts "default" and "N", got "{stats_mode}"'
        self.stats_mode = stats_mode
        if self.affine:
            self.weight = Parameter(torch.Tensor(num_features))
            self.bias = Parameter(torch.Tensor(num_features))
        else:
            self.register_parameter('weight', None)
            self.register_parameter('bias', None)
        if self.track_running_stats:
            self.register_buffer('running_mean', torch.zeros(num_features))
            self.register_buffer('running_var', torch.ones(num_features))
            self.register_buffer('num_batches_tracked',
                                 torch.tensor(0, dtype=torch.long))
        else:
            # Without tracked stats, batch statistics are always used.
            self.register_buffer('running_mean', None)
            self.register_buffer('running_var', None)
            self.register_buffer('num_batches_tracked', None)
        self.reset_parameters()

    def reset_running_stats(self):
        """Reset running mean/var to their initial values (0 / 1)."""
        if self.track_running_stats:
            self.running_mean.zero_()
            self.running_var.fill_(1)
            self.num_batches_tracked.zero_()

    def reset_parameters(self):
        """Reset running stats and affine parameters.

        NOTE: unlike torch.nn.BatchNorm*, which initializes ``weight`` with
        ones, this implementation deliberately uses a uniform init.
        """
        self.reset_running_stats()
        if self.affine:
            self.weight.data.uniform_()  # pytorch use ones_()
            self.bias.data.zero_()

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        """Normalize ``input``; sync batch stats across the group in training.

        Args:
            input (torch.Tensor): at least 2D tensor, shape (N, C, ...).

        Returns:
            torch.Tensor: normalized tensor with the same shape as ``input``.

        Raises:
            ValueError: if ``input`` has fewer than 2 dimensions.
        """
        if input.dim() < 2:
            raise ValueError(
                f'expected at least 2D input, got {input.dim()}D input')
        if self.momentum is None:
            exponential_average_factor = 0.0
        else:
            exponential_average_factor = self.momentum

        if self.training and self.track_running_stats:
            if self.num_batches_tracked is not None:
                self.num_batches_tracked += 1
                if self.momentum is None:  # use cumulative moving average
                    exponential_average_factor = 1.0 / float(
                        self.num_batches_tracked)
                else:  # use exponential moving average
                    exponential_average_factor = self.momentum

        if self.training or not self.track_running_stats:
            # Custom autograd Function performs cross-rank synchronization.
            return SyncBatchNormFunction.apply(
                input, self.running_mean, self.running_var, self.weight,
                self.bias, exponential_average_factor, self.eps, self.group,
                self.group_size, self.stats_mode)
        else:
            # Eval mode with tracked stats: plain (non-synced) batch norm.
            return F.batch_norm(input, self.running_mean, self.running_var,
                                self.weight, self.bias, False,
                                exponential_average_factor, self.eps)

    def __repr__(self):
        s = self.__class__.__name__
        s += f'({self.num_features}, '
        s += f'eps={self.eps}, '
        s += f'momentum={self.momentum}, '
        s += f'affine={self.affine}, '
        s += f'track_running_stats={self.track_running_stats}, '
        s += f'group_size={self.group_size},'
        s += f'stats_mode={self.stats_mode})'
        return s
class ThreeInterpolate(Function):
    """Performs weighted linear interpolation on 3 features.

    Please refer to `Paper of PointNet++ `_
    for more details.
    """

    @staticmethod
    def forward(ctx: Any, features: torch.Tensor, indices: torch.Tensor,
                weight: torch.Tensor) -> torch.Tensor:
        """Interpolate target features from their 3 nearest source features.

        Args:
            features (torch.Tensor): (B, C, M) Features descriptors to be
                interpolated.
            indices (torch.Tensor): (B, n, 3) indices of three nearest
                neighbor features for the target features.
            weight (torch.Tensor): (B, n, 3) weights of three nearest
                neighbor features for the target features.

        Returns:
            torch.Tensor: (B, C, N) tensor of the interpolated features
        """
        # The extension kernel requires densely packed tensors.
        for tensor in (features, indices, weight):
            assert tensor.is_contiguous()

        batch, channels, num_source = features.size()
        num_target = indices.size(1)
        # Stash everything backward needs to scatter gradients.
        ctx.three_interpolate_for_backward = (indices, weight, num_source)

        interpolated = features.new_empty(batch, channels, num_target)
        ext_module.three_interpolate_forward(
            features,
            indices,
            weight,
            interpolated,
            b=batch,
            c=channels,
            m=num_source,
            n=num_target)
        return interpolated

    @staticmethod
    def backward(
        ctx, grad_out: torch.Tensor
    ) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
        """Scatter output gradients back onto the source features.

        Args:
            grad_out (torch.Tensor): (B, C, N) tensor with gradients of
                outputs.

        Returns:
            torch.Tensor: (B, C, M) tensor with gradients of features;
            ``indices`` and ``weight`` receive no gradient.
        """
        indices, weight, num_source = ctx.three_interpolate_for_backward
        batch, channels, num_target = grad_out.size()

        grad_features = grad_out.new_zeros(batch, channels, num_source)
        grad_out_packed = grad_out.data.contiguous()

        ext_module.three_interpolate_backward(
            grad_out_packed,
            indices,
            weight,
            grad_features.data,
            b=batch,
            c=channels,
            n=num_target,
            m=num_source)
        return grad_features, None, None


three_interpolate = ThreeInterpolate.apply
class ThreeNN(Function):
    """Find the top-3 nearest neighbors of the target set from the source set.

    Please refer to `Paper of PointNet++ `_
    for more details.
    """

    @staticmethod
    def forward(ctx: Any, target: torch.Tensor,
                source: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        """
        Args:
            target (torch.Tensor): shape (B, N, 3), points set that needs to
                find the nearest neighbors.
            source (torch.Tensor): shape (B, M, 3), points set that is used
                to find the nearest neighbors of points in target set.

        Returns:
            torch.Tensor: shape (B, N, 3), L2 distance of each point in target
                set to their corresponding top three nearest neighbors.
            torch.Tensor: shape (B, N, 3), int32 indices of the three nearest
                neighbors of each target point within the source set.
        """
        # The extension kernel requires densely packed tensors.
        target = target.contiguous()
        source = source.contiguous()

        B, N, _ = target.size()
        m = source.size(1)
        # The kernel writes squared distances; sqrt is applied on return.
        dist2 = target.new_empty(B, N, 3)
        idx = target.new_empty(B, N, 3, dtype=torch.int32)

        ext_module.three_nn_forward(target, source, dist2, idx, b=B, n=N, m=m)
        # Indices carry no gradient; skipped under the parrots framework,
        # presumably because its autograd lacks this API — TODO confirm.
        if torch.__version__ != 'parrots':
            ctx.mark_non_differentiable(idx)

        return torch.sqrt(dist2), idx

    @staticmethod
    def backward(ctx, a=None, b=None):
        # Neither input receives a gradient from this op.
        return None, None


three_nn = ThreeNN.apply
class TINShiftFunction(Function):
    """Autograd wrapper for the temporal-interlace-shift extension kernels."""

    @staticmethod
    def forward(ctx, input, shift):
        """Shift ``input`` along the temporal dimension per-group.

        Args:
            input (torch.Tensor): feature map, shape
                [N, num_segments, C, H * W].
            shift (torch.Tensor): shift tensor, shape [N, num_segments].

        Returns:
            torch.Tensor: shifted feature map with the same shape as
            ``input``.

        Raises:
            ValueError: if the batch dims mismatch or C is not a multiple of
                ``num_segments``.
        """
        if input.size(0) != shift.size(0):
            raise ValueError(
                'The first dim (batch) of `input` and `shift` should be '
                f'same, but got {input.size(0)} and {shift.size(0)}.')
        C = input.size(2)
        num_segments = shift.size(1)
        # Each segment must own an equal, non-empty slice of the channels.
        if C // num_segments <= 0 or C % num_segments != 0:
            raise ValueError('C should be a multiple of num_segments, '
                             f'but got C={C} and num_segments={num_segments}.')

        ctx.save_for_backward(shift)

        out = torch.zeros_like(input)
        ext_module.tin_shift_forward(input, shift, out)

        return out

    @staticmethod
    def backward(ctx, grad_output):

        shift = ctx.saved_tensors[0]
        data_grad_input = grad_output.new(*grad_output.size()).zero_()
        # The shift indices are treated as non-differentiable: their
        # gradient is returned as zeros.
        shift_grad_input = shift.new(*shift.size()).zero_()
        ext_module.tin_shift_backward(grad_output, shift, data_grad_input)

        return data_grad_input, shift_grad_input


tin_shift = TINShiftFunction.apply


class TINShift(nn.Module):
    """Temporal Interlace Shift.

    Temporal Interlace shift is a differentiable temporal-wise frame shifting
    which is proposed in "Temporal Interlacing Network"

    Please refer to `Temporal Interlacing Network
    `_ for more details.

    Code is modified from https://github.com/mit-han-lab/temporal-shift-module
    """

    def forward(self, input, shift):
        """Perform temporal interlace shift.

        Args:
            input (torch.Tensor): Feature map with shape
                [N, num_segments, C, H * W].
            shift (torch.Tensor): Shift tensor with shape [N, num_segments].

        Returns:
            Feature map after temporal interlace shift.
        """
        return tin_shift(input, shift)
Subject to the terms and conditions of this +# License, each Licensor grants to you a perpetual, worldwide, +# non-exclusive, royalty-free, copyright license to reproduce, +# prepare derivative works of, publicly display, publicly perform, +# sublicense and distribute its Work and any resulting derivative +# works in any form. + +# 3. Limitations + +# 3.1 Redistribution. You may reproduce or distribute the Work only +# if (a) you do so under this License, (b) you include a complete +# copy of this License with your distribution, and (c) you retain +# without modification any copyright, patent, trademark, or +# attribution notices that are present in the Work. + +# 3.2 Derivative Works. You may specify that additional or different +# terms apply to the use, reproduction, and distribution of your +# derivative works of the Work ("Your Terms") only if (a) Your Terms +# provide that the use limitation in Section 3.3 applies to your +# derivative works, and (b) you identify the specific derivative +# works that are subject to Your Terms. Notwithstanding Your Terms, +# this License (including the redistribution requirements in Section +# 3.1) will continue to apply to the Work itself. + +# 3.3 Use Limitation. The Work and any derivative works thereof only +# may be used or intended for use non-commercially. Notwithstanding +# the foregoing, NVIDIA and its affiliates may use the Work and any +# derivative works commercially. As used herein, "non-commercially" +# means for research or evaluation purposes only. + +# 3.4 Patent Claims. If you bring or threaten to bring a patent claim +# against any Licensor (including any claim, cross-claim or +# counterclaim in a lawsuit) to enforce any patents that you allege +# are infringed by any Work, then your rights under this License from +# such Licensor (including the grant in Section 2.1) will terminate +# immediately. + +# 3.5 Trademarks. 
This License does not grant any rights to use any +# Licensor’s or its affiliates’ names, logos, or trademarks, except +# as necessary to reproduce the notices described in this License. + +# 3.6 Termination. If you violate any term of this License, then your +# rights under this License (including the grant in Section 2.1) will +# terminate immediately. + +# 4. Disclaimer of Warranty. + +# THE WORK IS PROVIDED "AS IS" WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WARRANTIES OR CONDITIONS OF +# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR +# NON-INFRINGEMENT. YOU BEAR THE RISK OF UNDERTAKING ANY ACTIVITIES UNDER +# THIS LICENSE. + +# 5. Limitation of Liability. + +# EXCEPT AS PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL +# THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE +# SHALL ANY LICENSOR BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT, +# INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF +# OR RELATED TO THIS LICENSE, THE USE OR INABILITY TO USE THE WORK +# (INCLUDING BUT NOT LIMITED TO LOSS OF GOODWILL, BUSINESS INTERRUPTION, +# LOST PROFITS OR DATA, COMPUTER FAILURE OR MALFUNCTION, OR ANY OTHER +# COMMERCIAL DAMAGES OR LOSSES), EVEN IF THE LICENSOR HAS BEEN ADVISED OF +# THE POSSIBILITY OF SUCH DAMAGES. 
class UpFirDn2dBackward(Function):
    """Backward pass of :class:`UpFirDn2d` as its own autograd Function.

    Implemented this way so the op supports double backward: the gradient of
    the gradient is itself an upfirdn2d call.
    """

    @staticmethod
    def forward(ctx: Any, grad_output: torch.Tensor, kernel: torch.Tensor,
                grad_kernel: torch.Tensor, up: tuple, down: tuple, pad: tuple,
                g_pad: tuple, in_size: Union[List, Tuple],
                out_size: Union[List, Tuple]) -> torch.Tensor:
        """Compute grad wrt the forward input.

        The backward of an upfirdn is another upfirdn with the flipped
        kernel (``grad_kernel``), up/down swapped, and gradient padding
        ``g_pad`` precomputed by :meth:`UpFirDn2d.forward`.
        """
        up_x, up_y = up
        down_x, down_y = down
        g_pad_x0, g_pad_x1, g_pad_y0, g_pad_y1 = g_pad

        grad_output = grad_output.reshape(-1, out_size[0], out_size[1], 1)

        grad_input = upfirdn2d_ext.upfirdn2d(
            grad_output,
            grad_kernel,
            up_x=down_x,
            up_y=down_y,
            down_x=up_x,
            down_y=up_y,
            pad_x0=g_pad_x0,
            pad_x1=g_pad_x1,
            pad_y0=g_pad_y0,
            pad_y1=g_pad_y1)
        grad_input = grad_input.view(in_size[0], in_size[1], in_size[2],
                                     in_size[3])

        # Save what the double-backward (below) needs.
        ctx.save_for_backward(kernel)

        pad_x0, pad_x1, pad_y0, pad_y1 = pad

        ctx.up_x = up_x
        ctx.up_y = up_y
        ctx.down_x = down_x
        ctx.down_y = down_y
        ctx.pad_x0 = pad_x0
        ctx.pad_x1 = pad_x1
        ctx.pad_y0 = pad_y0
        ctx.pad_y1 = pad_y1
        ctx.in_size = in_size
        ctx.out_size = out_size

        return grad_input

    @staticmethod
    def backward(ctx: Any, gradgrad_input: torch.Tensor) -> tuple:
        """Double backward: re-applies the original forward upfirdn."""
        kernel, = ctx.saved_tensors

        gradgrad_input = gradgrad_input.reshape(-1, ctx.in_size[2],
                                                ctx.in_size[3], 1)

        gradgrad_out = upfirdn2d_ext.upfirdn2d(
            gradgrad_input,
            kernel,
            up_x=ctx.up_x,
            up_y=ctx.up_y,
            down_x=ctx.down_x,
            down_y=ctx.down_y,
            pad_x0=ctx.pad_x0,
            pad_x1=ctx.pad_x1,
            pad_y0=ctx.pad_y0,
            pad_y1=ctx.pad_y1)
        # gradgrad_out = gradgrad_out.view(ctx.in_size[0], ctx.out_size[0],
        #                                  ctx.out_size[1], ctx.in_size[3])
        gradgrad_out = gradgrad_out.view(ctx.in_size[0], ctx.in_size[1],
                                         ctx.out_size[0], ctx.out_size[1])

        return gradgrad_out, None, None, None, None, None, None, None, None


class UpFirDn2d(Function):
    """Autograd wrapper for the CUDA ``upfirdn2d`` extension op."""

    @staticmethod
    def forward(ctx: Any, input: torch.Tensor, kernel: torch.Tensor, up: tuple,
                down: tuple, pad: tuple) -> torch.Tensor:
        """Upsample, FIR-filter and downsample ``input``.

        Args:
            input (torch.Tensor): (n, c, h, w) feature map.
            kernel (torch.Tensor): 2D FIR filter kernel.
            up / down (tuple): per-axis (x, y) up/downsampling factors.
            pad (tuple): (pad_x0, pad_x1, pad_y0, pad_y1).
        """
        up_x, up_y = up
        down_x, down_y = down
        pad_x0, pad_x1, pad_y0, pad_y1 = pad

        kernel_h, kernel_w = kernel.shape
        batch, channel, in_h, in_w = input.shape
        ctx.in_size = input.shape

        # The extension op works on (n*c, h, w, 1) layout.
        input = input.reshape(-1, in_h, in_w, 1)

        # Save the kernel and its 180-degree flip for backward.
        ctx.save_for_backward(kernel, torch.flip(kernel, [0, 1]))

        out_h = (in_h * up_y + pad_y0 + pad_y1 - kernel_h) // down_y + 1
        out_w = (in_w * up_x + pad_x0 + pad_x1 - kernel_w) // down_x + 1
        ctx.out_size = (out_h, out_w)

        ctx.up = (up_x, up_y)
        ctx.down = (down_x, down_y)
        ctx.pad = (pad_x0, pad_x1, pad_y0, pad_y1)

        # Padding for the gradient pass (upfirdn with swapped up/down).
        g_pad_x0 = kernel_w - pad_x0 - 1
        g_pad_y0 = kernel_h - pad_y0 - 1
        g_pad_x1 = in_w * up_x - out_w * down_x + pad_x0 - up_x + 1
        g_pad_y1 = in_h * up_y - out_h * down_y + pad_y0 - up_y + 1

        ctx.g_pad = (g_pad_x0, g_pad_x1, g_pad_y0, g_pad_y1)

        out = upfirdn2d_ext.upfirdn2d(
            input,
            kernel,
            up_x=up_x,
            up_y=up_y,
            down_x=down_x,
            down_y=down_y,
            pad_x0=pad_x0,
            pad_x1=pad_x1,
            pad_y0=pad_y0,
            pad_y1=pad_y1)
        # out = out.view(major, out_h, out_w, minor)
        out = out.view(-1, channel, out_h, out_w)

        return out

    @staticmethod
    def backward(ctx: Any, grad_output: torch.Tensor) -> tuple:
        kernel, grad_kernel = ctx.saved_tensors

        grad_input = UpFirDn2dBackward.apply(
            grad_output,
            kernel,
            grad_kernel,
            ctx.up,
            ctx.down,
            ctx.pad,
            ctx.g_pad,
            ctx.in_size,
            ctx.out_size,
        )

        return grad_input, None, None, None, None
def _to_pair(value: Union[int, tuple]) -> tuple:
    """Normalize an int or a 2-sequence into a 2-tuple (x, y)."""
    if isinstance(value, int):
        return (value, value)
    return tuple(value)


def upfirdn2d(
        input: torch.Tensor,
        kernel: torch.Tensor,
        up: Union[int, tuple] = 1,
        down: Union[int, tuple] = 1,
        pad: tuple = (0, 0)) -> torch.Tensor:  # noqa E125
    """UpFIRDn for 2d features.

    UpFIRDn is short for upsample, apply FIR filter and downsample. More
    details can be found in:
    https://www.mathworks.com/help/signal/ref/upfirdn.html

    Args:
        input (torch.Tensor): Tensor with shape of (n, c, h, w).
        kernel (torch.Tensor): Filter kernel.
        up (int | tuple[int], optional): Upsampling factor. If given a number,
            we will use this factor for the both height and width side.
            Defaults to 1.
        down (int | tuple[int], optional): Downsampling factor. If given a
            number, we will use this factor for the both height and width
            side. Defaults to 1.
        pad (tuple[int], optional): Padding for tensors, (x_pad, y_pad) or
            (x_pad_0, x_pad_1, y_pad_0, y_pad_1). Defaults to (0, 0).

    Returns:
        torch.Tensor: Tensor after UpFIRDn.

    Raises:
        ValueError: if ``pad`` does not have 2 or 4 elements. (Previously a
            bad length fell through to ``NameError``/``IndexError``.)
    """
    # Normalize padding once for both branches; validate explicitly.
    if len(pad) == 2:
        pad = (pad[0], pad[1], pad[0], pad[1])
    elif len(pad) != 4:
        raise ValueError(
            f'pad must have 2 or 4 elements, but got {len(pad)}')

    _up = _to_pair(up)
    _down = _to_pair(down)

    if input.device.type == 'cpu':
        # Pure-PyTorch reference path; the CUDA extension is not needed.
        out = upfirdn2d_native(input, kernel, _up[0], _up[1], _down[0],
                               _down[1], pad[0], pad[1], pad[2], pad[3])
    else:
        out = UpFirDn2d.apply(input, kernel, _up, _down, pad)

    return out


def upfirdn2d_native(input: torch.Tensor, kernel: torch.Tensor, up_x: int,
                     up_y: int, down_x: int, down_y: int, pad_x0: int,
                     pad_x1: int, pad_y0: int, pad_y1: int) -> torch.Tensor:
    """Pure-PyTorch upfirdn2d: upsample by zero-insertion, pad, correlate
    with the flipped kernel via conv2d, then stride-downsample.

    Args:
        input (torch.Tensor): (n, c, h, w) feature map.
        kernel (torch.Tensor): 2D FIR kernel.
        up_x, up_y (int): per-axis upsampling factors.
        down_x, down_y (int): per-axis downsampling factors.
        pad_* (int): per-side padding (may be negative, meaning crop).

    Returns:
        torch.Tensor: (n, c, out_h, out_w) filtered feature map.
    """
    _, channel, in_h, in_w = input.shape
    # Work in (n*c, h, w, 1) layout, mirroring the CUDA op.
    input = input.reshape(-1, in_h, in_w, 1)

    _, in_h, in_w, minor = input.shape
    kernel_h, kernel_w = kernel.shape

    # Upsample by interleaving zeros after each sample.
    out = input.view(-1, in_h, 1, in_w, 1, minor)
    out = F.pad(out, [0, 0, 0, up_x - 1, 0, 0, 0, up_y - 1])
    out = out.view(-1, in_h * up_y, in_w * up_x, minor)

    # Positive padding is zero-fill; negative padding is a crop (below).
    out = F.pad(
        out,
        [0, 0,
         max(pad_x0, 0),
         max(pad_x1, 0),
         max(pad_y0, 0),
         max(pad_y1, 0)])
    out = out[:,
              max(-pad_y0, 0):out.shape[1] - max(-pad_y1, 0),
              max(-pad_x0, 0):out.shape[2] - max(-pad_x1, 0), :, ]

    # Correlate with the flipped kernel (i.e. convolve with the kernel).
    out = out.permute(0, 3, 1, 2)
    out = out.reshape(
        [-1, 1, in_h * up_y + pad_y0 + pad_y1, in_w * up_x + pad_x0 + pad_x1])
    w = torch.flip(kernel, [0, 1]).view(1, 1, kernel_h, kernel_w)
    out = F.conv2d(out, w)
    out = out.reshape(
        -1,
        minor,
        in_h * up_y + pad_y0 + pad_y1 - kernel_h + 1,
        in_w * up_x + pad_x0 + pad_x1 - kernel_w + 1,
    )
    out = out.permute(0, 2, 3, 1)
    # Downsample by striding.
    out = out[:, ::down_y, ::down_x, :]

    out_h = (in_h * up_y + pad_y0 + pad_y1 - kernel_h) // down_y + 1
    out_w = (in_w * up_x + pad_x0 + pad_x1 - kernel_w) // down_x + 1

    return out.view(-1, channel, out_h, out_w)
class _Voxelization(Function):

    @staticmethod
    def forward(
            ctx: Any,
            points: torch.Tensor,
            voxel_size: Union[tuple, float],
            coors_range: Union[tuple, float],
            max_points: int = 35,
            max_voxels: int = 20000,
            deterministic: bool = True) -> Union[Tuple[torch.Tensor], Tuple]:
        """Convert kitti points(N, >=3) to voxels.

        Args:
            points (torch.Tensor): [N, ndim]. Points[:, :3] contain xyz points
                and points[:, 3:] contain other information like reflectivity.
            voxel_size (tuple or float): The size of voxel with the shape of
                [3].
            coors_range (tuple or float): The coordinate range of voxel with
                the shape of [6].
            max_points (int, optional): maximum points contained in a voxel.
                if max_points=-1, it means using dynamic_voxelize. Default: 35.
            max_voxels (int, optional): maximum voxels this function create.
                for second, 20000 is a good choice. Users should shuffle
                points before call this function because max_voxels may drop
                points. Default: 20000.
            deterministic: bool. whether to invoke the non-deterministic
                version of hard-voxelization implementations. non-deterministic
                version is considerably fast but is not deterministic. only
                affects hard voxelization. default True. for more information
                of this argument and the implementation insights, please refer
                to the following links:
                https://github.com/open-mmlab/mmdetection3d/issues/894
                https://github.com/open-mmlab/mmdetection3d/pull/904
                it is an experimental feature and we will appreciate it if
                you could share with us the failing cases.

        Returns:
            tuple[torch.Tensor]: A tuple contains three elements. The first
            one is the output voxels with the shape of
            [M, max_points, n_dim], which only contain points and returned
            when max_points != -1. The second is the voxel coordinates with
            shape of [M, 3]. The last is number of point per voxel with the
            shape of [M], which only returned when max_points != -1.
        """
        if max_points == -1 or max_voxels == -1:
            # Dynamic voxelization: only per-point voxel coordinates.
            coors = points.new_zeros(size=(points.size(0), 3), dtype=torch.int)
            # NOTE(review): voxel_size/coors_range tensors are created on the
            # default (CPU) device even for CUDA points — presumably the
            # extension op expects that; confirm against the kernel.
            ext_module.dynamic_voxelize_forward(
                points,
                torch.tensor(voxel_size, dtype=torch.float),
                torch.tensor(coors_range, dtype=torch.float),
                coors,
                NDim=3)
            return coors
        else:
            # Hard voxelization: pre-allocate for the worst case, then trim.
            voxels = points.new_zeros(
                size=(max_voxels, max_points, points.size(1)))
            coors = points.new_zeros(size=(max_voxels, 3), dtype=torch.int)
            num_points_per_voxel = points.new_zeros(
                size=(max_voxels, ), dtype=torch.int)
            # Scalar output: how many voxels the kernel actually filled.
            voxel_num = torch.zeros(size=(), dtype=torch.long)
            ext_module.hard_voxelize_forward(
                points,
                torch.tensor(voxel_size, dtype=torch.float),
                torch.tensor(coors_range, dtype=torch.float),
                voxels,
                coors,
                num_points_per_voxel,
                voxel_num,
                max_points=max_points,
                max_voxels=max_voxels,
                NDim=3,
                deterministic=deterministic)
            # select the valid voxels
            voxels_out = voxels[:voxel_num]
            coors_out = coors[:voxel_num]
            num_points_per_voxel_out = num_points_per_voxel[:voxel_num]
            return voxels_out, coors_out, num_points_per_voxel_out


voxelization = _Voxelization.apply
+ """ + + def __init__(self, + voxel_size: List, + point_cloud_range: List, + max_num_points: int, + max_voxels: Union[tuple, int] = 20000, + deterministic: bool = True): + """ + Args: + voxel_size (list): list [x, y, z] size of three dimension + point_cloud_range (list): + [x_min, y_min, z_min, x_max, y_max, z_max] + max_num_points (int): max number of points per voxel + max_voxels (tuple or int): max number of voxels in + (training, testing) time + deterministic: bool. whether to invoke the non-deterministic + version of hard-voxelization implementations. non-deterministic + version is considerablly fast but is not deterministic. only + affects hard voxelization. default True. for more information + of this argument and the implementation insights, please refer + to the following links: + https://github.com/open-mmlab/mmdetection3d/issues/894 + https://github.com/open-mmlab/mmdetection3d/pull/904 + it is an experimental feature and we will appreciate it if + you could share with us the failing cases. 
+ """ + super().__init__() + + self.voxel_size = voxel_size + self.point_cloud_range = point_cloud_range + self.max_num_points = max_num_points + if isinstance(max_voxels, tuple): + self.max_voxels = max_voxels + else: + self.max_voxels = _pair(max_voxels) + self.deterministic = deterministic + + point_cloud_range = torch.tensor( + point_cloud_range, dtype=torch.float32) + voxel_size = torch.tensor(voxel_size, dtype=torch.float32) + grid_size = ( + point_cloud_range[3:] - # type: ignore + point_cloud_range[:3]) / voxel_size # type: ignore + grid_size = torch.round(grid_size).long() + input_feat_shape = grid_size[:2] + self.grid_size = grid_size + # the origin shape is as [x-len, y-len, z-len] + # [w, h, d] -> [d, h, w] + self.pcd_shape = [*input_feat_shape, 1][::-1] + + def forward(self, input: torch.Tensor) -> torch.Tensor: + if self.training: + max_voxels = self.max_voxels[0] + else: + max_voxels = self.max_voxels[1] + + return voxelization(input, self.voxel_size, self.point_cloud_range, + self.max_num_points, max_voxels, + self.deterministic) + + def __repr__(self): + s = self.__class__.__name__ + '(' + s += 'voxel_size=' + str(self.voxel_size) + s += ', point_cloud_range=' + str(self.point_cloud_range) + s += ', max_num_points=' + str(self.max_num_points) + s += ', max_voxels=' + str(self.max_voxels) + s += ', deterministic=' + str(self.deterministic) + s += ')' + return s diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/transforms/__init__.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/transforms/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..c4dbaa1bbfbde35696a67876eed4a283d79c85b4 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/transforms/__init__.py @@ -0,0 +1,30 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from .base import BaseTransform +from .builder import TRANSFORMS +from .loading import LoadAnnotations, LoadImageFromFile +from .processing import (CenterCrop, MultiScaleFlipAug, Normalize, Pad, + RandomChoiceResize, RandomFlip, RandomGrayscale, + RandomResize, Resize, TestTimeAug) +from .wrappers import (Compose, KeyMapper, RandomApply, RandomChoice, + TransformBroadcaster) + +try: + import torch # noqa: F401 +except ImportError: + __all__ = [ + 'BaseTransform', 'TRANSFORMS', 'TransformBroadcaster', 'Compose', + 'RandomChoice', 'KeyMapper', 'LoadImageFromFile', 'LoadAnnotations', + 'Normalize', 'Resize', 'Pad', 'RandomFlip', 'RandomChoiceResize', + 'CenterCrop', 'RandomGrayscale', 'MultiScaleFlipAug', 'RandomResize', + 'RandomApply', 'TestTimeAug' + ] +else: + from .formatting import ImageToTensor, ToTensor, to_tensor + + __all__ = [ + 'BaseTransform', 'TRANSFORMS', 'TransformBroadcaster', 'Compose', + 'RandomChoice', 'KeyMapper', 'LoadImageFromFile', 'LoadAnnotations', + 'Normalize', 'Resize', 'Pad', 'ToTensor', 'to_tensor', 'ImageToTensor', + 'RandomFlip', 'RandomChoiceResize', 'CenterCrop', 'RandomGrayscale', + 'MultiScaleFlipAug', 'RandomResize', 'RandomApply', 'TestTimeAug' + ] diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/transforms/base.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/transforms/base.py new file mode 100644 index 0000000000000000000000000000000000000000..321afb6038ca9f4504289c92c4bedd1c7aed40c6 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/transforms/base.py @@ -0,0 +1,30 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
class BaseTransform(metaclass=ABCMeta):
    """Base class for all transformations."""

    def __call__(self,
                 results: Dict) -> Optional[Union[Dict, Tuple[List, List]]]:
        # Delegate to ``transform`` so subclasses only implement one method.
        return self.transform(results)

    @abstractmethod
    def transform(self,
                  results: Dict) -> Optional[Union[Dict, Tuple[List, List]]]:
        """The transform function. All subclass of BaseTransform should
        override this method.

        This function takes the result dict as the input, and can add new
        items to the dict or modify existing items in the dict. And the result
        dict will be returned in the end, which allows to concatenate multiple
        transforms into a pipeline.

        Args:
            results (dict): The result dict.

        Returns:
            dict: The result dict.
        """
+ + Supported types are: :class:`numpy.ndarray`, :class:`torch.Tensor`, + :class:`Sequence`, :class:`int` and :class:`float`. + + Args: + data (torch.Tensor | numpy.ndarray | Sequence | int | float): Data to + be converted. + + Returns: + torch.Tensor: the converted data. + """ + + if isinstance(data, torch.Tensor): + return data + elif isinstance(data, np.ndarray): + return torch.from_numpy(data) + elif isinstance(data, Sequence) and not mmengine.is_str(data): + return torch.tensor(data) + elif isinstance(data, int): + return torch.LongTensor([data]) + elif isinstance(data, float): + return torch.FloatTensor([data]) + else: + raise TypeError(f'type {type(data)} cannot be converted to tensor.') + + +@TRANSFORMS.register_module() +class ToTensor(BaseTransform): + """Convert some results to :obj:`torch.Tensor` by given keys. + + Required keys: + + - all these keys in `keys` + + Modified Keys: + + - all these keys in `keys` + + Args: + keys (Sequence[str]): Keys that need to be converted to Tensor. + """ + + def __init__(self, keys: Sequence[str]) -> None: + self.keys = keys + + def transform(self, results: dict) -> dict: + """Transform function to convert data to `torch.Tensor`. + + Args: + results (dict): Result dict from loading pipeline. + Returns: + dict: `keys` in results will be updated. + """ + for key in self.keys: + + key_list = key.split('.') + cur_item = results + for i in range(len(key_list)): + if key_list[i] not in cur_item: + raise KeyError(f'Can not find key {key}') + if i == len(key_list) - 1: + cur_item[key_list[i]] = to_tensor(cur_item[key_list[i]]) + break + cur_item = cur_item[key_list[i]] + + return results + + def __repr__(self) -> str: + return self.__class__.__name__ + f'(keys={self.keys})' + + +@TRANSFORMS.register_module() +class ImageToTensor(BaseTransform): + """Convert image to :obj:`torch.Tensor` by given keys. + + The dimension order of input image is (H, W, C). The pipeline will convert + it to (C, H, W). 
If only 2 dimension (H, W) is given, the output would be + (1, H, W). + + Required keys: + + - all these keys in `keys` + + Modified Keys: + + - all these keys in `keys` + + Args: + keys (Sequence[str]): Key of images to be converted to Tensor. + """ + + def __init__(self, keys: dict) -> None: + self.keys = keys + + def transform(self, results: dict) -> dict: + """Transform function to convert image in results to + :obj:`torch.Tensor` and transpose the channel order. + Args: + results (dict): Result dict contains the image data to convert. + Returns: + dict: The result dict contains the image converted + to :obj:``torch.Tensor`` and transposed to (C, H, W) order. + """ + for key in self.keys: + img = results[key] + if len(img.shape) < 3: + img = np.expand_dims(img, -1) + results[key] = (to_tensor(img.transpose(2, 0, 1))).contiguous() + return results + + def __repr__(self) -> str: + return self.__class__.__name__ + f'(keys={self.keys})' diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/transforms/loading.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/transforms/loading.py new file mode 100644 index 0000000000000000000000000000000000000000..c0c17c97ac22a15cbb6224e3c1edc777172df0d0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/transforms/loading.py @@ -0,0 +1,360 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import warnings +from typing import Optional + +import mmengine.fileio as fileio +import numpy as np + +import mmcv +from .base import BaseTransform +from .builder import TRANSFORMS + + +@TRANSFORMS.register_module() +class LoadImageFromFile(BaseTransform): + """Load an image from file. + + Required Keys: + + - img_path + + Modified Keys: + + - img + - img_shape + - ori_shape + + Args: + to_float32 (bool): Whether to convert the loaded image to a float32 + numpy array. If set to False, the loaded image is an uint8 array. + Defaults to False. + color_type (str): The flag argument for :func:`mmcv.imfrombytes`. + Defaults to 'color'. 
+ imdecode_backend (str): The image decoding backend type. The backend + argument for :func:`mmcv.imfrombytes`. + See :func:`mmcv.imfrombytes` for details. + Defaults to 'cv2'. + file_client_args (dict, optional): Arguments to instantiate a + FileClient. See :class:`mmengine.fileio.FileClient` for details. + Defaults to None. It will be deprecated in future. Please use + ``backend_args`` instead. + Deprecated in version 2.0.0rc4. + ignore_empty (bool): Whether to allow loading empty image or file path + not existent. Defaults to False. + backend_args (dict, optional): Instantiates the corresponding file + backend. It may contain `backend` key to specify the file + backend. If it contains, the file backend corresponding to this + value will be used and initialized with the remaining values, + otherwise the corresponding file backend will be selected + based on the prefix of the file path. Defaults to None. + New in version 2.0.0rc4. + """ + + def __init__(self, + to_float32: bool = False, + color_type: str = 'color', + imdecode_backend: str = 'cv2', + file_client_args: Optional[dict] = None, + ignore_empty: bool = False, + *, + backend_args: Optional[dict] = None) -> None: + self.ignore_empty = ignore_empty + self.to_float32 = to_float32 + self.color_type = color_type + self.imdecode_backend = imdecode_backend + + self.file_client_args: Optional[dict] = None + self.backend_args: Optional[dict] = None + if file_client_args is not None: + warnings.warn( + '"file_client_args" will be deprecated in future. ' + 'Please use "backend_args" instead', DeprecationWarning) + if backend_args is not None: + raise ValueError( + '"file_client_args" and "backend_args" cannot be set ' + 'at the same time.') + + self.file_client_args = file_client_args.copy() + if backend_args is not None: + self.backend_args = backend_args.copy() + + def transform(self, results: dict) -> Optional[dict]: + """Functions to load image. 
+ + Args: + results (dict): Result dict from + :class:`mmengine.dataset.BaseDataset`. + + Returns: + dict: The dict contains loaded image and meta information. + """ + + filename = results['img_path'] + try: + if self.file_client_args is not None: + file_client = fileio.FileClient.infer_client( + self.file_client_args, filename) + img_bytes = file_client.get(filename) + else: + img_bytes = fileio.get( + filename, backend_args=self.backend_args) + img = mmcv.imfrombytes( + img_bytes, flag=self.color_type, backend=self.imdecode_backend) + except Exception as e: + if self.ignore_empty: + return None + else: + raise e + if self.to_float32: + img = img.astype(np.float32) + + results['img'] = img + results['img_shape'] = img.shape[:2] + results['ori_shape'] = img.shape[:2] + return results + + def __repr__(self): + repr_str = (f'{self.__class__.__name__}(' + f'ignore_empty={self.ignore_empty}, ' + f'to_float32={self.to_float32}, ' + f"color_type='{self.color_type}', " + f"imdecode_backend='{self.imdecode_backend}', ") + + if self.file_client_args is not None: + repr_str += f'file_client_args={self.file_client_args})' + else: + repr_str += f'backend_args={self.backend_args})' + + return repr_str + + +@TRANSFORMS.register_module() +class LoadAnnotations(BaseTransform): + """Load and process the ``instances`` and ``seg_map`` annotation provided + by dataset. + + The annotation format is as the following: + + .. code-block:: python + + { + 'instances': + [ + { + # List of 4 numbers representing the bounding box of the + # instance, in (x1, y1, x2, y2) order. + 'bbox': [x1, y1, x2, y2], + + # Label of image classification. + 'bbox_label': 1, + + # Used in key point detection. + # Can only load the format of [x1, y1, v1,…, xn, yn, vn]. v[i] + # means the visibility of this keypoint. n must be equal to the + # number of keypoint categories. + 'keypoints': [x1, y1, v1, ..., xn, yn, vn] + } + ] + # Filename of semantic or panoptic segmentation ground truth file. 
+ 'seg_map_path': 'a/b/c' + } + + After this module, the annotation has been changed to the format below: + + .. code-block:: python + + { + # In (x1, y1, x2, y2) order, float type. N is the number of bboxes + # in np.float32 + 'gt_bboxes': np.ndarray(N, 4) + # In np.int64 type. + 'gt_bboxes_labels': np.ndarray(N, ) + # In uint8 type. + 'gt_seg_map': np.ndarray (H, W) + # with (x, y, v) order, in np.float32 type. + 'gt_keypoints': np.ndarray(N, NK, 3) + } + + Required Keys: + + - instances + + - bbox (optional) + - bbox_label + - keypoints (optional) + + - seg_map_path (optional) + + Added Keys: + + - gt_bboxes (np.float32) + - gt_bboxes_labels (np.int64) + - gt_seg_map (np.uint8) + - gt_keypoints (np.float32) + + Args: + with_bbox (bool): Whether to parse and load the bbox annotation. + Defaults to True. + with_label (bool): Whether to parse and load the label annotation. + Defaults to True. + with_seg (bool): Whether to parse and load the semantic segmentation + annotation. Defaults to False. + with_keypoints (bool): Whether to parse and load the keypoints + annotation. Defaults to False. + imdecode_backend (str): The image decoding backend type. The backend + argument for :func:`mmcv.imfrombytes`. + See :func:`mmcv.imfrombytes` for details. + Defaults to 'cv2'. + file_client_args (dict, optional): Arguments to instantiate a + FileClient. See :class:`mmengine.fileio.FileClient` for details. + Defaults to None. It will be deprecated in future. Please use + ``backend_args`` instead. + Deprecated in version 2.0.0rc4. + backend_args (dict, optional): Instantiates the corresponding file + backend. It may contain `backend` key to specify the file + backend. If it contains, the file backend corresponding to this + value will be used and initialized with the remaining values, + otherwise the corresponding file backend will be selected + based on the prefix of the file path. Defaults to None. + New in version 2.0.0rc4. 
+ """ + + def __init__( + self, + with_bbox: bool = True, + with_label: bool = True, + with_seg: bool = False, + with_keypoints: bool = False, + imdecode_backend: str = 'cv2', + file_client_args: Optional[dict] = None, + *, + backend_args: Optional[dict] = None, + ) -> None: + super().__init__() + self.with_bbox = with_bbox + self.with_label = with_label + self.with_seg = with_seg + self.with_keypoints = with_keypoints + self.imdecode_backend = imdecode_backend + + self.file_client_args: Optional[dict] = None + self.backend_args: Optional[dict] = None + if file_client_args is not None: + warnings.warn( + '"file_client_args" will be deprecated in future. ' + 'Please use "backend_args" instead', DeprecationWarning) + if backend_args is not None: + raise ValueError( + '"file_client_args" and "backend_args" cannot be set ' + 'at the same time.') + + self.file_client_args = file_client_args.copy() + if backend_args is not None: + self.backend_args = backend_args.copy() + + def _load_bboxes(self, results: dict) -> None: + """Private function to load bounding box annotations. + + Args: + results (dict): Result dict from + :class:`mmengine.dataset.BaseDataset`. + + Returns: + dict: The dict contains loaded bounding box annotations. + """ + gt_bboxes = [] + for instance in results['instances']: + gt_bboxes.append(instance['bbox']) + results['gt_bboxes'] = np.array( + gt_bboxes, dtype=np.float32).reshape(-1, 4) + + def _load_labels(self, results: dict) -> None: + """Private function to load label annotations. + + Args: + results (dict): Result dict from + :class:`mmengine.dataset.BaseDataset`. + + Returns: + dict: The dict contains loaded label annotations. + """ + gt_bboxes_labels = [] + for instance in results['instances']: + gt_bboxes_labels.append(instance['bbox_label']) + results['gt_bboxes_labels'] = np.array( + gt_bboxes_labels, dtype=np.int64) + + def _load_seg_map(self, results: dict) -> None: + """Private function to load semantic segmentation annotations. 
+ + Args: + results (dict): Result dict from + :class:`mmengine.dataset.BaseDataset`. + + Returns: + dict: The dict contains loaded semantic segmentation annotations. + """ + if self.file_client_args is not None: + file_client = fileio.FileClient.infer_client( + self.file_client_args, results['seg_map_path']) + img_bytes = file_client.get(results['seg_map_path']) + else: + img_bytes = fileio.get( + results['seg_map_path'], backend_args=self.backend_args) + + results['gt_seg_map'] = mmcv.imfrombytes( + img_bytes, flag='unchanged', + backend=self.imdecode_backend).squeeze() + + def _load_kps(self, results: dict) -> None: + """Private function to load keypoints annotations. + + Args: + results (dict): Result dict from + :class:`mmengine.dataset.BaseDataset`. + + Returns: + dict: The dict contains loaded keypoints annotations. + """ + gt_keypoints = [] + for instance in results['instances']: + gt_keypoints.append(instance['keypoints']) + results['gt_keypoints'] = np.array(gt_keypoints, np.float32).reshape( + (len(gt_keypoints), -1, 3)) + + def transform(self, results: dict) -> dict: + """Function to load multiple types annotations. + + Args: + results (dict): Result dict from + :class:`mmengine.dataset.BaseDataset`. + + Returns: + dict: The dict contains loaded bounding box, label and + semantic segmentation and keypoints annotations. 
+ """ + + if self.with_bbox: + self._load_bboxes(results) + if self.with_label: + self._load_labels(results) + if self.with_seg: + self._load_seg_map(results) + if self.with_keypoints: + self._load_kps(results) + return results + + def __repr__(self) -> str: + repr_str = self.__class__.__name__ + repr_str += f'(with_bbox={self.with_bbox}, ' + repr_str += f'with_label={self.with_label}, ' + repr_str += f'with_seg={self.with_seg}, ' + repr_str += f'with_keypoints={self.with_keypoints}, ' + repr_str += f"imdecode_backend='{self.imdecode_backend}', " + + if self.file_client_args is not None: + repr_str += f'file_client_args={self.file_client_args})' + else: + repr_str += f'backend_args={self.backend_args})' + + return repr_str diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/transforms/processing.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/transforms/processing.py new file mode 100644 index 0000000000000000000000000000000000000000..96e1bb0a1c98979e64282935222fd67a1166b6c6 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/transforms/processing.py @@ -0,0 +1,1562 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy +import random +import warnings +from itertools import product +from typing import Dict, Iterable, List, Optional, Sequence, Tuple, Union + +import mmengine +import numpy as np + +import mmcv +from mmcv.image.geometric import _scale_size +from .base import BaseTransform +from .builder import TRANSFORMS +from .utils import cache_randomness +from .wrappers import Compose + +Number = Union[int, float] + + +@TRANSFORMS.register_module() +class Normalize(BaseTransform): + """Normalize the image. + + Required Keys: + + - img + + Modified Keys: + + - img + + Added Keys: + + - img_norm_cfg + + - mean + - std + - to_rgb + + + Args: + mean (sequence): Mean values of 3 channels. + std (sequence): Std values of 3 channels. + to_rgb (bool): Whether to convert the image from BGR to RGB before + normlizing the image. 
If ``to_rgb=True``, the order of mean and std + should be RGB. If ``to_rgb=False``, the order of mean and std + should be the same order of the image. Defaults to True. + """ + + def __init__(self, + mean: Sequence[Number], + std: Sequence[Number], + to_rgb: bool = True) -> None: + self.mean = np.array(mean, dtype=np.float32) + self.std = np.array(std, dtype=np.float32) + self.to_rgb = to_rgb + + def transform(self, results: dict) -> dict: + """Function to normalize images. + + Args: + results (dict): Result dict from loading pipeline. + + Returns: + dict: Normalized results, key 'img_norm_cfg' key is added in to + result dict. + """ + + results['img'] = mmcv.imnormalize(results['img'], self.mean, self.std, + self.to_rgb) + results['img_norm_cfg'] = dict( + mean=self.mean, std=self.std, to_rgb=self.to_rgb) + return results + + def __repr__(self) -> str: + repr_str = self.__class__.__name__ + repr_str += f'(mean={self.mean}, std={self.std}, to_rgb={self.to_rgb})' + return repr_str + + +@TRANSFORMS.register_module() +class Resize(BaseTransform): + """Resize images & bbox & seg & keypoints. + + This transform resizes the input image according to ``scale`` or + ``scale_factor``. Bboxes, seg map and keypoints are then resized with the + same scale factor. + if ``scale`` and ``scale_factor`` are both set, it will use ``scale`` to + resize. + + Required Keys: + + - img + - gt_bboxes (optional) + - gt_seg_map (optional) + - gt_keypoints (optional) + + Modified Keys: + + - img + - gt_bboxes + - gt_seg_map + - gt_keypoints + - img_shape + + Added Keys: + + - scale + - scale_factor + - keep_ratio + + Args: + scale (int or tuple): Images scales for resizing. Defaults to None + scale_factor (float or tuple[float]): Scale factors for resizing. + Defaults to None. + keep_ratio (bool): Whether to keep the aspect ratio when resizing the + image. Defaults to False. + clip_object_border (bool): Whether to clip the objects + outside the border of the image. 
In some dataset like MOT17, the gt + bboxes are allowed to cross the border of images. Therefore, we + don't need to clip the gt bboxes in these cases. Defaults to True. + backend (str): Image resize backend, choices are 'cv2' and 'pillow'. + These two backends generates slightly different results. Defaults + to 'cv2'. + interpolation (str): Interpolation method, accepted values are + "nearest", "bilinear", "bicubic", "area", "lanczos" for 'cv2' + backend, "nearest", "bilinear" for 'pillow' backend. Defaults + to 'bilinear'. + """ + + def __init__(self, + scale: Optional[Union[int, Tuple[int, int]]] = None, + scale_factor: Optional[Union[float, Tuple[float, + float]]] = None, + keep_ratio: bool = False, + clip_object_border: bool = True, + backend: str = 'cv2', + interpolation='bilinear') -> None: + assert scale is not None or scale_factor is not None, ( + '`scale` and' + '`scale_factor` can not both be `None`') + if scale is None: + self.scale = None + else: + if isinstance(scale, int): + self.scale = (scale, scale) + else: + self.scale = scale + + self.backend = backend + self.interpolation = interpolation + self.keep_ratio = keep_ratio + self.clip_object_border = clip_object_border + if scale_factor is None: + self.scale_factor = None + elif isinstance(scale_factor, float): + self.scale_factor = (scale_factor, scale_factor) + elif isinstance(scale_factor, tuple): + assert (len(scale_factor)) == 2 + self.scale_factor = scale_factor + else: + raise TypeError( + f'expect scale_factor is float or Tuple(float), but' + f'get {type(scale_factor)}') + + def _resize_img(self, results: dict) -> None: + """Resize images with ``results['scale']``.""" + + if results.get('img', None) is not None: + if self.keep_ratio: + img, scale_factor = mmcv.imrescale( + results['img'], + results['scale'], + interpolation=self.interpolation, + return_scale=True, + backend=self.backend) + # the w_scale and h_scale has minor difference + # a real fix should be done in the mmcv.imrescale in 
the future + new_h, new_w = img.shape[:2] + h, w = results['img'].shape[:2] + w_scale = new_w / w + h_scale = new_h / h + else: + img, w_scale, h_scale = mmcv.imresize( + results['img'], + results['scale'], + interpolation=self.interpolation, + return_scale=True, + backend=self.backend) + results['img'] = img + results['img_shape'] = img.shape[:2] + results['scale_factor'] = (w_scale, h_scale) + results['keep_ratio'] = self.keep_ratio + + def _resize_bboxes(self, results: dict) -> None: + """Resize bounding boxes with ``results['scale_factor']``.""" + if results.get('gt_bboxes', None) is not None: + bboxes = results['gt_bboxes'] * np.tile( + np.array(results['scale_factor']), 2) + if self.clip_object_border: + bboxes[:, 0::2] = np.clip(bboxes[:, 0::2], 0, + results['img_shape'][1]) + bboxes[:, 1::2] = np.clip(bboxes[:, 1::2], 0, + results['img_shape'][0]) + results['gt_bboxes'] = bboxes + + def _resize_seg(self, results: dict) -> None: + """Resize semantic segmentation map with ``results['scale']``.""" + if results.get('gt_seg_map', None) is not None: + if self.keep_ratio: + gt_seg = mmcv.imrescale( + results['gt_seg_map'], + results['scale'], + interpolation='nearest', + backend=self.backend) + else: + gt_seg = mmcv.imresize( + results['gt_seg_map'], + results['scale'], + interpolation='nearest', + backend=self.backend) + results['gt_seg_map'] = gt_seg + + def _resize_keypoints(self, results: dict) -> None: + """Resize keypoints with ``results['scale_factor']``.""" + if results.get('gt_keypoints', None) is not None: + keypoints = results['gt_keypoints'] + + keypoints[:, :, :2] = keypoints[:, :, :2] * np.array( + results['scale_factor']) + if self.clip_object_border: + keypoints[:, :, 0] = np.clip(keypoints[:, :, 0], 0, + results['img_shape'][1]) + keypoints[:, :, 1] = np.clip(keypoints[:, :, 1], 0, + results['img_shape'][0]) + results['gt_keypoints'] = keypoints + + def transform(self, results: dict) -> dict: + """Transform function to resize images, bounding 
boxes, semantic + segmentation map and keypoints. + + Args: + results (dict): Result dict from loading pipeline. + Returns: + dict: Resized results, 'img', 'gt_bboxes', 'gt_seg_map', + 'gt_keypoints', 'scale', 'scale_factor', 'img_shape', + and 'keep_ratio' keys are updated in result dict. + """ + + if self.scale: + results['scale'] = self.scale + else: + img_shape = results['img'].shape[:2] + results['scale'] = _scale_size(img_shape[::-1], + self.scale_factor) # type: ignore + self._resize_img(results) + self._resize_bboxes(results) + self._resize_seg(results) + self._resize_keypoints(results) + return results + + def __repr__(self): + repr_str = self.__class__.__name__ + repr_str += f'(scale={self.scale}, ' + repr_str += f'scale_factor={self.scale_factor}, ' + repr_str += f'keep_ratio={self.keep_ratio}, ' + repr_str += f'clip_object_border={self.clip_object_border}), ' + repr_str += f'backend={self.backend}), ' + repr_str += f'interpolation={self.interpolation})' + return repr_str + + +@TRANSFORMS.register_module() +class Pad(BaseTransform): + """Pad the image & segmentation map. + + There are three padding modes: (1) pad to a fixed size and (2) pad to the + minimum size that is divisible by some number. and (3)pad to square. Also, + pad to square and pad to the minimum size can be used as the same time. + + Required Keys: + + - img + - gt_bboxes (optional) + - gt_seg_map (optional) + + Modified Keys: + + - img + - gt_seg_map + - img_shape + + Added Keys: + + - pad_shape + - pad_fixed_size + - pad_size_divisor + + Args: + size (tuple, optional): Fixed padding size. + Expected padding shape (w, h). Defaults to None. + size_divisor (int, optional): The divisor of padded size. Defaults to + None. + pad_to_square (bool): Whether to pad the image into a square. + Currently only used for YOLOX. Defaults to False. + pad_val (Number | dict[str, Number], optional): Padding value for if + the pad_mode is "constant". 
If it is a single number, the value + to pad the image is the number and to pad the semantic + segmentation map is 255. If it is a dict, it should have the + following keys: + + - img: The value to pad the image. + - seg: The value to pad the semantic segmentation map. + + Defaults to dict(img=0, seg=255). + padding_mode (str): Type of padding. Should be: constant, edge, + reflect or symmetric. Defaults to 'constant'. + + - constant: pads with a constant value, this value is specified + with pad_val. + - edge: pads with the last value at the edge of the image. + - reflect: pads with reflection of image without repeating the last + value on the edge. For example, padding [1, 2, 3, 4] with 2 + elements on both sides in reflect mode will result in + [3, 2, 1, 2, 3, 4, 3, 2]. + - symmetric: pads with reflection of image repeating the last value + on the edge. For example, padding [1, 2, 3, 4] with 2 elements on + both sides in symmetric mode will result in + [2, 1, 1, 2, 3, 4, 4, 3] + """ + + def __init__(self, + size: Optional[Tuple[int, int]] = None, + size_divisor: Optional[int] = None, + pad_to_square: bool = False, + pad_val: Union[Number, dict] = dict(img=0, seg=255), + padding_mode: str = 'constant') -> None: + self.size = size + self.size_divisor = size_divisor + if isinstance(pad_val, int): + pad_val = dict(img=pad_val, seg=255) + assert isinstance(pad_val, dict), 'pad_val ' + self.pad_val = pad_val + self.pad_to_square = pad_to_square + + if pad_to_square: + assert size is None, \ + 'The size and size_divisor must be None ' \ + 'when pad2square is True' + else: + assert size is not None or size_divisor is not None, \ + 'only one of size and size_divisor should be valid' + assert size is None or size_divisor is None + assert padding_mode in ['constant', 'edge', 'reflect', 'symmetric'] + self.padding_mode = padding_mode + + def _pad_img(self, results: dict) -> None: + """Pad images according to ``self.size``.""" + pad_val = self.pad_val.get('img', 0) + + size = 
None + if self.pad_to_square: + max_size = max(results['img'].shape[:2]) + size = (max_size, max_size) + if self.size_divisor is not None: + if size is None: + size = (results['img'].shape[0], results['img'].shape[1]) + pad_h = int(np.ceil( + size[0] / self.size_divisor)) * self.size_divisor + pad_w = int(np.ceil( + size[1] / self.size_divisor)) * self.size_divisor + size = (pad_h, pad_w) + elif self.size is not None: + size = self.size[::-1] + if isinstance(pad_val, int) and results['img'].ndim == 3: + pad_val = tuple(pad_val for _ in range(results['img'].shape[2])) + padded_img = mmcv.impad( + results['img'], + shape=size, + pad_val=pad_val, + padding_mode=self.padding_mode) + + results['img'] = padded_img + results['pad_shape'] = padded_img.shape + results['pad_fixed_size'] = self.size + results['pad_size_divisor'] = self.size_divisor + results['img_shape'] = padded_img.shape[:2] + + def _pad_seg(self, results: dict) -> None: + """Pad semantic segmentation map according to + ``results['pad_shape']``.""" + if results.get('gt_seg_map', None) is not None: + pad_val = self.pad_val.get('seg', 255) + if isinstance(pad_val, int) and results['gt_seg_map'].ndim == 3: + pad_val = tuple( + pad_val for _ in range(results['gt_seg_map'].shape[2])) + results['gt_seg_map'] = mmcv.impad( + results['gt_seg_map'], + shape=results['pad_shape'][:2], + pad_val=pad_val, + padding_mode=self.padding_mode) + + def transform(self, results: dict) -> dict: + """Call function to pad images, masks, semantic segmentation maps. + + Args: + results (dict): Result dict from loading pipeline. + + Returns: + dict: Updated result dict. 
+ """ + self._pad_img(results) + self._pad_seg(results) + return results + + def __repr__(self): + repr_str = self.__class__.__name__ + repr_str += f'(size={self.size}, ' + repr_str += f'size_divisor={self.size_divisor}, ' + repr_str += f'pad_to_square={self.pad_to_square}, ' + repr_str += f'pad_val={self.pad_val}), ' + repr_str += f'padding_mode={self.padding_mode})' + return repr_str + + +@TRANSFORMS.register_module() +class CenterCrop(BaseTransform): + """Crop the center of the image, segmentation masks, bounding boxes and key + points. If the crop area exceeds the original image and ``auto_pad`` is + True, the original image will be padded before cropping. + + Required Keys: + + - img + - gt_seg_map (optional) + - gt_bboxes (optional) + - gt_keypoints (optional) + + Modified Keys: + + - img + - img_shape + - gt_seg_map (optional) + - gt_bboxes (optional) + - gt_keypoints (optional) + + Added Key: + + - pad_shape + + + Args: + crop_size (Union[int, Tuple[int, int]]): Expected size after cropping + with the format of (w, h). If set to an integer, then cropping + width and height are equal to this integer. + auto_pad (bool): Whether to pad the image if it's smaller than the + ``crop_size``. Defaults to False. + pad_cfg (dict): Base config for padding. Refer to ``mmcv.Pad`` for + detail. Defaults to ``dict(type='Pad')``. + clip_object_border (bool): Whether to clip the objects + outside the border of the image. In some dataset like MOT17, the + gt bboxes are allowed to cross the border of images. Therefore, + we don't need to clip the gt bboxes in these cases. + Defaults to True. 
+ """ + + def __init__(self, + crop_size: Union[int, Tuple[int, int]], + auto_pad: bool = False, + pad_cfg: dict = dict(type='Pad'), + clip_object_border: bool = True) -> None: + super().__init__() + assert isinstance(crop_size, int) or ( + isinstance(crop_size, tuple) and len(crop_size) == 2 + ), 'The expected crop_size is an integer, or a tuple containing two ' + 'intergers' + + if isinstance(crop_size, int): + crop_size = (crop_size, crop_size) + assert crop_size[0] > 0 and crop_size[1] > 0 + self.crop_size = crop_size + self.auto_pad = auto_pad + + self.pad_cfg = pad_cfg.copy() + # size will be overwritten + if 'size' in self.pad_cfg and auto_pad: + warnings.warn('``size`` is set in ``pad_cfg``,' + 'however this argument will be overwritten' + ' according to crop size and image size') + + self.clip_object_border = clip_object_border + + def _crop_img(self, results: dict, bboxes: np.ndarray) -> None: + """Crop image. + + Args: + results (dict): Result dict contains the data to transform. + bboxes (np.ndarray): Shape (4, ), location of cropped bboxes. + """ + if results.get('img', None) is not None: + img = mmcv.imcrop(results['img'], bboxes=bboxes) + img_shape = img.shape[:2] # type: ignore + results['img'] = img + results['img_shape'] = img_shape + results['pad_shape'] = img_shape + + def _crop_seg_map(self, results: dict, bboxes: np.ndarray) -> None: + """Crop semantic segmentation map. + + Args: + results (dict): Result dict contains the data to transform. + bboxes (np.ndarray): Shape (4, ), location of cropped bboxes. + """ + if results.get('gt_seg_map', None) is not None: + img = mmcv.imcrop(results['gt_seg_map'], bboxes=bboxes) + results['gt_seg_map'] = img + + def _crop_bboxes(self, results: dict, bboxes: np.ndarray) -> None: + """Update bounding boxes according to CenterCrop. + + Args: + results (dict): Result dict contains the data to transform. + bboxes (np.ndarray): Shape (4, ), location of cropped bboxes. 
+ """ + if 'gt_bboxes' in results: + offset_w = bboxes[0] + offset_h = bboxes[1] + bbox_offset = np.array([offset_w, offset_h, offset_w, offset_h]) + # gt_bboxes has shape (num_gts, 4) in (tl_x, tl_y, br_x, br_y) + # order. + gt_bboxes = results['gt_bboxes'] - bbox_offset + if self.clip_object_border: + gt_bboxes[:, 0::2] = np.clip(gt_bboxes[:, 0::2], 0, + results['img'].shape[1]) + gt_bboxes[:, 1::2] = np.clip(gt_bboxes[:, 1::2], 0, + results['img'].shape[0]) + results['gt_bboxes'] = gt_bboxes + + def _crop_keypoints(self, results: dict, bboxes: np.ndarray) -> None: + """Update key points according to CenterCrop. Keypoints that not in the + cropped image will be set invisible. + + Args: + results (dict): Result dict contains the data to transform. + bboxes (np.ndarray): Shape (4, ), location of cropped bboxes. + """ + if 'gt_keypoints' in results: + offset_w = bboxes[0] + offset_h = bboxes[1] + keypoints_offset = np.array([offset_w, offset_h, 0]) + # gt_keypoints has shape (N, NK, 3) in (x, y, visibility) order, + # NK = number of points per object + gt_keypoints = results['gt_keypoints'] - keypoints_offset + # set gt_kepoints out of the result image invisible + height, width = results['img'].shape[:2] + valid_pos = (gt_keypoints[:, :, 0] >= + 0) * (gt_keypoints[:, :, 0] < + width) * (gt_keypoints[:, :, 1] >= 0) * ( + gt_keypoints[:, :, 1] < height) + gt_keypoints[:, :, 2] = np.where(valid_pos, gt_keypoints[:, :, 2], + 0) + gt_keypoints[:, :, 0] = np.clip(gt_keypoints[:, :, 0], 0, + results['img'].shape[1]) + gt_keypoints[:, :, 1] = np.clip(gt_keypoints[:, :, 1], 0, + results['img'].shape[0]) + results['gt_keypoints'] = gt_keypoints + + def transform(self, results: dict) -> dict: + """Apply center crop on results. + + Args: + results (dict): Result dict contains the data to transform. + + Returns: + dict: Results with CenterCropped image and semantic segmentation + map. 
+ """ + crop_width, crop_height = self.crop_size[0], self.crop_size[1] + + assert 'img' in results, '`img` is not found in results' + img = results['img'] + # img.shape has length 2 for grayscale, length 3 for color + img_height, img_width = img.shape[:2] + + if crop_height > img_height or crop_width > img_width: + if self.auto_pad: + # pad the area + img_height = max(img_height, crop_height) + img_width = max(img_width, crop_width) + pad_size = (img_width, img_height) + _pad_cfg = self.pad_cfg.copy() + _pad_cfg.update(dict(size=pad_size)) + pad_transform = TRANSFORMS.build(_pad_cfg) + results = pad_transform(results) + else: + crop_height = min(crop_height, img_height) + crop_width = min(crop_width, img_width) + + y1 = max(0, int(round((img_height - crop_height) / 2.))) + x1 = max(0, int(round((img_width - crop_width) / 2.))) + y2 = min(img_height, y1 + crop_height) - 1 + x2 = min(img_width, x1 + crop_width) - 1 + bboxes = np.array([x1, y1, x2, y2]) + + # crop the image + self._crop_img(results, bboxes) + # crop the gt_seg_map + self._crop_seg_map(results, bboxes) + # crop the bounding box + self._crop_bboxes(results, bboxes) + # crop the keypoints + self._crop_keypoints(results, bboxes) + return results + + def __repr__(self) -> str: + repr_str = self.__class__.__name__ + repr_str += f'(crop_size = {self.crop_size}' + repr_str += f', auto_pad={self.auto_pad}' + repr_str += f', pad_cfg={self.pad_cfg}' + repr_str += f',clip_object_border = {self.clip_object_border})' + return repr_str + + +@TRANSFORMS.register_module() +class RandomGrayscale(BaseTransform): + """Randomly convert image to grayscale with a probability. + + Required Key: + + - img + + Modified Key: + + - img + + Added Keys: + + - grayscale + - grayscale_weights + + Args: + prob (float): Probability that image should be converted to + grayscale. Defaults to 0.1. + keep_channels (bool): Whether keep channel number the same as + input. Defaults to False. 
+ channel_weights (tuple): The grayscale weights of each channel, + and the weights will be normalized. For example, (1, 2, 1) + will be normalized as (0.25, 0.5, 0.25). Defaults to + (1., 1., 1.). + color_format (str): Color format set to be any of 'bgr', + 'rgb', 'hsv'. Note: 'hsv' image will be transformed into 'bgr' + format no matter whether it is grayscaled. Defaults to 'bgr'. + """ + + def __init__(self, + prob: float = 0.1, + keep_channels: bool = False, + channel_weights: Sequence[float] = (1., 1., 1.), + color_format: str = 'bgr') -> None: + super().__init__() + assert 0. <= prob <= 1., ('The range of ``prob`` value is [0., 1.],' + + f' but got {prob} instead') + self.prob = prob + self.keep_channels = keep_channels + self.channel_weights = channel_weights + assert color_format in ['bgr', 'rgb', 'hsv'] + self.color_format = color_format + + @cache_randomness + def _random_prob(self): + return random.random() + + def transform(self, results: dict) -> dict: + """Apply random grayscale on results. + + Args: + results (dict): Result dict contains the data to transform. + + Returns: + dict: Results with grayscale image. + """ + img = results['img'] + # convert hsv to bgr + if self.color_format == 'hsv': + img = mmcv.hsv2bgr(img) + img = img[..., None] if img.ndim == 2 else img + num_output_channels = img.shape[2] + if self._random_prob() < self.prob: + if num_output_channels > 1: + assert num_output_channels == len( + self.channel_weights + ), 'The length of ``channel_weights`` are supposed to be ' + f'num_output_channels, but got {len(self.channel_weights)}' + ' instead.' 
+ normalized_weights = ( + np.array(self.channel_weights) / sum(self.channel_weights)) + img = (normalized_weights * img).sum(axis=2) + img = img.astype('uint8') + if self.keep_channels: + img = img[:, :, None] + results['img'] = np.dstack( + [img for _ in range(num_output_channels)]) + else: + results['img'] = img + return results + img = img.astype('uint8') + results['img'] = img + return results + + def __repr__(self) -> str: + repr_str = self.__class__.__name__ + repr_str += f'(prob = {self.prob}' + repr_str += f', keep_channels = {self.keep_channels}' + repr_str += f', channel_weights = {self.channel_weights}' + repr_str += f', color_format = {self.color_format})' + return repr_str + + +@TRANSFORMS.register_module() +class MultiScaleFlipAug(BaseTransform): + """Test-time augmentation with multiple scales and flipping. + + An example configuration is as followed: + + .. code-block:: + + dict( + type='MultiScaleFlipAug', + scales=[(1333, 400), (1333, 800)], + flip=True, + transforms=[ + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size_divisor=1), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']) + ]) + + ``results`` will be resized using all the sizes in ``scales``. + If ``flip`` is True, then flipped results will also be added into output + list. + + For the above configuration, there are four combinations of resize + and flip: + + - Resize to (1333, 400) + no flip + - Resize to (1333, 400) + flip + - Resize to (1333, 800) + no flip + - resize to (1333, 800) + flip + + The four results are then transformed with ``transforms`` argument. + After that, results are wrapped into lists of the same length as below: + + .. code-block:: + + dict( + inputs=[...], + data_samples=[...] + ) + + Where the length of ``inputs`` and ``data_samples`` are both 4. + + Required Keys: + + - Depending on the requirements of the ``transforms`` parameter. + + Modified Keys: + + - All output keys of each transform. 
+ + Args: + transforms (list[dict]): Transforms to be applied to each resized + and flipped data. + scales (tuple | list[tuple] | None): Images scales for resizing. + scale_factor (float or tuple[float]): Scale factors for resizing. + Defaults to None. + allow_flip (bool): Whether apply flip augmentation. Defaults to False. + flip_direction (str | list[str]): Flip augmentation directions, + options are "horizontal", "vertical" and "diagonal". If + flip_direction is a list, multiple flip augmentations will be + applied. It has no effect when flip == False. Defaults to + "horizontal". + resize_cfg (dict): Base config for resizing. Defaults to + ``dict(type='Resize', keep_ratio=True)``. + flip_cfg (dict): Base config for flipping. Defaults to + ``dict(type='RandomFlip')``. + """ + + def __init__( + self, + transforms: List[dict], + scales: Optional[Union[Tuple, List[Tuple]]] = None, + scale_factor: Optional[Union[float, List[float]]] = None, + allow_flip: bool = False, + flip_direction: Union[str, List[str]] = 'horizontal', + resize_cfg: dict = dict(type='Resize', keep_ratio=True), + flip_cfg: dict = dict(type='RandomFlip') + ) -> None: + super().__init__() + self.transforms = Compose(transforms) # type: ignore + + if scales is not None: + self.scales = scales if isinstance(scales, list) else [scales] + self.scale_key = 'scale' + assert mmengine.is_list_of(self.scales, tuple) + else: + # if ``scales`` and ``scale_factor`` both be ``None`` + if scale_factor is None: + self.scales = [1.] 
# type: ignore + elif isinstance(scale_factor, list): + self.scales = scale_factor # type: ignore + else: + self.scales = [scale_factor] # type: ignore + + self.scale_key = 'scale_factor' + + self.allow_flip = allow_flip + self.flip_direction = flip_direction if isinstance( + flip_direction, list) else [flip_direction] + assert mmengine.is_list_of(self.flip_direction, str) + if not self.allow_flip and self.flip_direction != ['horizontal']: + warnings.warn( + 'flip_direction has no effect when flip is set to False') + self.resize_cfg = resize_cfg.copy() + self.flip_cfg = flip_cfg + + def transform(self, results: dict) -> Dict: + """Apply test time augment transforms on results. + + Args: + results (dict): Result dict contains the data to transform. + + Returns: + dict: The augmented data, where each value is wrapped + into a list. + """ + + data_samples = [] + inputs = [] + flip_args = [(False, '')] + if self.allow_flip: + flip_args += [(True, direction) + for direction in self.flip_direction] + for scale in self.scales: + for flip, direction in flip_args: + _resize_cfg = self.resize_cfg.copy() + _resize_cfg.update({self.scale_key: scale}) + _resize_flip = [_resize_cfg] + + if flip: + _flip_cfg = self.flip_cfg.copy() + _flip_cfg.update(prob=1.0, direction=direction) + _resize_flip.append(_flip_cfg) + else: + results['flip'] = False + results['flip_direction'] = None + + resize_flip = Compose(_resize_flip) + _results = resize_flip(results.copy()) + packed_results = self.transforms(_results) # type: ignore + + inputs.append(packed_results['inputs']) # type: ignore + data_samples.append( + packed_results['data_sample']) # type: ignore + return dict(inputs=inputs, data_sample=data_samples) + + def __repr__(self) -> str: + repr_str = self.__class__.__name__ + repr_str += f'(transforms={self.transforms}' + repr_str += f', scales={self.scales}' + repr_str += f', allow_flip={self.allow_flip}' + repr_str += f', flip_direction={self.flip_direction})' + return repr_str + + 
+@TRANSFORMS.register_module() +class TestTimeAug(BaseTransform): + """Test-time augmentation transform. + + An example configuration is as followed: + + .. code-block:: + + dict(type='TestTimeAug', + transforms=[ + [dict(type='Resize', scale=(1333, 400), keep_ratio=True), + dict(type='Resize', scale=(1333, 800), keep_ratio=True)], + [dict(type='RandomFlip', prob=1.), + dict(type='RandomFlip', prob=0.)], + [dict(type='PackDetInputs', + meta_keys=('img_id', 'img_path', 'ori_shape', + 'img_shape', 'scale_factor', 'flip', + 'flip_direction'))]]) + + ``results`` will be transformed using all transforms defined in + ``transforms`` arguments. + + For the above configuration, there are four combinations of resize + and flip: + + - Resize to (1333, 400) + no flip + - Resize to (1333, 400) + flip + - Resize to (1333, 800) + no flip + - resize to (1333, 800) + flip + + After that, results are wrapped into lists of the same length as below: + + .. code-block:: + + dict( + inputs=[...], + data_samples=[...] + ) + + The length of ``inputs`` and ``data_samples`` are both 4. + + Required Keys: + + - Depending on the requirements of the ``transforms`` parameter. + + Modified Keys: + + - All output keys of each transform. + + Args: + transforms (list[list[dict]]): Transforms to be applied to data sampled + from dataset. ``transforms`` is a list of list, and each list + element usually represents a series of transforms with the same + type and different arguments. Data will be processed by each list + elements sequentially. See more information in :meth:`transform`. 
+ """ + + def __init__(self, transforms: list): + for i, transform_list in enumerate(transforms): + for j, transform in enumerate(transform_list): + if isinstance(transform, dict): + transform_list[j] = TRANSFORMS.build(transform) + elif callable(transform): + continue + else: + raise TypeError( + 'transform must be callable or a dict, but got' + f' {type(transform)}') + transforms[i] = transform_list + + self.subroutines = [ + Compose(subroutine) for subroutine in product(*transforms) + ] + + def transform(self, results: dict) -> dict: + """Apply all transforms defined in :attr:`transforms` to the results. + + As the example given in :obj:`TestTimeAug`, ``transforms`` consists of + 2 ``Resize``, 2 ``RandomFlip`` and 1 ``PackDetInputs``. + The data sampled from dataset will be processed as follows: + + 1. Data will be processed by 2 ``Resize`` and return a list + of 2 results. + 2. Each result in list will be further passed to 2 + ``RandomFlip``, and aggregates into a list of 4 results. + 3. Each result will be processed by ``PackDetInputs``, and + return a list of dict. + 4. Aggregates the same fields of results, and finally returns + a dict. Each value of the dict represents 4 transformed + results. + + Args: + results (dict): Result dict contains the data to transform. + + Returns: + dict: The augmented data, where each value is wrapped + into a list. + """ + results_list = [] # type: ignore + for subroutine in self.subroutines: + result = subroutine(copy.deepcopy(results)) + assert isinstance(result, dict), ( + f'Data processed by {subroutine} must return a dict, but got ' + f'{result}') + assert result is not None, ( + f'Data processed by {subroutine} in `TestTimeAug` should not ' + 'be None! 
Please check your validation dataset and the ' + f'transforms in {subroutine}') + results_list.append(result) + + aug_data_dict = { + key: [item[key] for item in results_list] # type: ignore + for key in results_list[0] # type: ignore + } + return aug_data_dict + + def __repr__(self) -> str: + repr_str = self.__class__.__name__ + repr_str += 'transforms=\n' + for subroutine in self.subroutines: + repr_str += f'{repr(subroutine)}\n' + return repr_str + + +@TRANSFORMS.register_module() +class RandomChoiceResize(BaseTransform): + """Resize images & bbox & mask from a list of multiple scales. + + This transform resizes the input image to some scale. Bboxes and masks are + then resized with the same scale factor. Resize scale will be randomly + selected from ``scales``. + + How to choose the target scale to resize the image will follow the rules + below: + + - if `scale` is a list of tuple, the target scale is sampled from the list + uniformally. + - if `scale` is a tuple, the target scale will be set to the tuple. + + Required Keys: + + - img + - gt_bboxes (optional) + - gt_seg_map (optional) + - gt_keypoints (optional) + + Modified Keys: + + - img + - img_shape + - gt_bboxes (optional) + - gt_seg_map (optional) + - gt_keypoints (optional) + + Added Keys: + + - scale + - scale_factor + - scale_idx + - keep_ratio + + + Args: + scales (Union[list, Tuple]): Images scales for resizing. + resize_type (str): The type of resize class to use. Defaults to + "Resize". + **resize_kwargs: Other keyword arguments for the ``resize_type``. + + Note: + By defaults, the ``resize_type`` is "Resize", if it's not overwritten + by your registry, it indicates the :class:`mmcv.Resize`. And therefore, + ``resize_kwargs`` accepts any keyword arguments of it, like + ``keep_ratio``, ``interpolation`` and so on. + + If you want to use your custom resize class, the class should accept + ``scale`` argument and have ``scale`` attribution which determines the + resize shape. 
+ """ + + def __init__( + self, + scales: Sequence[Union[int, Tuple]], + resize_type: str = 'Resize', + **resize_kwargs, + ) -> None: + super().__init__() + if isinstance(scales, list): + self.scales = scales + else: + self.scales = [scales] + assert mmengine.is_seq_of(self.scales, (tuple, int)) + + self.resize_cfg = dict(type=resize_type, **resize_kwargs) + # create a empty Resize object + self.resize = TRANSFORMS.build({'scale': 0, **self.resize_cfg}) + + @cache_randomness + def _random_select(self) -> Tuple[int, int]: + """Randomly select an scale from given candidates. + + Returns: + (tuple, int): Returns a tuple ``(scale, scale_dix)``, + where ``scale`` is the selected image scale and + ``scale_idx`` is the selected index in the given candidates. + """ + + scale_idx = np.random.randint(len(self.scales)) + scale = self.scales[scale_idx] + return scale, scale_idx + + def transform(self, results: dict) -> dict: + """Apply resize transforms on results from a list of scales. + + Args: + results (dict): Result dict contains the data to transform. + + Returns: + dict: Resized results, 'img', 'gt_bboxes', 'gt_seg_map', + 'gt_keypoints', 'scale', 'scale_factor', 'img_shape', + and 'keep_ratio' keys are updated in result dict. + """ + + target_scale, scale_idx = self._random_select() + self.resize.scale = target_scale + results = self.resize(results) + results['scale_idx'] = scale_idx + return results + + def __repr__(self) -> str: + repr_str = self.__class__.__name__ + repr_str += f'(scales={self.scales}' + repr_str += f', resize_cfg={self.resize_cfg})' + return repr_str + + +@TRANSFORMS.register_module() +class RandomFlip(BaseTransform): + """Flip the image & bbox & keypoints & segmentation map. Added or Updated + keys: flip, flip_direction, img, gt_bboxes, gt_seg_map, and + gt_keypoints. There are 3 flip modes: + + - ``prob`` is float, ``direction`` is string: the image will be + ``direction``ly flipped with probability of ``prob`` . 
+ E.g., ``prob=0.5``, ``direction='horizontal'``, + then image will be horizontally flipped with probability of 0.5. + + - ``prob`` is float, ``direction`` is list of string: the image will + be ``direction[i]``ly flipped with probability of + ``prob/len(direction)``. + E.g., ``prob=0.5``, ``direction=['horizontal', 'vertical']``, + then image will be horizontally flipped with probability of 0.25, + vertically with probability of 0.25. + + - ``prob`` is list of float, ``direction`` is list of string: + given ``len(prob) == len(direction)``, the image will + be ``direction[i]``ly flipped with probability of ``prob[i]``. + E.g., ``prob=[0.3, 0.5]``, ``direction=['horizontal', + 'vertical']``, then image will be horizontally flipped with + probability of 0.3, vertically with probability of 0.5. + + Required Keys: + + - img + - gt_bboxes (optional) + - gt_seg_map (optional) + - gt_keypoints (optional) + + Modified Keys: + + - img + - gt_bboxes (optional) + - gt_seg_map (optional) + - gt_keypoints (optional) + + Added Keys: + + - flip + - flip_direction + - swap_seg_labels (optional) + + Args: + prob (float | list[float], optional): The flipping probability. + Defaults to None. + direction(str | list[str]): The flipping direction. Options + If input is a list, the length must equal ``prob``. Each + element in ``prob`` indicates the flip probability of + corresponding direction. Defaults to 'horizontal'. + swap_seg_labels (list, optional): The label pair need to be swapped + for ground truth, like 'left arm' and 'right arm' need to be + swapped after horizontal flipping. For example, ``[(1, 5)]``, + where 1/5 is the label of the left/right arm. Defaults to None. 
+ """ + + def __init__(self, + prob: Optional[Union[float, Iterable[float]]] = None, + direction: Union[str, Sequence[Optional[str]]] = 'horizontal', + swap_seg_labels: Optional[Sequence] = None) -> None: + if isinstance(prob, list): + assert mmengine.is_list_of(prob, float) + assert 0 <= sum(prob) <= 1 + elif isinstance(prob, float): + assert 0 <= prob <= 1 + else: + raise ValueError(f'probs must be float or list of float, but \ + got `{type(prob)}`.') + self.prob = prob + self.swap_seg_labels = swap_seg_labels + + valid_directions = ['horizontal', 'vertical', 'diagonal'] + if isinstance(direction, str): + assert direction in valid_directions + elif isinstance(direction, list): + assert mmengine.is_list_of(direction, str) + assert set(direction).issubset(set(valid_directions)) + else: + raise ValueError(f'direction must be either str or list of str, \ + but got `{type(direction)}`.') + self.direction = direction + + if isinstance(prob, list): + assert len(prob) == len(self.direction) + + def _flip_bbox(self, bboxes: np.ndarray, img_shape: Tuple[int, int], + direction: str) -> np.ndarray: + """Flip bboxes horizontally. + + Args: + bboxes (numpy.ndarray): Bounding boxes, shape (..., 4*k) + img_shape (tuple[int]): Image shape (height, width) + direction (str): Flip direction. Options are 'horizontal', + 'vertical', and 'diagonal'. + + Returns: + numpy.ndarray: Flipped bounding boxes. 
+ """ + assert bboxes.shape[-1] % 4 == 0 + flipped = bboxes.copy() + h, w = img_shape + if direction == 'horizontal': + flipped[..., 0::4] = w - bboxes[..., 2::4] + flipped[..., 2::4] = w - bboxes[..., 0::4] + elif direction == 'vertical': + flipped[..., 1::4] = h - bboxes[..., 3::4] + flipped[..., 3::4] = h - bboxes[..., 1::4] + elif direction == 'diagonal': + flipped[..., 0::4] = w - bboxes[..., 2::4] + flipped[..., 1::4] = h - bboxes[..., 3::4] + flipped[..., 2::4] = w - bboxes[..., 0::4] + flipped[..., 3::4] = h - bboxes[..., 1::4] + else: + raise ValueError( + f"Flipping direction must be 'horizontal', 'vertical', \ + or 'diagonal', but got '{direction}'") + return flipped + + def _flip_keypoints( + self, + keypoints: np.ndarray, + img_shape: Tuple[int, int], + direction: str, + ) -> np.ndarray: + """Flip keypoints horizontally, vertically or diagonally. + + Args: + keypoints (numpy.ndarray): Keypoints, shape (..., 2) + img_shape (tuple[int]): Image shape (height, width) + direction (str): Flip direction. Options are 'horizontal', + 'vertical', and 'diagonal'. + + Returns: + numpy.ndarray: Flipped keypoints. + """ + + meta_info = keypoints[..., 2:] + keypoints = keypoints[..., :2] + flipped = keypoints.copy() + h, w = img_shape + if direction == 'horizontal': + flipped[..., 0::2] = w - keypoints[..., 0::2] + elif direction == 'vertical': + flipped[..., 1::2] = h - keypoints[..., 1::2] + elif direction == 'diagonal': + flipped[..., 0::2] = w - keypoints[..., 0::2] + flipped[..., 1::2] = h - keypoints[..., 1::2] + else: + raise ValueError( + f"Flipping direction must be 'horizontal', 'vertical', \ + or 'diagonal', but got '{direction}'") + flipped = np.concatenate([flipped, meta_info], axis=-1) + return flipped + + def _flip_seg_map(self, seg_map: dict, direction: str) -> np.ndarray: + """Flip segmentation map horizontally, vertically or diagonally. + + Args: + seg_map (numpy.ndarray): segmentation map, shape (H, W). + direction (str): Flip direction. 
Options are 'horizontal', + 'vertical'. + + Returns: + numpy.ndarray: Flipped segmentation map. + """ + seg_map = mmcv.imflip(seg_map, direction=direction) + if self.swap_seg_labels is not None: + # to handle datasets with left/right annotations + # like 'Left-arm' and 'Right-arm' in LIP dataset + # Modified from https://github.com/openseg-group/openseg.pytorch/blob/master/lib/datasets/tools/cv2_aug_transforms.py # noqa:E501 + # Licensed under MIT license + temp = seg_map.copy() + assert isinstance(self.swap_seg_labels, (tuple, list)) + for pair in self.swap_seg_labels: + assert isinstance(pair, (tuple, list)) and len(pair) == 2, \ + 'swap_seg_labels must be a sequence with pair, but got ' \ + f'{self.swap_seg_labels}.' + seg_map[temp == pair[0]] = pair[1] + seg_map[temp == pair[1]] = pair[0] + return seg_map + + @cache_randomness + def _choose_direction(self) -> str: + """Choose the flip direction according to `prob` and `direction`""" + if isinstance(self.direction, + Sequence) and not isinstance(self.direction, str): + # None means non-flip + direction_list: list = list(self.direction) + [None] + elif isinstance(self.direction, str): + # None means non-flip + direction_list = [self.direction, None] + + if isinstance(self.prob, list): + non_prob: float = 1 - sum(self.prob) + prob_list = self.prob + [non_prob] + elif isinstance(self.prob, float): + non_prob = 1. 
- self.prob + # exclude non-flip + single_ratio = self.prob / (len(direction_list) - 1) + prob_list = [single_ratio] * (len(direction_list) - 1) + [non_prob] + + cur_dir = np.random.choice(direction_list, p=prob_list) + + return cur_dir + + def _flip(self, results: dict) -> None: + """Flip images, bounding boxes, semantic segmentation map and + keypoints.""" + # flip image + results['img'] = mmcv.imflip( + results['img'], direction=results['flip_direction']) + + img_shape = results['img'].shape[:2] + + # flip bboxes + if results.get('gt_bboxes', None) is not None: + results['gt_bboxes'] = self._flip_bbox(results['gt_bboxes'], + img_shape, + results['flip_direction']) + + # flip keypoints + if results.get('gt_keypoints', None) is not None: + results['gt_keypoints'] = self._flip_keypoints( + results['gt_keypoints'], img_shape, results['flip_direction']) + + # flip seg map + if results.get('gt_seg_map', None) is not None: + results['gt_seg_map'] = self._flip_seg_map( + results['gt_seg_map'], direction=results['flip_direction']) + results['swap_seg_labels'] = self.swap_seg_labels + + def _flip_on_direction(self, results: dict) -> None: + """Function to flip images, bounding boxes, semantic segmentation map + and keypoints.""" + cur_dir = self._choose_direction() + if cur_dir is None: + results['flip'] = False + results['flip_direction'] = None + else: + results['flip'] = True + results['flip_direction'] = cur_dir + self._flip(results) + + def transform(self, results: dict) -> dict: + """Transform function to flip images, bounding boxes, semantic + segmentation map and keypoints. + + Args: + results (dict): Result dict from loading pipeline. + + Returns: + dict: Flipped results, 'img', 'gt_bboxes', 'gt_seg_map', + 'gt_keypoints', 'flip', and 'flip_direction' keys are + updated in result dict. 
+ """ + self._flip_on_direction(results) + + return results + + def __repr__(self) -> str: + repr_str = self.__class__.__name__ + repr_str += f'(prob={self.prob}, ' + repr_str += f'direction={self.direction})' + + return repr_str + + +@TRANSFORMS.register_module() +class RandomResize(BaseTransform): + """Random resize images & bbox & keypoints. + + How to choose the target scale to resize the image will follow the rules + below: + + - if ``scale`` is a sequence of tuple + + .. math:: + target\\_scale[0] \\sim Uniform([scale[0][0], scale[1][0]]) + .. math:: + target\\_scale[1] \\sim Uniform([scale[0][1], scale[1][1]]) + + Following the resize order of weight and height in cv2, ``scale[i][0]`` + is for width, and ``scale[i][1]`` is for height. + + - if ``scale`` is a tuple + + .. math:: + target\\_scale[0] \\sim Uniform([ratio\\_range[0], ratio\\_range[1]]) + * scale[0] + .. math:: + target\\_scale[0] \\sim Uniform([ratio\\_range[0], ratio\\_range[1]]) + * scale[1] + + Following the resize order of weight and height in cv2, ``ratio_range[0]`` + is for width, and ``ratio_range[1]`` is for height. + + - if ``keep_ratio`` is True, the minimum value of ``target_scale`` will be + used to set the shorter side and the maximum value will be used to + set the longer side. + + - if ``keep_ratio`` is False, the value of ``target_scale`` will be used to + reisze the width and height accordingly. + + Required Keys: + + - img + - gt_bboxes + - gt_seg_map + - gt_keypoints + + Modified Keys: + + - img + - gt_bboxes + - gt_seg_map + - gt_keypoints + - img_shape + + Added Keys: + + - scale + - scale_factor + - keep_ratio + + Args: + scale (tuple or Sequence[tuple]): Images scales for resizing. + Defaults to None. + ratio_range (tuple[float], optional): (min_ratio, max_ratio). + Defaults to None. + resize_type (str): The type of resize class to use. Defaults to + "Resize". + **resize_kwargs: Other keyword arguments for the ``resize_type``. 
+
+    Note:
+        By default, the ``resize_type`` is "Resize", if it's not overwritten
+        by your registry, it indicates the :class:`mmcv.Resize`. And therefore,
+        ``resize_kwargs`` accepts any keyword arguments of it, like
+        ``keep_ratio``, ``interpolation`` and so on.
+
+        If you want to use your custom resize class, the class should accept
+        ``scale`` argument and have ``scale`` attribution which determines the
+        resize shape.
+    """
+
+    def __init__(
+        self,
+        scale: Union[Tuple[int, int], Sequence[Tuple[int, int]]],
+        ratio_range: Tuple[float, float] = None,
+        resize_type: str = 'Resize',
+        **resize_kwargs,
+    ) -> None:
+
+        self.scale = scale
+        self.ratio_range = ratio_range
+
+        self.resize_cfg = dict(type=resize_type, **resize_kwargs)
+        # create an empty Resize object
+        self.resize = TRANSFORMS.build({'scale': 0, **self.resize_cfg})
+
+    @staticmethod
+    def _random_sample(scales: Sequence[Tuple[int, int]]) -> tuple:
+        """Private function to randomly sample a scale from a list of tuples.
+
+        Args:
+            scales (list[tuple]): Images scale range for sampling.
+                There must be two tuples in scales, which specify the lower
+                and upper bound of image scales.
+
+        Returns:
+            tuple: The targeted scale of the image to be resized.
+        """
+
+        assert mmengine.is_list_of(scales, tuple) and len(scales) == 2
+        scale_0 = [scales[0][0], scales[1][0]]
+        scale_1 = [scales[0][1], scales[1][1]]
+        edge_0 = np.random.randint(min(scale_0), max(scale_0) + 1)
+        edge_1 = np.random.randint(min(scale_1), max(scale_1) + 1)
+        scale = (edge_0, edge_1)
+        return scale
+
+    @staticmethod
+    def _random_sample_ratio(scale: tuple, ratio_range: Tuple[float,
+                                                              float]) -> tuple:
+        """Private function to randomly sample a scale from a tuple.
+
+        A ratio will be randomly sampled from the range specified by
+        ``ratio_range``. Then it would be multiplied with ``scale`` to
+        generate sampled scale.
+
+        Args:
+            scale (tuple): Images scale base to multiply with ratio.
+ ratio_range (tuple[float]): The minimum and maximum ratio to scale + the ``scale``. + + Returns: + tuple: The targeted scale of the image to be resized. + """ + + assert isinstance(scale, tuple) and len(scale) == 2 + min_ratio, max_ratio = ratio_range + assert min_ratio <= max_ratio + ratio = np.random.random_sample() * (max_ratio - min_ratio) + min_ratio + scale = int(scale[0] * ratio), int(scale[1] * ratio) + return scale + + @cache_randomness + def _random_scale(self) -> tuple: + """Private function to randomly sample an scale according to the type + of ``scale``. + + Returns: + tuple: The targeted scale of the image to be resized. + """ + + if mmengine.is_tuple_of(self.scale, int): + assert self.ratio_range is not None and len(self.ratio_range) == 2 + scale = self._random_sample_ratio( + self.scale, # type: ignore + self.ratio_range) + elif mmengine.is_seq_of(self.scale, tuple): + scale = self._random_sample(self.scale) # type: ignore + else: + raise NotImplementedError('Do not support sampling function ' + f'for "{self.scale}"') + + return scale + + def transform(self, results: dict) -> dict: + """Transform function to resize images, bounding boxes, semantic + segmentation map. + + Args: + results (dict): Result dict from loading pipeline. + + Returns: + dict: Resized results, ``img``, ``gt_bboxes``, ``gt_semantic_seg``, + ``gt_keypoints``, ``scale``, ``scale_factor``, ``img_shape``, and + ``keep_ratio`` keys are updated in result dict. 
+ """ + results['scale'] = self._random_scale() + self.resize.scale = results['scale'] + results = self.resize(results) + return results + + def __repr__(self) -> str: + repr_str = self.__class__.__name__ + repr_str += f'(scale={self.scale}, ' + repr_str += f'ratio_range={self.ratio_range}, ' + repr_str += f'resize_cfg={self.resize_cfg})' + return repr_str diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/transforms/utils.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/transforms/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..370580dcf4e2f01cad770e414d41871d6975a9d6 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/transforms/utils.py @@ -0,0 +1,249 @@ +# Copyright (c) OpenMMLab. All rights reserved. + +import functools +import inspect +import weakref +from collections import defaultdict +from collections.abc import Iterable +from contextlib import contextmanager +from typing import Callable, Union + +from .base import BaseTransform + + +class cache_randomness: + """Decorator that marks the method with random return value(s) in a + transform class. + + This decorator is usually used together with the context-manager + :func`:cache_random_params`. In this context, a decorated method will + cache its return value(s) at the first time of being invoked, and always + return the cached values when being invoked again. + + .. note:: + Only an instance method can be decorated with ``cache_randomness``. 
+ """ + + def __init__(self, func): + + # Check `func` is to be bound as an instance method + if not inspect.isfunction(func): + raise TypeError('Unsupport callable to decorate with' + '@cache_randomness.') + func_args = inspect.getfullargspec(func).args + if len(func_args) == 0 or func_args[0] != 'self': + raise TypeError( + '@cache_randomness should only be used to decorate ' + 'instance methods (the first argument is ``self``).') + + functools.update_wrapper(self, func) + self.func = func + self.instance_ref = None + + def __set_name__(self, owner, name): + # Maintain a record of decorated methods in the class + if not hasattr(owner, '_methods_with_randomness'): + setattr(owner, '_methods_with_randomness', []) + + # Here `name` equals to `self.__name__`, i.e., the name of the + # decorated function, due to the invocation of `update_wrapper` in + # `self.__init__()` + owner._methods_with_randomness.append(name) + + def __call__(self, *args, **kwargs): + # Get the transform instance whose method is decorated + # by cache_randomness + instance = self.instance_ref() + name = self.__name__ + + # Check the flag ``self._cache_enabled``, which should be + # set by the contextmanagers like ``cache_random_parameters``` + cache_enabled = getattr(instance, '_cache_enabled', False) + + if cache_enabled: + # Initialize the cache of the transform instances. The flag + # ``cache_enabled``` is set by contextmanagers like + # ``cache_random_params```. 
+ if not hasattr(instance, '_cache'): + setattr(instance, '_cache', {}) + + if name not in instance._cache: + instance._cache[name] = self.func(instance, *args, **kwargs) + # Return the cached value + return instance._cache[name] + else: + # Clear cache + if hasattr(instance, '_cache'): + del instance._cache + # Return function output + return self.func(instance, *args, **kwargs) + + def __get__(self, obj, cls): + self.instance_ref = weakref.ref(obj) + return self + + +def avoid_cache_randomness(cls): + """Decorator that marks a data transform class (subclass of + :class:`BaseTransform`) prohibited from caching randomness. With this + decorator, errors will be raised in following cases: + + 1. A method is defined in the class with the decorate + `cache_randomness`; + 2. An instance of the class is invoked with the context + `cache_random_params`. + + A typical usage of `avoid_cache_randomness` is to decorate the data + transforms with non-cacheable random behaviors (e.g., the random behavior + can not be defined in a method, thus can not be decorated with + `cache_randomness`). This is for preventing unintentinoal use of such data + transforms within the context of caching randomness, which may lead to + unexpected results. + """ + + # Check that cls is a data transform class + assert issubclass(cls, BaseTransform) + + # Check that no method is decorated with `cache_randomness` in cls + if getattr(cls, '_methods_with_randomness', None): + raise RuntimeError( + f'Class {cls.__name__} decorated with ' + '``avoid_cache_randomness`` should not have methods decorated ' + 'with ``cache_randomness`` (invalid methods: ' + f'{cls._methods_with_randomness})') + + class AvoidCacheRandomness: + + def __get__(self, obj, objtype=None): + # Here we check the value in `objtype.__dict__` instead of + # directly checking the attribute + # `objtype._avoid_cache_randomness`. 
So if the base class is + # decorated with :func:`avoid_cache_randomness`, it will not be + # inherited by subclasses. + return objtype.__dict__.get('_avoid_cache_randomness', False) + + cls.avoid_cache_randomness = AvoidCacheRandomness() + cls._avoid_cache_randomness = True + + return cls + + +@contextmanager +def cache_random_params(transforms: Union[BaseTransform, Iterable]): + """Context-manager that enables the cache of return values of methods + decorated with ``cache_randomness`` in transforms. + + In this mode, decorated methods will cache their return values on the + first invoking, and always return the cached value afterward. This allow + to apply random transforms in a deterministic way. For example, apply same + transforms on multiple examples. See ``cache_randomness`` for more + information. + + Args: + transforms (BaseTransform|list[BaseTransform]): The transforms to + enable cache. + """ + + # key2method stores the original methods that are replaced by the wrapped + # ones. These methods will be restituted when exiting the context. + key2method = dict() + + # key2counter stores the usage number of each cache_randomness. This is + # used to check that any cache_randomness is invoked once during processing + # on data sample. 
+ key2counter: dict = defaultdict(int) + + def _add_invoke_counter(obj, method_name): + method = getattr(obj, method_name) + key = f'{id(obj)}.{method_name}' + key2method[key] = method + + @functools.wraps(method) + def wrapped(*args, **kwargs): + key2counter[key] += 1 + return method(*args, **kwargs) + + return wrapped + + def _add_invoke_checker(obj, method_name): + # check that the method in _methods_with_randomness has been + # invoked at most once + method = getattr(obj, method_name) + key = f'{id(obj)}.{method_name}' + key2method[key] = method + + @functools.wraps(method) + def wrapped(*args, **kwargs): + # clear counter + for name in obj._methods_with_randomness: + key = f'{id(obj)}.{name}' + key2counter[key] = 0 + + output = method(*args, **kwargs) + + for name in obj._methods_with_randomness: + key = f'{id(obj)}.{name}' + if key2counter[key] > 1: + raise RuntimeError( + 'The method decorated with ``cache_randomness`` ' + 'should be invoked at most once during processing ' + f'one data sample. The method {name} of {obj} has ' + f'been invoked {key2counter[key]} times.') + return output + + return wrapped + + def _start_cache(t: BaseTransform): + # Check if cache is allowed for `t` + if getattr(t, 'avoid_cache_randomness', False): + raise RuntimeError( + f'Class {t.__class__.__name__} decorated with ' + '``avoid_cache_randomness`` is not allowed to be used with' + ' ``cache_random_params`` (e.g. 
wrapped by ' + '``ApplyToMultiple`` with ``share_random_params==True``).') + + # Skip transforms w/o random method + if not hasattr(t, '_methods_with_randomness'): + return + + # Set cache enabled flag + setattr(t, '_cache_enabled', True) + + # Store the original method and init the counter + if hasattr(t, '_methods_with_randomness'): + setattr(t, 'transform', _add_invoke_checker(t, 'transform')) + for name in getattr(t, '_methods_with_randomness'): + setattr(t, name, _add_invoke_counter(t, name)) + + def _end_cache(t: BaseTransform): + # Skip transforms w/o random method + if not hasattr(t, '_methods_with_randomness'): + return + + # Remove cache enabled flag + delattr(t, '_cache_enabled') + if hasattr(t, '_cache'): + delattr(t, '_cache') + + # Restore the original method + if hasattr(t, '_methods_with_randomness'): + for name in getattr(t, '_methods_with_randomness'): + key = f'{id(t)}.{name}' + setattr(t, name, key2method[key]) + + key_transform = f'{id(t)}.transform' + setattr(t, 'transform', key2method[key_transform]) + + def _apply(t: Union[BaseTransform, Iterable], + func: Callable[[BaseTransform], None]): + if isinstance(t, BaseTransform): + func(t) + if isinstance(t, Iterable): + for _t in t: + _apply(_t, func) + + try: + _apply(transforms, _start_cache) + yield + finally: + _apply(transforms, _end_cache) diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/transforms/wrappers.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/transforms/wrappers.py new file mode 100644 index 0000000000000000000000000000000000000000..132ddcc4f9bcbdda2aadc4cbee48bf2dfb6b5369 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/transforms/wrappers.py @@ -0,0 +1,648 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
# Copyright (c) OpenMMLab. All rights reserved.

from typing import Any, Callable, Dict, List, Optional, Sequence, Union

import mmengine
import numpy as np

from .base import BaseTransform
from .builder import TRANSFORMS
from .utils import cache_random_params, cache_randomness

# Define type of transform or transform config
Transform = Union[Dict, Callable[[Dict], Dict]]

# Indicator of keys marked by KeyMapper._map_input, which means ignoring the
# marked keys in KeyMapper._apply_transform so they will be invisible to
# wrapped transforms.
# This can be 2 possible case:
# 1. The key is required but missing in results
# 2. The key is manually set as ... (Ellipsis) in ``mapping``, which means
#    the original value in results should be ignored
IgnoreKey = object()

# Import nullcontext if python>=3.7, otherwise use a simple alternative
# implementation.
try:
    from contextlib import nullcontext  # type: ignore
except ImportError:
    from contextlib import contextmanager

    @contextmanager  # type: ignore
    def nullcontext(resource=None):
        try:
            yield resource
        finally:
            pass


class Compose(BaseTransform):
    """Compose multiple transforms sequentially.

    Args:
        transforms (list[dict | callable]): Sequence of transform object or
            config dict to be composed.

    Examples:
        >>> pipeline = [
        >>>     dict(type='Compose',
        >>>         transforms=[
        >>>             dict(type='LoadImageFromFile'),
        >>>             dict(type='Normalize')
        >>>         ]
        >>>     )
        >>> ]
    """

    def __init__(self, transforms: Union[Transform, Sequence[Transform]]):
        super().__init__()

        # A single transform (or config) is promoted to a one-element list.
        if not isinstance(transforms, Sequence):
            transforms = [transforms]
        self.transforms: List = []
        for transform in transforms:
            # Configs are built through the registry; callables are kept
            # as-is; anything else is rejected.
            if isinstance(transform, dict):
                self.transforms.append(TRANSFORMS.build(transform))
            elif callable(transform):
                self.transforms.append(transform)
            else:
                raise TypeError('transform must be callable or a dict, but got'
                                f' {type(transform)}')

    def __iter__(self):
        """Allow easy iteration over the transform sequence."""
        return iter(self.transforms)

    def transform(self, results: Dict) -> Optional[Dict]:
        """Call function to apply transforms sequentially.

        Args:
            results (dict): A result dict contains the results to transform.

        Returns:
            dict or None: Transformed results, or ``None`` as soon as any
            step drops the sample by returning ``None``.
        """
        out: Optional[Dict] = results
        for step in self.transforms:
            out = step(out)  # type: ignore
            if out is None:
                return None
        return out

    def __repr__(self):
        """Compute the string representation."""
        body = ''.join(f'\n    {step}' for step in self.transforms)
        return f'{self.__class__.__name__}({body}\n)'
@TRANSFORMS.register_module()
class KeyMapper(BaseTransform):
    """A transform wrapper to map and reorganize the input/output of the
    wrapped transforms (or sub-pipeline).

    Args:
        transforms (list[dict | callable], optional): Sequence of transform
            object or config dict to be wrapped.
        mapping (dict): A dict that defines the input key mapping.
            The keys corresponds to the inner key (i.e., kwargs of the
            ``transform`` method), and should be string type. The values
            corresponds to the outer keys (i.e., the keys of the
            data/results), and should have a type of string, list or dict.
            None means not applying input mapping. Default: None.
        remapping (dict): A dict that defines the output key mapping.
            The keys and values have the same meanings and rules as in the
            ``mapping``. Default: None.
        auto_remap (bool, optional): If True, an inverse of the mapping will
            be used as the remapping. If auto_remap is not given, it will be
            automatically set True if 'remapping' is not given, and vice
            versa. Default: None.
        allow_nonexist_keys (bool): If False, the outer keys in the mapping
            must exist in the input data, or an exception will be raised.
            Default: False.

    Examples:
        >>> # Example 1: Map 'gt_img' to 'img'
        >>> pipeline = [
        >>>     # Use KeyMapper to convert outer (original) field name
        >>>     # 'gt_img' to inner (used by inner transforms) field name
        >>>     # 'img'
        >>>     dict(type='KeyMapper',
        >>>         mapping={'img': 'gt_img'},
        >>>         # auto_remap=True means output key mapping is the revert of
        >>>         # the input key mapping, e.g. inner 'img' will be mapped
        >>>         # back to outer 'gt_img'
        >>>         auto_remap=True,
        >>>         transforms=[
        >>>             # In all transforms' implementation just use 'img'
        >>>             # as a standard field name
        >>>             dict(type='Crop', crop_size=(384, 384)),
        >>>             dict(type='Normalize'),
        >>>         ])
        >>> ]

        >>> # Example 2: Collect and structure multiple items
        >>> pipeline = [
        >>>     # The inner field 'imgs' will be a dict with keys 'img_src'
        >>>     # and 'img_tar', whose values are outer fields 'img1' and
        >>>     # 'img2' respectively.
        >>>     dict(type='KeyMapper',
        >>>         mapping=dict(
        >>>             imgs=dict(
        >>>                 img_src='img1',
        >>>                 img_tar='img2')),
        >>>         transforms=...)
        >>> ]

        >>> # Example 3: Manually set ignored keys by "..."
        >>> pipeline = [
        >>>     ...
        >>>     dict(type='KeyMapper',
        >>>         mapping={
        >>>             # map outer key "gt_img" to inner key "img"
        >>>             'img': 'gt_img',
        >>>             # ignore outer key "mask"
        >>>             'mask': ...,
        >>>         },
        >>>         transforms=[
        >>>             dict(type='RandomFlip'),
        >>>         ])
        >>>     ...
        >>> ]
    """

    def __init__(self,
                 transforms: Optional[Union[Transform,
                                            List[Transform]]] = None,
                 mapping: Optional[Dict] = None,
                 remapping: Optional[Dict] = None,
                 auto_remap: Optional[bool] = None,
                 allow_nonexist_keys: bool = False):

        super().__init__()

        self.allow_nonexist_keys = allow_nonexist_keys
        self.mapping = mapping

        # Default: auto-remap exactly when no explicit remapping was given.
        if auto_remap is None:
            auto_remap = remapping is None
        self.auto_remap = auto_remap

        if self.auto_remap:
            if remapping is not None:
                raise ValueError('KeyMapper: ``remapping`` must be None if'
                                 '`auto_remap` is set True.')
            # With auto_remap the same mapping is reused for the output;
            # _map_output interprets it in the inverse direction.
            self.remapping = mapping
        else:
            self.remapping = remapping

        if transforms is None:
            transforms = []
        self.transforms = Compose(transforms)

    def __iter__(self):
        """Allow easy iteration over the transform sequence."""
        return iter(self.transforms)

    def _map_input(self, data: Dict,
                   mapping: Optional[Dict]) -> Dict[str, Any]:
        """KeyMapper inputs for the wrapped transforms by gathering and
        renaming data items according to the mapping.

        Args:
            data (dict): The original input data
            mapping (dict, optional): The input key mapping. See the document
                of ``mmcv.transforms.wrappers.KeyMapper`` for details. If
                set None, return the input data directly.

        Returns:
            dict: The input data with remapped keys. This will be the actual
                input of the wrapped pipeline.
        """

        if mapping is None:
            return data.copy()

        def _map(data, m):
            if isinstance(m, dict):
                # m is a dict {inner_key:outer_key, ...}
                return {k_in: _map(data, k_out) for k_in, k_out in m.items()}
            if isinstance(m, (tuple, list)):
                # m is a list or tuple [outer_key1, outer_key2, ...]
                # This is the case when we collect items from the original
                # data to form a list or tuple to feed to the wrapped
                # transforms.
                return m.__class__(_map(data, e) for e in m)

            # allow manually mark a key to be ignored by ...
            if m is ...:
                return IgnoreKey

            # m is an outer_key
            if self.allow_nonexist_keys:
                # Missing keys degrade to the IgnoreKey sentinel.
                return data.get(m, IgnoreKey)
            else:
                return data.get(m)

        collected = _map(data, mapping)

        # Retain unmapped items
        inputs = data.copy()
        inputs.update(collected)

        return inputs

    def _map_output(self, data: Dict,
                    remapping: Optional[Dict]) -> Dict[str, Any]:
        """KeyMapper outputs from the wrapped transforms by gathering and
        renaming data items according to the remapping.

        Args:
            data (dict): The output of the wrapped pipeline.
            remapping (dict, optional): The output key mapping. See the
                document of ``mmcv.transforms.wrappers.KeyMapper`` for
                details. If ``remapping is None``, no key mapping will be
                applied but only remove the special token ``IgnoreKey``.

        Returns:
            dict: The output with remapped keys.
        """

        # Remove ``IgnoreKey``
        if remapping is None:
            return {k: v for k, v in data.items() if v is not IgnoreKey}

        def _map(data, m):
            if isinstance(m, dict):
                assert isinstance(data, dict)
                results = {}
                for k_in, k_out in m.items():
                    assert k_in in data
                    results.update(_map(data[k_in], k_out))
                return results
            if isinstance(m, (list, tuple)):
                assert isinstance(data, (list, tuple))
                assert len(data) == len(m)
                results = {}
                for m_i, d_i in zip(m, data):
                    results.update(_map(d_i, m_i))
                return results

            # ``m is ...`` means the key is marked ignored, in which case the
            # inner results will not affect the outer results in remapping.
            # Another case that will have ``data is IgnoreKey`` is that the
            # key is missing in the inputs. In this case, if the inner key is
            # created by the wrapped transforms, it will be remapped to the
            # corresponding outer key during remapping.
            if m is ... or data is IgnoreKey:
                return {}

            return {m: data}

        # Note that unmapped items are not retained, which is different from
        # the behavior in _map_input. This is to avoid original data items
        # being overwritten by intermediate namesakes
        return _map(data, remapping)

    def _apply_transforms(self, inputs: Dict) -> Dict:
        """Apply ``self.transforms``.

        Note that the special token ``IgnoreKey`` will be invisible to
        ``self.transforms``, but not removed in this method. It will be
        eventually removed in :func:``self._map_output``.
        """
        results = inputs.copy()
        # Strip IgnoreKey entries so wrapped transforms never see them.
        inputs = {k: v for k, v in inputs.items() if v is not IgnoreKey}
        outputs = self.transforms(inputs)

        if outputs is None:
            raise ValueError(
                f'Transforms wrapped by {self.__class__.__name__} should '
                'not return None.')

        results.update(outputs)  # type: ignore
        return results

    def transform(self, results: Dict) -> Dict:
        """Apply mapping, wrapped transforms and remapping."""

        # Apply mapping
        inputs = self._map_input(results, self.mapping)
        # Apply wrapped transforms
        outputs = self._apply_transforms(inputs)
        # Apply remapping
        outputs = self._map_output(outputs, self.remapping)

        results.update(outputs)  # type: ignore
        return results

    def __repr__(self) -> str:
        repr_str = self.__class__.__name__
        repr_str += f'(transforms = {self.transforms}'
        repr_str += f', mapping = {self.mapping}'
        repr_str += f', remapping = {self.remapping}'
        repr_str += f', auto_remap = {self.auto_remap}'
        repr_str += f', allow_nonexist_keys = {self.allow_nonexist_keys})'
        return repr_str
@TRANSFORMS.register_module()
class TransformBroadcaster(KeyMapper):
    """A transform wrapper to apply the wrapped transforms to multiple data
    items. For example, apply Resize to multiple images.

    Args:
        transforms (list[dict | callable]): Sequence of transform object or
            config dict to be wrapped.
        mapping (dict): A dict that defines the input key mapping.
            Note that to apply the transforms to multiple data items, the
            outer keys of the target items should be remapped as a list with
            the standard inner key (The key required by the wrapped
            transform). See the following example and the document of
            ``mmcv.transforms.wrappers.KeyMapper`` for details.
        remapping (dict): A dict that defines the output key mapping.
            The keys and values have the same meanings and rules as in the
            ``mapping``. Default: None.
        auto_remap (bool, optional): If True, an inverse of the mapping will
            be used as the remapping. If auto_remap is not given, it will be
            automatically set True if 'remapping' is not given, and vice
            versa. Default: None.
        allow_nonexist_keys (bool): If False, the outer keys in the mapping
            must exist in the input data, or an exception will be raised.
            Default: False.
        share_random_params (bool): If True, the random transform
            (e.g., RandomFlip) will be conducted in a deterministic way and
            have the same behavior on all data items. For example, to randomly
            flip either both input image and ground-truth image, or none.
            Default: False.

    .. note::
        To apply the transforms to each elements of a list or tuple, instead
        of separating data items, you can map the outer key of the target
        sequence to the standard inner key. See example 2.

    Examples:
        >>> # Example 1: Broadcast to enumerated keys, each contains a single
        >>> # data element
        >>> pipeline = [
        >>>     dict(type='LoadImageFromFile', key='lq'),  # low-quality img
        >>>     dict(type='LoadImageFromFile', key='gt'),  # ground-truth img
        >>>     # TransformBroadcaster maps multiple outer fields to standard
        >>>     # the inner field and process them with wrapped transforms
        >>>     # respectively
        >>>     dict(type='TransformBroadcaster',
        >>>         # case 1: from multiple outer fields
        >>>         mapping={'img': ['lq', 'gt']},
        >>>         auto_remap=True,
        >>>         # share_random_params=True means using identical random
        >>>         # parameters in every processing
        >>>         share_random_params=True,
        >>>         transforms=[
        >>>             dict(type='Crop', crop_size=(384, 384)),
        >>>             dict(type='Normalize'),
        >>>         ])
        >>> ]

        >>> # Example 2: Broadcast to keys that contains data sequences
        >>> pipeline = [
        >>>     dict(type='LoadImageFromFile', key='lq'),  # low-quality img
        >>>     dict(type='LoadImageFromFile', key='gt'),  # ground-truth img
        >>>     # TransformBroadcaster maps multiple outer fields to standard
        >>>     # the inner field and process them with wrapped transforms
        >>>     # respectively
        >>>     dict(type='TransformBroadcaster',
        >>>         # case 2: from one outer field that contains multiple
        >>>         # data elements (e.g. a list)
        >>>         # mapping={'img': 'images'},
        >>>         auto_remap=True,
        >>>         share_random_params=True,
        >>>         transforms=[
        >>>             dict(type='Crop', crop_size=(384, 384)),
        >>>             dict(type='Normalize'),
        >>>         ])
        >>> ]

        >>> # Example 3: Set ignored keys in broadcasting
        >>> pipeline = [
        >>>     dict(type='TransformBroadcaster',
        >>>         # Broadcast the wrapped transforms to multiple images
        >>>         # 'lq' and 'gt, but only update 'img_shape' once
        >>>         mapping={
        >>>             'img': ['lq', 'gt'],
        >>>             'img_shape': ['img_shape', ...],
        >>>         },
        >>>         auto_remap=True,
        >>>         share_random_params=True,
        >>>         transforms=[
        >>>             # `RandomCrop` will modify the field "img",
        >>>             # and optionally update "img_shape" if it exists
        >>>             dict(type='RandomCrop'),
        >>>         ])
        >>> ]
    """

    def __init__(self,
                 transforms: List[Union[Dict, Callable[[Dict], Dict]]],
                 mapping: Optional[Dict] = None,
                 remapping: Optional[Dict] = None,
                 auto_remap: Optional[bool] = None,
                 allow_nonexist_keys: bool = False,
                 share_random_params: bool = False):
        super().__init__(transforms, mapping, remapping, auto_remap,
                         allow_nonexist_keys)

        self.share_random_params = share_random_params

    def scatter_sequence(self, data: Dict) -> List[Dict]:
        """Scatter the broadcasting targets to a list of inputs of the wrapped
        transforms."""

        # infer split number from input
        seq_len = 0
        key_rep = None

        # Broadcast over mapped keys if a mapping exists, otherwise over all.
        if self.mapping:
            keys = self.mapping.keys()
        else:
            keys = data.keys()

        for key in keys:
            assert isinstance(data[key], Sequence)
            if seq_len:
                # All broadcast targets must have the same length.
                if len(data[key]) != seq_len:
                    raise ValueError('Got inconsistent sequence length: '
                                     f'{seq_len} ({key_rep}) vs. '
                                     f'{len(data[key])} ({key})')
            else:
                seq_len = len(data[key])
                key_rep = key

        assert seq_len > 0, 'Fail to get the number of broadcasting targets'

        # Build one shallow-copied dict per sequence element, with each
        # broadcast key replaced by its i-th element.
        scatters = []
        for i in range(seq_len):  # type: ignore
            scatter = data.copy()
            for key in keys:
                scatter[key] = data[key][i]
            scatters.append(scatter)
        return scatters

    def transform(self, results: Dict):
        """Broadcast wrapped transforms to multiple targets."""

        # Apply input remapping
        inputs = self._map_input(results, self.mapping)

        # Scatter sequential inputs into a list
        input_scatters = self.scatter_sequence(inputs)

        # Control random parameter sharing with a context manager
        if self.share_random_params:
            # The context manager :func:`cache_random_params` will let
            # cacheable method of the transforms cache their outputs. Thus
            # the random parameters will only generated once and shared
            # by all data items.
            ctx = cache_random_params  # type: ignore
        else:
            ctx = nullcontext  # type: ignore

        with ctx(self.transforms):
            output_scatters = [
                self._apply_transforms(_input) for _input in input_scatters
            ]

        # Collate output scatters (list of dict to dict of list)
        outputs = {
            key: [_output[key] for _output in output_scatters]
            for key in output_scatters[0]
        }

        # Apply remapping
        outputs = self._map_output(outputs, self.remapping)

        results.update(outputs)
        return results

    def __repr__(self) -> str:
        repr_str = self.__class__.__name__
        repr_str += f'(transforms = {self.transforms}'
        repr_str += f', mapping = {self.mapping}'
        repr_str += f', remapping = {self.remapping}'
        repr_str += f', auto_remap = {self.auto_remap}'
        repr_str += f', allow_nonexist_keys = {self.allow_nonexist_keys}'
        repr_str += f', share_random_params = {self.share_random_params})'
        return repr_str
@TRANSFORMS.register_module()
class RandomChoice(BaseTransform):
    """Process data with a randomly chosen transform from given candidates.

    Args:
        transforms (list[list]): A list of transform candidates, each is a
            sequence of transforms.
        prob (list[float], optional): The probabilities associated
            with each pipeline. The length should be equal to the pipeline
            number and the sum should be 1. If not given, a uniform
            distribution will be assumed.

    Examples:
        >>> # config
        >>> pipeline = [
        >>>     dict(type='RandomChoice',
        >>>         transforms=[
        >>>             [dict(type='RandomHorizontalFlip')],  # subpipeline 1
        >>>             [dict(type='RandomRotate')],  # subpipeline 2
        >>>         ]
        >>>     )
        >>> ]
    """

    def __init__(self,
                 transforms: List[Union[Transform, List[Transform]]],
                 prob: Optional[List[float]] = None):

        super().__init__()

        if prob is not None:
            assert mmengine.is_seq_of(prob, float)
            assert len(transforms) == len(prob), \
                '``transforms`` and ``prob`` must have same lengths. ' \
                f'Got {len(transforms)} vs {len(prob)}.'
            # Probabilities must form a distribution over the candidates.
            assert sum(prob) == 1

        self.prob = prob
        # Each candidate sub-pipeline is compiled into its own Compose.
        # (Loop variable renamed to avoid shadowing ``transforms``.)
        self.transforms = [Compose(pipeline) for pipeline in transforms]

    def __iter__(self):
        """Allow easy iteration over the transform sequence."""
        return iter(self.transforms)

    @cache_randomness
    def random_pipeline_index(self) -> int:
        """Return a random transform index."""
        indices = np.arange(len(self.transforms))
        return np.random.choice(indices, p=self.prob)

    def transform(self, results: Dict) -> Optional[Dict]:
        """Randomly choose a transform to apply."""
        idx = self.random_pipeline_index()
        return self.transforms[idx](results)

    def __repr__(self) -> str:
        repr_str = self.__class__.__name__
        repr_str += f'(transforms = {self.transforms}'
        # Fixed: a ', ' separator was missing here, producing output like
        # "RandomChoice(transforms = [...]prob = None)".
        repr_str += f', prob = {self.prob})'
        return repr_str


@TRANSFORMS.register_module()
class RandomApply(BaseTransform):
    """Apply transforms randomly with a given probability.

    Args:
        transforms (list[dict | callable]): The transform or transform list
            to randomly apply.
        prob (float): The probability to apply transforms. Default: 0.5

    Examples:
        >>> # config
        >>> pipeline = [
        >>>     dict(type='RandomApply',
        >>>         transforms=[dict(type='HorizontalFlip')],
        >>>         prob=0.3)
        >>> ]
    """

    def __init__(self,
                 transforms: Union[Transform, List[Transform]],
                 prob: float = 0.5):

        super().__init__()
        self.prob = prob
        self.transforms = Compose(transforms)

    def __iter__(self):
        """Allow easy iteration over the transform sequence."""
        return iter(self.transforms)

    @cache_randomness
    def random_apply(self) -> bool:
        """Return a random bool value indicating whether apply the
        transform."""
        return np.random.rand() < self.prob

    def transform(self, results: Dict) -> Optional[Dict]:
        """Randomly apply the transform."""
        if self.random_apply():
            return self.transforms(results)  # type: ignore
        else:
            return results

    def __repr__(self) -> str:
        repr_str = self.__class__.__name__
        repr_str += f'(transforms = {self.transforms}'
        repr_str += f', prob = {self.prob})'
        return repr_str
# Copyright (c) OpenMMLab. All rights reserved.
"""This file holds some environment constants for sharing by other files."""

import os.path as osp
import subprocess

import torch
from mmengine.utils.dl_utils import collect_env as mmengine_collect_env

import mmcv


def collect_env():
    """Collect the information of the running environments.

    Returns:
        dict: The environment information. The following fields are contained.

            - sys.platform: The variable of ``sys.platform``.
            - Python: Python version.
            - CUDA available: Bool, indicating if CUDA is available.
            - GPU devices: Device type of each GPU.
            - CUDA_HOME (optional): The env var ``CUDA_HOME``.
            - NVCC (optional): NVCC version.
            - GCC: GCC version, "n/a" if GCC is not installed.
            - MSVC: Microsoft Virtual C++ Compiler version, Windows only.
            - PyTorch: PyTorch version.
            - PyTorch compiling details: The output of \
                ``torch.__config__.show()``.
            - TorchVision (optional): TorchVision version.
            - OpenCV: OpenCV version.
            - MMEngine: MMEngine version.
            - MMCV: MMCV version.
            - MMCV Compiler: The GCC version for compiling MMCV ops.
            - MMCV CUDA Compiler: The CUDA version for compiling MMCV ops.
    """
    env_info = mmengine_collect_env()

    # MMEngine does not add the hipcc compiler information when collecting
    # environment information, so it is added here. When MMEngine v0.3.0 is
    # released, the code here can be removed.
    cuda_available = torch.cuda.is_available()
    if cuda_available and env_info.get('NVCC') == 'Not Available':
        CUDA_HOME = env_info['CUDA_HOME']
        if CUDA_HOME is not None and osp.isdir(CUDA_HOME):
            if CUDA_HOME == '/opt/rocm':
                # ROCm install: query hipcc instead of nvcc.
                try:
                    nvcc = osp.join(CUDA_HOME, 'hip/bin/hipcc')
                    # NOTE(review): shell=True with an interpolated path;
                    # acceptable for a trusted local CUDA_HOME, but confirm
                    # it is never attacker-controlled.
                    nvcc = subprocess.check_output(
                        f'"{nvcc}" --version', shell=True)
                    nvcc = nvcc.decode('utf-8').strip()
                    release = nvcc.rfind('HIP version:')
                    # ``rfind('')`` always returns len(nvcc), so the slice
                    # below extends to the end of the version output.
                    build = nvcc.rfind('')
                    nvcc = nvcc[release:build].strip()
                except subprocess.SubprocessError:
                    nvcc = 'Not Available'
            else:
                # Regular CUDA install: parse the "release ... Build ..."
                # section of ``nvcc -V``.
                try:
                    nvcc = osp.join(CUDA_HOME, 'bin/nvcc')
                    nvcc = subprocess.check_output(f'"{nvcc}" -V', shell=True)
                    nvcc = nvcc.decode('utf-8').strip()
                    release = nvcc.rfind('Cuda compilation tools')
                    build = nvcc.rfind('Build ')
                    nvcc = nvcc[release:build].strip()
                except subprocess.SubprocessError:
                    nvcc = 'Not Available'
        env_info['NVCC'] = nvcc

    env_info['MMCV'] = mmcv.__version__

    # Compiled-ops info is optional: mmcv may be installed without the
    # compiled ``mmcv.ops`` extension.
    try:
        from mmcv.ops import get_compiler_version, get_compiling_cuda_version
    except ModuleNotFoundError:
        env_info['MMCV Compiler'] = 'n/a'
        env_info['MMCV CUDA Compiler'] = 'n/a'
    else:
        env_info['MMCV Compiler'] = get_compiler_version()
        env_info['MMCV CUDA Compiler'] = get_compiling_cuda_version()

    return env_info
# Copyright (c) OpenMMLab. All rights reserved.
import importlib
import os
import pkgutil
import warnings
from collections import namedtuple

import torch

if torch.__version__ != 'parrots':

    def load_ext(name, funcs):
        # Import the compiled extension module (e.g. ``mmcv._ext``) and
        # verify every requested symbol is present.
        ext = importlib.import_module('mmcv.' + name)
        for fun in funcs:
            assert hasattr(ext, fun), f'{fun} miss in module {name}'
        return ext
else:
    from parrots import extension
    from parrots.base import ParrotsException

    # Ops whose parrots extension returns a value: exposed via ``.op``.
    # All other ops operate in-place and are exposed via ``.op_``.
    has_return_value_ops = [
        'nms',
        'softnms',
        'nms_match',
        'nms_rotated',
        'top_pool_forward',
        'top_pool_backward',
        'bottom_pool_forward',
        'bottom_pool_backward',
        'left_pool_forward',
        'left_pool_backward',
        'right_pool_forward',
        'right_pool_backward',
        'fused_bias_leakyrelu',
        'upfirdn2d',
        'ms_deform_attn_forward',
        'pixel_group',
        'contour_expand',
        'diff_iou_rotated_sort_vertices_forward',
    ]

    def get_fake_func(name, e):
        # Placeholder for an op missing under parrots: warns, then re-raises
        # the original load error when actually called.

        def fake_func(*args, **kwargs):
            warnings.warn(f'{name} is not supported in parrots now')
            raise e

        return fake_func

    def load_ext(name, funcs):
        # Assemble a namedtuple of parrots-loaded ops mirroring the
        # attribute interface of the compiled extension module.
        ExtModule = namedtuple('ExtModule', funcs)
        ext_list = []
        lib_root = os.path.dirname(os.path.dirname(os.path.realpath(__file__)))
        for fun in funcs:
            try:
                ext_fun = extension.load(fun, name, lib_dir=lib_root)
            except ParrotsException as e:
                # 'No element registered' simply means the op does not exist
                # for parrots; other errors are surfaced as warnings.
                if 'No element registered' not in e.message:
                    warnings.warn(e.message)
                ext_fun = get_fake_func(fun, e)
                ext_list.append(ext_fun)
            else:
                if fun in has_return_value_ops:
                    ext_list.append(ext_fun.op)
                else:
                    ext_list.append(ext_fun.op_)
        return ExtModule(*ext_list)


def check_ops_exist() -> bool:
    """Return True if the compiled ``mmcv._ext`` module can be found."""
    ext_loader = pkgutil.find_loader('mmcv._ext')
    return ext_loader is not None
+import os + +from mmengine.utils.dl_utils.parrots_wrapper import TORCH_VERSION + +parrots_jit_option = os.getenv('PARROTS_JIT_OPTION') + +if TORCH_VERSION == 'parrots' and parrots_jit_option == 'ON': + from parrots.jit import pat as jit +else: + + def jit(func=None, + check_input=None, + full_shape=True, + derivate=False, + coderize=False, + optimize=False): + + def wrapper(func): + + def wrapper_inner(*args, **kargs): + return func(*args, **kargs) + + return wrapper_inner + + if func is None: + return wrapper + else: + return func + + +if TORCH_VERSION == 'parrots': + from parrots.utils.tester import skip_no_elena +else: + + def skip_no_elena(func): + + def wrapper(*args, **kargs): + return func(*args, **kargs) + + return wrapper diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/version.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/version.py new file mode 100644 index 0000000000000000000000000000000000000000..e96cd5078d2326f33af8285c14daf29751d4ff39 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/version.py @@ -0,0 +1,35 @@ +# Copyright (c) OpenMMLab. All rights reserved. +__version__ = '2.0.0rc4' + + +def parse_version_info(version_str: str, length: int = 4) -> tuple: + """Parse a version string into a tuple. + + Args: + version_str (str): The version string. + length (int): The maximum number of version levels. Default: 4. + + Returns: + tuple[int | str]: The version info, e.g., "1.3.0" is parsed into + (1, 3, 0, 0, 0, 0), and "2.0.0rc1" is parsed into + (2, 0, 0, 0, 'rc', 1) (when length is set to 4). 
+ """ + from packaging.version import parse + version = parse(version_str) + assert version.release, f'failed to parse version {version_str}' + release = list(version.release) + release = release[:length] + if len(release) < length: + release = release + [0] * (length - len(release)) + if version.is_prerelease: + release.extend(list(version.pre)) # type: ignore + elif version.is_postrelease: + release.extend(list(version.post)) # type: ignore + else: + release.extend([0, 0]) + return tuple(release) + + +version_info = (2, 0, 0, 0, 'rc', 4) + +__all__ = ['__version__', 'version_info', 'parse_version_info'] diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/video/__init__.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/video/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..73199b01dec52820dc6ca0139903536344d5a1eb --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/video/__init__.py @@ -0,0 +1,11 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .io import Cache, VideoReader, frames2video +from .optflow import (dequantize_flow, flow_from_bytes, flow_warp, flowread, + flowwrite, quantize_flow, sparse_flow_from_bytes) +from .processing import concat_video, convert_video, cut_video, resize_video + +__all__ = [ + 'Cache', 'VideoReader', 'frames2video', 'convert_video', 'resize_video', + 'cut_video', 'concat_video', 'flowread', 'flowwrite', 'quantize_flow', + 'dequantize_flow', 'flow_warp', 'flow_from_bytes', 'sparse_flow_from_bytes' +] diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/video/io.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/video/io.py new file mode 100644 index 0000000000000000000000000000000000000000..378f5b9f7cc72984f543d262533044d8b031b4e9 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/video/io.py @@ -0,0 +1,316 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import os.path as osp +from collections import OrderedDict + +import cv2 +from cv2 import (CAP_PROP_FOURCC, CAP_PROP_FPS, CAP_PROP_FRAME_COUNT, + CAP_PROP_FRAME_HEIGHT, CAP_PROP_FRAME_WIDTH, + CAP_PROP_POS_FRAMES, VideoWriter_fourcc) +from mmengine.utils import (check_file_exist, mkdir_or_exist, scandir, + track_progress) + + +class Cache: + + def __init__(self, capacity): + self._cache = OrderedDict() + self._capacity = int(capacity) + if capacity <= 0: + raise ValueError('capacity must be a positive integer') + + @property + def capacity(self): + return self._capacity + + @property + def size(self): + return len(self._cache) + + def put(self, key, val): + if key in self._cache: + return + if len(self._cache) >= self.capacity: + self._cache.popitem(last=False) + self._cache[key] = val + + def get(self, key, default=None): + val = self._cache[key] if key in self._cache else default + return val + + +class VideoReader: + """Video class with similar usage to a list object. + + This video wrapper class provides convenient apis to access frames. + There exists an issue of OpenCV's VideoCapture class that jumping to a + certain frame may be inaccurate. It is fixed in this class by checking + the position after jumping each time. + Cache is used when decoding videos. So if the same frame is visited for + the second time, there is no need to decode again if it is stored in the + cache. 
class VideoReader:
    """Video class with similar usage to a list object.

    This video wrapper class provides convenient apis to access frames.
    There exists an issue of OpenCV's VideoCapture class that jumping to a
    certain frame may be inaccurate. It is fixed in this class by checking
    the position after jumping each time.
    Cache is used when decoding videos. So if the same frame is visited for
    the second time, there is no need to decode again if it is stored in the
    cache.

    Examples:
        >>> import mmcv
        >>> v = mmcv.VideoReader('sample.mp4')
        >>> len(v)  # get the total frame number with `len()`
        120
        >>> for img in v:  # v is iterable
        >>>     mmcv.imshow(img)
        >>> v[5]  # get the 6th frame
    """

    def __init__(self, filename, cache_capacity=10):
        # Check whether the video path is a url; URLs are handed straight
        # to OpenCV, local paths are validated first.
        if not filename.startswith(('https://', 'http://')):
            check_file_exist(filename, 'Video file not found: ' + filename)
        self._vcap = cv2.VideoCapture(filename)
        assert cache_capacity > 0
        self._cache = Cache(cache_capacity)
        # Index of the next frame to decode (0-based).
        self._position = 0
        # get basic info
        self._width = int(self._vcap.get(CAP_PROP_FRAME_WIDTH))
        self._height = int(self._vcap.get(CAP_PROP_FRAME_HEIGHT))
        self._fps = self._vcap.get(CAP_PROP_FPS)
        self._frame_cnt = int(self._vcap.get(CAP_PROP_FRAME_COUNT))
        self._fourcc = self._vcap.get(CAP_PROP_FOURCC)

    @property
    def vcap(self):
        """:obj:`cv2.VideoCapture`: The raw VideoCapture object."""
        return self._vcap

    @property
    def opened(self):
        """bool: Indicate whether the video is opened."""
        return self._vcap.isOpened()

    @property
    def width(self):
        """int: Width of video frames."""
        return self._width

    @property
    def height(self):
        """int: Height of video frames."""
        return self._height

    @property
    def resolution(self):
        """tuple: Video resolution (width, height)."""
        return (self._width, self._height)

    @property
    def fps(self):
        """float: FPS of the video."""
        return self._fps

    @property
    def frame_cnt(self):
        """int: Total frames of the video."""
        return self._frame_cnt

    @property
    def fourcc(self):
        """str: "Four character code" of the video."""
        return self._fourcc

    @property
    def position(self):
        """int: Current cursor position, indicating frame decoded."""
        return self._position

    def _get_real_position(self):
        # Ask the decoder where it actually is; can disagree with
        # self._position after an inaccurate seek.
        return int(round(self._vcap.get(CAP_PROP_POS_FRAMES)))

    def _set_real_position(self, frame_id):
        self._vcap.set(CAP_PROP_POS_FRAMES, frame_id)
        pos = self._get_real_position()
        # Work around inaccurate seeking: step forward frame by frame until
        # the decoder really sits at ``frame_id``.
        for _ in range(frame_id - pos):
            self._vcap.read()
        self._position = frame_id

    def read(self):
        """Read the next frame.

        If the next frame have been decoded before and in the cache, then
        return it directly, otherwise decode, cache and return it.

        Returns:
            ndarray or None: Return the frame if successful, otherwise None.
        """
        # pos = self._position
        if self._cache:
            img = self._cache.get(self._position)
            if img is not None:
                ret = True
            else:
                if self._position != self._get_real_position():
                    self._set_real_position(self._position)
                ret, img = self._vcap.read()
                if ret:
                    self._cache.put(self._position, img)
        else:
            ret, img = self._vcap.read()
        if ret:
            self._position += 1
        return img

    def get_frame(self, frame_id):
        """Get frame by index.

        Args:
            frame_id (int): Index of the expected frame, 0-based.

        Returns:
            ndarray or None: Return the frame if successful, otherwise None.
        """
        if frame_id < 0 or frame_id >= self._frame_cnt:
            raise IndexError(
                f'"frame_id" must be between 0 and {self._frame_cnt - 1}')
        if frame_id == self._position:
            return self.read()
        if self._cache:
            img = self._cache.get(frame_id)
            if img is not None:
                self._position = frame_id + 1
                return img
        self._set_real_position(frame_id)
        ret, img = self._vcap.read()
        if ret:
            if self._cache:
                self._cache.put(self._position, img)
            self._position += 1
        return img

    def current_frame(self):
        """Get the current frame (frame that is just visited).

        Returns:
            ndarray or None: If the video is fresh, return None, otherwise
            return the frame.
        """
        if self._position == 0:
            return None
        return self._cache.get(self._position - 1)

    def cvt2frames(self,
                   frame_dir,
                   file_start=0,
                   filename_tmpl='{:06d}.jpg',
                   start=0,
                   max_num=0,
                   show_progress=True):
        """Convert a video to frame images.

        Args:
            frame_dir (str): Output directory to store all the frame images.
            file_start (int): Filenames will start from the specified number.
            filename_tmpl (str): Filename template with the index as the
                placeholder.
            start (int): The starting frame index.
            max_num (int): Maximum number of frames to be written.
            show_progress (bool): Whether to show a progress bar.
        """
        mkdir_or_exist(frame_dir)
        if max_num == 0:
            task_num = self.frame_cnt - start
        else:
            task_num = min(self.frame_cnt - start, max_num)
        if task_num <= 0:
            raise ValueError('start must be less than total frame number')
        if start > 0:
            self._set_real_position(start)

        def write_frame(file_idx):
            # Read sequentially from the current position; a failed read
            # (None) silently ends the written sequence.
            img = self.read()
            if img is None:
                return
            filename = osp.join(frame_dir, filename_tmpl.format(file_idx))
            cv2.imwrite(filename, img)

        if show_progress:
            track_progress(write_frame, range(file_start,
                                              file_start + task_num))
        else:
            for i in range(task_num):
                write_frame(file_start + i)

    def __len__(self):
        return self.frame_cnt

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [
                self.get_frame(i)
                for i in range(*index.indices(self.frame_cnt))
            ]
        # support negative indexing
        if index < 0:
            index += self.frame_cnt
            if index < 0:
                raise IndexError('index out of range')
        return self.get_frame(index)

    def __iter__(self):
        self._set_real_position(0)
        return self

    def __next__(self):
        img = self.read()
        if img is not None:
            return img
        else:
            raise StopIteration

    # Python 2 style alias kept for backward compatibility.
    next = __next__

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._vcap.release()
+ fps (float): FPS of the output video. + fourcc (str): Fourcc of the output video, this should be compatible + with the output file type. + filename_tmpl (str): Filename template with the index as the variable. + start (int): Starting frame index. + end (int): Ending frame index. + show_progress (bool): Whether to show a progress bar. + """ + if end == 0: + ext = filename_tmpl.split('.')[-1] + end = len([name for name in scandir(frame_dir, ext)]) + first_file = osp.join(frame_dir, filename_tmpl.format(start)) + check_file_exist(first_file, 'The start frame not found: ' + first_file) + img = cv2.imread(first_file) + height, width = img.shape[:2] + resolution = (width, height) + vwriter = cv2.VideoWriter(video_file, VideoWriter_fourcc(*fourcc), fps, + resolution) + + def write_frame(file_idx): + filename = osp.join(frame_dir, filename_tmpl.format(file_idx)) + img = cv2.imread(filename) + vwriter.write(img) + + if show_progress: + track_progress(write_frame, range(start, end)) + else: + for i in range(start, end): + write_frame(i) + vwriter.release() diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/video/optflow.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/video/optflow.py new file mode 100644 index 0000000000000000000000000000000000000000..edd3e42069ff53a0782ef722403d8ae0ec36291a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/video/optflow.py @@ -0,0 +1,272 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import warnings +from typing import Tuple, Union + +import cv2 +import numpy as np +from mmengine.utils import is_str + +from mmcv.arraymisc import dequantize, quantize +from mmcv.image import imread, imwrite + + +def flowread(flow_or_path: Union[np.ndarray, str], + quantize: bool = False, + concat_axis: int = 0, + *args, + **kwargs) -> np.ndarray: + """Read an optical flow map. + + Args: + flow_or_path (ndarray or str): A flow map or filepath. 
+ quantize (bool): whether to read quantized pair, if set to True, + remaining args will be passed to :func:`dequantize_flow`. + concat_axis (int): The axis that dx and dy are concatenated, + can be either 0 or 1. Ignored if quantize is False. + + Returns: + ndarray: Optical flow represented as a (h, w, 2) numpy array + """ + if isinstance(flow_or_path, np.ndarray): + if (flow_or_path.ndim != 3) or (flow_or_path.shape[-1] != 2): + raise ValueError(f'Invalid flow with shape {flow_or_path.shape}') + return flow_or_path + elif not is_str(flow_or_path): + raise TypeError(f'"flow_or_path" must be a filename or numpy array, ' + f'not {type(flow_or_path)}') + + if not quantize: + with open(flow_or_path, 'rb') as f: + try: + header = f.read(4).decode('utf-8') + except Exception: + raise OSError(f'Invalid flow file: {flow_or_path}') + else: + if header != 'PIEH': + raise OSError(f'Invalid flow file: {flow_or_path}, ' + 'header does not contain PIEH') + + w = np.fromfile(f, np.int32, 1).squeeze() + h = np.fromfile(f, np.int32, 1).squeeze() + flow = np.fromfile(f, np.float32, w * h * 2).reshape((h, w, 2)) + else: + assert concat_axis in [0, 1] + cat_flow = imread(flow_or_path, flag='unchanged') + if cat_flow.ndim != 2: + raise OSError( + f'{flow_or_path} is not a valid quantized flow file, ' + f'its dimension is {cat_flow.ndim}.') + assert cat_flow.shape[concat_axis] % 2 == 0 + dx, dy = np.split(cat_flow, 2, axis=concat_axis) + flow = dequantize_flow(dx, dy, *args, **kwargs) + + return flow.astype(np.float32) + + +def flowwrite(flow: np.ndarray, + filename: str, + quantize: bool = False, + concat_axis: int = 0, + *args, + **kwargs) -> None: + """Write optical flow to file. + + If the flow is not quantized, it will be saved as a .flo file losslessly, + otherwise a jpeg image which is lossy but of much smaller size. (dx and dy + will be concatenated horizontally into a single image if quantize is True.) + + Args: + flow (ndarray): (h, w, 2) array of optical flow. 
+ filename (str): Output filepath. + quantize (bool): Whether to quantize the flow and save it to 2 jpeg + images. If set to True, remaining args will be passed to + :func:`quantize_flow`. + concat_axis (int): The axis that dx and dy are concatenated, + can be either 0 or 1. Ignored if quantize is False. + """ + if not quantize: + with open(filename, 'wb') as f: + f.write(b'PIEH') + np.array([flow.shape[1], flow.shape[0]], dtype=np.int32).tofile(f) + flow = flow.astype(np.float32) + flow.tofile(f) + f.flush() + else: + assert concat_axis in [0, 1] + dx, dy = quantize_flow(flow, *args, **kwargs) + dxdy = np.concatenate((dx, dy), axis=concat_axis) + imwrite(dxdy, filename) + + +def quantize_flow(flow: np.ndarray, + max_val: float = 0.02, + norm: bool = True) -> tuple: + """Quantize flow to [0, 255]. + + After this step, the size of flow will be much smaller, and can be + dumped as jpeg images. + + Args: + flow (ndarray): (h, w, 2) array of optical flow. + max_val (float): Maximum value of flow, values beyond + [-max_val, max_val] will be truncated. + norm (bool): Whether to divide flow values by image width/height. + + Returns: + tuple[ndarray]: Quantized dx and dy. + """ + h, w, _ = flow.shape + dx = flow[..., 0] + dy = flow[..., 1] + if norm: + dx = dx / w # avoid inplace operations + dy = dy / h + # use 255 levels instead of 256 to make sure 0 is 0 after dequantization. + flow_comps = [ + quantize(d, -max_val, max_val, 255, np.uint8) for d in [dx, dy] + ] + return tuple(flow_comps) + + +def dequantize_flow(dx: np.ndarray, + dy: np.ndarray, + max_val: float = 0.02, + denorm: bool = True) -> np.ndarray: + """Recover from quantized flow. + + Args: + dx (ndarray): Quantized dx. + dy (ndarray): Quantized dy. + max_val (float): Maximum value used when quantizing. + denorm (bool): Whether to multiply flow values with width/height. + + Returns: + ndarray: Dequantized flow. 
+ """ + assert dx.shape == dy.shape + assert dx.ndim == 2 or (dx.ndim == 3 and dx.shape[-1] == 1) + + dx, dy = (dequantize(d, -max_val, max_val, 255) for d in [dx, dy]) + + if denorm: + dx *= dx.shape[1] + dy *= dx.shape[0] + flow = np.dstack((dx, dy)) + return flow + + +def flow_warp(img: np.ndarray, + flow: np.ndarray, + filling_value: int = 0, + interpolate_mode: str = 'nearest') -> np.ndarray: + """Use flow to warp img. + + Args: + img (ndarray): Image to be warped. + flow (ndarray): Optical Flow. + filling_value (int): The missing pixels will be set with filling_value. + interpolate_mode (str): bilinear -> Bilinear Interpolation; + nearest -> Nearest Neighbor. + + Returns: + ndarray: Warped image with the same shape of img + """ + warnings.warn('This function is just for prototyping and cannot ' + 'guarantee the computational efficiency.') + assert flow.ndim == 3, 'Flow must be in 3D arrays.' + height = flow.shape[0] + width = flow.shape[1] + channels = img.shape[2] + + output = np.ones( + (height, width, channels), dtype=img.dtype) * filling_value + + grid = np.indices((height, width)).swapaxes(0, 1).swapaxes(1, 2) + dx = grid[:, :, 0] + flow[:, :, 1] + dy = grid[:, :, 1] + flow[:, :, 0] + sx = np.floor(dx).astype(int) + sy = np.floor(dy).astype(int) + valid = (sx >= 0) & (sx < height - 1) & (sy >= 0) & (sy < width - 1) + + if interpolate_mode == 'nearest': + output[valid, :] = img[dx[valid].round().astype(int), + dy[valid].round().astype(int), :] + elif interpolate_mode == 'bilinear': + # dirty walkround for integer positions + eps_ = 1e-6 + dx, dy = dx + eps_, dy + eps_ + left_top_ = img[np.floor(dx[valid]).astype(int), + np.floor(dy[valid]).astype(int), :] * ( + np.ceil(dx[valid]) - dx[valid])[:, None] * ( + np.ceil(dy[valid]) - dy[valid])[:, None] + left_down_ = img[np.ceil(dx[valid]).astype(int), + np.floor(dy[valid]).astype(int), :] * ( + dx[valid] - np.floor(dx[valid]))[:, None] * ( + np.ceil(dy[valid]) - dy[valid])[:, None] + right_top_ = 
img[np.floor(dx[valid]).astype(int), + np.ceil(dy[valid]).astype(int), :] * ( + np.ceil(dx[valid]) - dx[valid])[:, None] * ( + dy[valid] - np.floor(dy[valid]))[:, None] + right_down_ = img[np.ceil(dx[valid]).astype(int), + np.ceil(dy[valid]).astype(int), :] * ( + dx[valid] - np.floor(dx[valid]))[:, None] * ( + dy[valid] - np.floor(dy[valid]))[:, None] + output[valid, :] = left_top_ + left_down_ + right_top_ + right_down_ + else: + raise NotImplementedError( + 'We only support interpolation modes of nearest and bilinear, ' + f'but got {interpolate_mode}.') + return output.astype(img.dtype) + + +def flow_from_bytes(content: bytes) -> np.ndarray: + """Read dense optical flow from bytes. + + .. note:: + This load optical flow function works for FlyingChairs, FlyingThings3D, + Sintel, FlyingChairsOcc datasets, but cannot load the data from + ChairsSDHom. + + Args: + content (bytes): Optical flow bytes got from files or other streams. + + Returns: + ndarray: Loaded optical flow with the shape (H, W, 2). + """ + + # header in first 4 bytes + header = content[:4] + if header.decode('utf-8') != 'PIEH': + raise Exception('Flow file header does not contain PIEH') + # width in second 4 bytes + width = np.frombuffer(content[4:], np.int32, 1).squeeze() + # height in third 4 bytes + height = np.frombuffer(content[8:], np.int32, 1).squeeze() + # after first 12 bytes, all bytes are flow + flow = np.frombuffer(content[12:], np.float32, width * height * 2).reshape( + (height, width, 2)) + + return flow + + +def sparse_flow_from_bytes(content: bytes) -> Tuple[np.ndarray, np.ndarray]: + """Read the optical flow in KITTI datasets from bytes. + + This function is modified from RAFT load the `KITTI datasets + `_. + + Args: + content (bytes): Optical flow bytes got from files or other streams. + + Returns: + Tuple(ndarray, ndarray): Loaded optical flow with the shape (H, W, 2) + and flow valid mask with the shape (H, W). 
+ """ # nopa + + content = np.frombuffer(content, np.uint8) + flow = cv2.imdecode(content, cv2.IMREAD_ANYDEPTH | cv2.IMREAD_COLOR) + flow = flow[:, :, ::-1].astype(np.float32) + # flow shape (H, W, 2) valid shape (H, W) + flow, valid = flow[:, :, :2], flow[:, :, 2] + flow = (flow - 2**15) / 64.0 + return flow, valid diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/video/processing.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/video/processing.py new file mode 100644 index 0000000000000000000000000000000000000000..4962e08a9e7c0c05279146491c71282708289c32 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/video/processing.py @@ -0,0 +1,161 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import os +import os.path as osp +import subprocess +import tempfile +from typing import List, Optional, Union + +from mmengine.utils import requires_executable + + +@requires_executable('ffmpeg') +def convert_video(in_file: str, + out_file: str, + print_cmd: bool = False, + pre_options: str = '', + **kwargs) -> None: + """Convert a video with ffmpeg. + + This provides a general api to ffmpeg, the executed command is:: + + `ffmpeg -y -i ` + + Options(kwargs) are mapped to ffmpeg commands with the following rules: + + - key=val: "-key val" + - key=True: "-key" + - key=False: "" + + Args: + in_file (str): Input video filename. + out_file (str): Output video filename. + pre_options (str): Options appears before "-i ". + print_cmd (bool): Whether to print the final ffmpeg command. 
+ """ + options = [] + for k, v in kwargs.items(): + if isinstance(v, bool): + if v: + options.append(f'-{k}') + elif k == 'log_level': + assert v in [ + 'quiet', 'panic', 'fatal', 'error', 'warning', 'info', + 'verbose', 'debug', 'trace' + ] + options.append(f'-loglevel {v}') + else: + options.append(f'-{k} {v}') + cmd = f'ffmpeg -y {pre_options} -i {in_file} {" ".join(options)} ' \ + f'{out_file}' + if print_cmd: + print(cmd) + subprocess.call(cmd, shell=True) + + +@requires_executable('ffmpeg') +def resize_video(in_file: str, + out_file: str, + size: Optional[tuple] = None, + ratio: Union[tuple, float, None] = None, + keep_ar: bool = False, + log_level: str = 'info', + print_cmd: bool = False) -> None: + """Resize a video. + + Args: + in_file (str): Input video filename. + out_file (str): Output video filename. + size (tuple): Expected size (w, h), eg, (320, 240) or (320, -1). + ratio (tuple or float): Expected resize ratio, (2, 0.5) means + (w*2, h*0.5). + keep_ar (bool): Whether to keep original aspect ratio. + log_level (str): Logging level of ffmpeg. + print_cmd (bool): Whether to print the final ffmpeg command. 
+ """ + if size is None and ratio is None: + raise ValueError('expected size or ratio must be specified') + if size is not None and ratio is not None: + raise ValueError('size and ratio cannot be specified at the same time') + options = {'log_level': log_level} + if size: + if not keep_ar: + options['vf'] = f'scale={size[0]}:{size[1]}' + else: + options['vf'] = f'scale=w={size[0]}:h={size[1]}:' \ + 'force_original_aspect_ratio=decrease' + else: + if not isinstance(ratio, tuple): + ratio = (ratio, ratio) + options['vf'] = f'scale="trunc(iw*{ratio[0]}):trunc(ih*{ratio[1]})"' + convert_video(in_file, out_file, print_cmd, **options) + + +@requires_executable('ffmpeg') +def cut_video(in_file: str, + out_file: str, + start: Optional[float] = None, + end: Optional[float] = None, + vcodec: Optional[str] = None, + acodec: Optional[str] = None, + log_level: str = 'info', + print_cmd: bool = False) -> None: + """Cut a clip from a video. + + Args: + in_file (str): Input video filename. + out_file (str): Output video filename. + start (None or float): Start time (in seconds). + end (None or float): End time (in seconds). + vcodec (None or str): Output video codec, None for unchanged. + acodec (None or str): Output audio codec, None for unchanged. + log_level (str): Logging level of ffmpeg. + print_cmd (bool): Whether to print the final ffmpeg command. + """ + options = {'log_level': log_level} + if vcodec is None: + options['vcodec'] = 'copy' + if acodec is None: + options['acodec'] = 'copy' + if start: + options['ss'] = start # type: ignore + else: + start = 0 + if end: + options['t'] = end - start # type: ignore + convert_video(in_file, out_file, print_cmd, **options) + + +@requires_executable('ffmpeg') +def concat_video(video_list: List, + out_file: str, + vcodec: Optional[str] = None, + acodec: Optional[str] = None, + log_level: str = 'info', + print_cmd: bool = False) -> None: + """Concatenate multiple videos into a single one. 
+ + Args: + video_list (list): A list of video filenames + out_file (str): Output video filename + vcodec (None or str): Output video codec, None for unchanged + acodec (None or str): Output audio codec, None for unchanged + log_level (str): Logging level of ffmpeg. + print_cmd (bool): Whether to print the final ffmpeg command. + """ + tmp_filehandler, tmp_filename = tempfile.mkstemp(suffix='.txt', text=True) + with open(tmp_filename, 'w') as f: + for filename in video_list: + f.write(f'file {osp.abspath(filename)}\n') + options = {'log_level': log_level} + if vcodec is None: + options['vcodec'] = 'copy' + if acodec is None: + options['acodec'] = 'copy' + convert_video( + tmp_filename, + out_file, + print_cmd, + pre_options='-f concat -safe 0', + **options) + os.close(tmp_filehandler) + os.remove(tmp_filename) diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/visualization/__init__.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/visualization/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..835df136bdcf69348281d22914d41aa84cdf92b1 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/visualization/__init__.py @@ -0,0 +1,9 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .color import Color, color_val +from .image import imshow, imshow_bboxes, imshow_det_bboxes +from .optflow import flow2rgb, flowshow, make_color_wheel + +__all__ = [ + 'Color', 'color_val', 'imshow', 'imshow_bboxes', 'imshow_det_bboxes', + 'flowshow', 'flow2rgb', 'make_color_wheel' +] diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/visualization/color.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/visualization/color.py new file mode 100644 index 0000000000000000000000000000000000000000..05796a80c38fb6b167369fc696f3f6aa1935e00d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/visualization/color.py @@ -0,0 +1,51 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from enum import Enum +from typing import Union + +import numpy as np +from mmengine.utils import is_str + + +class Color(Enum): + """An enum that defines common colors. + + Contains red, green, blue, cyan, yellow, magenta, white and black. + """ + red = (0, 0, 255) + green = (0, 255, 0) + blue = (255, 0, 0) + cyan = (255, 255, 0) + yellow = (0, 255, 255) + magenta = (255, 0, 255) + white = (255, 255, 255) + black = (0, 0, 0) + + +def color_val(color: Union[Color, str, tuple, int, np.ndarray]) -> tuple: + """Convert various input to color tuples. + + Args: + color (:obj:`Color`/str/tuple/int/ndarray): Color inputs + + Returns: + tuple[int]: A tuple of 3 integers indicating BGR channels. + """ + if is_str(color): + return Color[color].value # type: ignore + elif isinstance(color, Color): + return color.value + elif isinstance(color, tuple): + assert len(color) == 3 + for channel in color: + assert 0 <= channel <= 255 + return color + elif isinstance(color, int): + assert 0 <= color <= 255 + return color, color, color + elif isinstance(color, np.ndarray): + assert color.ndim == 1 and color.size == 3 + assert np.all((color >= 0) & (color <= 255)) + color = color.astype(np.uint8) + return tuple(color) + else: + raise TypeError(f'Invalid type for color: {type(color)}') diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/visualization/image.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/visualization/image.py new file mode 100644 index 0000000000000000000000000000000000000000..e7ac4c181744cb08e51a77707c970400a9198a74 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/visualization/image.py @@ -0,0 +1,161 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from typing import List, Optional, Union + +import cv2 +import numpy as np + +from mmcv.image import imread, imwrite +from .color import Color, color_val + +# a type alias declares the optional types of color argument +ColorType = Union[Color, str, tuple, int, np.ndarray] + + +def imshow(img: Union[str, np.ndarray], + win_name: str = '', + wait_time: int = 0): + """Show an image. + + Args: + img (str or ndarray): The image to be displayed. + win_name (str): The window name. + wait_time (int): Value of waitKey param. + """ + cv2.imshow(win_name, imread(img)) + if wait_time == 0: # prevent from hanging if windows was closed + while True: + ret = cv2.waitKey(1) + + closed = cv2.getWindowProperty(win_name, cv2.WND_PROP_VISIBLE) < 1 + # if user closed window or if some key pressed + if closed or ret != -1: + break + else: + ret = cv2.waitKey(wait_time) + + +def imshow_bboxes(img: Union[str, np.ndarray], + bboxes: Union[list, np.ndarray], + colors: ColorType = 'green', + top_k: int = -1, + thickness: int = 1, + show: bool = True, + win_name: str = '', + wait_time: int = 0, + out_file: Optional[str] = None): + """Draw bboxes on an image. + + Args: + img (str or ndarray): The image to be displayed. + bboxes (list or ndarray): A list of ndarray of shape (k, 4). + colors (Color or str or tuple or int or ndarray): A list of colors. + top_k (int): Plot the first k bboxes only if set positive. + thickness (int): Thickness of lines. + show (bool): Whether to show the image. + win_name (str): The window name. + wait_time (int): Value of waitKey param. + out_file (str, optional): The filename to write the image. + + Returns: + ndarray: The image with bboxes drawn on it. 
+ """ + img = imread(img) + img = np.ascontiguousarray(img) + + if isinstance(bboxes, np.ndarray): + bboxes = [bboxes] + if not isinstance(colors, list): + colors = [colors for _ in range(len(bboxes))] + colors = [color_val(c) for c in colors] + assert len(bboxes) == len(colors) + + for i, _bboxes in enumerate(bboxes): + _bboxes = _bboxes.astype(np.int32) + if top_k <= 0: + _top_k = _bboxes.shape[0] + else: + _top_k = min(top_k, _bboxes.shape[0]) + for j in range(_top_k): + left_top = (_bboxes[j, 0], _bboxes[j, 1]) + right_bottom = (_bboxes[j, 2], _bboxes[j, 3]) + cv2.rectangle( + img, left_top, right_bottom, colors[i], thickness=thickness) + + if show: + imshow(img, win_name, wait_time) + if out_file is not None: + imwrite(img, out_file) + return img + + +def imshow_det_bboxes(img: Union[str, np.ndarray], + bboxes: np.ndarray, + labels: np.ndarray, + class_names: List[str] = None, + score_thr: float = 0, + bbox_color: ColorType = 'green', + text_color: ColorType = 'green', + thickness: int = 1, + font_scale: float = 0.5, + show: bool = True, + win_name: str = '', + wait_time: int = 0, + out_file: Optional[str] = None): + """Draw bboxes and class labels (with scores) on an image. + + Args: + img (str or ndarray): The image to be displayed. + bboxes (ndarray): Bounding boxes (with scores), shaped (n, 4) or + (n, 5). + labels (ndarray): Labels of bboxes. + class_names (list[str]): Names of each classes. + score_thr (float): Minimum score of bboxes to be shown. + bbox_color (Color or str or tuple or int or ndarray): Color + of bbox lines. + text_color (Color or str or tuple or int or ndarray): Color + of texts. + thickness (int): Thickness of lines. + font_scale (float): Font scales of texts. + show (bool): Whether to show the image. + win_name (str): The window name. + wait_time (int): Value of waitKey param. + out_file (str or None): The filename to write the image. + + Returns: + ndarray: The image with bboxes drawn on it. 
+ """ + assert bboxes.ndim == 2 + assert labels.ndim == 1 + assert bboxes.shape[0] == labels.shape[0] + assert bboxes.shape[1] == 4 or bboxes.shape[1] == 5 + img = imread(img) + img = np.ascontiguousarray(img) + + if score_thr > 0: + assert bboxes.shape[1] == 5 + scores = bboxes[:, -1] + inds = scores > score_thr + bboxes = bboxes[inds, :] + labels = labels[inds] + + bbox_color = color_val(bbox_color) + text_color = color_val(text_color) + + for bbox, label in zip(bboxes, labels): + bbox_int = bbox.astype(np.int32) + left_top = (bbox_int[0], bbox_int[1]) + right_bottom = (bbox_int[2], bbox_int[3]) + cv2.rectangle( + img, left_top, right_bottom, bbox_color, thickness=thickness) + label_text = class_names[ + label] if class_names is not None else f'cls {label}' + if len(bbox) > 4: + label_text += f'|{bbox[-1]:.02f}' + cv2.putText(img, label_text, (bbox_int[0], bbox_int[1] - 2), + cv2.FONT_HERSHEY_COMPLEX, font_scale, text_color) + + if show: + imshow(img, win_name, wait_time) + if out_file is not None: + imwrite(img, out_file) + return img diff --git a/cv/distiller/CWD/pytorch/mmcv/mmcv/visualization/optflow.py b/cv/distiller/CWD/pytorch/mmcv/mmcv/visualization/optflow.py new file mode 100644 index 0000000000000000000000000000000000000000..080b0e61f401c2aab3eedd307d8fc8686b0cae08 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/mmcv/visualization/optflow.py @@ -0,0 +1,116 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import Optional, Union + +import numpy as np + +from mmcv.image import rgb2bgr +from mmcv.video import flowread +from .image import imshow + + +def flowshow(flow: Union[np.ndarray, str], + win_name: str = '', + wait_time: int = 0) -> None: + """Show optical flow. + + Args: + flow (ndarray or str): The optical flow to be displayed. + win_name (str): The window name. + wait_time (int): Value of waitKey param. 
+ """ + flow = flowread(flow) + flow_img = flow2rgb(flow) + imshow(rgb2bgr(flow_img), win_name, wait_time) + + +def flow2rgb(flow: np.ndarray, + color_wheel: Optional[np.ndarray] = None, + unknown_thr: float = 1e6) -> np.ndarray: + """Convert flow map to RGB image. + + Args: + flow (ndarray): Array of optical flow. + color_wheel (ndarray or None): Color wheel used to map flow field to + RGB colorspace. Default color wheel will be used if not specified. + unknown_thr (float): Values above this threshold will be marked as + unknown and thus ignored. + + Returns: + ndarray: RGB image that can be visualized. + """ + assert flow.ndim == 3 and flow.shape[-1] == 2 + if color_wheel is None: + color_wheel = make_color_wheel() + assert color_wheel.ndim == 2 and color_wheel.shape[1] == 3 + num_bins = color_wheel.shape[0] + + dx = flow[:, :, 0].copy() + dy = flow[:, :, 1].copy() + + ignore_inds = ( + np.isnan(dx) | np.isnan(dy) | (np.abs(dx) > unknown_thr) | + (np.abs(dy) > unknown_thr)) + dx[ignore_inds] = 0 + dy[ignore_inds] = 0 + + rad = np.sqrt(dx**2 + dy**2) + if np.any(rad > np.finfo(float).eps): + max_rad = np.max(rad) + dx /= max_rad + dy /= max_rad + + rad = np.sqrt(dx**2 + dy**2) + angle = np.arctan2(-dy, -dx) / np.pi + + bin_real = (angle + 1) / 2 * (num_bins - 1) + bin_left = np.floor(bin_real).astype(int) + bin_right = (bin_left + 1) % num_bins + w = (bin_real - bin_left.astype(np.float32))[..., None] + flow_img = (1 - + w) * color_wheel[bin_left, :] + w * color_wheel[bin_right, :] + small_ind = rad <= 1 + flow_img[small_ind] = 1 - rad[small_ind, None] * (1 - flow_img[small_ind]) + flow_img[np.logical_not(small_ind)] *= 0.75 + + flow_img[ignore_inds, :] = 0 + + return flow_img + + +def make_color_wheel(bins: Optional[Union[list, tuple]] = None) -> np.ndarray: + """Build a color wheel. 
+ + Args: + bins(list or tuple, optional): Specify the number of bins for each + color range, corresponding to six ranges: red -> yellow, + yellow -> green, green -> cyan, cyan -> blue, blue -> magenta, + magenta -> red. [15, 6, 4, 11, 13, 6] is used for default + (see Middlebury). + + Returns: + ndarray: Color wheel of shape (total_bins, 3). + """ + if bins is None: + bins = [15, 6, 4, 11, 13, 6] + assert len(bins) == 6 + + RY, YG, GC, CB, BM, MR = tuple(bins) + + ry = [1, np.arange(RY) / RY, 0] + yg = [1 - np.arange(YG) / YG, 1, 0] + gc = [0, 1, np.arange(GC) / GC] + cb = [0, 1 - np.arange(CB) / CB, 1] + bm = [np.arange(BM) / BM, 0, 1] + mr = [1, 0, 1 - np.arange(MR) / MR] + + num_bins = RY + YG + GC + CB + BM + MR + + color_wheel = np.zeros((3, num_bins), dtype=np.float32) + + col = 0 + for i, color in enumerate([ry, yg, gc, cb, bm, mr]): + for j in range(3): + color_wheel[j, col:col + bins[i]] = color[j] + col += bins[i] + + return color_wheel.T diff --git a/cv/distiller/CWD/pytorch/mmcv/requirements.txt b/cv/distiller/CWD/pytorch/mmcv/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..448e224f92ec0e79f5aed2efc5c749f1b4447fd0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/requirements.txt @@ -0,0 +1,4 @@ +-r requirements/build.txt +-r requirements/optional.txt +-r requirements/runtime.txt +-r requirements/test.txt diff --git a/cv/distiller/CWD/pytorch/mmcv/requirements/build.txt b/cv/distiller/CWD/pytorch/mmcv/requirements/build.txt new file mode 100644 index 0000000000000000000000000000000000000000..abf514853e58db1b0903721c7624cb313bf3aa57 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/requirements/build.txt @@ -0,0 +1 @@ +pytest-runner diff --git a/cv/distiller/CWD/pytorch/mmcv/requirements/docs.txt b/cv/distiller/CWD/pytorch/mmcv/requirements/docs.txt new file mode 100644 index 0000000000000000000000000000000000000000..a1ff4d39061087b1ea7ab3c6f0481c516936d586 --- /dev/null +++ 
b/cv/distiller/CWD/pytorch/mmcv/requirements/docs.txt @@ -0,0 +1,9 @@ +docutils==0.16.0 +markdown>=3.4.0 +myst-parser +opencv-python +-e git+https://github.com/open-mmlab/pytorch_sphinx_theme.git#egg=pytorch_sphinx_theme +sphinx==4.0.2 +sphinx-copybutton +sphinx_markdown_tables>=0.0.16 +torch diff --git a/cv/distiller/CWD/pytorch/mmcv/requirements/optional.txt b/cv/distiller/CWD/pytorch/mmcv/requirements/optional.txt new file mode 100644 index 0000000000000000000000000000000000000000..bc74f1d295b447f385fe6387ae04a0af8dedaf13 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/requirements/optional.txt @@ -0,0 +1,2 @@ +ninja +psutil diff --git a/cv/distiller/CWD/pytorch/mmcv/requirements/runtime.txt b/cv/distiller/CWD/pytorch/mmcv/requirements/runtime.txt new file mode 100644 index 0000000000000000000000000000000000000000..167b58c2c21562a16bffc1dda75b40f22c6a9de3 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/requirements/runtime.txt @@ -0,0 +1,8 @@ +addict +mmengine>=0.2.0 +numpy +packaging +Pillow +pyyaml +regex;sys_platform=='win32' +yapf diff --git a/cv/distiller/CWD/pytorch/mmcv/requirements/test.txt b/cv/distiller/CWD/pytorch/mmcv/requirements/test.txt new file mode 100644 index 0000000000000000000000000000000000000000..f163c03afd3941b47f52c0704ecf033b8bb8fb0b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/requirements/test.txt @@ -0,0 +1,9 @@ +coverage +lmdb +onnx +onnxoptimizer +onnxruntime +pytest +PyTurboJPEG +scipy +tifffile diff --git a/cv/distiller/CWD/pytorch/mmcv/setup.cfg b/cv/distiller/CWD/pytorch/mmcv/setup.cfg new file mode 100644 index 0000000000000000000000000000000000000000..dc8d3768ef6fb35d4b91d0504faf3fe60def3323 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/setup.cfg @@ -0,0 +1,26 @@ +[bdist_wheel] +universal=1 + +[aliases] +test=pytest + +[yapf] +based_on_style = pep8 +blank_line_before_nested_class_or_def = true +split_before_expression_after_opening_paren = true + +[isort] +line_length = 79 +multi_line_output = 0 
+extra_standard_library = pkg_resources,setuptools,logging,os,warnings,abc +known_first_party = mmcv +known_third_party = addict,cv2,matplotlib,numpy,onnx,packaging,pytest,pytorch_sphinx_theme,scipy,sphinx,torch,torchvision,yaml,yapf +no_lines_before = STDLIB,LOCALFOLDER +default_section = THIRDPARTY + +# ignore-words-list needs to be lowercase format. For example, if we want to +# ignore word "BA", then we need to append "ba" to ignore-words-list rather +# than "BA" +[codespell] +quiet-level = 3 +ignore-words-list = inout,hist,ba,ro,inh diff --git a/cv/distiller/CWD/pytorch/mmcv/setup.py b/cv/distiller/CWD/pytorch/mmcv/setup.py new file mode 100644 index 0000000000000000000000000000000000000000..d3f692777c9774fd5e5b0d18ef29d00a37da9543 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/setup.py @@ -0,0 +1,346 @@ +import glob +import os +import platform +import re +from pkg_resources import DistributionNotFound, get_distribution +from setuptools import find_packages, setup + +EXT_TYPE = '' +try: + import torch + if torch.__version__ == 'parrots': + from parrots.utils.build_extension import BuildExtension + EXT_TYPE = 'parrots' + elif (hasattr(torch, 'is_mlu_available') and torch.is_mlu_available()) or \ + os.getenv('FORCE_MLU', '0') == '1': + from torch_mlu.utils.cpp_extension import BuildExtension + EXT_TYPE = 'pytorch' + else: + from torch.utils.cpp_extension import BuildExtension + EXT_TYPE = 'pytorch' + cmd_class = {'build_ext': BuildExtension} +except ModuleNotFoundError: + cmd_class = {} + print('Skip building ext ops due to the absence of torch.') + + +def choose_requirement(primary, secondary): + """If some version of primary requirement installed, return primary, else + return secondary.""" + try: + name = re.split(r'[!<>=]', primary)[0] + get_distribution(name) + except DistributionNotFound: + return secondary + + return str(primary) + + +def get_version(): + version_file = 'mmcv/version.py' + with open(version_file, encoding='utf-8') as f: + 
exec(compile(f.read(), version_file, 'exec')) + return locals()['__version__'] + + +def parse_requirements(fname='requirements/runtime.txt', with_version=True): + """Parse the package dependencies listed in a requirements file but strips + specific versioning information. + + Args: + fname (str): path to requirements file + with_version (bool, default=False): if True include version specs + + Returns: + List[str]: list of requirements items + + CommandLine: + python -c "import setup; print(setup.parse_requirements())" + """ + import sys + from os.path import exists + require_fpath = fname + + def parse_line(line): + """Parse information from a line in a requirements text file.""" + if line.startswith('-r '): + # Allow specifying requirements in other files + target = line.split(' ')[1] + for info in parse_require_file(target): + yield info + else: + info = {'line': line} + if line.startswith('-e '): + info['package'] = line.split('#egg=')[1] + else: + # Remove versioning from the package + pat = '(' + '|'.join(['>=', '==', '>']) + ')' + parts = re.split(pat, line, maxsplit=1) + parts = [p.strip() for p in parts] + + info['package'] = parts[0] + if len(parts) > 1: + op, rest = parts[1:] + if ';' in rest: + # Handle platform specific dependencies + # http://setuptools.readthedocs.io/en/latest/setuptools.html#declaring-platform-specific-dependencies + version, platform_deps = map(str.strip, + rest.split(';')) + info['platform_deps'] = platform_deps + else: + version = rest # NOQA + info['version'] = (op, version) + yield info + + def parse_require_file(fpath): + with open(fpath) as f: + for line in f.readlines(): + line = line.strip() + if line and not line.startswith('#'): + yield from parse_line(line) + + def gen_packages_items(): + if exists(require_fpath): + for info in parse_require_file(require_fpath): + parts = [info['package']] + if with_version and 'version' in info: + parts.extend(info['version']) + if not sys.version.startswith('3.4'): + # apparently 
package_deps are broken in 3.4 + platform_deps = info.get('platform_deps') + if platform_deps is not None: + parts.append(';' + platform_deps) + item = ''.join(parts) + yield item + + packages = list(gen_packages_items()) + return packages + + +install_requires = parse_requirements() + +try: + # OpenCV installed via conda. + import cv2 # NOQA: F401 + major, minor, *rest = cv2.__version__.split('.') + if int(major) < 3: + raise RuntimeError( + f'OpenCV >=3 is required but {cv2.__version__} is installed') +except ImportError: + # If first not installed install second package + CHOOSE_INSTALL_REQUIRES = [('opencv-python-headless>=3', + 'opencv-python>=3')] + for main, secondary in CHOOSE_INSTALL_REQUIRES: + install_requires.append(choose_requirement(main, secondary)) + + +def get_extensions(): + extensions = [] + + if os.getenv('MMCV_WITH_OPS', '1') == '0': + return extensions + + if EXT_TYPE == 'parrots': + ext_name = 'mmcv._ext' + from parrots.utils.build_extension import Extension + + # new parrots op impl do not use MMCV_USE_PARROTS + # define_macros = [('MMCV_USE_PARROTS', None)] + define_macros = [] + include_dirs = [] + op_files = glob.glob('./mmcv/ops/csrc/pytorch/cuda/*.cu') +\ + glob.glob('./mmcv/ops/csrc/pytorch/cpu/*.cpp') +\ + glob.glob('./mmcv/ops/csrc/parrots/*.cpp') + include_dirs.append(os.path.abspath('./mmcv/ops/csrc/common')) + include_dirs.append(os.path.abspath('./mmcv/ops/csrc/common/cuda')) + cuda_args = os.getenv('MMCV_CUDA_ARGS') + extra_compile_args = { + 'nvcc': [cuda_args, '-std=c++14'] if cuda_args else ['-std=c++14'], + 'cxx': ['-std=c++14'], + } + if torch.cuda.is_available() or os.getenv('FORCE_CUDA', '0') == '1': + define_macros += [('MMCV_WITH_CUDA', None)] + extra_compile_args['nvcc'] += [ + '-D__CUDA_NO_HALF_OPERATORS__', + '-D__CUDA_NO_HALF_CONVERSIONS__', + '-D__CUDA_NO_HALF2_OPERATORS__', + ] + ext_ops = Extension( + name=ext_name, + sources=op_files, + include_dirs=include_dirs, + define_macros=define_macros, + 
extra_compile_args=extra_compile_args, + cuda=True, + pytorch=True) + extensions.append(ext_ops) + elif EXT_TYPE == 'pytorch': + ext_name = 'mmcv._ext' + from torch.utils.cpp_extension import CppExtension, CUDAExtension + + # prevent ninja from using too many resources + try: + import psutil + num_cpu = len(psutil.Process().cpu_affinity()) + cpu_use = max(4, num_cpu - 1) + except (ModuleNotFoundError, AttributeError): + cpu_use = 4 + + os.environ.setdefault('MAX_JOBS', str(cpu_use)) + define_macros = [] + + # Before PyTorch1.8.0, when compiling CUDA code, `cxx` is a + # required key passed to PyTorch. Even if there is no flag passed + # to cxx, users also need to pass an empty list to PyTorch. + # Since PyTorch1.8.0, it has a default value so users do not need + # to pass an empty list anymore. + # More details at https://github.com/pytorch/pytorch/pull/45956 + extra_compile_args = {'cxx': []} + + # Since the PR (https://github.com/open-mmlab/mmcv/pull/1463) uses + # c++14 features, the argument ['std=c++14'] must be added here. + # However, in the windows environment, some standard libraries + # will depend on c++17 or higher. 
In fact, for the windows + # environment, the compiler will choose the appropriate compiler + # to compile those cpp files, so there is no need to add the + # argument + if platform.system() != 'Windows': + extra_compile_args['cxx'] = ['-std=c++14'] + + include_dirs = [] + + is_rocm_pytorch = False + try: + from torch.utils.cpp_extension import ROCM_HOME + is_rocm_pytorch = True if ((torch.version.hip is not None) and + (ROCM_HOME is not None)) else False + except ImportError: + pass + + if is_rocm_pytorch or torch.cuda.is_available() or os.getenv( + 'FORCE_CUDA', '0') == '1': + if is_rocm_pytorch: + define_macros += [('MMCV_WITH_HIP', None)] + define_macros += [('MMCV_WITH_CUDA', None)] + cuda_args = os.getenv('MMCV_CUDA_ARGS') + extra_compile_args['nvcc'] = [cuda_args] if cuda_args else [] + op_files = glob.glob('./mmcv/ops/csrc/pytorch/*.cpp') + \ + glob.glob('./mmcv/ops/csrc/pytorch/cpu/*.cpp') + \ + glob.glob('./mmcv/ops/csrc/pytorch/cuda/*.cu') + \ + glob.glob('./mmcv/ops/csrc/pytorch/cuda/*.cpp') + extension = CUDAExtension + include_dirs.append(os.path.abspath('./mmcv/ops/csrc/pytorch')) + include_dirs.append(os.path.abspath('./mmcv/ops/csrc/common')) + include_dirs.append(os.path.abspath('./mmcv/ops/csrc/common/cuda')) + elif (hasattr(torch, 'is_mlu_available') and + torch.is_mlu_available()) or \ + os.getenv('FORCE_MLU', '0') == '1': + from torch_mlu.utils.cpp_extension import MLUExtension + define_macros += [('MMCV_WITH_MLU', None)] + mlu_args = os.getenv('MMCV_MLU_ARGS') + extra_compile_args['cncc'] = [mlu_args] if mlu_args else [] + op_files = glob.glob('./mmcv/ops/csrc/pytorch/*.cpp') + \ + glob.glob('./mmcv/ops/csrc/pytorch/cpu/*.cpp') + \ + glob.glob('./mmcv/ops/csrc/pytorch/mlu/*.cpp') + \ + glob.glob('./mmcv/ops/csrc/common/mlu/*.mlu') + extension = MLUExtension + include_dirs.append(os.path.abspath('./mmcv/ops/csrc/common')) + include_dirs.append(os.path.abspath('./mmcv/ops/csrc/common/mlu')) + elif (hasattr(torch.backends, 'mps') + and 
torch.backends.mps.is_available()) or os.getenv( + 'FORCE_MPS', '0') == '1': + # objc compiler support + from distutils.unixccompiler import UnixCCompiler + if '.mm' not in UnixCCompiler.src_extensions: + UnixCCompiler.src_extensions.append('.mm') + UnixCCompiler.language_map['.mm'] = 'objc' + + define_macros += [('MMCV_WITH_MPS', None)] + extra_compile_args = {} + extra_compile_args['cxx'] = ['-Wall', '-std=c++17'] + extra_compile_args['cxx'] += [ + '-framework', 'Metal', '-framework', 'Foundation' + ] + extra_compile_args['cxx'] += ['-ObjC++'] + # src + op_files = glob.glob('./mmcv/ops/csrc/pytorch/*.cpp') + \ + glob.glob('./mmcv/ops/csrc/pytorch/cpu/*.cpp') + \ + glob.glob('./mmcv/ops/csrc/common/mps/*.mm') + \ + glob.glob('./mmcv/ops/csrc/pytorch/mps/*.mm') + extension = CppExtension + include_dirs.append(os.path.abspath('./mmcv/ops/csrc/common')) + include_dirs.append(os.path.abspath('./mmcv/ops/csrc/common/mps')) + elif (os.getenv('FORCE_NPU', '0') == '1'): + print(f'Compiling {ext_name} only with CPU and NPU') + try: + from torch_npu.utils.cpp_extension import NpuExtension + define_macros += [('MMCV_WITH_NPU', None)] + extension = NpuExtension + except Exception: + raise ImportError('can not find any torch_npu') + # src + op_files = glob.glob('./mmcv/ops/csrc/pytorch/*.cpp') + \ + glob.glob('./mmcv/ops/csrc/pytorch/cpu/*.cpp') + \ + glob.glob('./mmcv/ops/csrc/common/npu/*.cpp') + \ + glob.glob('./mmcv/ops/csrc/pytorch/npu/*.cpp') + include_dirs.append(os.path.abspath('./mmcv/ops/csrc/common')) + include_dirs.append(os.path.abspath('./mmcv/ops/csrc/common/npu')) + else: + print(f'Compiling {ext_name} only with CPU') + op_files = glob.glob('./mmcv/ops/csrc/pytorch/*.cpp') + \ + glob.glob('./mmcv/ops/csrc/pytorch/cpu/*.cpp') + extension = CppExtension + include_dirs.append(os.path.abspath('./mmcv/ops/csrc/common')) + + # Since the PR (https://github.com/open-mmlab/mmcv/pull/1463) uses + # c++14 features, the argument ['std=c++14'] must be added here. 
+ # However, in the windows environment, some standard libraries + # will depend on c++17 or higher. In fact, for the windows + # environment, the compiler will choose the appropriate compiler + # to compile those cpp files, so there is no need to add the + # argument + if 'nvcc' in extra_compile_args and platform.system() != 'Windows': + extra_compile_args['nvcc'] += ['-std=c++14'] + + ext_ops = extension( + name=ext_name, + sources=op_files, + include_dirs=include_dirs, + define_macros=define_macros, + extra_compile_args=extra_compile_args) + extensions.append(ext_ops) + return extensions + + +setup( + name='mmcv' if os.getenv('MMCV_WITH_OPS', '1') == '1' else 'mmcv-lite', + version=get_version(), + description='OpenMMLab Computer Vision Foundation', + keywords='computer vision', + packages=find_packages(), + include_package_data=True, + classifiers=[ + 'Development Status :: 4 - Beta', + 'License :: OSI Approved :: Apache Software License', + 'Operating System :: OS Independent', + 'Programming Language :: Python :: 3', + 'Programming Language :: Python :: 3.7', + 'Programming Language :: Python :: 3.8', + 'Programming Language :: Python :: 3.9', + 'Programming Language :: Python :: 3.10', + 'Topic :: Utilities', + ], + url='https://github.com/open-mmlab/mmcv', + author='MMCV Contributors', + author_email='openmmlab@gmail.com', + install_requires=install_requires, + extras_require={ + 'all': parse_requirements('requirements.txt'), + 'tests': parse_requirements('requirements/test.txt'), + 'build': parse_requirements('requirements/build.txt'), + 'optional': parse_requirements('requirements/optional.txt'), + }, + python_requires='>=3.7', + ext_modules=get_extensions(), + cmdclass=cmd_class, + zip_safe=False) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/batched_nms_data.pkl b/cv/distiller/CWD/pytorch/mmcv/tests/data/batched_nms_data.pkl new file mode 100644 index 0000000000000000000000000000000000000000..24edecfa82bd313729201d814d8c3478c30d1d0c Binary files 
/dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/batched_nms_data.pkl differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/color.jpg b/cv/distiller/CWD/pytorch/mmcv/tests/data/color.jpg new file mode 100644 index 0000000000000000000000000000000000000000..2f19ebc6c6e867372f61dceadba4d66de46e31ab Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/color.jpg differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/color_exif.jpg b/cv/distiller/CWD/pytorch/mmcv/tests/data/color_exif.jpg new file mode 100644 index 0000000000000000000000000000000000000000..703851b8183e499d9f086a720d2e92c87e3e0434 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/color_exif.jpg differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/a.b.py b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/a.b.py new file mode 100644 index 0000000000000000000000000000000000000000..2364e1d10b054e99c2e1e5780cf8d0e007d659c2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/a.b.py @@ -0,0 +1,5 @@ +# Copyright (c) OpenMMLab. All rights reserved. +item1 = [1, 2] +item2 = {'a': 0} +item3 = True +item4 = 'test' diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/a.py b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/a.py new file mode 100644 index 0000000000000000000000000000000000000000..2364e1d10b054e99c2e1e5780cf8d0e007d659c2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/a.py @@ -0,0 +1,5 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+item1 = [1, 2] +item2 = {'a': 0} +item3 = True +item4 = 'test' diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/b.json b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/b.json new file mode 100644 index 0000000000000000000000000000000000000000..4bbbd09e8edebb1e8b93b9727a6ad5faab88e71e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/b.json @@ -0,0 +1,8 @@ +{ + "item1": [1, 2], + "item2": { + "a": 0 + }, + "item3": true, + "item4": "test" +} \ No newline at end of file diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/base.py b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/base.py new file mode 100644 index 0000000000000000000000000000000000000000..2364e1d10b054e99c2e1e5780cf8d0e007d659c2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/base.py @@ -0,0 +1,5 @@ +# Copyright (c) OpenMMLab. All rights reserved. +item1 = [1, 2] +item2 = {'a': 0} +item3 = True +item4 = 'test' diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/c.yaml b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/c.yaml new file mode 100644 index 0000000000000000000000000000000000000000..5365b7142fa06524678f3fd2502a97f4080c1d6c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/c.yaml @@ -0,0 +1,4 @@ +item1: [1, 2] +item2: {'a': 0} +item3: True +item4: 'test' diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/code.py b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/code.py new file mode 100644 index 0000000000000000000000000000000000000000..65f70045d2c23223d38803a284d341b6a35256b2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/code.py @@ -0,0 +1,5 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from mmcv import Config # isort:skip + +cfg = Config.fromfile('./tests/data/config/a.py') +item5 = cfg.item1[0] + cfg.item2.a diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/d.py b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/d.py new file mode 100644 index 0000000000000000000000000000000000000000..19edcf82d0c9a40c007ba6a1eca03153f7056ce0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/d.py @@ -0,0 +1,6 @@ +# Copyright (c) OpenMMLab. All rights reserved. +_base_ = './base.py' +item1 = [2, 3] +item2 = {'a': 1} +item3 = False +item4 = 'test_base' diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/delete.py b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/delete.py new file mode 100644 index 0000000000000000000000000000000000000000..f8a1eaf64c46d301f47a90d4ac907d1a0362e84e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/delete.py @@ -0,0 +1,4 @@ +# Copyright (c) OpenMMLab. All rights reserved. +_base_ = './base.py' +item1 = {'a': 0, '_delete_': True} +item2 = {'b': 0} diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/deprecated.py b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/deprecated.py new file mode 100644 index 0000000000000000000000000000000000000000..791b0f6ad8c41dbe14c4dd373beee1d8613b859a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/deprecated.py @@ -0,0 +1,6 @@ +# Copyright (c) OpenMMLab. All rights reserved. +_base_ = './expected.py' + +_deprecation_ = dict( + expected='tests/data/config/expected.py', + reference='https://github.com/open-mmlab/mmcv/pull/1275') diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/deprecated_as_base.py b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/deprecated_as_base.py new file mode 100644 index 0000000000000000000000000000000000000000..406964d102ef0bfe1a6ab7513cee9e32052621cc --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/deprecated_as_base.py @@ -0,0 +1,2 @@ +# Copyright (c) OpenMMLab. 
All rights reserved. +_base_ = './deprecated.py' \ No newline at end of file diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/e.py b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/e.py new file mode 100644 index 0000000000000000000000000000000000000000..1340e4bd27198e3d3ef82dbf516f22d8daf236f2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/e.py @@ -0,0 +1,3 @@ +# Copyright (c) OpenMMLab. All rights reserved. +_base_ = './base.py' +item3 = {'a': 1} diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/expected.py b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/expected.py new file mode 100644 index 0000000000000000000000000000000000000000..7f6b729171a5b0c6158514bc500390c4ddbbbc76 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/expected.py @@ -0,0 +1,2 @@ +# Copyright (c) OpenMMLab. All rights reserved. +item1 = 'expected' diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/f.py b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/f.py new file mode 100644 index 0000000000000000000000000000000000000000..b6ed109bdeb01c0fede98d01d7f5e308113f7591 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/f.py @@ -0,0 +1,3 @@ +# Copyright (c) OpenMMLab. All rights reserved. +_base_ = './d.py' +item4 = 'test_recursive_bases' diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/g.py b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/g.py new file mode 100644 index 0000000000000000000000000000000000000000..34d4ebe2f898a01ee8aa11a51f0383040213dc7f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/g.py @@ -0,0 +1,2 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+filename = 'reserved.py' diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/h.py b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/h.py new file mode 100644 index 0000000000000000000000000000000000000000..82594590cf4a73bed123e92ad8c392f3d4723148 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/h.py @@ -0,0 +1,4 @@ +# Copyright (c) OpenMMLab. All rights reserved. +item1 = '{{fileBasename}}' +item2 = '{{ fileDirname}}' +item3 = 'abc_{{ fileBasenameNoExtension }}' diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/i_base.py b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/i_base.py new file mode 100644 index 0000000000000000000000000000000000000000..f31a46a15de9d84191e25e8117d84a50fc967474 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/i_base.py @@ -0,0 +1,8 @@ +# Copyright (c) OpenMMLab. All rights reserved. +item1 = [1, 2] +item2 = {'a': 0} +item3 = True +item4 = 'test' +item_cfg = {'b': 1} +item5 = {'cfg': item_cfg} +item6 = {'cfg': item_cfg} diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/i_child.py b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/i_child.py new file mode 100644 index 0000000000000000000000000000000000000000..dfb91d16e973530dd8e07e45611ea4dc77f720ae --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/i_child.py @@ -0,0 +1,4 @@ +# Copyright (c) OpenMMLab. All rights reserved. +_base_ = './i_base.py' +item_cfg = {'b': 2} +item6 = {'cfg': item_cfg} diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/l.py b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/l.py new file mode 100644 index 0000000000000000000000000000000000000000..4a17bfcbcfdebf42a527ca63a8e33ef71ed67e7c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/l.py @@ -0,0 +1,10 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import os.path as osp + + +def func(x): + return x + +_base_ = ['./l1.py', './l2.yaml', './l3.json', './l4.py'] +item3 = False +item4 = 'test' diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/l1.py b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/l1.py new file mode 100644 index 0000000000000000000000000000000000000000..13db1375e71095d4295bde140bceaad9db9e1c31 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/l1.py @@ -0,0 +1,2 @@ +# Copyright (c) OpenMMLab. All rights reserved. +item1 = [1, 2] diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/l2.yaml b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/l2.yaml new file mode 100644 index 0000000000000000000000000000000000000000..b73902b39a3cf65231aaa667a530c54cfa03bde2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/l2.yaml @@ -0,0 +1 @@ +item2: {'a': 0} diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/l3.json b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/l3.json new file mode 100644 index 0000000000000000000000000000000000000000..3251c5d6395974fa788eb27a368b03150eabd72c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/l3.json @@ -0,0 +1,3 @@ +{ + "item3": true +} diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/l4.py b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/l4.py new file mode 100644 index 0000000000000000000000000000000000000000..cb7b4365ec3674339d3de106bee06c451d4d09ee --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/l4.py @@ -0,0 +1,4 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+item5 = dict(a=0, b=1) +item6 = [dict(a=0), dict(b=1)] +item7 = dict(a=[0, 1, 2], b=dict(c=[3.1, 4.2, 5.3])) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/m.py b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/m.py new file mode 100644 index 0000000000000000000000000000000000000000..af81ca35ca5086e5288a823f7c60269d8e751e99 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/m.py @@ -0,0 +1,4 @@ +# Copyright (c) OpenMMLab. All rights reserved. +_base_ = ['./l1.py', './l2.yaml', './l3.json', 'a.py'] +item3 = False +item4 = 'test' diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/n.py b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/n.py new file mode 100644 index 0000000000000000000000000000000000000000..8d295984c85573a28c08717f0e81a4eb104b7299 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/n.py @@ -0,0 +1,23 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import os.path as osp + + +def func(x): + return x + +test_item1 = [1, 2] +bool_item2 = True +str_item3 = 'test' +dict_item4 = dict( + a={ + 'c/d': 'path/d', + 'f': 's3//f', + 6: '2333', + '2333': 'number' + }, + b={'8': 543}, + c={9: 678}, + d={'a': 0}, + f=dict(a='69')) +dict_item5 = {'x/x': {'a.0': 233}} +dict_list_item6 = {'x/x': [{'a.0': 1., 'b.0': 2.}, {'c/3': 3.}]} diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/o.json b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/o.json new file mode 100644 index 0000000000000000000000000000000000000000..84c5e3ed33ffb4365385c4af9b83196f9d28008d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/o.json @@ -0,0 +1,3 @@ +{ + "item1": "{{ fileDirname }}" +} diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/p.yaml b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/p.yaml new file mode 100644 index 0000000000000000000000000000000000000000..3b3e46e81a0b44a8c029e034d7008fa68fdf1c7f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/p.yaml @@ -0,0 +1 @@ 
+item1: '{{ fileDirname }}' diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/q.py b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/q.py new file mode 100644 index 0000000000000000000000000000000000000000..f7ca0a70bb381f5b7249fec97b5ab8630f5dd57c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/q.py @@ -0,0 +1,2 @@ +# Copyright (c) OpenMMLab. All rights reserved. +custom_imports = dict(imports=['r'], allow_failed_imports=False) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/r.py b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/r.py new file mode 100644 index 0000000000000000000000000000000000000000..26d982e82ac83fb94c500ee3155b50837cdd0028 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/r.py @@ -0,0 +1,4 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import os + +os.environ["TEST_VALUE"] = 'test' diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/s.py b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/s.py new file mode 100644 index 0000000000000000000000000000000000000000..cca07539c8942c7d0424d685d7a3c5e829f27d3a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/s.py @@ -0,0 +1,2 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+item = [{'a': 0}, {'b': 0, 'c': 0}] diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/t.json b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/t.json new file mode 100644 index 0000000000000000000000000000000000000000..8f7b9b4a171e1fc617872c01026cebd09e6bea22 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/t.json @@ -0,0 +1,13 @@ +{ + "_base_": [ + "./l1.py", + "./l2.yaml", + "./l3.json", + "./l4.py" + ], + "item3": false, + "item4": "test", + "item8": "{{fileBasename}}", + "item9": {{ _base_.item2 }}, + "item10": {{ _base_.item7.b.c }} +} diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/t.py b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/t.py new file mode 100644 index 0000000000000000000000000000000000000000..1df57cb5ad2343c3791925b00a53a0bfe726c626 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/t.py @@ -0,0 +1,7 @@ +# Copyright (c) OpenMMLab. All rights reserved. +_base_ = ['./l1.py', './l2.yaml', './l3.json', './l4.py'] +item3 = False +item4 = 'test' +item8 = '{{fileBasename}}' +item9 = {{ _base_.item2 }} +item10 = {{ _base_.item7.b.c }} diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/t.yaml b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/t.yaml new file mode 100644 index 0000000000000000000000000000000000000000..ab42859ec92af833e33fe757cf1f8ca116662b09 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/t.yaml @@ -0,0 +1,6 @@ +_base_ : ['./l1.py', './l2.yaml', './l3.json', './l4.py'] +item3 : False +item4 : 'test' +item8 : '{{fileBasename}}' +item9 : {{ _base_.item2 }} +item10 : {{ _base_.item7.b.c }} diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/u.json b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/u.json new file mode 100644 index 0000000000000000000000000000000000000000..f6a01e3c08f383802b4bd87de03760417efffb2d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/u.json @@ -0,0 +1,26 @@ +{ + "_base_": [ + "./t.py" + ], + 
"base": "_base_.item8", + "item11": {{ _base_.item8 }}, + "item12": {{ _base_.item9 }}, + "item13": {{ _base_.item10 }}, + "item14": {{ _base_.item1 }}, + "item15": { + "a": { + "b": {{ _base_.item2 }} + }, + "b": [ + {{ _base_.item3 }} + ], + "c": [{{ _base_.item4 }}], + "d": [[ + { + "e": {{ _base_.item5.a }} + } + ], + {{ _base_.item6 }}], + "e": {{ _base_.item1 }} + } +} diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/u.py b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/u.py new file mode 100644 index 0000000000000000000000000000000000000000..be6c5bbb7e36fc6b6f47bd1b0573fe5a72a23e20 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/u.py @@ -0,0 +1,14 @@ +# Copyright (c) OpenMMLab. All rights reserved. +_base_ = ['./t.py'] +base = '_base_.item8' +item11 = {{ _base_.item8 }} +item12 = {{ _base_.item9 }} +item13 = {{ _base_.item10 }} +item14 = {{ _base_.item1 }} +item15 = dict( + a = dict( b = {{ _base_.item2 }} ), + b = [{{ _base_.item3 }}], + c = [{{ _base_.item4 }}], + d = [[dict(e = {{ _base_.item5.a }})],{{ _base_.item6 }}], + e = {{ _base_.item1 }} +) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/u.yaml b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/u.yaml new file mode 100644 index 0000000000000000000000000000000000000000..d201cb926dc948e292b22057c89b8d6734285b72 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/u.yaml @@ -0,0 +1,15 @@ +_base_: ["./t.py"] +base: "_base_.item8" +item11: {{ _base_.item8 }} +item12: {{ _base_.item9 }} +item13: {{ _base_.item10 }} +item14: {{ _base_.item1 }} +item15: + a: + b: {{ _base_.item2 }} + b: [{{ _base_.item3 }}] + c: [{{ _base_.item4 }}] + d: + - [e: {{ _base_.item5.a }}] + - {{ _base_.item6 }} + e: {{ _base_.item1 }} diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/config/v.py b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/v.py new file mode 100644 index 0000000000000000000000000000000000000000..13d204d24f5df313661f05cd523642305e9ae408 --- 
/dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/config/v.py @@ -0,0 +1,12 @@ +# Copyright (c) OpenMMLab. All rights reserved. +_base_ = ['./u.py'] +item21 = {{ _base_.item11 }} +item22 = item21 +item23 = {{ _base_.item10 }} +item24 = item23 +item25 = dict( + a = dict( b = item24 ), + b = [item24], + c = [[dict(e = item22)],{{ _base_.item6 }}], + e = item21 +) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/demo.lmdb/data.mdb b/cv/distiller/CWD/pytorch/mmcv/tests/data/demo.lmdb/data.mdb new file mode 100644 index 0000000000000000000000000000000000000000..db1f7d306cfd4255f53dca6a25411adc81cb145a Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/demo.lmdb/data.mdb differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/demo.lmdb/lock.mdb b/cv/distiller/CWD/pytorch/mmcv/tests/data/demo.lmdb/lock.mdb new file mode 100644 index 0000000000000000000000000000000000000000..2e0c69d48b97b0e5215261435d9e0f953b66088c Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/demo.lmdb/lock.mdb differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/filelist.txt b/cv/distiller/CWD/pytorch/mmcv/tests/data/filelist.txt new file mode 100644 index 0000000000000000000000000000000000000000..66117a873343d3dd52eedb6176fc9c9d69cde3b9 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/filelist.txt @@ -0,0 +1,5 @@ +1.jpg +2.jpg +3.jpg +4.jpg +5.jpg \ No newline at end of file diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/for_3d_ops/features_for_fps_distance.npy b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_3d_ops/features_for_fps_distance.npy new file mode 100644 index 0000000000000000000000000000000000000000..6626710cc121fe8452afb36f5d81ccbaf6798543 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_3d_ops/features_for_fps_distance.npy differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/for_3d_ops/fps_idx.npy b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_3d_ops/fps_idx.npy new file 
mode 100644 index 0000000000000000000000000000000000000000..4c460b8794a8928c2aade1be98261c582e6ba721 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_3d_ops/fps_idx.npy differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/for_3d_ops/test_voxel.npy b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_3d_ops/test_voxel.npy new file mode 100644 index 0000000000000000000000000000000000000000..0ca96590dae57544c5b07de1189a0c837b5b5ab9 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_3d_ops/test_voxel.npy differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/for_carafe/carafe_feat.bin b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_carafe/carafe_feat.bin new file mode 100644 index 0000000000000000000000000000000000000000..9402a7cdacd110d94343dfc8ca3acc935ea4af1b Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_carafe/carafe_feat.bin differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/for_carafe/carafe_feat_grad.bin b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_carafe/carafe_feat_grad.bin new file mode 100644 index 0000000000000000000000000000000000000000..d195bd18031de420179f4e6573efe92a12250bba --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_carafe/carafe_feat_grad.bin @@ -0,0 +1,33 @@ +A A>AdA"~A܈A^aAAJAA A>AdA"~A܈A^aAAJAA A>AdA"~A܈A^aAAJAA A>AdA"~A܈A^aAAJAA A>AdA"~A܈A^aAAJAA A>AdA"~A܈A^aAAJAA A>AdA"~A܈A^aAAJAA A>AdA"~A܈A^aAAJAA A>AdA"~A܈A^aAAJAA A>AdA"~A܈A^aAAJAA A>AdA"~A܈A^aAAJAA A>AdA"~A܈A^aAAJAA A>AdA"~A܈A^aAAJAA A>AdA"~A܈A^aAAJAA A>AdA"~A܈A^aAAJAA A>AdA"~A܈A^aAAJA +A +A0A6AE|AyAAbANhA +A +A0A6AE|AyAAbANhA +A +A0A6AE|AyAAbANhA +A +A0A6AE|AyAAbANhA +A +A0A6AE|AyAAbANhA +A +A0A6AE|AyAAbANhA +A +A0A6AE|AyAAbANhA +A +A0A6AE|AyAAbANhA +A +A0A6AE|AyAAbANhA +A +A0A6AE|AyAAbANhA +A +A0A6AE|AyAAbANhA +A +A0A6AE|AyAAbANhA +A +A0A6AE|AyAAbANhA +A +A0A6AE|AyAAbANhA +A +A0A6AE|AyAAbANhA +A 
+A0A6AE|AyAAbANhAA)AA%ՈAZwA;AAeA,AA)AA%ՈAZwA;AAeA,AA)AA%ՈAZwA;AAeA,AA)AA%ՈAZwA;AAeA,AA)AA%ՈAZwA;AAeA,AA)AA%ՈAZwA;AAeA,AA)AA%ՈAZwA;AAeA,AA)AA%ՈAZwA;AAeA,AA)AA%ՈAZwA;AAeA,AA)AA%ՈAZwA;AAeA,AA)AA%ՈAZwA;AAeA,AA)AA%ՈAZwA;AAeA,AA)AA%ՈAZwA;AAeA,AA)AA%ՈAZwA;AAeA,AA)AA%ՈAZwA;AAeA,AA)AA%ՈAZwA;AAeA,AWlAAA8Ag֊AA A^xAAWlAAA8Ag֊AA A^xAAWlAAA8Ag֊AA A^xAAWlAAA8Ag֊AA A^xAAWlAAA8Ag֊AA A^xAAWlAAA8Ag֊AA A^xAAWlAAA8Ag֊AA A^xAAWlAAA8Ag֊AA A^xAAWlAAA8Ag֊AA A^xAAWlAAA8Ag֊AA A^xAAWlAAA8Ag֊AA A^xAAWlAAA8Ag֊AA A^xAAWlAAA8Ag֊AA A^xAAWlAAA8Ag֊AA A^xAAWlAAA8Ag֊AA A^xAAWlAAA8Ag֊AA A^xAA8DAքA{AFܗAAחAATiA`A8DAքA{AFܗAAחAATiA`A8DAքA{AFܗAAחAATiA`A8DAքA{AFܗAAחAATiA`A8DAքA{AFܗAAחAATiA`A8DAքA{AFܗAAחAATiA`A8DAքA{AFܗAAחAATiA`A8DAքA{AFܗAAחAATiA`A8DAքA{AFܗAAחAATiA`A8DAքA{AFܗAAחAATiA`A8DAքA{AFܗAAחAATiA`A8DAքA{AFܗAAחAATiA`A8DAքA{AFܗAAחAATiA`A8DAքA{AFܗAAחAATiA`A8DAքA{AFܗAAחAATiA`A8DAքA{AFܗAAחAATiA`ANjAALlAlzA٦lA.A A*AYOANjAALlAlzA٦lA.A A*AYOANjAALlAlzA٦lA.A A*AYOANjAALlAlzA٦lA.A A*AYOANjAALlAlzA٦lA.A A*AYOANjAALlAlzA٦lA.A A*AYOANjAALlAlzA٦lA.A A*AYOANjAALlAlzA٦lA.A A*AYOANjAALlAlzA٦lA.A A*AYOANjAALlAlzA٦lA.A A*AYOANjAALlAlzA٦lA.A A*AYOANjAALlAlzA٦lA.A A*AYOANjAALlAlzA٦lA.A A*AYOANjAALlAlzA٦lA.A A*AYOANjAALlAlzA٦lA.A A*AYOANjAALlAlzA٦lA.A A*AYOAAA AuqA^ApAΚAA*AAA AuqA^ApAΚAA*AAA AuqA^ApAΚAA*AAA AuqA^ApAΚAA*AAA AuqA^ApAΚAA*AAA AuqA^ApAΚAA*AAA AuqA^ApAΚAA*AAA AuqA^ApAΚAA*AAA AuqA^ApAΚAA*AAA AuqA^ApAΚAA*AAA AuqA^ApAΚAA*AAA AuqA^ApAΚAA*AAA AuqA^ApAΚAA*AAA AuqA^ApAΚAA*AAA AuqA^ApAΚAA*AAA AuqA^ApAΚAA*A˼AAhZAgAĒAB\AyAAЌA˼AAhZAgAĒAB\AyAAЌA˼AAhZAgAĒAB\AyAAЌA˼AAhZAgAĒAB\AyAAЌA˼AAhZAgAĒAB\AyAAЌA˼AAhZAgAĒAB\AyAAЌA˼AAhZAgAĒAB\AyAAЌA˼AAhZAgAĒAB\AyAAЌA˼AAhZAgAĒAB\AyAAЌA˼AAhZAgAĒAB\AyAAЌA˼AAhZAgAĒAB\AyAAЌA˼AAhZAgAĒAB\AyAAЌA˼AAhZAgAĒAB\AyAAЌA˼AAhZAgAĒAB\AyAAЌA˼AAhZAgAĒAB\AyAAЌA˼AAhZAgAĒAB\AyAAЌA \ No newline at end of file diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/for_carafe/carafe_mask.bin b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_carafe/carafe_mask.bin new file mode 100644 index 
0000000000000000000000000000000000000000..18dc01b0a4a24dd56628eecf8103b76e40d19b9b Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_carafe/carafe_mask.bin differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/for_carafe/carafe_mask_grad.bin b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_carafe/carafe_mask_grad.bin new file mode 100644 index 0000000000000000000000000000000000000000..f6f93dc68d7a102c524f2b38db957b1b0329614c Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_carafe/carafe_mask_grad.bin differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/for_carafe/carafe_output.bin b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_carafe/carafe_output.bin new file mode 100644 index 0000000000000000000000000000000000000000..540052702068fe07315ea81735062ad1b9091e45 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_carafe/carafe_output.bin differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/for_ccattention/ccattention_input.bin b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_ccattention/ccattention_input.bin new file mode 100644 index 0000000000000000000000000000000000000000..c2d8094a4c637feb89d8119d7091d3cdfbfa5a96 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_ccattention/ccattention_input.bin differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/for_ccattention/ccattention_output.bin b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_ccattention/ccattention_output.bin new file mode 100644 index 0000000000000000000000000000000000000000..c2d8094a4c637feb89d8119d7091d3cdfbfa5a96 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_ccattention/ccattention_output.bin differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/for_masked_conv2d/masked_conv2d_for_bias.npy b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_masked_conv2d/masked_conv2d_for_bias.npy new file mode 100644 index 
0000000000000000000000000000000000000000..c60951a1d64d67dc8717284753e1bd33f95f6a48 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_masked_conv2d/masked_conv2d_for_bias.npy differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/for_masked_conv2d/masked_conv2d_for_input.npy b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_masked_conv2d/masked_conv2d_for_input.npy new file mode 100644 index 0000000000000000000000000000000000000000..f45c03457d84e47a6c737dc2f362a279634d0108 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_masked_conv2d/masked_conv2d_for_input.npy differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/for_masked_conv2d/masked_conv2d_for_mask.npy b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_masked_conv2d/masked_conv2d_for_mask.npy new file mode 100644 index 0000000000000000000000000000000000000000..4c074471e7b7411868d1ac62ed5a56a93ab6f216 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_masked_conv2d/masked_conv2d_for_mask.npy differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/for_masked_conv2d/masked_conv2d_for_output.npy b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_masked_conv2d/masked_conv2d_for_output.npy new file mode 100644 index 0000000000000000000000000000000000000000..4741265afb3411221c184b2665737b5dab755692 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_masked_conv2d/masked_conv2d_for_output.npy differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/for_masked_conv2d/masked_conv2d_for_weight.npy b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_masked_conv2d/masked_conv2d_for_weight.npy new file mode 100644 index 0000000000000000000000000000000000000000..50f04b53f01297845c95a690bfbf87ce3a88e523 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_masked_conv2d/masked_conv2d_for_weight.npy differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/for_psa_mask/psa_input.bin 
b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_psa_mask/psa_input.bin new file mode 100644 index 0000000000000000000000000000000000000000..9172735d6500c4bcf95021f1d16a44f4e2beef4c Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_psa_mask/psa_input.bin differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/for_psa_mask/psa_output_collect.bin b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_psa_mask/psa_output_collect.bin new file mode 100644 index 0000000000000000000000000000000000000000..f179bdef1db9ea7aa925eef41a72f85dbebe21f9 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_psa_mask/psa_output_collect.bin differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/for_psa_mask/psa_output_distribute.bin b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_psa_mask/psa_output_distribute.bin new file mode 100644 index 0000000000000000000000000000000000000000..3264f3803c0f81b665209c8b21924e042bccb383 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_psa_mask/psa_output_distribute.bin differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/for_scan/.file b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_scan/.file new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/for_scan/1.json b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_scan/1.json new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/for_scan/1.txt b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_scan/1.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/for_scan/2.json b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_scan/2.json new file mode 100644 index 
0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/for_scan/2.txt b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_scan/2.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/for_scan/3.TXT b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_scan/3.TXT new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/for_scan/a.bin b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_scan/a.bin new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/for_scan/sub/1.json b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_scan/sub/1.json new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/for_scan/sub/1.txt b/cv/distiller/CWD/pytorch/mmcv/tests/data/for_scan/sub/1.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/gray_alpha.png b/cv/distiller/CWD/pytorch/mmcv/tests/data/gray_alpha.png new file mode 100644 index 0000000000000000000000000000000000000000..f60450c251b315d88ac865f61a0eb9e060e6fb4b Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/gray_alpha.png differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/grayscale.jpg b/cv/distiller/CWD/pytorch/mmcv/tests/data/grayscale.jpg new file mode 100644 index 0000000000000000000000000000000000000000..319d6746689375c815f061e50c4ea9a40db60857 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/grayscale.jpg differ diff --git 
a/cv/distiller/CWD/pytorch/mmcv/tests/data/grayscale_dim3.jpg b/cv/distiller/CWD/pytorch/mmcv/tests/data/grayscale_dim3.jpg new file mode 100644 index 0000000000000000000000000000000000000000..5c2da4398aa1349493200a36a21aac1d175c979a Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/grayscale_dim3.jpg differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/mapping.txt b/cv/distiller/CWD/pytorch/mmcv/tests/data/mapping.txt new file mode 100644 index 0000000000000000000000000000000000000000..c85bdb05ffe83c501708989943673753465bdb94 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/mapping.txt @@ -0,0 +1,3 @@ +1 cat +2 dog cow +3 panda \ No newline at end of file diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/optflow.flo b/cv/distiller/CWD/pytorch/mmcv/tests/data/optflow.flo new file mode 100644 index 0000000000000000000000000000000000000000..c6dd4d93e92c8ec59955c83fe6af8e3a99d0da4d Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/optflow.flo differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/optflow_concat0.jpg b/cv/distiller/CWD/pytorch/mmcv/tests/data/optflow_concat0.jpg new file mode 100644 index 0000000000000000000000000000000000000000..d052cd1637d8e9169111c5816cd5c4c3bce1ef11 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/optflow_concat0.jpg differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/optflow_concat1.jpg b/cv/distiller/CWD/pytorch/mmcv/tests/data/optflow_concat1.jpg new file mode 100644 index 0000000000000000000000000000000000000000..6c9e7c82cd99595b91b1de9e9960d8da9a21fcc1 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/optflow_concat1.jpg differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/palette.gif b/cv/distiller/CWD/pytorch/mmcv/tests/data/palette.gif new file mode 100644 index 0000000000000000000000000000000000000000..e1b4415598f416af5efe12282e47cbd7c1279ae9 Binary files /dev/null and 
b/cv/distiller/CWD/pytorch/mmcv/tests/data/palette.gif differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/0.npy b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/0.npy new file mode 100644 index 0000000000000000000000000000000000000000..8b33a83710bd5e0ced1f05980a42d30757ad1356 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/0.npy differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/1.npy b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/1.npy new file mode 100644 index 0000000000000000000000000000000000000000..b919b67c44a9c4d672f9d3ceae7e00e7de618e5f Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/1.npy differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/2.npy b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/2.npy new file mode 100644 index 0000000000000000000000000000000000000000..3f53b58259e7e82fa44228c9e9ee296495e25ef7 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/2.npy differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/3.npy b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/3.npy new file mode 100644 index 0000000000000000000000000000000000000000..8998cca4683ef1b7d7bcfc9b48ddfdf0f3f9e391 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/3.npy differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/4.npy b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/4.npy new file mode 100644 index 0000000000000000000000000000000000000000..ddaef3b66ea7d7dbb7adc1dcab35293da2378664 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/4.npy differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/pad0_0.npy b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/pad0_0.npy new file mode 100644 index 0000000000000000000000000000000000000000..334f417d929e174fb6dcb0344a6e4d85dff89c2a Binary files /dev/null and 
b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/pad0_0.npy differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/pad0_1.npy b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/pad0_1.npy new file mode 100644 index 0000000000000000000000000000000000000000..e81c7e4fbadcc02070a17dffe0971065f80c8b04 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/pad0_1.npy differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/pad0_2.npy b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/pad0_2.npy new file mode 100644 index 0000000000000000000000000000000000000000..de63a480cff1687371f803b2be2fd8f0babe38f1 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/pad0_2.npy differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/pad0_3.npy b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/pad0_3.npy new file mode 100644 index 0000000000000000000000000000000000000000..68e766d31a81936a86887eea11a8b72cc598d1e5 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/pad0_3.npy differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/pad0_4.npy b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/pad0_4.npy new file mode 100644 index 0000000000000000000000000000000000000000..b8ce7d51f63c9afd4499d3f464ef5dd18d46128b Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/pad0_4.npy differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/pad_0.npy b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/pad_0.npy new file mode 100644 index 0000000000000000000000000000000000000000..334f417d929e174fb6dcb0344a6e4d85dff89c2a Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/pad_0.npy differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/pad_1.npy b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/pad_1.npy new file mode 100644 index 
0000000000000000000000000000000000000000..41d85632c5af28d982107007d5f01fdc48b3efea Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/pad_1.npy differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/pad_2.npy b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/pad_2.npy new file mode 100644 index 0000000000000000000000000000000000000000..12d2612bc9134cb5bd747f5a8878cea4bf224dc8 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/pad_2.npy differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/pad_3.npy b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/pad_3.npy new file mode 100644 index 0000000000000000000000000000000000000000..f9f8ffbeaf87d3866fab65fcdd04c8e65da6314b Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/pad_3.npy differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/pad_4.npy b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/pad_4.npy new file mode 100644 index 0000000000000000000000000000000000000000..f41c7d0079d8bffa100361ec1b600855ee012fd4 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/pad_4.npy differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/scale_0.npy b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/scale_0.npy new file mode 100644 index 0000000000000000000000000000000000000000..334f417d929e174fb6dcb0344a6e4d85dff89c2a Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/scale_0.npy differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/scale_1.npy b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/scale_1.npy new file mode 100644 index 0000000000000000000000000000000000000000..721fddcf0623857818cd1ce11a594ade4da74222 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/scale_1.npy differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/scale_2.npy 
b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/scale_2.npy new file mode 100644 index 0000000000000000000000000000000000000000..fb24241dc75ac64b8a7d37fda05831a019d363f6 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/scale_2.npy differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/scale_3.npy b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/scale_3.npy new file mode 100644 index 0000000000000000000000000000000000000000..ac566b6bf1a2b4ad9f2d7a991a62df2042eb5c8a Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/scale_3.npy differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/scale_4.npy b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/scale_4.npy new file mode 100644 index 0000000000000000000000000000000000000000..e9488e025df775fb19949266fed8ba0b55c6ac8f Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/patches/scale_4.npy differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/scripts/hello.py b/cv/distiller/CWD/pytorch/mmcv/tests/data/scripts/hello.py new file mode 100644 index 0000000000000000000000000000000000000000..2ed1a1e319fa36eb11ed3f0fcd365eb43a382d01 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/data/scripts/hello.py @@ -0,0 +1,25 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+#!/usr/bin/env python + +import argparse +import warnings + + +def parse_args(): + parser = argparse.ArgumentParser(description='Say hello.') + parser.add_argument('name', help='To whom.') + + args = parser.parse_args() + + return args + + +def main(): + args = parse_args() + print(f'hello {args.name}!') + if args.name == 'agent': + warnings.warn('I have a secret!') + + +if __name__ == '__main__': + main() diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/sparse_flow.png b/cv/distiller/CWD/pytorch/mmcv/tests/data/sparse_flow.png new file mode 100644 index 0000000000000000000000000000000000000000..6d529602745b4a8d48ebadd4fee86630ac59906c Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/sparse_flow.png differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/test.mp4 b/cv/distiller/CWD/pytorch/mmcv/tests/data/test.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..50db28101cba4b56d2478d03e3e6de71c23f1f24 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/test.mp4 differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/data/uint16-5channel.tif b/cv/distiller/CWD/pytorch/mmcv/tests/data/uint16-5channel.tif new file mode 100644 index 0000000000000000000000000000000000000000..71362de2a962b605906aa795aab4ca4b42804142 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/data/uint16-5channel.tif differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_arraymisc.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_arraymisc.py new file mode 100644 index 0000000000000000000000000000000000000000..b29e5f670c3b43663a3390c0e5d4206d49680b70 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_arraymisc.py @@ -0,0 +1,70 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+ +import numpy as np +import pytest + +import mmcv + + +def test_quantize(): + arr = np.random.randn(10, 10) + levels = 20 + + qarr = mmcv.quantize(arr, -1, 1, levels) + assert qarr.shape == arr.shape + assert qarr.dtype == np.dtype('int64') + for i in range(arr.shape[0]): + for j in range(arr.shape[1]): + ref = min(levels - 1, + int(np.floor(10 * (1 + max(min(arr[i, j], 1), -1))))) + assert qarr[i, j] == ref + + qarr = mmcv.quantize(arr, -1, 1, 20, dtype=np.uint8) + assert qarr.shape == arr.shape + assert qarr.dtype == np.dtype('uint8') + + with pytest.raises(ValueError): + mmcv.quantize(arr, -1, 1, levels=0) + with pytest.raises(ValueError): + mmcv.quantize(arr, -1, 1, levels=10.0) + with pytest.raises(ValueError): + mmcv.quantize(arr, 2, 1, levels) + + +def test_dequantize(): + levels = 20 + qarr = np.random.randint(levels, size=(10, 10)) + + arr = mmcv.dequantize(qarr, -1, 1, levels) + assert arr.shape == qarr.shape + assert arr.dtype == np.dtype('float64') + for i in range(qarr.shape[0]): + for j in range(qarr.shape[1]): + assert arr[i, j] == (qarr[i, j] + 0.5) / 10 - 1 + + arr = mmcv.dequantize(qarr, -1, 1, levels, dtype=np.float32) + assert arr.shape == qarr.shape + assert arr.dtype == np.dtype('float32') + + with pytest.raises(ValueError): + mmcv.dequantize(arr, -1, 1, levels=0) + with pytest.raises(ValueError): + mmcv.dequantize(arr, -1, 1, levels=10.0) + with pytest.raises(ValueError): + mmcv.dequantize(arr, 2, 1, levels) + + +def test_joint(): + arr = np.random.randn(100, 100) + levels = 1000 + qarr = mmcv.quantize(arr, -1, 1, levels) + recover = mmcv.dequantize(qarr, -1, 1, levels) + assert np.abs(recover[arr < -1] + 0.999).max() < 1e-6 + assert np.abs(recover[arr > 1] - 0.999).max() < 1e-6 + assert np.abs((recover - arr)[(arr >= -1) & (arr <= 1)]).max() <= 1e-3 + + arr = np.clip(np.random.randn(100) / 1000, -0.01, 0.01) + levels = 99 + qarr = mmcv.quantize(arr, -1, 1, levels) + recover = mmcv.dequantize(qarr, -1, 1, levels) + assert np.all(recover == 
# Copyright (c) OpenMMLab. All rights reserved.
from importlib import import_module

import numpy as np
import pytest
import torch
import torch.nn as nn
from mmengine.registry import MODELS
from mmengine.utils.dl_utils.parrots_wrapper import _BatchNorm

from mmcv.cnn.bricks import (build_activation_layer, build_conv_layer,
                             build_norm_layer, build_padding_layer,
                             build_plugin_layer, build_upsample_layer, is_norm)
from mmcv.cnn.bricks.norm import infer_abbr as infer_norm_abbr
from mmcv.cnn.bricks.plugin import infer_abbr as infer_plugin_abbr
from mmcv.cnn.bricks.upsample import PixelShufflePack


def test_build_conv_layer():
    """build_conv_layer validates its cfg and forwards every conv kwarg."""
    with pytest.raises(TypeError):
        build_conv_layer('Conv2d')  # cfg must be a dict
    with pytest.raises(KeyError):
        build_conv_layer(dict(kernel_size=3))  # `type` missing from cfg
    with pytest.raises(KeyError):
        build_conv_layer(dict(type='FancyConv'))  # unregistered conv type

    kwargs = dict(
        in_channels=4, out_channels=8, kernel_size=3, groups=2, dilation=2)

    def check_common_attrs(layer):
        # the produced layer must reflect every keyword we passed in
        assert layer.in_channels == kwargs['in_channels']
        assert layer.out_channels == kwargs['out_channels']
        assert layer.kernel_size == (kwargs['kernel_size'], ) * 2
        assert layer.groups == kwargs['groups']
        assert layer.dilation == (kwargs['dilation'], ) * 2

    # cfg=None falls back to a plain nn.Conv2d
    layer = build_conv_layer(None, **kwargs)
    assert isinstance(layer, nn.Conv2d)
    check_common_attrs(layer)

    layer = build_conv_layer(dict(type='Conv'), **kwargs)
    assert isinstance(layer, nn.Conv2d)
    check_common_attrs(layer)

    layer = build_conv_layer(dict(type='deconv'), **kwargs)
    assert isinstance(layer, nn.ConvTranspose2d)
    check_common_attrs(layer)

    # sparse convs cannot support the case when groups > 1
    kwargs.pop('groups')

    for type_name, conv_cls in MODELS.module_dict.items():
        # SparseInverseConv2d / SparseInverseConv3d do not accept `dilation`
        if type_name in ('SparseInverseConv2d', 'SparseInverseConv3d'):
            kwargs.pop('dilation')
        if 'conv' in type_name.lower():
            layer = build_conv_layer(dict(type=type_name), **kwargs)
            assert isinstance(layer, conv_cls)
            assert layer.in_channels == kwargs['in_channels']
            assert layer.out_channels == kwargs['out_channels']
        kwargs['dilation'] = 2  # recover the key


def test_infer_norm_abbr():
    """infer_abbr honours _abbr_ and otherwise parses the class name."""
    with pytest.raises(TypeError):
        infer_norm_abbr(0)  # class_type must be a class

    class MyNorm:

        _abbr_ = 'mn'

    # an explicit _abbr_ attribute always wins
    assert infer_norm_abbr(MyNorm) == 'mn'

    # otherwise the abbreviation is derived from the class name
    expected = {
        'FancyBatchNorm': 'bn',
        'FancyInstanceNorm': 'in',
        'FancyLayerNorm': 'ln',
        'FancyGroupNorm': 'gn',
        'FancyNorm': 'norm_layer',
    }
    for cls_name, abbr in expected.items():
        assert infer_norm_abbr(type(cls_name, (), {})) == abbr


def test_build_norm_layer():
    """build_norm_layer validates cfg and names layers abbr + postfix."""
    with pytest.raises(TypeError):
        build_norm_layer('BN', 3)  # cfg must be a dict
    with pytest.raises(KeyError):
        build_norm_layer(dict(), 3)  # `type` missing from cfg
    with pytest.raises(KeyError):
        build_norm_layer(dict(type='FancyNorm'), 3)  # unregistered norm type
    with pytest.raises(AssertionError):
        build_norm_layer(dict(type='BN'), 3, postfix=[1, 2])  # bad postfix
    with pytest.raises(AssertionError):
        build_norm_layer(dict(type='GN'), 3)  # GN requires num_groups

    # expected abbreviation for each registered norm type
    abbr_mapping = {
        'BN': 'bn',
        'BN1d': 'bn',
        'BN2d': 'bn',
        'BN3d': 'bn',
        'SyncBN': 'bn',
        'GN': 'gn',
        'LN': 'ln',
        'IN': 'in',
        'IN1d': 'in',
        'IN2d': 'in',
        'IN3d': 'in',
    }
    for registered_name, norm_cls in MODELS.module_dict.items():
        if registered_name not in abbr_mapping:
            continue
        if registered_name == 'MMSyncBN':  # skip MMSyncBN
            continue
        for postfix in ('_test', 1):
            cfg = dict(type=registered_name)
            if registered_name == 'GN':
                cfg['num_groups'] = 3
            name, layer = build_norm_layer(cfg, 3, postfix=postfix)
            assert name == abbr_mapping[registered_name] + str(postfix)
            assert isinstance(layer, norm_cls)
            if registered_name == 'GN':
                assert layer.num_channels == 3
                assert layer.num_groups == cfg['num_groups']
            elif registered_name != 'LN':
                assert layer.num_features == 3
def test_build_activation_layer():
    """Every registered activation is buildable; Clamp/Clip clamp outputs."""
    # collect all activations exposed by mmcv's activation modules
    act_names = [
        'ReLU', 'LeakyReLU', 'PReLU', 'RReLU', 'ReLU6', 'ELU', 'Sigmoid',
        'Tanh'
    ]
    for module_name in ('activation', 'hsigmoid', 'hswish', 'swish'):
        act_module = import_module(f'mmcv.cnn.bricks.{module_name}')
        act_names += [
            attr for attr, obj in act_module.__dict__.items()
            if isinstance(obj, type) and issubclass(obj, nn.Module)
        ]

    with pytest.raises(TypeError):
        build_activation_layer('ReLU')  # cfg must be a dict
    with pytest.raises(KeyError):
        build_activation_layer(dict())  # `type` missing from cfg
    with pytest.raises(KeyError):
        build_activation_layer(dict(type='FancyReLU'))  # unregistered type

    # each registered activation builds an instance of its own class
    for type_name, act_cls in MODELS.module_dict.items():
        if type_name in act_names:
            layer = build_activation_layer(dict(type=type_name))
            assert isinstance(layer, act_cls)

    # sanity check for Clamp / Clip: outputs are confined to the given range
    x = torch.randn(10) * 1000
    y = build_activation_layer(dict(type='Clamp'))(x)
    assert np.logical_and((y >= -1).numpy(), (y <= 1).numpy()).all()
    y = build_activation_layer(dict(type='Clip', min=0))(x)
    assert np.logical_and((y >= 0).numpy(), (y <= 1).numpy()).all()
    y = build_activation_layer(dict(type='Clamp', max=0))(x)
    assert np.logical_and((y >= -1).numpy(), (y <= 0).numpy()).all()


def test_build_padding_layer():
    """Every registered padding layer is buildable and actually pads."""
    pad_names = ['zero', 'reflect', 'replicate']
    pad_module = import_module('mmcv.cnn.bricks.padding')
    pad_names += [
        attr for attr, obj in pad_module.__dict__.items()
        if isinstance(obj, type) and issubclass(obj, nn.Module)
    ]

    with pytest.raises(TypeError):
        build_padding_layer('reflect')  # cfg must be a dict
    with pytest.raises(KeyError):
        build_padding_layer(dict())  # `type` missing from cfg
    with pytest.raises(KeyError):
        build_padding_layer(dict(type='FancyPad'))  # unregistered type

    for type_name, pad_cls in MODELS.module_dict.items():
        if type_name in pad_names:
            layer = build_padding_layer(dict(type=type_name), 2)
            assert isinstance(layer, pad_cls)

    # reflection padding of 2 grows a 5x5 input to 9x9
    input_x = torch.randn(1, 2, 5, 5)
    padding_layer = build_padding_layer(dict(type='reflect'), 2)
    assert padding_layer(input_x).shape == (1, 2, 9, 9)


def test_upsample_layer():
    """Upsample cfgs accept kwargs in cfg, as keywords, or positionally."""
    with pytest.raises(TypeError):
        build_upsample_layer('bilinear')  # cfg must be a dict
    with pytest.raises(KeyError):
        build_upsample_layer(dict())  # `type` missing from cfg
    with pytest.raises(KeyError):
        build_upsample_layer(dict(type='FancyUpsample'))  # unregistered type

    for mode in ('nearest', 'bilinear'):
        layer = build_upsample_layer(dict(type=mode))
        assert isinstance(layer, nn.Upsample)
        assert layer.mode == mode

    deconv_kwargs = dict(
        in_channels=3, out_channels=3, kernel_size=3, stride=2)

    def check_deconv(layer):
        assert isinstance(layer, nn.ConvTranspose2d)
        assert layer.in_channels == deconv_kwargs['in_channels']
        assert layer.out_channels == deconv_kwargs['out_channels']
        assert layer.kernel_size == (3, 3)
        assert layer.stride == (2, 2)

    # kwargs supplied inside the cfg dict
    layer = build_upsample_layer(dict(type='deconv', **deconv_kwargs))
    assert isinstance(layer, nn.ConvTranspose2d)
    # ... supplied as keyword arguments
    check_deconv(build_upsample_layer(dict(type='deconv'), **deconv_kwargs))
    # ... supplied positionally
    check_deconv(build_upsample_layer(dict(type='deconv'), 3, 3, 3, 2))

    layer = build_upsample_layer(
        dict(
            type='pixel_shuffle',
            in_channels=3,
            out_channels=3,
            scale_factor=2,
            upsample_kernel=3))
    assert isinstance(layer, PixelShufflePack)
    assert layer.scale_factor == 2
    assert layer.upsample_kernel == 3
def test_pixel_shuffle_pack():
    """PixelShufflePack upsamples by scale_factor via sub-pixel conv."""
    pixel_shuffle = PixelShufflePack(3, 3, scale_factor=2, upsample_kernel=3)
    assert pixel_shuffle.upsample_conv.kernel_size == (3, 3)
    # 2x upsampling doubles the spatial size
    out = pixel_shuffle(torch.rand(2, 3, 10, 10))
    assert out.shape == (2, 3, 20, 20)


def test_is_norm():
    """is_norm recognises norm layers and honours `exclude`."""
    single_arg_norms = [
        nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d, nn.InstanceNorm1d,
        nn.InstanceNorm2d, nn.InstanceNorm3d, nn.LayerNorm
    ]
    for norm_cls in single_arg_norms:
        layer = norm_cls(3)
        assert is_norm(layer)
        assert not is_norm(layer, exclude=(norm_cls, ))

    layer = nn.GroupNorm(3, 6)
    assert is_norm(layer)
    assert not is_norm(layer, exclude=(nn.GroupNorm, ))

    class MyNorm(nn.BatchNorm2d):
        pass

    # subclasses of an excluded base are excluded as well
    layer = MyNorm(3)
    assert is_norm(layer)
    assert not is_norm(layer, exclude=_BatchNorm)
    assert not is_norm(layer, exclude=(_BatchNorm, ))

    # non-norm modules are rejected
    assert not is_norm(nn.Conv2d(3, 8, 1))

    # `exclude` must be a type or a tuple of types
    with pytest.raises(TypeError):
        is_norm(nn.BatchNorm1d(3), exclude='BN')
    with pytest.raises(TypeError):
        is_norm(nn.BatchNorm1d(3), exclude=('BN', ))


def test_infer_plugin_abbr():
    """Plugin infer_abbr honours _abbr_, else snake-cases the class name."""
    with pytest.raises(TypeError):
        infer_plugin_abbr(0)  # class_type must be a class

    class MyPlugin:

        _abbr_ = 'mp'

    # an explicit _abbr_ attribute always wins
    assert infer_plugin_abbr(MyPlugin) == 'mp'

    class FancyPlugin:
        pass

    # otherwise the snake_case class name is used
    assert infer_plugin_abbr(FancyPlugin) == 'fancy_plugin'


def test_build_plugin_layer():
    """build_plugin_layer returns (abbr + postfix, layer) for each plugin."""
    with pytest.raises(TypeError):
        build_plugin_layer('Plugin')  # cfg must be a dict
    with pytest.raises(KeyError):
        build_plugin_layer(dict())  # `type` missing from cfg
    with pytest.raises(KeyError):
        build_plugin_layer(dict(type='FancyPlugin'))  # unregistered type
    with pytest.raises(AssertionError):
        # postfix must be int or str
        build_plugin_layer(dict(type='ConvModule'), postfix=[1, 2])

    for postfix in ('', '_test', 1):
        # ContextBlock
        name, layer = build_plugin_layer(
            dict(type='ContextBlock'),
            postfix=postfix,
            in_channels=16,
            ratio=1. / 4)
        assert name == 'context_block' + str(postfix)
        assert isinstance(layer, MODELS.module_dict['ContextBlock'])

        # GeneralizedAttention
        name, layer = build_plugin_layer(
            dict(type='GeneralizedAttention'), postfix=postfix, in_channels=16)
        assert name == 'gen_attention_block' + str(postfix)
        assert isinstance(layer, MODELS.module_dict['GeneralizedAttention'])

        # NonLocal2d
        name, layer = build_plugin_layer(
            dict(type='NonLocal2d'), postfix=postfix, in_channels=16)
        assert name == 'nonlocal_block' + str(postfix)
        assert isinstance(layer, MODELS.module_dict['NonLocal2d'])

        # ConvModule
        name, layer = build_plugin_layer(
            dict(type='ConvModule'),
            postfix=postfix,
            in_channels=16,
            out_channels=4,
            kernel_size=3)
        assert name == 'conv_block' + str(postfix)
        assert isinstance(layer, MODELS.module_dict['ConvModule'])
# Copyright (c) OpenMMLab. All rights reserved.
import pytest
import torch

from mmcv.cnn.bricks import ContextBlock


def test_context_block():
    """ContextBlock validates its config and preserves the input shape."""
    # invalid configurations are rejected up front
    with pytest.raises(AssertionError):
        ContextBlock(16, 1. / 4, pooling_type='unsupport_type')
    with pytest.raises(AssertionError):
        ContextBlock(16, 1. / 4, fusion_types='unsupport_type')
    with pytest.raises(AssertionError):
        ContextBlock(16, 1. / 4, fusion_types=('unsupport_type', ))

    imgs = torch.randn(2, 16, 20, 20)

    # attention pooling builds a 16 -> 1 mask conv
    block = ContextBlock(16, 1. / 4, pooling_type='att')
    assert block.conv_mask.in_channels == 16
    assert block.conv_mask.out_channels == 1
    assert block(imgs).shape == imgs.shape

    # average pooling path
    block = ContextBlock(16, 1. / 4, pooling_type='avg')
    assert hasattr(block, 'avg_pool')
    assert block(imgs).shape == imgs.shape

    # each requested fusion type creates only its own conv branch
    for fusion_types in (('channel_add', ), ('channel_mul', ),
                         ('channel_add', 'channel_mul')):
        block = ContextBlock(16, 1. / 4, fusion_types=fusion_types)
        assert (block.channel_add_conv
                is not None) == ('channel_add' in fusion_types)
        assert (block.channel_mul_conv
                is not None) == ('channel_mul' in fusion_types)
        assert block(imgs).shape == imgs.shape
# Copyright (c) OpenMMLab. All rights reserved.
import torch

from mmcv.cnn.bricks import Conv2dAdaptivePadding


def test_conv2d_samepadding():
    """Conv2dAdaptivePadding mimics 'same' padding for odd and even sizes."""
    # stride=1 keeps the spatial size, whatever the input parity
    for size in (28, 13):
        inputs = torch.rand((1, 3, size, size))
        conv = Conv2dAdaptivePadding(3, 3, kernel_size=3, stride=1)
        assert conv(inputs).shape == inputs.shape

    # stride=2 halves the spatial size, rounding up
    for size, expected in ((28, 14), (13, 7)):
        inputs = torch.rand((1, 3, size, size))
        conv = Conv2dAdaptivePadding(3, 3, kernel_size=3, stride=2)
        assert conv(inputs).shape == torch.Size([1, 3, expected, expected])
# Copyright (c) OpenMMLab. All rights reserved.
import warnings
from unittest.mock import patch

import pytest
import torch
import torch.nn as nn
from mmengine.registry import MODELS
from mmengine.utils import digit_version
from mmengine.utils.dl_utils import TORCH_VERSION

from mmcv.cnn.bricks import ConvModule, HSigmoid, HSwish


@MODELS.register_module()
class ExampleConv(nn.Module):
    """A conv-like module with its own zero-initialising ``init_weights``.

    Used to verify that ConvModule keeps a custom conv's initialisation.
    """

    def __init__(self,
                 in_channels,
                 out_channels,
                 kernel_size,
                 stride=1,
                 padding=0,
                 dilation=1,
                 groups=1,
                 bias=True,
                 norm_cfg=None):
        super().__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        self.dilation = dilation
        self.groups = groups
        self.bias = bias
        self.norm_cfg = norm_cfg
        self.output_padding = (0, 0, 0)
        self.transposed = False

        self.conv0 = nn.Conv2d(in_channels, out_channels, kernel_size)
        self.init_weights()

    def forward(self, x):
        return self.conv0(x)

    def init_weights(self):
        # deliberately zero everything so the effect is observable
        nn.init.constant_(self.conv0.weight, 0)


def test_conv_module():
    """ConvModule wires conv/norm/act together and validates its cfgs."""
    with pytest.raises(AssertionError):
        ConvModule(3, 8, 2, conv_cfg='conv')  # conv_cfg must be dict or None
    with pytest.raises(AssertionError):
        ConvModule(3, 8, 2, norm_cfg='norm')  # norm_cfg must be dict or None
    with pytest.raises(KeyError):
        ConvModule(3, 8, 2, act_cfg=dict(type='softmax'))  # unsupported act

    x = torch.rand(1, 3, 256, 256)

    # conv + norm + act
    conv = ConvModule(3, 8, 2, norm_cfg=dict(type='BN'))
    assert conv.with_activation and hasattr(conv, 'activate')
    assert conv.with_norm and hasattr(conv, 'norm')
    assert conv(x).shape == (1, 8, 255, 255)

    # conv + act
    conv = ConvModule(3, 8, 2)
    assert conv.with_activation and hasattr(conv, 'activate')
    assert not conv.with_norm and conv.norm is None
    assert conv(x).shape == (1, 8, 255, 255)

    # conv only
    conv = ConvModule(3, 8, 2, act_cfg=None)
    assert not conv.with_norm and conv.norm is None
    assert not conv.with_activation and not hasattr(conv, 'activate')
    assert conv(x).shape == (1, 8, 255, 255)

    # a conv with its own `init_weights` keeps its initialisation
    conv_module = ConvModule(
        3, 8, 2, conv_cfg=dict(type='ExampleConv'), act_cfg=None)
    assert torch.equal(conv_module.conv.conv0.weight, torch.zeros(8, 3, 2, 2))

    # with_spectral_norm=True wraps the conv weight
    conv = ConvModule(3, 8, 3, padding=1, with_spectral_norm=True)
    assert hasattr(conv.conv, 'weight_orig')
    assert conv(x).shape == (1, 8, 256, 256)

    # an explicit, non-default padding mode builds a padding layer
    conv = ConvModule(3, 8, 3, padding=1, padding_mode='reflect')
    assert isinstance(conv.padding_layer, nn.ReflectionPad2d)
    assert conv(x).shape == (1, 8, 256, 256)

    with pytest.raises(KeyError):
        ConvModule(3, 8, 3, padding=1, padding_mode='non_exists')

    # common activation cfgs map to the expected layer classes
    for act_type, act_cls in (('LeakyReLU', nn.LeakyReLU), ('Tanh', nn.Tanh),
                              ('Sigmoid', nn.Sigmoid), ('PReLU', nn.PReLU)):
        conv = ConvModule(3, 8, 3, padding=1, act_cfg=dict(type=act_type))
        assert isinstance(conv.activate, act_cls)
        assert conv(x).shape == (1, 8, 256, 256)

    # HSwish falls back to mmcv's implementation on torch < 1.7 / parrots
    conv = ConvModule(3, 8, 3, padding=1, act_cfg=dict(type='HSwish'))
    if (TORCH_VERSION == 'parrots'
            or digit_version(TORCH_VERSION) < digit_version('1.7')):
        assert isinstance(conv.activate, HSwish)
    else:
        assert isinstance(conv.activate, nn.Hardswish)
    assert conv(x).shape == (1, 8, 256, 256)

    conv = ConvModule(3, 8, 3, padding=1, act_cfg=dict(type='HSigmoid'))
    assert isinstance(conv.activate, HSigmoid)
    assert conv(x).shape == (1, 8, 256, 256)
def test_bias():
    """bias='auto' adds a conv bias only when no norm follows."""
    conv = ConvModule(3, 8, 2)
    assert conv.conv.bias is not None

    conv = ConvModule(3, 8, 2, norm_cfg=dict(type='BN'))
    assert conv.conv.bias is None

    conv = ConvModule(3, 8, 2, bias=False)
    assert conv.conv.bias is None

    # forcing bias=True before a batch/instance norm emits a warning
    for norm_cfg in (dict(type='BN'), dict(type='IN')):
        with pytest.warns(UserWarning) as record:
            ConvModule(3, 8, 2, bias=True, norm_cfg=norm_cfg)
        assert len(record) == 1
        assert record[0].message.args[
            0] == 'Unnecessary conv bias before batch/instance norm'

    # other norms (e.g. GN) do not trigger the warning; the sentinel warning
    # proves nothing else was emitted inside the context
    with pytest.warns(UserWarning) as record:
        ConvModule(3, 8, 2, bias=True, norm_cfg=dict(type='GN', num_groups=1))
        warnings.warn('No warnings')
    assert len(record) == 1
    assert record[0].message.args[0] == 'No warnings'


def conv_forward(self, x):
    # fake Conv2d.forward used to trace the execution order
    return x + '_conv'


def bn_forward(self, x):
    # fake BatchNorm2d.forward used to trace the execution order
    return x + '_bn'


def relu_forward(self, x):
    # fake ReLU.forward used to trace the execution order
    return x + '_relu'


@patch('torch.nn.ReLU.forward', relu_forward)
@patch('torch.nn.BatchNorm2d.forward', bn_forward)
@patch('torch.nn.Conv2d.forward', conv_forward)
def test_order():
    """`order` must be a 3-tuple permuting ('conv', 'norm', 'act')."""
    for bad_order in (['conv', 'norm', 'act'], ('conv', 'norm'),
                      ('conv', 'norm', 'norm'), ('conv', 'norm', 'something')):
        with pytest.raises(AssertionError):
            ConvModule(3, 8, 2, order=bad_order)

    # default order: conv -> norm -> act
    conv = ConvModule(3, 8, 2, norm_cfg=dict(type='BN'))
    assert conv('input') == 'input_conv_bn_relu'

    # custom order: norm -> conv -> act
    conv = ConvModule(
        3, 8, 2, norm_cfg=dict(type='BN'), order=('norm', 'conv', 'act'))
    assert conv('input') == 'input_bn_conv_relu'

    # activation and norm can each be disabled per forward call
    conv = ConvModule(3, 8, 2, norm_cfg=dict(type='BN'))
    assert conv('input', activate=False) == 'input_conv_bn'

    conv = ConvModule(3, 8, 2, norm_cfg=dict(type='BN'))
    assert conv('input', norm=False) == 'input_conv_relu'
# Copyright (c) OpenMMLab. All rights reserved.
import pytest
import torch
import torch.nn as nn

from mmcv.cnn.bricks import DepthwiseSeparableConvModule


def test_depthwise_separable_conv():
    """DepthwiseSeparableConvModule wires a depthwise + pointwise pair."""
    # `groups` is managed internally (depthwise => groups == in_channels),
    # so passing it explicitly must be rejected.
    # (Fixed: the previous comment here, "conv_cfg must be a dict or None",
    # was copy-pasted from another test and described a different check.)
    with pytest.raises(AssertionError):
        DepthwiseSeparableConvModule(4, 8, 2, groups=2)

    # default config: depthwise groups == in_channels, 1x1 pointwise conv,
    # no norm, ReLU on both halves
    conv = DepthwiseSeparableConvModule(3, 8, 2)
    assert conv.depthwise_conv.conv.groups == 3
    assert conv.pointwise_conv.conv.kernel_size == (1, 1)
    assert not conv.depthwise_conv.with_norm
    assert not conv.pointwise_conv.with_norm
    assert conv.depthwise_conv.activate.__class__.__name__ == 'ReLU'
    assert conv.pointwise_conv.activate.__class__.__name__ == 'ReLU'
    x = torch.rand(1, 3, 256, 256)
    assert conv(x).shape == (1, 8, 255, 255)

    # dw_norm_cfg applies only to the depthwise half
    conv = DepthwiseSeparableConvModule(3, 8, 2, dw_norm_cfg=dict(type='BN'))
    assert conv.depthwise_conv.norm_name == 'bn'
    assert not conv.pointwise_conv.with_norm
    assert conv(x).shape == (1, 8, 255, 255)

    # pw_norm_cfg applies only to the pointwise half
    conv = DepthwiseSeparableConvModule(3, 8, 2, pw_norm_cfg=dict(type='BN'))
    assert not conv.depthwise_conv.with_norm
    assert conv.pointwise_conv.norm_name == 'bn'
    assert conv(x).shape == (1, 8, 255, 255)

    # norm_cfg applies to both halves
    conv = DepthwiseSeparableConvModule(3, 8, 2, norm_cfg=dict(type='BN'))
    assert conv.depthwise_conv.norm_name == 'bn'
    assert conv.pointwise_conv.norm_name == 'bn'
    assert conv(x).shape == (1, 8, 255, 255)

    # a non-default layer order is forwarded to both ConvModules
    conv = DepthwiseSeparableConvModule(3, 8, 2, order=('norm', 'conv', 'act'))
    assert conv(x).shape == (1, 8, 255, 255)

    # spectral norm wraps both conv weights
    conv = DepthwiseSeparableConvModule(
        3, 8, 3, padding=1, with_spectral_norm=True)
    assert hasattr(conv.depthwise_conv.conv, 'weight_orig')
    assert hasattr(conv.pointwise_conv.conv, 'weight_orig')
    assert conv(x).shape == (1, 8, 256, 256)

    # explicit padding mode builds a padding layer on the depthwise half
    conv = DepthwiseSeparableConvModule(
        3, 8, 3, padding=1, padding_mode='reflect')
    assert isinstance(conv.depthwise_conv.padding_layer, nn.ReflectionPad2d)
    assert conv(x).shape == (1, 8, 256, 256)

    # dw_act_cfg / pw_act_cfg / act_cfg control each half's activation
    for kwargs, dw_act, pw_act in (
            (dict(dw_act_cfg=dict(type='LeakyReLU')), 'LeakyReLU', 'ReLU'),
            (dict(pw_act_cfg=dict(type='LeakyReLU')), 'ReLU', 'LeakyReLU'),
            (dict(act_cfg=dict(type='LeakyReLU')), 'LeakyReLU', 'LeakyReLU')):
        conv = DepthwiseSeparableConvModule(3, 8, 3, padding=1, **kwargs)
        assert conv.depthwise_conv.activate.__class__.__name__ == dw_act
        assert conv.pointwise_conv.activate.__class__.__name__ == pw_act
        assert conv(x).shape == (1, 8, 256, 256)
# Copyright (c) OpenMMLab. All rights reserved.
# Fixed: the Py2 `from StringIO import StringIO` try/except fallback was
# dead code in a Python-3-only codebase; import from `io` directly.
from io import StringIO

import pytest
import torch
import torch.nn as nn

from mmcv.cnn import get_model_complexity_info
from mmcv.cnn.utils.flops_counter import flops_to_string, params_to_string

# Ground-truth (model, input shape, expected flops/params) table covering
# every layer type the flops counter supports.
# yapf: disable
gt_results = [
    {'model': nn.Conv1d(3, 8, 3), 'input': (3, 16), 'flops': 1120.0, 'params': 80.0},  # noqa: E501
    {'model': nn.Conv2d(3, 8, 3), 'input': (3, 16, 16), 'flops': 43904.0, 'params': 224.0},  # noqa: E501
    {'model': nn.Conv3d(3, 8, 3), 'input': (3, 3, 16, 16), 'flops': 128576.0, 'params': 656.0},  # noqa: E501
    {'model': nn.ReLU(), 'input': (3, 16, 16), 'flops': 768.0, 'params': 0},  # noqa: E501
    {'model': nn.PReLU(), 'input': (3, 16, 16), 'flops': 768.0, 'params': 1},  # noqa: E501
    {'model': nn.ELU(), 'input': (3, 16, 16), 'flops': 768.0, 'params': 0},  # noqa: E501
    {'model': nn.LeakyReLU(), 'input': (3, 16, 16), 'flops': 768.0, 'params': 0},  # noqa: E501
    {'model': nn.ReLU6(), 'input': (3, 16, 16), 'flops': 768.0, 'params': 0},  # noqa: E501
    {'model': nn.MaxPool1d(2), 'input': (3, 16), 'flops': 48.0, 'params': 0},  # noqa: E501
    {'model': nn.MaxPool2d(2), 'input': (3, 16, 16), 'flops': 768.0, 'params': 0},  # noqa: E501
    {'model': nn.MaxPool3d(2), 'input': (3, 3, 16, 16), 'flops': 2304.0, 'params': 0},  # noqa: E501
    {'model': nn.AvgPool1d(2), 'input': (3, 16), 'flops': 48.0, 'params': 0},  # noqa: E501
    {'model': nn.AvgPool2d(2), 'input': (3, 16, 16), 'flops': 768.0, 'params': 0},  # noqa: E501
    {'model': nn.AvgPool3d(2), 'input': (3, 3, 16, 16), 'flops': 2304.0, 'params': 0},  # noqa: E501
    {'model': nn.AdaptiveMaxPool1d(2), 'input': (3, 16), 'flops': 48.0, 'params': 0},  # noqa: E501
    {'model': nn.AdaptiveMaxPool2d(2), 'input': (3, 16, 16), 'flops': 768.0, 'params': 0},  # noqa: E501
    {'model': nn.AdaptiveMaxPool3d(2), 'input': (3, 3, 16, 16), 'flops': 2304.0, 'params': 0},  # noqa: E501
    {'model': nn.AdaptiveAvgPool1d(2), 'input': (3, 16), 'flops': 48.0, 'params': 0},  # noqa: E501
    {'model': nn.AdaptiveAvgPool2d(2), 'input': (3, 16, 16), 'flops': 768.0, 'params': 0},  # noqa: E501
    {'model': nn.AdaptiveAvgPool3d(2), 'input': (3, 3, 16, 16), 'flops': 2304.0, 'params': 0},  # noqa: E501
    {'model': nn.BatchNorm1d(3), 'input': (3, 16), 'flops': 96.0, 'params': 6.0},  # noqa: E501
    {'model': nn.BatchNorm2d(3), 'input': (3, 16, 16), 'flops': 1536.0, 'params': 6.0},  # noqa: E501
    {'model': nn.BatchNorm3d(3), 'input': (3, 3, 16, 16), 'flops': 4608.0, 'params': 6.0},  # noqa: E501
    {'model': nn.GroupNorm(2, 6), 'input': (6, 16, 16), 'flops': 3072.0, 'params': 12.0},  # noqa: E501
    {'model': nn.InstanceNorm1d(3, affine=True), 'input': (3, 16), 'flops': 96.0, 'params': 6.0},  # noqa: E501
    {'model': nn.InstanceNorm2d(3, affine=True), 'input': (3, 16, 16), 'flops': 1536.0, 'params': 6.0},  # noqa: E501
    {'model': nn.InstanceNorm3d(3, affine=True), 'input': (3, 3, 16, 16), 'flops': 4608.0, 'params': 6.0},  # noqa: E501
    {'model': nn.LayerNorm((3, 16, 16)), 'input': (3, 16, 16), 'flops': 1536.0, 'params': 1536.0},  # noqa: E501
    {'model': nn.LayerNorm((3, 16, 16), elementwise_affine=False), 'input': (3, 16, 16), 'flops': 768.0, 'params': 0},  # noqa: E501
    {'model': nn.Linear(1024, 2), 'input': (1024, ), 'flops': 2048.0, 'params': 2050.0},  # noqa: E501
    {'model': nn.ConvTranspose2d(3, 8, 3), 'input': (3, 16, 16), 'flops': 57888, 'params': 224.0},  # noqa: E501
    {'model': nn.Upsample((32, 32)), 'input': (3, 16, 16), 'flops': 3072.0, 'params': 0}  # noqa: E501
]
# yapf: enable


class ExampleModel(nn.Module):
    """Model whose forward takes a shape tuple and builds its own input.

    Used to exercise get_model_complexity_info's `input_constructor` path.
    """

    def __init__(self):
        super().__init__()
        self.conv2d = nn.Conv2d(3, 8, 3)

    def forward(self, imgs):
        # `imgs` is a shape tuple, not a tensor
        x = torch.randn((1, *imgs))
        return self.conv2d(x)


def input_constructor(x):
    """Wrap the input shape in the kwargs dict ExampleModel expects."""
    return dict(imgs=x)


def test_flops_counter():
    """get_model_complexity_info matches the ground-truth table exactly."""
    with pytest.raises(AssertionError):
        # input_res must be a tuple
        get_model_complexity_info(nn.Conv2d(3, 8, 3), [1, 3, 16, 16])
    with pytest.raises(AssertionError):
        # input_res must have at least two entries
        get_model_complexity_info(nn.Conv2d(3, 8, 3), tuple())

    # every entry of the ground-truth table must be reproduced exactly
    for item in gt_results:
        flops, params = get_model_complexity_info(
            item['model'],
            item['input'],
            as_strings=False,
            print_per_layer_stat=False)
        assert flops == item['flops'] and params == item['params']

    # input_constructor path
    flops, params = get_model_complexity_info(
        ExampleModel(), (3, 16, 16),
        as_strings=False,
        print_per_layer_stat=False,
        input_constructor=input_constructor)
    assert flops == 43904.0 and params == 224.0

    # as_strings=True (the default) renders human-readable units
    flops, params = get_model_complexity_info(
        nn.Conv3d(3, 8, 3), (3, 3, 512, 512), print_per_layer_stat=False)
    assert flops == '0.17 GFLOPs' and params == str(656)

    # per-layer statistics are printed to the given output stream
    out = StringIO()
    get_model_complexity_info(nn.Conv1d(3, 8, 3), (3, 16), ost=out)
    assert out.getvalue() == \
        'Conv1d(0.0 M, 100.000% Params, 0.0 GFLOPs, 100.000% FLOPs, 3, 8, kernel_size=(3,), stride=(1,))\n'  # noqa: E501

    # a container model that is not itself a single supported layer
    model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.Flatten(), nn.Linear(1568, 2))
    flops, params = get_model_complexity_info(
        model, (3, 16, 16), as_strings=False, print_per_layer_stat=True)
    assert flops == 47040.0 and params == 3362
def test_flops_to_string():
    """flops_to_string formats with explicit or auto-selected units."""
    flops = 6.54321 * 10.**9
    assert flops_to_string(flops) == '6.54 GFLOPs'
    assert flops_to_string(flops, 'MFLOPs') == '6543.21 MFLOPs'
    assert flops_to_string(flops, 'KFLOPs') == '6543210.0 KFLOPs'
    assert flops_to_string(flops, 'FLOPs') == '6543210000.0 FLOPs'
    assert flops_to_string(flops, precision=4) == '6.5432 GFLOPs'

    # units=None picks the largest sensible unit automatically
    for value, expected in ((6.54321 * 10.**9, '6.54 GFLOPs'),
                            (3.21 * 10.**7, '32.1 MFLOPs'),
                            (5.4 * 10.**3, '5.4 KFLOPs'), (987, '987 FLOPs')):
        assert flops_to_string(value, None) == expected


def test_params_to_string():
    """params_to_string formats with explicit or auto-selected units."""
    # automatic unit selection
    assert params_to_string(3.21 * 10.**7) == '32.1 M'
    assert params_to_string(4.56 * 10.**5) == '456.0 k'
    assert params_to_string(7.89 * 10.**2) == '789.0'

    # explicit units and precision
    num_params = 6.54321 * 10.**7
    assert params_to_string(num_params, 'M') == '65.43 M'
    assert params_to_string(num_params, 'K') == '65432.1 K'
    assert params_to_string(num_params, '') == '65432100.0'
    assert params_to_string(num_params, precision=4) == '65.4321 M'


# Copyright (c) OpenMMLab. All rights reserved.
import torch
import torch.nn as nn

from mmcv.cnn import ConvModule, fuse_conv_bn


def test_fuse_conv_bn():
    """Fusing conv+bn must leave the forward result unchanged."""
    inputs = torch.rand((1, 3, 5, 5))
    # a top-level BN plus BNs hidden inside ConvModules
    model = nn.Sequential(
        nn.BatchNorm2d(3),
        ConvModule(3, 5, 3, norm_cfg=dict(type='BN')),
        ConvModule(5, 5, 3, norm_cfg=dict(type='BN')),
    )
    fused_model = fuse_conv_bn(model)
    assert torch.equal(model(inputs), fused_model(inputs))
All rights reserved. +import torch + +from mmcv.cnn.bricks import GeneralizedAttention + + +def test_context_block(): + + # test attention_type='1000' + imgs = torch.randn(2, 16, 20, 20) + gen_attention_block = GeneralizedAttention(16, attention_type='1000') + assert gen_attention_block.query_conv.in_channels == 16 + assert gen_attention_block.key_conv.in_channels == 16 + assert gen_attention_block.key_conv.in_channels == 16 + out = gen_attention_block(imgs) + assert out.shape == imgs.shape + + # test attention_type='0100' + imgs = torch.randn(2, 16, 20, 20) + gen_attention_block = GeneralizedAttention(16, attention_type='0100') + assert gen_attention_block.query_conv.in_channels == 16 + assert gen_attention_block.appr_geom_fc_x.in_features == 8 + assert gen_attention_block.appr_geom_fc_y.in_features == 8 + out = gen_attention_block(imgs) + assert out.shape == imgs.shape + + # test attention_type='0010' + imgs = torch.randn(2, 16, 20, 20) + gen_attention_block = GeneralizedAttention(16, attention_type='0010') + assert gen_attention_block.key_conv.in_channels == 16 + assert hasattr(gen_attention_block, 'appr_bias') + out = gen_attention_block(imgs) + assert out.shape == imgs.shape + + # test attention_type='0001' + imgs = torch.randn(2, 16, 20, 20) + gen_attention_block = GeneralizedAttention(16, attention_type='0001') + assert gen_attention_block.appr_geom_fc_x.in_features == 8 + assert gen_attention_block.appr_geom_fc_y.in_features == 8 + assert hasattr(gen_attention_block, 'geom_bias') + out = gen_attention_block(imgs) + assert out.shape == imgs.shape + + # test spatial_range >= 0 + imgs = torch.randn(2, 256, 20, 20) + gen_attention_block = GeneralizedAttention(256, spatial_range=10) + assert hasattr(gen_attention_block, 'local_constraint_map') + out = gen_attention_block(imgs) + assert out.shape == imgs.shape + + # test q_stride > 1 + imgs = torch.randn(2, 16, 20, 20) + gen_attention_block = GeneralizedAttention(16, q_stride=2) + assert 
gen_attention_block.q_downsample is not None + out = gen_attention_block(imgs) + assert out.shape == imgs.shape + + # test kv_stride > 1 + imgs = torch.randn(2, 16, 20, 20) + gen_attention_block = GeneralizedAttention(16, kv_stride=2) + assert gen_attention_block.kv_downsample is not None + out = gen_attention_block(imgs) + assert out.shape == imgs.shape + + # test fp16 with attention_type='1111' + if torch.cuda.is_available(): + imgs = torch.randn(2, 16, 20, 20).cuda().to(torch.half) + gen_attention_block = GeneralizedAttention( + 16, + spatial_range=-1, + num_heads=8, + attention_type='1111', + kv_stride=2) + gen_attention_block.cuda().type(torch.half) + out = gen_attention_block(imgs) + assert out.shape == imgs.shape diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_cnn/test_hsigmoid.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_cnn/test_hsigmoid.py new file mode 100644 index 0000000000000000000000000000000000000000..43e9f624a2ccf369d844a9e8ec7238158b364187 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_cnn/test_hsigmoid.py @@ -0,0 +1,37 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import pytest +import torch + +from mmcv.cnn.bricks import HSigmoid + + +def test_hsigmoid(): + # test assertion divisor can not be zero + with pytest.raises(AssertionError): + HSigmoid(divisor=0) + + # test with default parameters + act = HSigmoid() + input_shape = torch.Size([1, 3, 64, 64]) + input = torch.randn(input_shape) + output = act(input) + expected_output = torch.min( + torch.max((input + 3) / 6, torch.zeros(input_shape)), + torch.ones(input_shape)) + # test output shape + assert output.shape == expected_output.shape + # test output value + assert torch.equal(output, expected_output) + + # test with designated parameters + act = HSigmoid(1, 2, 0, 1) + input_shape = torch.Size([1, 3, 64, 64]) + input = torch.randn(input_shape) + output = act(input) + expected_output = torch.min( + torch.max((input + 1) / 2, torch.zeros(input_shape)), + torch.ones(input_shape)) + # test output shape + assert output.shape == expected_output.shape + # test output value + assert torch.equal(output, expected_output) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_cnn/test_hswish.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_cnn/test_hswish.py new file mode 100644 index 0000000000000000000000000000000000000000..5cd1bcf31221b14fec0b60537b869f4ebe12f26a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_cnn/test_hswish.py @@ -0,0 +1,21 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import torch +from torch.nn.functional import relu6 + +from mmcv.cnn.bricks import HSwish + + +def test_hswish(): + # test inplace + act = HSwish(inplace=True) + assert act.act.inplace + act = HSwish() + assert not act.act.inplace + + input = torch.randn(1, 3, 64, 64) + expected_output = input * relu6(input + 3) / 6 + output = act(input) + # test output shape + assert output.shape == expected_output.shape + # test output value + assert torch.equal(output, expected_output) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_cnn/test_non_local.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_cnn/test_non_local.py new file mode 100644 index 0000000000000000000000000000000000000000..25d78833912a195532eb946a8939d1ea986043a5 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_cnn/test_non_local.py @@ -0,0 +1,220 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import pytest +import torch +import torch.nn as nn + +from mmcv.cnn import NonLocal1d, NonLocal2d, NonLocal3d +from mmcv.cnn.bricks.non_local import _NonLocalNd + + +def test_nonlocal(): + with pytest.raises(ValueError): + # mode should be in ['embedded_gaussian', 'dot_product'] + _NonLocalNd(3, mode='unsupport_mode') + + # _NonLocalNd with zero initialization + _NonLocalNd(3) + _NonLocalNd(3, norm_cfg=dict(type='BN')) + + # _NonLocalNd without zero initialization + _NonLocalNd(3, zeros_init=False) + _NonLocalNd(3, norm_cfg=dict(type='BN'), zeros_init=False) + + +def test_nonlocal3d(): + # NonLocal3d with 'embedded_gaussian' mode + imgs = torch.randn(2, 3, 10, 20, 20) + nonlocal_3d = NonLocal3d(3) + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + # NonLocal is only implemented on gpu in parrots + imgs = imgs.cuda() + nonlocal_3d.cuda() + out = nonlocal_3d(imgs) + assert out.shape == imgs.shape + + # NonLocal3d with 'dot_product' mode + nonlocal_3d = NonLocal3d(3, mode='dot_product') + assert nonlocal_3d.mode == 'dot_product' + if torch.__version__ == 'parrots': + if 
torch.cuda.is_available(): + nonlocal_3d.cuda() + out = nonlocal_3d(imgs) + assert out.shape == imgs.shape + + # NonLocal3d with 'concatenation' mode + nonlocal_3d = NonLocal3d(3, mode='concatenation') + assert nonlocal_3d.mode == 'concatenation' + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + nonlocal_3d.cuda() + out = nonlocal_3d(imgs) + assert out.shape == imgs.shape + + # NonLocal3d with 'gaussian' mode + nonlocal_3d = NonLocal3d(3, mode='gaussian') + assert not hasattr(nonlocal_3d, 'phi') + assert nonlocal_3d.mode == 'gaussian' + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + nonlocal_3d.cuda() + out = nonlocal_3d(imgs) + assert out.shape == imgs.shape + + # NonLocal3d with 'gaussian' mode and sub_sample + nonlocal_3d = NonLocal3d(3, mode='gaussian', sub_sample=True) + assert isinstance(nonlocal_3d.g, nn.Sequential) and len(nonlocal_3d.g) == 2 + assert isinstance(nonlocal_3d.g[1], nn.MaxPool3d) + assert nonlocal_3d.g[1].kernel_size == (1, 2, 2) + assert isinstance(nonlocal_3d.phi, nn.MaxPool3d) + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + nonlocal_3d.cuda() + out = nonlocal_3d(imgs) + assert out.shape == imgs.shape + + # NonLocal3d with 'dot_product' mode and sub_sample + nonlocal_3d = NonLocal3d(3, mode='dot_product', sub_sample=True) + for m in [nonlocal_3d.g, nonlocal_3d.phi]: + assert isinstance(m, nn.Sequential) and len(m) == 2 + assert isinstance(m[1], nn.MaxPool3d) + assert m[1].kernel_size == (1, 2, 2) + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + nonlocal_3d.cuda() + out = nonlocal_3d(imgs) + assert out.shape == imgs.shape + + +def test_nonlocal2d(): + # NonLocal2d with 'embedded_gaussian' mode + imgs = torch.randn(2, 3, 20, 20) + nonlocal_2d = NonLocal2d(3) + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + imgs = imgs.cuda() + nonlocal_2d.cuda() + out = nonlocal_2d(imgs) + assert out.shape == imgs.shape + + # NonLocal2d with 
'dot_product' mode + imgs = torch.randn(2, 3, 20, 20) + nonlocal_2d = NonLocal2d(3, mode='dot_product') + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + imgs = imgs.cuda() + nonlocal_2d.cuda() + out = nonlocal_2d(imgs) + assert out.shape == imgs.shape + + # NonLocal2d with 'concatenation' mode + imgs = torch.randn(2, 3, 20, 20) + nonlocal_2d = NonLocal2d(3, mode='concatenation') + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + imgs = imgs.cuda() + nonlocal_2d.cuda() + out = nonlocal_2d(imgs) + assert out.shape == imgs.shape + + # NonLocal2d with 'gaussian' mode + imgs = torch.randn(2, 3, 20, 20) + nonlocal_2d = NonLocal2d(3, mode='gaussian') + assert not hasattr(nonlocal_2d, 'phi') + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + imgs = imgs.cuda() + nonlocal_2d.cuda() + out = nonlocal_2d(imgs) + assert out.shape == imgs.shape + + # NonLocal2d with 'gaussian' mode and sub_sample + nonlocal_2d = NonLocal2d(3, mode='gaussian', sub_sample=True) + assert isinstance(nonlocal_2d.g, nn.Sequential) and len(nonlocal_2d.g) == 2 + assert isinstance(nonlocal_2d.g[1], nn.MaxPool2d) + assert nonlocal_2d.g[1].kernel_size == (2, 2) + assert isinstance(nonlocal_2d.phi, nn.MaxPool2d) + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + nonlocal_2d.cuda() + out = nonlocal_2d(imgs) + assert out.shape == imgs.shape + + # NonLocal2d with 'dot_product' mode and sub_sample + nonlocal_2d = NonLocal2d(3, mode='dot_product', sub_sample=True) + for m in [nonlocal_2d.g, nonlocal_2d.phi]: + assert isinstance(m, nn.Sequential) and len(m) == 2 + assert isinstance(m[1], nn.MaxPool2d) + assert m[1].kernel_size == (2, 2) + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + nonlocal_2d.cuda() + out = nonlocal_2d(imgs) + assert out.shape == imgs.shape + + +def test_nonlocal1d(): + # NonLocal1d with 'embedded_gaussian' mode + imgs = torch.randn(2, 3, 20) + nonlocal_1d = NonLocal1d(3) + if 
torch.__version__ == 'parrots': + if torch.cuda.is_available(): + imgs = imgs.cuda() + nonlocal_1d.cuda() + out = nonlocal_1d(imgs) + assert out.shape == imgs.shape + + # NonLocal1d with 'dot_product' mode + imgs = torch.randn(2, 3, 20) + nonlocal_1d = NonLocal1d(3, mode='dot_product') + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + imgs = imgs.cuda() + nonlocal_1d.cuda() + out = nonlocal_1d(imgs) + assert out.shape == imgs.shape + + # NonLocal1d with 'concatenation' mode + imgs = torch.randn(2, 3, 20) + nonlocal_1d = NonLocal1d(3, mode='concatenation') + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + imgs = imgs.cuda() + nonlocal_1d.cuda() + out = nonlocal_1d(imgs) + assert out.shape == imgs.shape + + # NonLocal1d with 'gaussian' mode + imgs = torch.randn(2, 3, 20) + nonlocal_1d = NonLocal1d(3, mode='gaussian') + assert not hasattr(nonlocal_1d, 'phi') + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + imgs = imgs.cuda() + nonlocal_1d.cuda() + out = nonlocal_1d(imgs) + assert out.shape == imgs.shape + + # NonLocal1d with 'gaussian' mode and sub_sample + nonlocal_1d = NonLocal1d(3, mode='gaussian', sub_sample=True) + assert isinstance(nonlocal_1d.g, nn.Sequential) and len(nonlocal_1d.g) == 2 + assert isinstance(nonlocal_1d.g[1], nn.MaxPool1d) + assert nonlocal_1d.g[1].kernel_size == 2 + assert isinstance(nonlocal_1d.phi, nn.MaxPool1d) + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + nonlocal_1d.cuda() + out = nonlocal_1d(imgs) + assert out.shape == imgs.shape + + # NonLocal1d with 'dot_product' mode and sub_sample + nonlocal_1d = NonLocal1d(3, mode='dot_product', sub_sample=True) + for m in [nonlocal_1d.g, nonlocal_1d.phi]: + assert isinstance(m, nn.Sequential) and len(m) == 2 + assert isinstance(m[1], nn.MaxPool1d) + assert m[1].kernel_size == 2 + if torch.__version__ == 'parrots': + if torch.cuda.is_available(): + nonlocal_1d.cuda() + out = nonlocal_1d(imgs) + assert out.shape 
== imgs.shape diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_cnn/test_scale.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_cnn/test_scale.py new file mode 100644 index 0000000000000000000000000000000000000000..04d75ec16f56353b3d5f8c85bf5911a04a38d4c3 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_cnn/test_scale.py @@ -0,0 +1,78 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import pytest +import torch + +from mmcv.cnn.bricks import LayerScale, Scale + + +def test_scale(): + # test default scale + scale = Scale() + assert scale.scale.data == 1. + assert scale.scale.dtype == torch.float + x = torch.rand(1, 3, 64, 64) + output = scale(x) + assert output.shape == (1, 3, 64, 64) + + # test given scale + scale = Scale(10.) + assert scale.scale.data == 10. + assert scale.scale.dtype == torch.float + x = torch.rand(1, 3, 64, 64) + output = scale(x) + assert output.shape == (1, 3, 64, 64) + + +def test_layer_scale(): + with pytest.raises(AssertionError): + cfg = dict( + dim=10, + data_format='BNC', + ) + LayerScale(**cfg) + + # test init + cfg = dict(dim=10) + ls = LayerScale(**cfg) + assert torch.equal(ls.weight, torch.ones(10, requires_grad=True) * 1e-5) + + # test forward + # test channels_last + cfg = dict(dim=256, inplace=False, data_format='channels_last') + ls_channels_last = LayerScale(**cfg) + x = torch.randn((4, 49, 256)) + out = ls_channels_last(x) + assert tuple(out.size()) == (4, 49, 256) + assert torch.equal(x * 1e-5, out) + + # test channels_last 2d + cfg = dict(dim=256, inplace=False, data_format='channels_last') + ls_channels_last = LayerScale(**cfg) + x = torch.randn((4, 7, 49, 256)) + out = ls_channels_last(x) + assert tuple(out.size()) == (4, 7, 49, 256) + assert torch.equal(x * 1e-5, out) + + # test channels_first + cfg = dict(dim=256, inplace=False, data_format='channels_first') + ls_channels_first = LayerScale(**cfg) + x = torch.randn((4, 256, 7, 7)) + out = ls_channels_first(x) + assert tuple(out.size()) == (4, 256, 7, 7) + 
assert torch.equal(x * 1e-5, out) + + # test channels_first 3D + cfg = dict(dim=256, inplace=False, data_format='channels_first') + ls_channels_first = LayerScale(**cfg) + x = torch.randn((4, 256, 7, 7, 7)) + out = ls_channels_first(x) + assert tuple(out.size()) == (4, 256, 7, 7, 7) + assert torch.equal(x * 1e-5, out) + + # test inplace True + cfg = dict(dim=256, inplace=True, data_format='channels_first') + ls_channels_first = LayerScale(**cfg) + x = torch.randn((4, 256, 7, 7)) + out = ls_channels_first(x) + assert tuple(out.size()) == (4, 256, 7, 7) + assert x is out diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_cnn/test_silu.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_cnn/test_silu.py new file mode 100644 index 0000000000000000000000000000000000000000..e3bbc0f9bb5f2c07da61b413f0df0fae16fab9d8 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_cnn/test_silu.py @@ -0,0 +1,28 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch + +from mmcv.cnn.bricks import build_activation_layer + + +def test_silu(): + act = build_activation_layer(dict(type='SiLU')) + input = torch.randn(1, 3, 64, 64) + expected_output = input * torch.sigmoid(input) + output = act(input) + # test output shape + assert output.shape == expected_output.shape + # test output value + assert torch.allclose(output, expected_output) + + # test inplace + act = build_activation_layer(dict(type='SiLU', inplace=True)) + assert act.inplace + input = torch.randn(1, 3, 64, 64) + expected_output = input * torch.sigmoid(input) + output = act(input) + # test output shape + assert output.shape == expected_output.shape + # test output value + assert torch.allclose(output, expected_output) + assert torch.allclose(input, expected_output) + assert input is output diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_cnn/test_swish.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_cnn/test_swish.py new file mode 100644 index 
0000000000000000000000000000000000000000..2317f5a139a5228c049848a260ea914ac02eecee --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_cnn/test_swish.py @@ -0,0 +1,16 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch +import torch.nn.functional as F + +from mmcv.cnn.bricks import Swish + + +def test_swish(): + act = Swish() + input = torch.randn(1, 3, 64, 64) + expected_output = input * F.sigmoid(input) + output = act(input) + # test output shape + assert output.shape == expected_output.shape + # test output value + assert torch.equal(output, expected_output) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_cnn/test_transformer.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_cnn/test_transformer.py new file mode 100644 index 0000000000000000000000000000000000000000..b5a9562ee723cdd8e780e5878990170e4be419a7 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_cnn/test_transformer.py @@ -0,0 +1,687 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy + +import pytest +import torch +from mmengine.model import ModuleList + +from mmcv.cnn.bricks.drop import DropPath +from mmcv.cnn.bricks.transformer import (FFN, AdaptivePadding, + BaseTransformerLayer, + MultiheadAttention, PatchEmbed, + PatchMerging, + TransformerLayerSequence) + + +def test_adaptive_padding(): + + for padding in ('same', 'corner'): + kernel_size = 16 + stride = 16 + dilation = 1 + input = torch.rand(1, 1, 15, 17) + adap_pad = AdaptivePadding( + kernel_size=kernel_size, + stride=stride, + dilation=dilation, + padding=padding) + out = adap_pad(input) + # padding to divisible by 16 + assert (out.shape[2], out.shape[3]) == (16, 32) + input = torch.rand(1, 1, 16, 17) + out = adap_pad(input) + # padding to divisible by 16 + assert (out.shape[2], out.shape[3]) == (16, 32) + + kernel_size = (2, 2) + stride = (2, 2) + dilation = (1, 1) + + adap_pad = AdaptivePadding( + kernel_size=kernel_size, + stride=stride, + dilation=dilation, + padding=padding) + input = 
torch.rand(1, 1, 11, 13) + out = adap_pad(input) + # padding to divisible by 2 + assert (out.shape[2], out.shape[3]) == (12, 14) + + kernel_size = (2, 2) + stride = (10, 10) + dilation = (1, 1) + + adap_pad = AdaptivePadding( + kernel_size=kernel_size, + stride=stride, + dilation=dilation, + padding=padding) + input = torch.rand(1, 1, 10, 13) + out = adap_pad(input) + # no padding + assert (out.shape[2], out.shape[3]) == (10, 13) + + kernel_size = (11, 11) + adap_pad = AdaptivePadding( + kernel_size=kernel_size, + stride=stride, + dilation=dilation, + padding=padding) + input = torch.rand(1, 1, 11, 13) + out = adap_pad(input) + # all padding + assert (out.shape[2], out.shape[3]) == (21, 21) + + # test padding as kernel is (7,9) + input = torch.rand(1, 1, 11, 13) + stride = (3, 4) + kernel_size = (4, 5) + dilation = (2, 2) + # actually (7, 9) + adap_pad = AdaptivePadding( + kernel_size=kernel_size, + stride=stride, + dilation=dilation, + padding=padding) + dilation_out = adap_pad(input) + assert (dilation_out.shape[2], dilation_out.shape[3]) == (16, 21) + kernel_size = (7, 9) + dilation = (1, 1) + adap_pad = AdaptivePadding( + kernel_size=kernel_size, + stride=stride, + dilation=dilation, + padding=padding) + kernel79_out = adap_pad(input) + assert (kernel79_out.shape[2], kernel79_out.shape[3]) == (16, 21) + assert kernel79_out.shape == dilation_out.shape + + # assert only support "same" "corner" + with pytest.raises(AssertionError): + AdaptivePadding( + kernel_size=kernel_size, + stride=stride, + dilation=dilation, + padding=1) + + +def test_patch_embed(): + B = 2 + H = 3 + W = 4 + C = 3 + embed_dims = 10 + kernel_size = 3 + stride = 1 + dummy_input = torch.rand(B, C, H, W) + patch_merge_1 = PatchEmbed( + in_channels=C, + embed_dims=embed_dims, + kernel_size=kernel_size, + stride=stride, + padding=0, + dilation=1, + norm_cfg=None) + + x1, shape = patch_merge_1(dummy_input) + # test out shape + assert x1.shape == (2, 2, 10) + # test outsize is correct + assert shape 
== (1, 2) + # test L = out_h * out_w + assert shape[0] * shape[1] == x1.shape[1] + + B = 2 + H = 10 + W = 10 + C = 3 + embed_dims = 10 + kernel_size = 5 + stride = 2 + dummy_input = torch.rand(B, C, H, W) + # test dilation + patch_merge_2 = PatchEmbed( + in_channels=C, + embed_dims=embed_dims, + kernel_size=kernel_size, + stride=stride, + padding=0, + dilation=2, + norm_cfg=None, + ) + + x2, shape = patch_merge_2(dummy_input) + # test out shape + assert x2.shape == (2, 1, 10) + # test outsize is correct + assert shape == (1, 1) + # test L = out_h * out_w + assert shape[0] * shape[1] == x2.shape[1] + + stride = 2 + input_size = (10, 10) + + dummy_input = torch.rand(B, C, H, W) + # test stride and norm + patch_merge_3 = PatchEmbed( + in_channels=C, + embed_dims=embed_dims, + kernel_size=kernel_size, + stride=stride, + padding=0, + dilation=2, + norm_cfg=dict(type='LN'), + input_size=input_size) + + x3, shape = patch_merge_3(dummy_input) + # test out shape + assert x3.shape == (2, 1, 10) + # test outsize is correct + assert shape == (1, 1) + # test L = out_h * out_w + assert shape[0] * shape[1] == x3.shape[1] + + # test the init_out_size with nn.Unfold + assert patch_merge_3.init_out_size[1] == (input_size[0] - 2 * 4 - + 1) // 2 + 1 + assert patch_merge_3.init_out_size[0] == (input_size[0] - 2 * 4 - + 1) // 2 + 1 + H = 11 + W = 12 + input_size = (H, W) + dummy_input = torch.rand(B, C, H, W) + # test stride and norm + patch_merge_3 = PatchEmbed( + in_channels=C, + embed_dims=embed_dims, + kernel_size=kernel_size, + stride=stride, + padding=0, + dilation=2, + norm_cfg=dict(type='LN'), + input_size=input_size) + + _, shape = patch_merge_3(dummy_input) + # when input_size equal to real input + # the out_size should be equal to `init_out_size` + assert shape == patch_merge_3.init_out_size + + input_size = (H, W) + dummy_input = torch.rand(B, C, H, W) + # test stride and norm + patch_merge_3 = PatchEmbed( + in_channels=C, + embed_dims=embed_dims, + kernel_size=kernel_size, 
+ stride=stride, + padding=0, + dilation=2, + norm_cfg=dict(type='LN'), + input_size=input_size) + + _, shape = patch_merge_3(dummy_input) + # when input_size equal to real input + # the out_size should be equal to `init_out_size` + assert shape == patch_merge_3.init_out_size + + # test adap padding + for padding in ('same', 'corner'): + in_c = 2 + embed_dims = 3 + B = 2 + + # test stride is 1 + input_size = (5, 5) + kernel_size = (5, 5) + stride = (1, 1) + dilation = 1 + bias = False + + x = torch.rand(B, in_c, *input_size) + patch_embed = PatchEmbed( + in_channels=in_c, + embed_dims=embed_dims, + kernel_size=kernel_size, + stride=stride, + padding=padding, + dilation=dilation, + bias=bias) + + x_out, out_size = patch_embed(x) + assert x_out.size() == (B, 25, 3) + assert out_size == (5, 5) + assert x_out.size(1) == out_size[0] * out_size[1] + + # test kernel_size == stride + input_size = (5, 5) + kernel_size = (5, 5) + stride = (5, 5) + dilation = 1 + bias = False + + x = torch.rand(B, in_c, *input_size) + patch_embed = PatchEmbed( + in_channels=in_c, + embed_dims=embed_dims, + kernel_size=kernel_size, + stride=stride, + padding=padding, + dilation=dilation, + bias=bias) + + x_out, out_size = patch_embed(x) + assert x_out.size() == (B, 1, 3) + assert out_size == (1, 1) + assert x_out.size(1) == out_size[0] * out_size[1] + + # test kernel_size == stride + input_size = (6, 5) + kernel_size = (5, 5) + stride = (5, 5) + dilation = 1 + bias = False + + x = torch.rand(B, in_c, *input_size) + patch_embed = PatchEmbed( + in_channels=in_c, + embed_dims=embed_dims, + kernel_size=kernel_size, + stride=stride, + padding=padding, + dilation=dilation, + bias=bias) + + x_out, out_size = patch_embed(x) + assert x_out.size() == (B, 2, 3) + assert out_size == (2, 1) + assert x_out.size(1) == out_size[0] * out_size[1] + + # test different kernel_size with different stride + input_size = (6, 5) + kernel_size = (6, 2) + stride = (6, 2) + dilation = 1 + bias = False + + x = 
torch.rand(B, in_c, *input_size) + patch_embed = PatchEmbed( + in_channels=in_c, + embed_dims=embed_dims, + kernel_size=kernel_size, + stride=stride, + padding=padding, + dilation=dilation, + bias=bias) + + x_out, out_size = patch_embed(x) + assert x_out.size() == (B, 3, 3) + assert out_size == (1, 3) + assert x_out.size(1) == out_size[0] * out_size[1] + + +def test_patch_merging(): + + # Test the model with int padding + in_c = 3 + out_c = 4 + kernel_size = 3 + stride = 3 + padding = 1 + dilation = 1 + bias = False + # test the case `pad_to_stride` is False + patch_merge = PatchMerging( + in_channels=in_c, + out_channels=out_c, + kernel_size=kernel_size, + stride=stride, + padding=padding, + dilation=dilation, + bias=bias) + B, L, C = 1, 100, 3 + input_size = (10, 10) + x = torch.rand(B, L, C) + x_out, out_size = patch_merge(x, input_size) + assert x_out.size() == (1, 16, 4) + assert out_size == (4, 4) + # assert out size is consistent with real output + assert x_out.size(1) == out_size[0] * out_size[1] + in_c = 4 + out_c = 5 + kernel_size = 6 + stride = 3 + padding = 2 + dilation = 2 + bias = False + patch_merge = PatchMerging( + in_channels=in_c, + out_channels=out_c, + kernel_size=kernel_size, + stride=stride, + padding=padding, + dilation=dilation, + bias=bias) + B, L, C = 1, 100, 4 + input_size = (10, 10) + x = torch.rand(B, L, C) + x_out, out_size = patch_merge(x, input_size) + assert x_out.size() == (1, 4, 5) + assert out_size == (2, 2) + # assert out size is consistent with real output + assert x_out.size(1) == out_size[0] * out_size[1] + + # Test with adaptive padding + for padding in ('same', 'corner'): + in_c = 2 + out_c = 3 + B = 2 + + # test stride is 1 + input_size = (5, 5) + kernel_size = (5, 5) + stride = (1, 1) + dilation = 1 + bias = False + L = input_size[0] * input_size[1] + + x = torch.rand(B, L, in_c) + patch_merge = PatchMerging( + in_channels=in_c, + out_channels=out_c, + kernel_size=kernel_size, + stride=stride, + padding=padding, + 
dilation=dilation, + bias=bias) + + x_out, out_size = patch_merge(x, input_size) + assert x_out.size() == (B, 25, 3) + assert out_size == (5, 5) + assert x_out.size(1) == out_size[0] * out_size[1] + + # test kernel_size == stride + input_size = (5, 5) + kernel_size = (5, 5) + stride = (5, 5) + dilation = 1 + bias = False + L = input_size[0] * input_size[1] + + x = torch.rand(B, L, in_c) + patch_merge = PatchMerging( + in_channels=in_c, + out_channels=out_c, + kernel_size=kernel_size, + stride=stride, + padding=padding, + dilation=dilation, + bias=bias) + + x_out, out_size = patch_merge(x, input_size) + assert x_out.size() == (B, 1, 3) + assert out_size == (1, 1) + assert x_out.size(1) == out_size[0] * out_size[1] + + # test kernel_size == stride + input_size = (6, 5) + kernel_size = (5, 5) + stride = (5, 5) + dilation = 1 + bias = False + L = input_size[0] * input_size[1] + + x = torch.rand(B, L, in_c) + patch_merge = PatchMerging( + in_channels=in_c, + out_channels=out_c, + kernel_size=kernel_size, + stride=stride, + padding=padding, + dilation=dilation, + bias=bias) + + x_out, out_size = patch_merge(x, input_size) + assert x_out.size() == (B, 2, 3) + assert out_size == (2, 1) + assert x_out.size(1) == out_size[0] * out_size[1] + + # test different kernel_size with different stride + input_size = (6, 5) + kernel_size = (6, 2) + stride = (6, 2) + dilation = 1 + bias = False + L = input_size[0] * input_size[1] + + x = torch.rand(B, L, in_c) + patch_merge = PatchMerging( + in_channels=in_c, + out_channels=out_c, + kernel_size=kernel_size, + stride=stride, + padding=padding, + dilation=dilation, + bias=bias) + + x_out, out_size = patch_merge(x, input_size) + assert x_out.size() == (B, 3, 3) + assert out_size == (1, 3) + assert x_out.size(1) == out_size[0] * out_size[1] + + +def test_multiheadattention(): + MultiheadAttention( + embed_dims=5, + num_heads=5, + attn_drop=0, + proj_drop=0, + dropout_layer=dict(type='Dropout', drop_prob=0.), + batch_first=True) + batch_dim 
= 2 + embed_dim = 5 + num_query = 100 + attn_batch_first = MultiheadAttention( + embed_dims=5, + num_heads=5, + attn_drop=0, + proj_drop=0, + dropout_layer=dict(type='DropPath', drop_prob=0.), + batch_first=True) + + attn_query_first = MultiheadAttention( + embed_dims=5, + num_heads=5, + attn_drop=0, + proj_drop=0, + dropout_layer=dict(type='DropPath', drop_prob=0.), + batch_first=False) + + param_dict = dict(attn_query_first.named_parameters()) + for n, v in attn_batch_first.named_parameters(): + param_dict[n].data = v.data + + input_batch_first = torch.rand(batch_dim, num_query, embed_dim) + input_query_first = input_batch_first.transpose(0, 1) + + assert torch.allclose( + attn_query_first(input_query_first).sum(), + attn_batch_first(input_batch_first).sum()) + + key_batch_first = torch.rand(batch_dim, num_query, embed_dim) + key_query_first = key_batch_first.transpose(0, 1) + + assert torch.allclose( + attn_query_first(input_query_first, key_query_first).sum(), + attn_batch_first(input_batch_first, key_batch_first).sum()) + + identity = torch.ones_like(input_query_first) + + # check deprecated arguments can be used normally + + assert torch.allclose( + attn_query_first( + input_query_first, key_query_first, residual=identity).sum(), + attn_batch_first(input_batch_first, key_batch_first).sum() + + identity.sum() - input_batch_first.sum()) + + assert torch.allclose( + attn_query_first( + input_query_first, key_query_first, identity=identity).sum(), + attn_batch_first(input_batch_first, key_batch_first).sum() + + identity.sum() - input_batch_first.sum()) + + attn_query_first( + input_query_first, key_query_first, identity=identity).sum(), + + +def test_ffn(): + with pytest.raises(AssertionError): + # num_fcs should be no less than 2 + FFN(num_fcs=1) + ffn = FFN(dropout=0, add_identity=True) + + input_tensor = torch.rand(2, 20, 256) + input_tensor_nbc = input_tensor.transpose(0, 1) + assert torch.allclose(ffn(input_tensor).sum(), ffn(input_tensor_nbc).sum()) + 
residual = torch.rand_like(input_tensor) + torch.allclose( + ffn(input_tensor, residual=residual).sum(), + ffn(input_tensor).sum() + residual.sum() - input_tensor.sum()) + + torch.allclose( + ffn(input_tensor, identity=residual).sum(), + ffn(input_tensor).sum() + residual.sum() - input_tensor.sum()) + + # test with layer_scale + ffn = FFN(dropout=0, add_identity=True, layer_scale_init_value=0.1) + + input_tensor = torch.rand(2, 20, 256) + input_tensor_nbc = input_tensor.transpose(0, 1) + assert torch.allclose(ffn(input_tensor).sum(), ffn(input_tensor_nbc).sum()) + + +@pytest.mark.skipif(not torch.cuda.is_available(), reason='Cuda not available') +def test_basetransformerlayer_cuda(): + # To test if the BaseTransformerLayer's behaviour remains + # consistent after being deepcopied + operation_order = ('self_attn', 'ffn') + baselayer = BaseTransformerLayer( + operation_order=operation_order, + batch_first=True, + attn_cfgs=dict( + type='MultiheadAttention', + embed_dims=256, + num_heads=8, + ), + ) + baselayers = ModuleList([copy.deepcopy(baselayer) for _ in range(2)]) + baselayers.to('cuda') + x = torch.rand(2, 10, 256).cuda() + for m in baselayers: + x = m(x) + assert x.shape == torch.Size([2, 10, 256]) + + +@pytest.mark.parametrize('embed_dims', [False, 256]) +def test_basetransformerlayer(embed_dims): + attn_cfgs = dict(type='MultiheadAttention', embed_dims=256, num_heads=8), + if embed_dims: + ffn_cfgs = dict( + type='FFN', + embed_dims=embed_dims, + feedforward_channels=1024, + num_fcs=2, + ffn_drop=0., + act_cfg=dict(type='ReLU', inplace=True), + ) + else: + ffn_cfgs = dict( + type='FFN', + feedforward_channels=1024, + num_fcs=2, + ffn_drop=0., + act_cfg=dict(type='ReLU', inplace=True), + ) + + feedforward_channels = 2048 + ffn_dropout = 0.1 + operation_order = ('self_attn', 'norm', 'ffn', 'norm') + + # test deprecated_args + baselayer = BaseTransformerLayer( + attn_cfgs=attn_cfgs, + ffn_cfgs=ffn_cfgs, + feedforward_channels=feedforward_channels, + 
ffn_dropout=ffn_dropout, + operation_order=operation_order) + assert baselayer.batch_first is False + assert baselayer.ffns[0].feedforward_channels == feedforward_channels + + attn_cfgs = dict(type='MultiheadAttention', num_heads=8, embed_dims=256), + feedforward_channels = 2048 + ffn_dropout = 0.1 + operation_order = ('self_attn', 'norm', 'ffn', 'norm') + baselayer = BaseTransformerLayer( + attn_cfgs=attn_cfgs, + feedforward_channels=feedforward_channels, + ffn_dropout=ffn_dropout, + operation_order=operation_order, + batch_first=True) + assert baselayer.attentions[0].batch_first + in_tensor = torch.rand(2, 10, 256) + baselayer(in_tensor) + + +def test_transformerlayersequence(): + squeue = TransformerLayerSequence( + num_layers=6, + transformerlayers=dict( + type='BaseTransformerLayer', + attn_cfgs=[ + dict( + type='MultiheadAttention', + embed_dims=256, + num_heads=8, + dropout=0.1), + dict(type='MultiheadAttention', embed_dims=256, num_heads=4) + ], + feedforward_channels=1024, + ffn_dropout=0.1, + operation_order=('self_attn', 'norm', 'cross_attn', 'norm', 'ffn', + 'norm'))) + assert len(squeue.layers) == 6 + assert squeue.pre_norm is False + with pytest.raises(AssertionError): + # if transformerlayers is a list, len(transformerlayers) + # should be equal to num_layers + TransformerLayerSequence( + num_layers=6, + transformerlayers=[ + dict( + type='BaseTransformerLayer', + attn_cfgs=[ + dict( + type='MultiheadAttention', + embed_dims=256, + num_heads=8, + dropout=0.1), + dict(type='MultiheadAttention', embed_dims=256) + ], + feedforward_channels=1024, + ffn_dropout=0.1, + operation_order=('self_attn', 'norm', 'cross_attn', 'norm', + 'ffn', 'norm')) + ]) + + +def test_drop_path(): + drop_path = DropPath(drop_prob=0) + test_in = torch.rand(2, 3, 4, 5) + assert test_in is drop_path(test_in) + + drop_path = DropPath(drop_prob=0.1) + drop_path.training = False + test_in = torch.rand(2, 3, 4, 5) + assert test_in is drop_path(test_in) + drop_path.training = True + 
assert test_in is not drop_path(test_in) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_cnn/test_wrappers.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_cnn/test_wrappers.py new file mode 100644 index 0000000000000000000000000000000000000000..02e0f13cd790de613fac388d54a7f33bc2b9ce5d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_cnn/test_wrappers.py @@ -0,0 +1,376 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from unittest.mock import patch + +import pytest +import torch +import torch.nn as nn + +from mmcv.cnn.bricks import (Conv2d, Conv3d, ConvTranspose2d, ConvTranspose3d, + Linear, MaxPool2d, MaxPool3d) + +if torch.__version__ != 'parrots': + torch_version = '1.1' +else: + torch_version = 'parrots' + + +@patch('torch.__version__', torch_version) +@pytest.mark.parametrize( + 'in_w,in_h,in_channel,out_channel,kernel_size,stride,padding,dilation', + [(10, 10, 1, 1, 3, 1, 0, 1), (20, 20, 3, 3, 5, 2, 1, 2)]) +def test_conv2d(in_w, in_h, in_channel, out_channel, kernel_size, stride, + padding, dilation): + """ + CommandLine: + xdoctest -m tests/test_wrappers.py test_conv2d + """ + # train mode + # wrapper op with 0-dim input + x_empty = torch.randn(0, in_channel, in_h, in_w) + torch.manual_seed(0) + wrapper = Conv2d( + in_channel, + out_channel, + kernel_size, + stride=stride, + padding=padding, + dilation=dilation) + wrapper_out = wrapper(x_empty) + + # torch op with 3-dim input as shape reference + x_normal = torch.randn(3, in_channel, in_h, in_w).requires_grad_(True) + torch.manual_seed(0) + ref = nn.Conv2d( + in_channel, + out_channel, + kernel_size, + stride=stride, + padding=padding, + dilation=dilation) + ref_out = ref(x_normal) + + assert wrapper_out.shape[0] == 0 + assert wrapper_out.shape[1:] == ref_out.shape[1:] + + wrapper_out.sum().backward() + assert wrapper.weight.grad is not None + assert wrapper.weight.grad.shape == wrapper.weight.shape + + assert torch.equal(wrapper(x_normal), ref_out) + + # eval mode + x_empty = 
torch.randn(0, in_channel, in_h, in_w) + wrapper = Conv2d( + in_channel, + out_channel, + kernel_size, + stride=stride, + padding=padding, + dilation=dilation) + wrapper.eval() + wrapper(x_empty) + + +@patch('torch.__version__', torch_version) +@pytest.mark.parametrize( + 'in_w,in_h,in_t,in_channel,out_channel,kernel_size,stride,padding,dilation', # noqa: E501 + [(10, 10, 10, 1, 1, 3, 1, 0, 1), (20, 20, 20, 3, 3, 5, 2, 1, 2)]) +def test_conv3d(in_w, in_h, in_t, in_channel, out_channel, kernel_size, stride, + padding, dilation): + """ + CommandLine: + xdoctest -m tests/test_wrappers.py test_conv3d + """ + # train mode + # wrapper op with 0-dim input + x_empty = torch.randn(0, in_channel, in_t, in_h, in_w) + torch.manual_seed(0) + wrapper = Conv3d( + in_channel, + out_channel, + kernel_size, + stride=stride, + padding=padding, + dilation=dilation) + wrapper_out = wrapper(x_empty) + + # torch op with 3-dim input as shape reference + x_normal = torch.randn(3, in_channel, in_t, in_h, + in_w).requires_grad_(True) + torch.manual_seed(0) + ref = nn.Conv3d( + in_channel, + out_channel, + kernel_size, + stride=stride, + padding=padding, + dilation=dilation) + ref_out = ref(x_normal) + + assert wrapper_out.shape[0] == 0 + assert wrapper_out.shape[1:] == ref_out.shape[1:] + + wrapper_out.sum().backward() + assert wrapper.weight.grad is not None + assert wrapper.weight.grad.shape == wrapper.weight.shape + + assert torch.equal(wrapper(x_normal), ref_out) + + # eval mode + x_empty = torch.randn(0, in_channel, in_t, in_h, in_w) + wrapper = Conv3d( + in_channel, + out_channel, + kernel_size, + stride=stride, + padding=padding, + dilation=dilation) + wrapper.eval() + wrapper(x_empty) + + +@patch('torch.__version__', torch_version) +@pytest.mark.parametrize( + 'in_w,in_h,in_channel,out_channel,kernel_size,stride,padding,dilation', + [(10, 10, 1, 1, 3, 1, 0, 1), (20, 20, 3, 3, 5, 2, 1, 2)]) +def test_conv_transposed_2d(in_w, in_h, in_channel, out_channel, kernel_size, + stride, 
padding, dilation): + # wrapper op with 0-dim input + x_empty = torch.randn(0, in_channel, in_h, in_w, requires_grad=True) + # out padding must be smaller than either stride or dilation + op = min(stride, dilation) - 1 + if torch.__version__ == 'parrots': + op = 0 + torch.manual_seed(0) + wrapper = ConvTranspose2d( + in_channel, + out_channel, + kernel_size, + stride=stride, + padding=padding, + dilation=dilation, + output_padding=op) + wrapper_out = wrapper(x_empty) + + # torch op with 3-dim input as shape reference + x_normal = torch.randn(3, in_channel, in_h, in_w) + torch.manual_seed(0) + ref = nn.ConvTranspose2d( + in_channel, + out_channel, + kernel_size, + stride=stride, + padding=padding, + dilation=dilation, + output_padding=op) + ref_out = ref(x_normal) + + assert wrapper_out.shape[0] == 0 + assert wrapper_out.shape[1:] == ref_out.shape[1:] + + wrapper_out.sum().backward() + assert wrapper.weight.grad is not None + assert wrapper.weight.grad.shape == wrapper.weight.shape + + assert torch.equal(wrapper(x_normal), ref_out) + + # eval mode + x_empty = torch.randn(0, in_channel, in_h, in_w) + wrapper = ConvTranspose2d( + in_channel, + out_channel, + kernel_size, + stride=stride, + padding=padding, + dilation=dilation, + output_padding=op) + wrapper.eval() + wrapper(x_empty) + + +@patch('torch.__version__', torch_version) +@pytest.mark.parametrize( + 'in_w,in_h,in_t,in_channel,out_channel,kernel_size,stride,padding,dilation', # noqa: E501 + [(10, 10, 10, 1, 1, 3, 1, 0, 1), (20, 20, 20, 3, 3, 5, 2, 1, 2)]) +def test_conv_transposed_3d(in_w, in_h, in_t, in_channel, out_channel, + kernel_size, stride, padding, dilation): + # wrapper op with 0-dim input + x_empty = torch.randn(0, in_channel, in_t, in_h, in_w, requires_grad=True) + # out padding must be smaller than either stride or dilation + op = min(stride, dilation) - 1 + torch.manual_seed(0) + wrapper = ConvTranspose3d( + in_channel, + out_channel, + kernel_size, + stride=stride, + padding=padding, + 
dilation=dilation, + output_padding=op) + wrapper_out = wrapper(x_empty) + + # torch op with 3-dim input as shape reference + x_normal = torch.randn(3, in_channel, in_t, in_h, in_w) + torch.manual_seed(0) + ref = nn.ConvTranspose3d( + in_channel, + out_channel, + kernel_size, + stride=stride, + padding=padding, + dilation=dilation, + output_padding=op) + ref_out = ref(x_normal) + + assert wrapper_out.shape[0] == 0 + assert wrapper_out.shape[1:] == ref_out.shape[1:] + + wrapper_out.sum().backward() + assert wrapper.weight.grad is not None + assert wrapper.weight.grad.shape == wrapper.weight.shape + + assert torch.equal(wrapper(x_normal), ref_out) + + # eval mode + x_empty = torch.randn(0, in_channel, in_t, in_h, in_w) + wrapper = ConvTranspose3d( + in_channel, + out_channel, + kernel_size, + stride=stride, + padding=padding, + dilation=dilation, + output_padding=op) + wrapper.eval() + wrapper(x_empty) + + +@patch('torch.__version__', torch_version) +@pytest.mark.parametrize( + 'in_w,in_h,in_channel,out_channel,kernel_size,stride,padding,dilation', + [(10, 10, 1, 1, 3, 1, 0, 1), (20, 20, 3, 3, 5, 2, 1, 2)]) +def test_max_pool_2d(in_w, in_h, in_channel, out_channel, kernel_size, stride, + padding, dilation): + # wrapper op with 0-dim input + x_empty = torch.randn(0, in_channel, in_h, in_w, requires_grad=True) + wrapper = MaxPool2d( + kernel_size, stride=stride, padding=padding, dilation=dilation) + wrapper_out = wrapper(x_empty) + + # torch op with 3-dim input as shape reference + x_normal = torch.randn(3, in_channel, in_h, in_w) + ref = nn.MaxPool2d( + kernel_size, stride=stride, padding=padding, dilation=dilation) + ref_out = ref(x_normal) + + assert wrapper_out.shape[0] == 0 + assert wrapper_out.shape[1:] == ref_out.shape[1:] + + assert torch.equal(wrapper(x_normal), ref_out) + + +@patch('torch.__version__', torch_version) +@pytest.mark.parametrize( + 'in_w,in_h,in_t,in_channel,out_channel,kernel_size,stride,padding,dilation', # noqa: E501 + [(10, 10, 10, 1, 1, 3, 
1, 0, 1), (20, 20, 20, 3, 3, 5, 2, 1, 2)]) +@pytest.mark.skipif( + torch.__version__ == 'parrots' and not torch.cuda.is_available(), + reason='parrots requires CUDA support') +def test_max_pool_3d(in_w, in_h, in_t, in_channel, out_channel, kernel_size, + stride, padding, dilation): + # wrapper op with 0-dim input + x_empty = torch.randn(0, in_channel, in_t, in_h, in_w, requires_grad=True) + wrapper = MaxPool3d( + kernel_size, stride=stride, padding=padding, dilation=dilation) + if torch.__version__ == 'parrots': + x_empty = x_empty.cuda() + wrapper_out = wrapper(x_empty) + # torch op with 3-dim input as shape reference + x_normal = torch.randn(3, in_channel, in_t, in_h, in_w) + ref = nn.MaxPool3d( + kernel_size, stride=stride, padding=padding, dilation=dilation) + if torch.__version__ == 'parrots': + x_normal = x_normal.cuda() + ref_out = ref(x_normal) + + assert wrapper_out.shape[0] == 0 + assert wrapper_out.shape[1:] == ref_out.shape[1:] + + assert torch.equal(wrapper(x_normal), ref_out) + + +@patch('torch.__version__', torch_version) +@pytest.mark.parametrize('in_w,in_h,in_feature,out_feature', [(10, 10, 1, 1), + (20, 20, 3, 3)]) +def test_linear(in_w, in_h, in_feature, out_feature): + # wrapper op with 0-dim input + x_empty = torch.randn(0, in_feature, requires_grad=True) + torch.manual_seed(0) + wrapper = Linear(in_feature, out_feature) + wrapper_out = wrapper(x_empty) + + # torch op with 3-dim input as shape reference + x_normal = torch.randn(3, in_feature) + torch.manual_seed(0) + ref = nn.Linear(in_feature, out_feature) + ref_out = ref(x_normal) + + assert wrapper_out.shape[0] == 0 + assert wrapper_out.shape[1:] == ref_out.shape[1:] + + wrapper_out.sum().backward() + assert wrapper.weight.grad is not None + assert wrapper.weight.grad.shape == wrapper.weight.shape + + assert torch.equal(wrapper(x_normal), ref_out) + + # eval mode + x_empty = torch.randn(0, in_feature) + wrapper = Linear(in_feature, out_feature) + wrapper.eval() + wrapper(x_empty) + + 
+@patch('mmcv.cnn.bricks.wrappers.TORCH_VERSION', (1, 10)) +def test_nn_op_forward_called(): + + for m in ['Conv2d', 'ConvTranspose2d', 'MaxPool2d']: + with patch(f'torch.nn.{m}.forward') as nn_module_forward: + # randn input + x_empty = torch.randn(0, 3, 10, 10) + wrapper = eval(m)(3, 2, 1) + wrapper(x_empty) + nn_module_forward.assert_called_with(x_empty) + + # non-randn input + x_normal = torch.randn(1, 3, 10, 10) + wrapper = eval(m)(3, 2, 1) + wrapper(x_normal) + nn_module_forward.assert_called_with(x_normal) + + for m in ['Conv3d', 'ConvTranspose3d', 'MaxPool3d']: + with patch(f'torch.nn.{m}.forward') as nn_module_forward: + # randn input + x_empty = torch.randn(0, 3, 10, 10, 10) + wrapper = eval(m)(3, 2, 1) + wrapper(x_empty) + nn_module_forward.assert_called_with(x_empty) + + # non-randn input + x_normal = torch.randn(1, 3, 10, 10, 10) + wrapper = eval(m)(3, 2, 1) + wrapper(x_normal) + nn_module_forward.assert_called_with(x_normal) + + with patch('torch.nn.Linear.forward') as nn_module_forward: + # randn input + x_empty = torch.randn(0, 3) + wrapper = Linear(3, 3) + wrapper(x_empty) + nn_module_forward.assert_called_with(x_empty) + + # non-randn input + x_normal = torch.randn(1, 3) + wrapper = Linear(3, 3) + wrapper(x_normal) + nn_module_forward.assert_called_with(x_normal) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_image/test_colorspace.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_image/test_colorspace.py new file mode 100644 index 0000000000000000000000000000000000000000..d53e4e44da7bf656fa5b35cb042eb2ee37979a42 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_image/test_colorspace.py @@ -0,0 +1,355 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import cv2 +import numpy as np +import pytest +from numpy.testing import assert_array_almost_equal, assert_array_equal + +import mmcv +from mmcv.image.colorspace import (_convert_input_type_range, + _convert_output_type_range) + + +def test_bgr2gray(): + in_img = np.random.rand(10, 10, 3).astype(np.float32) + out_img = mmcv.bgr2gray(in_img) + computed_gray = ( + in_img[:, :, 0] * 0.114 + in_img[:, :, 1] * 0.587 + + in_img[:, :, 2] * 0.299) + assert_array_almost_equal(out_img, computed_gray, decimal=4) + out_img_3d = mmcv.bgr2gray(in_img, True) + assert out_img_3d.shape == (10, 10, 1) + assert_array_almost_equal(out_img_3d[..., 0], out_img, decimal=4) + + +def test_rgb2gray(): + in_img = np.random.rand(10, 10, 3).astype(np.float32) + out_img = mmcv.rgb2gray(in_img) + computed_gray = ( + in_img[:, :, 0] * 0.299 + in_img[:, :, 1] * 0.587 + + in_img[:, :, 2] * 0.114) + assert_array_almost_equal(out_img, computed_gray, decimal=4) + out_img_3d = mmcv.rgb2gray(in_img, True) + assert out_img_3d.shape == (10, 10, 1) + assert_array_almost_equal(out_img_3d[..., 0], out_img, decimal=4) + + +def test_gray2bgr(): + in_img = np.random.rand(10, 10).astype(np.float32) + out_img = mmcv.gray2bgr(in_img) + assert out_img.shape == (10, 10, 3) + for i in range(3): + assert_array_almost_equal(out_img[..., i], in_img, decimal=4) + + +def test_gray2rgb(): + in_img = np.random.rand(10, 10).astype(np.float32) + out_img = mmcv.gray2rgb(in_img) + assert out_img.shape == (10, 10, 3) + for i in range(3): + assert_array_almost_equal(out_img[..., i], in_img, decimal=4) + + +def test_bgr2rgb(): + in_img = np.random.rand(10, 10, 3).astype(np.float32) + out_img = mmcv.bgr2rgb(in_img) + assert out_img.shape == in_img.shape + assert_array_equal(out_img[..., 0], in_img[..., 2]) + assert_array_equal(out_img[..., 1], in_img[..., 1]) + assert_array_equal(out_img[..., 2], in_img[..., 0]) + + +def test_rgb2bgr(): + in_img = np.random.rand(10, 10, 3).astype(np.float32) + out_img = mmcv.rgb2bgr(in_img) + 
assert out_img.shape == in_img.shape + assert_array_equal(out_img[..., 0], in_img[..., 2]) + assert_array_equal(out_img[..., 1], in_img[..., 1]) + assert_array_equal(out_img[..., 2], in_img[..., 0]) + + +def test_bgr2hsv(): + in_img = np.random.rand(10, 10, 3).astype(np.float32) + out_img = mmcv.bgr2hsv(in_img) + argmax = in_img.argmax(axis=2) + computed_hsv = np.empty_like(in_img) + for i in range(in_img.shape[0]): + for j in range(in_img.shape[1]): + b, g, r = in_img[i, j] + v = max(r, g, b) + s = (v - min(r, g, b)) / v if v != 0 else 0 + if argmax[i, j] == 0: + h = 240 + 60 * (r - g) / (v - min(r, g, b)) + elif argmax[i, j] == 1: + h = 120 + 60 * (b - r) / (v - min(r, g, b)) + else: + h = 60 * (g - b) / (v - min(r, g, b)) + if h < 0: + h += 360 + computed_hsv[i, j, :] = [h, s, v] + assert_array_almost_equal(out_img, computed_hsv, decimal=2) + + +def test_convert_input_type_range(): + with pytest.raises(TypeError): + # The img type should be np.float32 or np.uint8 + in_img = np.random.rand(10, 10, 3).astype(np.uint64) + _convert_input_type_range(in_img) + # np.float32 + in_img = np.random.rand(10, 10, 3).astype(np.float32) + out_img = _convert_input_type_range(in_img) + assert out_img.dtype == np.float32 + assert np.absolute(out_img).mean() < 1 + # np.uint8 + in_img = (np.random.rand(10, 10, 3) * 255).astype(np.uint8) + out_img = _convert_input_type_range(in_img) + assert out_img.dtype == np.float32 + assert np.absolute(out_img).mean() < 1 + + +def test_convert_output_type_range(): + with pytest.raises(TypeError): + # The dst_type should be np.float32 or np.uint8 + in_img = np.random.rand(10, 10, 3).astype(np.float32) + _convert_output_type_range(in_img, np.uint64) + # np.float32 + in_img = (np.random.rand(10, 10, 3) * 255).astype(np.float32) + out_img = _convert_output_type_range(in_img, np.float32) + assert out_img.dtype == np.float32 + assert np.absolute(out_img).mean() < 1 + # np.uint8 + in_img = (np.random.rand(10, 10, 3) * 255).astype(np.float32) + out_img 
= _convert_output_type_range(in_img, np.uint8) + assert out_img.dtype == np.uint8 + assert np.absolute(out_img).mean() > 1 + + +def assert_image_almost_equal(x, y, atol=1): + assert x.dtype == np.uint8 + assert y.dtype == np.uint8 + assert np.all(np.abs(x.astype(np.int32) - y.astype(np.int32)) <= atol) + + +def test_rgb2ycbcr(): + with pytest.raises(TypeError): + # The img type should be np.float32 or np.uint8 + in_img = np.random.rand(10, 10, 3).astype(np.uint64) + mmcv.rgb2ycbcr(in_img) + + # float32 + in_img = np.random.rand(10, 10, 3).astype(np.float32) + out_img = mmcv.rgb2ycbcr(in_img) + computed_ycbcr = np.empty_like(in_img) + for i in range(in_img.shape[0]): + for j in range(in_img.shape[1]): + r, g, b = in_img[i, j] + y = 16 + r * 65.481 + g * 128.553 + b * 24.966 + cb = 128 - r * 37.797 - g * 74.203 + b * 112.0 + cr = 128 + r * 112.0 - g * 93.786 - b * 18.214 + computed_ycbcr[i, j, :] = [y, cb, cr] + computed_ycbcr /= 255. + assert_array_almost_equal(out_img, computed_ycbcr, decimal=2) + # y_only=True + out_img = mmcv.rgb2ycbcr(in_img, y_only=True) + computed_y = np.empty_like(out_img, dtype=out_img.dtype) + for i in range(in_img.shape[0]): + for j in range(in_img.shape[1]): + r, g, b = in_img[i, j] + y = 16 + r * 65.481 + g * 128.553 + b * 24.966 + computed_y[i, j] = y + computed_y /= 255. + assert_array_almost_equal(out_img, computed_y, decimal=2) + + # uint8 + in_img = (np.random.rand(10, 10, 3) * 255).astype(np.uint8) + out_img = mmcv.rgb2ycbcr(in_img) + computed_ycbcr = np.empty_like(in_img) + in_img = in_img / 255. 
+ for i in range(in_img.shape[0]): + for j in range(in_img.shape[1]): + r, g, b = in_img[i, j] + y = 16 + r * 65.481 + g * 128.553 + b * 24.966 + cb = 128 - r * 37.797 - g * 74.203 + b * 112.0 + cr = 128 + r * 112.0 - g * 93.786 - b * 18.214 + y, cb, cr = y.round(), cb.round(), cr.round() + computed_ycbcr[i, j, :] = [y, cb, cr] + assert_image_almost_equal(out_img, computed_ycbcr) + # y_only=True + in_img = (np.random.rand(10, 10, 3) * 255).astype(np.uint8) + out_img = mmcv.rgb2ycbcr(in_img, y_only=True) + computed_y = np.empty_like(out_img, dtype=out_img.dtype) + in_img = in_img / 255. + for i in range(in_img.shape[0]): + for j in range(in_img.shape[1]): + r, g, b = in_img[i, j] + y = 16 + r * 65.481 + g * 128.553 + b * 24.966 + y = y.round() + computed_y[i, j] = y + assert_image_almost_equal(out_img, computed_y) + + +def test_bgr2ycbcr(): + # float32 + in_img = np.random.rand(10, 10, 3).astype(np.float32) + out_img = mmcv.bgr2ycbcr(in_img) + computed_ycbcr = np.empty_like(in_img) + for i in range(in_img.shape[0]): + for j in range(in_img.shape[1]): + b, g, r = in_img[i, j] + y = 16 + r * 65.481 + g * 128.553 + b * 24.966 + cb = 128 - r * 37.797 - g * 74.203 + b * 112.0 + cr = 128 + r * 112.0 - g * 93.786 - b * 18.214 + computed_ycbcr[i, j, :] = [y, cb, cr] + computed_ycbcr /= 255. + assert_array_almost_equal(out_img, computed_ycbcr, decimal=2) + # y_only=True + in_img = np.random.rand(10, 10, 3).astype(np.float32) + out_img = mmcv.bgr2ycbcr(in_img, y_only=True) + computed_y = np.empty_like(out_img, dtype=out_img.dtype) + for i in range(in_img.shape[0]): + for j in range(in_img.shape[1]): + b, g, r = in_img[i, j] + y = 16 + r * 65.481 + g * 128.553 + b * 24.966 + computed_y[i, j] = y + computed_y /= 255. + assert_array_almost_equal(out_img, computed_y, decimal=2) + + # uint8 + in_img = (np.random.rand(10, 10, 3) * 255).astype(np.uint8) + out_img = mmcv.bgr2ycbcr(in_img) + computed_ycbcr = np.empty_like(in_img) + in_img = in_img / 255. 
+ for i in range(in_img.shape[0]): + for j in range(in_img.shape[1]): + b, g, r = in_img[i, j] + y = 16 + r * 65.481 + g * 128.553 + b * 24.966 + cb = 128 - r * 37.797 - g * 74.203 + b * 112.0 + cr = 128 + r * 112.0 - g * 93.786 - b * 18.214 + y, cb, cr = y.round(), cb.round(), cr.round() + computed_ycbcr[i, j, :] = [y, cb, cr] + assert_image_almost_equal(out_img, computed_ycbcr) + # y_only = True + in_img = (np.random.rand(10, 10, 3) * 255).astype(np.uint8) + out_img = mmcv.bgr2ycbcr(in_img, y_only=True) + computed_y = np.empty_like(out_img, dtype=out_img.dtype) + in_img = in_img / 255. + for i in range(in_img.shape[0]): + for j in range(in_img.shape[1]): + b, g, r = in_img[i, j] + y = 16 + r * 65.481 + g * 128.553 + b * 24.966 + y = y.round() + computed_y[i, j] = y + assert_image_almost_equal(out_img, computed_y) + + +def test_ycbcr2rgb(): + with pytest.raises(TypeError): + # The img type should be np.float32 or np.uint8 + in_img = np.random.rand(10, 10, 3).astype(np.uint64) + mmcv.ycbcr2rgb(in_img) + + # float32 + in_img = np.random.rand(10, 10, 3).astype(np.float32) + out_img = mmcv.ycbcr2rgb(in_img) + computed_rgb = np.empty_like(in_img) + in_img *= 255. + for i in range(in_img.shape[0]): + for j in range(in_img.shape[1]): + y, cb, cr = in_img[i, j] + r = -222.921 + y * 0.00456621 * 255 + cr * 0.00625893 * 255 + g = 135.576 + y * 0.00456621 * 255 - cb * 0.00153632 * 255 - \ + cr * 0.00318811 * 255 + b = -276.836 + y * 0.00456621 * 255. + cb * 0.00791071 * 255 + computed_rgb[i, j, :] = [r, g, b] + computed_rgb /= 255. 
+ assert_array_almost_equal(out_img, computed_rgb, decimal=2) + + # uint8 + in_img = (np.random.rand(10, 10, 3) * 255).astype(np.uint8) + out_img = mmcv.ycbcr2rgb(in_img) + computed_rgb = np.empty_like(in_img) + for i in range(in_img.shape[0]): + for j in range(in_img.shape[1]): + y, cb, cr = in_img[i, j] + r = -222.921 + y * 0.00456621 * 255 + cr * 0.00625893 * 255 + g = 135.576 + y * 0.00456621 * 255 - cb * 0.00153632 * 255 - \ + cr * 0.00318811 * 255 + b = -276.836 + y * 0.00456621 * 255. + cb * 0.00791071 * 255 + r, g, b = r.round(), g.round(), b.round() + computed_rgb[i, j, :] = [r, g, b] + assert_image_almost_equal(out_img, computed_rgb) + + +def test_ycbcr2bgr(): + # float32 + in_img = np.random.rand(10, 10, 3).astype(np.float32) + out_img = mmcv.ycbcr2bgr(in_img) + computed_bgr = np.empty_like(in_img) + in_img *= 255. + for i in range(in_img.shape[0]): + for j in range(in_img.shape[1]): + y, cb, cr = in_img[i, j] + r = -222.921 + y * 0.00456621 * 255 + cr * 0.00625893 * 255 + g = 135.576 + y * 0.00456621 * 255 - cb * 0.00153632 * 255 - \ + cr * 0.00318811 * 255 + b = -276.836 + y * 0.00456621 * 255. + cb * 0.00791071 * 255 + computed_bgr[i, j, :] = [b, g, r] + computed_bgr /= 255. + assert_array_almost_equal(out_img, computed_bgr, decimal=2) + + # uint8 + in_img = (np.random.rand(10, 10, 3) * 255).astype(np.uint8) + out_img = mmcv.ycbcr2bgr(in_img) + computed_bgr = np.empty_like(in_img) + for i in range(in_img.shape[0]): + for j in range(in_img.shape[1]): + y, cb, cr = in_img[i, j] + r = -222.921 + y * 0.00456621 * 255 + cr * 0.00625893 * 255 + g = 135.576 + y * 0.00456621 * 255 - cb * 0.00153632 * 255 - \ + cr * 0.00318811 * 255 + b = -276.836 + y * 0.00456621 * 255. 
+ cb * 0.00791071 * 255 + r, g, b = r.round(), g.round(), b.round() + computed_bgr[i, j, :] = [b, g, r] + assert_image_almost_equal(out_img, computed_bgr) + + +def test_bgr2hls(): + in_img = np.random.rand(10, 10, 3).astype(np.float32) + out_img = mmcv.bgr2hls(in_img) + argmax = in_img.argmax(axis=2) + computed_hls = np.empty_like(in_img) + for i in range(in_img.shape[0]): + for j in range(in_img.shape[1]): + b, g, r = in_img[i, j] + maxc = max(r, g, b) + minc = min(r, g, b) + _l = (minc + maxc) / 2.0 + if minc == maxc: + h = 0.0 + s = 0.0 + if _l <= 0.5: + s = (maxc - minc) / (maxc + minc) + else: + s = (maxc - minc) / (2.0 - maxc - minc) + if argmax[i, j] == 2: + h = 60 * (g - b) / (maxc - minc) + elif argmax[i, j] == 1: + h = 60 * (2.0 + (b - r) / (maxc - minc)) + else: + h = 60 * (4.0 + (r - g) / (maxc - minc)) + if h < 0: + h += 360 + computed_hls[i, j, :] = [h, _l, s] + assert_array_almost_equal(out_img, computed_hls, decimal=2) + + +@pytest.mark.parametrize('src,dst,ref', [('bgr', 'gray', cv2.COLOR_BGR2GRAY), + ('rgb', 'gray', cv2.COLOR_RGB2GRAY), + ('bgr', 'rgb', cv2.COLOR_BGR2RGB), + ('rgb', 'bgr', cv2.COLOR_RGB2BGR), + ('bgr', 'hsv', cv2.COLOR_BGR2HSV), + ('hsv', 'bgr', cv2.COLOR_HSV2BGR), + ('bgr', 'hls', cv2.COLOR_BGR2HLS), + ('hls', 'bgr', cv2.COLOR_HLS2BGR)]) +def test_imconvert(src, dst, ref): + img = np.random.rand(10, 10, 3).astype(np.float32) + assert_array_equal(mmcv.imconvert(img, src, dst), cv2.cvtColor(img, ref)) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_image/test_geometric.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_image/test_geometric.py new file mode 100644 index 0000000000000000000000000000000000000000..e6409d7e5c73d22fffebf87b2b2d84ea05e83c16 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_image/test_geometric.py @@ -0,0 +1,617 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
import os.path as osp

import cv2
import numpy as np
import pytest
from numpy.testing import assert_array_equal

import mmcv


class TestGeometric:
    """Unit tests for mmcv's geometric image transforms.

    Covers resize (and its size/scale helpers), flip, crop, pad, cutout,
    rotate, shear and translate. All tests operate on a 400x300 color
    fixture image loaded once per class.
    """

    @classmethod
    def setup_class(cls):
        """Load the shared 400x300 BGR test image from the data directory."""
        cls.data_dir = osp.join(osp.dirname(__file__), '../data')
        # the test img resolution is 400x300
        cls.img_path = osp.join(cls.data_dir, 'color.jpg')
        cls.img = cv2.imread(cls.img_path)

    def test_imresize(self):
        """imresize: target size, return_scale, out-buffer reuse,
        interpolation modes of both backends, and backend validation."""
        resized_img = mmcv.imresize(self.img, (1000, 600))
        assert resized_img.shape == (600, 1000, 3)
        resized_img, w_scale, h_scale = mmcv.imresize(self.img, (1000, 600),
                                                      True)
        assert (resized_img.shape == (600, 1000, 3) and w_scale == 2.5
                and h_scale == 2.0)
        # `out=` must write into (and return) the caller-supplied buffer
        resized_img_dst = np.empty((600, 1000, 3), dtype=self.img.dtype)
        resized_img = mmcv.imresize(self.img, (1000, 600), out=resized_img_dst)
        assert id(resized_img_dst) == id(resized_img)
        assert_array_equal(resized_img_dst,
                           mmcv.imresize(self.img, (1000, 600)))
        for mode in ['nearest', 'bilinear', 'bicubic', 'area', 'lanczos']:
            resized_img = mmcv.imresize(
                self.img, (1000, 600), interpolation=mode)
            assert resized_img.shape == (600, 1000, 3)

        # test pillow resize
        for mode in [
                'nearest', 'bilinear', 'bicubic', 'box', 'lanczos', 'hamming'
        ]:
            resized_img = mmcv.imresize(
                self.img, (1000, 600), interpolation=mode, backend='pillow')
            assert resized_img.shape == (600, 1000, 3)

        # resize backend must be 'cv2' or 'pillow'
        with pytest.raises(ValueError):
            mmcv.imresize(self.img, (1000, 600), backend='not support')

    def test_imresize_to_multiple(self):
        """imresize_to_multiple: divisor alignment combined with either
        `size` or `scale_factor`, with and without keep_ratio/return_scale."""
        # test size and keep_ratio = False
        resized_img = mmcv.imresize_to_multiple(
            self.img, divisor=16, size=(511, 513), keep_ratio=False)
        assert resized_img.shape == (528, 512, 3)
        resized_img = mmcv.imresize_to_multiple(
            self.img, divisor=(16, 32), size=(511, 513), keep_ratio=False)
        assert resized_img.shape == (544, 512, 3)

        # test size, keep_ratio = True, and return_scale
        resized_img, w_scale, h_scale = mmcv.imresize_to_multiple(
            self.img,
            divisor=16,
            size=(1000, 600),
            keep_ratio=True,
            return_scale=True)
        assert resized_img.shape == (
            608, 800, 3) and h_scale == 608 / 300 and w_scale == 800 / 400
        resized_img, w_scale, h_scale = mmcv.imresize_to_multiple(
            self.img,
            divisor=(18, 16),
            size=(1000, 600),
            keep_ratio=True,
            return_scale=True)
        assert resized_img.shape == (
            608, 810, 3) and h_scale == 608 / 300 and w_scale == 810 / 400

        # test scale_factor and return_scale
        resized_img, w_scale, h_scale = mmcv.imresize_to_multiple(
            self.img, divisor=16, scale_factor=2, return_scale=True)
        assert resized_img.shape == (
            608, 800, 3) and h_scale == 608 / 300 and w_scale == 800 / 400
        resized_img, w_scale, h_scale = mmcv.imresize_to_multiple(
            self.img, divisor=16, scale_factor=(2, 3), return_scale=True)
        assert resized_img.shape == (
            912, 800, 3) and h_scale == 912 / 300 and w_scale == 800 / 400
        resized_img, w_scale, h_scale = mmcv.imresize_to_multiple(
            self.img, divisor=(18, 16), scale_factor=(2, 3), return_scale=True)
        assert resized_img.shape == (
            912, 810, 3) and h_scale == 912 / 300 and w_scale == 810 / 400

        # one of size and scale_factor should be given
        with pytest.raises(ValueError):
            mmcv.imresize_to_multiple(
                self.img, divisor=16, size=(1000, 600), scale_factor=2)
        with pytest.raises(ValueError):
            mmcv.imresize_to_multiple(
                self.img, divisor=16, size=None, scale_factor=None)

    def test_imresize_like(self):
        """imresize_like: output shape matches the reference image."""
        a = np.zeros((100, 200, 3))
        resized_img = mmcv.imresize_like(self.img, a)
        assert resized_img.shape == (100, 200, 3)

    def test_rescale_size(self):
        """rescale_size: float factors, max-size tuples and invalid input."""
        new_size, scale_factor = mmcv.rescale_size((400, 300), 1.5, True)
        assert new_size == (600, 450) and scale_factor == 1.5
        new_size, scale_factor = mmcv.rescale_size((400, 300), 0.934, True)
        assert new_size == (374, 280) and scale_factor == 0.934

        new_size = mmcv.rescale_size((400, 300), 1.5)
        assert new_size == (600, 450)
        new_size = mmcv.rescale_size((400, 300), 0.934)
        assert new_size == (374, 280)

        new_size, scale_factor = mmcv.rescale_size((400, 300), (1000, 600),
                                                   True)
        assert new_size == (800, 600) and scale_factor == 2.0
        new_size, scale_factor = mmcv.rescale_size((400, 300), (180, 200),
                                                   True)
        assert new_size == (200, 150) and scale_factor == 0.5

        new_size = mmcv.rescale_size((400, 300), (1000, 600))
        assert new_size == (800, 600)
        new_size = mmcv.rescale_size((400, 300), (180, 200))
        assert new_size == (200, 150)

        with pytest.raises(ValueError):
            mmcv.rescale_size((400, 300), -0.5)
        with pytest.raises(TypeError):
            # FIX: was `mmcv.rescale_size()((400, 300), [100, 100])` — the
            # stray `()` made the zero-argument call raise TypeError, so the
            # list-scale validation path was never actually exercised.
            mmcv.rescale_size((400, 300), [100, 100])

    def test_imrescale(self):
        """imrescale: by float factor or max-size tuple; bad scales raise."""
        # rescale by a certain factor
        resized_img = mmcv.imrescale(self.img, 1.5)
        assert resized_img.shape == (450, 600, 3)
        resized_img = mmcv.imrescale(self.img, 0.934)
        assert resized_img.shape == (280, 374, 3)

        # rescale by a certain max_size
        # resize (400, 300) to (max_1000, max_600)
        resized_img = mmcv.imrescale(self.img, (1000, 600))
        assert resized_img.shape == (600, 800, 3)
        resized_img, scale = mmcv.imrescale(
            self.img, (1000, 600), return_scale=True)
        assert resized_img.shape == (600, 800, 3) and scale == 2.0
        # resize (400, 300) to (max_200, max_180)
        resized_img = mmcv.imrescale(self.img, (180, 200))
        assert resized_img.shape == (150, 200, 3)
        resized_img, scale = mmcv.imrescale(
            self.img, (180, 200), return_scale=True)
        assert resized_img.shape == (150, 200, 3) and scale == 0.5

        # test exceptions
        with pytest.raises(ValueError):
            mmcv.imrescale(self.img, -0.5)
        with pytest.raises(TypeError):
            mmcv.imrescale(self.img, [100, 100])

    def test_imflip(self):
        """imflip: returns a flipped copy; verified pixel-by-pixel for
        horizontal/vertical/diagonal on color and grayscale images."""
        # direction must be "horizontal" or "vertical" or "diagonal"
        with pytest.raises(AssertionError):
            mmcv.imflip(np.random.rand(80, 60, 3), direction='random')

        # test horizontal flip (color image)
        img = np.random.rand(80, 60, 3)
        h, w, c = img.shape
        flipped_img = mmcv.imflip(img)
        assert flipped_img.shape == img.shape
        for i in range(h):
            for j in range(w):
                for k in range(c):
                    assert flipped_img[i, j, k] == img[i, w - 1 - j, k]

        # test vertical flip (color image)
        flipped_img = mmcv.imflip(img, direction='vertical')
        assert flipped_img.shape == img.shape
        for i in range(h):
            for j in range(w):
                for k in range(c):
                    assert flipped_img[i, j, k] == img[h - 1 - i, j, k]

        # test diagonal flip (color image)
        flipped_img = mmcv.imflip(img, direction='diagonal')
        assert flipped_img.shape == img.shape
        for i in range(h):
            for j in range(w):
                for k in range(c):
                    assert flipped_img[i, j, k] == img[h - 1 - i, w - 1 - j, k]

        # test horizontal flip (grayscale image)
        img = np.random.rand(80, 60)
        h, w = img.shape
        flipped_img = mmcv.imflip(img)
        assert flipped_img.shape == img.shape
        for i in range(h):
            for j in range(w):
                assert flipped_img[i, j] == img[i, w - 1 - j]

        # test vertical flip (grayscale image)
        flipped_img = mmcv.imflip(img, direction='vertical')
        assert flipped_img.shape == img.shape
        for i in range(h):
            for j in range(w):
                assert flipped_img[i, j] == img[h - 1 - i, j]

        # test diagonal flip (grayscale image)
        flipped_img = mmcv.imflip(img, direction='diagonal')
        assert flipped_img.shape == img.shape
        for i in range(h):
            for j in range(w):
                assert flipped_img[i, j] == img[h - 1 - i, w - 1 - j]

    def test_imflip_(self):
        """imflip_: in-place variant; must return the same array object and
        leave it equal to the flipped original."""
        # direction must be "horizontal" or "vertical" or "diagonal"
        with pytest.raises(AssertionError):
            mmcv.imflip_(np.random.rand(80, 60, 3), direction='random')

        # test horizontal flip (color image)
        img = np.random.rand(80, 60, 3)
        h, w, c = img.shape
        img_for_flip = img.copy()
        flipped_img = mmcv.imflip_(img_for_flip)
        assert flipped_img.shape == img.shape
        assert flipped_img.shape == img_for_flip.shape
        assert id(flipped_img) == id(img_for_flip)
        for i in range(h):
            for j in range(w):
                for k in range(c):
                    assert flipped_img[i, j, k] == img[i, w - 1 - j, k]
                    assert flipped_img[i, j, k] == img_for_flip[i, j, k]

        # test vertical flip (color image)
        img_for_flip = img.copy()
        flipped_img = mmcv.imflip_(img_for_flip, direction='vertical')
        assert flipped_img.shape == img.shape
        assert flipped_img.shape == img_for_flip.shape
        assert id(flipped_img) == id(img_for_flip)
        for i in range(h):
            for j in range(w):
                for k in range(c):
                    assert flipped_img[i, j, k] == img[h - 1 - i, j, k]
                    assert flipped_img[i, j, k] == img_for_flip[i, j, k]

        # test diagonal flip (color image)
        img_for_flip = img.copy()
        flipped_img = mmcv.imflip_(img_for_flip, direction='diagonal')
        assert flipped_img.shape == img.shape
        assert flipped_img.shape == img_for_flip.shape
        assert id(flipped_img) == id(img_for_flip)
        for i in range(h):
            for j in range(w):
                for k in range(c):
                    assert flipped_img[i, j, k] == img[h - 1 - i, w - 1 - j, k]
                    assert flipped_img[i, j, k] == img_for_flip[i, j, k]

        # test horizontal flip (grayscale image)
        img = np.random.rand(80, 60)
        h, w = img.shape
        img_for_flip = img.copy()
        flipped_img = mmcv.imflip_(img_for_flip)
        assert flipped_img.shape == img.shape
        assert flipped_img.shape == img_for_flip.shape
        assert id(flipped_img) == id(img_for_flip)
        for i in range(h):
            for j in range(w):
                assert flipped_img[i, j] == img[i, w - 1 - j]
                assert flipped_img[i, j] == img_for_flip[i, j]

        # test vertical flip (grayscale image)
        img_for_flip = img.copy()
        flipped_img = mmcv.imflip_(img_for_flip, direction='vertical')
        assert flipped_img.shape == img.shape
        assert flipped_img.shape == img_for_flip.shape
        assert id(flipped_img) == id(img_for_flip)
        for i in range(h):
            for j in range(w):
                assert flipped_img[i, j] == img[h - 1 - i, j]
                assert flipped_img[i, j] == img_for_flip[i, j]

        # test diagonal flip (grayscale image)
        img_for_flip = img.copy()
        flipped_img = mmcv.imflip_(img_for_flip, direction='diagonal')
        assert flipped_img.shape == img.shape
        assert flipped_img.shape == img_for_flip.shape
        assert id(flipped_img) == id(img_for_flip)
        for i in range(h):
            for j in range(w):
                assert flipped_img[i, j] == img[h - 1 - i, w - 1 - j]
                assert flipped_img[i, j] == img_for_flip[i, j]

    def test_imcrop(self):
        """imcrop: single/multiple bboxes, scaling and padding, compared
        against pre-computed reference patches stored as .npy files."""
        # yapf: disable
        bboxes = np.array([[100, 100, 199, 199],  # center
                           [0, 0, 150, 100],  # left-top corner
                           [250, 200, 399, 299],  # right-bottom corner
                           [0, 100, 399, 199],  # wide
                           [150, 0, 299, 299]])  # tall
        # yapf: enable

        # crop one bbox
        patch = mmcv.imcrop(self.img, bboxes[0, :])
        patches = mmcv.imcrop(self.img, bboxes[[0], :])
        assert patch.shape == (100, 100, 3)
        patch_path = osp.join(self.data_dir, 'patches')
        ref_patch = np.load(patch_path + '/0.npy')
        assert_array_equal(patch, ref_patch)
        assert isinstance(patches, list) and len(patches) == 1
        assert_array_equal(patches[0], ref_patch)

        # crop with no scaling and padding
        patches = mmcv.imcrop(self.img, bboxes)
        assert len(patches) == bboxes.shape[0]
        for i in range(len(patches)):
            ref_patch = np.load(patch_path + f'/{i}.npy')
            assert_array_equal(patches[i], ref_patch)

        # crop with scaling and no padding
        patches = mmcv.imcrop(self.img, bboxes, 1.2)
        for i in range(len(patches)):
            ref_patch = np.load(patch_path + f'/scale_{i}.npy')
            assert_array_equal(patches[i], ref_patch)

        # crop with scaling and padding
        patches = mmcv.imcrop(self.img, bboxes, 1.2, pad_fill=[255, 255, 0])
        for i in range(len(patches)):
            ref_patch = np.load(patch_path + f'/pad_{i}.npy')
            assert_array_equal(patches[i], ref_patch)
        patches = mmcv.imcrop(self.img, bboxes, 1.2, pad_fill=0)
        for i in range(len(patches)):
            ref_patch = np.load(patch_path + f'/pad0_{i}.npy')
            assert_array_equal(patches[i], ref_patch)

    def test_impad(self):
        """impad: padding tuples, target shapes, per-channel pad values,
        all padding modes and argument validation."""
        # grayscale image
        img = np.random.rand(10, 10).astype(np.float32)
        padded_img = mmcv.impad(img, padding=(0, 0, 2, 5), pad_val=0)
        assert_array_equal(img, padded_img[:10, :10])
        assert_array_equal(
            np.zeros((5, 12), dtype='float32'), padded_img[10:, :])
        assert_array_equal(
            np.zeros((15, 2), dtype='float32'), padded_img[:, 10:])

        # RGB image
        img = np.random.rand(10, 10, 3).astype(np.float32)
        padded_img = mmcv.impad(img, padding=(0, 0, 2, 5), pad_val=0)
        assert_array_equal(img, padded_img[:10, :10, :])
        assert_array_equal(
            np.zeros((5, 12, 3), dtype='float32'), padded_img[10:, :, :])
        assert_array_equal(
            np.zeros((15, 2, 3), dtype='float32'), padded_img[:, 10:, :])

        # RGB image with different values for three channels.
        img = np.random.randint(256, size=(10, 10, 3)).astype('uint8')
        padded_img = mmcv.impad(
            img, padding=(0, 0, 2, 5), pad_val=(100, 110, 120))
        assert_array_equal(img, padded_img[:10, :10, :])
        assert_array_equal(
            np.array([100, 110, 120], dtype='uint8') * np.ones(
                (5, 12, 3), dtype='uint8'), padded_img[10:, :, :])
        assert_array_equal(
            np.array([100, 110, 120], dtype='uint8') * np.ones(
                (15, 2, 3), dtype='uint8'), padded_img[:, 10:, :])

        # Pad the grayscale image to shape (15, 12)
        img = np.random.rand(10, 10).astype(np.float32)
        padded_img = mmcv.impad(img, shape=(15, 12))
        assert_array_equal(img, padded_img[:10, :10])
        assert_array_equal(
            np.zeros((5, 12), dtype='float32'), padded_img[10:, :])
        assert_array_equal(
            np.zeros((15, 2), dtype='float32'), padded_img[:, 10:])

        # Pad the RGB image to shape (15, 12)
        img = np.random.rand(10, 10, 3).astype(np.float32)
        padded_img = mmcv.impad(img, shape=(15, 12))
        assert_array_equal(img, padded_img[:10, :10, :])
        assert_array_equal(
            np.zeros((5, 12, 3), dtype='float32'), padded_img[10:, :, :])
        assert_array_equal(
            np.zeros((15, 2, 3), dtype='float32'), padded_img[:, 10:, :])

        # Pad the RGB image to shape (15, 12) with different values for
        # three channels.
        img = np.random.randint(256, size=(10, 10, 3)).astype('uint8')
        padded_img = mmcv.impad(img, shape=(15, 12), pad_val=(100, 110, 120))
        assert_array_equal(img, padded_img[:10, :10, :])
        assert_array_equal(
            np.array([100, 110, 120], dtype='uint8') * np.ones(
                (5, 12, 3), dtype='uint8'), padded_img[10:, :, :])
        assert_array_equal(
            np.array([100, 110, 120], dtype='uint8') * np.ones(
                (15, 2, 3), dtype='uint8'), padded_img[:, 10:, :])

        # RGB image with padding=[5, 2]
        img = np.random.rand(10, 10, 3).astype(np.float32)
        padded_img = mmcv.impad(img, padding=(5, 2), pad_val=0)

        assert padded_img.shape == (14, 20, 3)
        assert_array_equal(img, padded_img[2:12, 5:15, :])
        assert_array_equal(
            np.zeros((2, 5, 3), dtype='float32'), padded_img[:2, :5, :])
        assert_array_equal(
            np.zeros((2, 5, 3), dtype='float32'), padded_img[12:, :5, :])
        assert_array_equal(
            np.zeros((2, 5, 3), dtype='float32'), padded_img[:2, 15:, :])
        assert_array_equal(
            np.zeros((2, 5, 3), dtype='float32'), padded_img[12:, 15:, :])

        # RGB image with type(pad_val) = tuple
        pad_val = (0, 1, 2)
        img = np.random.rand(10, 10, 3).astype(np.float32)
        padded_img = mmcv.impad(img, padding=(0, 0, 5, 2), pad_val=pad_val)

        assert padded_img.shape == (12, 15, 3)
        assert_array_equal(img, padded_img[:10, :10, :])
        assert_array_equal(pad_val[0] * np.ones((2, 15, 1), dtype='float32'),
                           padded_img[10:, :, 0:1])
        assert_array_equal(pad_val[1] * np.ones((2, 15, 1), dtype='float32'),
                           padded_img[10:, :, 1:2])
        assert_array_equal(pad_val[2] * np.ones((2, 15, 1), dtype='float32'),
                           padded_img[10:, :, 2:3])

        assert_array_equal(pad_val[0] * np.ones((12, 5, 1), dtype='float32'),
                           padded_img[:, 10:, 0:1])
        assert_array_equal(pad_val[1] * np.ones((12, 5, 1), dtype='float32'),
                           padded_img[:, 10:, 1:2])
        assert_array_equal(pad_val[2] * np.ones((12, 5, 1), dtype='float32'),
                           padded_img[:, 10:, 2:3])

        # test different padding mode with channel number = 3
        for mode in ['constant', 'edge', 'reflect', 'symmetric']:
            img = np.random.rand(10, 10, 3).astype(np.float32)
            padded_img = mmcv.impad(
                img, padding=(0, 0, 5, 2), pad_val=pad_val, padding_mode=mode)
            assert padded_img.shape == (12, 15, 3)

        # test different padding mode with channel number = 1
        for mode in ['constant', 'edge', 'reflect', 'symmetric']:
            img = np.random.rand(10, 10).astype(np.float32)
            padded_img = mmcv.impad(
                img, padding=(0, 0, 5, 2), pad_val=0, padding_mode=mode)
            assert padded_img.shape == (12, 15)

        # Padding must be a int or a 2, or 4 element tuple.
        with pytest.raises(ValueError):
            mmcv.impad(img, padding=(1, 1, 1))

        # pad_val must be a int or a tuple
        with pytest.raises(TypeError):
            mmcv.impad(img, padding=(1, 1, 1, 1), pad_val='wrong')

        # When pad_val is a tuple,
        # len(pad_val) should be equal to img.shape[-1]
        img = np.random.rand(10, 10, 3).astype(np.float32)
        with pytest.raises(AssertionError):
            mmcv.impad(img, padding=3, pad_val=(100, 200))

        with pytest.raises(AssertionError):
            mmcv.impad(img, padding=2, pad_val=0, padding_mode='unknown')

        with pytest.raises(AssertionError):
            mmcv.impad(img, shape=(12, 15), padding=(0, 0, 5, 2))

        # Pad shape smaller than image shape
        padded_img = mmcv.impad(img, shape=(8, 8))
        assert padded_img.shape == (10, 10, 3)

    def test_impad_to_multiple(self):
        """impad_to_multiple: both dims rounded up to the divisor."""
        img = np.random.rand(11, 14, 3).astype(np.float32)
        padded_img = mmcv.impad_to_multiple(img, 4)
        assert padded_img.shape == (12, 16, 3)
        img = np.random.rand(20, 12).astype(np.float32)
        padded_img = mmcv.impad_to_multiple(img, 5)
        assert padded_img.shape == (20, 15)
        img = np.random.rand(20, 12).astype(np.float32)
        padded_img = mmcv.impad_to_multiple(img, 2)
        assert padded_img.shape == (20, 12)

    def test_cutout(self):
        """cutout: argument validation, full/zero cutout, and seeded random
        placement for int and tuple shapes."""
        img = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]).astype(np.uint8)

        # shape must be int or tuple
        with pytest.raises(AssertionError):
            mmcv.cutout(img, 2.5)
        # pad_val must be int or float or tuple with the same length
        # of img channels
        with pytest.raises(AssertionError):
            mmcv.cutout(img, 1, (1, 2, 3))
        with pytest.raises(TypeError):
            mmcv.cutout(img, 1, None)

        # test cutout the whole img
        assert_array_equal(mmcv.cutout(img, 6), np.zeros_like(img))
        # test not cutout
        assert_array_equal(mmcv.cutout(img, 0), img)
        # test cutout when shape is int
        np.random.seed(0)
        img_cutout = np.array([[1, 2, 3], [4, 0, 6], [7, 8,
                                                      9]]).astype(np.uint8)
        assert_array_equal(mmcv.cutout(img, 1), img_cutout)
        img_cutout = np.array([[1, 2, 3], [4, 10, 6], [7, 8,
                                                       9]]).astype(np.uint8)
        assert_array_equal(mmcv.cutout(img, 1, pad_val=10), img_cutout)
        # test cutout when shape is tuple
        np.random.seed(0)
        img_cutout = np.array([[1, 2, 3], [0, 0, 6], [7, 8,
                                                      9]]).astype(np.uint8)
        assert_array_equal(mmcv.cutout(img, (1, 2)), img_cutout)
        img_cutout = np.array([[1, 2, 3], [10, 10, 6], [7, 8,
                                                        9]]).astype(np.uint8)
        assert_array_equal(mmcv.cutout(img, (1, 2), pad_val=10), img_cutout)

    def test_imrotate(self):
        """imrotate: +/-90 degrees, custom center, border value/mode,
        auto_bound, and center+auto_bound conflict."""
        img = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]).astype(np.uint8)
        assert_array_equal(mmcv.imrotate(img, 0), img)
        img_r = np.array([[7, 4, 1], [8, 5, 2], [9, 6, 3]])
        assert_array_equal(mmcv.imrotate(img, 90), img_r)
        img_r = np.array([[3, 6, 9], [2, 5, 8], [1, 4, 7]])
        assert_array_equal(mmcv.imrotate(img, -90), img_r)

        img = np.array([[1, 2, 3, 4], [5, 6, 7, 8]]).astype(np.uint8)
        img_r = np.array([[0, 6, 2, 0], [0, 7, 3, 0]])
        assert_array_equal(mmcv.imrotate(img, 90), img_r)
        img_r = np.array([[1, 0, 0, 0], [2, 0, 0, 0]])
        assert_array_equal(mmcv.imrotate(img, 90, center=(0, 0)), img_r)
        img_r = np.array([[255, 6, 2, 255], [255, 7, 3, 255]])
        assert_array_equal(mmcv.imrotate(img, 90, border_value=255), img_r)
        img_r = np.array([[5, 1], [6, 2], [7, 3], [8, 4]])
        assert_array_equal(mmcv.imrotate(img, 90, auto_bound=True), img_r)
        img_r = np.array([[6, 6, 2, 2], [7, 7, 3, 3]])
        assert_array_equal(
            mmcv.imrotate(img, 90, border_mode='replicate'), img_r)

        with pytest.raises(ValueError):
            mmcv.imrotate(img, 90, center=(0, 0), auto_bound=True)

    def test_imshear(self):
        """imshear: horizontal/vertical shear, border values (scalar and
        tuple), and argument validation."""
        img = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]).astype(np.uint8)
        assert_array_equal(mmcv.imshear(img, 0), img)
        # magnitude=1, horizontal
        img_sheared = np.array([[1, 2, 3], [0, 4, 5], [0, 0, 7]],
                               dtype=np.uint8)
        assert_array_equal(mmcv.imshear(img, 1), img_sheared)
        # magnitude=-1, vertical
        img_sheared = np.array([[1, 5, 9], [4, 8, 0], [7, 0, 0]],
                               dtype=np.uint8)
        assert_array_equal(mmcv.imshear(img, -1, 'vertical'), img_sheared)
        # magnitude=1, vertical, borderValue=100
        borderValue = 100
        img_sheared = np.array(
            [[1, borderValue, borderValue], [4, 2, borderValue], [7, 5, 3]],
            dtype=np.uint8)
        assert_array_equal(
            mmcv.imshear(img, 1, 'vertical', borderValue), img_sheared)
        # magnitude=1, vertical, borderValue=100, img shape (h,w,3)
        img = np.stack([img, img, img], axis=-1)
        img_sheared = np.stack([img_sheared, img_sheared, img_sheared],
                               axis=-1)
        assert_array_equal(
            mmcv.imshear(img, 1, 'vertical', borderValue), img_sheared)
        # test tuple format of borderValue
        assert_array_equal(
            mmcv.imshear(img, 1, 'vertical',
                         (borderValue, borderValue, borderValue)), img_sheared)

        # test invalid length of borderValue
        with pytest.raises(AssertionError):
            mmcv.imshear(img, 0.5, 'horizontal', (borderValue, ))

        # test invalid type of borderValue
        with pytest.raises(ValueError):
            mmcv.imshear(img, 0.5, 'horizontal', [borderValue])

        # test invalid value of direction
        with pytest.raises(AssertionError):
            mmcv.imshear(img, 0.5, 'diagonal')

    def test_imtranslate(self):
        """imtranslate: horizontal/vertical offsets, border values (scalar
        and per-channel tuple), and argument validation."""
        img = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.uint8)
        assert_array_equal(mmcv.imtranslate(img, 0), img)
        # offset=1, horizontal
        img_translated = np.array([[128, 1, 2], [128, 4, 5], [128, 7, 8]],
                                  dtype=np.uint8)
        assert_array_equal(
            mmcv.imtranslate(img, 1, border_value=128), img_translated)
        # offset=-1, vertical
        img_translated = np.array([[4, 5, 6], [7, 8, 9], [0, 0, 0]],
                                  dtype=np.uint8)
        assert_array_equal(
            mmcv.imtranslate(img, -1, 'vertical'), img_translated)
        # offset=-2, horizontal
        img = np.array([[1, 2, 3, 4], [5, 6, 7, 8]], dtype=np.uint8)
        img = np.stack([img, img, img], axis=-1)
        img_translated = [[3, 4, 128, 128], [7, 8, 128, 128]]
        img_translated = np.stack(
            [img_translated, img_translated, img_translated], axis=-1)
        assert_array_equal(
            mmcv.imtranslate(img, -2, border_value=128), img_translated)
        # offset=2, vertical
        border_value = (110, 120, 130)
        img_translated = np.stack([
            np.ones((2, 4)) * border_value[0],
            np.ones((2, 4)) * border_value[1],
            np.ones((2, 4)) * border_value[2]
        ],
                                  axis=-1).astype(np.uint8)
        assert_array_equal(
            mmcv.imtranslate(img, 2, 'vertical', border_value), img_translated)
        # test invalid number elements in border_value
        with pytest.raises(AssertionError):
            mmcv.imtranslate(img, 1, border_value=(1, ))
        # test invalid type of border_value
        with pytest.raises(ValueError):
            mmcv.imtranslate(img, 1, border_value=[1, 2, 3])
        # test invalid value of direction
        with pytest.raises(AssertionError):
            mmcv.imtranslate(img, 1, 'diagonal')
+import numpy as np +import pytest +from numpy.testing import assert_array_equal + +import mmcv + +try: + import torch +except ImportError: + torch = None + + +@pytest.mark.skipif(torch is None, reason='requires torch library') +def test_tensor2imgs(): + + # test tensor obj + with pytest.raises(AssertionError): + tensor = np.random.rand(2, 3, 3) + mmcv.tensor2imgs(tensor) + + # test tensor ndim + with pytest.raises(AssertionError): + tensor = torch.randn(2, 3, 3) + mmcv.tensor2imgs(tensor) + + # test tensor dim-1 + with pytest.raises(AssertionError): + tensor = torch.randn(2, 4, 3, 3) + mmcv.tensor2imgs(tensor) + + # test mean length + with pytest.raises(AssertionError): + tensor = torch.randn(2, 3, 5, 5) + mmcv.tensor2imgs(tensor, mean=(1, )) + tensor = torch.randn(2, 1, 5, 5) + mmcv.tensor2imgs(tensor, mean=(0, 0, 0)) + + # test std length + with pytest.raises(AssertionError): + tensor = torch.randn(2, 3, 5, 5) + mmcv.tensor2imgs(tensor, std=(1, )) + tensor = torch.randn(2, 1, 5, 5) + mmcv.tensor2imgs(tensor, std=(1, 1, 1)) + + # test to_rgb + with pytest.raises(AssertionError): + tensor = torch.randn(2, 1, 5, 5) + mmcv.tensor2imgs(tensor, mean=(0, ), std=(1, ), to_rgb=True) + + # test rgb=True + tensor = torch.randn(2, 3, 5, 5) + gts = [ + t.cpu().numpy().transpose(1, 2, 0).astype(np.uint8) + for t in tensor.flip(1) + ] + outputs = mmcv.tensor2imgs(tensor, to_rgb=True) + for gt, output in zip(gts, outputs): + assert_array_equal(gt, output) + + # test rgb=False + tensor = torch.randn(2, 3, 5, 5) + gts = [t.cpu().numpy().transpose(1, 2, 0).astype(np.uint8) for t in tensor] + outputs = mmcv.tensor2imgs(tensor, to_rgb=False) + for gt, output in zip(gts, outputs): + assert_array_equal(gt, output) + + # test tensor channel 1 and rgb=False + tensor = torch.randn(2, 1, 5, 5) + gts = [t.squeeze(0).cpu().numpy().astype(np.uint8) for t in tensor] + outputs = mmcv.tensor2imgs(tensor, to_rgb=False) + for gt, output in zip(gts, outputs): + assert_array_equal(gt, output) diff 
--git a/cv/distiller/CWD/pytorch/mmcv/tests/test_image/test_io.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_image/test_io.py new file mode 100644 index 0000000000000000000000000000000000000000..6742924f2303dc3cfba7390644fd31f5f9f363c2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_image/test_io.py @@ -0,0 +1,437 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import os +import os.path as osp +import sys +import tempfile +from pathlib import Path +from unittest.mock import MagicMock, patch + +import cv2 +import mmengine +import numpy as np +import pytest +import torch +from mmengine.fileio.file_client import HTTPBackend, PetrelBackend +from numpy.testing import assert_allclose, assert_array_equal + +import mmcv + +if torch.__version__ == 'parrots': + pytest.skip('not necessary in parrots test', allow_module_level=True) + + +class TestIO: + + @classmethod + def setup_class(cls): + cls.data_dir = osp.join(osp.dirname(__file__), '../data') + # the test img resolution is 400x300 + cls.img_path = osp.join(cls.data_dir, 'color.jpg') + cls.img_path_obj = Path(cls.img_path) + cls.gray_img_path = osp.join(cls.data_dir, 'grayscale.jpg') + cls.gray_img_path_obj = Path(cls.gray_img_path) + cls.gray_img_dim3_path = osp.join(cls.data_dir, 'grayscale_dim3.jpg') + cls.gray_alpha_img_path = osp.join(cls.data_dir, 'gray_alpha.png') + cls.palette_img_path = osp.join(cls.data_dir, 'palette.gif') + cls.exif_img_path = osp.join(cls.data_dir, 'color_exif.jpg') + cls.img = cv2.imread(cls.img_path) + cls.tiff_path = osp.join(cls.data_dir, 'uint16-5channel.tif') + # petrel s3 path + cls.s3_path = 's3://path/of/your/file.jpg' + # http path + cls.http_path = 'http://path/of/your/file.jpg' + # add mock package + sys.modules['petrel_client'] = MagicMock() + sys.modules['petrel_client.client'] = MagicMock() + + @classmethod + def teardown_class(cls): + # clean instances avoid to influence other unittest + mmengine.FileClient._instances = {} + + def assert_img_equal(self, img, 
ref_img, ratio_thr=0.999): + assert img.shape == ref_img.shape + assert img.dtype == ref_img.dtype + area = ref_img.shape[0] * ref_img.shape[1] + diff = np.abs(img.astype('int32') - ref_img.astype('int32')) + assert np.sum(diff <= 1) / float(area) > ratio_thr + + def test_imread(self): + # backend cv2 + mmcv.use_backend('cv2') + + # file_client_args and backend_args can not be both set + with pytest.raises( + ValueError, + match='"file_client_args" and "backend_args" cannot be set'): + mmcv.imread( + self.img_path, + file_client_args={'backend': 'disk'}, + backend_args={'backend': 'disk'}) + + # HardDiskBackend + img_cv2_color_bgr = mmcv.imread(self.img_path) + assert img_cv2_color_bgr.shape == (300, 400, 3) + img_cv2_color_rgb = mmcv.imread(self.img_path, channel_order='rgb') + assert img_cv2_color_rgb.shape == (300, 400, 3) + assert_array_equal(img_cv2_color_rgb[:, :, ::-1], img_cv2_color_bgr) + img_cv2_grayscale1 = mmcv.imread(self.img_path, 'grayscale') + assert img_cv2_grayscale1.shape == (300, 400) + img_cv2_grayscale2 = mmcv.imread(self.gray_img_path) + assert img_cv2_grayscale2.shape == (300, 400, 3) + img_cv2_unchanged = mmcv.imread(self.gray_img_path, 'unchanged') + assert img_cv2_unchanged.shape == (300, 400) + img_cv2_unchanged = mmcv.imread(img_cv2_unchanged) + assert_array_equal(img_cv2_unchanged, mmcv.imread(img_cv2_unchanged)) + + img_cv2_color_bgr = mmcv.imread(self.img_path_obj) + assert img_cv2_color_bgr.shape == (300, 400, 3) + img_cv2_color_rgb = mmcv.imread(self.img_path_obj, channel_order='rgb') + assert img_cv2_color_rgb.shape == (300, 400, 3) + assert_array_equal(img_cv2_color_rgb[:, :, ::-1], img_cv2_color_bgr) + img_cv2_grayscale1 = mmcv.imread(self.img_path_obj, 'grayscale') + assert img_cv2_grayscale1.shape == (300, 400) + img_cv2_grayscale2 = mmcv.imread(self.gray_img_path_obj) + assert img_cv2_grayscale2.shape == (300, 400, 3) + img_cv2_unchanged = mmcv.imread(self.gray_img_path_obj, 'unchanged') + assert img_cv2_unchanged.shape == 
(300, 400) + with pytest.raises(TypeError): + mmcv.imread(1) + + # PetrelBackend + img_cv2_color_bgr = mmcv.imread(self.img_path) + with patch.object( + PetrelBackend, 'get', + return_value=img_cv2_color_bgr) as mock_method: + img_cv2_color_bgr_petrel = mmcv.imread(self.s3_path, backend='cv2') + img_cv2_color_bgr_petrel_with_args = mmcv.imread( + self.s3_path, + backend='cv2', + file_client_args={'backend': 'petrel'}) + mock_method.assert_called() + assert_array_equal(img_cv2_color_bgr_petrel, + img_cv2_color_bgr_petrel_with_args) + + mock_method.reset_mock() + + img_cv2_color_bgr_petrel_with_args = mmcv.imread( + self.s3_path, + backend='cv2', + backend_args={'backend': 'petrel'}) + mock_method.assert_called() + assert_array_equal(img_cv2_color_bgr_petrel, + img_cv2_color_bgr_petrel_with_args) + + # HTTPBackend + img_cv2_color_bgr = mmcv.imread(self.img_path) + with patch.object( + HTTPBackend, 'get', + return_value=img_cv2_color_bgr) as mock_method: + img_cv2_color_bgr_http = mmcv.imread(self.http_path, backend='cv2') + img_cv2_color_bgr_http_with_args = mmcv.imread( + self.http_path, + backend='cv2', + file_client_args={'backend': 'http'}) + mock_method.assert_called() + assert_array_equal(img_cv2_color_bgr_http, + img_cv2_color_bgr_http_with_args) + + mock_method.reset_mock() + + img_cv2_color_bgr_http_with_args = mmcv.imread( + self.http_path, + backend='cv2', + backend_args={'backend': 'http'}) + mock_method.assert_called() + assert_array_equal(img_cv2_color_bgr_http, + img_cv2_color_bgr_http_with_args) + + with pytest.raises(FileNotFoundError): + mmcv.imread('/not/exists/' + self.img_path) + + # test arg backend pillow + img_pil_gray_alpha = mmcv.imread( + self.gray_alpha_img_path, 'grayscale', backend='pillow') + assert img_pil_gray_alpha.shape == (400, 500) + mean = img_pil_gray_alpha[300:, 400:].mean() + assert_allclose(img_pil_gray_alpha[300:, 400:] - mean, 0) + img_pil_gray_alpha = mmcv.imread( + self.gray_alpha_img_path, backend='pillow') + mean = 
img_pil_gray_alpha[300:, 400:].mean(axis=(0, 1)) + assert_allclose(img_pil_gray_alpha[300:, 400:] - mean, 0) + assert img_pil_gray_alpha.shape == (400, 500, 3) + img_pil_gray_alpha = mmcv.imread( + self.gray_alpha_img_path, 'unchanged', backend='pillow') + assert img_pil_gray_alpha.shape == (400, 500, 2) + img_pil_palette = mmcv.imread( + self.palette_img_path, 'grayscale', backend='pillow') + assert img_pil_palette.shape == (300, 400) + img_pil_palette = mmcv.imread(self.palette_img_path, backend='pillow') + assert img_pil_palette.shape == (300, 400, 3) + img_pil_palette = mmcv.imread( + self.palette_img_path, 'unchanged', backend='pillow') + assert img_pil_palette.shape == (300, 400) + + # backend pillow + mmcv.use_backend('pillow') + img_pil_grayscale1 = mmcv.imread(self.img_path, 'grayscale') + assert img_pil_grayscale1.shape == (300, 400) + img_pil_gray_alpha = mmcv.imread(self.gray_alpha_img_path, 'grayscale') + assert img_pil_gray_alpha.shape == (400, 500) + mean = img_pil_gray_alpha[300:, 400:].mean() + assert_allclose(img_pil_gray_alpha[300:, 400:] - mean, 0) + img_pil_gray_alpha = mmcv.imread(self.gray_alpha_img_path) + mean = img_pil_gray_alpha[300:, 400:].mean(axis=(0, 1)) + assert_allclose(img_pil_gray_alpha[300:, 400:] - mean, 0) + assert img_pil_gray_alpha.shape == (400, 500, 3) + img_pil_gray_alpha = mmcv.imread(self.gray_alpha_img_path, 'unchanged') + assert img_pil_gray_alpha.shape == (400, 500, 2) + img_pil_palette = mmcv.imread(self.palette_img_path, 'grayscale') + assert img_pil_palette.shape == (300, 400) + img_pil_palette = mmcv.imread(self.palette_img_path) + assert img_pil_palette.shape == (300, 400, 3) + img_pil_palette = mmcv.imread(self.palette_img_path, 'unchanged') + assert img_pil_palette.shape == (300, 400) + img_pil_grayscale2 = mmcv.imread(self.gray_img_path) + assert img_pil_grayscale2.shape == (300, 400, 3) + img_pil_unchanged = mmcv.imread(self.gray_img_path, 'unchanged') + assert img_pil_unchanged.shape == (300, 400) + 
img_pil_unchanged = mmcv.imread(img_pil_unchanged) + assert_array_equal(img_pil_unchanged, mmcv.imread(img_pil_unchanged)) + + img_pil_color_bgr = mmcv.imread(self.img_path_obj) + assert img_pil_color_bgr.shape == (300, 400, 3) + img_pil_color_rgb = mmcv.imread(self.img_path_obj, channel_order='rgb') + assert img_pil_color_rgb.shape == (300, 400, 3) + assert (img_pil_color_rgb == img_cv2_color_rgb).sum() / float( + img_cv2_color_rgb.size) > 0.5 + assert_array_equal(img_pil_color_rgb[:, :, ::-1], img_pil_color_bgr) + img_pil_grayscale1 = mmcv.imread(self.img_path_obj, 'grayscale') + assert img_pil_grayscale1.shape == (300, 400) + img_pil_grayscale2 = mmcv.imread(self.gray_img_path_obj) + assert img_pil_grayscale2.shape == (300, 400, 3) + img_pil_unchanged = mmcv.imread(self.gray_img_path_obj, 'unchanged') + assert img_pil_unchanged.shape == (300, 400) + with pytest.raises(TypeError): + mmcv.imread(1) + + # backend turbojpeg + mmcv.use_backend('turbojpeg') + + img_turbojpeg_color_bgr = mmcv.imread(self.img_path) + assert img_turbojpeg_color_bgr.shape == (300, 400, 3) + assert_array_equal(img_turbojpeg_color_bgr, img_cv2_color_bgr) + + img_turbojpeg_color_rgb = mmcv.imread( + self.img_path, channel_order='rgb') + assert img_turbojpeg_color_rgb.shape == (300, 400, 3) + assert_array_equal(img_turbojpeg_color_rgb, img_cv2_color_rgb) + + with pytest.raises(ValueError): + mmcv.imread(self.img_path, channel_order='unsupport_order') + + img_turbojpeg_grayscale1 = mmcv.imread(self.img_path, flag='grayscale') + assert img_turbojpeg_grayscale1.shape == (300, 400) + assert_array_equal(img_turbojpeg_grayscale1, img_cv2_grayscale1) + + img_turbojpeg_grayscale2 = mmcv.imread(self.gray_img_path) + assert img_turbojpeg_grayscale2.shape == (300, 400, 3) + assert_array_equal(img_turbojpeg_grayscale2, img_cv2_grayscale2) + + img_turbojpeg_grayscale2 = mmcv.imread(img_turbojpeg_grayscale2) + assert_array_equal(img_turbojpeg_grayscale2, + mmcv.imread(img_turbojpeg_grayscale2)) + + with 
pytest.raises(ValueError): + mmcv.imread(self.gray_img_path, 'unchanged') + + with pytest.raises(TypeError): + mmcv.imread(1) + + with pytest.raises(AssertionError): + mmcv.use_backend('unsupport_backend') + + with pytest.raises(ValueError): + mmcv.imread(self.img_path, 'unsupported_backend') + + # backend tifffile, multi channel tiff file(> 4 channels). + mmcv.use_backend('tifffile') + img_tifffile = mmcv.imread(self.tiff_path) + assert img_tifffile.shape == (200, 150, 5) + + mmcv.use_backend('cv2') + + # consistent exif behaviour + img_cv2_exif = mmcv.imread(self.exif_img_path) + img_pil_exif = mmcv.imread(self.exif_img_path, backend='pillow') + assert img_cv2_exif.shape == (400, 300, 3) + assert img_pil_exif.shape == (400, 300, 3) + img_cv2_exif_unchanged = mmcv.imread( + self.exif_img_path, flag='unchanged') + img_pil_exif_unchanged = mmcv.imread( + self.exif_img_path, backend='pillow', flag='unchanged') + assert img_cv2_exif_unchanged.shape == (300, 400, 3) + assert img_pil_exif_unchanged.shape == (300, 400, 3) + img_cv2_color_ignore_exif = mmcv.imread( + self.exif_img_path, flag='color_ignore_orientation') + img_pil_color_ignore_exif = mmcv.imread( + self.exif_img_path, + backend='pillow', + flag='color_ignore_orientation') + assert img_cv2_color_ignore_exif.shape == (300, 400, 3) + assert img_pil_color_ignore_exif.shape == (300, 400, 3) + img_cv2_grayscale_ignore_exif = mmcv.imread( + self.exif_img_path, flag='grayscale_ignore_orientation') + img_pil_grayscale_ignore_exif = mmcv.imread( + self.exif_img_path, + backend='pillow', + flag='grayscale_ignore_orientation') + assert img_cv2_grayscale_ignore_exif.shape == (300, 400) + assert img_pil_grayscale_ignore_exif.shape == (300, 400) + + def test_imfrombytes(self): + # backend cv2, channel order: bgr + mmcv.use_backend('cv2') + with open(self.img_path, 'rb') as f: + img_bytes = f.read() + img_cv2 = mmcv.imfrombytes(img_bytes) + assert img_cv2.shape == (300, 400, 3) + + # backend cv2, channel order: rgb + 
mmcv.use_backend('cv2') + with open(self.img_path, 'rb') as f: + img_bytes = f.read() + img_rgb_cv2 = mmcv.imfrombytes(img_bytes, channel_order='rgb') + assert img_rgb_cv2.shape == (300, 400, 3) + assert_array_equal(img_rgb_cv2, img_cv2[:, :, ::-1]) + + # backend cv2, grayscale, decode as 3 channels + with open(self.gray_img_path, 'rb') as f: + img_bytes = f.read() + gray_img_rgb_cv2 = mmcv.imfrombytes(img_bytes) + assert gray_img_rgb_cv2.shape == (300, 400, 3) + + # backend cv2, grayscale + with open(self.gray_img_path, 'rb') as f: + img_bytes = f.read() + gray_img_cv2 = mmcv.imfrombytes(img_bytes, flag='grayscale') + assert gray_img_cv2.shape == (300, 400) + + # backend cv2, grayscale dim3 + with open(self.gray_img_dim3_path, 'rb') as f: + img_bytes = f.read() + gray_img_dim3_cv2 = mmcv.imfrombytes(img_bytes, flag='grayscale') + assert gray_img_dim3_cv2.shape == (300, 400) + + # arg backend pillow, channel order: bgr + with open(self.img_path, 'rb') as f: + img_bytes = f.read() + img_pillow = mmcv.imfrombytes(img_bytes, backend='pillow') + assert img_pillow.shape == (300, 400, 3) + # Pillow and opencv decoding may not be the same + assert (img_cv2 == img_pillow).sum() / float(img_cv2.size) > 0.5 + + # backend pillow, channel order: bgr + mmcv.use_backend('pillow') + with open(self.img_path, 'rb') as f: + img_bytes = f.read() + img_pillow = mmcv.imfrombytes(img_bytes) + assert img_pillow.shape == (300, 400, 3) + # Pillow and opencv decoding may not be the same + assert (img_cv2 == img_pillow).sum() / float(img_cv2.size) > 0.5 + + # backend turbojpeg, channel order: bgr + mmcv.use_backend('turbojpeg') + with open(self.img_path, 'rb') as f: + img_bytes = f.read() + img_turbojpeg = mmcv.imfrombytes(img_bytes) + assert img_turbojpeg.shape == (300, 400, 3) + assert_array_equal(img_cv2, img_turbojpeg) + + # backend turbojpeg, channel order: rgb + with open(self.img_path, 'rb') as f: + img_bytes = f.read() + img_rgb_turbojpeg = mmcv.imfrombytes(img_bytes, 
channel_order='rgb') + assert img_rgb_turbojpeg.shape == (300, 400, 3) + assert_array_equal(img_rgb_turbojpeg, img_cv2[:, :, ::-1]) + + # backend turbojpeg, grayscale, decode as 3 channels + with open(self.gray_img_path, 'rb') as f: + img_bytes = f.read() + gray_img_turbojpeg = mmcv.imfrombytes(img_bytes) + assert gray_img_turbojpeg.shape == (300, 400, 3) + assert_array_equal(gray_img_rgb_cv2, gray_img_turbojpeg) + + # backend turbojpeg, grayscale + with open(self.gray_img_path, 'rb') as f: + img_bytes = f.read() + gray_img_turbojpeg = mmcv.imfrombytes(img_bytes, flag='grayscale') + assert gray_img_turbojpeg.shape == (300, 400) + assert_array_equal(gray_img_cv2, gray_img_turbojpeg) + + # backend turbojpeg, grayscale dim3 + with open(self.gray_img_dim3_path, 'rb') as f: + img_bytes = f.read() + gray_img_dim3_turbojpeg = mmcv.imfrombytes(img_bytes, flag='grayscale') + assert gray_img_dim3_turbojpeg.shape == (300, 400) + assert_array_equal(gray_img_dim3_cv2, gray_img_dim3_turbojpeg) + + mmcv.use_backend('cv2') + + with pytest.raises(ValueError): + with open(self.img_path, 'rb') as f: + img_bytes = f.read() + mmcv.imfrombytes(img_bytes, backend='unsupported_backend') + + def test_imwrite(self): + img = mmcv.imread(self.img_path) + out_file = osp.join(tempfile.gettempdir(), 'mmcv_test.jpg') + + # file_client_args and backend_args can not be both set + with pytest.raises( + ValueError, + match='"file_client_args" and "backend_args" cannot be set'): + mmcv.imwrite( + img, + out_file, + file_client_args={'backend': 'disk'}, + backend_args={'backend': 'disk'}) + + mmcv.imwrite(img, out_file) + rewrite_img = mmcv.imread(out_file) + os.remove(out_file) + self.assert_img_equal(img, rewrite_img) + + # test petrel client + with patch.object( + PetrelBackend, 'put', return_value=None) as mock_method: + ret = mmcv.imwrite(img, self.s3_path) + ret_with_args = mmcv.imwrite( + img, self.s3_path, file_client_args={'backend': 'petrel'}) + assert ret + assert ret_with_args + 
mock_method.assert_called() + + mock_method.reset_mock() + + ret_with_args = mmcv.imwrite( + img, self.s3_path, backend_args={'backend': 'petrel'}) + assert ret_with_args + mock_method.assert_called() + + with pytest.raises(cv2.error): + mmcv.imwrite(img, 'error_file.jppg') + + @patch('mmcv.image.io.TurboJPEG', None) + def test_no_turbojpeg(self): + with pytest.raises(ImportError): + mmcv.use_backend('turbojpeg') + + mmcv.use_backend('cv2') + + @patch('mmcv.image.io.Image', None) + def test_no_pillow(self): + with pytest.raises(ImportError): + mmcv.use_backend('pillow') + + mmcv.use_backend('cv2') diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_image/test_photometric.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_image/test_photometric.py new file mode 100644 index 0000000000000000000000000000000000000000..2288a5ef62e3bcee8f4c62ac5453d183b50c1241 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_image/test_photometric.py @@ -0,0 +1,426 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import os.path as osp + +import cv2 +import numpy as np +import pytest +from numpy.testing import assert_array_equal + +import mmcv + + +class TestPhotometric: + + @classmethod + def setup_class(cls): + # the test img resolution is 400x300 + cls.img_path = osp.join(osp.dirname(__file__), '../data/color.jpg') + cls.img = cv2.imread(cls.img_path) + cls.mean = np.array([123.675, 116.28, 103.53], dtype=np.float32) + cls.std = np.array([58.395, 57.12, 57.375], dtype=np.float32) + + def test_imnormalize(self): + rgb_img = self.img[:, :, ::-1] + baseline = (rgb_img - self.mean) / self.std + img = mmcv.imnormalize(self.img, self.mean, self.std) + assert np.allclose(img, baseline) + assert id(img) != id(self.img) + img = mmcv.imnormalize(rgb_img, self.mean, self.std, to_rgb=False) + assert np.allclose(img, baseline) + assert id(img) != id(rgb_img) + + def test_imnormalize_(self): + img_for_normalize = np.float32(self.img) + rgb_img_for_normalize = np.float32(self.img[:, :, ::-1]) + baseline = (rgb_img_for_normalize - self.mean) / self.std + img = mmcv.imnormalize_(img_for_normalize, self.mean, self.std) + assert np.allclose(img_for_normalize, baseline) + assert id(img) == id(img_for_normalize) + img = mmcv.imnormalize_( + rgb_img_for_normalize, self.mean, self.std, to_rgb=False) + assert np.allclose(img, baseline) + assert id(img) == id(rgb_img_for_normalize) + + def test_imdenormalize(self): + norm_img = (self.img[:, :, ::-1] - self.mean) / self.std + rgb_baseline = (norm_img * self.std + self.mean) + bgr_baseline = rgb_baseline[:, :, ::-1] + img = mmcv.imdenormalize(norm_img, self.mean, self.std) + assert np.allclose(img, bgr_baseline) + img = mmcv.imdenormalize(norm_img, self.mean, self.std, to_bgr=False) + assert np.allclose(img, rgb_baseline) + + def test_iminvert(self): + img = np.array([[0, 128, 255], [1, 127, 254], [2, 129, 253]], + dtype=np.uint8) + img_r = np.array([[255, 127, 0], [254, 128, 1], [253, 126, 2]], + dtype=np.uint8) + 
assert_array_equal(mmcv.iminvert(img), img_r) + + def test_solarize(self): + img = np.array([[0, 128, 255], [1, 127, 254], [2, 129, 253]], + dtype=np.uint8) + img_r = np.array([[0, 127, 0], [1, 127, 1], [2, 126, 2]], + dtype=np.uint8) + assert_array_equal(mmcv.solarize(img), img_r) + img_r = np.array([[0, 127, 0], [1, 128, 1], [2, 126, 2]], + dtype=np.uint8) + assert_array_equal(mmcv.solarize(img, 100), img_r) + + def test_posterize(self): + img = np.array([[0, 128, 255], [1, 127, 254], [2, 129, 253]], + dtype=np.uint8) + img_r = np.array([[0, 128, 128], [0, 0, 128], [0, 128, 128]], + dtype=np.uint8) + assert_array_equal(mmcv.posterize(img, 1), img_r) + img_r = np.array([[0, 128, 224], [0, 96, 224], [0, 128, 224]], + dtype=np.uint8) + assert_array_equal(mmcv.posterize(img, 3), img_r) + + def test_adjust_color(self, nb_rand_test=100): + img = np.array([[0, 128, 255], [1, 127, 254], [2, 129, 253]], + dtype=np.uint8) + img = np.stack([img, img, img], axis=-1) + assert_array_equal(mmcv.adjust_color(img), img) + img_gray = mmcv.bgr2gray(img) + img_r = np.stack([img_gray, img_gray, img_gray], axis=-1) + assert_array_equal(mmcv.adjust_color(img, 0), img_r) + assert_array_equal(mmcv.adjust_color(img, 0, 1), img_r) + assert_array_equal( + mmcv.adjust_color(img, 0.5, 0.5), + np.round(np.clip((img * 0.5 + img_r * 0.5), 0, + 255)).astype(img.dtype)) + assert_array_equal( + mmcv.adjust_color(img, 1, 1.5), + np.round(np.clip(img * 1 + img_r * 1.5, 0, 255)).astype(img.dtype)) + assert_array_equal( + mmcv.adjust_color(img, 0.8, -0.6, gamma=2), + np.round(np.clip(img * 0.8 - 0.6 * img_r + 2, 0, + 255)).astype(img.dtype)) + assert_array_equal( + mmcv.adjust_color(img, 0.8, -0.6, gamma=-0.6), + np.round(np.clip(img * 0.8 - 0.6 * img_r - 0.6, 0, + 255)).astype(img.dtype)) + + # test float type of image + img = img.astype(np.float32) + assert_array_equal( + np.round(mmcv.adjust_color(img, 0.8, -0.6, gamma=-0.6)), + np.round(np.clip(img * 0.8 - 0.6 * img_r - 0.6, 0, 255))) + + # test 
equalize with randomly sampled image. + for _ in range(nb_rand_test): + img = np.clip(np.random.normal(0, 1, (256, 256, 3)) * 260, 0, + 255).astype(np.uint8) + factor = np.random.uniform() + cv2_img = mmcv.adjust_color(img, alpha=factor) + pil_img = mmcv.adjust_color(img, alpha=factor, backend='pillow') + np.testing.assert_allclose(cv2_img, pil_img, rtol=0, atol=2) + + # the input type must be uint8 for pillow backend + with pytest.raises(AssertionError): + mmcv.adjust_color(img.astype(np.float32), backend='pillow') + + # backend must be 'cv2' or 'pillow' + with pytest.raises(ValueError): + mmcv.adjust_color(img.astype(np.uint8), backend='not support') + + def test_imequalize(self, nb_rand_test=100): + + def _imequalize(img): + # equalize the image using PIL.ImageOps.equalize + from PIL import Image, ImageOps + img = Image.fromarray(img) + equalized_img = np.asarray(ImageOps.equalize(img)) + return equalized_img + + img = np.array([[0, 128, 255], [1, 127, 254], [2, 129, 253]], + dtype=np.uint8) + img = np.stack([img, img, img], axis=-1) + equalized_img = mmcv.imequalize(img) + assert_array_equal(equalized_img, _imequalize(img)) + + # test equalize with case step=0 + img = np.array([[0, 0, 0], [120, 120, 120], [255, 255, 255]], + dtype=np.uint8) + img = np.stack([img, img, img], axis=-1) + assert_array_equal(mmcv.imequalize(img), img) + + # test equalize with randomly sampled image. 
+ for _ in range(nb_rand_test): + img = np.clip(np.random.normal(0, 1, (256, 256, 3)) * 260, 0, + 255).astype(np.uint8) + equalized_img = mmcv.imequalize(img) + assert_array_equal(equalized_img, _imequalize(img)) + + def test_adjust_brightness(self, nb_rand_test=100): + + img = np.array([[0, 128, 255], [1, 127, 254], [2, 129, 253]], + dtype=np.uint8) + img = np.stack([img, img, img], axis=-1) + # test case with factor 1.0 + assert_array_equal(mmcv.adjust_brightness(img, 1.), img) + # test case with factor 0.0 + assert_array_equal(mmcv.adjust_brightness(img, 0.), np.zeros_like(img)) + # test adjust_brightness with randomly sampled images and factors. + for _ in range(nb_rand_test): + img = np.clip( + np.random.uniform(0, 1, (1000, 1200, 3)) * 260, 0, + 255).astype(np.uint8) + factor = np.random.uniform() + np.random.choice([0, 1]) + np.testing.assert_allclose( + mmcv.adjust_brightness(img, factor).astype(np.int32), + mmcv.adjust_brightness(img, factor, + backend='pillow').astype(np.int32), + rtol=0, + atol=1) + + # the input type must be uint8 for pillow backend + with pytest.raises(AssertionError): + mmcv.adjust_brightness(img.astype(np.float32), backend='pillow') + + # backend must be 'cv2' or 'pillow' + with pytest.raises(ValueError): + mmcv.adjust_brightness(img.astype(np.uint8), backend='not support') + + def test_adjust_contrast(self, nb_rand_test=100): + + img = np.array([[0, 128, 255], [1, 127, 254], [2, 129, 253]], + dtype=np.uint8) + img = np.stack([img, img, img], axis=-1) + # test case with factor 1.0 + assert_array_equal(mmcv.adjust_contrast(img, 1.), img) + # test case with factor 0.0 + assert_array_equal( + mmcv.adjust_contrast(img, 0.), + mmcv.adjust_contrast(img, 0., backend='pillow')) + # test adjust_contrast with randomly sampled images and factors. 
+ for _ in range(nb_rand_test): + img = np.clip( + np.random.uniform(0, 1, (1200, 1000, 3)) * 260, 0, + 255).astype(np.uint8) + factor = np.random.uniform() + np.random.choice([0, 1]) + # Note the gap (less_equal 1) between PIL.ImageEnhance.Contrast + # and mmcv.adjust_contrast comes from the gap that converts from + # a color image to gray image using mmcv or PIL. + np.testing.assert_allclose( + mmcv.adjust_contrast(img, factor).astype(np.int32), + mmcv.adjust_contrast(img, factor, + backend='pillow').astype(np.int32), + rtol=0, + atol=1) + + # the input type must be uint8 pillow backend + with pytest.raises(AssertionError): + mmcv.adjust_contrast(img.astype(np.float32), backend='pillow') + + # backend must be 'cv2' or 'pillow' + with pytest.raises(ValueError): + mmcv.adjust_contrast(img.astype(np.uint8), backend='not support') + + def test_auto_contrast(self, nb_rand_test=100): + + def _auto_contrast(img, cutoff=0): + from PIL import Image + from PIL.ImageOps import autocontrast + + # Image.fromarray defaultly supports RGB, not BGR. 
+ # convert from BGR to RGB + img = Image.fromarray(img[..., ::-1], mode='RGB') + contrasted_img = autocontrast(img, cutoff) + # convert from RGB to BGR + return np.asarray(contrasted_img)[..., ::-1] + + img = np.array([[0, 128, 255], [1, 127, 254], [2, 129, 253]], + dtype=np.uint8) + img = np.stack([img, img, img], axis=-1) + + # test case without cut-off + assert_array_equal(mmcv.auto_contrast(img), _auto_contrast(img)) + # test case with cut-off as int + assert_array_equal( + mmcv.auto_contrast(img, 10), _auto_contrast(img, 10)) + # test case with cut-off as float + assert_array_equal( + mmcv.auto_contrast(img, 12.5), _auto_contrast(img, 12.5)) + # test case with cut-off as tuple + assert_array_equal( + mmcv.auto_contrast(img, (10, 10)), _auto_contrast(img, 10)) + # test case with cut-off with sum over 100 + assert_array_equal( + mmcv.auto_contrast(img, 60), _auto_contrast(img, 60)) + + # test auto_contrast with randomly sampled images and factors. + for _ in range(nb_rand_test): + img = np.clip( + np.random.uniform(0, 1, (1200, 1000, 3)) * 260, 0, + 255).astype(np.uint8) + # cut-offs are not set as tuple since in `build.yml`, pillow 6.2.2 + # is installed, which does not support setting low cut-off and high + # cut-off differently. 
+ # With pillow above 8.0.0, cutoff can be set as tuple + cutoff = np.random.rand() * 100 + assert_array_equal( + mmcv.auto_contrast(img, cutoff), _auto_contrast(img, cutoff)) + + def test_adjust_sharpness(self, nb_rand_test=100): + + def _adjust_sharpness(img, factor): + # adjust the sharpness of image using + # PIL.ImageEnhance.Sharpness + from PIL import Image + from PIL.ImageEnhance import Sharpness + img = Image.fromarray(img) + sharpened_img = Sharpness(img).enhance(factor) + return np.asarray(sharpened_img) + + img = np.array([[0, 128, 255], [1, 127, 254], [2, 129, 253]], + dtype=np.uint8) + img = np.stack([img, img, img], axis=-1) + + # test case with invalid type of kernel + with pytest.raises(AssertionError): + mmcv.adjust_sharpness(img, 1., kernel=1.) + # test case with invalid shape of kernel + kernel = np.ones((3, 3, 3)) + with pytest.raises(AssertionError): + mmcv.adjust_sharpness(img, 1., kernel=kernel) + # test case with all-zero kernel, factor 0.0 + kernel = np.zeros((3, 3)) + assert_array_equal( + mmcv.adjust_sharpness(img, 0., kernel=kernel), np.zeros_like(img)) + + # test case with factor 1.0 + assert_array_equal(mmcv.adjust_sharpness(img, 1.), img) + # test adjust_sharpness with randomly sampled images and factors. 
+ for _ in range(nb_rand_test): + img = np.clip( + np.random.uniform(0, 1, (1000, 1200, 3)) * 260, 0, + 255).astype(np.uint8) + factor = np.random.uniform() + # Note the gap between PIL.ImageEnhance.Sharpness and + # mmcv.adjust_sharpness mainly comes from the difference ways of + # handling img edges when applying filters + np.testing.assert_allclose( + mmcv.adjust_sharpness(img, factor).astype(np.int32)[1:-1, + 1:-1], + _adjust_sharpness(img, factor).astype(np.int32)[1:-1, 1:-1], + rtol=0, + atol=1) + + def test_adjust_lighting(self): + img = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]).astype(np.uint8) + img = np.stack([img, img, img], axis=-1) + + # eigval and eigvec must be np.ndarray + with pytest.raises(AssertionError): + mmcv.adjust_lighting(img, 1, np.ones((3, 1))) + with pytest.raises(AssertionError): + mmcv.adjust_lighting(img, np.array([1]), (1, 1, 1)) + # we must have the same number of eigval and eigvec + with pytest.raises(AssertionError): + mmcv.adjust_lighting(img, np.array([1]), np.eye(2)) + with pytest.raises(AssertionError): + mmcv.adjust_lighting(img, np.array([1]), np.array([1])) + + img_adjusted = mmcv.adjust_lighting( + img, + np.random.normal(0, 1, 2), + np.random.normal(0, 1, (3, 2)), + alphastd=0.) + assert_array_equal(img_adjusted, img) + + def test_lut_transform(self): + lut_table = np.array(list(range(256))) + + # test assertion image values should between 0 and 255. 
+ with pytest.raises(AssertionError): + mmcv.lut_transform(np.array([256]), lut_table) + with pytest.raises(AssertionError): + mmcv.lut_transform(np.array([-1]), lut_table) + + # test assertion lut_table should be ndarray with shape (256, ) + with pytest.raises(AssertionError): + mmcv.lut_transform(np.array([0]), list(range(256))) + with pytest.raises(AssertionError): + mmcv.lut_transform(np.array([1]), np.array(list(range(257)))) + + img = mmcv.lut_transform(self.img, lut_table) + baseline = cv2.LUT(self.img, lut_table) + assert np.allclose(img, baseline) + + input_img = np.array( + [[[0, 128, 255], [255, 128, 0]], [[0, 128, 255], [255, 128, 0]]], + dtype=float) + img = mmcv.lut_transform(input_img, lut_table) + baseline = cv2.LUT(np.array(input_img, dtype=np.uint8), lut_table) + assert np.allclose(img, baseline) + + input_img = np.random.randint(0, 256, size=(7, 8, 9, 10, 11)) + img = mmcv.lut_transform(input_img, lut_table) + baseline = cv2.LUT(np.array(input_img, dtype=np.uint8), lut_table) + assert np.allclose(img, baseline) + + def test_clahe(self): + + def _clahe(img, clip_limit=40.0, tile_grid_size=(8, 8)): + clahe = cv2.createCLAHE(clip_limit, tile_grid_size) + return clahe.apply(np.array(img, dtype=np.uint8)) + + # test assertion image should have the right shape + with pytest.raises(AssertionError): + mmcv.clahe(self.img) + + # test assertion tile_grid_size should be a tuple with 2 integers + with pytest.raises(AssertionError): + mmcv.clahe(self.img[:, :, 0], tile_grid_size=(8.0, 8.0)) + with pytest.raises(AssertionError): + mmcv.clahe(self.img[:, :, 0], tile_grid_size=(8, 8, 8)) + with pytest.raises(AssertionError): + mmcv.clahe(self.img[:, :, 0], tile_grid_size=[8, 8]) + + # test with different channels + for i in range(self.img.shape[-1]): + img = mmcv.clahe(self.img[:, :, i]) + img_std = _clahe(self.img[:, :, i]) + assert np.allclose(img, img_std) + assert id(img) != id(self.img[:, :, i]) + assert id(img_std) != id(self.img[:, :, i]) + + # test case 
with clip_limit=1.2 + for i in range(self.img.shape[-1]): + img = mmcv.clahe(self.img[:, :, i], 1.2) + img_std = _clahe(self.img[:, :, i], 1.2) + assert np.allclose(img, img_std) + assert id(img) != id(self.img[:, :, i]) + assert id(img_std) != id(self.img[:, :, i]) + + def test_adjust_hue(self): + # test case with img is not ndarray + from PIL import Image + pil_img = Image.fromarray(self.img) + + with pytest.raises(TypeError): + mmcv.adjust_hue(pil_img, hue_factor=0.0) + + # test case with hue_factor > 0.5 or hue_factor < -0.5 + with pytest.raises(ValueError): + mmcv.adjust_hue(self.img, hue_factor=-0.6) + with pytest.raises(ValueError): + mmcv.adjust_hue(self.img, hue_factor=0.6) + + for i in np.arange(-0.5, 0.5, 0.2): + pil_res = mmcv.adjust_hue(self.img, hue_factor=i, backend='pillow') + pil_res = np.array(pil_res) + cv2_res = mmcv.adjust_hue(self.img, hue_factor=i) + assert np.allclose(pil_res, cv2_res, atol=10.0) + + # test pillow backend + with pytest.raises(AssertionError): + mmcv.adjust_hue( + self.img.astype(np.float32), hue_factor=0, backend='pillow') + + # backend must be 'cv2' or 'pillow' + with pytest.raises(ValueError): + mmcv.adjust_hue( + self.img.astype(np.uint8), hue_factor=0, backend='not support') diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/output.pkl b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/output.pkl new file mode 100644 index 0000000000000000000000000000000000000000..bcb7b2dd606930522b102d3a59fef70d6f3eb885 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/output.pkl differ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_active_rotated_filter.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_active_rotated_filter.py new file mode 100644 index 0000000000000000000000000000000000000000..30ea59c5c62a4fd7c01fbd03a98485be359984f4 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_active_rotated_filter.py @@ -0,0 +1,258 @@ +# Copyright (c) OpenMMLab. 
All rights reserved. +import numpy as np +import pytest +import torch + +from mmcv.ops import active_rotated_filter + +np_feature = np.array([[[[[-1.4934e-01, 1.1341e+00, -1.6241e-01], + [-1.0986e+00, -1.1463e+00, -1.3176e+00], + [1.4808e+00, 7.6572e-01, -1.4548e+00]]]], + [[[[1.9370e+00, 6.2799e-01, 2.5834e-02], + [-1.4242e+00, 7.6566e-01, 1.0015e+00], + [9.8669e-01, 4.1356e-01, 6.1068e-01]]]], + [[[[1.4565e+00, 1.4960e+00, 2.4339e-01], + [-2.2484e-01, 7.5942e-01, -8.1184e-01], + [-1.7077e+00, 1.0658e+00, 3.8311e-01]]]], + [[[[8.4734e-01, 1.0904e+00, 2.4356e+00], + [9.5822e-01, 2.2260e-01, -2.4450e-01], + [-1.5078e+00, 7.0902e-02, -1.5921e+00]]]], + [[[[2.1173e+00, -7.3524e-01, 1.8888e+00], + [1.0169e+00, 4.7033e-01, -1.0875e+00], + [-1.0736e+00, -5.2245e-01, -2.8733e-01]]]], + [[[[-5.6433e-01, 1.5835e+00, -1.5826e+00], + [-8.8974e-01, -4.3128e-01, -2.2423e-01], + [1.6552e-03, -1.7292e+00, 2.6639e-01]]]], + [[[[-1.2951e-01, 1.3493e+00, -1.9329e+00], + [5.6248e-01, -5.1189e-01, 1.3614e+00], + [3.3680e-01, -8.7148e-01, 5.0592e-01]]]], + [[[[1.6781e-02, -8.3929e-01, 1.2060e+00], + [-1.0764e+00, 4.7821e-01, 1.5342e+00], + [-4.4542e-01, -1.8606e+00, 3.0827e-01]]]]]) + +np_indices = np.array([[[[1, 2, 3, 6, 9, 8, 7, 4], [2, 3, 6, 9, 8, 7, 4, 1], + [3, 6, 9, 8, 7, 4, 1, 2]], + [[4, 1, 2, 3, 6, 9, 8, 7], [5, 5, 5, 5, 5, 5, 5, 5], + [6, 9, 8, 7, 4, 1, 2, 3]], + [[7, 4, 1, 2, 3, 6, 9, 8], [8, 7, 4, 1, 2, 3, 6, 9], + [9, 8, 7, 4, 1, 2, 3, 6]]]]) + +expected_output = np.array([[[[-1.4934e-01, 1.1341e+00, -1.6241e-01], + [-1.0986e+00, -1.1463e+00, -1.3176e+00], + [1.4808e+00, 7.6572e-01, -1.4548e+00]]], + [[[-1.0986e+00, -1.4934e-01, 1.1341e+00], + [1.4808e+00, -1.1463e+00, -1.6241e-01], + [7.6572e-01, -1.4548e+00, -1.3176e+00]]], + [[[1.4808e+00, -1.0986e+00, -1.4934e-01], + [7.6572e-01, -1.1463e+00, 1.1341e+00], + [-1.4548e+00, -1.3176e+00, -1.6241e-01]]], + [[[7.6572e-01, 1.4808e+00, -1.0986e+00], + [-1.4548e+00, -1.1463e+00, -1.4934e-01], + [-1.3176e+00, -1.6241e-01, 
1.1341e+00]]], + [[[-1.4548e+00, 7.6572e-01, 1.4808e+00], + [-1.3176e+00, -1.1463e+00, -1.0986e+00], + [-1.6241e-01, 1.1341e+00, -1.4934e-01]]], + [[[-1.3176e+00, -1.4548e+00, 7.6572e-01], + [-1.6241e-01, -1.1463e+00, 1.4808e+00], + [1.1341e+00, -1.4934e-01, -1.0986e+00]]], + [[[-1.6241e-01, -1.3176e+00, -1.4548e+00], + [1.1341e+00, -1.1463e+00, 7.6572e-01], + [-1.4934e-01, -1.0986e+00, 1.4808e+00]]], + [[[1.1341e+00, -1.6241e-01, -1.3176e+00], + [-1.4934e-01, -1.1463e+00, -1.4548e+00], + [-1.0986e+00, 1.4808e+00, 7.6572e-01]]], + [[[1.9370e+00, 6.2799e-01, 2.5834e-02], + [-1.4242e+00, 7.6566e-01, 1.0015e+00], + [9.8669e-01, 4.1356e-01, 6.1068e-01]]], + [[[-1.4242e+00, 1.9370e+00, 6.2799e-01], + [9.8669e-01, 7.6566e-01, 2.5834e-02], + [4.1356e-01, 6.1068e-01, 1.0015e+00]]], + [[[9.8669e-01, -1.4242e+00, 1.9370e+00], + [4.1356e-01, 7.6566e-01, 6.2799e-01], + [6.1068e-01, 1.0015e+00, 2.5834e-02]]], + [[[4.1356e-01, 9.8669e-01, -1.4242e+00], + [6.1068e-01, 7.6566e-01, 1.9370e+00], + [1.0015e+00, 2.5834e-02, 6.2799e-01]]], + [[[6.1068e-01, 4.1356e-01, 9.8669e-01], + [1.0015e+00, 7.6566e-01, -1.4242e+00], + [2.5834e-02, 6.2799e-01, 1.9370e+00]]], + [[[1.0015e+00, 6.1068e-01, 4.1356e-01], + [2.5834e-02, 7.6566e-01, 9.8669e-01], + [6.2799e-01, 1.9370e+00, -1.4242e+00]]], + [[[2.5834e-02, 1.0015e+00, 6.1068e-01], + [6.2799e-01, 7.6566e-01, 4.1356e-01], + [1.9370e+00, -1.4242e+00, 9.8669e-01]]], + [[[6.2799e-01, 2.5834e-02, 1.0015e+00], + [1.9370e+00, 7.6566e-01, 6.1068e-01], + [-1.4242e+00, 9.8669e-01, 4.1356e-01]]], + [[[1.4565e+00, 1.4960e+00, 2.4339e-01], + [-2.2484e-01, 7.5942e-01, -8.1184e-01], + [-1.7077e+00, 1.0658e+00, 3.8311e-01]]], + [[[-2.2484e-01, 1.4565e+00, 1.4960e+00], + [-1.7077e+00, 7.5942e-01, 2.4339e-01], + [1.0658e+00, 3.8311e-01, -8.1184e-01]]], + [[[-1.7077e+00, -2.2484e-01, 1.4565e+00], + [1.0658e+00, 7.5942e-01, 1.4960e+00], + [3.8311e-01, -8.1184e-01, 2.4339e-01]]], + [[[1.0658e+00, -1.7077e+00, -2.2484e-01], + [3.8311e-01, 7.5942e-01, 1.4565e+00], 
+ [-8.1184e-01, 2.4339e-01, 1.4960e+00]]], + [[[3.8311e-01, 1.0658e+00, -1.7077e+00], + [-8.1184e-01, 7.5942e-01, -2.2484e-01], + [2.4339e-01, 1.4960e+00, 1.4565e+00]]], + [[[-8.1184e-01, 3.8311e-01, 1.0658e+00], + [2.4339e-01, 7.5942e-01, -1.7077e+00], + [1.4960e+00, 1.4565e+00, -2.2484e-01]]], + [[[2.4339e-01, -8.1184e-01, 3.8311e-01], + [1.4960e+00, 7.5942e-01, 1.0658e+00], + [1.4565e+00, -2.2484e-01, -1.7077e+00]]], + [[[1.4960e+00, 2.4339e-01, -8.1184e-01], + [1.4565e+00, 7.5942e-01, 3.8311e-01], + [-2.2484e-01, -1.7077e+00, 1.0658e+00]]], + [[[8.4734e-01, 1.0904e+00, 2.4356e+00], + [9.5822e-01, 2.2260e-01, -2.4450e-01], + [-1.5078e+00, 7.0902e-02, -1.5921e+00]]], + [[[9.5822e-01, 8.4734e-01, 1.0904e+00], + [-1.5078e+00, 2.2260e-01, 2.4356e+00], + [7.0902e-02, -1.5921e+00, -2.4450e-01]]], + [[[-1.5078e+00, 9.5822e-01, 8.4734e-01], + [7.0902e-02, 2.2260e-01, 1.0904e+00], + [-1.5921e+00, -2.4450e-01, 2.4356e+00]]], + [[[7.0902e-02, -1.5078e+00, 9.5822e-01], + [-1.5921e+00, 2.2260e-01, 8.4734e-01], + [-2.4450e-01, 2.4356e+00, 1.0904e+00]]], + [[[-1.5921e+00, 7.0902e-02, -1.5078e+00], + [-2.4450e-01, 2.2260e-01, 9.5822e-01], + [2.4356e+00, 1.0904e+00, 8.4734e-01]]], + [[[-2.4450e-01, -1.5921e+00, 7.0902e-02], + [2.4356e+00, 2.2260e-01, -1.5078e+00], + [1.0904e+00, 8.4734e-01, 9.5822e-01]]], + [[[2.4356e+00, -2.4450e-01, -1.5921e+00], + [1.0904e+00, 2.2260e-01, 7.0902e-02], + [8.4734e-01, 9.5822e-01, -1.5078e+00]]], + [[[1.0904e+00, 2.4356e+00, -2.4450e-01], + [8.4734e-01, 2.2260e-01, -1.5921e+00], + [9.5822e-01, -1.5078e+00, 7.0902e-02]]], + [[[2.1173e+00, -7.3524e-01, 1.8888e+00], + [1.0169e+00, 4.7033e-01, -1.0875e+00], + [-1.0736e+00, -5.2245e-01, -2.8733e-01]]], + [[[1.0169e+00, 2.1173e+00, -7.3524e-01], + [-1.0736e+00, 4.7033e-01, 1.8888e+00], + [-5.2245e-01, -2.8733e-01, -1.0875e+00]]], + [[[-1.0736e+00, 1.0169e+00, 2.1173e+00], + [-5.2245e-01, 4.7033e-01, -7.3524e-01], + [-2.8733e-01, -1.0875e+00, 1.8888e+00]]], + [[[-5.2245e-01, -1.0736e+00, 1.0169e+00], + 
[-2.8733e-01, 4.7033e-01, 2.1173e+00], + [-1.0875e+00, 1.8888e+00, -7.3524e-01]]], + [[[-2.8733e-01, -5.2245e-01, -1.0736e+00], + [-1.0875e+00, 4.7033e-01, 1.0169e+00], + [1.8888e+00, -7.3524e-01, 2.1173e+00]]], + [[[-1.0875e+00, -2.8733e-01, -5.2245e-01], + [1.8888e+00, 4.7033e-01, -1.0736e+00], + [-7.3524e-01, 2.1173e+00, 1.0169e+00]]], + [[[1.8888e+00, -1.0875e+00, -2.8733e-01], + [-7.3524e-01, 4.7033e-01, -5.2245e-01], + [2.1173e+00, 1.0169e+00, -1.0736e+00]]], + [[[-7.3524e-01, 1.8888e+00, -1.0875e+00], + [2.1173e+00, 4.7033e-01, -2.8733e-01], + [1.0169e+00, -1.0736e+00, -5.2245e-01]]], + [[[-5.6433e-01, 1.5835e+00, -1.5826e+00], + [-8.8974e-01, -4.3128e-01, -2.2423e-01], + [1.6552e-03, -1.7292e+00, 2.6639e-01]]], + [[[-8.8974e-01, -5.6433e-01, 1.5835e+00], + [1.6552e-03, -4.3128e-01, -1.5826e+00], + [-1.7292e+00, 2.6639e-01, -2.2423e-01]]], + [[[1.6552e-03, -8.8974e-01, -5.6433e-01], + [-1.7292e+00, -4.3128e-01, 1.5835e+00], + [2.6639e-01, -2.2423e-01, -1.5826e+00]]], + [[[-1.7292e+00, 1.6552e-03, -8.8974e-01], + [2.6639e-01, -4.3128e-01, -5.6433e-01], + [-2.2423e-01, -1.5826e+00, 1.5835e+00]]], + [[[2.6639e-01, -1.7292e+00, 1.6552e-03], + [-2.2423e-01, -4.3128e-01, -8.8974e-01], + [-1.5826e+00, 1.5835e+00, -5.6433e-01]]], + [[[-2.2423e-01, 2.6639e-01, -1.7292e+00], + [-1.5826e+00, -4.3128e-01, 1.6552e-03], + [1.5835e+00, -5.6433e-01, -8.8974e-01]]], + [[[-1.5826e+00, -2.2423e-01, 2.6639e-01], + [1.5835e+00, -4.3128e-01, -1.7292e+00], + [-5.6433e-01, -8.8974e-01, 1.6552e-03]]], + [[[1.5835e+00, -1.5826e+00, -2.2423e-01], + [-5.6433e-01, -4.3128e-01, 2.6639e-01], + [-8.8974e-01, 1.6552e-03, -1.7292e+00]]], + [[[-1.2951e-01, 1.3493e+00, -1.9329e+00], + [5.6248e-01, -5.1189e-01, 1.3614e+00], + [3.3680e-01, -8.7148e-01, 5.0592e-01]]], + [[[5.6248e-01, -1.2951e-01, 1.3493e+00], + [3.3680e-01, -5.1189e-01, -1.9329e+00], + [-8.7148e-01, 5.0592e-01, 1.3614e+00]]], + [[[3.3680e-01, 5.6248e-01, -1.2951e-01], + [-8.7148e-01, -5.1189e-01, 1.3493e+00], + [5.0592e-01, 
1.3614e+00, -1.9329e+00]]], + [[[-8.7148e-01, 3.3680e-01, 5.6248e-01], + [5.0592e-01, -5.1189e-01, -1.2951e-01], + [1.3614e+00, -1.9329e+00, 1.3493e+00]]], + [[[5.0592e-01, -8.7148e-01, 3.3680e-01], + [1.3614e+00, -5.1189e-01, 5.6248e-01], + [-1.9329e+00, 1.3493e+00, -1.2951e-01]]], + [[[1.3614e+00, 5.0592e-01, -8.7148e-01], + [-1.9329e+00, -5.1189e-01, 3.3680e-01], + [1.3493e+00, -1.2951e-01, 5.6248e-01]]], + [[[-1.9329e+00, 1.3614e+00, 5.0592e-01], + [1.3493e+00, -5.1189e-01, -8.7148e-01], + [-1.2951e-01, 5.6248e-01, 3.3680e-01]]], + [[[1.3493e+00, -1.9329e+00, 1.3614e+00], + [-1.2951e-01, -5.1189e-01, 5.0592e-01], + [5.6248e-01, 3.3680e-01, -8.7148e-01]]], + [[[1.6781e-02, -8.3929e-01, 1.2060e+00], + [-1.0764e+00, 4.7821e-01, 1.5342e+00], + [-4.4542e-01, -1.8606e+00, 3.0827e-01]]], + [[[-1.0764e+00, 1.6781e-02, -8.3929e-01], + [-4.4542e-01, 4.7821e-01, 1.2060e+00], + [-1.8606e+00, 3.0827e-01, 1.5342e+00]]], + [[[-4.4542e-01, -1.0764e+00, 1.6781e-02], + [-1.8606e+00, 4.7821e-01, -8.3929e-01], + [3.0827e-01, 1.5342e+00, 1.2060e+00]]], + [[[-1.8606e+00, -4.4542e-01, -1.0764e+00], + [3.0827e-01, 4.7821e-01, 1.6781e-02], + [1.5342e+00, 1.2060e+00, -8.3929e-01]]], + [[[3.0827e-01, -1.8606e+00, -4.4542e-01], + [1.5342e+00, 4.7821e-01, -1.0764e+00], + [1.2060e+00, -8.3929e-01, 1.6781e-02]]], + [[[1.5342e+00, 3.0827e-01, -1.8606e+00], + [1.2060e+00, 4.7821e-01, -4.4542e-01], + [-8.3929e-01, 1.6781e-02, -1.0764e+00]]], + [[[1.2060e+00, 1.5342e+00, 3.0827e-01], + [-8.3929e-01, 4.7821e-01, -1.8606e+00], + [1.6781e-02, -1.0764e+00, -4.4542e-01]]], + [[[-8.3929e-01, 1.2060e+00, 1.5342e+00], + [1.6781e-02, 4.7821e-01, 3.0827e-01], + [-1.0764e+00, -4.4542e-01, -1.8606e+00]]]]) + +expected_grad = np.array([[[[[8., 8., 8.], [8., 8., 8.], [8., 8., 8.]]]], + [[[[8., 8., 8.], [8., 8., 8.], [8., 8., 8.]]]], + [[[[8., 8., 8.], [8., 8., 8.], [8., 8., 8.]]]], + [[[[8., 8., 8.], [8., 8., 8.], [8., 8., 8.]]]], + [[[[8., 8., 8.], [8., 8., 8.], [8., 8., 8.]]]], + [[[[8., 8., 8.], [8., 8., 
8.], [8., 8., 8.]]]], + [[[[8., 8., 8.], [8., 8., 8.], [8., 8., 8.]]]], + [[[[8., 8., 8.], [8., 8., 8.], [8., 8., 8.]]]]]) + + +@pytest.mark.parametrize('device', [ + 'cpu', + pytest.param( + 'cuda', + marks=pytest.mark.skipif( + not torch.cuda.is_available(), reason='requires CUDA support')), +]) +def test_active_rotated_filter(device): + feature = torch.tensor( + np_feature, dtype=torch.float, device=device, requires_grad=True) + indices = torch.tensor(np_indices, dtype=torch.int, device=device) + output = active_rotated_filter(feature, indices) + output.backward(torch.ones_like(output)) + assert np.allclose(output.data.cpu().numpy(), expected_output, atol=1e-3) + assert np.allclose( + feature.grad.data.cpu().numpy(), expected_grad, atol=1e-3) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_assign_score_withk.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_assign_score_withk.py new file mode 100644 index 0000000000000000000000000000000000000000..f8fc6ae6261b77a634e7681c4939612fe80ddf38 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_assign_score_withk.py @@ -0,0 +1,188 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import pytest +import torch + +from mmcv.ops import assign_score_withk + + +@pytest.mark.skipif( + not torch.cuda.is_available(), reason='requires CUDA support') +def test_paconv_assign_scores(): + scores = torch.tensor([[[[0.06947571, 0.6065746], [0.28462553, 0.8378516], + [0.7595994, 0.97220325], [0.519155, 0.766185]], + [[0.15348864, 0.6051019], [0.21510637, 0.31916398], + [0.00236845, 0.5842595], [0.6783676, 0.5216348]]], + [[[0.23089725, 0.5568468], [0.7405102, 0.06438422], + [0.6887394, 0.22089851], [0.0502342, 0.79228795]], + [[0.44883424, 0.15427643], + [0.13817799, 0.34856772], [0.7989621, 0.33788306], + [0.15699774, 0.7693662]]]]).float().cuda() + scores.requires_grad_() + points = torch.tensor([[[[0.06001121, 0.92963666, 0.5753327, 0.7251477], + [0.53563064, 0.23129565, 0.92366195, 0.44261628]], + [[0.5770022, 0.56625944, 0.23560429, 0.11178821], + [0.7735967, 0.95678777, 0.25468266, 0.02895975]], + [[0.0589869, 0.09017515, 0.5977862, 0.02797985], + [0.603862, 0.35991007, 0.85761684, 0.3096559]], + [[0.22359002, 0.13983732, 0.5544243, 0.68863827], + [0.85646236, 0.75651926, 0.8638947, 0.83600986]], + [[0.45424145, 0.27458847, 0.6456112, 0.47162914], + [0.15773582, 0.47645122, 0.79964715, 0.3323908]], + [[0.8351399, 0.84696376, 0.9431732, 0.29418713], + [0.77168906, 0.6996871, 0.19354361, 0.03392768]], + [[0.30976456, 0.7074133, 0.581795, 0.976677], + [0.69656056, 0.07199162, 0.4708506, 0.29117996]], + [[0.5829035, 0.30201727, 0.76556486, 0.0935446], + [0.88030535, 0.16129416, 0.9242525, 0.49545723]]], + [[[0.50899494, 0.06482804, 0.44939405, 0.37704808], + [0.47028124, 0.11969638, 0.62823206, 0.28560323]], + [[0.40690207, 0.689753, 0.51636654, 0.23040164], + [0.06935787, 0.00488842, 0.22462702, 0.09182382]], + [[0.26611632, 0.00184339, 0.7730655, 0.5228131], + [0.87776035, 0.77895886, 0.2787183, 0.16620636]], + [[0.502574, 0.04039001, 0.5368497, 0.98379374], + [0.40973026, 0.3238272, 0.9733018, 0.13988364]], + [[0.04586202, 0.20983845, 0.20662665, 
0.22270602], + [0.60387236, 0.5155574, 0.51237285, 0.6528438]], + [[0.45735973, 0.86821306, 0.61054605, 0.8370336], + [0.45193362, 0.3734138, 0.7825672, 0.5699416]], + [[0.44591594, 0.12447512, 0.09282011, 0.7055254], + [0.25223452, 0.46696228, 0.7051136, 0.892151]], + [[0.49615085, 0.47321403, 0.93138885, 0.7652197], + [0.38766378, 0.30332977, 0.23131835, + 0.02863514]]]]).float().cuda() + points.requires_grad_() + centers = torch.tensor([[[[0.83878064, 0.96658987, 0.8033424, 0.9598312], + [0.45035273, 0.8768925, 0.977736, 0.54547966]], + [[0.01041394, 0.597893, 0.36212963, 0.4410367], + [0.94879234, 0.8372817, 0.21237361, 0.67945415]], + [[0.5096087, 0.26401454, 0.60034937, 0.5417416], + [0.87591463, 0.546456, 0.4096033, 0.16373193]], + [[0.79547447, 0.1482386, 0.12840575, 0.45384115], + [0.5640288, 0.944541, 0.5745328, 0.73229736]], + [[0.93011934, 0.7406011, 0.62621707, 0.8677915], + [0.91563636, 0.3595413, 0.6678378, 0.6085383]], + [[0.22431666, 0.65617776, 0.7483924, 0.6263364], + [0.30968404, 0.78204364, 0.14899081, + 0.09628749]], + [[0.73675203, 0.72104895, 0.4648038, 0.6101647], + [0.7817645, 0.16572917, 0.3311919, 0.43407398]], + [[0.8193154, 0.09559608, 0.05978829, 0.90262103], + [0.4256065, 0.8165596, 0.8206446, 0.6604721]]], + [[[0.7159653, 0.18600845, 0.21433902, 0.3159626], + [0.3921569, 0.33221376, 0.5061177, 0.7961841]], + [[0.95338356, 0.04785997, 0.67185795, 0.6538394], + [0.4729132, 0.33404195, 0.17750603, 0.8445621]], + [[0.6755793, 0.16193843, 0.75943846, 0.92123103], + [0.2781859, 0.03114432, 0.710638, 0.52729136]], + [[0.8376105, 0.10858494, 0.13208169, 0.365772], + [0.5930795, 0.27390373, 0.14036089, 0.170403]], + [[0.3479789, 0.89855295, 0.04844379, 0.9871029], + [0.29781651, 0.0244137, 0.9179047, 0.8081611]], + [[0.12460887, 0.44991326, 0.19382608, 0.35037738], + [0.2773472, 0.4362057, 0.36757517, 0.5993509]], + [[0.29630446, 0.90046406, 0.5417113, 0.13510644], + [0.09623539, 0.04226565, 0.32001644, + 0.44358212]], + [[0.5274848, 
0.82096446, 0.9415489, 0.7123748], + [0.7537517, 0.8086482, 0.85345286, + 0.7472754]]]]).float().cuda() + centers.requires_grad_() + knn_idx = torch.tensor([[[6, 7, 4, 6], [2, 4, 2, 4]], + [[7, 1, 3, 2], [6, 0, 2, 6]]]).long().cuda() + aggregate = 'sum' + expected_output = torch.tensor( + [[[[-0.08134781, 0.03877336, -0.8212776, -0.2869547], + [-0.23378491, -0.24112664, -0.1600166, -0.4121864]], + [[-0.05780616, -0.12298299, -0.0370461, -0.07889931], + [-0.13956165, -0.02006848, -0.10940295, -0.0293439]], + [[0.09284145, 0.58250105, 0.5927749, 0.16774094], + [0.27070042, 0.13422406, 0.2617501, 0.23416464]], + [[-0.06121218, -0.09561322, -0.20408826, 0.08079343], + [0.00944228, 0.03874819, 0.08404065, 0.04041629]]], + [[[-0.2110898, -0.13335688, -0.09315082, 0.08512095], + [0.09121774, 0.15976946, 0.23994486, 0.14350912]], + [[-0.36167958, -0.14891288, -0.64470863, -0.0646704], + [-0.28276974, -0.08847666, -0.46904767, 0.20491874]], + [[-0.34877953, -0.35533834, -0.25225785, -0.4638189], + [-0.1420663, 0.09467781, 0.17088932, 0.22580585]], + [[-0.3879708, -0.3991068, 0.05276498, -0.46989647], + [0.32522714, -0.02163534, 0.21604237, 0.4346682]]]]).float() + + # test forward + output = assign_score_withk(scores, points, centers, knn_idx, aggregate) + assert torch.allclose(output.detach().cpu(), expected_output, atol=1e-6) + + # test backward + loss = output.sum() + loss.backward() + expected_scores_grad = torch.tensor([[[[0.04288036, -0.18217683], + [-0.78873926, 0.7485497], + [-0.6866992, 0.05346543], + [0.04288036, -0.18217683]], + [[-1.1407862, 0.13533896], + [-0.06964391, -0.22948086], + [-1.1407862, 0.13533896], + [-0.06964391, -0.22948086]]], + [[[-0.3363995, -2.212181], + [-1.1589496, -2.7724311], + [-0.9387654, -1.3163853], + [-1.4385346, -1.0614843]], + [[-0.5048497, 1.4143617], + [-0.47332114, 0.6017133], + [-0.30974793, 1.1995442], + [-0.5048497, 1.4143617]]]]).float() + expected_points_grad = torch.tensor( + [[[[0., 0., 0., 0.], [0., 0., 0., 0.]], + [[0., 
0., 0., 0.], [0., 0., 0., 0.]], + [[0.15585709, 0.15585709, 0.15585709, 0.15585709], + [1.1893613, 1.1893613, 1.1893613, 1.1893613]], + [[0., 0., 0., 0.], [0., 0., 0., 0.]], + [[1.6530733, 1.6530733, 1.6530733, 1.6530733], + [1.8130021, 1.8130021, 1.8130021, 1.8130021]], + [[0., 0., 0., 0.], [0., 0., 0., 0.]], + [[0.58863074, 0.58863074, 0.58863074, 0.58863074], + [1.3727596, 1.3727596, 1.3727596, 1.3727596]], + [[0.28462553, 0.28462553, 0.28462553, 0.28462553], + [0.8378516, 0.8378516, 0.8378516, 0.8378516]]], + [[[0.13817799, 0.13817799, 0.13817799, 0.13817799], + [0.34856772, 0.34856772, 0.34856772, 0.34856772]], + [[0.7405102, 0.7405102, 0.7405102, 0.7405102], + [0.06438422, 0.06438422, 0.06438422, 0.06438422]], + [[0.8491963, 0.8491963, 0.8491963, 0.8491963], + [1.1301711, 1.1301711, 1.1301711, 1.1301711]], + [[0.6887394, 0.6887394, 0.6887394, 0.6887394], + [0.22089851, 0.22089851, 0.22089851, 0.22089851]], + [[0., 0., 0., 0.], [0., 0., 0., 0.]], + [[0., 0., 0., 0.], [0., 0., 0., 0.]], + [[0.605832, 0.605832, 0.605832, 0.605832], + [0.92364264, 0.92364264, 0.92364264, 0.92364264]], + [[0.23089725, 0.23089725, 0.23089725, 0.23089725], + [0.5568468, 0.5568468, 0.5568468, 0.5568468]]]]).float() + expected_centers_grad = torch.tensor( + [[[[0., 0., 0., 0.], [0., 0., 0., 0.]], + [[0., 0., 0., 0.], [0., 0., 0., 0.]], + [[-1.0493311, -1.0493311, -1.0493311, -1.0493311], + [-2.0301602, -2.0301602, -2.0301602, -2.0301602]], + [[0., 0., 0., 0.], [0., 0., 0., 0.]], + [[0., 0., 0., 0.], [0., 0., 0., 0.]], + [[0., 0., 0., 0.], [0., 0., 0., 0.]], + [[-1.6328557, -1.6328557, -1.6328557, -1.6328557], + [-3.1828144, -3.1828144, -3.1828144, -3.1828144]], + [[0., 0., 0., 0.], [0., 0., 0., 0.]]], + [[[0., 0., 0., 0.], [0., 0., 0., 0.]], + [[0., 0., 0., 0.], [0., 0., 0., 0.]], + [[0., 0., 0., 0.], [0., 0., 0., 0.]], + [[0., 0., 0., 0.], [0., 0., 0., 0.]], + [[0., 0., 0., 0.], [0., 0., 0., 0.]], + [[0., 0., 0., 0.], [0., 0., 0., 0.]], + [[-1.5429721, -1.5429721, -1.5429721, 
-1.5429721], + [-1.6100934, -1.6100934, -1.6100934, -1.6100934]], + [[-1.7103812, -1.7103812, -1.7103812, -1.7103812], + [-1.6344175, -1.6344175, -1.6344175, -1.6344175]]]]).float() + assert torch.allclose( + scores.grad.detach().cpu(), expected_scores_grad, atol=1e-6) + assert torch.allclose( + points.grad.detach().cpu(), expected_points_grad, atol=1e-6) + assert torch.allclose( + centers.grad.detach().cpu(), expected_centers_grad, atol=1e-6) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_ball_query.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_ball_query.py new file mode 100644 index 0000000000000000000000000000000000000000..41376d687162467451aa902eb149473db2a168bd --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_ball_query.py @@ -0,0 +1,102 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import pytest +import torch + +from mmcv.ops import ball_query + + +@pytest.mark.skipif( + not torch.cuda.is_available(), reason='requires CUDA support') +def test_ball_query(): + new_xyz = torch.tensor([[[-0.0740, 1.3147, -1.3625], + [-2.2769, 2.7817, -0.2334], + [-0.4003, 2.4666, -0.5116], + [-0.0740, 1.3147, -1.3625], + [-0.0740, 1.3147, -1.3625]], + [[-2.0289, 2.4952, -0.1708], + [-2.0668, 6.0278, -0.4875], + [0.4066, 1.4211, -0.2947], + [-2.0289, 2.4952, -0.1708], + [-2.0289, 2.4952, -0.1708]]]).cuda() + + xyz = torch.tensor([[[-0.0740, 1.3147, -1.3625], [0.5555, 1.0399, -1.3634], + [-0.4003, 2.4666, + -0.5116], [-0.5251, 2.4379, -0.8466], + [-0.9691, 1.1418, + -1.3733], [-0.2232, 0.9561, -1.3626], + [-2.2769, 2.7817, -0.2334], + [-0.2822, 1.3192, -1.3645], [0.1533, 1.5024, -1.0432], + [0.4917, 1.1529, -1.3496]], + [[-2.0289, 2.4952, + -0.1708], [-0.7188, 0.9956, -0.5096], + [-2.0668, 6.0278, -0.4875], [-1.9304, 3.3092, 0.6610], + [0.0949, 1.4332, 0.3140], [-1.2879, 2.0008, -0.7791], + [-0.7252, 0.9611, -0.6371], [0.4066, 1.4211, -0.2947], + [0.3220, 1.4447, 0.3548], [-0.9744, 2.3856, + -1.2000]]]).cuda() + + idx = 
ball_query(0, 0.2, 5, xyz, new_xyz) + expected_idx = torch.tensor([[[0, 0, 0, 0, 0], [6, 6, 6, 6, 6], + [2, 2, 2, 2, 2], [0, 0, 0, 0, 0], + [0, 0, 0, 0, 0]], + [[0, 0, 0, 0, 0], [2, 2, 2, 2, 2], + [7, 7, 7, 7, 7], [0, 0, 0, 0, 0], + [0, 0, 0, 0, 0]]]).cuda() + assert torch.all(idx == expected_idx) + + # test dilated ball query + idx = ball_query(0.2, 0.4, 5, xyz, new_xyz) + expected_idx = torch.tensor([[[0, 5, 7, 0, 0], [6, 6, 6, 6, 6], + [2, 3, 2, 2, 2], [0, 5, 7, 0, 0], + [0, 5, 7, 0, 0]], + [[0, 0, 0, 0, 0], [2, 2, 2, 2, 2], + [7, 7, 7, 7, 7], [0, 0, 0, 0, 0], + [0, 0, 0, 0, 0]]]).cuda() + assert torch.all(idx == expected_idx) + + +@pytest.mark.skipif( + not torch.cuda.is_available(), reason='requires CUDA support') +def test_stack_ball_query(): + new_xyz = torch.tensor([[-0.0740, 1.3147, -1.3625], + [-2.2769, 2.7817, -0.2334], + [-0.4003, 2.4666, -0.5116], + [-0.0740, 1.3147, -1.3625], + [-0.0740, 1.3147, -1.3625], + [-2.0289, 2.4952, -0.1708], + [-2.0668, 6.0278, -0.4875], + [0.4066, 1.4211, -0.2947], + [-2.0289, 2.4952, -0.1708], + [-2.0289, 2.4952, -0.1708]]).cuda() + new_xyz_batch_cnt = torch.tensor([5, 5], dtype=torch.int32).cuda() + xyz = torch.tensor([[-0.0740, 1.3147, -1.3625], [0.5555, 1.0399, -1.3634], + [-0.4003, 2.4666, -0.5116], [-0.5251, 2.4379, -0.8466], + [-0.9691, 1.1418, -1.3733], [-0.2232, 0.9561, -1.3626], + [-2.2769, 2.7817, -0.2334], [-0.2822, 1.3192, -1.3645], + [0.1533, 1.5024, -1.0432], [0.4917, 1.1529, -1.3496], + [-2.0289, 2.4952, -0.1708], [-0.7188, 0.9956, -0.5096], + [-2.0668, 6.0278, -0.4875], [-1.9304, 3.3092, 0.6610], + [0.0949, 1.4332, 0.3140], [-1.2879, 2.0008, -0.7791], + [-0.7252, 0.9611, -0.6371], [0.4066, 1.4211, -0.2947], + [0.3220, 1.4447, 0.3548], [-0.9744, 2.3856, + -1.2000]]).cuda() + xyz_batch_cnt = torch.tensor([10, 10], dtype=torch.int32).cuda() + idx = ball_query(0, 0.2, 5, xyz, new_xyz, xyz_batch_cnt, new_xyz_batch_cnt) + expected_idx = torch.tensor([[0, 0, 0, 0, 0], [6, 6, 6, 6, 6], + [2, 2, 2, 2, 2], [0, 0, 0, 
0, 0], + [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], + [2, 2, 2, 2, 2], [7, 7, 7, 7, 7], + [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]).cuda() + assert torch.all(idx == expected_idx) + + xyz = xyz.float() + new_xyz = new_xyz.float() + expected_idx = expected_idx.float() + idx = ball_query(0, 0.2, 5, xyz, new_xyz, xyz_batch_cnt, new_xyz_batch_cnt) + assert torch.all(idx == expected_idx) + + xyz = xyz.half() + new_xyz = new_xyz.half() + expected_idx = expected_idx.half() + idx = ball_query(0, 0.2, 5, xyz, new_xyz, xyz_batch_cnt, new_xyz_batch_cnt) + assert torch.all(idx == expected_idx) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_bbox.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_bbox.py new file mode 100644 index 0000000000000000000000000000000000000000..7123b1ee103fa5a0e1b8865ce97c14a359e4918d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_bbox.py @@ -0,0 +1,66 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import numpy as np +import pytest +import torch + +from mmcv.utils import IS_CUDA_AVAILABLE, IS_MLU_AVAILABLE, IS_MPS_AVAILABLE + + +class TestBBox: + + def _test_bbox_overlaps(self, device='cpu', dtype=torch.float): + from mmcv.ops import bbox_overlaps + b1 = torch.tensor([[1.0, 1.0, 3.0, 4.0], [2.0, 2.0, 3.0, 4.0], + [7.0, 7.0, 8.0, 8.0]]).to(device).type(dtype) + b2 = torch.tensor([[0.0, 2.0, 2.0, 5.0], [2.0, 1.0, 3.0, + 3.0]]).to(device).type(dtype) + should_output = np.array([[0.33333334, 0.5], [0.2, 0.5], [0.0, 0.0]]) + out = bbox_overlaps(b1, b2, offset=1) + assert np.allclose(out.cpu().numpy(), should_output, 1e-2) + + b1 = torch.tensor([[1.0, 1.0, 3.0, 4.0], [2.0, 2.0, 3.0, + 4.0]]).to(device).type(dtype) + b2 = torch.tensor([[0.0, 2.0, 2.0, 5.0], [2.0, 1.0, 3.0, + 3.0]]).to(device).type(dtype) + should_output = np.array([0.33333334, 0.5]) + out = bbox_overlaps(b1, b2, aligned=True, offset=1) + assert np.allclose(out.cpu().numpy(), should_output, 1e-2) + + b1 = torch.tensor([[0.0, 0.0, 3.0, 
3.0]]).to(device).type(dtype) + b2 = torch.tensor([[4.0, 0.0, 5.0, 3.0], [3.0, 0.0, 4.0, 3.0], + [2.0, 0.0, 3.0, 3.0], [1.0, 0.0, 2.0, + 3.0]]).to(device).type(dtype) + should_output = np.array([0, 0.2, 0.5, 0.5]) + out = bbox_overlaps(b1, b2, offset=1) + assert np.allclose(out.cpu().numpy(), should_output, 1e-2) + + @pytest.mark.parametrize('device', [ + 'cpu', + pytest.param( + 'cuda', + marks=pytest.mark.skipif( + not IS_CUDA_AVAILABLE, reason='requires CUDA support')), + pytest.param( + 'mlu', + marks=pytest.mark.skipif( + not IS_MLU_AVAILABLE, reason='requires MLU support')), + pytest.param( + 'mps', + marks=pytest.mark.skipif( + not IS_MPS_AVAILABLE, reason='requires MPS support')) + ]) + def test_bbox_overlaps_float(self, device): + self._test_bbox_overlaps(device, dtype=torch.float) + + @pytest.mark.parametrize('device', [ + pytest.param( + 'cuda', + marks=pytest.mark.skipif( + not IS_CUDA_AVAILABLE, reason='requires CUDA support')), + pytest.param( + 'mlu', + marks=pytest.mark.skipif( + not IS_MLU_AVAILABLE, reason='requires MLU support')) + ]) + def test_bbox_overlaps_half(self, device): + self._test_bbox_overlaps(device, dtype=torch.half) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_bezier_align.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_bezier_align.py new file mode 100644 index 0000000000000000000000000000000000000000..2f05b75b2d3a3390ea609e3b70aa21ca64d1810b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_bezier_align.py @@ -0,0 +1,54 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
import numpy as np
import pytest
import torch

from mmcv.utils import IS_CUDA_AVAILABLE

# Hand-written fixture: a single 1x1x4x4 feature map and one "bezier box"
# row of 17 numbers (batch index followed by 8 control-point (x, y) pairs).
inputs = ([[[
    [1., 2., 5., 6.],
    [3., 4., 7., 8.],
    [9., 10., 13., 14.],
    [11., 12., 15., 16.],
]]], [[0., 0., 0., 1, 0., 2., 0., 3., 0., 3., 3., 2., 3., 1., 3., 0., 3.]])
# Expected forward output (4x4 pooled map) and expected gradient w.r.t. the
# input feature map for the fixture above.
outputs = ([[[[1., 1.75, 3.5, 5.25], [2.5, 3.25, 5., 6.75],
              [6., 6.75, 8.5, 10.25],
              [9.5, 10.25, 12., 13.75]]]], [[[[1.5625, 1.5625, 1.5625, 0.3125],
                                              [1.5625, 1.5625, 1.5625, 0.3125],
                                              [1.5625, 1.5625, 1.5625, 0.3125],
                                              [0.3125, 0.3125, 0.3125,
                                               0.0625]]]])


@pytest.mark.parametrize('device', [
    'cpu',
    pytest.param(
        'cuda',
        marks=pytest.mark.skipif(
            not IS_CUDA_AVAILABLE, reason='requires CUDA support'))
])
# NOTE(review): torch.float appears twice, so the same deterministic case
# runs twice; upstream mmcv uses torch.double here — presumably dropped for
# hardware without fp64 support. Confirm whether the duplicate is intended.
@pytest.mark.parametrize('dtype', [torch.float, torch.float, torch.half])
def test_bezieralign(device, dtype):
    """Check bezier_align forward output and input gradient against the
    hand-verified fixture, on CPU and (when available) CUDA."""
    try:
        from mmcv.ops import bezier_align
    except ModuleNotFoundError:
        # The op is compiled as a C/CUDA extension; skip when unavailable.
        pytest.skip('test requires compilation')
    pool_h = 4
    pool_w = 4
    spatial_scale = 1.0
    sampling_ratio = 1
    np_input = np.array(inputs[0])
    np_rois = np.array(inputs[1])
    np_output = np.array(outputs[0])
    np_grad = np.array(outputs[1])

    x = torch.tensor(np_input, dtype=dtype, device=device, requires_grad=True)
    rois = torch.tensor(np_rois, dtype=dtype, device=device)

    output = bezier_align(x, rois, (pool_h, pool_w), spatial_scale,
                          sampling_ratio, False)
    # Backprop a ones tensor so x.grad holds d(sum(output))/dx.
    output.backward(torch.ones_like(output))
    # Cast to float before comparison so the half-precision run is checked
    # against the same float reference (hence the loose atol).
    assert np.allclose(
        output.data.type(torch.float).cpu().numpy(), np_output, atol=1e-3)
    assert np.allclose(
        x.grad.data.type(torch.float).cpu().numpy(), np_grad, atol=1e-3)
diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_bilinear_grid_sample.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_bilinear_grid_sample.py
new file mode 100644
index 0000000000000000000000000000000000000000..87264fb8939559fd013c5f3a6be64d0102929135
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_bilinear_grid_sample.py
@@ -0,0 +1,41 @@
# Copyright (c)
OpenMMLab. All rights reserved.
import numpy as np
import torch
import torch.nn.functional as F


class TestBilinearGridSample:
    """Compare mmcv's pure-PyTorch bilinear_grid_sample against
    torch.nn.functional.grid_sample as the reference implementation."""

    def _test_bilinear_grid_sample(self,
                                   dtype=torch.float,
                                   align_corners=False,
                                   multiplier=1,
                                   precision=1e-3):
        # One randomized comparison run; a fresh random input is drawn on
        # every call, so repeated calls with identical arguments still
        # exercise different data.
        from mmcv.ops.point_sample import bilinear_grid_sample

        input = torch.rand(1, 1, 20, 20, dtype=dtype)
        # Identity affine transform -> a regular 15x15 sampling grid.
        grid = torch.Tensor([[[1, 0, 0], [0, 1, 0]]])
        grid = F.affine_grid(
            grid, (1, 1, 15, 15), align_corners=align_corners).type_as(input)
        # Scaling the grid pushes samples out of [-1, 1] for |multiplier| > 1,
        # which also exercises the out-of-bounds (zero-padding) path.
        grid *= multiplier

        out = bilinear_grid_sample(input, grid, align_corners=align_corners)
        ref_out = F.grid_sample(input, grid, align_corners=align_corners)

        assert np.allclose(out.data.detach().cpu().numpy(),
                           ref_out.data.detach().cpu().numpy(), precision)

    def test_bilinear_grid_sample(self):
        # Multiple runs per configuration (each with new random data), plus
        # positive/negative grid multipliers to cover off-grid sampling.
        # NOTE(review): upstream mmcv also covers torch.double here;
        # presumably removed for hardware without fp64 — confirm.
        self._test_bilinear_grid_sample(torch.float, False)
        self._test_bilinear_grid_sample(torch.float, True)
        self._test_bilinear_grid_sample(torch.float, False)
        self._test_bilinear_grid_sample(torch.float, True)
        self._test_bilinear_grid_sample(torch.float, False)
        self._test_bilinear_grid_sample(torch.float, True, 5)
        self._test_bilinear_grid_sample(torch.float, False, 10)
        self._test_bilinear_grid_sample(torch.float, True, -6)
        self._test_bilinear_grid_sample(torch.float, False, -10)
        self._test_bilinear_grid_sample(torch.float, True, 5)
        self._test_bilinear_grid_sample(torch.float, False, 10)
        self._test_bilinear_grid_sample(torch.float, True, -6)
        self._test_bilinear_grid_sample(torch.float, False, -10)
diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_border_align.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_border_align.py
new file mode 100644
index 0000000000000000000000000000000000000000..8d8dc32420fbb295e23f4b97959fdc415906fd5b
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_border_align.py
@@ -0,0 +1,91 @@
# Copyright (c) OpenMMLab. All rights reserved.
import copy

import numpy as np
import pytest
import torch

# Hand-written input feature map, [1,4c,h,w] with c=1, h=3, w=4.
input_arr = [[[[1., 2., 3., 4.], [5., 6., 7., 8.], [9., 10., 11., 12.]],
              [[6, 7, 5, 8], [2, 1, 3, 4], [12, 9, 11, 10]],
              [[-2, -3, 2, 0], [-4, -5, 1, -1], [-1, -1, -1, -1]],
              [[0, -1, 2, 1], [-4, -3, -2, -1], [-1, -2, -3, -4]]]]
# Boxes, [1,h*w,4] (one box per spatial location).
boxes_arr = [[[0, 0, 2, 1], [1, 0, 3, 1], [1, 0, 2, 1], [0, 0, 3, 1],
              [0, 0, 1, 2], [0, 0, 2, 2], [1, 0, 2, 1], [1, 0, 3, 1],
              [0, 1, 1, 2], [0, 0, 3, 2], [1, 0, 3, 2], [2, 0, 3, 2]]]
# Expected forward output keyed by pool_size.
output_dict = {
    # [1,c,h*w,4] for each value,
    # the output is manually checked for its correctness

    # pool_size=1
    1: [[[[3., 6., 1., 2.], [4., 7., -1., 1.], [3., 7., 1., 2.],
          [4., 6., -1., 1.], [2., 12., -1., -1.], [3., 12., -1., 2.],
          [3., 7., 1., 2.], [4., 7., -1., 1.], [6., 12., -1., -2.],
          [4., 12., -1., 1.], [4., 9., -1., 1.], [4., 11., -1., 1.]]]],

    # pool_size=2
    2: [[[[3., 6., 1., 2.], [4., 7., 1., 1.], [3., 7., 1., 2.],
          [4., 6., -1., 1.], [2., 12., -1., -1.], [3., 12., -1., 2.],
          [3., 7., 1., 2.], [4., 7., 1., 1.], [6., 12., -1., -2.],
          [4., 12., -1., 1.], [4., 9., -1., 1.], [4., 11., -1., 1.]]]],
}
# Expected gradient of the input, keyed by pool_size.
input_grad_dict = {
    # [1,4c,h,w] for each value
    # the grad is manually checked for its correctness

    # pool_size=1
    1: [[[[0., 1., 4., 6.], [0., 1., 0., 0.], [0., 0., 0., 0.]],
         [[2., 4., 0., 0.], [0., 0., 0., 0.], [4., 1., 1., 0.]],
         [[0., 0., 0., 0.], [0., 0., 3., 3.], [0., 2., 1., 3.]],
         [[0., 1., 4., 6.], [0., 0., 0., 0.], [0., 1., 0., 0.]]]],

    # pool_size=2
    2: [[[[0., 1., 4., 6.], [0., 1., 0., 0.], [0., 0., 0., 0.]],
         [[2., 4., 0., 0.], [0., 0., 0., 0.], [4., 1., 1., 0.]],
         [[0., 0., 0., 0.], [0., 0., 5., 1.], [0., 2., 1., 3.]],
         [[0., 1., 4., 6.], [0., 0., 0., 0.], [0., 1., 0., 0.]]]],
}


def _test_border_align_allclose(device, dtype, pool_size):
    """Check both the functional (border_align) and module (BorderAlign)
    interfaces: forward output and input gradient against the fixtures."""
    if not torch.cuda.is_available() and device == 'cuda':
        pytest.skip('test requires GPU')
    try:
        from mmcv.ops import BorderAlign, border_align
    except ModuleNotFoundError:
        # Compiled extension may be absent in this build.
        pytest.skip('BorderAlign op is not successfully compiled')

    np_input = np.array(input_arr)
    np_boxes = np.array(boxes_arr)
    np_output = np.array(output_dict[pool_size])
    np_grad = np.array(input_grad_dict[pool_size])

    input = torch.tensor(
        np_input, dtype=dtype, device=device, requires_grad=True)
    boxes = torch.tensor(np_boxes, dtype=dtype, device=device)

    # test for border_align
    # A deep copy is used so the functional and module paths each get an
    # independent leaf tensor and accumulate separate .grad values.
    input_cp = copy.deepcopy(input)
    output = border_align(input_cp, boxes, pool_size)
    output.backward(torch.ones_like(output))
    assert np.allclose(
        output.data.type(dtype).cpu().numpy(), np_output, atol=1e-5)
    assert np.allclose(
        input_cp.grad.data.type(dtype).cpu().numpy(), np_grad, atol=1e-5)

    # test for BorderAlign
    pool_module = BorderAlign(pool_size)
    output = pool_module(input, boxes)
    output.backward(torch.ones_like(output))
    assert np.allclose(
        output.data.type(dtype).cpu().numpy(), np_output, atol=1e-5)
    assert np.allclose(
        input.grad.data.type(dtype).cpu().numpy(), np_grad, atol=1e-5)


@pytest.mark.parametrize('device', ['cuda'])
# NOTE(review): torch.float is listed twice, duplicating a deterministic
# case; upstream mmcv uses torch.double in that slot — presumably dropped
# for hardware without fp64. Confirm whether the duplicate is intended.
@pytest.mark.parametrize('dtype', [torch.float, torch.half, torch.float])
@pytest.mark.parametrize('pool_size', [1, 2])
def test_border_align(device, dtype, pool_size):
    _test_border_align_allclose(device, dtype, pool_size)
diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_box_iou_quadri.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_box_iou_quadri.py
new file mode 100644
index 0000000000000000000000000000000000000000..e5cfcab61b0eec158c372e633b520db942e66bee
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_box_iou_quadri.py
@@ -0,0 +1,77 @@
# Copyright (c) OpenMMLab. All rights reserved.
+import numpy as np +import pytest +import torch + +from mmcv.utils import IS_CUDA_AVAILABLE + + +class TestBoxIoUQuadri: + + @pytest.mark.parametrize('device', [ + 'cpu', + pytest.param( + 'cuda', + marks=pytest.mark.skipif( + not IS_CUDA_AVAILABLE, reason='requires CUDA support')), + ]) + def test_box_iou_quadri_cuda(self, device): + from mmcv.ops import box_iou_quadri + np_boxes1 = np.asarray([[1.0, 1.0, 3.0, 4.0, 4.0, 4.0, 4.0, 1.0], + [2.0, 2.0, 3.0, 4.0, 4.0, 2.0, 3.0, 1.0], + [7.0, 7.0, 8.0, 8.0, 9.0, 7.0, 8.0, 6.0]], + dtype=np.float32) + np_boxes2 = np.asarray([[0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0, 0.0], + [2.0, 1.0, 2.0, 4.0, 4.0, 4.0, 4.0, 1.0], + [7.0, 6.0, 7.0, 8.0, 9.0, 8.0, 9.0, 6.0]], + dtype=np.float32) + np_expect_ious = np.asarray( + [[0.0714, 1.0000, 0.0000], [0.0000, 0.5000, 0.0000], + [0.0000, 0.0000, 0.5000]], + dtype=np.float32) + np_expect_ious_aligned = np.asarray([0.0714, 0.5000, 0.5000], + dtype=np.float32) + + boxes1 = torch.from_numpy(np_boxes1).to(device) + boxes2 = torch.from_numpy(np_boxes2).to(device) + + ious = box_iou_quadri(boxes1, boxes2) + assert np.allclose(ious.cpu().numpy(), np_expect_ious, atol=1e-4) + + ious = box_iou_quadri(boxes1, boxes2, aligned=True) + assert np.allclose( + ious.cpu().numpy(), np_expect_ious_aligned, atol=1e-4) + + @pytest.mark.parametrize('device', [ + 'cpu', + pytest.param( + 'cuda', + marks=pytest.mark.skipif( + not IS_CUDA_AVAILABLE, reason='requires CUDA support')), + ]) + def test_box_iou_quadri_iof_cuda(self, device): + from mmcv.ops import box_iou_quadri + np_boxes1 = np.asarray([[1.0, 1.0, 3.0, 4.0, 4.0, 4.0, 4.0, 1.0], + [2.0, 2.0, 3.0, 4.0, 4.0, 2.0, 3.0, 1.0], + [7.0, 7.0, 8.0, 8.0, 9.0, 7.0, 8.0, 6.0]], + dtype=np.float32) + np_boxes2 = np.asarray([[0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0, 0.0], + [2.0, 1.0, 2.0, 4.0, 4.0, 4.0, 4.0, 1.0], + [7.0, 6.0, 7.0, 8.0, 9.0, 8.0, 9.0, 6.0]], + dtype=np.float32) + np_expect_ious = np.asarray( + [[0.1111, 1.0000, 0.0000], [0.0000, 1.0000, 0.0000], + 
[0.0000, 0.0000, 1.0000]], + dtype=np.float32) + np_expect_ious_aligned = np.asarray([0.1111, 1.0000, 1.0000], + dtype=np.float32) + + boxes1 = torch.from_numpy(np_boxes1).to(device) + boxes2 = torch.from_numpy(np_boxes2).to(device) + + ious = box_iou_quadri(boxes1, boxes2, mode='iof') + assert np.allclose(ious.cpu().numpy(), np_expect_ious, atol=1e-4) + + ious = box_iou_quadri(boxes1, boxes2, mode='iof', aligned=True) + assert np.allclose( + ious.cpu().numpy(), np_expect_ious_aligned, atol=1e-4) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_box_iou_rotated.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_box_iou_rotated.py new file mode 100644 index 0000000000000000000000000000000000000000..9f5e0dfa3e4e56a4e5c5ea43df2b1ee2b625fbbe --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_box_iou_rotated.py @@ -0,0 +1,163 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import numpy as np +import pytest +import torch + + +class TestBoxIoURotated: + + def test_box_iou_rotated_cpu(self): + from mmcv.ops import box_iou_rotated + np_boxes1 = np.asarray( + [[1.0, 1.0, 3.0, 4.0, 0.5], [2.0, 2.0, 3.0, 4.0, 0.6], + [7.0, 7.0, 8.0, 8.0, 0.4]], + dtype=np.float32) + np_boxes2 = np.asarray( + [[0.0, 2.0, 2.0, 5.0, 0.3], [2.0, 1.0, 3.0, 3.0, 0.5], + [5.0, 5.0, 6.0, 7.0, 0.4]], + dtype=np.float32) + np_expect_ious = np.asarray( + [[0.3708, 0.4351, 0.0000], [0.1104, 0.4487, 0.0424], + [0.0000, 0.0000, 0.3622]], + dtype=np.float32) + np_expect_ious_aligned = np.asarray([0.3708, 0.4487, 0.3622], + dtype=np.float32) + + boxes1 = torch.from_numpy(np_boxes1) + boxes2 = torch.from_numpy(np_boxes2) + + # test cw angle definition + ious = box_iou_rotated(boxes1, boxes2) + assert np.allclose(ious.cpu().numpy(), np_expect_ious, atol=1e-4) + + ious = box_iou_rotated(boxes1, boxes2, aligned=True) + assert np.allclose( + ious.cpu().numpy(), np_expect_ious_aligned, atol=1e-4) + + # test ccw angle definition + boxes1[..., -1] *= -1 + boxes2[..., -1] *= -1 
+ ious = box_iou_rotated(boxes1, boxes2, clockwise=False) + assert np.allclose(ious.cpu().numpy(), np_expect_ious, atol=1e-4) + + ious = box_iou_rotated(boxes1, boxes2, aligned=True, clockwise=False) + assert np.allclose( + ious.cpu().numpy(), np_expect_ious_aligned, atol=1e-4) + + @pytest.mark.skipif( + not torch.cuda.is_available(), reason='requires CUDA support') + def test_box_iou_rotated_cuda(self): + from mmcv.ops import box_iou_rotated + np_boxes1 = np.asarray( + [[1.0, 1.0, 3.0, 4.0, 0.5], [2.0, 2.0, 3.0, 4.0, 0.6], + [7.0, 7.0, 8.0, 8.0, 0.4]], + dtype=np.float32) + np_boxes2 = np.asarray( + [[0.0, 2.0, 2.0, 5.0, 0.3], [2.0, 1.0, 3.0, 3.0, 0.5], + [5.0, 5.0, 6.0, 7.0, 0.4]], + dtype=np.float32) + np_expect_ious = np.asarray( + [[0.3708, 0.4351, 0.0000], [0.1104, 0.4487, 0.0424], + [0.0000, 0.0000, 0.3622]], + dtype=np.float32) + np_expect_ious_aligned = np.asarray([0.3708, 0.4487, 0.3622], + dtype=np.float32) + + boxes1 = torch.from_numpy(np_boxes1).cuda() + boxes2 = torch.from_numpy(np_boxes2).cuda() + + # test cw angle definition + ious = box_iou_rotated(boxes1, boxes2) + assert np.allclose(ious.cpu().numpy(), np_expect_ious, atol=1e-4) + + ious = box_iou_rotated(boxes1, boxes2, aligned=True) + assert np.allclose( + ious.cpu().numpy(), np_expect_ious_aligned, atol=1e-4) + + # test ccw angle definition + boxes1[..., -1] *= -1 + boxes2[..., -1] *= -1 + ious = box_iou_rotated(boxes1, boxes2, clockwise=False) + assert np.allclose(ious.cpu().numpy(), np_expect_ious, atol=1e-4) + + ious = box_iou_rotated(boxes1, boxes2, aligned=True, clockwise=False) + assert np.allclose( + ious.cpu().numpy(), np_expect_ious_aligned, atol=1e-4) + + def test_box_iou_rotated_iof_cpu(self): + from mmcv.ops import box_iou_rotated + np_boxes1 = np.asarray( + [[1.0, 1.0, 3.0, 4.0, 0.5], [2.0, 2.0, 3.0, 4.0, 0.6], + [7.0, 7.0, 8.0, 8.0, 0.4]], + dtype=np.float32) + np_boxes2 = np.asarray( + [[0.0, 2.0, 2.0, 5.0, 0.3], [2.0, 1.0, 3.0, 3.0, 0.5], + [5.0, 5.0, 6.0, 7.0, 0.4]], + 
dtype=np.float32) + np_expect_ious = np.asarray( + [[0.4959, 0.5306, 0.0000], [0.1823, 0.5420, 0.1832], + [0.0000, 0.0000, 0.4404]], + dtype=np.float32) + np_expect_ious_aligned = np.asarray([0.4959, 0.5420, 0.4404], + dtype=np.float32) + + boxes1 = torch.from_numpy(np_boxes1) + boxes2 = torch.from_numpy(np_boxes2) + + # test cw angle definition + ious = box_iou_rotated(boxes1, boxes2, mode='iof') + assert np.allclose(ious.cpu().numpy(), np_expect_ious, atol=1e-4) + ious = box_iou_rotated(boxes1, boxes2, mode='iof', aligned=True) + assert np.allclose( + ious.cpu().numpy(), np_expect_ious_aligned, atol=1e-4) + + # test ccw angle definition + boxes1[..., -1] *= -1 + boxes2[..., -1] *= -1 + ious = box_iou_rotated(boxes1, boxes2, mode='iof', clockwise=False) + assert np.allclose(ious.cpu().numpy(), np_expect_ious, atol=1e-4) + ious = box_iou_rotated( + boxes1, boxes2, mode='iof', aligned=True, clockwise=False) + assert np.allclose( + ious.cpu().numpy(), np_expect_ious_aligned, atol=1e-4) + + @pytest.mark.skipif( + not torch.cuda.is_available(), reason='requires CUDA support') + def test_box_iou_rotated_iof_cuda(self): + from mmcv.ops import box_iou_rotated + np_boxes1 = np.asarray( + [[1.0, 1.0, 3.0, 4.0, 0.5], [2.0, 2.0, 3.0, 4.0, 0.6], + [7.0, 7.0, 8.0, 8.0, 0.4]], + dtype=np.float32) + np_boxes2 = np.asarray( + [[0.0, 2.0, 2.0, 5.0, 0.3], [2.0, 1.0, 3.0, 3.0, 0.5], + [5.0, 5.0, 6.0, 7.0, 0.4]], + dtype=np.float32) + np_expect_ious = np.asarray( + [[0.4959, 0.5306, 0.0000], [0.1823, 0.5420, 0.1832], + [0.0000, 0.0000, 0.4404]], + dtype=np.float32) + np_expect_ious_aligned = np.asarray([0.4959, 0.5420, 0.4404], + dtype=np.float32) + + boxes1 = torch.from_numpy(np_boxes1).cuda() + boxes2 = torch.from_numpy(np_boxes2).cuda() + + # test cw angle definition + ious = box_iou_rotated(boxes1, boxes2, mode='iof') + assert np.allclose(ious.cpu().numpy(), np_expect_ious, atol=1e-4) + + ious = box_iou_rotated(boxes1, boxes2, mode='iof', aligned=True) + assert np.allclose( + 
ious.cpu().numpy(), np_expect_ious_aligned, atol=1e-4) + + # test ccw angle definition + boxes1[..., -1] *= -1 + boxes2[..., -1] *= -1 + ious = box_iou_rotated(boxes1, boxes2, mode='iof', clockwise=False) + assert np.allclose(ious.cpu().numpy(), np_expect_ious, atol=1e-4) + + ious = box_iou_rotated( + boxes1, boxes2, mode='iof', aligned=True, clockwise=False) + assert np.allclose( + ious.cpu().numpy(), np_expect_ious_aligned, atol=1e-4) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_carafe.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_carafe.py new file mode 100644 index 0000000000000000000000000000000000000000..4a3809f4c4dc46b689d14f8b7b2000087f6bb0e7 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_carafe.py @@ -0,0 +1,85 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import numpy as np +import pytest +import torch +from torch.autograd import gradcheck + +from mmcv.utils import IS_CUDA_AVAILABLE, IS_MLU_AVAILABLE + + +class TestCarafe: + + def test_carafe_naive_gradcheck(self): + if not torch.cuda.is_available(): + return + from mmcv.ops import CARAFENaive + feat = torch.randn( + 2, 64, 3, 3, requires_grad=True, device='cuda').float() + mask = torch.randn( + 2, 100, 6, 6, requires_grad=True, + device='cuda').sigmoid().float() + gradcheck(CARAFENaive(5, 4, 2), (feat, mask), atol=1e-4, eps=1e-4) + + def test_carafe_gradcheck(self): + if not torch.cuda.is_available(): + return + from mmcv.ops import CARAFE + feat = torch.randn( + 2, 64, 3, 3, requires_grad=True, device='cuda').float() + mask = torch.randn( + 2, 100, 6, 6, requires_grad=True, + device='cuda').sigmoid().float() + gradcheck(CARAFE(5, 4, 2), (feat, mask), atol=1e-4, eps=1e-4) + + @pytest.mark.parametrize('device', [ + pytest.param( + 'cuda', + marks=pytest.mark.skipif( + not IS_CUDA_AVAILABLE, reason='requires CUDA support')), + pytest.param( + 'mlu', + marks=pytest.mark.skipif( + not IS_MLU_AVAILABLE, reason='requires MLU support')) + ]) + def 
test_carafe_allclose(self, device): + try: + from mmcv.ops import CARAFE + except ModuleNotFoundError: + pytest.skip('test requires compilation') + + np_feat = np.fromfile( + 'tests/data/for_carafe/carafe_feat.bin', dtype=np.float32) + np_mask = np.fromfile( + 'tests/data/for_carafe/carafe_mask.bin', dtype=np.float32) + np_output = np.fromfile( + 'tests/data/for_carafe/carafe_output.bin', dtype=np.float32) + np_feat_grad = np.fromfile( + 'tests/data/for_carafe/carafe_feat_grad.bin', dtype=np.float32) + np_mask_grad = np.fromfile( + 'tests/data/for_carafe/carafe_mask_grad.bin', dtype=np.float32) + + np_feat = np_feat.reshape((2, 64, 3, 3)) + np_mask = np_mask.reshape((2, 100, 6, 6)) + np_output = np_output.reshape((2, 64, 6, 6)) + np_feat_grad = np_feat_grad.reshape((2, 64, 3, 3)) + np_mask_grad = np_mask_grad.reshape((2, 100, 6, 6)) + + feat = torch.tensor( + np_feat, dtype=torch.float, device=device, requires_grad=True) + mask = torch.tensor( + np_mask, dtype=torch.float, device=device, requires_grad=True) + + carafe = CARAFE(5, 4, 2) + + output = carafe(feat, mask) + output.backward(torch.ones_like(output)) + assert np.allclose( + output.data.type(torch.float).cpu().numpy(), np_output, atol=1e-3) + assert np.allclose( + feat.grad.data.type(torch.float).cpu().numpy(), + np_feat_grad, + atol=1e-3) + assert np.allclose( + mask.grad.data.type(torch.float).cpu().numpy(), + np_mask_grad, + atol=1e-3) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_cc_attention.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_cc_attention.py new file mode 100644 index 0000000000000000000000000000000000000000..b2a8d22a39424c4401b0d6c35a1169da72c58dc2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_cc_attention.py @@ -0,0 +1,56 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import numpy as np +import torch +import torch.nn as nn + + +class Loss(nn.Module): + + def __init__(self): + super().__init__() + + def forward(self, input, target): + input = input.view(-1) + target = target.view(-1) + return torch.mean(input - target) + + +class TestCrissCrossAttention: + + def test_cc_attention(self): + device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu') + + from mmcv.ops import CrissCrossAttention + loss_func = Loss() + + input = np.fromfile( + 'tests/data/for_ccattention/ccattention_input.bin', + dtype=np.float32) + output = np.fromfile( + 'tests/data/for_ccattention/ccattention_output.bin', + dtype=np.float32) + input = input.reshape((1, 32, 45, 45)) + output = output.reshape((1, 32, 45, 45)) + label = torch.ones((1, 32, 45, 45)) + + input = torch.FloatTensor(input) + output = torch.FloatTensor(output) + + input.requires_grad = True + + shape = input.shape + channel = shape[1] + + cca = CrissCrossAttention(channel) + cca.to(device) + input = input.to(device) + label = label.to(device) + cca.train() + test_output = cca(input) + test_loss = loss_func(test_output, label) + test_loss.backward() + test_output = test_output.detach().cpu().numpy() + output = output.numpy() + + assert np.allclose(test_output, output) + assert test_output.shape == shape diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_chamfer_distance.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_chamfer_distance.py new file mode 100644 index 0000000000000000000000000000000000000000..522dcdddc76d49cab6e5b5846bee9ae32d116c66 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_chamfer_distance.py @@ -0,0 +1,57 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import pytest +import torch + +from mmcv.ops import chamfer_distance + + +@pytest.mark.skipif( + not torch.cuda.is_available(), reason='requires CUDA support') +def test_chamfer_distance(): + pointset1 = torch.tensor( + [[[1.3, 9.39], [2.3, 9.39], [2.3, 10.39], [1.3, 10.39]], + [[1.0, 9.39], [3.0, 9.39], [3.0, 10.39], [1.0, 10.39]], + [[1.6, 9.99], [2.3, 9.99], [2.3, 10.39], [1.6, 10.39]]], + device='cuda', + requires_grad=True) + + pointset2 = torch.tensor( + [[[1.0, 9.39], [3.0, 9.39], [3.0, 10.39], [1.0, 10.39]], + [[1.3, 9.39], [2.3, 9.39], [2.3, 10.39], [1.3, 10.39]], + [[1.0, 9.39], [3.0, 9.39], [3.0, 10.39], [1.0, 10.39]]], + device='cuda', + requires_grad=True) + + expected_dist1 = torch.tensor( + [[0.0900, 0.4900, 0.4900, 0.0900], [0.0900, 0.4900, 0.4900, 0.0900], + [0.5200, 0.6500, 0.4900, 0.3600]], + device='cuda') + expected_dist2 = torch.tensor( + [[0.0900, 0.4900, 0.4900, 0.0900], [0.0900, 0.4900, 0.4900, 0.0900], + [0.7200, 0.8500, 0.4900, 0.3600]], + device='cuda') + + expected_pointset1_grad = torch.tensor( + [[[0.6000, 0.0000], [-1.4000, 0.0000], [-1.4000, 0.0000], + [0.6000, 0.0000]], + [[-0.6000, 0.0000], [1.4000, 0.0000], [1.4000, 0.0000], + [-0.6000, 0.0000]], + [[1.2000, -0.8000], [-1.4000, -0.8000], [-1.4000, 0.0000], + [1.2000, 0.0000]]], + device='cuda') + + expected_pointset2_grad = torch.tensor( + [[[-0.6000, 0.0000], [1.4000, 0.0000], [1.4000, 0.0000], + [-0.6000, 0.0000]], + [[0.6000, 0.0000], [-1.4000, 0.0000], [-1.4000, 0.0000], + [0.6000, 0.0000]], + [[0.0000, 0.0000], [0.0000, 0.0000], [2.8000, 0.8000], + [-2.4000, 0.8000]]], + device='cuda') + + dist1, dist2, idx1, idx2 = chamfer_distance(pointset1, pointset2) + dist1.backward(torch.ones_like(dist1)) + assert torch.allclose(dist1, expected_dist1, 1e-2) + assert torch.allclose(dist2, expected_dist2, 1e-2) + assert torch.allclose(pointset1.grad.data, expected_pointset1_grad, 1e-2) + assert torch.allclose(pointset2.grad.data, expected_pointset2_grad, 1e-2) diff --git 
a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_contour_expand.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_contour_expand.py new file mode 100644 index 0000000000000000000000000000000000000000..b36bbf4155c282418b3659984a536a24fad0d8b4 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_contour_expand.py @@ -0,0 +1,49 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import numpy as np +import torch + + +def test_contour_expand(): + from mmcv.ops import contour_expand + + np_internal_kernel_label = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + [0, 0, 1, 1, 0, 0, 0, 0, 2, 0], + [0, 0, 1, 1, 0, 0, 0, 0, 2, 0], + [0, 0, 1, 1, 0, 0, 0, 0, 2, 0], + [0, 0, 1, 1, 0, 0, 0, 0, 2, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, + 0]]).astype(np.int32) + np_kernel_mask1 = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + [0, 0, 1, 1, 1, 1, 1, 1, 1, 0], + [0, 0, 1, 1, 1, 1, 1, 1, 1, 0], + [0, 0, 1, 1, 1, 1, 1, 1, 1, 0], + [0, 0, 1, 1, 1, 1, 1, 1, 1, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, + 0]]).astype(np.uint8) + np_kernel_mask2 = (np_internal_kernel_label > 0).astype(np.uint8) + + np_kernel_mask = np.stack([np_kernel_mask1, np_kernel_mask2]) + min_area = 1 + kernel_region_num = 3 + result = contour_expand(np_kernel_mask, np_internal_kernel_label, min_area, + kernel_region_num) + gt = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 1, 1, 1, 1, 2, 2, 2, 0], + [0, 0, 1, 1, 1, 1, 2, 2, 2, 0], [0, 0, 1, 1, 1, 1, 2, 2, 2, 0], + [0, 0, 1, 1, 1, 1, 2, 2, 2, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]] + assert np.allclose(result, gt) + + np_kernel_mask_t = torch.from_numpy(np_kernel_mask) + 
np_internal_kernel_label_t = torch.from_numpy(np_internal_kernel_label) + result = contour_expand(np_kernel_mask_t, np_internal_kernel_label_t, + min_area, kernel_region_num) + assert np.allclose(result, gt) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_convex_iou.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_convex_iou.py new file mode 100644 index 0000000000000000000000000000000000000000..533037762ca37e3f2918be07deb43d73dd151d13 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_convex_iou.py @@ -0,0 +1,56 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import numpy as np +import pytest +import torch + +from mmcv.ops import convex_giou, convex_iou + +np_pointsets = np.asarray([[ + 1.0, 1.0, 2.0, 2.0, 1.0, 2.0, 2.0, 1.0, 1.0, 3.0, 3.0, 1.0, 2.0, 3.0, 3.0, + 2.0, 1.5, 1.5 +], + [ + 1.5, 1.5, 2.5, 2.5, 1.5, 2.5, 2.5, 1.5, 1.5, + 3.5, 3.5, 1.5, 2.5, 3.5, 3.5, 2.5, 2.0, 2.0 + ]]) + +np_polygons = np.asarray([[1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 1.0], + [1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 3.0, 1.0]]) + +np_expected_iou = np.asarray([[0.2857, 0.8750], [0.0588, 0.4286]]) + +np_expected_giou = np.asarray([0.2857, 0.3831]) + +np_expected_grad = np.asarray([[ + 0.0204, 0.0408, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0612, + -0.0408, -0.0408, 0.0816, -0.0408, -0.0816, -0.0816, -0.0408, 0.0000, + 0.0000 +], + [ + -0.1848, -0.1848, 0.0000, 0.0000, 0.0000, + 0.0000, 0.0000, 0.0000, -0.1076, -0.0801, + -0.0801, -0.1076, -0.0367, -0.0734, -0.0734, + -0.0367, 0.0000, 0.0000 + ]]) + + +@pytest.mark.skipif( + not torch.cuda.is_available(), reason='requires CUDA support') +def test_convex_iou(): + pointsets = torch.from_numpy(np_pointsets).float().cuda() + polygons = torch.from_numpy(np_polygons).float().cuda() + expected_iou = torch.from_numpy(np_expected_iou).float().cuda() + assert torch.allclose( + convex_iou(pointsets, polygons), expected_iou, atol=1e-3) + + +@pytest.mark.skipif( + not torch.cuda.is_available(), 
reason='requires CUDA support') +def test_convex_giou(): + pointsets = torch.from_numpy(np_pointsets).float().cuda() + polygons = torch.from_numpy(np_polygons).float().cuda() + expected_giou = torch.from_numpy(np_expected_giou).float().cuda() + expected_grad = torch.from_numpy(np_expected_grad).float().cuda() + giou, grad = convex_giou(pointsets, polygons) + assert torch.allclose(giou, expected_giou, atol=1e-3) + assert torch.allclose(grad, expected_grad, atol=1e-3) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_corner_pool.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_corner_pool.py new file mode 100644 index 0000000000000000000000000000000000000000..d6dd25f2232f0a420249b8e538357280bf05de61 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_corner_pool.py @@ -0,0 +1,59 @@ +# Copyright (c) OpenMMLab. All rights reserved. +""" +CommandLine: + pytest tests/test_corner_pool.py +""" +import pytest +import torch + +from mmcv.ops import CornerPool + + +def test_corner_pool_device_and_dtypes_cpu(): + """ + CommandLine: + xdoctest -m tests/test_corner_pool.py \ + test_corner_pool_device_and_dtypes_cpu + """ + with pytest.raises(AssertionError): + # pool mode must in ['bottom', 'left', 'right', 'top'] + pool = CornerPool('corner') + + lr_tensor = torch.tensor([[[[0, 0, 0, 0, 0], [2, 1, 3, 0, 2], + [5, 4, 1, 1, 6], [0, 0, 0, 0, 0], + [0, 0, 0, 0, 0]]]]) + tb_tensor = torch.tensor([[[[0, 3, 1, 0, 0], [0, 1, 1, 0, 0], + [0, 3, 4, 0, 0], [0, 2, 2, 0, 0], + [0, 0, 2, 0, 0]]]]) + # Left Pool + left_answer = torch.tensor([[[[0, 0, 0, 0, 0], [3, 3, 3, 2, 2], + [6, 6, 6, 6, 6], [0, 0, 0, 0, 0], + [0, 0, 0, 0, 0]]]]) + pool = CornerPool('left') + left_tensor = pool(lr_tensor) + assert left_tensor.type() == lr_tensor.type() + assert torch.equal(left_tensor, left_answer) + # Right Pool + right_answer = torch.tensor([[[[0, 0, 0, 0, 0], [2, 2, 3, 3, 3], + [5, 5, 5, 5, 6], [0, 0, 0, 0, 0], + [0, 0, 0, 0, 0]]]]) + pool = CornerPool('right') + 
right_tensor = pool(lr_tensor) + assert right_tensor.type() == lr_tensor.type() + assert torch.equal(right_tensor, right_answer) + # Top Pool + top_answer = torch.tensor([[[[0, 3, 4, 0, 0], [0, 3, 4, 0, 0], + [0, 3, 4, 0, 0], [0, 2, 2, 0, 0], + [0, 0, 2, 0, 0]]]]) + pool = CornerPool('top') + top_tensor = pool(tb_tensor) + assert top_tensor.type() == tb_tensor.type() + assert torch.equal(top_tensor, top_answer) + # Bottom Pool + bottom_answer = torch.tensor([[[[0, 3, 1, 0, 0], [0, 3, 1, 0, 0], + [0, 3, 4, 0, 0], [0, 3, 4, 0, 0], + [0, 3, 4, 0, 0]]]]) + pool = CornerPool('bottom') + bottom_tensor = pool(tb_tensor) + assert bottom_tensor.type() == tb_tensor.type() + assert torch.equal(bottom_tensor, bottom_answer) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_correlation.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_correlation.py new file mode 100644 index 0000000000000000000000000000000000000000..5e5011ae1bdf0b6f1ac9c09702e325780f576d3b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_correlation.py @@ -0,0 +1,46 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import pytest +import torch + +from mmcv.ops import Correlation + +_input1 = [[[[1., 2., 3.], [0., 1., 2.], [3., 5., 2.]]]] +_input2 = [[[[1., 2., 3.], [3., 1., 2.], [8., 5., 2.]]]] + +gt_out_shape = (1, 1, 1, 3, 3) +_gt_out = [[[[[1., 4., 9.], [0., 1., 4.], [24., 25., 4.]]]]] +gt_input1_grad = [[[[1., 2., 3.], [3., 1., 2.], [8., 5., 2.]]]] + + +def assert_equal_tensor(tensor_a, tensor_b): + + assert tensor_a.eq(tensor_b).all() + + +class TestCorrelation: + + def _test_correlation(self, dtype=torch.float): + + layer = Correlation(max_displacement=0) + + input1 = torch.tensor(_input1, dtype=dtype).cuda() + input2 = torch.tensor(_input2, dtype=dtype).cuda() + input1.requires_grad = True + input2.requires_grad = True + out = layer(input1, input2) + out.backward(torch.ones_like(out)) + + # `eq_cpu` is not implemented for 'Half' in torch1.5.0, + # so we need to make a comparison for cuda tensor + # rather than cpu tensor + gt_out = torch.tensor(_gt_out, dtype=dtype).cuda() + assert_equal_tensor(out, gt_out) + assert_equal_tensor(input1.grad.detach(), input2) + assert_equal_tensor(input2.grad.detach(), input1) + + @pytest.mark.skipif( + not torch.cuda.is_available(), reason='requires CUDA support') + def test_correlation(self): + self._test_correlation(torch.float) + self._test_correlation(torch.float) + self._test_correlation(torch.half) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_deform_conv.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_deform_conv.py new file mode 100644 index 0000000000000000000000000000000000000000..9411024affa9697c1be785e4f87df57d86b6907a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_deform_conv.py @@ -0,0 +1,200 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import numpy as np +import pytest +import torch +from mmengine.utils import digit_version +from mmengine.utils.dl_utils import TORCH_VERSION + +try: + # If PyTorch version >= 1.6.0 and fp16 is enabled, torch.cuda.amp.autocast + # would be imported and used; we should test if our modules support it. + from torch.cuda.amp import autocast +except ImportError: + pass + +input = [[[[1., 2., 3.], [0., 1., 2.], [3., 5., 2.]]]] +offset_weight = [[[0.1, 0.4, 0.6, 0.1]], [[0.3, 0.2, 0.1, 0.3]], + [[0.5, 0.5, 0.2, 0.8]], [[0.8, 0.3, 0.9, 0.1]], + [[0.3, 0.1, 0.2, 0.5]], [[0.3, 0.7, 0.5, 0.3]], + [[0.6, 0.2, 0.5, 0.3]], [[0.4, 0.1, 0.8, 0.4]]] +offset_bias = [0.7, 0.1, 0.8, 0.5, 0.6, 0.5, 0.4, 0.7] +deform_weight = [[[0.4, 0.2, 0.1, 0.9]]] + +gt_out = [[[[1.650, 0.], [0.000, 0.]]]] +gt_x_grad = [[[[-0.666, 0.204, 0.000], [0.030, -0.416, 0.012], + [0.000, 0.252, 0.129]]]] +gt_offset_weight_grad = [[[[1.44, 2.88], [0.00, 1.44]]], + [[[-0.72, -1.44], [0.00, -0.72]]], + [[[0.00, 0.00], [0.00, 0.00]]], + [[[0.00, 0.00], [0.00, 0.00]]], + [[[-0.10, -0.20], [0.00, -0.10]]], + [[[-0.08, -0.16], [0.00, -0.08]]], + [[[-0.54, -1.08], [0.00, -0.54]]], + [[[-0.54, -1.08], [0.00, -0.54]]]] +gt_offset_bias_grad = [1.44, -0.72, 0., 0., -0.10, -0.08, -0.54, -0.54], +gt_deform_weight_grad = [[[[3.62, 0.], [0.40, 0.18]]]] + + +class TestDeformconv: + + def _test_deformconv(self, + dtype=torch.float, + threshold=1e-3, + device='cuda', + batch_size=10, + im2col_step=2): + if not torch.cuda.is_available() and device == 'cuda': + pytest.skip('test requires GPU') + from mmcv.ops import DeformConv2dPack + c_in = 1 + c_out = 1 + batch_size = 10 + repeated_input = np.repeat(input, batch_size, axis=0) + repeated_gt_out = np.repeat(gt_out, batch_size, axis=0) + repeated_gt_x_grad = np.repeat(gt_x_grad, batch_size, axis=0) + x = torch.tensor(repeated_input, device=device, dtype=dtype) + x.requires_grad = True + model = DeformConv2dPack( + in_channels=c_in, + out_channels=c_out, + kernel_size=2, + 
stride=1, + padding=0, + im2col_step=im2col_step) + model.conv_offset.weight.data = torch.nn.Parameter( + torch.Tensor(offset_weight).reshape(8, 1, 2, 2)) + model.conv_offset.bias.data = torch.nn.Parameter( + torch.Tensor(offset_bias).reshape(8)) + model.weight.data = torch.nn.Parameter( + torch.Tensor(deform_weight).reshape(1, 1, 2, 2)) + if device == 'cuda': + model.cuda() + model.type(dtype) + + out = model(x) + out.backward(torch.ones_like(out)) + + assert np.allclose(out.data.detach().cpu().numpy(), repeated_gt_out, + threshold) + assert np.allclose(x.grad.detach().cpu().numpy(), repeated_gt_x_grad, + threshold) + # the batch size of the input is increased which results in + # a larger gradient so we need to divide by the batch_size + assert np.allclose( + model.conv_offset.weight.grad.detach().cpu().numpy() / batch_size, + gt_offset_weight_grad, threshold) + assert np.allclose( + model.conv_offset.bias.grad.detach().cpu().numpy() / batch_size, + gt_offset_bias_grad, threshold) + assert np.allclose( + model.weight.grad.detach().cpu().numpy() / batch_size, + gt_deform_weight_grad, threshold) + + from mmcv.ops import DeformConv2d + + # test bias + model = DeformConv2d(1, 1, 2, stride=1, padding=0) + assert not hasattr(model, 'bias') + # test bias=True + with pytest.raises(AssertionError): + model = DeformConv2d(1, 1, 2, stride=1, padding=0, bias=True) + # test in_channels % group != 0 + with pytest.raises(AssertionError): + model = DeformConv2d(3, 2, 3, groups=2) + # test out_channels % group != 0 + with pytest.raises(AssertionError): + model = DeformConv2d(3, 4, 3, groups=3) + + def _test_amp_deformconv(self, + input_dtype, + threshold=1e-3, + batch_size=10, + im2col_step=2): + """The function to test amp released on pytorch 1.6.0. + + The type of input data might be torch.float or torch.half, + so we should test deform_conv in both cases. With amp, the + data type of model will NOT be set manually. + + Args: + input_dtype: torch.float or torch.half. 
+ threshold: the same as above function. + """ + if not torch.cuda.is_available(): + return + from mmcv.ops import DeformConv2dPack + c_in = 1 + c_out = 1 + repeated_input = np.repeat(input, batch_size, axis=0) + repeated_gt_out = np.repeat(gt_out, batch_size, axis=0) + repeated_gt_x_grad = np.repeat(gt_x_grad, batch_size, axis=0) + x = torch.Tensor(repeated_input).cuda().type(input_dtype) + x.requires_grad = True + model = DeformConv2dPack( + in_channels=c_in, + out_channels=c_out, + kernel_size=2, + stride=1, + padding=0, + im2col_step=im2col_step) + model.conv_offset.weight.data = torch.nn.Parameter( + torch.Tensor(offset_weight).reshape(8, 1, 2, 2)) + model.conv_offset.bias.data = torch.nn.Parameter( + torch.Tensor(offset_bias).reshape(8)) + model.weight.data = torch.nn.Parameter( + torch.Tensor(deform_weight).reshape(1, 1, 2, 2)) + model.cuda() + + out = model(x) + out.backward(torch.ones_like(out)) + + assert np.allclose(out.data.detach().cpu().numpy(), repeated_gt_out, + threshold) + assert np.allclose(x.grad.detach().cpu().numpy(), repeated_gt_x_grad, + threshold) + assert np.allclose( + model.conv_offset.weight.grad.detach().cpu().numpy() / batch_size, + gt_offset_weight_grad, threshold) + assert np.allclose( + model.conv_offset.bias.grad.detach().cpu().numpy() / batch_size, + gt_offset_bias_grad, threshold) + assert np.allclose( + model.weight.grad.detach().cpu().numpy() / batch_size, + gt_deform_weight_grad, threshold) + + from mmcv.ops import DeformConv2d + + # test bias + model = DeformConv2d(1, 1, 2, stride=1, padding=0) + assert not hasattr(model, 'bias') + # test bias=True + with pytest.raises(AssertionError): + model = DeformConv2d(1, 1, 2, stride=1, padding=0, bias=True) + # test in_channels % group != 0 + with pytest.raises(AssertionError): + model = DeformConv2d(3, 2, 3, groups=2) + # test out_channels % group != 0 + with pytest.raises(AssertionError): + model = DeformConv2d(3, 4, 3, groups=3) + + def test_deformconv(self): + 
self._test_deformconv(torch.float, device='cpu') + self._test_deformconv(torch.float, device='cpu', threshold=1e-1) + self._test_deformconv(torch.float) + self._test_deformconv(torch.float) + self._test_deformconv(torch.half, threshold=1e-1) + # test batch_size < im2col_step + self._test_deformconv(torch.float, batch_size=1, im2col_step=2) + # test batch_size % im2col_step != 0 + with pytest.raises( + AssertionError, + match='batch size must be divisible by im2col_step'): + self._test_deformconv(torch.float, batch_size=10, im2col_step=3) + + # test amp when torch version >= '1.6.0', the type of + # input data for deformconv might be torch.float or torch.half + if (TORCH_VERSION != 'parrots' + and digit_version(TORCH_VERSION) >= digit_version('1.6.0')): + with autocast(enabled=True): + self._test_amp_deformconv(torch.float, 1e-1) + self._test_amp_deformconv(torch.half, 1e-1) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_deform_roi_pool.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_deform_roi_pool.py new file mode 100644 index 0000000000000000000000000000000000000000..826e1fd21fb9202dc4fb5755c42a8d6fb3e2cc3d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_deform_roi_pool.py @@ -0,0 +1,152 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import os + +import numpy as np +import pytest +import torch + +from mmcv.utils import IS_CUDA_AVAILABLE, IS_MLU_AVAILABLE, IS_NPU_AVAILABLE + +_USING_PARROTS = True +try: + from parrots.autograd import gradcheck +except ImportError: + from torch.autograd import gradcheck + _USING_PARROTS = False + +cur_dir = os.path.dirname(os.path.abspath(__file__)) + +inputs = [([[[[1., 2.], [3., 4.]]]], [[0., 0., 0., 1., 1.]]), + ([[[[1., 2.], [3., 4.]], [[4., 3.], [2., + 1.]]]], [[0., 0., 0., 1., 1.]]), + ([[[[1., 2., 5., 6.], [3., 4., 7., 8.], [9., 10., 13., 14.], + [11., 12., 15., 16.]]]], [[0., 0., 0., 3., 3.]])] +outputs = [([[[[1, 1.25], [1.5, 1.75]]]], [[[[3.0625, 0.4375], + [0.4375, 0.0625]]]]), + ([[[[1., 1.25], [1.5, 1.75]], [[4, 3.75], + [3.5, 3.25]]]], [[[[3.0625, 0.4375], + [0.4375, 0.0625]], + [[3.0625, 0.4375], + [0.4375, + 0.0625]]]]), + ([[[[1.9375, 4.75], + [7.5625, + 10.375]]]], [[[[0.47265625, 0.4296875, 0.4296875, 0.04296875], + [0.4296875, 0.390625, 0.390625, 0.0390625], + [0.4296875, 0.390625, 0.390625, 0.0390625], + [0.04296875, 0.0390625, 0.0390625, + 0.00390625]]]])] + + +class TestDeformRoIPool: + + def test_deform_roi_pool_gradcheck(self): + if not torch.cuda.is_available(): + return + from mmcv.ops import DeformRoIPoolPack + pool_h = 2 + pool_w = 2 + spatial_scale = 1.0 + sampling_ratio = 2 + + for case in inputs: + np_input = np.array(case[0]) + np_rois = np.array(case[1]) + + x = torch.tensor( + np_input, device='cuda', dtype=torch.float, requires_grad=True) + rois = torch.tensor(np_rois, device='cuda', dtype=torch.float) + output_c = x.size(1) + + droipool = DeformRoIPoolPack((pool_h, pool_w), + output_c, + spatial_scale=spatial_scale, + sampling_ratio=sampling_ratio).cuda() + + if _USING_PARROTS: + gradcheck(droipool, (x, rois), no_grads=[rois]) + else: + gradcheck(droipool, (x, rois), eps=1e-2, atol=1e-2) + + def test_modulated_deform_roi_pool_gradcheck(self): + if not torch.cuda.is_available(): + return + from mmcv.ops import 
ModulatedDeformRoIPoolPack + pool_h = 2 + pool_w = 2 + spatial_scale = 1.0 + sampling_ratio = 2 + + for case in inputs: + np_input = np.array(case[0]) + np_rois = np.array(case[1]) + + x = torch.tensor( + np_input, device='cuda', dtype=torch.float, requires_grad=True) + rois = torch.tensor(np_rois, device='cuda', dtype=torch.float) + output_c = x.size(1) + + droipool = ModulatedDeformRoIPoolPack( + (pool_h, pool_w), + output_c, + spatial_scale=spatial_scale, + sampling_ratio=sampling_ratio).cuda() + + if _USING_PARROTS: + gradcheck(droipool, (x, rois), no_grads=[rois]) + else: + gradcheck(droipool, (x, rois), eps=1e-2, atol=1e-2) + + def _test_deform_roi_pool_allclose(self, device, dtype=torch.float): + from mmcv.ops import DeformRoIPoolPack + pool_h = 2 + pool_w = 2 + spatial_scale = 1.0 + sampling_ratio = 2 + + for case, output in zip(inputs, outputs): + np_input = np.array(case[0]) + np_rois = np.array(case[1]) + np_output = np.array(output[0]) + np_grad = np.array(output[1]) + + x = torch.tensor( + np_input, device=device, dtype=torch.float, requires_grad=True) + rois = torch.tensor(np_rois, device=device, dtype=torch.float) + output_c = x.size(1) + droipool = DeformRoIPoolPack( + (pool_h, pool_w), + output_c, + spatial_scale=spatial_scale, + sampling_ratio=sampling_ratio).to(device) + + output = droipool(x, rois) + output.backward(torch.ones_like(output)) + assert np.allclose(output.data.cpu().numpy(), np_output, 1e-3) + assert np.allclose(x.grad.data.cpu().numpy(), np_grad, 1e-3) + + @pytest.mark.parametrize('device', [ + pytest.param( + 'npu', + marks=pytest.mark.skipif( + not IS_NPU_AVAILABLE, reason='requires NPU support')), + pytest.param( + 'cuda', + marks=pytest.mark.skipif( + not IS_CUDA_AVAILABLE, reason='requires CUDA support')), + pytest.param( + 'mlu', + marks=pytest.mark.skipif( + not IS_MLU_AVAILABLE, reason='requires MLU support')) + ]) + @pytest.mark.parametrize('dtype', [ + torch.float, + pytest.param( + torch.float, + 
marks=pytest.mark.skipif( + IS_MLU_AVAILABLE, + reason='MLU does not support for 64-bit floating point')), + torch.half + ]) + def test_deform_roi_pool_allclose(self, device, dtype): + self._test_deform_roi_pool_allclose(device, dtype) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_diff_iou_rotated.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_diff_iou_rotated.py new file mode 100644 index 0000000000000000000000000000000000000000..01e05551b04b4df2994cebe4af65ad232be1234d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_diff_iou_rotated.py @@ -0,0 +1,49 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import numpy as np +import pytest +import torch + +from mmcv.ops import diff_iou_rotated_2d, diff_iou_rotated_3d + + +@pytest.mark.skipif( + not torch.cuda.is_available(), reason='requires CUDA support') +def test_diff_iou_rotated_2d(): + np_boxes1 = np.asarray([[[0.5, 0.5, 1., 1., .0], [0.5, 0.5, 1., 1., .0], + [0.5, 0.5, 1., 1., .0], [0.5, 0.5, 1., 1., .0], + [0.5, 0.5, 1., 1., .0]]], + dtype=np.float32) + np_boxes2 = np.asarray( + [[[0.5, 0.5, 1., 1., .0], [0.5, 0.5, 1., 1., np.pi / 2], + [0.5, 0.5, 1., 1., np.pi / 4], [1., 1., 1., 1., .0], + [1.5, 1.5, 1., 1., .0]]], + dtype=np.float32) + + boxes1 = torch.from_numpy(np_boxes1).cuda() + boxes2 = torch.from_numpy(np_boxes2).cuda() + + np_expect_ious = np.asarray([[1., 1., .7071, 1 / 7, .0]]) + ious = diff_iou_rotated_2d(boxes1, boxes2) + assert np.allclose(ious.cpu().numpy(), np_expect_ious, atol=1e-4) + + +@pytest.mark.skipif( + not torch.cuda.is_available(), reason='requires CUDA support') +def test_diff_iou_rotated_3d(): + np_boxes1 = np.asarray( + [[[.5, .5, .5, 1., 1., 1., .0], [.5, .5, .5, 1., 1., 1., .0], + [.5, .5, .5, 1., 1., 1., .0], [.5, .5, .5, 1., 1., 1., .0], + [.5, .5, .5, 1., 1., 1., .0]]], + dtype=np.float32) + np_boxes2 = np.asarray( + [[[.5, .5, .5, 1., 1., 1., .0], [.5, .5, .5, 1., 1., 2., np.pi / 2], + [.5, .5, .5, 1., 1., 1., np.pi / 4], [1., 1., 
1., 1., 1., 1., .0], + [-1.5, -1.5, -1.5, 2.5, 2.5, 2.5, .0]]], + dtype=np.float32) + + boxes1 = torch.from_numpy(np_boxes1).cuda() + boxes2 = torch.from_numpy(np_boxes2).cuda() + + np_expect_ious = np.asarray([[1., .5, .7071, 1 / 15, .0]]) + ious = diff_iou_rotated_3d(boxes1, boxes2) + assert np.allclose(ious.cpu().numpy(), np_expect_ious, atol=1e-4) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_focal_loss.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_focal_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..ee7c9861aea72d556d5255bab959153e54611766 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_focal_loss.py @@ -0,0 +1,170 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import numpy as np +import pytest +import torch + +from mmcv.utils import IS_CUDA_AVAILABLE, IS_MLU_AVAILABLE, IS_NPU_AVAILABLE + +_USING_PARROTS = True +try: + from parrots.autograd import gradcheck +except ImportError: + from torch.autograd import gradcheck + _USING_PARROTS = False + +# torch.set_printoptions(precision=8, threshold=100) + +inputs = [ + ([[1., 0], [0, 1.]], [0, 1]), + ([[1., 0, -1.], [0, 1., 2.]], [2, 1]), + ([[1e-6, 2e-6, 3e-6], [4e-6, 5e-5, 6e-4], [7e-3, 8e-2, 9e-1]], [1, 2, 0]), +] + +softmax_outputs = [(0.00566451, [[-0.00657264, 0.00657264], + [0.00657264, -0.00657264]]), + (0.34956908, [[0.10165970, 0.03739851, -0.13905823], + [0.01227554, -0.10298023, 0.09070466]]), + (0.15754992, [[0.02590877, -0.05181759, 0.02590882], + [0.02589641, 0.02589760, -0.05179400], + [-0.07307514, 0.02234372, 0.05073142]])] + +sigmoid_outputs = [(0.13562961, [[-0.00657264, 0.11185755], + [0.11185755, -0.00657264]]), + (1.10251057, [[0.28808805, 0.11185755, -0.09602935], + [0.11185755, -0.00657264, 0.40376765]]), + (0.42287254, [[0.07457182, -0.02485716, 0.07457201], + [0.07457211, 0.07457669, -0.02483728], + [-0.02462499, 0.08277918, 0.18050370]])] + + +class Testfocalloss: + + def _test_softmax(self, 
dtype=torch.float): + if not torch.cuda.is_available(): + return + from mmcv.ops import softmax_focal_loss + alpha = 0.25 + gamma = 2.0 + for case, output in zip(inputs, softmax_outputs): + np_x = np.array(case[0]) + np_y = np.array(case[1]) + np_x_grad = np.array(output[1]) + + x = torch.from_numpy(np_x).cuda().type(dtype) + x.requires_grad_() + y = torch.from_numpy(np_y).cuda().long() + + loss = softmax_focal_loss(x, y, gamma, alpha, None, 'mean') + loss.backward() + + assert np.allclose(loss.data.cpu().numpy(), output[0], 1e-2) + assert np.allclose(x.grad.data.cpu(), np_x_grad, 1e-2) + + def _test_sigmoid(self, device, dtype=torch.float): + from mmcv.ops import sigmoid_focal_loss + alpha = 0.25 + gamma = 2.0 + for case, output in zip(inputs, sigmoid_outputs): + np_x = np.array(case[0]) + np_y = np.array(case[1]) + np_x_grad = np.array(output[1]) + + x = torch.from_numpy(np_x).to(device).type(dtype) + x.requires_grad_() + y = torch.from_numpy(np_y).to(device).long() + + loss = sigmoid_focal_loss(x, y, gamma, alpha, None, 'mean') + loss.backward() + + assert np.allclose(loss.data.cpu().numpy(), output[0], 1e-2) + assert np.allclose(x.grad.data.cpu(), np_x_grad, 1e-2) + + def _test_grad_softmax(self, dtype=torch.float): + if not torch.cuda.is_available(): + return + from mmcv.ops import SoftmaxFocalLoss + alpha = 0.25 + gamma = 2.0 + for case in inputs: + np_x = np.array(case[0]) + np_y = np.array(case[1]) + + x = torch.from_numpy(np_x).cuda().type(dtype) + x.requires_grad_() + y = torch.from_numpy(np_y).cuda().long() + + floss = SoftmaxFocalLoss(gamma, alpha) + if _USING_PARROTS: + # gradcheck(floss, (x, y), + # no_grads=[y]) + pass + else: + gradcheck(floss, (x, y), eps=1e-2, atol=1e-2) + + def _test_grad_sigmoid(self, dtype=torch.float): + if not torch.cuda.is_available(): + return + from mmcv.ops import SigmoidFocalLoss + alpha = 0.25 + gamma = 2.0 + for case in inputs: + np_x = np.array(case[0]) + np_y = np.array(case[1]) + + x = 
torch.from_numpy(np_x).cuda().type(dtype) + x.requires_grad_() + y = torch.from_numpy(np_y).cuda().long() + + floss = SigmoidFocalLoss(gamma, alpha) + if _USING_PARROTS: + # gradcheck(floss, (x, y), + # no_grads=[y]) + pass + else: + gradcheck(floss, (x, y), eps=1e-2, atol=1e-2) + + def test_softmax_float(self): + self._test_softmax(dtype=torch.float) + + def test_softmax_half(self): + self._test_softmax(dtype=torch.half) + + @pytest.mark.parametrize('device', [ + pytest.param( + 'npu', + marks=pytest.mark.skipif( + not IS_NPU_AVAILABLE, reason='requires NPU support')), + pytest.param( + 'cuda', + marks=pytest.mark.skipif( + not IS_CUDA_AVAILABLE, reason='requires CUDA support')), + pytest.param( + 'mlu', + marks=pytest.mark.skipif( + not IS_MLU_AVAILABLE, reason='requires MLU support')) + ]) + def test_sigmoid_float(self, device): + self._test_sigmoid(device=device, dtype=torch.float) + + @pytest.mark.parametrize('device', [ + pytest.param( + 'npu', + marks=pytest.mark.skipif( + not IS_NPU_AVAILABLE, reason='requires NPU support')), + pytest.param( + 'cuda', + marks=pytest.mark.skipif( + not IS_CUDA_AVAILABLE, reason='requires CUDA support')), + pytest.param( + 'mlu', + marks=pytest.mark.skipif( + not IS_MLU_AVAILABLE, reason='requires MLU support')) + ]) + def test_sigmoid_half(self, device): + self._test_sigmoid(device, dtype=torch.half) + + def test_grad_softmax_float(self): + self._test_grad_softmax(dtype=torch.float) + + def test_grad_sigmoid_float(self): + self._test_grad_sigmoid(dtype=torch.float) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_furthest_point_sample.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_furthest_point_sample.py new file mode 100644 index 0000000000000000000000000000000000000000..7e61e64a91f541f49828d1e91e6b79c06aa1470a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_furthest_point_sample.py @@ -0,0 +1,52 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import pytest +import torch + +from mmcv.ops import furthest_point_sample, furthest_point_sample_with_dist + + +@pytest.mark.skipif( + not torch.cuda.is_available(), reason='requires CUDA support') +def test_fps(): + xyz = torch.tensor([[[-0.2748, 1.0020, -1.1674], [0.1015, 1.3952, -1.2681], + [-0.8070, 2.4137, + -0.5845], [-1.0001, 2.1982, -0.5859], + [0.3841, 1.8983, -0.7431]], + [[-1.0696, 3.0758, + -0.1899], [-0.2559, 3.5521, -0.1402], + [0.8164, 4.0081, -0.1839], [-1.1000, 3.0213, -0.8205], + [-0.0518, 3.7251, -0.3950]]]).cuda() + + idx = furthest_point_sample(xyz, 3) + expected_idx = torch.tensor([[0, 2, 4], [0, 2, 1]]).cuda() + assert torch.all(idx == expected_idx) + + +@pytest.mark.skipif( + not torch.cuda.is_available(), reason='requires CUDA support') +def test_fps_with_dist(): + xyz = torch.tensor([[[-0.2748, 1.0020, -1.1674], [0.1015, 1.3952, -1.2681], + [-0.8070, 2.4137, + -0.5845], [-1.0001, 2.1982, -0.5859], + [0.3841, 1.8983, -0.7431]], + [[-1.0696, 3.0758, + -0.1899], [-0.2559, 3.5521, -0.1402], + [0.8164, 4.0081, -0.1839], [-1.1000, 3.0213, -0.8205], + [-0.0518, 3.7251, -0.3950]]]).cuda() + + expected_idx = torch.tensor([[0, 2, 4], [0, 2, 1]]).cuda() + xyz_square_dist = ((xyz.unsqueeze(dim=1) - + xyz.unsqueeze(dim=2))**2).sum(-1) + idx = furthest_point_sample_with_dist(xyz_square_dist, 3) + assert torch.all(idx == expected_idx) + + import numpy as np + fps_idx = np.load('tests/data/for_3d_ops/fps_idx.npy') + features_for_fps_distance = np.load( + 'tests/data/for_3d_ops/features_for_fps_distance.npy') + expected_idx = torch.from_numpy(fps_idx).cuda() + features_for_fps_distance = torch.from_numpy( + features_for_fps_distance).cuda() + + idx = furthest_point_sample_with_dist(features_for_fps_distance, 16) + assert torch.all(idx == expected_idx) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_fused_bias_leakyrelu.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_fused_bias_leakyrelu.py new file mode 100644 index 
0000000000000000000000000000000000000000..e6f6fb9f75916f2ef625b856b47db9b8674a4756 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_fused_bias_leakyrelu.py @@ -0,0 +1,74 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import pytest +import torch + +from mmcv.utils import IS_CUDA_AVAILABLE, IS_NPU_AVAILABLE + +_USING_PARROTS = True +try: + from parrots.autograd import gradcheck +except ImportError: + from torch.autograd import gradcheck, gradgradcheck + _USING_PARROTS = False + + +class TestFusedBiasLeakyReLU: + + @classmethod + def setup_class(cls): + if not IS_CUDA_AVAILABLE and not IS_NPU_AVAILABLE: + return + if IS_CUDA_AVAILABLE: + cls.input_tensor = torch.randn((2, 2, 2, 2), + requires_grad=True).cuda() + cls.bias = torch.zeros(2, requires_grad=True).cuda() + elif IS_NPU_AVAILABLE: + cls.input_tensor = torch.randn((2, 2, 2, 2), + requires_grad=True).npu() + cls.bias = torch.zeros(2, requires_grad=True).npu() + + @pytest.mark.parametrize('device', [ + pytest.param( + 'cuda', + marks=pytest.mark.skipif( + not IS_CUDA_AVAILABLE, reason='requires CUDA support')), + pytest.param( + 'npu', + marks=pytest.mark.skipif( + not IS_NPU_AVAILABLE, reason='requires NPU support')) + ]) + def test_gradient(self, device): + + from mmcv.ops import FusedBiasLeakyReLU + if _USING_PARROTS: + if IS_CUDA_AVAILABLE: + gradcheck( + FusedBiasLeakyReLU(2).cuda(), + self.input_tensor, + delta=1e-4, + pt_atol=1e-3) + else: + gradcheck( + FusedBiasLeakyReLU(2).to(device), + self.input_tensor, + eps=1e-4, + atol=1e-3) + + @pytest.mark.parametrize('device', [ + pytest.param( + 'cuda', + marks=pytest.mark.skipif( + not IS_CUDA_AVAILABLE, reason='requires CUDA support')), + pytest.param( + 'npu', + marks=pytest.mark.skipif( + not IS_NPU_AVAILABLE, reason='requires NPU support')) + ]) + def test_gradgradient(self, device): + + from mmcv.ops import FusedBiasLeakyReLU + gradgradcheck( + FusedBiasLeakyReLU(2).to(device), + self.input_tensor, + eps=1e-4, + atol=1e-3) diff 
--git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_gather_points.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_gather_points.py new file mode 100644 index 0000000000000000000000000000000000000000..a93df692a58425140fc1fd73f5cefe9c07cf0d6b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_gather_points.py @@ -0,0 +1,51 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import pytest +import torch + +from mmcv.ops import gather_points + + +@pytest.mark.skipif( + not torch.cuda.is_available(), reason='requires CUDA support') +def test_gather_points(): + features = torch.tensor([[[ + -1.6095, -0.1029, -0.8876, -1.2447, -2.4031, 0.3708, -1.1586, -1.4967, + -0.4800, 0.2252 + ], + [ + 1.9138, 3.4979, 1.6854, 1.5631, 3.6776, + 3.1154, 2.1705, 2.5221, 2.0411, 3.1446 + ], + [ + -1.4173, 0.3073, -1.4339, -1.4340, -1.2770, + -0.2867, -1.4162, -1.4044, -1.4245, -1.4074 + ]], + [[ + 0.2160, 0.0842, 0.3661, -0.2749, -0.4909, + -0.6066, -0.8773, -0.0745, -0.9496, 0.1434 + ], + [ + 1.3644, 1.8087, 1.6855, 1.9563, 1.2746, + 1.9662, 0.9566, 1.8778, 1.1437, 1.3639 + ], + [ + -0.7172, 0.1692, 0.2241, 0.0721, -0.7540, + 0.0462, -0.6227, 0.3223, -0.6944, -0.5294 + ]]]).cuda() + + idx = torch.tensor([[0, 1, 4, 0, 0, 0], [0, 5, 6, 0, 0, 0]]).int().cuda() + + output = gather_points(features, idx) + expected_output = torch.tensor( + [[[-1.6095, -0.1029, -2.4031, -1.6095, -1.6095, -1.6095], + [1.9138, 3.4979, 3.6776, 1.9138, 1.9138, 1.9138], + [-1.4173, 0.3073, -1.2770, -1.4173, -1.4173, -1.4173]], + [[0.2160, -0.6066, -0.8773, 0.2160, 0.2160, 0.2160], + [1.3644, 1.9662, 0.9566, 1.3644, 1.3644, 1.3644], + [-0.7172, 0.0462, -0.6227, -0.7172, -0.7172, -0.7172]]]).cuda() + + assert torch.allclose(output, expected_output) + + # test fp16 + output_half = gather_points(features.half(), idx) + assert torch.allclose(output_half, expected_output.half()) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_group_points.py 
# Copyright (c) OpenMMLab. All rights reserved.
import pytest
import torch

from mmcv.ops import grouping_operation


@pytest.mark.skipif(
    not torch.cuda.is_available(), reason='requires CUDA support')
# NOTE(review): the parametrize list was [torch.half, torch.float,
# torch.float] — the duplicated torch.float just ran an identical case twice
# (apparently a leftover of a removed torch.double variant) and was dropped.
@pytest.mark.parametrize('dtype', [torch.half, torch.float])
def test_grouping_points(dtype):
    """Batched grouping: output[b, c, j, k] == features[b, c, idx[b, j, k]]."""
    idx = torch.tensor([[[0, 0, 0], [3, 3, 3], [8, 8, 8], [0, 0, 0],
                         [0, 0, 0], [0, 0, 0]],
                        [[0, 0, 0], [6, 6, 6], [9, 9, 9], [0, 0, 0],
                         [0, 0, 0], [0, 0, 0]]]).int().cuda()
    features = torch.tensor(
        [[[0.5798, -0.7981, -0.9280, -1.3311, 1.3687, 0.9277, -0.4164,
           -1.8274, 0.9268, 0.8414],
          [5.4247, 1.5113, 2.3944, 1.4740, 5.0300, 5.1030, 1.9360, 2.1939,
           2.1581, 3.4666],
          [-1.6266, -1.0281, -1.0393, -1.6931, -1.3982, -0.5732, -1.0830,
           -1.7561, -1.6786, -1.6967]],
         [[-0.0380, -0.1880, -1.5724, 0.6905, -0.3190, 0.7798, -0.3693,
           -0.9457, -0.2942, -1.8527],
          [1.1773, 1.5009, 2.6399, 5.9242, 1.0962, 2.7346, 6.0865, 1.5555,
           4.3303, 2.8229],
          [-0.6646, -0.6870, -0.1125, -0.2224, -0.3445, -1.4049, 0.4990,
           -0.7037, -0.9924, 0.0386]]],
        dtype=dtype).cuda()

    output = grouping_operation(features, idx)

    # Reference computed with plain advanced indexing (replaces the former
    # hundred-line literal whose every row was a single value repeated):
    # expected[b, c, j, k] = features[b, c, idx[b, j, k]].
    batch = torch.arange(2, device=features.device).view(2, 1, 1, 1)
    channel = torch.arange(3, device=features.device).view(1, 3, 1, 1)
    expected_output = features[batch, channel, idx.long().unsqueeze(1)]
    assert torch.allclose(output, expected_output)


@pytest.mark.skipif(
    not torch.cuda.is_available(), reason='requires CUDA support')
# NOTE(review): duplicated torch.float entry dropped here as well.
@pytest.mark.parametrize('dtype', [torch.half, torch.float])
def test_stack_grouping_points(dtype):
    """Stacked grouping: indices are per-batch-local; out-of-range -> zeros."""
    idx = torch.tensor([[0, 0, 0], [3, 3, 3], [8, 8, 8], [1, 1, 1], [0, 0, 0],
                        [2, 2, 2], [0, 0, 0], [6, 6, 6], [9, 9, 9], [0, 0, 0],
                        [1, 1, 1], [0, 0, 0]]).int().cuda()
    features = torch.tensor(
        [[0.5798, -0.7981, -0.9280, -1.3311, 1.3687, 0.9277, -0.4164,
          -1.8274, 0.9268, 0.8414],
         [5.4247, 1.5113, 2.3944, 1.4740, 5.0300, 5.1030, 1.9360, 2.1939,
          2.1581, 3.4666],
         [-1.6266, -1.0281, -1.0393, -1.6931, -1.3982, -0.5732, -1.0830,
          -1.7561, -1.6786, -1.6967],
         [-0.0380, -0.1880, -1.5724, 0.6905, -0.3190, 0.7798, -0.3693,
          -0.9457, -0.2942, -1.8527],
         [1.1773, 1.5009, 2.6399, 5.9242, 1.0962, 2.7346, 6.0865, 1.5555,
          4.3303, 2.8229],
         [-0.6646, -0.6870, -0.1125, -0.2224, -0.3445, -1.4049, 0.4990,
          -0.7037, -0.9924, 0.0386]],
        dtype=dtype).cuda()
    features_batch_cnt = torch.tensor([3, 3]).int().cuda()
    indices_batch_cnt = torch.tensor([6, 6]).int().cuda()

    output = grouping_operation(features, idx, features_batch_cnt,
                                indices_batch_cnt)

    # Each idx row holds one repeated local index; rows 0-5 address batch 0
    # (global rows 0-2), rows 6-11 address batch 1 (global rows 3-5).
    # ``None`` marks a local index beyond the batch's 3 features, which the
    # kernel fills with zeros. Derived from idx above; the resulting tensor
    # is identical to the former literal expected output.
    src_rows = [0, None, None, 1, 0, 2, 3, None, None, 3, 4, 3]
    expected_rows = [
        features[r] if r is not None else features.new_zeros(10)
        for r in src_rows
    ]
    expected_output = torch.stack(expected_rows).unsqueeze(-1).repeat(1, 1, 3)
    assert torch.allclose(output, expected_output)
# Copyright (c) OpenMMLab. All rights reserved.
import torch


class TestInfo:

    def test_info(self):
        """Compiler/CUDA version strings must be queryable on CUDA builds."""
        if not torch.cuda.is_available():
            return
        from mmcv.ops import get_compiler_version, get_compiling_cuda_version
        assert get_compiler_version() is not None
        assert get_compiling_cuda_version() is not None


# Copyright (c) OpenMMLab. All rights reserved.
import numpy as np
import pytest
import torch

from mmcv.ops import boxes_iou3d, boxes_overlap_bev, nms3d, nms3d_normal
from mmcv.utils import IS_CUDA_AVAILABLE, IS_MLU_AVAILABLE


@pytest.mark.parametrize('device', [
    pytest.param(
        'cuda',
        marks=pytest.mark.skipif(
            not IS_CUDA_AVAILABLE, reason='requires CUDA support'))
])
def test_boxes_overlap_bev(device):
    """BEV overlap areas for axis-aligned and rotated boxes."""
    np_boxes1 = np.asarray([[1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 0.0],
                            [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0],
                            [3.0, 3.0, 3.0, 3.0, 2.0, 2.0, 0.0]],
                           dtype=np.float32)
    np_boxes2 = np.asarray([[1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 0.0],
                            [1.0, 1.0, 1.0, 2.0, 2.0, 2.0, np.pi / 2],
                            [1.0, 1.0, 1.0, 2.0, 2.0, 2.0, np.pi / 4]],
                           dtype=np.float32)
    # Third column: analytic overlap of a square with its 45-degree rotation.
    np_expect_overlaps = np.asarray(
        [[4.0, 4.0, (8 + 8 * 2**0.5) / (3 + 2 * 2**0.5)], [1.0, 1.0, 1.0],
         [0.0, 0.0, 0.0]],
        dtype=np.float32)

    boxes1 = torch.from_numpy(np_boxes1).to(device)
    boxes2 = torch.from_numpy(np_boxes2).to(device)

    # Small case: 3 x 3 pairs.
    overlaps = boxes_overlap_bev(boxes1, boxes2)
    assert np.allclose(overlaps.cpu().numpy(), np_expect_overlaps, atol=1e-4)

    # Large case: tile the second operand to exercise the batched path.
    boxes2 = boxes2.repeat_interleave(555, 0)
    overlaps = boxes_overlap_bev(boxes1, boxes2)
    assert np.allclose(
        overlaps.cpu().numpy(), np_expect_overlaps.repeat(555, 1), atol=1e-4)


@pytest.mark.parametrize('device', [
    pytest.param(
        'cuda',
        marks=pytest.mark.skipif(
            not IS_CUDA_AVAILABLE, reason='requires CUDA support'))
])
def test_boxes_iou3d(device):
    """Full 3D IoU for the same axis-aligned/rotated box pairs."""
    np_boxes1 = np.asarray([[1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 0.0],
                            [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0],
                            [3.0, 3.0, 3.0, 3.0, 2.0, 2.0, 0.0]],
                           dtype=np.float32)
    np_boxes2 = np.asarray([[1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 0.0],
                            [1.0, 1.0, 1.0, 2.0, 2.0, 2.0, np.pi / 2],
                            [1.0, 1.0, 1.0, 2.0, 2.0, 2.0, np.pi / 4]],
                           dtype=np.float32)
    np_expect_ious = np.asarray(
        [[1.0, 1.0, 1.0 / 2**0.5], [1.0 / 15, 1.0 / 15, 1.0 / 15],
         [0.0, 0.0, 0.0]],
        dtype=np.float32)

    boxes1 = torch.from_numpy(np_boxes1).to(device)
    boxes2 = torch.from_numpy(np_boxes2).to(device)

    ious = boxes_iou3d(boxes1, boxes2)
    assert np.allclose(ious.cpu().numpy(), np_expect_ious, atol=1e-4)


@pytest.mark.parametrize('device', [
    pytest.param(
        'cuda',
        marks=pytest.mark.skipif(
            not IS_CUDA_AVAILABLE, reason='requires CUDA support')),
    pytest.param(
        'mlu',
        marks=pytest.mark.skipif(
            not IS_MLU_AVAILABLE, reason='requires MLU support'))
])
def test_nms3d(device):
    """Rotated 3D NMS on 5 hand-picked boxes, then on 555 random ones."""
    np_boxes = np.asarray([[1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 0.0],
                           [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0],
                           [3.0, 3.0, 3.0, 3.0, 2.0, 2.0, 0.3],
                           [3.0, 3.0, 3.0, 3.0, 2.0, 2.0, 0.0],
                           [3.0, 3.2, 3.2, 3.0, 2.0, 2.0, 0.3]],
                          dtype=np.float32)
    np_scores = np.array([0.6, 0.9, 0.1, 0.2, 0.15], dtype=np.float32)
    np_inds = np.array([1, 0, 3])
    boxes = torch.from_numpy(np_boxes)
    scores = torch.from_numpy(np_scores)
    kept = nms3d(boxes.to(device), scores.to(device), iou_threshold=0.3)
    assert np.allclose(kept.cpu().numpy(), np_inds)

    # The CUDA path evaluates rotated-box IoU in single precision
    # (box_iou_rotated_utils.hpp) while the MLU kernel computes it
    # differently, so borderline boxes can be kept/suppressed differently;
    # the random large case therefore only asserts a count on non-MLU.
    if device != 'mlu':
        np.random.seed(42)
        np_boxes = np.random.rand(555, 7).astype(np.float32)
        np_scores = np.random.rand(555).astype(np.float32)
        boxes = torch.from_numpy(np_boxes)
        scores = torch.from_numpy(np_scores)
        kept = nms3d(boxes.to(device), scores.to(device), iou_threshold=0.3)
        assert len(kept.cpu().numpy()) == 176


@pytest.mark.parametrize('device', [
    pytest.param(
        'cuda',
        marks=pytest.mark.skipif(
            not IS_CUDA_AVAILABLE, reason='requires CUDA support'))
])
def test_nms3d_normal(device):
    """Axis-aligned ("normal") 3D NMS on the same fixtures."""
    np_boxes = np.asarray([[1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 0.0],
                           [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0],
                           [3.0, 3.0, 3.0, 3.0, 2.0, 2.0, 0.3],
                           [3.0, 3.0, 3.0, 3.0, 2.0, 2.0, 0.0],
                           [3.0, 3.2, 3.2, 3.0, 2.0, 2.0, 0.3]],
                          dtype=np.float32)
    np_scores = np.array([0.6, 0.9, 0.1, 0.2, 0.15], dtype=np.float32)
    np_inds = np.array([1, 0, 3])
    boxes = torch.from_numpy(np_boxes)
    scores = torch.from_numpy(np_scores)
    kept = nms3d_normal(boxes.to(device), scores.to(device), iou_threshold=0.3)
    assert np.allclose(kept.cpu().numpy(), np_inds)

    np.random.seed(42)
    np_boxes = np.random.rand(555, 7).astype(np.float32)
    np_scores = np.random.rand(555).astype(np.float32)
    boxes = torch.from_numpy(np_boxes)
    scores = torch.from_numpy(np_scores)
    kept = nms3d_normal(boxes.to(device), scores.to(device), iou_threshold=0.3)
    assert len(kept.cpu().numpy()) == 148
+import pytest +import torch + +from mmcv.ops import knn + + +@pytest.mark.skipif( + not torch.cuda.is_available(), reason='requires CUDA support') +def test_knn(): + new_xyz = torch.tensor([[[-0.0740, 1.3147, -1.3625], + [-2.2769, 2.7817, -0.2334], + [-0.4003, 2.4666, -0.5116], + [-0.0740, 1.3147, -1.3625], + [-0.0740, 1.3147, -1.3625]], + [[-2.0289, 2.4952, -0.1708], + [-2.0668, 6.0278, -0.4875], + [0.4066, 1.4211, -0.2947], + [-2.0289, 2.4952, -0.1708], + [-2.0289, 2.4952, -0.1708]]]).cuda() + + xyz = torch.tensor([[[-0.0740, 1.3147, -1.3625], [0.5555, 1.0399, -1.3634], + [-0.4003, 2.4666, + -0.5116], [-0.5251, 2.4379, -0.8466], + [-0.9691, 1.1418, + -1.3733], [-0.2232, 0.9561, -1.3626], + [-2.2769, 2.7817, -0.2334], + [-0.2822, 1.3192, -1.3645], [0.1533, 1.5024, -1.0432], + [0.4917, 1.1529, -1.3496]], + [[-2.0289, 2.4952, + -0.1708], [-0.7188, 0.9956, -0.5096], + [-2.0668, 6.0278, -0.4875], [-1.9304, 3.3092, 0.6610], + [0.0949, 1.4332, 0.3140], [-1.2879, 2.0008, -0.7791], + [-0.7252, 0.9611, -0.6371], [0.4066, 1.4211, -0.2947], + [0.3220, 1.4447, 0.3548], [-0.9744, 2.3856, + -1.2000]]]).cuda() + + idx = knn(5, xyz, new_xyz) + new_xyz_ = new_xyz.unsqueeze(2).repeat(1, 1, xyz.shape[1], 1) + xyz_ = xyz.unsqueeze(1).repeat(1, new_xyz.shape[1], 1, 1) + dist = ((new_xyz_ - xyz_) * (new_xyz_ - xyz_)).sum(-1) + expected_idx = dist.topk(k=5, dim=2, largest=False)[1].transpose(2, 1) + assert torch.all(idx == expected_idx) + + idx = knn(5, + xyz.transpose(1, 2).contiguous(), + new_xyz.transpose(1, 2).contiguous(), True) + assert torch.all(idx == expected_idx) + + idx = knn(5, xyz, xyz) + xyz_ = xyz.unsqueeze(2).repeat(1, 1, xyz.shape[1], 1) + xyz__ = xyz.unsqueeze(1).repeat(1, xyz.shape[1], 1, 1) + dist = ((xyz_ - xyz__) * (xyz_ - xyz__)).sum(-1) + expected_idx = dist.topk(k=5, dim=2, largest=False)[1].transpose(2, 1) + assert torch.all(idx == expected_idx) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_masked_conv2d.py 
b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_masked_conv2d.py new file mode 100644 index 0000000000000000000000000000000000000000..a292f6a4fd5cde1788f872c80fb89de043b0e27f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_masked_conv2d.py @@ -0,0 +1,41 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import numpy as np +import pytest +import torch + +from mmcv.utils import IS_CUDA_AVAILABLE, IS_MLU_AVAILABLE + + +class TestMaskedConv2d: + + @pytest.mark.parametrize('device', [ + pytest.param( + 'cuda', + marks=pytest.mark.skipif( + not IS_CUDA_AVAILABLE, reason='requires CUDA support')), + pytest.param( + 'mlu', + marks=pytest.mark.skipif( + not IS_MLU_AVAILABLE, reason='requires MLU support')) + ]) + def test_masked_conv2d_all_close(self, device): + from mmcv.ops import MaskedConv2d + np_input = np.load( + 'tests/data/for_masked_conv2d/masked_conv2d_for_input.npy') + np_mask = np.load( + 'tests/data/for_masked_conv2d/masked_conv2d_for_mask.npy') + np_weight = np.load( + 'tests/data/for_masked_conv2d/masked_conv2d_for_weight.npy') + np_bias = np.load( + 'tests/data/for_masked_conv2d/masked_conv2d_for_bias.npy') + np_output = np.load( + 'tests/data/for_masked_conv2d/masked_conv2d_for_output.npy') + input = torch.tensor(np_input, dtype=torch.float, device=device) + mask = torch.tensor(np_mask, dtype=torch.float, device=device) + weight = torch.tensor(np_weight, dtype=torch.float, device=device) + bias = torch.tensor(np_bias, dtype=torch.float, device=device) + conv = MaskedConv2d(3, 3, 3, 1, 1).to(device) + conv.weight = torch.nn.Parameter(weight) + conv.bias = torch.nn.Parameter(bias) + output = conv(input, mask) + assert np.allclose(output.data.cpu().numpy(), np_output, 1e-3) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_merge_cells.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_merge_cells.py new file mode 100644 index 0000000000000000000000000000000000000000..51551c1416eb39340ed0ec170ce5dd35e436df68 --- 
# Copyright (c) OpenMMLab. All rights reserved.
"""
CommandLine:
    pytest tests/test_merge_cells.py
"""
import math

import pytest
import torch
import torch.nn.functional as F

from mmcv.ops.merge_cells import (BaseMergeCell, ConcatCell, GlobalPoolingCell,
                                  SumCell)


# The (14, 7) size below exercises inputs whose size is not divisible by the
# target size.
@pytest.mark.parametrize(
    'inputs_x, inputs_y',
    [(torch.randn([2, 256, 16, 16]), torch.randn([2, 256, 32, 32])),
     (torch.randn([2, 256, 14, 7]), torch.randn([2, 256, 32, 32]))])
def test_sum_cell(inputs_x, inputs_y):
    """SumCell output must match the requested (or default) output size."""
    sum_cell = SumCell(256, 256)
    output = sum_cell(inputs_x, inputs_y, out_size=inputs_x.shape[-2:])
    assert output.size() == inputs_x.size()
    output = sum_cell(inputs_x, inputs_y, out_size=inputs_y.shape[-2:])
    assert output.size() == inputs_y.size()
    # Without out_size the larger input's size is used.
    output = sum_cell(inputs_x, inputs_y)
    assert output.size() == inputs_y.size()


@pytest.mark.parametrize(
    'inputs_x, inputs_y',
    [(torch.randn([2, 256, 16, 16]), torch.randn([2, 256, 32, 32])),
     (torch.randn([2, 256, 14, 7]), torch.randn([2, 256, 32, 32]))])
def test_concat_cell(inputs_x, inputs_y):
    """ConcatCell output must match the requested (or default) output size."""
    concat_cell = ConcatCell(256, 256)
    output = concat_cell(inputs_x, inputs_y, out_size=inputs_x.shape[-2:])
    assert output.size() == inputs_x.size()
    output = concat_cell(inputs_x, inputs_y, out_size=inputs_y.shape[-2:])
    assert output.size() == inputs_y.size()
    output = concat_cell(inputs_x, inputs_y)
    assert output.size() == inputs_y.size()


@pytest.mark.parametrize(
    'inputs_x, inputs_y',
    [(torch.randn([2, 256, 16, 16]), torch.randn([2, 256, 32, 32])),
     (torch.randn([2, 256, 14, 7]), torch.randn([2, 256, 32, 32]))])
def test_global_pool_cell(inputs_x, inputs_y):
    """GlobalPoolingCell works with and without the output conv."""
    gp_cell = GlobalPoolingCell(with_out_conv=False)
    gp_cell_out = gp_cell(inputs_x, inputs_y, out_size=inputs_x.shape[-2:])
    assert gp_cell_out.size() == inputs_x.size()
    gp_cell = GlobalPoolingCell(256, 256)
    gp_cell_out = gp_cell(inputs_x, inputs_y, out_size=inputs_x.shape[-2:])
    assert gp_cell_out.size() == inputs_x.size()


@pytest.mark.parametrize('target_size', [(256, 256), (128, 128), (64, 64),
                                         (14, 7)])
def test_resize_methods(target_size):
    """BaseMergeCell._resize must match interpolate/max_pool references.

    Upsampling is checked against ``F.interpolate``; downsampling against a
    (possibly padded) ``F.max_pool2d``.
    """
    inputs_x = torch.randn([2, 256, 128, 128])
    h, w = inputs_x.shape[-2:]
    target_h, target_w = target_size
    # A target at least as large in either dimension takes the upsample path.
    if h <= target_h or w <= target_w:
        rs_mode = 'upsample'
    else:
        rs_mode = 'downsample'

    if rs_mode == 'upsample':
        for method in ['nearest', 'bilinear']:
            merge_cell = BaseMergeCell(upsample_mode=method)
            merge_cell_out = merge_cell._resize(inputs_x, target_size)
            gt_out = F.interpolate(inputs_x, size=target_size, mode=method)
            assert merge_cell_out.equal(gt_out)
    elif rs_mode == 'downsample':
        merge_cell = BaseMergeCell()
        merge_cell_out = merge_cell._resize(inputs_x, target_size)
        # Pad so each dimension becomes an exact multiple of the target,
        # mirroring what _resize does internally for indivisible sizes.
        if h % target_h != 0 or w % target_w != 0:
            pad_h = math.ceil(h / target_h) * target_h - h
            pad_w = math.ceil(w / target_w) * target_w - w
            pad_l = pad_w // 2
            pad_r = pad_w - pad_l
            pad_t = pad_h // 2
            pad_b = pad_h - pad_t
            pad = (pad_l, pad_r, pad_t, pad_b)
            inputs_x = F.pad(inputs_x, pad, mode='constant', value=0.0)
        kernel_size = (inputs_x.shape[-2] // target_h,
                       inputs_x.shape[-1] // target_w)
        gt_out = F.max_pool2d(
            inputs_x, kernel_size=kernel_size, stride=kernel_size)
        # NOTE(review): a leftover debug ``print(merge_cell_out.shape,
        # gt_out.shape)`` was removed here.
        assert (merge_cell_out == gt_out).all()
        assert merge_cell_out.shape[-2:] == target_size
# Copyright (c) OpenMMLab. All rights reserved.
import numpy as np
import pytest
import torch

from mmcv.ops import min_area_polygons

# Two point sets of 9 (x, y) pairs each; the second contains the far-away
# point (8, 8) that stretches its minimum-area rectangle.
np_pointsets = np.asarray([[
    1.0, 1.0, 2.0, 2.0, 1.0, 2.0, 2.0, 1.0, 1.0, 3.0, 3.0, 1.0, 2.0, 3.0, 3.0,
    2.0, 1.5, 1.5
],
                           [
                               1.0, 1.0, 8.0, 8.0, 1.0, 2.0, 2.0, 1.0, 1.0,
                               3.0, 3.0, 1.0, 2.0, 3.0, 3.0, 2.0, 1.5, 1.5
                           ]])

# Expected minimum-area enclosing rectangles, 4 corners as (x, y) pairs.
expected_polygons = np.asarray(
    [[3.0000, 1.0000, 1.0000, 1.0000, 1.0000, 3.0000, 3.0000, 3.0000],
     [8.0, 8.0, 2.3243, 0.0541, 0.0541, 1.6757, 5.7297, 9.6216]])


@pytest.mark.skipif(
    not torch.cuda.is_available(), reason='requires CUDA support')
def test_min_area_polygons():
    """min_area_polygons must reproduce the stored rectangles."""
    pointsets = torch.from_numpy(np_pointsets).float().cuda()

    assert np.allclose(
        min_area_polygons(pointsets).cpu().numpy(),
        expected_polygons,
        atol=1e-4)


# Copyright (c) OpenMMLab. All rights reserved.
import os

import numpy
import pytest
import torch
from mmengine.utils import digit_version
from mmengine.utils.dl_utils import TORCH_VERSION

try:
    # If PyTorch version >= 1.6.0 and fp16 is enabled, torch.cuda.amp.autocast
    # would be imported and used; we should test if our modules support it.
    from torch.cuda.amp import autocast
except ImportError:
    pass

cur_dir = os.path.dirname(os.path.abspath(__file__))

# Hand-computed reference values for a 1x1x3x3 input through a 2x2
# modulated deformable conv with all-ones weights.
input_t = [[[[1., 2., 3.], [1., 2., 3.], [1., 2., 3.]]]]
output_t = [[[[0.5, 1.5, 2.5, 1.5], [1.0, 3.0, 5.0, 3.0], [1.0, 3.0, 5.0, 3.0],
              [0.5, 1.5, 2.5, 1.5]]]]
input_grad = [[[[2., 2., 2.], [2., 2., 2.], [2., 2., 2.]]]]
dcn_w_grad = [[[[9., 9.], [9., 9.]]]]
dcn_offset_w_grad = [[[[-7.0, -4.0], [0.0, 0.0]]], [[[-9.0, 7.5], [-6.0,
                                                                   5.0]]],
                     [[[-4.0, -7.0], [0.0, 0.0]]],
                     [[[-7.5, -9.0], [-5.0, -6.0]]],
                     [[[-7.0, -4.0], [-7.0, -4.0]]],
                     [[[-6.0, 5.0], [-9.0, 7.5]]],
                     [[[-4.0, -7.0], [-4.0, -7.0]]],
                     [[[-5.0, -6.0], [-7.5, -9.0]]], [[[10.5, 6.0], [7.0,
                                                                     4.0]]],
                     [[[6.0, 10.5], [4.0, 7.0]]], [[[7.0, 4.0], [10.5, 6.0]]],
                     [[[4.0, 7.0], [6.0, 10.5]]]]
dcn_offset_b_grad = [
    -3.0, -1.5, -3.0, -1.5, -3.0, -1.5, -3.0, -1.5, 4.5, 4.5, 4.5, 4.5
]


class TestMdconv:

    def _test_mdconv(self, dtype=torch.float, device='cuda'):
        """Forward/backward check against the hand-computed references."""
        if not torch.cuda.is_available() and device == 'cuda':
            pytest.skip('test requires GPU')
        from mmcv.ops import ModulatedDeformConv2dPack
        input = torch.tensor(input_t, dtype=dtype, device=device)
        input.requires_grad = True

        dcn = ModulatedDeformConv2dPack(
            1,
            1,
            kernel_size=(2, 2),
            stride=1,
            padding=1,
            deform_groups=1,
            bias=False)

        if device == 'cuda':
            dcn.cuda()

        # All-ones weights make the expected values easy to derive by hand.
        dcn.weight.data.fill_(1.)
        dcn.type(dtype)
        output = dcn(input)
        output.sum().backward()
        assert numpy.allclose(output.cpu().detach().numpy(), output_t, 1e-2)
        assert numpy.allclose(input.grad.cpu().detach().numpy(), input_grad,
                              1e-2)
        assert numpy.allclose(dcn.weight.grad.cpu().detach().numpy(),
                              dcn_w_grad, 1e-2)
        assert numpy.allclose(
            dcn.conv_offset.weight.grad.cpu().detach().numpy(),
            dcn_offset_w_grad, 1e-2)
        assert numpy.allclose(dcn.conv_offset.bias.grad.cpu().detach().numpy(),
                              dcn_offset_b_grad, 1e-2)

    def _test_amp_mdconv(self, input_dtype=torch.float):
        """The function to test amp released on pytorch 1.6.0.

        The type of input data might be torch.float or torch.half,
        so we should test mdconv in both cases. With amp, the data
        type of model will NOT be set manually.

        Args:
            input_dtype: torch.float or torch.half.
        """
        if not torch.cuda.is_available():
            return
        from mmcv.ops import ModulatedDeformConv2dPack
        input = torch.tensor(input_t).cuda().type(input_dtype)
        input.requires_grad = True

        dcn = ModulatedDeformConv2dPack(
            1,
            1,
            kernel_size=(2, 2),
            stride=1,
            padding=1,
            deform_groups=1,
            bias=False).cuda()
        dcn.weight.data.fill_(1.)
        output = dcn(input)
        output.sum().backward()
        assert numpy.allclose(output.cpu().detach().numpy(), output_t, 1e-2)
        assert numpy.allclose(input.grad.cpu().detach().numpy(), input_grad,
                              1e-2)
        assert numpy.allclose(dcn.weight.grad.cpu().detach().numpy(),
                              dcn_w_grad, 1e-2)
        assert numpy.allclose(
            dcn.conv_offset.weight.grad.cpu().detach().numpy(),
            dcn_offset_w_grad, 1e-2)
        assert numpy.allclose(dcn.conv_offset.bias.grad.cpu().detach().numpy(),
                              dcn_offset_b_grad, 1e-2)

    def test_mdconv(self):
        # NOTE(review): the original invoked the float CPU case and the float
        # CUDA case twice each — exact duplicates (apparently left behind
        # when a torch.double variant was dropped); the repeats are removed.
        self._test_mdconv(torch.float, device='cpu')
        self._test_mdconv(torch.float)
        self._test_mdconv(torch.half)

        # test amp when torch version >= '1.6.0', the type of
        # input data for mdconv might be torch.float or torch.half
        if (TORCH_VERSION != 'parrots'
                and digit_version(TORCH_VERSION) >= digit_version('1.6.0')):
            with autocast(enabled=True):
                self._test_amp_mdconv(torch.float)
                self._test_amp_mdconv(torch.half)
import pytest
import torch

from mmcv.ops.multi_scale_deform_attn import (
    MultiScaleDeformableAttention, MultiScaleDeformableAttnFunction,
    multi_scale_deformable_attn_pytorch)
from mmcv.utils import IS_CUDA_AVAILABLE, IS_MLU_AVAILABLE

# Prefer the parrots gradcheck when running under parrots; otherwise fall
# back to the stock torch implementation.
_USING_PARROTS = True
try:
    from parrots.autograd import gradcheck
except ImportError:
    from torch.autograd import gradcheck
    _USING_PARROTS = False


@pytest.mark.parametrize('device', [
    'cpu',
    pytest.param(
        'cuda:0',
        marks=pytest.mark.skipif(
            not IS_CUDA_AVAILABLE, reason='requires CUDA support')),
    pytest.param(
        'mlu',
        marks=pytest.mark.skipif(
            not IS_MLU_AVAILABLE, reason='requires MLU support'))
])
def test_multiscale_deformable_attention(device):
    """Smoke-test the ``MultiScaleDeformableAttention`` module forward pass.

    Checks the constructor validation (embed_dims must be divisible by
    num_heads) and runs a forward pass on the given device, including the
    ``value_proj_ratio`` configuration.
    """
    with pytest.raises(ValueError):
        # embed_dims must be divisible by num_heads
        MultiScaleDeformableAttention(
            embed_dims=256,
            num_heads=7,
        )
    device = torch.device(device)
    msda = MultiScaleDeformableAttention(
        embed_dims=3, num_levels=2, num_heads=3)
    msda.init_weights()
    num_query = 5
    bs = 1
    embed_dims = 3
    query = torch.rand(num_query, bs, embed_dims).to(device)
    key = torch.rand(num_query, bs, embed_dims).to(device)
    spatial_shapes = torch.Tensor([[2, 2], [1, 1]]).long().to(device)
    level_start_index = torch.Tensor([0, 4]).long().to(device)
    reference_points = torch.rand(bs, num_query, 2, 2).to(device)
    msda.to(device)
    msda(
        query,
        key,
        key,
        reference_points=reference_points,
        spatial_shapes=spatial_shapes,
        level_start_index=level_start_index)

    # test with value_proj_ratio
    embed_dims = 6
    value_proj_ratio = 0.5
    query = torch.rand(num_query, bs, embed_dims).to(device)
    key = torch.rand(num_query, bs, embed_dims).to(device)
    msda = MultiScaleDeformableAttention(
        embed_dims=embed_dims,
        num_levels=2,
        num_heads=3,
        value_proj_ratio=value_proj_ratio)
    msda.init_weights()
    msda.to(device)
    msda(
        query,
        key,
        key,
        reference_points=reference_points,
        spatial_shapes=spatial_shapes,
        level_start_index=level_start_index)


def test_forward_multi_scale_deformable_attn_pytorch():
    """The pure-PyTorch reference implementation must run on CPU."""
    N, M, D = 1, 2, 2
    Lq, L, P = 2, 2, 2
    shapes = torch.as_tensor([(6, 4), (3, 2)], dtype=torch.long)
    S = sum((H * W).item() for H, W in shapes)

    torch.manual_seed(3)
    value = torch.rand(N, S, M, D) * 0.01
    sampling_locations = torch.rand(N, Lq, M, L, P, 2)
    attention_weights = torch.rand(N, Lq, M, L, P) + 1e-5
    # Normalize weights over the (level, point) axes.
    attention_weights /= attention_weights.sum(
        -1, keepdim=True).sum(
            -2, keepdim=True)

    multi_scale_deformable_attn_pytorch(value.float(), shapes,
                                        sampling_locations.float(),
                                        attention_weights.float()).detach()


@pytest.mark.skipif(not IS_CUDA_AVAILABLE, reason='requires CUDA support')
def test_forward_equal_with_pytorch_float_strict():
    """CUDA kernel output must match the PyTorch reference (strict bound).

    NOTE(review): this function originally shared the name
    ``test_forward_equal_with_pytorch_float`` with the device-parametrized
    test below; the later definition shadowed it, so pytest never collected
    or ran it (flake8/ruff F811). Renamed so both tests run. The 1e-18 /
    1e-15 error bounds were inherited from the upstream fp64 variant of this
    test and may be too strict for fp32 — TODO confirm on target hardware.
    """
    N, M, D = 1, 2, 2
    Lq, L, P = 2, 2, 2
    shapes = torch.as_tensor([(6, 4), (3, 2)], dtype=torch.long)
    level_start_index = torch.cat((shapes.new_zeros(
        (1, )), shapes.prod(1).cumsum(0)[:-1]))
    S = sum((H * W).item() for H, W in shapes)

    torch.manual_seed(3)
    value = torch.rand(N, S, M, D) * 0.01
    sampling_locations = torch.rand(N, Lq, M, L, P, 2)
    attention_weights = torch.rand(N, Lq, M, L, P) + 1e-5
    attention_weights /= attention_weights.sum(
        -1, keepdim=True).sum(
            -2, keepdim=True)
    im2col_step = 2
    output_pytorch = multi_scale_deformable_attn_pytorch(
        value.float(), shapes, sampling_locations.float(),
        attention_weights.float()).detach().cpu()

    output_cuda = MultiScaleDeformableAttnFunction.apply(
        value.cuda().float(), shapes.cuda(), level_start_index.cuda(),
        sampling_locations.cuda().float(),
        attention_weights.cuda().float(), im2col_step).detach().cpu()
    assert torch.allclose(output_cuda, output_pytorch)
    max_abs_err = (output_cuda - output_pytorch).abs().max()
    max_rel_err = ((output_cuda - output_pytorch).abs() /
                   output_pytorch.abs()).max()
    assert max_abs_err < 1e-18
    assert max_rel_err < 1e-15


@pytest.mark.parametrize('device', [
    pytest.param(
        'cuda',
        marks=pytest.mark.skipif(
            not IS_CUDA_AVAILABLE, reason='requires CUDA support')),
    pytest.param(
        'mlu',
        marks=pytest.mark.skipif(
            not IS_MLU_AVAILABLE, reason='requires MLU support'))
])
def test_forward_equal_with_pytorch_float(device):
    """Device kernel output must match the PyTorch reference (fp32 bounds)."""
    N, M, D = 1, 2, 2
    Lq, L, P = 2, 2, 2
    shapes = torch.as_tensor([(6, 4), (3, 2)], dtype=torch.long)
    level_start_index = torch.cat((shapes.new_zeros(
        (1, )), shapes.prod(1).cumsum(0)[:-1]))
    S = sum((H * W).item() for H, W in shapes)

    torch.manual_seed(3)
    value = torch.rand(N, S, M, D) * 0.01
    sampling_locations = torch.rand(N, Lq, M, L, P, 2)
    attention_weights = torch.rand(N, Lq, M, L, P) + 1e-5
    attention_weights /= attention_weights.sum(
        -1, keepdim=True).sum(
            -2, keepdim=True)
    im2col_step = 2
    output_pytorch = multi_scale_deformable_attn_pytorch(
        value, shapes, sampling_locations, attention_weights).detach().cpu()

    output_device = MultiScaleDeformableAttnFunction.apply(
        value.to(device), shapes.to(device), level_start_index.to(device),
        sampling_locations.to(device), attention_weights.to(device),
        im2col_step).detach().cpu()
    assert torch.allclose(output_device, output_pytorch, rtol=1e-2, atol=1e-3)
    max_abs_err = (output_device - output_pytorch).abs().max()
    max_rel_err = ((output_device - output_pytorch).abs() /
                   output_pytorch.abs()).max()
    assert max_abs_err < 1e-9
    assert max_rel_err < 1e-6


@pytest.mark.parametrize('device', [
    pytest.param(
        'cuda',
        marks=pytest.mark.skipif(
            not IS_CUDA_AVAILABLE, reason='requires CUDA support')),
    pytest.param(
        'mlu',
        marks=pytest.mark.skipif(
            not IS_MLU_AVAILABLE, reason='requires MLU support'))
])
@pytest.mark.parametrize(
    'dtype',
    [
        torch.float,
        # NOTE(review): the original list carried a second ``torch.float``
        # entry guarded by an MLU skip whose reason mentions 64-bit floats —
        # clearly a leftover ``torch.double`` from upstream mmcv, downgraded
        # for hardware without fp64. As written it merely duplicated the
        # plain ``torch.float`` run, so it is dropped here.
        torch.half
    ])
@pytest.mark.parametrize('channels', [
    4,
    30,
    32,
    64,
    71,
    1025,
])
def test_gradient_numerical(channels,
                            device,
                            dtype,
                            grad_value=True,
                            grad_sampling_loc=True,
                            grad_attn_weight=True):
    """Numerically check the op's gradients via ``gradcheck``.

    ``dtype`` is overridden per device below (gradcheck needs a fixed,
    sufficiently precise dtype with a matching ``eps``).
    """
    N, M, _ = 1, 2, 2
    Lq, L, P = 2, 2, 2
    shapes = torch.as_tensor([(3, 2), (2, 1)], dtype=torch.long).to(device)
    level_start_index = torch.cat((shapes.new_zeros(
        (1, )), shapes.prod(1).cumsum(0)[:-1]))
    S = sum((H * W).item() for H, W in shapes)

    value = torch.rand(N, S, M, channels).to(device) * 0.01
    sampling_locations = torch.rand(N, Lq, M, L, P, 2).to(device)
    attention_weights = torch.rand(N, Lq, M, L, P).to(device) + 1e-5
    attention_weights /= attention_weights.sum(
        -1, keepdim=True).sum(
            -2, keepdim=True)
    im2col_step = 2

    func = MultiScaleDeformableAttnFunction.apply

    value.requires_grad = grad_value
    sampling_locations.requires_grad = grad_sampling_loc
    attention_weights.requires_grad = grad_attn_weight
    if device == 'cuda':
        dtype = torch.float
        eps = 1e-6
    elif device == 'mlu':
        dtype = torch.float
        eps = 1e-4
    if _USING_PARROTS:
        assert gradcheck(
            func, (value.to(dtype), shapes, level_start_index,
                   sampling_locations.to(dtype), attention_weights.to(dtype),
                   im2col_step),
            no_grads=[shapes, level_start_index],
            eps=eps)
    else:
        assert gradcheck(
            func, (value.to(dtype), shapes, level_start_index,
                   sampling_locations.to(dtype), attention_weights.to(dtype),
                   im2col_step),
            eps=eps,
            atol=1e-2)
import mmengine
import numpy as np
import pytest
import torch

from mmcv.utils import IS_CUDA_AVAILABLE, IS_MLU_AVAILABLE


class Testnms:
    """Tests for mmcv NMS variants: nms, soft_nms, nms_match and batched_nms.

    NOTE(review): GPU-dependent tests previously bailed out with a bare
    ``return`` when CUDA was absent, silently reporting PASS; they now call
    ``pytest.skip`` so the report reflects reality.
    """

    @pytest.mark.parametrize('device', [
        pytest.param(
            'cuda',
            marks=pytest.mark.skipif(
                not IS_CUDA_AVAILABLE, reason='requires CUDA support')),
        pytest.param(
            'mlu',
            marks=pytest.mark.skipif(
                not IS_MLU_AVAILABLE, reason='requires MLU support'))
    ])
    def test_nms_allclose(self, device):
        """CPU and device NMS must both match the hand-computed result."""
        from mmcv.ops import nms
        np_boxes = np.array([[6.0, 3.0, 8.0, 7.0], [3.0, 6.0, 9.0, 11.0],
                             [3.0, 7.0, 10.0, 12.0], [1.0, 4.0, 13.0, 7.0]],
                            dtype=np.float32)
        np_scores = np.array([0.6, 0.9, 0.7, 0.2], dtype=np.float32)
        np_inds = np.array([1, 0, 3])
        np_dets = np.array([[3.0, 6.0, 9.0, 11.0, 0.9],
                            [6.0, 3.0, 8.0, 7.0, 0.6],
                            [1.0, 4.0, 13.0, 7.0, 0.2]])
        boxes = torch.from_numpy(np_boxes)
        scores = torch.from_numpy(np_scores)
        dets, inds = nms(boxes, scores, iou_threshold=0.3, offset=0)
        assert np.allclose(dets, np_dets)  # test cpu
        assert np.allclose(inds, np_inds)  # test cpu
        dets, inds = nms(
            boxes.to(device), scores.to(device), iou_threshold=0.3, offset=0)
        assert np.allclose(dets.cpu().numpy(), np_dets)  # test gpu
        assert np.allclose(inds.cpu().numpy(), np_inds)  # test gpu

    def test_softnms_allclose(self):
        """soft_nms must match reference output for all three methods."""
        if not torch.cuda.is_available():
            pytest.skip('requires CUDA support')
        from mmcv.ops import soft_nms
        np_boxes = np.array([[6.0, 3.0, 8.0, 7.0], [3.0, 6.0, 9.0, 11.0],
                             [3.0, 7.0, 10.0, 12.0], [1.0, 4.0, 13.0, 7.0]],
                            dtype=np.float32)
        np_scores = np.array([0.6, 0.9, 0.7, 0.2], dtype=np.float32)

        # Expected detections/indices per decay method.
        np_output = {
            'linear': {
                'dets':
                np.array(
                    [[3., 6., 9., 11., 0.9], [6., 3., 8., 7., 0.6],
                     [3., 7., 10., 12., 0.29024392], [1., 4., 13., 7., 0.2]],
                    dtype=np.float32),
                'inds':
                np.array([1, 0, 2, 3], dtype=np.int64)
            },
            'gaussian': {
                'dets':
                np.array([[3., 6., 9., 11., 0.9],
                          [6., 3., 8., 7., 0.59630775],
                          [3., 7., 10., 12., 0.35275510],
                          [1., 4., 13., 7., 0.18650459]],
                         dtype=np.float32),
                'inds':
                np.array([1, 0, 2, 3], dtype=np.int64)
            },
            'naive': {
                'dets':
                np.array([[3., 6., 9., 11., 0.9], [6., 3., 8., 7., 0.6],
                          [1., 4., 13., 7., 0.2]],
                         dtype=np.float32),
                'inds':
                np.array([1, 0, 3], dtype=np.int64)
            }
        }

        boxes = torch.from_numpy(np_boxes)
        scores = torch.from_numpy(np_scores)

        # (iou_threshold, sigma, min_score, method) tuples.
        configs = [[0.3, 0.5, 0.01, 'linear'], [0.3, 0.5, 0.01, 'gaussian'],
                   [0.3, 0.5, 0.01, 'naive']]

        for iou, sig, mscore, m in configs:
            dets, inds = soft_nms(
                boxes,
                scores,
                iou_threshold=iou,
                sigma=sig,
                min_score=mscore,
                method=m)
            assert np.allclose(dets.cpu().numpy(), np_output[m]['dets'])
            assert np.allclose(inds.cpu().numpy(), np_output[m]['inds'])

        # Repeat on CUDA tensors (parrots handles this path differently).
        if torch.__version__ != 'parrots':
            boxes = boxes.cuda()
            scores = scores.cuda()
            for iou, sig, mscore, m in configs:
                dets, inds = soft_nms(
                    boxes,
                    scores,
                    iou_threshold=iou,
                    sigma=sig,
                    min_score=mscore,
                    method=m)
                assert np.allclose(dets.cpu().numpy(), np_output[m]['dets'])
                assert np.allclose(inds.cpu().numpy(), np_output[m]['inds'])

    def test_nms_match(self):
        """nms_match groups must be consistent with plain nms keep indices."""
        if not torch.cuda.is_available():
            pytest.skip('requires CUDA support')
        from mmcv.ops import nms, nms_match
        iou_thr = 0.6
        # empty input
        empty_dets = np.array([])
        assert len(nms_match(empty_dets, iou_thr)) == 0

        # non empty ndarray input
        np_dets = np.array(
            [[49.1, 32.4, 51.0, 35.9, 0.9], [49.3, 32.9, 51.0, 35.3, 0.9],
             [35.3, 11.5, 39.9, 14.5, 0.4], [35.2, 11.7, 39.7, 15.7, 0.3]],
            dtype=np.float32)
        np_groups = nms_match(np_dets, iou_thr)
        assert isinstance(np_groups[0], np.ndarray)
        assert len(np_groups) == 2
        tensor_dets = torch.from_numpy(np_dets)
        boxes = tensor_dets[:, :4]
        scores = tensor_dets[:, 4]
        nms_keep_inds = nms(boxes.contiguous(), scores.contiguous(),
                            iou_thr)[1]
        # Each group's leading element must be one of the nms survivors.
        assert {g[0].item() for g in np_groups} == set(nms_keep_inds.tolist())

        # non empty tensor input
        tensor_dets = torch.from_numpy(np_dets)
        tensor_groups = nms_match(tensor_dets, iou_thr)
        assert isinstance(tensor_groups[0], torch.Tensor)
        for i in range(len(tensor_groups)):
            assert np.equal(tensor_groups[i].numpy(), np_groups[i]).all()

        # input of wrong shape
        wrong_dets = np.zeros((2, 3))
        with pytest.raises(AssertionError):
            nms_match(wrong_dets, iou_thr)

    def test_batched_nms(self):
        """batched_nms: batched vs sequential paths must agree; nms_cfg=None
        skips suppression and returns score-sorted boxes."""
        from mmcv.ops import batched_nms
        results = mmengine.load('./tests/data/batched_nms_data.pkl')

        nms_max_num = 100
        nms_cfg = dict(
            type='nms',
            iou_threshold=0.7,
            score_threshold=0.5,
            max_num=nms_max_num)
        boxes, keep = batched_nms(
            torch.from_numpy(results['boxes']),
            torch.from_numpy(results['scores']),
            torch.from_numpy(results['idxs']),
            nms_cfg,
            class_agnostic=False)

        # split_thr forces the sequential (per-class loop) implementation.
        nms_cfg.update(split_thr=100)
        seq_boxes, seq_keep = batched_nms(
            torch.from_numpy(results['boxes']),
            torch.from_numpy(results['scores']),
            torch.from_numpy(results['idxs']),
            nms_cfg,
            class_agnostic=False)

        assert torch.equal(keep, seq_keep)
        assert torch.equal(boxes, seq_boxes)
        assert torch.equal(keep,
                           torch.from_numpy(results['keep'][:nms_max_num]))

        nms_cfg = dict(type='soft_nms', iou_threshold=0.7)
        boxes, keep = batched_nms(
            torch.from_numpy(results['boxes']),
            torch.from_numpy(results['scores']),
            torch.from_numpy(results['idxs']),
            nms_cfg,
            class_agnostic=False)

        nms_cfg.update(split_thr=100)
        seq_boxes, seq_keep = batched_nms(
            torch.from_numpy(results['boxes']),
            torch.from_numpy(results['scores']),
            torch.from_numpy(results['idxs']),
            nms_cfg,
            class_agnostic=False)

        assert torch.equal(keep, seq_keep)
        assert torch.equal(boxes, seq_boxes)

        # test skip nms when `nms_cfg` is None
        seq_boxes, seq_keep = batched_nms(
            torch.from_numpy(results['boxes']),
            torch.from_numpy(results['scores']),
            torch.from_numpy(results['idxs']),
            None,
            class_agnostic=False)
        assert len(seq_keep) == len(results['boxes'])
        # assert score is descending order
        assert ((seq_boxes[:, -1][1:] - seq_boxes[:, -1][:-1]) < 0).all()
# Copyright (c) OpenMMLab. All rights reserved.
import numpy as np
import pytest
import torch

from mmcv.utils import IS_CUDA_AVAILABLE


class TestNMSQuadri:
    """Tests for quadrilateral NMS (8-coordinate boxes + score column)."""

    @pytest.mark.parametrize('device', [
        'cpu',
        pytest.param(
            'cuda',
            marks=pytest.mark.skipif(
                not IS_CUDA_AVAILABLE, reason='requires CUDA support')),
    ])
    def test_ml_nms_quadri(self, device):
        """Multi-label (per-class) quadri NMS must match expected keeps."""
        from mmcv.ops import nms_quadri
        # Each row: 4 corner points (x1..y4) followed by the score.
        np_boxes = np.array([[1.0, 1.0, 3.0, 4.0, 4.0, 4.0, 4.0, 1.0, 0.7],
                             [2.0, 2.0, 3.0, 4.0, 4.0, 2.0, 3.0, 1.0, 0.8],
                             [7.0, 7.0, 8.0, 8.0, 9.0, 7.0, 8.0, 6.0, 0.5],
                             [0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.9]],
                            dtype=np.float32)
        np_labels = np.array([1, 0, 1, 0], dtype=np.float32)

        np_expect_dets = np.array([[0., 0., 0., 2., 2., 2., 2., 0.],
                                   [2., 2., 3., 4., 4., 2., 3., 1.],
                                   [7., 7., 8., 8., 9., 7., 8., 6.]],
                                  dtype=np.float32)
        np_expect_keep_inds = np.array([3, 1, 2], dtype=np.int64)

        boxes = torch.from_numpy(np_boxes).to(device)
        labels = torch.from_numpy(np_labels).to(device)

        dets, keep_inds = nms_quadri(boxes[:, :8], boxes[:, -1], 0.3, labels)

        assert np.allclose(dets.cpu().numpy()[:, :8], np_expect_dets)
        assert np.allclose(keep_inds.cpu().numpy(), np_expect_keep_inds)

    @pytest.mark.parametrize('device', [
        'cpu',
        pytest.param(
            'cuda',
            marks=pytest.mark.skipif(
                not IS_CUDA_AVAILABLE, reason='requires CUDA support')),
    ])
    def test_nms_quadri(self, device):
        """Class-agnostic quadri NMS must match expected keeps."""
        from mmcv.ops import nms_quadri
        np_boxes = np.array([[1.0, 1.0, 3.0, 4.0, 4.0, 4.0, 4.0, 1.0, 0.7],
                             [2.0, 2.0, 3.0, 4.0, 4.0, 2.0, 3.0, 1.0, 0.8],
                             [7.0, 7.0, 8.0, 8.0, 9.0, 7.0, 8.0, 6.0, 0.5],
                             [0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.9]],
                            dtype=np.float32)

        np_expect_dets = np.array([[0., 0., 0., 2., 2., 2., 2., 0.],
                                   [2., 2., 3., 4., 4., 2., 3., 1.],
                                   [7., 7., 8., 8., 9., 7., 8., 6.]],
                                  dtype=np.float32)
        np_expect_keep_inds = np.array([3, 1, 2], dtype=np.int64)

        boxes = torch.from_numpy(np_boxes).to(device)

        dets, keep_inds = nms_quadri(boxes[:, :8], boxes[:, -1], 0.3)
        assert np.allclose(dets.cpu().numpy()[:, :8], np_expect_dets)
        assert np.allclose(keep_inds.cpu().numpy(), np_expect_keep_inds)

    @pytest.mark.parametrize('device', [
        'cpu',
        pytest.param(
            'cuda',
            marks=pytest.mark.skipif(
                not IS_CUDA_AVAILABLE, reason='requires CUDA support')),
    ])
    def test_batched_nms(self, device):
        """batched_nms dispatching to nms_quadri, agnostic and per-class."""
        # test batched_nms with nms_quadri
        from mmcv.ops import batched_nms

        np_boxes = np.array([[1.0, 1.0, 3.0, 4.0, 4.0, 4.0, 4.0, 1.0, 0.7],
                             [2.0, 2.0, 3.0, 4.0, 4.0, 2.0, 3.0, 1.0, 0.8],
                             [7.0, 7.0, 8.0, 8.0, 9.0, 7.0, 8.0, 6.0, 0.5],
                             [0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.9]],
                            dtype=np.float32)
        np_labels = np.array([1, 0, 1, 0], dtype=np.float32)

        np_expect_agnostic_dets = np.array([[0., 0., 0., 2., 2., 2., 2., 0.],
                                            [2., 2., 3., 4., 4., 2., 3., 1.],
                                            [7., 7., 8., 8., 9., 7., 8., 6.]],
                                           dtype=np.float32)
        np_expect_agnostic_keep_inds = np.array([3, 1, 2], dtype=np.int64)

        np_expect_dets = np.array([[0., 0., 0., 2., 2., 2., 2., 0.],
                                   [2., 2., 3., 4., 4., 2., 3., 1.],
                                   [1., 1., 3., 4., 4., 4., 4., 1.],
                                   [7., 7., 8., 8., 9., 7., 8., 6.]],
                                  dtype=np.float32)
        np_expect_keep_inds = np.array([3, 1, 0, 2], dtype=np.int64)

        nms_cfg = dict(type='nms_quadri', iou_threshold=0.3)

        # test class_agnostic is True
        boxes, keep = batched_nms(
            torch.from_numpy(np_boxes[:, :8]).to(device),
            torch.from_numpy(np_boxes[:, -1]).to(device),
            torch.from_numpy(np_labels).to(device),
            nms_cfg,
            class_agnostic=True)
        assert np.allclose(boxes.cpu().numpy()[:, :8], np_expect_agnostic_dets)
        assert np.allclose(keep.cpu().numpy(), np_expect_agnostic_keep_inds)

        # test class_agnostic is False
        boxes, keep = batched_nms(
            torch.from_numpy(np_boxes[:, :8]).to(device),
            torch.from_numpy(np_boxes[:, -1]).to(device),
            torch.from_numpy(np_labels).to(device),
            nms_cfg,
            class_agnostic=False)
        assert np.allclose(boxes.cpu().numpy()[:, :8], np_expect_dets)
        assert np.allclose(keep.cpu().numpy(), np_expect_keep_inds)


# Copyright (c) OpenMMLab. All rights reserved.
@pytest.mark.skipif(
    not torch.cuda.is_available(),
    reason='GPU is required to test NMSRotated op')
class TestNmsRotated:
    """Tests for rotated-box NMS (cx, cy, w, h, angle + score)."""

    def test_ml_nms_rotated(self):
        """Multi-label rotated NMS, both cw and ccw angle conventions."""
        from mmcv.ops import nms_rotated
        np_boxes = np.array(
            [[6.0, 3.0, 8.0, 7.0, 0.5, 0.7], [3.0, 6.0, 9.0, 11.0, 0.6, 0.8],
             [3.0, 7.0, 10.0, 12.0, 0.3, 0.5], [1.0, 4.0, 13.0, 7.0, 0.6, 0.9]
             ],
            dtype=np.float32)
        np_labels = np.array([1, 0, 1, 0], dtype=np.float32)

        np_expect_dets = np.array(
            [[1.0, 4.0, 13.0, 7.0, 0.6], [3.0, 6.0, 9.0, 11.0, 0.6],
             [6.0, 3.0, 8.0, 7.0, 0.5]],
            dtype=np.float32)
        np_expect_keep_inds = np.array([3, 1, 0], dtype=np.int64)

        boxes = torch.from_numpy(np_boxes).cuda()
        labels = torch.from_numpy(np_labels).cuda()

        # test cw angle definition
        dets, keep_inds = nms_rotated(boxes[:, :5], boxes[:, -1], 0.5, labels)

        assert np.allclose(dets.cpu().numpy()[:, :5], np_expect_dets)
        assert np.allclose(keep_inds.cpu().numpy(), np_expect_keep_inds)

        # test ccw angle definition: negate angles in, negate back out.
        boxes[..., -2] *= -1
        dets, keep_inds = nms_rotated(
            boxes[:, :5], boxes[:, -1], 0.5, labels, clockwise=False)
        dets[..., -2] *= -1
        assert np.allclose(dets.cpu().numpy()[:, :5], np_expect_dets)
        assert np.allclose(keep_inds.cpu().numpy(), np_expect_keep_inds)

    def test_nms_rotated(self):
        """Class-agnostic rotated NMS, both cw and ccw angle conventions."""
        from mmcv.ops import nms_rotated
        np_boxes = np.array(
            [[6.0, 3.0, 8.0, 7.0, 0.5, 0.7], [3.0, 6.0, 9.0, 11.0, 0.6, 0.8],
             [3.0, 7.0, 10.0, 12.0, 0.3, 0.5], [1.0, 4.0, 13.0, 7.0, 0.6, 0.9]
             ],
            dtype=np.float32)

        np_expect_dets = np.array(
            [[1.0, 4.0, 13.0, 7.0, 0.6], [3.0, 6.0, 9.0, 11.0, 0.6],
             [6.0, 3.0, 8.0, 7.0, 0.5]],
            dtype=np.float32)
        np_expect_keep_inds = np.array([3, 1, 0], dtype=np.int64)

        boxes = torch.from_numpy(np_boxes).cuda()

        # test cw angle definition
        dets, keep_inds = nms_rotated(boxes[:, :5], boxes[:, -1], 0.5)
        assert np.allclose(dets.cpu().numpy()[:, :5], np_expect_dets)
        assert np.allclose(keep_inds.cpu().numpy(), np_expect_keep_inds)

        # test ccw angle definition: negate angles in, negate back out.
        boxes[..., -2] *= -1
        dets, keep_inds = nms_rotated(
            boxes[:, :5], boxes[:, -1], 0.5, clockwise=False)
        dets[..., -2] *= -1
        assert np.allclose(dets.cpu().numpy()[:, :5], np_expect_dets)
        assert np.allclose(keep_inds.cpu().numpy(), np_expect_keep_inds)

    def test_batched_nms(self):
        """batched_nms dispatching to nms_rotated, agnostic and per-class."""
        # test batched_nms with nms_rotated
        from mmcv.ops import batched_nms

        np_boxes = np.array(
            [[6.0, 3.0, 8.0, 7.0, 0.5, 0.7], [3.0, 6.0, 9.0, 11.0, 0.6, 0.8],
             [3.0, 7.0, 10.0, 12.0, 0.3, 0.5], [1.0, 4.0, 13.0, 7.0, 0.6, 0.9]
             ],
            dtype=np.float32)
        np_labels = np.array([1, 0, 1, 0], dtype=np.float32)

        np_expect_agnostic_dets = np.array(
            [[1.0, 4.0, 13.0, 7.0, 0.6], [3.0, 6.0, 9.0, 11.0, 0.6],
             [6.0, 3.0, 8.0, 7.0, 0.5]],
            dtype=np.float32)
        np_expect_agnostic_keep_inds = np.array([3, 1, 0], dtype=np.int64)

        np_expect_dets = np.array(
            [[1.0, 4.0, 13.0, 7.0, 0.6], [3.0, 6.0, 9.0, 11.0, 0.6],
             [6.0, 3.0, 8.0, 7.0, 0.5], [3.0, 7.0, 10.0, 12.0, 0.3]],
            dtype=np.float32)
        np_expect_keep_inds = np.array([3, 1, 0, 2], dtype=np.int64)

        nms_cfg = dict(type='nms_rotated', iou_threshold=0.5)

        # test class_agnostic is True
        boxes, keep = batched_nms(
            torch.from_numpy(np_boxes[:, :5]),
            torch.from_numpy(np_boxes[:, -1]),
            torch.from_numpy(np_labels),
            nms_cfg,
            class_agnostic=True)
        assert np.allclose(boxes.cpu().numpy()[:, :5], np_expect_agnostic_dets)
        assert np.allclose(keep.cpu().numpy(), np_expect_agnostic_keep_inds)

        # test class_agnostic is False
        boxes, keep = batched_nms(
            torch.from_numpy(np_boxes[:, :5]),
            torch.from_numpy(np_boxes[:, -1]),
            torch.from_numpy(np_labels),
            nms_cfg,
            class_agnostic=False)
        assert np.allclose(boxes.cpu().numpy()[:, :5], np_expect_dets)
        assert np.allclose(keep.cpu().numpy(), np_expect_keep_inds)
import os

import numpy as np
import onnx
import onnxruntime as rt
import pytest
import torch
import torch.nn as nn

# Scratch file every export test writes to; cleaned by the fixture below.
onnx_file = 'tmp.onnx'
if torch.__version__ == 'parrots':
    pytest.skip('not supported in parrots now', allow_module_level=True)


@pytest.fixture(autouse=True)
def run_before_and_after_test():
    """Remove the scratch onnx file before and after every test."""
    # clear onnx_file before test
    if os.path.exists(onnx_file):
        os.remove(onnx_file)

    yield

    # clear onnx_file after test
    if os.path.exists(onnx_file):
        os.remove(onnx_file)


class WrapFunction(nn.Module):
    """Wrap a plain function as an nn.Module so it can be ONNX-exported."""

    def __init__(self, wrapped_function):
        super().__init__()
        self.wrapped_function = wrapped_function

    def forward(self, *args, **kwargs):
        return self.wrapped_function(*args, **kwargs)


def test_roialign():
    """Export roi_align to ONNX and compare onnxruntime output to PyTorch."""
    try:
        from mmcv.ops import roi_align
    except (ImportError, ModuleNotFoundError):
        pytest.skip('roi_align op is not successfully compiled')

    # roi align config
    pool_h = 2
    pool_w = 2
    spatial_scale = 1.0
    sampling_ratio = 2

    inputs = [([[[[1., 2.], [3., 4.]]]], [[0., 0., 0., 1., 1.]]),
              ([[[[1., 2.], [3., 4.]], [[4., 3.],
                                        [2., 1.]]]], [[0., 0., 0., 1., 1.]]),
              ([[[[1., 2., 5., 6.], [3., 4., 7., 8.], [9., 10., 13., 14.],
                  [11., 12., 15., 16.]]]], [[0., 0., 0., 3., 3.]])]

    # NOTE(review): renamed from the original typo ``warpped_function``.
    def wrapped_function(torch_input, torch_rois):
        return roi_align(torch_input, torch_rois, (pool_w, pool_h),
                         spatial_scale, sampling_ratio, 'avg', True)

    for case in inputs:
        np_input = np.array(case[0], dtype=np.float32)
        np_rois = np.array(case[1], dtype=np.float32)
        input = torch.from_numpy(np_input)
        rois = torch.from_numpy(np_rois)

        # compute pytorch_output
        with torch.no_grad():
            pytorch_output = roi_align(input, rois, (pool_w, pool_h),
                                       spatial_scale, sampling_ratio, 'avg',
                                       True)

        # export and load onnx model
        wrapped_model = WrapFunction(wrapped_function)
        with torch.no_grad():
            torch.onnx.export(
                wrapped_model, (input, rois),
                onnx_file,
                export_params=True,
                keep_initializers_as_inputs=True,
                input_names=['input', 'rois'],
                opset_version=11)

        onnx_model = onnx.load(onnx_file)
        session_options = rt.SessionOptions()

        # compute onnx_output
        input_all = [node.name for node in onnx_model.graph.input]
        input_initializer = [
            node.name for node in onnx_model.graph.initializer
        ]
        net_feed_input = list(set(input_all) - set(input_initializer))
        assert (len(net_feed_input) == 2)
        sess = rt.InferenceSession(
            onnx_file, session_options, providers=['CPUExecutionProvider'])
        onnx_output = sess.run(None, {
            'input': input.detach().numpy(),
            'rois': rois.detach().numpy()
        })
        onnx_output = onnx_output[0]

        # allclose
        assert np.allclose(pytorch_output, onnx_output, atol=1e-3)


@pytest.mark.skipif(not torch.cuda.is_available(), reason='test requires GPU')
def test_roipool():
    """Export roi_pool to ONNX and compare onnxruntime output to PyTorch."""
    from mmcv.ops import roi_pool

    # roi pool config
    pool_h = 2
    pool_w = 2
    spatial_scale = 1.0

    inputs = [([[[[1., 2.], [3., 4.]]]], [[0., 0., 0., 1., 1.]]),
              ([[[[1., 2.], [3., 4.]], [[4., 3.],
                                        [2., 1.]]]], [[0., 0., 0., 1., 1.]]),
              ([[[[1., 2., 5., 6.], [3., 4., 7., 8.], [9., 10., 13., 14.],
                  [11., 12., 15., 16.]]]], [[0., 0., 0., 3., 3.]])]

    # NOTE(review): renamed from the original typo ``warpped_function``.
    def wrapped_function(torch_input, torch_rois):
        return roi_pool(torch_input, torch_rois, (pool_w, pool_h),
                        spatial_scale)

    for case in inputs:
        np_input = np.array(case[0], dtype=np.float32)
        np_rois = np.array(case[1], dtype=np.float32)
        input = torch.from_numpy(np_input).cuda()
        rois = torch.from_numpy(np_rois).cuda()

        # compute pytorch_output
        with torch.no_grad():
            pytorch_output = roi_pool(input, rois, (pool_w, pool_h),
                                      spatial_scale)
            pytorch_output = pytorch_output.cpu()

        # export and load onnx model
        wrapped_model = WrapFunction(wrapped_function)
        with torch.no_grad():
            torch.onnx.export(
                wrapped_model, (input, rois),
                onnx_file,
                export_params=True,
                keep_initializers_as_inputs=True,
                input_names=['input', 'rois'],
                opset_version=11)
        onnx_model = onnx.load(onnx_file)

        # compute onnx_output
        input_all = [node.name for node in onnx_model.graph.input]
        input_initializer = [
            node.name for node in onnx_model.graph.initializer
        ]
        net_feed_input = list(set(input_all) - set(input_initializer))
        assert (len(net_feed_input) == 2)
        sess = rt.InferenceSession(
            onnx_file, providers=['CPUExecutionProvider'])
        onnx_output = sess.run(
            None, {
                'input': input.detach().cpu().numpy(),
                'rois': rois.detach().cpu().numpy()
            })
        onnx_output = onnx_output[0]

        # allclose
        assert np.allclose(pytorch_output, onnx_output, atol=1e-3)


def _test_symbolic(model, inputs, symbol_name):
    """Export ``model`` and assert the graph contains op ``symbol_name``.

    NOTE(review): dropped the redundant ``import onnx`` that shadowed the
    module-level import, and stopped reassigning the ``model`` parameter.
    """
    with torch.no_grad():
        torch.onnx.export(model, inputs, onnx_file, opset_version=11)

    onnx_model = onnx.load(onnx_file)
    nodes = onnx_model.graph.node

    symbol_exist = False
    for n in nodes:
        if n.op_type == symbol_name:
            symbol_exist = True
    assert symbol_exist


@pytest.mark.skipif(not torch.cuda.is_available(), reason='test requires GPU')
def test_border_align():
    """BorderAlign must export as the MMCVBorderAlign custom symbol."""
    from mmcv.ops import BorderAlign
    model = BorderAlign(2)
    input = torch.rand(1, 8, 2, 2).cuda()
    boxes = torch.rand(1, 4, 4).cuda()
    _test_symbolic(model, (input, boxes), 'MMCVBorderAlign')


@pytest.mark.skipif(not torch.cuda.is_available(), reason='test requires GPU')
def test_carafe():
    """CARAFENaive must export as the MMCVCARAFENaive custom symbol."""
    from mmcv.ops import CARAFENaive
    feat = torch.randn(2, 64, 3, 3, device='cuda').float()
    mask = torch.randn(2, 100, 6, 6, device='cuda').sigmoid().float()
    _test_symbolic(CARAFENaive(5, 4, 2), (feat, mask), 'MMCVCARAFENaive')


@pytest.mark.skipif(not torch.cuda.is_available(), reason='test requires GPU')
def test_deform_conv():
    """DeformConv2dPack must export as the MMCVDeformConv2d custom symbol."""
    from mmcv.ops import DeformConv2dPack
    x = torch.randn(1, 2, 4, 4, device='cuda')
    _test_symbolic(
        DeformConv2dPack(2, 4, 3, 1, 1).cuda(), x, 'MMCVDeformConv2d')


@pytest.mark.skipif(not torch.cuda.is_available(), reason='test requires GPU')
def test_modulated_deform_conv():
    """ModulatedDeformConv2dPack must export as its custom symbol."""
    from mmcv.ops import ModulatedDeformConv2dPack
    x = torch.randn(1, 2, 4, 4, device='cuda')
    _test_symbolic(
        ModulatedDeformConv2dPack(2, 4, 3, 1, 1).cuda(), x,
        'MMCVModulatedDeformConv2d')


@pytest.mark.skipif(not torch.cuda.is_available(), reason='test requires GPU')
def test_deform_roi_pool():
    """DeformRoIPoolPack must export as the MMCVDeformRoIPool symbol."""
    from mmcv.ops import DeformRoIPoolPack
    x = torch.tensor([[[[1., 2.], [3., 4.]]]], device='cuda')
    rois = torch.tensor([[0., 0., 0., 1., 1.]], device='cuda')
    output_c = x.size(1)
    pool_h = 2
    pool_w = 2
    spatial_scale = 1.0
    sampling_ratio = 2
    model = DeformRoIPoolPack((pool_h, pool_w),
                              output_c,
                              spatial_scale=spatial_scale,
                              sampling_ratio=sampling_ratio).cuda()

    _test_symbolic(model, (x, rois), 'MMCVDeformRoIPool')


@pytest.mark.skipif(not torch.cuda.is_available(), reason='test requires GPU')
def test_masked_conv():
    """MaskedConv2d must export as the MMCVMaskedConv2d custom symbol."""
    from mmcv.ops import MaskedConv2d
    x = torch.rand(1, 2, 4, 4, device='cuda')
    mask = torch.rand(1, 4, 4, device='cuda')
    _test_symbolic(
        MaskedConv2d(2, 4, 3, 1, 1).cuda(), (x, mask), 'MMCVMaskedConv2d')


@pytest.mark.skipif(not torch.cuda.is_available(), reason='test requires GPU')
def test_pr_roi_pool():
    """PrRoIPool must export as the PrRoIPool custom symbol."""
    from mmcv.ops import PrRoIPool
    pool_h = 2
    pool_w = 2
    spatial_scale = 1.0
    x = torch.tensor([[[[1., 2.], [3., 4.]]]], device='cuda')
    rois = torch.tensor([[0., 0., 0., 1., 1.]], device='cuda')
    model = PrRoIPool((pool_h, pool_w), spatial_scale).cuda()
    _test_symbolic(model, (x, rois), 'PrRoIPool')


@pytest.mark.skipif(not torch.cuda.is_available(), reason='test requires GPU')
def test_psa_mask():
    """PSAMask must export as the MMCVPSAMask custom symbol."""
    from mmcv.ops import PSAMask
    input = torch.rand(4, 16, 8, 8).cuda()
    model = PSAMask('collect', (4, 4)).cuda()
    _test_symbolic(model, input, 'MMCVPSAMask')


@pytest.mark.skipif(not torch.cuda.is_available(), reason='test requires GPU')
def test_roi_align_rotated():
    """RoIAlignRotated must export as the MMCVRoIAlignRotated symbol."""
    from mmcv.ops import RoIAlignRotated
    pool_h = 2
    pool_w = 2
    spatial_scale = 1.0
    sampling_ratio = 2
    x = torch.tensor([[[[1., 2.], [3., 4.]]]], device='cuda')
    rois = torch.tensor([[0., 0.5, 0.5, 1., 1., 0]], device='cuda')
    model = RoIAlignRotated((pool_h, pool_w), spatial_scale,
                            sampling_ratio).cuda()
    _test_symbolic(model, (x, rois), 'MMCVRoIAlignRotated')


@pytest.mark.skipif(not torch.cuda.is_available(), reason='test requires GPU')
def test_roi_feature_align():
    """rotated_feature_align must export as its custom symbol.

    NOTE(review): renamed from the original typo ``test_roi_feaeture_align``;
    pytest discovers either name, so collection is unaffected.
    """
    from mmcv.ops import rotated_feature_align
    wrapped_model = WrapFunction(rotated_feature_align)
    feature = torch.rand(1, 1, 2, 2, device='cuda')
    bbox = torch.rand(1, 2, 2, 5, device='cuda')
    _test_symbolic(wrapped_model, (feature, bbox), 'MMCVRotatedFeatureAlign')
+import numpy as np +import torch + + +def test_pixel_group(): + from mmcv.ops import pixel_group + np_score = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + [0, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0], + [0, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0], + [0, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0], + [0, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]).astype(np.float32) + np_mask = (np_score > 0.5) + np_embedding = np.zeros((10, 10, 8)).astype(np.float32) + np_embedding[:, :7] = 0.9 + np_embedding[:, 7:] = 10.0 + np_kernel_label = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + [0, 0, 1, 1, 1, 0, 0, 0, 2, 0], + [0, 0, 1, 1, 1, 0, 0, 0, 2, 0], + [0, 0, 1, 1, 1, 0, 0, 0, 2, 0], + [0, 0, 1, 1, 1, 0, 0, 0, 2, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, + 0]]).astype(np.int32) + np_kernel_contour = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + [0, 0, 1, 1, 1, 0, 0, 0, 1, 0], + [0, 0, 1, 0, 1, 0, 0, 0, 1, 0], + [0, 0, 1, 0, 1, 0, 0, 0, 1, 0], + [0, 0, 1, 1, 1, 0, 0, 0, 1, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, + 0]]).astype(np.uint8) + kernel_region_num = 3 + distance_threshold = float(0.8) + result = pixel_group(np_score, np_mask, np_embedding, np_kernel_label, + np_kernel_contour, kernel_region_num, + distance_threshold) + gt_1 = [ + 0.8999997973442078, 24.0, 1.0, 3.0, 2.0, 3.0, 3.0, 3.0, 4.0, 3.0, 5.0, + 3.0, 6.0, 3.0, 1.0, 4.0, 2.0, 4.0, 3.0, 4.0, 4.0, 4.0, 5.0, 4.0, 6.0, + 4.0, 1.0, 5.0, 2.0, 5.0, 3.0, 5.0, 4.0, 5.0, 5.0, 5.0, 6.0, 5.0, 1.0, + 6.0, 2.0, 6.0, 3.0, 6.0, 4.0, 6.0, 5.0, 6.0, 6.0, 6.0 + ] + + gt_2 = [ + 0.9000000357627869, 8.0, 7.0, 3.0, 
8.0, 3.0, 7.0, 4.0, 8.0, 4.0, 7.0, + 5.0, 8.0, 5.0, 7.0, 6.0, 8.0, 6.0 + ] + + assert np.allclose(result[0], [0, 0]) + assert np.allclose(result[1], gt_1) + assert np.allclose(result[2], gt_2) + + # test torch Tensor + np_score_t = torch.from_numpy(np_score) + np_mask_t = torch.from_numpy(np_mask) + np_embedding_t = torch.from_numpy(np_embedding) + np_kernel_label_t = torch.from_numpy(np_kernel_label) + np_kernel_contour_t = torch.from_numpy(np_kernel_contour) + + result = pixel_group(np_score_t, np_mask_t, np_embedding_t, + np_kernel_label_t, np_kernel_contour_t, + kernel_region_num, distance_threshold) + + assert np.allclose(result[0], [0, 0]) + assert np.allclose(result[1], gt_1) + assert np.allclose(result[2], gt_2) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_points_in_polygons.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_points_in_polygons.py new file mode 100644 index 0000000000000000000000000000000000000000..28bb8951d18936bd67825ef3f447e34de9ae2519 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_points_in_polygons.py @@ -0,0 +1,23 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import numpy as np +import pytest +import torch + +from mmcv.ops import points_in_polygons + + +@pytest.mark.skipif( + not torch.cuda.is_available(), reason='requires CUDA support') +def test_points_in_polygons(): + points = np.array([[300., 300.], [400., 400.], [100., 100], [300, 250], + [100, 0]]) + polygons = np.array([[200., 200., 400., 400., 500., 200., 400., 100.], + [400., 400., 500., 500., 600., 300., 500., 200.], + [300., 300., 600., 700., 700., 700., 700., 100.]]) + expected_output = np.array([[0., 0., 0.], [0., 0., 1.], [0., 0., 0.], + [1., 0., 0.], [0., 0., 0.]]) + points = torch.from_numpy(points).float().cuda() + polygons = torch.from_numpy(polygons).float().cuda() + expected_output = torch.from_numpy(expected_output).float().cuda() + assert torch.allclose( + points_in_polygons(points, polygons), expected_output, 1e-3) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_prroi_pool.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_prroi_pool.py new file mode 100644 index 0000000000000000000000000000000000000000..0535dfbe21c817a5067279f3d0229ab4b94deb78 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_prroi_pool.py @@ -0,0 +1,98 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import numpy as np +import pytest +import torch + +from mmcv.utils import IS_CUDA_AVAILABLE + +_USING_PARROTS = True +try: + from parrots.autograd import gradcheck +except ImportError: + from torch.autograd import gradcheck + + _USING_PARROTS = False + +inputs = [([[[[1., 2.], [3., 4.]]]], [[0., 0., 0., 1., 1.]]), + ([[[[1., 2.], [3., 4.]], [[4., 3.], [2., + 1.]]]], [[0., 0., 0., 1., 1.]]), + ([[[[1., 2., 5., 6.], [3., 4., 7., 8.], [9., 10., 13., 14.], + [11., 12., 15., 16.]]]], [[0., 0., 0., 3., 3.]])] +outputs = [ + ([[[[1.75, 2.25], [2.75, 3.25]]]], [[[[1., 1.], + [1., 1.]]]], [[0., 2., 4., 2., 4.]]), + ([[[[1.75, 2.25], [2.75, 3.25]], + [[3.25, 2.75], [2.25, 1.75]]]], [[[[1., 1.], [1., 1.]], + [[1., 1.], + [1., 1.]]]], [[0., 0., 0., 0., 0.]]), + ([[[[3.75, 6.91666651], + [10.08333302, + 13.25]]]], [[[[0.11111111, 0.22222224, 0.22222222, 0.11111111], + [0.22222224, 0.444444448, 0.44444448, 0.22222224], + [0.22222224, 0.44444448, 0.44444448, 0.22222224], + [0.11111111, 0.22222224, 0.22222224, 0.11111111]]]], + [[0.0, 3.33333302, 6.66666603, 3.33333349, 6.66666698]]) +] + + +class TestPrRoiPool: + + @pytest.mark.parametrize('device', [ + pytest.param( + 'cuda', + marks=pytest.mark.skipif( + not IS_CUDA_AVAILABLE, reason='requires CUDA support')) + ]) + def test_roipool_gradcheck(self, device): + from mmcv.ops import PrRoIPool + pool_h = 2 + pool_w = 2 + spatial_scale = 1.0 + + for case in inputs: + np_input = np.array(case[0], dtype=np.float32) + np_rois = np.array(case[1], dtype=np.float32) + + x = torch.tensor(np_input, device=device, requires_grad=True) + rois = torch.tensor(np_rois, device=device) + + froipool = PrRoIPool((pool_h, pool_w), spatial_scale) + + if _USING_PARROTS: + gradcheck(froipool, (x, rois), no_grads=[rois]) + else: + gradcheck(froipool, (x, rois), eps=1e-2, atol=1e-2) + + def _test_roipool_allclose(self, device, dtype=torch.float): + from mmcv.ops import prroi_pool + pool_h = 2 + pool_w = 2 + spatial_scale = 1.0 + + for case, output in 
zip(inputs, outputs): + np_input = np.array(case[0], dtype=np.float32) + np_rois = np.array(case[1], dtype=np.float32) + np_output = np.array(output[0], dtype=np.float32) + np_input_grad = np.array(output[1], dtype=np.float32) + np_rois_grad = np.array(output[2], dtype=np.float32) + + x = torch.tensor( + np_input, dtype=dtype, device=device, requires_grad=True) + rois = torch.tensor( + np_rois, dtype=dtype, device=device, requires_grad=True) + + output = prroi_pool(x, rois, (pool_h, pool_w), spatial_scale) + output.backward(torch.ones_like(output)) + assert np.allclose(output.data.cpu().numpy(), np_output, 1e-3) + assert np.allclose(x.grad.data.cpu().numpy(), np_input_grad, 1e-3) + assert np.allclose(rois.grad.data.cpu().numpy(), np_rois_grad, + 1e-3) + + @pytest.mark.parametrize('device', [ + pytest.param( + 'cuda', + marks=pytest.mark.skipif( + not IS_CUDA_AVAILABLE, reason='requires CUDA support')) + ]) + def test_roipool_allclose_float(self, device): + self._test_roipool_allclose(device, dtype=torch.float) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_psa_mask.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_psa_mask.py new file mode 100644 index 0000000000000000000000000000000000000000..b0fd86e8f5f10ded735d4f87ac285c399736f194 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_psa_mask.py @@ -0,0 +1,126 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import numpy as np +import pytest +import torch +import torch.nn as nn + +from mmcv.utils import IS_CUDA_AVAILABLE, IS_MLU_AVAILABLE, IS_NPU_AVAILABLE + + +class Loss(nn.Module): + + def __init__(self): + super().__init__() + + def forward(self, input, target): + input = input.view(-1) + target = target.view(-1) + return torch.mean(input - target) + + +class TestPSAMask: + + @pytest.mark.parametrize('device', [ + pytest.param( + 'cuda', + marks=pytest.mark.skipif( + not IS_CUDA_AVAILABLE, reason='requires CUDA support')), + pytest.param( + 'mlu', + marks=pytest.mark.skipif( + not IS_MLU_AVAILABLE, reason='requires MLU support')), + pytest.param( + 'npu', + marks=pytest.mark.skipif( + not IS_NPU_AVAILABLE, reason='requires NPU support')) + ]) + def test_psa_mask_collect(self, device): + from mmcv.ops import PSAMask + test_loss = Loss() + + input = np.fromfile( + 'tests/data/for_psa_mask/psa_input.bin', dtype=np.float32) + output_collect = np.fromfile( + 'tests/data/for_psa_mask/psa_output_collect.bin', dtype=np.float32) + + input = input.reshape((4, 16, 8, 8)) + output_collect = output_collect.reshape((4, 64, 8, 8)) + label = torch.ones((4, 64, 8, 8)) + + input = torch.FloatTensor(input) + input.requires_grad = True + + psamask_collect = PSAMask('collect', (4, 4)) + + # test collect cpu + test_output = psamask_collect(input) + loss = test_loss(test_output, label) + loss.backward() + test_output = test_output.detach().numpy() + assert np.allclose(test_output, output_collect) + assert test_output.shape == output_collect.shape + + psamask_collect.to(device) + input = input.to(device) + label = label.to(device) + + # test collect on device + test_output = psamask_collect(input) + loss = test_loss(test_output, label) + loss.backward() + test_output = test_output.detach().cpu().numpy() + assert np.allclose(test_output, output_collect) + assert test_output.shape == output_collect.shape + + @pytest.mark.parametrize('device', [ + pytest.param( + 'cuda', + 
marks=pytest.mark.skipif( + not IS_CUDA_AVAILABLE, reason='requires CUDA support')), + pytest.param( + 'mlu', + marks=pytest.mark.skipif( + not IS_MLU_AVAILABLE, reason='requires MLU support')), + pytest.param( + 'npu', + marks=pytest.mark.skipif( + not IS_NPU_AVAILABLE, reason='requires NPU support')) + ]) + def test_psa_mask_distribute(self, device): + from mmcv.ops import PSAMask + test_loss = Loss() + + input = np.fromfile( + 'tests/data/for_psa_mask/psa_input.bin', dtype=np.float32) + output_distribute = np.fromfile( + 'tests/data/for_psa_mask/psa_output_distribute.bin', + dtype=np.float32) + + input = input.reshape((4, 16, 8, 8)) + output_distribute = output_distribute.reshape((4, 64, 8, 8)) + label = torch.ones((4, 64, 8, 8)) + + input = torch.FloatTensor(input) + input.requires_grad = True + + psamask_distribute = PSAMask('distribute', (4, 4)) + + # test distribute cpu + test_output = psamask_distribute(input) + loss = test_loss(test_output, label) + loss.backward() + test_output = test_output.detach().numpy() + assert np.allclose(test_output, output_distribute) + assert test_output.shape == output_distribute.shape + + psamask_distribute.to(device) + input = input.to(device) + label = label.to(device) + + # test distribute on device + test_output = psamask_distribute(input) + loss = test_loss(test_output, label) + loss.backward() + test_output = test_output.detach().cpu().numpy() + assert np.allclose(test_output, output_distribute) + assert test_output.shape == output_distribute.shape diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_riroi_align_rotated.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_riroi_align_rotated.py new file mode 100644 index 0000000000000000000000000000000000000000..c7b501cf44b89b687cc8bf687e0583c84705143e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_riroi_align_rotated.py @@ -0,0 +1,84 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import numpy as np +import pytest +import torch + +from mmcv.ops import RiRoIAlignRotated + +if torch.__version__ == 'parrots': + from parrots.autograd import gradcheck + _USING_PARROTS = True +else: + from torch.autograd import gradcheck + _USING_PARROTS = False + +np_feature = np.array([[[[1, 2], [3, 4]], [[1, 2], [4, 3]], [[4, 3], [2, 1]], + [[1, 2], [5, 6]], [[3, 4], [7, 8]], [[9, 10], [13, + 14]], + [[11, 12], [15, 16]], [[1, 1], [2, 2]]]]) +np_rois = np.array([[0., 0.5, 0.5, 1., 1., np.pi / 3], + [0., 1., 1., 3., 3., np.pi / 2]]) +expect_output = np.array([[[[1.8425, 1.3516], [2.3151, 1.8241]], + [[2.4779, 1.7416], [3.2173, 2.5632]], + [[2.7149, 2.2638], [2.6540, 2.3673]], + [[2.9461, 2.8638], [2.8028, 2.7205]], + [[4.1943, 2.7214], [5.6119, 4.1391]], + [[7.5276, 6.0547], [8.9453, 7.4724]], + [[12.1943, 10.7214], [13.6119, 12.1391]], + [[9.5489, 8.4237], [10.5763, 9.4511]]], + [[[7.6562, 12.5625], [4.0000, 6.6250]], + [[1.0000, 1.3125], [0.5000, 0.6562]], + [[1.6562, 1.9375], [1.0000, 1.3125]], + [[1.8438, 2.0547], [0.7500, 1.1562]], + [[0.8438, 3.0625], [0.2500, 1.1875]], + [[2.6562, 2.5625], [1.5000, 1.6250]], + [[3.6562, 4.5625], [2.0000, 2.6250]], + [[6.6562, 10.5625], [3.5000, 5.6250]]]]) + +expect_grad = np.array([[[[1.4727, 1.5586], [1.5586, 1.6602]], + [[1.4727, 1.5586], [1.5586, 1.6602]], + [[1.4727, 1.5586], [1.5586, 1.6602]], + [[1.4727, 1.5586], [1.5586, 1.6602]], + [[1.4727, 1.5586], [1.5586, 1.6602]], + [[1.4727, 1.5586], [1.5586, 1.6602]], + [[1.4727, 1.5586], [1.5586, 1.6602]], + [[1.4727, 1.5586], [1.5586, 1.6602]]]]) + +pool_h = 2 +pool_w = 2 +spatial_scale = 1.0 +num_samples = 2 +sampling_ratio = 2 +num_orientations = 8 +clockwise = False + + +@pytest.mark.skipif( + not torch.cuda.is_available(), reason='requires CUDA support') +def test_roialign_rotated_gradcheck(): + x = torch.tensor( + np_feature, dtype=torch.float, device='cuda', requires_grad=True) + rois = torch.tensor(np_rois, dtype=torch.float, device='cuda') + froipool = 
RiRoIAlignRotated((pool_h, pool_w), spatial_scale, num_samples, + num_orientations, clockwise) + if _USING_PARROTS: + gradcheck( + froipool, (x, rois), no_grads=[rois], delta=1e-3, pt_atol=1e-3) + else: + gradcheck(froipool, (x, rois), eps=1e-3, atol=1e-3) + + +@pytest.mark.skipif( + not torch.cuda.is_available(), reason='requires CUDA support') +def test_roialign_rotated_allclose(): + x = torch.tensor( + np_feature, dtype=torch.float, device='cuda', requires_grad=True) + rois = torch.tensor(np_rois, dtype=torch.float, device='cuda') + froipool = RiRoIAlignRotated((pool_h, pool_w), spatial_scale, num_samples, + num_orientations, clockwise) + output = froipool(x, rois) + output.backward(torch.ones_like(output)) + assert np.allclose( + output.data.type(torch.float).cpu().numpy(), expect_output, atol=1e-3) + assert np.allclose( + x.grad.data.type(torch.float).cpu().numpy(), expect_grad, atol=1e-3) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_roi_align.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_roi_align.py new file mode 100644 index 0000000000000000000000000000000000000000..fa7045c5cf6ed6960aa04835f7c25c24a1d4d557 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_roi_align.py @@ -0,0 +1,120 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import numpy as np +import pytest +import torch + +from mmcv.utils import IS_CUDA_AVAILABLE, IS_MLU_AVAILABLE + +_USING_PARROTS = True +try: + from parrots.autograd import gradcheck +except ImportError: + from torch.autograd import gradcheck + _USING_PARROTS = False + +# yapf:disable + +inputs = [([[[[1., 2.], [3., 4.]]]], + [[0., 0., 0., 1., 1.]]), + ([[[[1., 2.], [3., 4.]], + [[4., 3.], [2., 1.]]]], + [[0., 0., 0., 1., 1.]]), + ([[[[1., 2., 5., 6.], [3., 4., 7., 8.], + [9., 10., 13., 14.], [11., 12., 15., 16.]]]], + [[0., 0., 0., 3., 3.]])] +outputs = [([[[[1.0, 1.25], [1.5, 1.75]]]], + [[[[3.0625, 0.4375], [0.4375, 0.0625]]]]), + ([[[[1.0, 1.25], [1.5, 1.75]], + [[4.0, 3.75], [3.5, 3.25]]]], + [[[[3.0625, 0.4375], [0.4375, 0.0625]], + [[3.0625, 0.4375], [0.4375, 0.0625]]]]), + ([[[[1.9375, 4.75], [7.5625, 10.375]]]], + [[[[0.47265625, 0.42968750, 0.42968750, 0.04296875], + [0.42968750, 0.39062500, 0.39062500, 0.03906250], + [0.42968750, 0.39062500, 0.39062500, 0.03906250], + [0.04296875, 0.03906250, 0.03906250, 0.00390625]]]])] +# yapf:enable + +pool_h = 2 +pool_w = 2 +spatial_scale = 1.0 +sampling_ratio = 2 + + +def _test_roialign_gradcheck(device, dtype): + try: + from mmcv.ops import RoIAlign + except ModuleNotFoundError: + pytest.skip('RoIAlign op is not successfully compiled') + if dtype is torch.half: + pytest.skip('grad check does not support fp16') + for case in inputs: + np_input = np.array(case[0]) + np_rois = np.array(case[1]) + + x = torch.tensor( + np_input, dtype=dtype, device=device, requires_grad=True) + rois = torch.tensor(np_rois, dtype=dtype, device=device) + + froipool = RoIAlign((pool_h, pool_w), spatial_scale, sampling_ratio) + + if torch.__version__ == 'parrots': + gradcheck( + froipool, (x, rois), no_grads=[rois], delta=1e-5, pt_atol=1e-5) + else: + gradcheck(froipool, (x, rois), eps=1e-5, atol=1e-5) + + +def _test_roialign_allclose(device, dtype): + try: + from mmcv.ops import roi_align + except ModuleNotFoundError: + pytest.skip('test 
requires compilation') + pool_h = 2 + pool_w = 2 + spatial_scale = 1.0 + sampling_ratio = 2 + for case, output in zip(inputs, outputs): + np_input = np.array(case[0]) + np_rois = np.array(case[1]) + np_output = np.array(output[0]) + np_grad = np.array(output[1]) + + x = torch.tensor( + np_input, dtype=dtype, device=device, requires_grad=True) + rois = torch.tensor(np_rois, dtype=dtype, device=device) + + output = roi_align(x, rois, (pool_h, pool_w), spatial_scale, + sampling_ratio, 'avg', True) + output.backward(torch.ones_like(output)) + assert np.allclose( + output.data.type(torch.float).cpu().numpy(), np_output, atol=1e-3) + assert np.allclose( + x.grad.data.type(torch.float).cpu().numpy(), np_grad, atol=1e-3) + + +@pytest.mark.parametrize('device', [ + 'cpu', + pytest.param( + 'cuda', + marks=pytest.mark.skipif( + not IS_CUDA_AVAILABLE, reason='requires CUDA support')), + pytest.param( + 'mlu', + marks=pytest.mark.skipif( + not IS_MLU_AVAILABLE, reason='requires MLU support')) +]) +@pytest.mark.parametrize('dtype', [ + torch.float, + pytest.param( + torch.float, + marks=pytest.mark.skipif( + IS_MLU_AVAILABLE, + reason='MLU does not support for 64-bit floating point')), + torch.half +]) +def test_roialign(device, dtype): + # check float only + if dtype is torch.float: + _test_roialign_gradcheck(device=device, dtype=dtype) + _test_roialign_allclose(device=device, dtype=dtype) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_roi_align_rotated.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_roi_align_rotated.py new file mode 100644 index 0000000000000000000000000000000000000000..08038fa9171c1384ec69def0079422ea001f63f2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_roi_align_rotated.py @@ -0,0 +1,151 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import numpy as np +import pytest +import torch + +from mmcv.utils import IS_CUDA_AVAILABLE, IS_MLU_AVAILABLE + +_USING_PARROTS = True +try: + from parrots.autograd import gradcheck +except ImportError: + from torch.autograd import gradcheck + _USING_PARROTS = False + +# yapf:disable +inputs = [([[[[1., 2.], [3., 4.]]]], + [[0., 0.5, 0.5, 1., 1., 0]]), + ([[[[1., 2.], [3., 4.]]]], + [[0., 0.5, 0.5, 1., 1., np.pi / 2]]), + ([[[[1., 2.], [3., 4.]], + [[4., 3.], [2., 1.]]]], + [[0., 0.5, 0.5, 1., 1., 0]]), + ([[[[1., 2., 5., 6.], [3., 4., 7., 8.], + [9., 10., 13., 14.], [11., 12., 15., 16.]]]], + [[0., 1.5, 1.5, 3., 3., 0]]), + ([[[[1., 2., 5., 6.], [3., 4., 7., 8.], + [9., 10., 13., 14.], [11., 12., 15., 16.]]]], + [[0., 1.5, 1.5, 3., 3., np.pi / 2]])] +outputs = [([[[[1.0, 1.25], [1.5, 1.75]]]], + [[[[3.0625, 0.4375], [0.4375, 0.0625]]]]), + ([[[[1.5, 1], [1.75, 1.25]]]], + [[[[3.0625, 0.4375], [0.4375, 0.0625]]]]), + ([[[[1.0, 1.25], [1.5, 1.75]], + [[4.0, 3.75], [3.5, 3.25]]]], + [[[[3.0625, 0.4375], [0.4375, 0.0625]], + [[3.0625, 0.4375], [0.4375, 0.0625]]]]), + ([[[[1.9375, 4.75], [7.5625, 10.375]]]], + [[[[0.47265625, 0.42968750, 0.42968750, 0.04296875], + [0.42968750, 0.39062500, 0.39062500, 0.03906250], + [0.42968750, 0.39062500, 0.39062500, 0.03906250], + [0.04296875, 0.03906250, 0.03906250, 0.00390625]]]]), + ([[[[7.5625, 1.9375], [10.375, 4.75]]]], + [[[[0.47265625, 0.42968750, 0.42968750, 0.04296875], + [0.42968750, 0.39062500, 0.39062500, 0.03906250], + [0.42968750, 0.39062500, 0.39062500, 0.03906250], + [0.04296875, 0.03906250, 0.03906250, 0.00390625]]]])] +# yapf:enable + +pool_h = 2 +pool_w = 2 +spatial_scale = 1.0 +sampling_ratio = 2 + + +def _test_roialign_rotated_gradcheck(device, dtype): + try: + from mmcv.ops import RoIAlignRotated + except ModuleNotFoundError: + pytest.skip('RoIAlignRotated op is not successfully compiled') + if dtype is torch.half: + pytest.skip('grad check does not support fp16') + for case in inputs: + np_input = 
np.array(case[0]) + np_rois = np.array(case[1]) + + x = torch.tensor( + np_input, dtype=dtype, device=device, requires_grad=True) + rois = torch.tensor(np_rois, dtype=dtype, device=device) + + froipool = RoIAlignRotated((pool_h, pool_w), spatial_scale, + sampling_ratio) + if torch.__version__ == 'parrots': + gradcheck( + froipool, (x, rois), no_grads=[rois], delta=1e-5, pt_atol=1e-5) + else: + gradcheck(froipool, (x, rois), eps=1e-5, atol=1e-5) + + +def _test_roialign_rotated_allclose(device, dtype): + try: + from mmcv.ops import RoIAlignRotated, roi_align_rotated + except ModuleNotFoundError: + pytest.skip('test requires compilation') + pool_h = 2 + pool_w = 2 + spatial_scale = 1.0 + sampling_ratio = 2 + + for case, output in zip(inputs, outputs): + np_input = np.array(case[0]) + np_rois = np.array(case[1]) + np_output = np.array(output[0]) + np_grad = np.array(output[1]) + + x = torch.tensor( + np_input, dtype=dtype, device=device, requires_grad=True) + rois = torch.tensor(np_rois, dtype=dtype, device=device) + + output = roi_align_rotated(x, rois, (pool_h, pool_w), spatial_scale, + sampling_ratio, True) + output.backward(torch.ones_like(output)) + assert np.allclose( + output.data.type(torch.float).cpu().numpy(), np_output, atol=1e-3) + assert np.allclose( + x.grad.data.type(torch.float).cpu().numpy(), np_grad, atol=1e-3) + + # Test deprecated parameters + roi_align_rotated_module_deprecated = RoIAlignRotated( + out_size=(pool_h, pool_w), + spatial_scale=spatial_scale, + sample_num=sampling_ratio) + + output_1 = roi_align_rotated_module_deprecated(x, rois) + + roi_align_rotated_module_new = RoIAlignRotated( + output_size=(pool_h, pool_w), + spatial_scale=spatial_scale, + sampling_ratio=sampling_ratio) + + output_2 = roi_align_rotated_module_new(x, rois) + + assert np.allclose( + output_1.data.type(torch.float).cpu().numpy(), + output_2.data.type(torch.float).cpu().numpy()) + + +@pytest.mark.parametrize('device', [ + 'cpu', + pytest.param( + 'cuda', + 
marks=pytest.mark.skipif( + not IS_CUDA_AVAILABLE, reason='requires CUDA support')), + pytest.param( + 'mlu', + marks=pytest.mark.skipif( + not IS_MLU_AVAILABLE, reason='requires MLU support')) +]) +@pytest.mark.parametrize('dtype', [ + torch.float, + pytest.param( + torch.float, + marks=pytest.mark.skipif( + IS_MLU_AVAILABLE, + reason='MLU does not support for 64-bit floating point')), + torch.half +]) +def test_roialign_rotated(device, dtype): + # check float only + if dtype is torch.float: + _test_roialign_rotated_gradcheck(device=device, dtype=dtype) + _test_roialign_rotated_allclose(device=device, dtype=dtype) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_roi_pool.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_roi_pool.py new file mode 100644 index 0000000000000000000000000000000000000000..275d71f53cd199fa48a7f6bc94b815c7edccfc94 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_roi_pool.py @@ -0,0 +1,105 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import os + +import numpy as np +import pytest +import torch + +from mmcv.utils import IS_CUDA_AVAILABLE, IS_MLU_AVAILABLE, IS_NPU_AVAILABLE + +_USING_PARROTS = True +try: + from parrots.autograd import gradcheck +except ImportError: + from torch.autograd import gradcheck + + _USING_PARROTS = False + +cur_dir = os.path.dirname(os.path.abspath(__file__)) + +inputs = [([[[[1., 2.], [3., 4.]]]], [[0., 0., 0., 1., 1.]]), + ([[[[1., 2.], [3., 4.]], [[4., 3.], [2., + 1.]]]], [[0., 0., 0., 1., 1.]]), + ([[[[1., 2., 5., 6.], [3., 4., 7., 8.], [9., 10., 13., 14.], + [11., 12., 15., 16.]]]], [[0., 0., 0., 3., 3.]])] +outputs = [([[[[1., 2.], [3., 4.]]]], [[[[1., 1.], [1., 1.]]]]), + ([[[[1., 2.], [3., 4.]], [[4., 3.], [2., 1.]]]], [[[[1., 1.], + [1., 1.]], + [[1., 1.], + [1., 1.]]]]), + ([[[[4., 8.], [12., 16.]]]], [[[[0., 0., 0., 0.], [0., 1., 0., 1.], + [0., 0., 0., 0.], [0., 1., 0., + 1.]]]])] + + +class TestRoiPool: + + def test_roipool_gradcheck(self): + if not torch.cuda.is_available(): + return + from mmcv.ops import RoIPool + pool_h = 2 + pool_w = 2 + spatial_scale = 1.0 + + for case in inputs: + np_input = np.array(case[0]) + np_rois = np.array(case[1]) + + x = torch.tensor(np_input, device='cuda', requires_grad=True) + rois = torch.tensor(np_rois, device='cuda') + + froipool = RoIPool((pool_h, pool_w), spatial_scale) + + if _USING_PARROTS: + pass + # gradcheck(froipool, (x, rois), no_grads=[rois]) + else: + gradcheck(froipool, (x, rois), eps=1e-2, atol=1e-2) + + def _test_roipool_allclose(self, device, dtype=torch.float): + from mmcv.ops import roi_pool + pool_h = 2 + pool_w = 2 + spatial_scale = 1.0 + + for case, output in zip(inputs, outputs): + np_input = np.array(case[0]) + np_rois = np.array(case[1]) + np_output = np.array(output[0]) + np_grad = np.array(output[1]) + + x = torch.tensor( + np_input, dtype=dtype, device=device, requires_grad=True) + rois = torch.tensor(np_rois, dtype=dtype, device=device) + + output = roi_pool(x, rois, (pool_h, pool_w), 
spatial_scale) + output.backward(torch.ones_like(output)) + assert np.allclose(output.data.cpu().numpy(), np_output, 1e-3) + assert np.allclose(x.grad.data.cpu().numpy(), np_grad, 1e-3) + + @pytest.mark.parametrize('device', [ + pytest.param( + 'cuda', + marks=pytest.mark.skipif( + not IS_CUDA_AVAILABLE, reason='requires CUDA support')), + pytest.param( + 'mlu', + marks=pytest.mark.skipif( + not IS_MLU_AVAILABLE, reason='requires MLU support')), + pytest.param( + 'npu', + marks=pytest.mark.skipif( + not IS_NPU_AVAILABLE, reason='requires NPU support')) + ]) + @pytest.mark.parametrize('dtype', [ + torch.float, + pytest.param( + torch.float, + marks=pytest.mark.skipif( + IS_MLU_AVAILABLE, + reason='MLU does not support for 64-bit floating point')), + torch.half + ]) + def test_roipool_allclose(self, device, dtype): + self._test_roipool_allclose(device, dtype) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_roiaware_pool3d.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_roiaware_pool3d.py new file mode 100644 index 0000000000000000000000000000000000000000..155fce664cfe415fc33a8a99d3135acd0c8ba55e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_roiaware_pool3d.py @@ -0,0 +1,159 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import numpy as np +import pytest +import torch + +from mmcv.ops import (RoIAwarePool3d, points_in_boxes_all, points_in_boxes_cpu, + points_in_boxes_part) +from mmcv.utils import IS_CUDA_AVAILABLE, IS_MLU_AVAILABLE + + +@pytest.mark.parametrize('device', [ + pytest.param( + 'cuda', + marks=pytest.mark.skipif( + not IS_CUDA_AVAILABLE, reason='requires CUDA support')), + pytest.param( + 'mlu', + marks=pytest.mark.skipif( + not IS_MLU_AVAILABLE, reason='requires MLU support')) +]) +@pytest.mark.parametrize('dtype', [ + torch.float, torch.half, + pytest.param( + torch.float, + marks=pytest.mark.skipif( + IS_MLU_AVAILABLE, reason='MLU does not support for float')) +]) +def test_RoIAwarePool3d(device, dtype): + roiaware_pool3d_max = RoIAwarePool3d( + out_size=4, max_pts_per_voxel=128, mode='max') + roiaware_pool3d_avg = RoIAwarePool3d( + out_size=4, max_pts_per_voxel=128, mode='avg') + rois = torch.tensor( + [[1.0, 2.0, 3.0, 5.0, 4.0, 6.0, -0.3 - np.pi / 2], + [-10.0, 23.0, 16.0, 20.0, 10.0, 20.0, -0.5 - np.pi / 2]], + dtype=dtype).to(device) + # boxes (m, 7) with bottom center in lidar coordinate + pts = torch.tensor( + [[1, 2, 3.3], [1.2, 2.5, 3.0], [0.8, 2.1, 3.5], [1.6, 2.6, 3.6], + [0.8, 1.2, 3.9], [-9.2, 21.0, 18.2], [3.8, 7.9, 6.3], + [4.7, 3.5, -12.2], [3.8, 7.6, -2], [-10.6, -12.9, -20], [-16, -18, 9], + [-21.3, -52, -5], [0, 0, 0], [6, 7, 8], [-2, -3, -4]], + dtype=dtype).to(device) # points (n, 3) in lidar coordinate + pts_feature = pts.clone() + + pooled_features_max = roiaware_pool3d_max( + rois=rois, pts=pts, pts_feature=pts_feature) + assert pooled_features_max.shape == torch.Size([2, 4, 4, 4, 3]) + assert torch.allclose(pooled_features_max.sum(), + torch.tensor(51.100, dtype=dtype).to(device), 1e-3) + + pooled_features_avg = roiaware_pool3d_avg( + rois=rois, pts=pts, pts_feature=pts_feature) + assert pooled_features_avg.shape == torch.Size([2, 4, 4, 4, 3]) + assert torch.allclose(pooled_features_avg.sum(), + torch.tensor(49.750, dtype=dtype).to(device), 
1e-3) + + +@pytest.mark.skipif( + not torch.cuda.is_available(), reason='requires CUDA support') +def test_points_in_boxes_part(): + boxes = torch.tensor( + [[[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 0.3]], + [[-10.0, 23.0, 16.0, 10, 20, 20, 0.5]]], + dtype=torch.float32).cuda( + ) # boxes (b, t, 7) with bottom center in lidar coordinate + pts = torch.tensor( + [[[1, 2, 3.3], [1.2, 2.5, 3.0], [0.8, 2.1, 3.5], [1.6, 2.6, 3.6], + [0.8, 1.2, 3.9], [-9.2, 21.0, 18.2], [3.8, 7.9, 6.3], + [4.7, 3.5, -12.2]], + [[3.8, 7.6, -2], [-10.6, -12.9, -20], [-16, -18, 9], [-21.3, -52, -5], + [0, 0, 0], [6, 7, 8], [-2, -3, -4], [6, 4, 9]]], + dtype=torch.float32).cuda() # points (b, m, 3) in lidar coordinate + + point_indices = points_in_boxes_part(points=pts, boxes=boxes) + expected_point_indices = torch.tensor( + [[0, 0, 0, 0, 0, -1, -1, -1], [-1, -1, -1, -1, -1, -1, -1, -1]], + dtype=torch.int32).cuda() + assert point_indices.shape == torch.Size([2, 8]) + assert (point_indices == expected_point_indices).all() + + boxes = torch.tensor([[[0.0, 0.0, 0.0, 1.0, 20.0, 1.0, 0.523598]]], + dtype=torch.float32).cuda() # 30 degrees + pts = torch.tensor( + [[[4, 6.928, 0], [6.928, 4, 0], [4, -6.928, 0], [6.928, -4, 0], + [-4, 6.928, 0], [-6.928, 4, 0], [-4, -6.928, 0], [-6.928, -4, 0]]], + dtype=torch.float32).cuda() + point_indices = points_in_boxes_part(points=pts, boxes=boxes) + expected_point_indices = torch.tensor([[-1, -1, 0, -1, 0, -1, -1, -1]], + dtype=torch.int32).cuda() + assert (point_indices == expected_point_indices).all() + + +def test_points_in_boxes_cpu(): + boxes = torch.tensor( + [[[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 0.3], + [-10.0, 23.0, 16.0, 10, 20, 20, 0.5]]], + dtype=torch.float32 + ) # boxes (m, 7) with bottom center in lidar coordinate + pts = torch.tensor( + [[[1, 2, 3.3], [1.2, 2.5, 3.0], [0.8, 2.1, 3.5], [1.6, 2.6, 3.6], + [0.8, 1.2, 3.9], [-9.2, 21.0, 18.2], [3.8, 7.9, 6.3], + [4.7, 3.5, -12.2], [3.8, 7.6, -2], [-10.6, -12.9, -20], [ + -16, -18, 9 + ], [-21.3, -52, -5], [0, 
0, 0], [6, 7, 8], [-2, -3, -4]]], + dtype=torch.float32) # points (n, 3) in lidar coordinate + + point_indices = points_in_boxes_cpu(points=pts, boxes=boxes) + expected_point_indices = torch.tensor( + [[[1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [0, 1], [0, 0], [0, 0], + [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]]], + dtype=torch.int32) + assert point_indices.shape == torch.Size([1, 15, 2]) + assert (point_indices == expected_point_indices).all() + + boxes = torch.tensor([[[0.0, 0.0, 0.0, 1.0, 20.0, 1.0, 0.523598]]], + dtype=torch.float32) # 30 degrees + pts = torch.tensor( + [[[4, 6.928, 0], [6.928, 4, 0], [4, -6.928, 0], [6.928, -4, 0], + [-4, 6.928, 0], [-6.928, 4, 0], [-4, -6.928, 0], [-6.928, -4, 0]]], + dtype=torch.float32) + point_indices = points_in_boxes_cpu(points=pts, boxes=boxes) + expected_point_indices = torch.tensor( + [[[0], [0], [1], [0], [1], [0], [0], [0]]], dtype=torch.int32) + assert (point_indices == expected_point_indices).all() + + +@pytest.mark.skipif( + not torch.cuda.is_available(), reason='requires CUDA support') +def test_points_in_boxes_all(): + + boxes = torch.tensor( + [[[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 0.3], + [-10.0, 23.0, 16.0, 10, 20, 20, 0.5]]], + dtype=torch.float32).cuda( + ) # boxes (m, 7) with bottom center in lidar coordinate + pts = torch.tensor( + [[[1, 2, 3.3], [1.2, 2.5, 3.0], [0.8, 2.1, 3.5], [1.6, 2.6, 3.6], + [0.8, 1.2, 3.9], [-9.2, 21.0, 18.2], [3.8, 7.9, 6.3], + [4.7, 3.5, -12.2], [3.8, 7.6, -2], [-10.6, -12.9, -20], [ + -16, -18, 9 + ], [-21.3, -52, -5], [0, 0, 0], [6, 7, 8], [-2, -3, -4]]], + dtype=torch.float32).cuda() # points (n, 3) in lidar coordinate + + point_indices = points_in_boxes_all(points=pts, boxes=boxes) + expected_point_indices = torch.tensor( + [[[1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [0, 1], [0, 0], [0, 0], + [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]]], + dtype=torch.int32).cuda() + assert point_indices.shape == torch.Size([1, 15, 2]) + assert (point_indices == 
expected_point_indices).all() + + if torch.cuda.device_count() > 1: + pts = pts.to('cuda:1') + boxes = boxes.to('cuda:1') + expected_point_indices = expected_point_indices.to('cuda:1') + point_indices = points_in_boxes_all(points=pts, boxes=boxes) + assert point_indices.shape == torch.Size([1, 15, 2]) + assert (point_indices == expected_point_indices).all() diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_roipoint_pool3d.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_roipoint_pool3d.py new file mode 100644 index 0000000000000000000000000000000000000000..c69a95f81e1f0e18c22769c94fde0ec3860d82f7 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_roipoint_pool3d.py @@ -0,0 +1,50 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import pytest +import torch + +from mmcv.ops import RoIPointPool3d +from mmcv.utils import IS_CUDA_AVAILABLE, IS_MLU_AVAILABLE + + +@pytest.mark.parametrize('device', [ + pytest.param( + 'cuda', + marks=pytest.mark.skipif( + not IS_CUDA_AVAILABLE, reason='requires CUDA support')), + pytest.param( + 'mlu', + marks=pytest.mark.skipif( + not IS_MLU_AVAILABLE, reason='requires MLU support')) +]) +@pytest.mark.parametrize('dtype', [ + torch.float, torch.half, + pytest.param( + torch.float, + marks=pytest.mark.skipif( + IS_MLU_AVAILABLE, reason='MLU does not support for float')) +]) +def test_roipoint(device, dtype): + points = torch.tensor( + [[1, 2, 3.3], [1.2, 2.5, 3.0], [0.8, 2.1, 3.5], [1.6, 2.6, 3.6], + [0.8, 1.2, 3.9], [-9.2, 21.0, 18.2], [3.8, 7.9, 6.3], + [4.7, 3.5, -12.2], [3.8, 7.6, -2], [-10.6, -12.9, -20], [-16, -18, 9], + [-21.3, -52, -5], [0, 0, 0], [6, 7, 8], [-2, -3, -4]], + dtype=dtype).unsqueeze(0).to(device) + feats = points.clone() + rois = torch.tensor([[[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 0.3], + [-10.0, 23.0, 16.0, 10, 20, 20, 0.5]]], + dtype=dtype).to(device) + + roipoint_pool3d = RoIPointPool3d(num_sampled_points=4) + roi_feat, empty_flag = roipoint_pool3d(points, feats, rois) + 
expected_roi_feat = torch.tensor( + [[[[1, 2, 3.3, 1, 2, 3.3], [1.2, 2.5, 3, 1.2, 2.5, 3], + [0.8, 2.1, 3.5, 0.8, 2.1, 3.5], [1.6, 2.6, 3.6, 1.6, 2.6, 3.6]], + [[-9.2, 21, 18.2, -9.2, 21, 18.2], [-9.2, 21, 18.2, -9.2, 21, 18.2], + [-9.2, 21, 18.2, -9.2, 21, 18.2], [-9.2, 21, 18.2, -9.2, 21, 18.2]]] + ], + dtype=dtype).to(device) + expected_empty_flag = torch.tensor([[0, 0]]).int().to(device) + + assert torch.allclose(roi_feat, expected_roi_feat) + assert torch.allclose(empty_flag, expected_empty_flag) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_rotated_feature_align.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_rotated_feature_align.py new file mode 100644 index 0000000000000000000000000000000000000000..e7422a3106bb71ccfdec1919c0a6fb939fb182ac --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_rotated_feature_align.py @@ -0,0 +1,131 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import pytest +import torch + +from mmcv.ops import rotated_feature_align +from mmcv.utils import IS_CUDA_AVAILABLE + + +@pytest.mark.skipif( + not torch.cuda.is_available(), reason='requires CUDA support') +@pytest.mark.parametrize('device', [ + pytest.param( + 'cuda', + marks=pytest.mark.skipif( + not IS_CUDA_AVAILABLE, reason='requires CUDA support')), + pytest.param( + 'cpu', + marks=pytest.mark.skipif( + torch.__version__ == 'parrots', reason='requires PyTorch support')) +]) +def test_rotated_feature_align(device): + feature = torch.tensor([[[[1.2924, -0.2172, -0.5222, 0.1172], + [0.9144, 1.2248, 1.3115, -0.9690], + [-0.8949, -1.1797, -0.9093, -0.3961], + [-0.4586, 0.5062, -0.7947, -0.7397]], + [[-1.0943, -0.7495, 1.3461, -1.1652], + [0.2034, 0.6763, -1.2357, 0.5231], + [-1.0062, 1.2592, 1.4225, -0.3951], + [-0.1242, -1.6240, 0.1932, 2.7181]], + [[-1.6271, -1.0276, 0.0578, -0.2997], + [-0.9684, -1.6946, -1.3188, -1.1938], + [-1.6744, -0.8917, -0.6556, + 1.0073], [-0.1205, 0.3671, -0.3731, -0.5347]]], + [[[0.7035, 0.2089, -0.1774, 
3.4670], + [-0.8505, -0.9278, 1.4714, 0.1644], + [0.0898, 0.3531, -0.4007, 0.1927], + [1.2569, -0.2636, -0.5223, 0.0616]], + [[0.1760, -0.7639, -0.4600, -1.3260], + [-0.9921, -0.2970, -0.8955, 1.0508], + [1.3515, -0.1641, 1.9679, 1.1986], + [-0.3616, 0.6287, 0.4933, 0.3360]], + [[-0.5860, 0.2124, -0.8700, 2.4200], + [-0.0551, -1.5103, -1.6779, 0.8399], + [0.8431, 1.2414, -1.1243, -0.3887], + [-2.1254, 0.6047, -0.3515, 0.7254]]]], + device=device, + requires_grad=True) + + bbox = torch.tensor( + [[[[1.3080e+01, 1.2688e+01, 1.1214e+01, 9.3944e+01, -9.1905e-01], + [3.8104e+01, 1.0134e+01, 1.4659e+02, 9.0306e+01, -9.8211e-01], + [-5.3213e+01, 4.9508e+01, 5.1513e+01, 3.2055e+01, -3.1954e-01], + [2.6974e+01, 2.5248e+01, 5.4495e+01, 3.1083e+00, -6.2127e-01]], + [[-1.5604e+01, -5.1908e+01, 2.3998e+02, 1.5008e+01, -1.2546e+00], + [3.1354e+01, -7.3635e+00, 6.7879e+01, 3.5081e+01, -3.3851e-01], + [-5.3292e+00, 9.1946e+00, 1.2834e+01, 1.0485e+01, -1.3039e+00], + [-2.3925e+01, 3.6623e+01, 3.9875e+01, 7.2009e+01, -6.5934e-01]], + [[7.2114e+01, -2.3781e+01, 2.9106e+01, 8.4501e+01, -1.1340e+00], + [2.6258e+01, -7.7034e+00, 1.7629e+02, 1.0615e+02, -1.2156e+00], + [3.8057e+01, 4.6016e+01, 1.2965e+01, 6.9384e+00, -1.0855e+00], + [2.4428e+01, -1.6189e+01, 2.0572e+02, 3.1622e+01, -1.5719e-01]], + [[3.8226e+00, 2.9608e+01, 1.4457e+01, 6.8179e+01, -9.1997e-01], + [2.5003e+01, -4.2490e+01, 9.6007e+01, 4.9086e+01, -1.4786e+00], + [8.5983e+01, 5.4980e+01, 7.8080e+01, 1.0003e+02, -1.0926e+00], + [9.9065e+00, 4.1457e+01, 5.9799e+00, 1.7973e+01, -5.6313e-01]]], + [[[-1.8244e+01, 4.6309e+00, 5.3010e+01, 2.4310e+01, -7.0345e-01], + [1.9419e+01, 3.6704e+01, 5.2390e+01, 5.4133e+01, -3.7730e-01], + [5.6387e+01, 2.3752e+01, 9.0441e+00, 1.7792e+01, -1.5583e+00], + [3.6303e+01, 1.6396e+01, 2.0283e+01, 1.9148e+01, -8.3419e-01]], + [[3.2169e+01, 3.0521e+01, 2.6283e+01, 1.9680e+02, -3.0454e-01], + [2.5788e+01, -3.2189e+01, 8.8882e+01, 1.0207e+02, -1.5328e+00], + [8.4676e+00, -1.6668e+01, 2.4657e+01, 
1.1275e+02, -4.0388e-01], + [-1.0799e+01, 6.0422e+00, 9.5807e+00, 3.3677e+01, -3.5438e-01]], + [[6.9363e+01, 1.0850e+01, 2.5968e+01, 2.2311e+01, -1.6408e-01], + [2.8140e+00, 4.6843e+00, 3.1289e+00, 2.1480e+01, -6.7583e-01], + [2.6661e+01, 4.5290e+01, 6.1679e+00, 3.0005e+01, -8.9806e-01], + [5.0871e+00, 1.3234e+01, 9.2087e+01, 4.9622e+01, -2.8020e-01]], + [[-1.2643e+01, 2.5176e+01, 5.0488e+01, 5.4246e+01, -4.4840e-01], + [-3.4521e+01, 9.8435e-01, 5.2413e+01, 9.7996e+00, -8.4218e-01], + [4.9829e+01, -1.0808e+01, 2.9848e+01, 7.3579e+01, -6.2672e-01], + [8.0446e+01, 2.8064e+01, 4.5273e+01, 5.3809e+01, -1.2359e+00]]]], + device=device, + requires_grad=True) + + expected_output = torch.tensor([[[[1.1095, -0.2172, -0.5222, -0.6225], + [0.9144, 0.7662, 1.0487, -0.9690], + [-0.8949, -1.6384, -0.9093, -0.3961], + [-0.8604, 0.5062, -0.7947, -0.7397]], + [[-0.3961, -0.7495, 1.3461, 1.5528], + [0.2034, 0.5522, -1.6722, 0.5231], + [-1.0062, 1.1350, 1.4225, -0.3951], + [-0.4826, -1.6240, 0.1932, 2.7181]], + [[-2.6436, -1.0276, 0.0578, -0.8344], + [-0.9684, -1.8151, -2.1843, -1.1938], + [-1.6744, -1.0121, -0.6556, 1.0073], + [-0.8474, 0.3671, -0.3731, -0.5347]]], + [[[0.7035, 0.2089, -0.1774, 3.4670], + [-0.8505, -0.9278, 1.4714, 0.1644], + [0.0898, 0.3064, -0.4007, 0.5849], + [1.2569, -0.2636, -0.5223, 0.0616]], + [[0.1760, -0.7639, -0.4600, -1.3260], + [-0.9921, -0.2970, -0.8955, 1.0508], + [1.3515, -0.6125, 1.9679, 0.5550], + [-0.3616, 0.6287, 0.4933, 0.3360]], + [[-0.5860, 0.2124, -0.8700, 2.4200], + [-0.0551, -1.5103, -1.6779, 0.8399], + [0.8431, 0.8455, -1.1243, -1.5994], + [-2.1254, 0.6047, -0.3515, 0.7254]]]], + device=device) + + expected_grad = torch.tensor([ + [[[1.0000, 1.8507, 1.1493, 1.5222], [1.0000, 1.1511, 1.2139, 1.4778], + [1.0000, 1.2629, 1.3721, 1.0000], [3.0000, 1.0000, 1.0000, 2.0000]], + [[1.0000, 1.8507, 1.1493, 1.5222], [1.0000, 1.1511, 1.2139, 1.4778], + [1.0000, 1.2629, 1.3721, 1.0000], [3.0000, 1.0000, 1.0000, 2.0000]], + [[1.0000, 1.8507, 1.1493, 
1.5222], [1.0000, 1.1511, 1.2139, 1.4778], + [1.0000, 1.2629, 1.3721, 1.0000], [3.0000, 1.0000, 1.0000, 2.0000]]], + [[[1.2687, 1.5055, 1.2382, 1.0000], [1.1458, 1.4258, 1.4160, 1.0000], + [1.0000, 1.0000, 1.0000, 1.0000], [1.0000, 1.0000, 1.0000, 1.0000]], + [[1.2687, 1.5055, 1.2382, 1.0000], [1.1458, 1.4258, 1.4160, 1.0000], + [1.0000, 1.0000, 1.0000, 1.0000], [1.0000, 1.0000, 1.0000, 1.0000]], + [[1.2687, 1.5055, 1.2382, 1.0000], [1.1458, 1.4258, 1.4160, 1.0000], + [1.0000, 1.0000, 1.0000, 1.0000], [1.0000, 1.0000, 1.0000, 1.0000]]] + ], + device=device) + + output = rotated_feature_align( + feature, bbox, spatial_scale=1 / 8, points=1) + output.backward(torch.ones_like(output)) + assert torch.allclose(output, expected_output, 1e-2) + assert torch.allclose(feature.grad, expected_grad, 1e-2) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_saconv.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_saconv.py new file mode 100644 index 0000000000000000000000000000000000000000..607775c38511d5f3afd01ae4656a232474420761 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_saconv.py @@ -0,0 +1,47 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import torch +import torch.nn as nn + +from mmcv.ops import SAConv2d + + +def test_sacconv(): + + # test with normal cast + x = torch.rand(1, 3, 256, 256) + saconv = SAConv2d(3, 5, kernel_size=3, padding=1) + sac_out = saconv(x) + refer_conv = nn.Conv2d(3, 5, kernel_size=3, padding=1) + refer_out = refer_conv(x) + assert sac_out.shape == refer_out.shape + + # test with dilation >= 2 + dalited_saconv = SAConv2d(3, 5, kernel_size=3, padding=2, dilation=2) + dalited_sac_out = dalited_saconv(x) + refer_conv = nn.Conv2d(3, 5, kernel_size=3, padding=2, dilation=2) + refer_out = refer_conv(x) + assert dalited_sac_out.shape == refer_out.shape + + # test with deform + deform_saconv = SAConv2d(3, 5, kernel_size=3, padding=1, use_deform=True) + if torch.cuda.is_available(): + x = torch.rand(1, 3, 256, 256).cuda() + deform_saconv = SAConv2d( + 3, 5, kernel_size=3, padding=1, use_deform=True).cuda() + deform_sac_out = deform_saconv(x).cuda() + refer_conv = nn.Conv2d(3, 5, kernel_size=3, padding=1).cuda() + refer_out = refer_conv(x) + assert deform_sac_out.shape == refer_out.shape + else: + deform_sac_out = deform_saconv(x) + refer_conv = nn.Conv2d(3, 5, kernel_size=3, padding=1) + refer_out = refer_conv(x) + assert deform_sac_out.shape == refer_out.shape + + # test with groups >= 2 + x = torch.rand(1, 4, 256, 256) + group_saconv = SAConv2d(4, 4, kernel_size=3, padding=1, groups=2) + group_sac_out = group_saconv(x) + refer_conv = nn.Conv2d(4, 4, kernel_size=3, padding=1, groups=2) + refer_out = refer_conv(x) + assert group_sac_out.shape == refer_out.shape diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_scatter_points.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_scatter_points.py new file mode 100644 index 0000000000000000000000000000000000000000..cf4516047a11117fbd79b3a985d902446001afdf --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_scatter_points.py @@ -0,0 +1,132 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import pytest +import torch +from torch.autograd import gradcheck + +from mmcv.ops import DynamicScatter + +if torch.__version__ == 'parrots': + pytest.skip('not supported in parrots now', allow_module_level=True) + + +@pytest.mark.skipif( + not torch.cuda.is_available(), reason='requires CUDA support') +def test_dynamic_scatter(): + dsmean = DynamicScatter([0.32, 0.32, 6], + [-74.88, -74.88, -2, 74.88, 74.88, 4], True) + dsmax = DynamicScatter([0.32, 0.32, 6], + [-74.88, -74.88, -2, 74.88, 74.88, 4], False) + + # test empty input + empty_feats = torch.empty(size=(0, 3), dtype=torch.float32, device='cuda') + empty_coors = torch.empty(size=(0, 3), dtype=torch.int32, device='cuda') + + empty_feats.requires_grad_() + empty_feats_out_mean, empty_coors_out_mean = dsmean( + empty_feats, empty_coors) + empty_feats_out_mean.sum().backward() + empty_feats_out_max, empty_coors_out_max = dsmax(empty_feats, empty_coors) + empty_feats_out_max.sum().backward() + + assert empty_feats_out_mean.shape == empty_feats.shape + assert empty_feats_out_max.shape == empty_feats.shape + assert empty_coors_out_mean.shape == empty_coors.shape + assert empty_coors_out_max.shape == empty_coors.shape + + # test empty reduced output + empty_o_feats = torch.rand( + size=(200000, 3), dtype=torch.float32, device='cuda') * 100 - 50 + empty_o_coors = torch.randint( + low=-1, high=0, size=(200000, 3), dtype=torch.int32, device='cuda') + + empty_o_feats.requires_grad_() + empty_o_feats_out_mean, empty_o_coors_out_mean = dsmean( + empty_o_feats, empty_o_coors) + empty_o_feats_out_mean.sum().backward() + assert (empty_o_feats.grad == 0).all() + + empty_o_feats_out_max, empty_o_coors_out_max = dsmax( + empty_o_feats, empty_o_coors) + empty_o_feats_out_max.sum().backward() + assert (empty_o_feats.grad == 0).all() + + # test non-empty input + feats = torch.rand( + size=(200000, 3), dtype=torch.float32, device='cuda') * 100 - 50 + coors = torch.randint( + low=-1, high=20, size=(200000, 3), dtype=torch.int32, 
device='cuda') + + ref_voxel_coors = coors.unique(dim=0, sorted=True) + ref_voxel_coors = ref_voxel_coors[ref_voxel_coors.min(dim=-1).values >= 0] + ref_voxel_feats_mean = [] + ref_voxel_feats_max = [] + for ref_voxel_coor in ref_voxel_coors: + voxel_mask = (coors == ref_voxel_coor).all(dim=-1) + ref_voxel_feats_mean.append(feats[voxel_mask].mean(dim=0)) + ref_voxel_feats_max.append(feats[voxel_mask].max(dim=0).values) + ref_voxel_feats_mean = torch.stack(ref_voxel_feats_mean) + ref_voxel_feats_max = torch.stack(ref_voxel_feats_max) + + feats_out_mean, coors_out_mean = dsmean(feats, coors) + seq_mean = (coors_out_mean[:, 0] * 400 + coors_out_mean[:, 1] * 20 + + coors_out_mean[:, 2]).argsort() + feats_out_mean = feats_out_mean[seq_mean] + coors_out_mean = coors_out_mean[seq_mean] + + feats_out_max, coors_out_max = dsmax(feats, coors) + seq_max = (coors_out_max[:, 0] * 400 + coors_out_max[:, 1] * 20 + + coors_out_max[:, 2]).argsort() + feats_out_max = feats_out_max[seq_max] + coors_cout_max = coors_out_max[seq_max] + + assert (coors_out_mean == ref_voxel_coors).all() + assert torch.allclose( + feats_out_mean, ref_voxel_feats_mean, atol=1e-2, rtol=1e-5) + assert (coors_cout_max == ref_voxel_coors).all() + assert torch.allclose( + feats_out_max, ref_voxel_feats_max, atol=1e-2, rtol=1e-5) + + # test non-empty input without any point out of bound + feats = torch.rand( + size=(200000, 3), dtype=torch.float32, device='cuda') * 100 - 50 + coors = torch.randint( + low=0, high=20, size=(200000, 3), dtype=torch.int32, device='cuda') + + ref_voxel_coors = coors.unique(dim=0, sorted=True) + ref_voxel_coors = ref_voxel_coors[ref_voxel_coors.min(dim=-1).values >= 0] + ref_voxel_feats_mean = [] + ref_voxel_feats_max = [] + for ref_voxel_coor in ref_voxel_coors: + voxel_mask = (coors == ref_voxel_coor).all(dim=-1) + ref_voxel_feats_mean.append(feats[voxel_mask].mean(dim=0)) + ref_voxel_feats_max.append(feats[voxel_mask].max(dim=0).values) + ref_voxel_feats_mean = 
torch.stack(ref_voxel_feats_mean) + ref_voxel_feats_max = torch.stack(ref_voxel_feats_max) + + feats_out_mean, coors_out_mean = dsmean(feats, coors) + seq_mean = (coors_out_mean[:, 0] * 400 + coors_out_mean[:, 1] * 20 + + coors_out_mean[:, 2]).argsort() + feats_out_mean = feats_out_mean[seq_mean] + coors_out_mean = coors_out_mean[seq_mean] + + feats_out_max, coors_out_max = dsmax(feats, coors) + seq_max = (coors_out_max[:, 0] * 400 + coors_out_max[:, 1] * 20 + + coors_out_max[:, 2]).argsort() + feats_out_max = feats_out_max[seq_max] + coors_cout_max = coors_out_max[seq_max] + + assert (coors_out_mean == ref_voxel_coors).all() + assert torch.allclose( + feats_out_mean, ref_voxel_feats_mean, atol=1e-2, rtol=1e-5) + assert (coors_cout_max == ref_voxel_coors).all() + assert torch.allclose( + feats_out_max, ref_voxel_feats_max, atol=1e-2, rtol=1e-5) + + # test grad # + feats = torch.rand( + size=(100, 4), dtype=torch.float32, device='cuda') * 100 - 50 + coors = torch.randint( + low=-1, high=3, size=(100, 3), dtype=torch.int32, device='cuda') + feats.requires_grad_() + gradcheck(dsmean, (feats, coors), eps=1e-2, atol=1e-2, rtol=1e-5) + gradcheck(dsmax, (feats, coors), eps=1e-2, atol=1e-2, rtol=1e-5) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_spconv.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_spconv.py new file mode 100644 index 0000000000000000000000000000000000000000..098ff2189ae5c44ae2acac8f11f54aa43d5ba4cb --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_spconv.py @@ -0,0 +1,133 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import pytest +import torch +from torch import nn + +from mmcv.cnn import build_conv_layer, build_norm_layer +from mmcv.ops import (SparseConvTensor, SparseInverseConv3d, SparseSequential, + SubMConv3d) + +if torch.__version__ == 'parrots': + pytest.skip('not supported in parrots now', allow_module_level=True) + + +def make_sparse_convmodule(in_channels, + out_channels, + kernel_size, + indice_key, + stride=1, + padding=0, + conv_type='SubMConv3d', + norm_cfg=None, + order=('conv', 'norm', 'act')): + """Make sparse convolution module. + + Args: + in_channels (int): the number of input channels + out_channels (int): the number of out channels + kernel_size (int|tuple(int)): kernel size of convolution + indice_key (str): the indice key used for sparse tensor + stride (int|tuple(int)): the stride of convolution + padding (int or list[int]): the padding number of input + conv_type (str): sparse conv type in spconv + norm_cfg (dict[str]): config of normalization layer + order (tuple[str]): The order of conv/norm/activation layers. It is a + sequence of "conv", "norm" and "act". Common examples are + ("conv", "norm", "act") and ("act", "conv", "norm"). + + Returns: + spconv.SparseSequential: sparse convolution module. 
+ """ + assert isinstance(order, tuple) and len(order) <= 3 + assert set(order) | {'conv', 'norm', 'act'} == {'conv', 'norm', 'act'} + + conv_cfg = dict(type=conv_type, indice_key=indice_key) + + layers = list() + for layer in order: + if layer == 'conv': + if conv_type not in [ + 'SparseInverseConv3d', 'SparseInverseConv2d', + 'SparseInverseConv1d' + ]: + layers.append( + build_conv_layer( + conv_cfg, + in_channels, + out_channels, + kernel_size, + stride=stride, + padding=padding, + bias=False)) + else: + layers.append( + build_conv_layer( + conv_cfg, + in_channels, + out_channels, + kernel_size, + bias=False)) + elif layer == 'norm': + layers.append(build_norm_layer(norm_cfg, out_channels)[1]) + elif layer == 'act': + layers.append(nn.ReLU(inplace=True)) + + layers = SparseSequential(*layers) + return layers + + +@pytest.mark.skipif( + not torch.cuda.is_available(), reason='requires CUDA support') +def test_make_sparse_convmodule(): + torch.cuda.empty_cache() + voxel_features = torch.tensor([[6.56126, 0.9648336, -1.7339306, 0.315], + [6.8162713, -2.480431, -1.3616394, 0.36], + [11.643568, -4.744306, -1.3580885, 0.16], + [23.482342, 6.5036807, 0.5806964, 0.35]], + dtype=torch.float32, + device='cuda') # n, point_features + coordinates = torch.tensor( + [[0, 12, 819, 131], [0, 16, 750, 136], [1, 16, 705, 232], + [1, 35, 930, 469]], + dtype=torch.int32, + device='cuda') # n, 4(batch, ind_x, ind_y, ind_z) + + # test + input_sp_tensor = SparseConvTensor(voxel_features, coordinates, + [41, 1600, 1408], 2) + + sparse_block0 = make_sparse_convmodule( + 4, + 16, + 3, + 'test0', + stride=1, + padding=0, + conv_type='SubMConv3d', + norm_cfg=dict(type='BN1d', eps=1e-3, momentum=0.01), + order=('conv', 'norm', 'act')).cuda() + assert isinstance(sparse_block0[0], SubMConv3d) + assert sparse_block0[0].in_channels == 4 + assert sparse_block0[0].out_channels == 16 + assert isinstance(sparse_block0[1], torch.nn.BatchNorm1d) + assert sparse_block0[1].eps == 0.001 + assert 
sparse_block0[1].momentum == 0.01 + assert isinstance(sparse_block0[2], torch.nn.ReLU) + + # test forward + out_features = sparse_block0(input_sp_tensor) + assert out_features.features.shape == torch.Size([4, 16]) + + sparse_block1 = make_sparse_convmodule( + 4, + 16, + 3, + 'test1', + stride=1, + padding=0, + conv_type='SparseInverseConv3d', + norm_cfg=dict(type='BN1d', eps=1e-3, momentum=0.01), + order=('norm', 'act', 'conv')).cuda() + assert isinstance(sparse_block1[0], torch.nn.BatchNorm1d) + assert isinstance(sparse_block1[1], torch.nn.ReLU) + assert isinstance(sparse_block1[2], SparseInverseConv3d) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_syncbn.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_syncbn.py new file mode 100644 index 0000000000000000000000000000000000000000..d1c1605ad5aa4f846cbd62db62a27e8af32b6840 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_syncbn.py @@ -0,0 +1,295 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import os +import platform + +import numpy as np +import pytest +import torch +import torch.distributed as dist +import torch.nn as nn + +if platform.system() == 'Windows': + import regex as re +else: + import re + + +class TestSyncBN: + + def dist_init(self): + rank = int(os.environ['SLURM_PROCID']) + world_size = int(os.environ['SLURM_NTASKS']) + local_rank = int(os.environ['SLURM_LOCALID']) + node_list = str(os.environ['SLURM_NODELIST']) + + node_parts = re.findall('[0-9]+', node_list) + os.environ['MASTER_ADDR'] = (f'{node_parts[1]}.{node_parts[2]}' + + f'.{node_parts[3]}.{node_parts[4]}') + os.environ['MASTER_PORT'] = '12341' + os.environ['WORLD_SIZE'] = str(world_size) + os.environ['RANK'] = str(rank) + + dist.init_process_group('nccl') + torch.cuda.set_device(local_rank) + + def _test_syncbn_train(self, size=1, half=False): + + if 'SLURM_NTASKS' not in os.environ or int( + os.environ['SLURM_NTASKS']) != 4: + print('must run with slurm has 4 processes!\n' + 'srun -p test --gres=gpu:4 -n4') + return + else: + print('Running syncbn test') + from mmcv.ops import SyncBatchNorm + + assert size in (1, 2, 4) + if not dist.is_initialized(): + self.dist_init() + rank = dist.get_rank() + + torch.manual_seed(9) + torch.cuda.manual_seed(9) + + self.x = torch.rand(16, 3, 2, 3).cuda() + self.y_bp = torch.rand(16, 3, 2, 3).cuda() + + if half: + self.x = self.x.half() + self.y_bp = self.y_bp.half() + dist.broadcast(self.x, src=0) + dist.broadcast(self.y_bp, src=0) + + torch.cuda.synchronize() + if size == 1: + groups = [None, None, None, None] + groups[0] = dist.new_group([0]) + groups[1] = dist.new_group([1]) + groups[2] = dist.new_group([2]) + groups[3] = dist.new_group([3]) + group = groups[rank] + elif size == 2: + groups = [None, None, None, None] + groups[0] = groups[1] = dist.new_group([0, 1]) + groups[2] = groups[3] = dist.new_group([2, 3]) + group = groups[rank] + elif size == 4: + group = dist.group.WORLD + syncbn = SyncBatchNorm(3, group=group).cuda() + 
syncbn.weight.data[0] = 0.2 + syncbn.weight.data[1] = 0.5 + syncbn.weight.data[2] = 0.7 + syncbn.train() + + bn = nn.BatchNorm2d(3).cuda() + bn.weight.data[0] = 0.2 + bn.weight.data[1] = 0.5 + bn.weight.data[2] = 0.7 + bn.train() + + sx = self.x[rank * 4:rank * 4 + 4] + sx.requires_grad_() + sy = syncbn(sx) + sy.backward(self.y_bp[rank * 4:rank * 4 + 4]) + + smean = syncbn.running_mean + svar = syncbn.running_var + sx_grad = sx.grad + sw_grad = syncbn.weight.grad + sb_grad = syncbn.bias.grad + + if size == 1: + x = self.x[rank * 4:rank * 4 + 4] + y_bp = self.y_bp[rank * 4:rank * 4 + 4] + elif size == 2: + x = self.x[rank // 2 * 8:rank // 2 * 8 + 8] + y_bp = self.y_bp[rank // 2 * 8:rank // 2 * 8 + 8] + elif size == 4: + x = self.x + y_bp = self.y_bp + x.requires_grad_() + y = bn(x) + y.backward(y_bp) + + if size == 2: + y = y[rank % 2 * 4:rank % 2 * 4 + 4] + elif size == 4: + y = y[rank * 4:rank * 4 + 4] + + mean = bn.running_mean + var = bn.running_var + if size == 1: + x_grad = x.grad + w_grad = bn.weight.grad + b_grad = bn.bias.grad + elif size == 2: + x_grad = x.grad[rank % 2 * 4:rank % 2 * 4 + 4] + w_grad = bn.weight.grad / 2 + b_grad = bn.bias.grad / 2 + elif size == 4: + x_grad = x.grad[rank * 4:rank * 4 + 4] + w_grad = bn.weight.grad / 4 + b_grad = bn.bias.grad / 4 + + assert np.allclose(mean.data.cpu().numpy(), + smean.data.cpu().numpy(), 1e-3) + assert np.allclose(var.data.cpu().numpy(), + svar.data.cpu().numpy(), 1e-3) + assert np.allclose(y.data.cpu().numpy(), sy.data.cpu().numpy(), 1e-3) + assert np.allclose(w_grad.data.cpu().numpy(), + sw_grad.data.cpu().numpy(), 1e-3) + assert np.allclose(b_grad.data.cpu().numpy(), + sb_grad.data.cpu().numpy(), 1e-3) + assert np.allclose(x_grad.data.cpu().numpy(), + sx_grad.data.cpu().numpy(), 1e-2) + + def _test_syncbn_empty_train(self, size=1, half=False): + + if 'SLURM_NTASKS' not in os.environ or int( + os.environ['SLURM_NTASKS']) != 4: + print('must run with slurm has 4 processes!\n' + 'srun -p test --gres=gpu:4 
-n4') + return + else: + print('Running syncbn test') + from mmcv.ops import SyncBatchNorm + + assert size in (1, 2, 4) + if not dist.is_initialized(): + self.dist_init() + rank = dist.get_rank() + + torch.manual_seed(9) + torch.cuda.manual_seed(9) + + self.x = torch.rand(0, 3, 2, 3).cuda() + self.y_bp = torch.rand(0, 3, 2, 3).cuda() + + if half: + self.x = self.x.half() + self.y_bp = self.y_bp.half() + dist.broadcast(self.x, src=0) + dist.broadcast(self.y_bp, src=0) + + torch.cuda.synchronize() + if size == 1: + groups = [None, None, None, None] + groups[0] = dist.new_group([0]) + groups[1] = dist.new_group([1]) + groups[2] = dist.new_group([2]) + groups[3] = dist.new_group([3]) + group = groups[rank] + elif size == 2: + groups = [None, None, None, None] + groups[0] = groups[1] = dist.new_group([0, 1]) + groups[2] = groups[3] = dist.new_group([2, 3]) + group = groups[rank] + elif size == 4: + group = dist.group.WORLD + + syncbn = SyncBatchNorm(3, group=group, stats_mode='N').cuda() + syncbn.weight.data[0] = 0.2 + syncbn.weight.data[1] = 0.5 + syncbn.weight.data[2] = 0.7 + syncbn.train() + + bn = nn.BatchNorm2d(3).cuda() + bn.weight.data[0] = 0.2 + bn.weight.data[1] = 0.5 + bn.weight.data[2] = 0.7 + bn.train() + + sx = self.x[rank * 4:rank * 4 + 4] + sx.requires_grad_() + sy = syncbn(sx) + sy.backward(self.y_bp[rank * 4:rank * 4 + 4]) + smean = syncbn.running_mean + svar = syncbn.running_var + sx_grad = sx.grad + sw_grad = syncbn.weight.grad + sb_grad = syncbn.bias.grad + + if size == 1: + x = self.x[rank * 4:rank * 4 + 4] + y_bp = self.y_bp[rank * 4:rank * 4 + 4] + elif size == 2: + x = self.x[rank // 2 * 8:rank // 2 * 8 + 8] + y_bp = self.y_bp[rank // 2 * 8:rank // 2 * 8 + 8] + elif size == 4: + x = self.x + y_bp = self.y_bp + x.requires_grad_() + y = bn(x) + y.backward(y_bp) + + if size == 2: + y = y[rank % 2 * 4:rank % 2 * 4 + 4] + elif size == 4: + y = y[rank * 4:rank * 4 + 4] + + mean = bn.running_mean + var = bn.running_var + if size == 1: + x_grad = x.grad 
+ w_grad = bn.weight.grad + b_grad = bn.bias.grad + elif size == 2: + x_grad = x.grad[rank % 2 * 4:rank % 2 * 4 + 4] + w_grad = bn.weight.grad / 2 + b_grad = bn.bias.grad / 2 + elif size == 4: + x_grad = x.grad[rank * 4:rank * 4 + 4] + w_grad = bn.weight.grad / 4 + b_grad = bn.bias.grad / 4 + + assert np.allclose(mean.data.cpu().numpy(), + smean.data.cpu().numpy(), 1e-3) + assert np.allclose(var.data.cpu().numpy(), + svar.data.cpu().numpy(), 1e-3) + assert np.allclose(y.data.cpu().numpy(), sy.data.cpu().numpy(), 1e-3) + assert np.allclose(w_grad.data.cpu().numpy(), + sw_grad.data.cpu().numpy(), 1e-3) + assert np.allclose(b_grad.data.cpu().numpy(), + sb_grad.data.cpu().numpy(), 1e-3) + assert np.allclose(x_grad.data.cpu().numpy(), + sx_grad.data.cpu().numpy(), 1e-2) + + # 'stats_mode' only allows 'default' and 'N' + with pytest.raises(AssertionError): + SyncBatchNorm(3, group=group, stats_mode='X') + + def test_syncbn_1(self): + self._test_syncbn_train(size=1) + + def test_syncbn_2(self): + self._test_syncbn_train(size=2) + + def test_syncbn_4(self): + self._test_syncbn_train(size=4) + + def test_syncbn_1_half(self): + self._test_syncbn_train(size=1, half=True) + + def test_syncbn_2_half(self): + self._test_syncbn_train(size=2, half=True) + + def test_syncbn_4_half(self): + self._test_syncbn_train(size=4, half=True) + + def test_syncbn_empty_1(self): + self._test_syncbn_empty_train(size=1) + + def test_syncbn_empty_2(self): + self._test_syncbn_empty_train(size=2) + + def test_syncbn_empty_4(self): + self._test_syncbn_empty_train(size=4) + + def test_syncbn_empty_1_half(self): + self._test_syncbn_empty_train(size=1, half=True) + + def test_syncbn_empty_2_half(self): + self._test_syncbn_empty_train(size=2, half=True) + + def test_syncbn_empty_4_half(self): + self._test_syncbn_empty_train(size=4, half=True) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_three_interpolate.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_three_interpolate.py new 
file mode 100644 index 0000000000000000000000000000000000000000..9f56e3ee828411e31b091258e24cbda3123f3716 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_three_interpolate.py @@ -0,0 +1,78 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import pytest +import torch + +from mmcv.ops import three_interpolate + + +@pytest.mark.skipif( + not torch.cuda.is_available(), reason='requires CUDA support') +@pytest.mark.parametrize('dtype', [torch.half, torch.float, torch.float]) +def test_three_interpolate(dtype): + features = torch.tensor( + [[[2.4350, 4.7516, 4.4995, 2.4350, 2.4350, 2.4350], + [3.1236, 2.6278, 3.0447, 3.1236, 3.1236, 3.1236], + [2.6732, 2.8677, 2.6436, 2.6732, 2.6732, 2.6732], + [0.0124, 7.0150, 7.0199, 0.0124, 0.0124, 0.0124], + [0.3207, 0.0000, 0.3411, 0.3207, 0.3207, 0.3207]], + [[0.0000, 0.9544, 2.4532, 0.0000, 0.0000, 0.0000], + [0.5346, 1.9176, 1.4715, 0.5346, 0.5346, 0.5346], + [0.0000, 0.2744, 2.0842, 0.0000, 0.0000, 0.0000], + [0.3414, 1.5063, 1.6209, 0.3414, 0.3414, 0.3414], + [0.5814, 0.0103, 0.0000, 0.5814, 0.5814, 0.5814]]], + dtype=dtype).cuda() + + idx = torch.tensor([[[0, 1, 2], [2, 3, 4], [2, 3, 4], [0, 1, 2], [0, 1, 2], + [0, 1, 3]], + [[0, 2, 3], [1, 3, 4], [2, 1, 4], [0, 2, 4], [0, 2, 4], + [0, 1, 2]]]).int().cuda() + + weight = torch.tensor([[[3.3333e-01, 3.3333e-01, 3.3333e-01], + [1.0000e+00, 5.8155e-08, 2.2373e-08], + [1.0000e+00, 1.7737e-08, 1.7356e-08], + [3.3333e-01, 3.3333e-01, 3.3333e-01], + [3.3333e-01, 3.3333e-01, 3.3333e-01], + [3.3333e-01, 3.3333e-01, 3.3333e-01]], + [[3.3333e-01, 3.3333e-01, 3.3333e-01], + [1.0000e+00, 1.3651e-08, 7.7312e-09], + [1.0000e+00, 1.7148e-08, 1.4070e-08], + [3.3333e-01, 3.3333e-01, 3.3333e-01], + [3.3333e-01, 3.3333e-01, 3.3333e-01], + [3.3333e-01, 3.3333e-01, 3.3333e-01]]], + dtype=dtype).cuda() + + output = three_interpolate(features, idx, weight) + expected_output = torch.tensor([[[ + 3.8953e+00, 4.4995e+00, 4.4995e+00, 3.8953e+00, 3.8953e+00, 3.2072e+00 + ], [ + 
2.9320e+00, 3.0447e+00, 3.0447e+00, 2.9320e+00, 2.9320e+00, 2.9583e+00 + ], [ + 2.7281e+00, 2.6436e+00, 2.6436e+00, 2.7281e+00, 2.7281e+00, 2.7380e+00 + ], [ + 4.6824e+00, 7.0199e+00, 7.0199e+00, 4.6824e+00, 4.6824e+00, 2.3466e+00 + ], [ + 2.2060e-01, 3.4110e-01, 3.4110e-01, 2.2060e-01, 2.2060e-01, 2.1380e-01 + ]], + [[ + 8.1773e-01, 9.5440e-01, 2.4532e+00, + 8.1773e-01, 8.1773e-01, 1.1359e+00 + ], + [ + 8.4689e-01, 1.9176e+00, 1.4715e+00, + 8.4689e-01, 8.4689e-01, 1.3079e+00 + ], + [ + 6.9473e-01, 2.7440e-01, 2.0842e+00, + 6.9473e-01, 6.9473e-01, 7.8619e-01 + ], + [ + 7.6789e-01, 1.5063e+00, 1.6209e+00, + 7.6789e-01, 7.6789e-01, 1.1562e+00 + ], + [ + 3.8760e-01, 1.0300e-02, 8.3569e-09, + 3.8760e-01, 3.8760e-01, 1.9723e-01 + ]]], + dtype=dtype).cuda() + + assert torch.allclose(output, expected_output, 1e-3, 1e-4) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_three_nn.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_three_nn.py new file mode 100644 index 0000000000000000000000000000000000000000..456188b9179bf8dc577985f80d3883af42c4aa86 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_three_nn.py @@ -0,0 +1,65 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import pytest +import torch + +from mmcv.ops import three_nn +from mmcv.utils import IS_CUDA_AVAILABLE, IS_MLU_AVAILABLE + +known = [[[-1.8373, 3.5605, -0.7867], [0.7615, 2.9420, 0.2314], + [-0.6503, 3.6637, -1.0622], [-1.8373, 3.5605, -0.7867], + [-1.8373, 3.5605, -0.7867]], + [[-1.3399, 1.9991, -0.3698], [-0.0799, 0.9698, -0.8457], + [0.0858, 2.4721, -0.1928], [-1.3399, 1.9991, -0.3698], + [-1.3399, 1.9991, -0.3698]]] + +unknown = [[[-1.8373, 3.5605, -0.7867], [0.7615, 2.9420, 0.2314], + [-0.6503, 3.6637, -1.0622], [-1.5237, 2.3976, -0.8097], + [-0.0722, 3.4017, -0.2880], [0.5198, 3.0661, -0.4605], + [-2.0185, 3.5019, -0.3236], [0.5098, 3.1020, 0.5799], + [-1.6137, 3.8443, -0.5269], [0.7341, 2.9626, -0.3189]], + [[-1.3399, 1.9991, -0.3698], [-0.0799, 0.9698, -0.8457], + [0.0858, 2.4721, -0.1928], [-0.9022, 1.6560, -1.3090], + [0.1156, 1.6901, -0.4366], [-0.6477, 2.3576, -0.1563], + [-0.8482, 1.1466, -1.2704], [-0.8753, 2.0845, -0.3460], + [-0.5621, 1.4233, -1.2858], [-0.5883, 1.3114, -1.2899]]] + +expected_dist = [[[0.0000, 0.0000, 0.0000], [0.0000, 2.0463, 2.8588], + [0.0000, 1.2229, 1.2229], [1.2047, 1.2047, 1.2047], + [1.0011, 1.0845, 1.8411], [0.7433, 1.4451, 2.4304], + [0.5007, 0.5007, 0.5007], [0.4587, 2.0875, 2.7544], + [0.4450, 0.4450, 0.4450], [0.5514, 1.7206, 2.6811]], + [[0.0000, 0.0000, 0.0000], [0.0000, 1.6464, 1.6952], + [0.0000, 1.5125, 1.5125], [1.0915, 1.0915, 1.0915], + [0.8197, 0.8511, 1.4894], [0.7433, 0.8082, 0.8082], + [0.8955, 1.3340, 1.3340], [0.4730, 0.4730, 0.4730], + [0.7949, 1.3325, 1.3325], [0.7566, 1.3727, 1.3727]]] + +expected_idx = [[[0, 3, 4], [1, 2, 0], [2, 0, 3], [0, 3, 4], [2, 1, 0], + [1, 2, 0], [0, 3, 4], [1, 2, 0], [0, 3, 4], [1, 2, 0]], + [[0, 3, 4], [1, 2, 0], [2, 0, 3], [0, 3, 4], [2, 1, 0], + [2, 0, 3], [1, 0, 3], [0, 3, 4], [1, 0, 3], [1, 0, 3]]] + + +@pytest.mark.parametrize('device', [ + pytest.param( + 'cuda', + marks=pytest.mark.skipif( + not IS_CUDA_AVAILABLE, reason='requires CUDA support')), + pytest.param( + 
'mlu', + marks=pytest.mark.skipif( + not IS_MLU_AVAILABLE, reason='requires MLU support')) +]) +@pytest.mark.parametrize('dtype,rtol', [(torch.float, 1e-8), + (torch.half, 1e-3)]) +def test_three_nn(device, dtype, rtol): + dtype = torch.float + known_t = torch.tensor(known, dtype=dtype, device=device) + unknown_t = torch.tensor(unknown, dtype=dtype, device=device) + + dist_t, idx_t = three_nn(unknown_t, known_t) + expected_dist_t = torch.tensor(expected_dist, dtype=dtype, device=device) + expected_idx_t = torch.tensor(expected_idx, device=device) + + assert torch.allclose(dist_t, expected_dist_t, atol=1e-4, rtol=rtol) + assert torch.all(idx_t == expected_idx_t) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_tin_shift.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_tin_shift.py new file mode 100644 index 0000000000000000000000000000000000000000..ea684f31b801ac1346063bb124be1b054506593a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_tin_shift.py @@ -0,0 +1,226 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import os + +import numpy as np +import pytest +import torch + +from mmcv.utils import IS_CUDA_AVAILABLE, IS_MLU_AVAILABLE + +_USING_PARROTS = True +try: + from parrots.autograd import gradcheck +except ImportError: + from torch.autograd import gradcheck + + _USING_PARROTS = False + +cur_dir = os.path.dirname(os.path.abspath(__file__)) + +inputs = ([[[[0.88572276, 0.46422583], [0.97408265, 0.59547687], + [0.030812204, 0.96236038], [0.75418317, 0.44058233], + [0.33279222, 0.00084149837], [0.7069388, 0.23255438], + [0.13547045, 0.81549376], [0.40174931, 0.36317211]], + [[0.57444429, 0.15905505], [0.39897251, 0.25790238], + [0.93282568, 0.18451685], [0.92526674, 0.18283755], + [0.31664443, 0.59323865], [0.1957739, 0.42505842], + [0.081158757, 0.81340349], [0.43456328, 0.30195212]], + [[0.8198145, 0.05990988], [0.98062474, 0.34803438], + [0.10412294, 0.37183142], [0.15021622, 0.038857818], + [0.40985721, 0.42253625], [0.71150124, 0.59778064], + [0.83851069, 0.15194464], [0.097513378, 0.74820143]], + [[0.80680406, 0.49327564], [0.17821097, 0.12980539], + [0.50657678, 0.14446253], [0.04178369, 0.53071898], + [0.84983683, 0.3826949], [0.32193625, 0.91275406], + [0.75628334, 0.52934098], [0.27994192, 0.3053292]]], + [[[0.082397044, 0.4210068], [0.23563534, 0.7938987], + [0.63669145, 0.69397897], [0.8844561, 0.97854084], + [0.79027033, 0.60640401], [0.63528901, 0.72172403], + [0.0097346902, 0.70800996], [0.87891227, 0.13674974]], + [[0.74329448, 0.0243572], [0.82178867, 0.85750699], + [0.7568835, 0.73146772], [0.5031184, 0.30479157], + [0.28713053, 0.47414285], [0.4682079, 0.067471564], + [0.48368263, 0.14590704], [0.25397325, 0.19946373]], + [[0.4291026, 0.068739474], [0.7159555, 0.79903615], + [0.76412082, 0.85348046], [0.081224024, 0.82264912], + [0.97173303, 0.24291694], [0.48957139, 0.43488795], + [0.67382395, 0.21889746], [0.36712623, 0.67127824]], + [[0.12054044, 0.18096751], [0.86675781, 0.54755616], + [0.68208277, 0.15164375], [0.79991871, 0.80811197], + 
[0.85256428, 0.68253738], [0.185983, 0.95642138], + [0.48102546, 0.28009653], [0.35726011, 0.58168036]]]]) + +shifts = [([[1, 0, 1, -2], [-2, 1, -1, 1]]), ([[2, 1, 2, -1], [-1, 2, 0, 2]])] + +outputs = [([[[[0.0, 0.0], [0.0, 0.0], [0.030812, 0.96236], [0.75418, 0.44058], + [0.0, 0.0], [0.0, 0.0], [0.83851, 0.15194], [0.097513, 0.7482]], + [[0.88572, 0.46423], [0.97408, 0.59548], [0.93283, 0.18452], + [0.92527, 0.18284], [0.33279, 0.0008415], [0.70694, 0.23255], + [0.75628, 0.52934], [0.27994, 0.30533]], + [[0.57444, 0.15906], [0.39897, 0.2579], [0.10412, 0.37183], + [0.15022, 0.038858], [0.31664, 0.59324], [0.19577, 0.42506], + [0.0, 0.0], [0.0, 0.0]], + [[0.81981, 0.05991], [0.98062, 0.34803], [0.50658, 0.14446], + [0.041784, 0.53072], [0.40986, 0.42254], [0.7115, 0.59778], + [0.0, 0.0], [0.0, 0.0]]], + [[[0.4291, 0.068739], [0.71596, 0.79904], [0.0, 0.0], [0.0, 0.0], + [0.28713, 0.47414], [0.46821, 0.067472], [0.0, 0.0], [0.0, + 0.0]], + [[0.12054, 0.18097], [0.86676, 0.54756], [0.63669, 0.69398], + [0.88446, 0.97854], [0.97173, 0.24292], [0.48957, 0.43489], + [0.0097347, 0.70801], [0.87891, 0.13675]], + [[0.0, 0.0], [0.0, 0.0], [0.75688, 0.73147], [0.50312, 0.30479], + [0.85256, 0.68254], [0.18598, 0.95642], [0.48368, 0.14591], + [0.25397, 0.19946]], + [[0.0, 0.0], [0.0, 0.0], [0.76412, 0.85348], [0.081224, 0.82265], + [0.0, 0.0], [0.0, 0.0], [0.67382, 0.2189], [0.36713, + 0.67128]]]]), + ([[[[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], + [0.0, 0.0], [0.081159, 0.8134], [0.43456, 0.30195]], + [[0.0, 0.0], [0.0, 0.0], [0.030812, 0.96236], [0.75418, 0.44058], + [0.0, 0.0], [0.0, 0.0], [0.83851, 0.15194], [0.097513, 0.7482]], + [[0.88572, 0.46423], [0.97408, 0.59548], [0.93283, 0.18452], + [0.92527, 0.18284], [0.33279, 0.0008415], [0.70694, 0.23255], + [0.75628, 0.52934], [0.27994, 0.30533]], + [[0.57444, 0.15906], [0.39897, 0.2579], [0.10412, 0.37183], + [0.15022, 0.038858], [0.31664, 0.59324], [0.19577, 0.42506], + [0.0, 0.0], [0.0, 0.0]]], + 
[[[0.74329, 0.024357], [0.82179, 0.85751], [0.0, 0.0], [0.0, 0.0], + [0.79027, 0.6064], [0.63529, 0.72172], [0.0, 0.0], [0.0, 0.0]], + [[0.4291, 0.068739], [0.71596, 0.79904], [0.0, 0.0], [0.0, 0.0], + [0.28713, 0.47414], [0.46821, 0.067472], [0.0, 0.0], [0.0, + 0.0]], + [[0.12054, 0.18097], [0.86676, 0.54756], [0.63669, 0.69398], + [0.88446, 0.97854], [0.97173, 0.24292], [0.48957, 0.43489], + [0.0097347, 0.70801], [0.87891, 0.13675]], + [[0.0, 0.0], [0.0, 0.0], [0.75688, 0.73147], [0.50312, 0.30479], + [0.85256, 0.68254], [0.18598, 0.95642], [0.48368, 0.14591], + [0.25397, 0.19946]]]])] + +grads = [ + [[[[0., 0.], [0., 0.], [1., 1.], [1., 1.], [0., 0.], [0., 0.], [1., 1.], + [1., 1.]], + [[1., 1.], [1., 1.], [1., 1.], [1., 1.], [1., 1.], [1., 1.], [1., 1.], + [1., 1.]], + [[1., 1.], [1., 1.], [1., 1.], [1., 1.], [1., 1.], [1., 1.], [0., 0.], + [0., 0.]], + [[1., 1.], [1., 1.], [1., 1.], [1., 1.], [1., 1.], [1., 1.], [0., 0.], + [0., 0.]]], + [[[1., 1.], [1., 1.], [0., 0.], [0., 0.], [1., 1.], [1., 1.], [0., 0.], + [0., 0.]], + [[1., 1.], [1., 1.], [1., 1.], [1., 1.], [1., 1.], [1., 1.], [1., 1.], + [1., 1.]], + [[0., 0.], [0., 0.], [1., 1.], [1., 1.], [1., 1.], [1., 1.], [1., 1.], + [1., 1.]], + [[0., 0.], [0., 0.], [1., 1.], [1., 1.], [0., 0.], [0., 0.], [1., 1.], + [1., 1.]]]], + [[[[0., 0.], [0., 0.], [0., 0.], [0., 0.], [0., 0.], [0., 0.], [1., 1.], + [1., 1.]], + [[0., 0.], [0., 0.], [1., 1.], [1., 1.], [0., 0.], [0., 0.], [1., 1.], + [1., 1.]], + [[1., 1.], [1., 1.], [1., 1.], [1., 1.], [1., 1.], [1., 1.], [1., 1.], + [1., 1.]], + [[1., 1.], [1., 1.], [1., 1.], [1., 1.], [1., 1.], [1., 1.], [0., 0.], + [0., 0.]]], + [[[1., 1.], [1., 1.], [0., 0.], [0., 0.], [1., 1.], [1., 1.], [0., 0.], + [0., 0.]], + [[1., 1.], [1., 1.], [0., 0.], [0., 0.], [1., 1.], [1., 1.], [0., 0.], + [0., 0.]], + [[1., 1.], [1., 1.], [1., 1.], [1., 1.], [1., 1.], [1., 1.], [1., 1.], + [1., 1.]], + [[0., 0.], [0., 0.], [1., 1.], [1., 1.], [1., 1.], [1., 1.], [1., 1.], + [1., 1.]]]] +] + 
+ +def _test_tinshift_gradcheck(device, dtype): + try: + from mmcv.ops import tin_shift + except ModuleNotFoundError: + pytest.skip('TINShift op is not successfully compiled') + + if dtype == torch.half: + pytest.skip('"add_cpu/sub_cpu" not implemented for Half') + + for shift in shifts: + np_input = np.array(inputs) + np_shift = np.array(shift) + + x = torch.tensor( + np_input, dtype=dtype, device=device, requires_grad=True) + shift = torch.tensor(np_shift, device=device).int() + if torch.__version__ == 'parrots': + gradcheck(tin_shift, (x, shift)) + else: + gradcheck(tin_shift, (x, shift), atol=1, rtol=0.1) + + +def _test_tinshift_allclose(device, dtype): + try: + from mmcv.ops import tin_shift + except ModuleNotFoundError: + pytest.skip('TINShift op is not successfully compiled') + + for shift, output, grad in zip(shifts, outputs, grads): + np_input = np.array(inputs) + np_shift = np.array(shift) + np_output = np.array(output) + np_grad = np.array(grad) + + x = torch.tensor( + np_input, dtype=dtype, device=device, requires_grad=True) + shift = torch.tensor(np_shift, device=device).int() + + output = tin_shift(x, shift) + output.backward(torch.ones_like(output)) + assert np.allclose( + output.data.type(torch.float).cpu().numpy(), np_output, 1e-3) + assert np.allclose( + x.grad.data.type(torch.float).cpu().numpy(), np_grad, 1e-3) + + +def _test_tinshift_assert(device, dtype): + try: + from mmcv.ops import tin_shift + except ModuleNotFoundError: + pytest.skip('TINShift op is not successfully compiled') + + inputs = [ + torch.rand(2, 3, 4, 2), + torch.rand(2, 3, 4, 2), + torch.rand(1, 3, 4, 2) + ] + shifts = [torch.rand(2, 3), torch.rand(2, 5)] + + for x, shift in zip(inputs, shifts): + x = x.to(device).type(dtype) + shift = shift.to(device).type(dtype) + + # A ValueError should be raised if ops get inputs with wrong shapes. 
+ with pytest.raises(ValueError): + tin_shift(x, shift) + + +@pytest.mark.parametrize('device', [ + pytest.param( + 'cuda', + marks=pytest.mark.skipif( + not IS_CUDA_AVAILABLE, reason='requires CUDA support')), + pytest.param( + 'mlu', + marks=pytest.mark.skipif( + not IS_MLU_AVAILABLE, reason='requires MLU support')) +]) +@pytest.mark.parametrize('dtype', [ + torch.float, + pytest.param( + torch.float, + marks=pytest.mark.skipif( + IS_MLU_AVAILABLE, + reason='MLU does not support for 64-bit floating point')), + torch.half +]) +def test_tinshift(device, dtype): + _test_tinshift_allclose(device=device, dtype=dtype) + _test_tinshift_gradcheck(device=device, dtype=dtype) + _test_tinshift_assert(device=device, dtype=dtype) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_upfirdn2d.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_upfirdn2d.py new file mode 100644 index 0000000000000000000000000000000000000000..6037a51c2f59285acb270192ab5e41f437b7c589 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_upfirdn2d.py @@ -0,0 +1,58 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import pytest +import torch + +_USING_PARROTS = True +try: + from parrots.autograd import gradcheck +except ImportError: + from torch.autograd import gradcheck, gradgradcheck + _USING_PARROTS = False + + +class TestUpFirDn2d: + """Unit test for UpFirDn2d. + + Here, we just test the basic case of upsample version. More gerneal tests + will be included in other unit test for UpFirDnUpsample and + UpFirDnDownSample modules. 
+ """ + + @classmethod + def setup_class(cls): + kernel_1d = torch.tensor([1., 3., 3., 1.]) + cls.kernel = kernel_1d[:, None] * kernel_1d[None, :] + cls.kernel = cls.kernel / cls.kernel.sum() + cls.factor = 2 + pad = cls.kernel.shape[0] - cls.factor + cls.pad = ((pad + 1) // 2 + cls.factor - 1, pad // 2) + + cls.input_tensor = torch.randn((2, 3, 4, 4), requires_grad=True) + + @pytest.mark.skipif(not torch.cuda.is_available(), reason='requires cuda') + def test_upfirdn2d(self): + from mmcv.ops import upfirdn2d + if _USING_PARROTS: + gradcheck( + upfirdn2d, + (self.input_tensor.cuda(), + self.kernel.type_as( + self.input_tensor).cuda(), self.factor, 1, self.pad), + delta=1e-4, + pt_atol=1e-3) + else: + gradcheck( + upfirdn2d, + (self.input_tensor.cuda(), + self.kernel.type_as( + self.input_tensor).cuda(), self.factor, 1, self.pad), + eps=1e-4, + atol=1e-3) + + gradgradcheck( + upfirdn2d, + (self.input_tensor.cuda(), + self.kernel.type_as( + self.input_tensor).cuda(), self.factor, 1, self.pad), + eps=1e-4, + atol=1e-3) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_voxelization.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_voxelization.py new file mode 100644 index 0000000000000000000000000000000000000000..d3555ac694d5fc0f1ebf03e50bbbd609d3e53682 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_ops/test_voxelization.py @@ -0,0 +1,139 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import numpy as np +import pytest +import torch + +from mmcv.ops import Voxelization + + +def _get_voxel_points_indices(points, coors, voxel): + result_form = np.equal(coors, voxel) + return result_form[:, 0] & result_form[:, 1] & result_form[:, 2] + + +@pytest.mark.parametrize('device_type', [ + 'cpu', + pytest.param( + 'cuda:0', + marks=pytest.mark.skipif( + not torch.cuda.is_available(), reason='requires CUDA support')) +]) +def test_voxelization(device_type): + voxel_size = [0.5, 0.5, 0.5] + point_cloud_range = [0, -40, -3, 70.4, 40, 1] + + voxel_dict = np.load( + 'tests/data/for_3d_ops/test_voxel.npy', allow_pickle=True).item() + expected_coors = voxel_dict['coors'] + expected_voxels = voxel_dict['voxels'] + expected_num_points_per_voxel = voxel_dict['num_points_per_voxel'] + points = voxel_dict['points'] + + points = torch.tensor(points) + max_num_points = -1 + dynamic_voxelization = Voxelization(voxel_size, point_cloud_range, + max_num_points) + max_num_points = 1000 + hard_voxelization = Voxelization(voxel_size, point_cloud_range, + max_num_points) + + device = torch.device(device_type) + + # test hard_voxelization on cpu/gpu + points = points.contiguous().to(device) + coors, voxels, num_points_per_voxel = hard_voxelization.forward(points) + coors = coors.cpu().detach().numpy() + voxels = voxels.cpu().detach().numpy() + num_points_per_voxel = num_points_per_voxel.cpu().detach().numpy() + assert np.all(coors == expected_coors) + assert np.all(voxels == expected_voxels) + assert np.all(num_points_per_voxel == expected_num_points_per_voxel) + + # test dynamic_voxelization on cpu/gpu + coors = dynamic_voxelization.forward(points) + coors = coors.cpu().detach().numpy() + points = points.cpu().detach().numpy() + for i in range(expected_voxels.shape[0]): + indices = _get_voxel_points_indices(points, coors, expected_voxels[i]) + num_points_current_voxel = points[indices].shape[0] + assert num_points_current_voxel > 0 + assert np.all( + points[indices] == 
expected_coors[i][:num_points_current_voxel]) + assert num_points_current_voxel == expected_num_points_per_voxel[i] + + +@pytest.mark.skipif( + not torch.cuda.is_available(), reason='requires CUDA support') +def test_voxelization_nondeterministic(): + voxel_size = [0.5, 0.5, 0.5] + point_cloud_range = [0, -40, -3, 70.4, 40, 1] + + voxel_dict = np.load( + 'tests/data/for_3d_ops/test_voxel.npy', allow_pickle=True).item() + points = voxel_dict['points'] + + points = torch.tensor(points) + max_num_points = -1 + dynamic_voxelization = Voxelization(voxel_size, point_cloud_range, + max_num_points) + + max_num_points = 10 + max_voxels = 50 + hard_voxelization = Voxelization( + voxel_size, + point_cloud_range, + max_num_points, + max_voxels, + deterministic=False) + + # test hard_voxelization (non-deterministic version) on gpu + points = torch.tensor(points).contiguous().to(device='cuda:0') + voxels, coors, num_points_per_voxel = hard_voxelization.forward(points) + coors = coors.cpu().detach().numpy().tolist() + voxels = voxels.cpu().detach().numpy().tolist() + num_points_per_voxel = num_points_per_voxel.cpu().detach().numpy().tolist() + + coors_all = dynamic_voxelization.forward(points) + coors_all = coors_all.cpu().detach().numpy().tolist() + + coors_set = {tuple(c) for c in coors} + coors_all_set = {tuple(c) for c in coors_all} + + assert len(coors_set) == len(coors) + assert len(coors_set - coors_all_set) == 0 + + points = points.cpu().detach().numpy().tolist() + + coors_points_dict = {} + for c, ps in zip(coors_all, points): + if tuple(c) not in coors_points_dict: + coors_points_dict[tuple(c)] = set() + coors_points_dict[tuple(c)].add(tuple(ps)) + + for c, ps, n in zip(coors, voxels, num_points_per_voxel): + ideal_voxel_points_set = coors_points_dict[tuple(c)] + voxel_points_set = {tuple(p) for p in ps[:n]} + assert len(voxel_points_set) == n + if n < max_num_points: + assert voxel_points_set == ideal_voxel_points_set + for p in ps[n:]: + assert max(p) == min(p) == 0 + 
else: + assert len(voxel_points_set - ideal_voxel_points_set) == 0 + + # test hard_voxelization (non-deterministic version) on gpu + # with all input point in range + points = torch.tensor(points).contiguous().to(device='cuda:0')[:max_voxels] + coors_all = dynamic_voxelization.forward(points) + valid_mask = coors_all.ge(0).all(-1) + points = points[valid_mask] + coors_all = coors_all[valid_mask] + coors_all = coors_all.cpu().detach().numpy().tolist() + + voxels, coors, num_points_per_voxel = hard_voxelization.forward(points) + coors = coors.cpu().detach().numpy().tolist() + + coors_set = {tuple(c) for c in coors} + coors_all_set = {tuple(c) for c in coors_all} + + assert len(coors_set) == len(coors) == len(coors_all_set) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_transforms/test_transforms_formatting.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_transforms/test_transforms_formatting.py new file mode 100644 index 0000000000000000000000000000000000000000..96abc8c221b81ec50f8374a843778de81a1b9c24 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_transforms/test_transforms_formatting.py @@ -0,0 +1,101 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+try: + import torch +except ModuleNotFoundError: + torch = None +else: + from mmcv.transforms import ToTensor, to_tensor, ImageToTensor + +import copy + +import numpy as np +import pytest + + +@pytest.mark.skipif(condition=torch is None, reason='No torch in current env') +def test_to_tensor(): + + # The type of the input object is torch.Tensor + data_tensor = torch.tensor([1, 2, 3]) + tensor_from_tensor = to_tensor(data_tensor) + assert isinstance(tensor_from_tensor, torch.Tensor) + + # The type of the input object is numpy.ndarray + data_numpy = np.array([1, 2, 3]) + tensor_from_numpy = to_tensor(data_numpy) + assert isinstance(tensor_from_numpy, torch.Tensor) + + # The type of the input object is list + data_list = [1, 2, 3] + tensor_from_list = to_tensor(data_list) + assert isinstance(tensor_from_list, torch.Tensor) + + # The type of the input object is int + data_int = 1 + tensor_from_int = to_tensor(data_int) + assert isinstance(tensor_from_int, torch.Tensor) + + # The type of the input object is float + data_float = 1.0 + tensor_from_float = to_tensor(data_float) + assert isinstance(tensor_from_float, torch.Tensor) + + # The type of the input object is invalid + with pytest.raises(TypeError): + data_str = '123' + _ = to_tensor(data_str) + + +@pytest.mark.skipif(condition=torch is None, reason='No torch in current env') +class TestToTensor: + + def test_init(self): + TRANSFORM = ToTensor(keys=['img_label']) + assert TRANSFORM.keys == ['img_label'] + + def test_transform(self): + TRANSFORMS = ToTensor(['instances.bbox', 'img_label']) + + # Test multi-level key and single-level key (multi-level key is + # not in results) + with pytest.raises(KeyError): + results = {'instances': {'label': [1]}, 'img_label': [1]} + results_tensor = TRANSFORMS.transform(copy.deepcopy(results)) + assert isinstance(results_tensor['instances']['label'], list) + assert isinstance(results_tensor['img_label'], torch.Tensor) + + # Test multi-level key (multi-level key is in results) + 
results = {'instances': {'bbox': [[0, 0, 10, 10]]}, 'img_label': [1]} + results_tensor = TRANSFORMS.transform(copy.deepcopy(results)) + assert isinstance(results_tensor['instances']['bbox'], torch.Tensor) + + def test_repr(self): + TRANSFORMS = ToTensor(['instances.bbox', 'img_label']) + TRANSFORMS_str = str(TRANSFORMS) + isinstance(TRANSFORMS_str, str) + + +@pytest.mark.skipif(condition=torch is None, reason='No torch in current env') +class TestImageToTensor: + + def test_init(self): + TRANSFORMS = ImageToTensor(['img']) + assert TRANSFORMS.keys == ['img'] + + def test_transform(self): + TRANSFORMS = ImageToTensor(['img']) + + # image only has one channel + results = {'img': np.zeros((224, 224))} + results = TRANSFORMS.transform(results) + assert results['img'].shape == (1, 224, 224) + + # image has three channels + results = {'img': np.zeros((224, 224, 3))} + results = TRANSFORMS.transform(results) + assert results['img'].shape == (3, 224, 224) + + def test_repr(self): + TRANSFORMS = ImageToTensor(['img']) + TRANSFORMS_str = str(TRANSFORMS) + assert isinstance(TRANSFORMS_str, str) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_transforms/test_transforms_loading.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_transforms/test_transforms_loading.py new file mode 100644 index 0000000000000000000000000000000000000000..918783c993de58d22d0962452632f3864ceb64dc --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_transforms/test_transforms_loading.py @@ -0,0 +1,151 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import copy +import os.path as osp + +import numpy as np +import pytest + +from mmcv.transforms import LoadAnnotations, LoadImageFromFile + + +class TestLoadImageFromFile: + + def test_load_img(self): + # file_client_args and backend_args can not be both set + with pytest.raises( + ValueError, + match='"file_client_args" and "backend_args" cannot be set'): + LoadImageFromFile( + file_client_args={'backend': 'disk'}, + backend_args={'backend': 'disk'}) + data_prefix = osp.join(osp.dirname(__file__), '../data') + + results = dict(img_path=osp.join(data_prefix, 'color.jpg')) + transform = LoadImageFromFile() + results = transform(copy.deepcopy(results)) + assert results['img_path'] == osp.join(data_prefix, 'color.jpg') + assert results['img'].shape == (300, 400, 3) + assert results['img'].dtype == np.uint8 + assert results['img_shape'] == (300, 400) + assert results['ori_shape'] == (300, 400) + assert repr(transform) == transform.__class__.__name__ + \ + "(ignore_empty=False, to_float32=False, color_type='color', " + \ + "imdecode_backend='cv2', backend_args=None)" + + # to_float32 + transform = LoadImageFromFile(to_float32=True) + results = transform(copy.deepcopy(results)) + assert results['img'].dtype == np.float32 + + # gray image + results = dict(img_path=osp.join(data_prefix, 'grayscale.jpg')) + transform = LoadImageFromFile() + results = transform(copy.deepcopy(results)) + assert results['img'].shape == (300, 400, 3) + assert results['img'].dtype == np.uint8 + + transform = LoadImageFromFile(color_type='unchanged') + results = transform(copy.deepcopy(results)) + assert results['img'].shape == (300, 400) + assert results['img'].dtype == np.uint8 + + # test load empty + fake_img_path = osp.join(data_prefix, 'fake.jpg') + results['img_path'] = fake_img_path + transform = LoadImageFromFile(ignore_empty=False) + with pytest.raises(FileNotFoundError): + transform(copy.deepcopy(results)) + transform = LoadImageFromFile(ignore_empty=True) + assert 
transform(copy.deepcopy(results)) is None + + +class TestLoadAnnotations: + + def setup_class(cls): + data_prefix = osp.join(osp.dirname(__file__), '../data') + seg_map = osp.join(data_prefix, 'grayscale.jpg') + cls.results = { + 'seg_map_path': + seg_map, + 'instances': [{ + 'bbox': [0, 0, 10, 20], + 'bbox_label': 1, + 'keypoints': [1, 2, 3] + }, { + 'bbox': [10, 10, 110, 120], + 'bbox_label': 2, + 'keypoints': [4, 5, 6] + }] + } + + def test_init(self): + # file_client_args and backend_args can not be both set + with pytest.raises( + ValueError, + match='"file_client_args" and "backend_args" cannot be set'): + LoadAnnotations( + file_client_args={'backend': 'disk'}, + backend_args={'backend': 'disk'}) + + def test_load_bboxes(self): + transform = LoadAnnotations( + with_bbox=True, + with_label=False, + with_seg=False, + with_keypoints=False, + ) + results = transform(copy.deepcopy(self.results)) + assert 'gt_bboxes' in results + assert (results['gt_bboxes'] == np.array([[0, 0, 10, 20], + [10, 10, 110, 120]])).all() + assert results['gt_bboxes'].dtype == np.float32 + + def test_load_labels(self): + transform = LoadAnnotations( + with_bbox=False, + with_label=True, + with_seg=False, + with_keypoints=False, + ) + results = transform(copy.deepcopy(self.results)) + assert 'gt_bboxes_labels' in results + assert (results['gt_bboxes_labels'] == np.array([1, 2])).all() + assert results['gt_bboxes_labels'].dtype == np.int64 + + def test_load_kps(self): + transform = LoadAnnotations( + with_bbox=False, + with_label=False, + with_seg=False, + with_keypoints=True, + ) + results = transform(copy.deepcopy(self.results)) + assert 'gt_keypoints' in results + assert (results['gt_keypoints'] == np.array([[[1, 2, 3]], + [[4, 5, 6]]])).all() + assert results['gt_keypoints'].dtype == np.float32 + + def test_load_seg_map(self): + transform = LoadAnnotations( + with_bbox=False, + with_label=False, + with_seg=True, + with_keypoints=False, + ) + results = 
transform(copy.deepcopy(self.results)) + assert 'gt_seg_map' in results + assert results['gt_seg_map'].shape[:2] == (300, 400) + assert results['gt_seg_map'].dtype == np.uint8 + + def test_repr(self): + transform = LoadAnnotations( + with_bbox=True, + with_label=False, + with_seg=False, + with_keypoints=False, + ) + assert repr(transform) == ( + 'LoadAnnotations(with_bbox=True, ' + 'with_label=False, with_seg=False, ' + "with_keypoints=False, imdecode_backend='cv2', " + 'backend_args=None)') diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_transforms/test_transforms_processing.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_transforms/test_transforms_processing.py new file mode 100644 index 0000000000000000000000000000000000000000..716b9cf26d0a327fdcebadc9705963724a151b5d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_transforms/test_transforms_processing.py @@ -0,0 +1,1014 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy +import os.path as osp +from unittest.mock import Mock + +import numpy as np +import pytest + +import mmcv +from mmcv.transforms import (TRANSFORMS, Normalize, Pad, RandomFlip, + RandomResize, Resize, TestTimeAug) +from mmcv.transforms.base import BaseTransform + +try: + import torch +except ModuleNotFoundError: + torch = None +else: + import torchvision + +from numpy.testing import assert_array_almost_equal, assert_array_equal +from PIL import Image + + +class TestNormalize: + + def test_normalize(self): + img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + to_rgb=True) + transform = Normalize(**img_norm_cfg) + results = dict() + img = mmcv.imread( + osp.join(osp.dirname(__file__), '../data/color.jpg'), 'color') + original_img = copy.deepcopy(img) + results['img'] = img + results = transform(results) + mean = np.array(img_norm_cfg['mean']) + std = np.array(img_norm_cfg['std']) + converted_img = (original_img[..., ::-1] - mean) / std + assert np.allclose(results['img'], 
converted_img) + + def test_repr(self): + img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + to_rgb=True) + transform = Normalize(**img_norm_cfg) + assert repr(transform) == ('Normalize(mean=[123.675 116.28 103.53 ], ' + 'std=[58.395 57.12 57.375], to_rgb=True)') + + +class TestResize: + + def test_resize(self): + data_info = dict( + img=np.random.random((1333, 800, 3)), + gt_seg_map=np.random.random((1333, 800, 3)), + gt_bboxes=np.array([[0, 0, 112, 112]]), + gt_keypoints=np.array([[[20, 50, 1]]])) + + with pytest.raises(AssertionError): + transform = Resize(scale=None, scale_factor=None) + with pytest.raises(TypeError): + transform = Resize(scale_factor=[]) + # test scale is int + transform = Resize(scale=2000) + results = transform(copy.deepcopy(data_info)) + assert results['img'].shape[:2] == (2000, 2000) + assert results['scale_factor'] == (2000 / 800, 2000 / 1333) + + # test scale is tuple + transform = Resize(scale=(2000, 2000)) + results = transform(copy.deepcopy(data_info)) + assert results['img'].shape[:2] == (2000, 2000) + assert results['scale_factor'] == (2000 / 800, 2000 / 1333) + + # test scale_factor is float + transform = Resize(scale_factor=2.0) + results = transform(copy.deepcopy(data_info)) + assert results['img'].shape[:2] == (2666, 1600) + assert results['scale_factor'] == (2.0, 2.0) + + # test scale_factor is tuple + transform = Resize(scale_factor=(1.5, 2)) + results = transform(copy.deepcopy(data_info)) + assert results['img'].shape[:2] == (2666, 1200) + assert results['scale_factor'] == (1.5, 2) + + # test keep_ratio is True + transform = Resize(scale=(2000, 2000), keep_ratio=True) + results = transform(copy.deepcopy(data_info)) + assert results['img'].shape[:2] == (2000, 1200) + assert results['scale_factor'] == (1200 / 800, 2000 / 1333) + + # test resize_bboxes/seg/kps + transform = Resize(scale_factor=(1.5, 2)) + results = transform(copy.deepcopy(data_info)) + assert (results['gt_bboxes'] == 
class TestPad:
    """Tests for the ``Pad`` transform: argument validation, padding of
    ``img``/``gt_seg_map`` by size, size_divisor, pad_to_square, pad_val
    (int and dict forms) and padding_mode."""

    def test_pad(self):
        # size and size_divisor are mutually exclusive
        with pytest.raises(AssertionError):
            Pad(size=(10, 10), size_divisor=2)

        # at least one of size / size_divisor / pad_to_square is required
        with pytest.raises(AssertionError):
            Pad(size=None, size_divisor=None)

        # FIX(comment): size and pad_to_square are mutually exclusive
        # (the original comment wrongly said "both None")
        with pytest.raises(AssertionError):
            Pad(size=(10, 10), pad_to_square=True)

        # pad_val must be an int, tuple or dict, not a list
        with pytest.raises(AssertionError):
            Pad(size=(10, 10), pad_val=[])

        # padding_mode must be 'constant', 'edge', 'reflect' or 'symmetric'
        with pytest.raises(AssertionError):
            Pad(size=(10, 10), padding_mode='edg')

        data_info = dict(
            img=np.random.random((1333, 800, 3)),
            gt_seg_map=np.random.random((1333, 800, 3)),
            gt_bboxes=np.array([[0, 0, 112, 112]]),
            gt_keypoints=np.array([[[20, 50, 1]]]))

        # pad img / gt_seg_map to an explicit (width, height) size
        trans = Pad(size=(1200, 2000))
        results = trans(copy.deepcopy(data_info))
        assert results['img'].shape[:2] == (2000, 1200)
        assert results['gt_seg_map'].shape[:2] == (2000, 1200)

        # pad img / gt_seg_map up to a multiple of size_divisor
        trans = Pad(size_divisor=11)
        results = trans(copy.deepcopy(data_info))
        assert results['img'].shape[:2] == (1342, 803)
        assert results['gt_seg_map'].shape[:2] == (1342, 803)

        # pad img / gt_seg_map to a square
        trans = Pad(pad_to_square=True)
        results = trans(copy.deepcopy(data_info))
        assert results['img'].shape[:2] == (1333, 1333)
        assert results['gt_seg_map'].shape[:2] == (1333, 1333)

        # pad_to_square combined with size_divisor
        # FIX: this sub-case was duplicated verbatim in the original; the
        # second, identical copy has been removed.
        trans = Pad(pad_to_square=True, size_divisor=11)
        results = trans(copy.deepcopy(data_info))
        assert results['img'].shape[:2] == (1342, 1342)
        assert results['gt_seg_map'].shape[:2] == (1342, 1342)

        # padding_mode='edge' replicates border pixels, so padding an
        # all-ones image yields an all-ones result
        new_img = np.ones((1333, 800, 3))
        data_info['img'] = new_img
        trans = Pad(pad_to_square=True, padding_mode='edge')
        results = trans(copy.deepcopy(data_info))
        assert (results['img'] == np.ones((1333, 1333, 3))).all()

        # pad_val given as a dict with separate img / seg values,
        # explicit size=(2000, 2000)
        trans = Pad(
            size=(2000, 2000),
            pad_val=dict(img=(12, 12, 12), seg=(10, 10, 10)))
        results = trans(copy.deepcopy(data_info))
        assert (results['img'][1333:2000, 800:2000, :] == 12).all()
        assert (results['gt_seg_map'][1333:2000, 800:2000, :] == 10).all()

        # when only the img value is given, seg falls back to 255
        trans = Pad(size=(2000, 2000), pad_val=dict(img=(12, 12, 12)))
        results = trans(copy.deepcopy(data_info))
        assert (results['img'][1333:2000, 800:2000, :] == 12).all()
        assert (results['gt_seg_map'][1333:2000, 800:2000, :] == 255).all()

        # pad_val dict together with pad_to_square
        trans = Pad(
            pad_to_square=True,
            pad_val=dict(img=(12, 12, 12), seg=(10, 10, 10)))
        results = trans(copy.deepcopy(data_info))
        assert (results['img'][:, 800:1333, :] == 12).all()
        assert (results['gt_seg_map'][:, 800:1333, :] == 10).all()

        trans = Pad(pad_to_square=True, pad_val=dict(img=(12, 12, 12)))
        results = trans(copy.deepcopy(data_info))
        assert (results['img'][:, 800:1333, :] == 12).all()
        assert (results['gt_seg_map'][:, 800:1333, :] == 255).all()

        # pad_val given as a bare int applies to img; seg still uses 255
        trans = Pad(size=(2000, 2000), pad_val=12)
        results = trans(copy.deepcopy(data_info))
        assert (results['img'][1333:2000, 800:2000, :] == 12).all()
        assert (results['gt_seg_map'][1333:2000, 800:2000, :] == 255).all()

        # grayscale image / seg map (no channel axis)
        data_info['img'] = np.random.random((1333, 800))
        data_info['gt_seg_map'] = np.random.random((1333, 800))
        trans = Pad(size=(2000, 2000), pad_val=12)
        results = trans(copy.deepcopy(data_info))
        assert (results['img'][1333:2000, 800:2000] == 12).all()
        assert (results['gt_seg_map'][1333:2000, 800:2000] == 255).all()

    def test_repr(self):
        trans = Pad(pad_to_square=True, size_divisor=11, padding_mode='edge')
        assert repr(trans) == (
            'Pad(size=None, size_divisor=11, pad_to_square=True, '
            "pad_val={'img': 0, 'seg': 255}), padding_mode=edge)")
class TestCenterCrop:
    """Tests for the ``CenterCrop`` transform: validation, cropping of
    img/seg/bboxes/keypoints, auto-padding and parity with torchvision."""

    @classmethod
    def setup_class(cls):
        img = mmcv.imread(
            osp.join(osp.dirname(__file__), '../data/color.jpg'), 'color')
        cls.original_img = copy.deepcopy(img)
        seg = np.random.randint(0, 19, (300, 400)).astype(np.uint8)
        cls.gt_semantic_map = copy.deepcopy(seg)

    @staticmethod
    def reset_results(results, original_img, gt_semantic_map):
        # Restore a pristine set of inputs before each sub-case.
        results['img'] = copy.deepcopy(original_img)
        results['gt_seg_map'] = copy.deepcopy(gt_semantic_map)
        results['gt_bboxes'] = np.array([[0, 0, 210, 160],
                                         [200, 150, 400, 300]])
        results['gt_keypoints'] = np.array([[[20, 50, 1]], [[200, 150, 1]],
                                            [[300, 225, 1]]])
        return results

    @pytest.mark.skipif(
        condition=torch is None, reason='No torch in current env')
    def test_error(self):
        # negative int size is rejected
        with pytest.raises(AssertionError):
            TRANSFORMS.build(dict(type='CenterCrop', crop_size=-1))

        # tuple size with a negative component is rejected
        with pytest.raises(AssertionError):
            TRANSFORMS.build(dict(type='CenterCrop', crop_size=(224, -1)))

        # tuple size must have exactly two elements (too few)
        with pytest.raises(AssertionError):
            TRANSFORMS.build(dict(type='CenterCrop', crop_size=(224, )))

        # tuple size must have exactly two elements (too many)
        with pytest.raises(AssertionError):
            TRANSFORMS.build(dict(type='CenterCrop', crop_size=(224, 224, 3)))

    def test_repr(self):
        crop = TRANSFORMS.build(dict(type='CenterCrop', crop_size=224))
        assert isinstance(repr(crop), str)

    def test_transform(self):
        results = {}
        self.reset_results(results, self.original_img, self.gt_semantic_map)

        # crop_size as a single int
        crop = TRANSFORMS.build(dict(type='CenterCrop', crop_size=224))
        results = crop(results)
        assert results['img_shape'] == (224, 224)
        assert (results['img'] == self.original_img[38:262, 88:312, ...]).all()
        assert (results['gt_seg_map'] == self.gt_semantic_map[38:262,
                                                              88:312]).all()
        assert np.equal(results['gt_bboxes'],
                        np.array([[0, 0, 122, 122], [112, 112, 224,
                                                     224]])).all()
        assert np.equal(
            results['gt_keypoints'],
            np.array([[[0, 12, 0]], [[112, 112, 1]], [[212, 187, 1]]])).all()

        # crop_size as an equal-sided tuple: identical result
        crop = TRANSFORMS.build(dict(type='CenterCrop', crop_size=(224, 224)))
        results = self.reset_results(results, self.original_img,
                                     self.gt_semantic_map)
        results = crop(results)
        assert results['img_shape'] == (224, 224)
        assert (results['img'] == self.original_img[38:262, 88:312, ...]).all()
        assert (results['gt_seg_map'] == self.gt_semantic_map[38:262,
                                                              88:312]).all()
        assert np.equal(results['gt_bboxes'],
                        np.array([[0, 0, 122, 122], [112, 112, 224,
                                                     224]])).all()
        assert np.equal(
            results['gt_keypoints'],
            np.array([[[0, 12, 0]], [[112, 112, 1]], [[212, 187, 1]]])).all()

        # crop_height != crop_width
        crop = TRANSFORMS.build(dict(type='CenterCrop', crop_size=(224, 256)))
        results = self.reset_results(results, self.original_img,
                                     self.gt_semantic_map)
        results = crop(results)
        assert results['img_shape'] == (256, 224)
        assert (results['img'] == self.original_img[22:278, 88:312, ...]).all()
        assert (results['gt_seg_map'] == self.gt_semantic_map[22:278,
                                                              88:312]).all()
        assert np.equal(results['gt_bboxes'],
                        np.array([[0, 0, 122, 138], [112, 128, 224,
                                                     256]])).all()
        assert np.equal(
            results['gt_keypoints'],
            np.array([[[0, 28, 0]], [[112, 128, 1]], [[212, 203, 1]]])).all()

        # crop_size equal to the image size: a no-op
        img_height, img_width, _ = self.original_img.shape
        crop = TRANSFORMS.build(
            dict(type='CenterCrop', crop_size=(img_width, img_height)))
        results = self.reset_results(results, self.original_img,
                                     self.gt_semantic_map)
        results = crop(results)
        assert results['img_shape'] == (300, 400)
        assert (results['img'] == self.original_img).all()
        assert (results['gt_seg_map'] == self.gt_semantic_map).all()
        assert np.equal(results['gt_bboxes'],
                        np.array([[0, 0, 210, 160], [200, 150, 400,
                                                     300]])).all()
        assert np.equal(
            results['gt_keypoints'],
            np.array([[[20, 50, 1]], [[200, 150, 1]], [[300, 225, 1]]])).all()

        # crop_size larger than the image: also a no-op (no auto_pad)
        crop = TRANSFORMS.build(
            dict(type='CenterCrop', crop_size=(img_width * 2, img_height * 2)))
        results = self.reset_results(results, self.original_img,
                                     self.gt_semantic_map)
        results = crop(results)
        assert results['img_shape'] == (300, 400)
        assert (results['img'] == self.original_img).all()
        assert (results['gt_seg_map'] == self.gt_semantic_map).all()
        assert np.equal(results['gt_bboxes'],
                        np.array([[0, 0, 210, 160], [200, 150, 400,
                                                     300]])).all()
        assert np.equal(
            results['gt_keypoints'],
            np.array([[[20, 50, 1]], [[200, 150, 1]], [[300, 225, 1]]])).all()

        # auto_pad with a constant int pad_val
        crop = TRANSFORMS.build(
            dict(
                type='CenterCrop',
                crop_size=(img_width // 2, img_height * 2),
                auto_pad=True,
                pad_cfg=dict(type='Pad', padding_mode='constant',
                             pad_val=12)))
        results = self.reset_results(results, self.original_img,
                                     self.gt_semantic_map)
        results = crop(results)
        assert results['img_shape'] == (600, 200)
        assert results['img'].shape[:2] == results['gt_seg_map'].shape
        assert (results['img'][300:600, 100:300, ...] == 12).all()
        assert (results['gt_seg_map'][300:600, 100:300] == 255).all()
        assert np.equal(results['gt_bboxes'],
                        np.array([[0, 0, 110, 160], [100, 150, 200,
                                                     300]])).all()
        assert np.equal(
            results['gt_keypoints'],
            np.array([[[0, 50, 0]], [[100, 150, 1]], [[200, 225, 0]]])).all()

        # auto_pad with a dict pad_val (separate img / seg fill values)
        crop = TRANSFORMS.build(
            dict(
                type='CenterCrop',
                crop_size=(img_width // 2, img_height * 2),
                auto_pad=True,
                pad_cfg=dict(
                    type='Pad',
                    padding_mode='constant',
                    pad_val=dict(img=13, seg=33))))
        results = self.reset_results(results, self.original_img,
                                     self.gt_semantic_map)
        results = crop(results)
        assert results['img_shape'] == (600, 200)
        assert (results['img'][300:600, 100:300, ...] == 13).all()
        assert (results['gt_seg_map'][300:600, 100:300] == 33).all()
        assert np.equal(results['gt_bboxes'],
                        np.array([[0, 0, 110, 160], [100, 150, 200,
                                                     300]])).all()
        assert np.equal(
            results['gt_keypoints'],
            np.array([[[0, 50, 0]], [[100, 150, 1]], [[200, 225, 0]]])).all()

        # crop only the width
        crop = TRANSFORMS.build(
            dict(type='CenterCrop', crop_size=(img_width // 2, img_height)))
        results = self.reset_results(results, self.original_img,
                                     self.gt_semantic_map)
        results = crop(results)
        assert results['img_shape'] == (img_height, img_width // 2)
        assert (results['img'] == self.original_img[:, 100:300, ...]).all()
        assert (results['gt_seg_map'] == self.gt_semantic_map[:,
                                                              100:300]).all()
        assert np.equal(results['gt_bboxes'],
                        np.array([[0, 0, 110, 160], [100, 150, 200,
                                                     300]])).all()
        assert np.equal(
            results['gt_keypoints'],
            np.array([[[0, 50, 0]], [[100, 150, 1]], [[200, 225, 0]]])).all()

        # crop only the height
        crop = TRANSFORMS.build(
            dict(type='CenterCrop', crop_size=(img_width, img_height // 2)))
        results = self.reset_results(results, self.original_img,
                                     self.gt_semantic_map)
        results = crop(results)
        assert results['img_shape'] == (img_height // 2, img_width)
        assert (results['img'] == self.original_img[75:225, ...]).all()
        assert (results['gt_seg_map'] == self.gt_semantic_map[75:225,
                                                              ...]).all()
        assert np.equal(results['gt_bboxes'],
                        np.array([[0, 0, 210, 85], [200, 75, 400,
                                                    150]])).all()
        assert np.equal(
            results['gt_keypoints'],
            np.array([[[20, 0, 0]], [[200, 75, 1]], [[300, 150, 0]]])).all()

    @pytest.mark.skipif(
        condition=torch is None, reason='No torch in current env')
    def test_torchvision_compare(self):
        # Cross-check our CenterCrop against torchvision's implementation.
        results = {}
        crop = TRANSFORMS.build(dict(type='CenterCrop', crop_size=224))
        results = self.reset_results(results, self.original_img,
                                     self.gt_semantic_map)
        results = crop(results)

        tv_crop = torchvision.transforms.CenterCrop(size=224)
        pil_img = Image.fromarray(self.original_img)
        pil_seg = Image.fromarray(self.gt_semantic_map)
        cropped_img = np.array(tv_crop(pil_img))
        cropped_seg = np.array(tv_crop(pil_seg))
        assert np.equal(results['img'], cropped_img).all()
        assert np.equal(results['gt_seg_map'], cropped_seg).all()
class TestRandomGrayscale:
    """Tests for the ``RandomGrayscale`` transform."""

    @classmethod
    def setup_class(cls):
        cls.img = (np.random.rand(10, 10, 3) * 255).astype(np.uint8)

    def test_repr(self):
        gray = TRANSFORMS.build(
            dict(
                type='RandomGrayscale',
                prob=1.,
                channel_weights=(0.299, 0.587, 0.114),
                keep_channels=True))
        assert isinstance(repr(gray), str)

    def test_error(self):
        # prob outside [0, 1] is rejected
        with pytest.raises(AssertionError):
            TRANSFORMS.build(dict(type='RandomGrayscale', prob=2))

    def test_transform(self):
        results = dict()

        # prob=1. always converts; keep_channels repeats the gray plane
        # into all three channels
        gray = TRANSFORMS.build(
            dict(
                type='RandomGrayscale',
                prob=1.,
                channel_weights=(0.299, 0.587, 0.114),
                keep_channels=True))
        results['img'] = copy.deepcopy(self.img)
        img = gray(results)['img']
        computed_gray = (self.img[:, :, 0] * 0.299 +
                         self.img[:, :, 1] * 0.587 +
                         self.img[:, :, 2] * 0.114).astype(np.uint8)
        for channel in range(img.shape[2]):
            assert_array_almost_equal(
                img[:, :, channel], computed_gray, decimal=4)
        assert img.shape == (10, 10, 3)

        # prob=0. never converts: the image passes through unchanged
        gray = TRANSFORMS.build(dict(type='RandomGrayscale', prob=0.))
        results['img'] = copy.deepcopy(self.img)
        img = gray(results)['img']
        assert_array_equal(img, self.img)
        assert img.shape == (10, 10, 3)

        # a single-channel image is returned as-is
        gray = TRANSFORMS.build(dict(type='RandomGrayscale', prob=1.))
        results['img'] = self.img[:, :, 0:1]
        img = gray(results)['img']
        assert_array_equal(img, self.img[:, :, 0:1])
        assert img.shape == (10, 10, 1)


@TRANSFORMS.register_module()
class MockPackTaskInputs(BaseTransform):
    """Minimal packing transform used by the TTA tests: wraps the image
    and a mocked data sample into the packed-results format."""

    def __init__(self) -> None:
        super().__init__()

    def transform(self, results):
        return dict(inputs=results['img'], data_sample=Mock())
class TestMultiScaleFlipAug:
    """Tests for the ``MultiScaleFlipAug`` test-time-augmentation wrapper."""

    @classmethod
    def setup_class(cls):
        cls.img = mmcv.imread(
            osp.join(osp.dirname(__file__), '../data/color.jpg'), 'color')
        cls.original_img = copy.deepcopy(cls.img)

    def test_error(self):
        # scales must be a tuple or a list of tuples
        with pytest.raises(AssertionError):
            TRANSFORMS.build(
                dict(
                    type='MultiScaleFlipAug',
                    scales=[1333, 800],
                    transforms=[]))

        # flip_direction must be a str or a list of str
        with pytest.raises(AssertionError):
            TRANSFORMS.build(
                dict(
                    type='MultiScaleFlipAug',
                    scales=[(1333, 800)],
                    flip_direction=1,
                    transforms=[]))

    @pytest.mark.skipif(
        condition=torch is None, reason='No torch in current env')
    def test_multi_scale_flip_aug(self):

        def apply(cfg):
            # Build the wrapper and run it on a fresh copy of the image.
            aug = TRANSFORMS.build(cfg)
            return aug(dict(img=copy.deepcopy(self.original_img)))

        # 3 scales x (1 identity + 3 flip directions) = 12 views
        packed_results = apply(
            dict(
                type='MultiScaleFlipAug',
                transforms=[dict(type='MockPackTaskInputs')],
                scales=[(1333, 800), (800, 600), (640, 480)],
                allow_flip=True,
                flip_direction=['horizontal', 'vertical', 'diagonal']))
        assert len(packed_results['inputs']) == 12

        # flipping disabled: one view per scale
        packed_results = apply(
            dict(
                type='MultiScaleFlipAug',
                transforms=[dict(type='MockPackTaskInputs')],
                scales=[(1333, 800), (800, 600), (640, 480)],
                allow_flip=False,
                flip_direction=['horizontal', 'vertical', 'diagonal']))
        assert len(packed_results['inputs']) == 3

        # The remaining cases share the same per-view pipeline.
        img_norm_cfg = dict(
            mean=[123.675, 116.28, 103.53],
            std=[58.395, 57.12, 57.375],
            to_rgb=True)
        transforms_cfg = [
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='MockPackTaskInputs')
        ]

        # full per-view pipeline with explicit scales
        packed_results = apply(
            dict(
                type='MultiScaleFlipAug',
                transforms=transforms_cfg,
                scales=[(1333, 800), (800, 600), (640, 480)],
                allow_flip=True,
                flip_direction=['horizontal', 'vertical', 'diagonal']))
        assert len(packed_results['inputs']) == 12

        # scale_factor instead of absolute scales
        packed_results = apply(
            dict(
                type='MultiScaleFlipAug',
                transforms=transforms_cfg,
                scale_factor=[0.5, 1., 2.],
                allow_flip=True,
                flip_direction=['horizontal', 'vertical', 'diagonal']))
        assert len(packed_results['inputs']) == 12

        # neither scales nor scale_factor: no resizing, so only
        # 1 identity + 3 flips = 4 views
        packed_results = apply(
            dict(
                type='MultiScaleFlipAug',
                transforms=transforms_cfg,
                allow_flip=True,
                flip_direction=['horizontal', 'vertical', 'diagonal']))
        assert len(packed_results['inputs']) == 4
class TestRandomChoiceResize:
    """Tests for the ``RandomChoiceResize`` transform."""

    @classmethod
    def setup_class(cls):
        cls.img = mmcv.imread(
            osp.join(osp.dirname(__file__), '../data/color.jpg'), 'color')
        cls.original_img = copy.deepcopy(cls.img)

    def reset_results(self, results):
        # Restore pristine image / seg-map inputs before each sub-case.
        results['img'] = copy.deepcopy(self.original_img)
        results['gt_seg_map'] = copy.deepcopy(self.original_img)

    def test_repr(self):
        resize = TRANSFORMS.build(
            dict(
                type='RandomChoiceResize',
                scales=[(1333, 800), (1333, 600)]))
        assert isinstance(repr(resize), str)

    def test_error(self):
        # each candidate scale must be a tuple; bare numbers are rejected
        with pytest.raises(AssertionError):
            TRANSFORMS.build(
                dict(type='RandomChoiceResize', scales=[0.5, 1, 2]))

    def test_random_multiscale_resize(self):
        results = dict()

        # a single candidate scale is always chosen
        resize = TRANSFORMS.build(
            dict(type='RandomChoiceResize', scales=[(1333, 800)]))
        self.reset_results(results)
        results = resize(results)
        assert results['img'].shape == (800, 1333, 3)

        # with several candidates, the output matches one of them
        scale_choices = [(1333, 800), (1333, 600)]
        resize = TRANSFORMS.build(
            dict(type='RandomChoiceResize', scales=scale_choices))
        self.reset_results(results)
        results = resize(results)
        assert (results['img'].shape[1],
                results['img'].shape[0]) in scale_choices

        # keep_ratio preserves the aspect ratio
        resize = TRANSFORMS.build(
            dict(
                type='RandomChoiceResize',
                scales=[(900, 600)],
                resize_type='Resize',
                keep_ratio=True))
        self.reset_results(results)
        ratio_in = results['img'].shape[0] / results['img'].shape[1]
        results = resize(results)
        ratio_out = results['img'].shape[0] / results['img'].shape[1]
        assert_array_almost_equal(ratio_in, ratio_out)

        # clip_object_border=True clips boxes to the resized image
        gt_bboxes = [[200, 150, 600, 450]]
        resize = TRANSFORMS.build(
            dict(
                type='RandomChoiceResize',
                scales=[(200, 150)],
                resize_type='Resize',
                clip_object_border=True))
        self.reset_results(results)
        results['gt_bboxes'] = np.array(gt_bboxes)
        results = resize(results)
        assert results['img'].shape == (150, 200, 3)
        assert np.equal(results['gt_bboxes'],
                        np.array([[100, 75, 200, 150]])).all()

        # clip_object_border=False keeps out-of-image coordinates
        resize = TRANSFORMS.build(
            dict(
                type='RandomChoiceResize',
                scales=[(200, 150)],
                resize_type='Resize',
                clip_object_border=False))
        self.reset_results(results)
        results['gt_bboxes'] = np.array(gt_bboxes)
        results = resize(results)
        assert results['img'].shape == (150, 200, 3)
        assert np.equal(results['gt_bboxes'],
                        np.array([[100, 75, 300, 225]])).all()


class TestRandomFlip:
    """Tests for the ``RandomFlip`` transform."""

    def test_init(self):
        # prob given as a float
        flip = RandomFlip(0.1)
        assert flip.prob == 0.1

        # prob=None is rejected
        with pytest.raises(ValueError):
            flip = RandomFlip(None)
            assert flip.prob is None

        # a list of probs paired with a list of directions
        flip = RandomFlip([0.1, 0.2], ['horizontal', 'vertical'])
        assert len(flip.prob) == 2
        assert len(flip.direction) == 2

        # direction of an invalid type
        with pytest.raises(ValueError):
            flip = RandomFlip(0.1, 1)

        # prob of an invalid type
        with pytest.raises(ValueError):
            flip = RandomFlip('0.1')

    def test_transform(self):
        results = {
            'img': np.random.random((224, 224, 3)),
            'gt_bboxes': np.array([[0, 1, 100, 101]]),
            'gt_keypoints': np.array([[[100, 100, 1.0]]]),
            # The seg map is flipped independently of the image, so the
            # fixture does not need to match the image size.
            'gt_seg_map': np.array([[0, 1], [2, 3]])
        }

        # horizontal flip
        flip = RandomFlip([1.0], ['horizontal'])
        flipped = flip.transform(copy.deepcopy(results))
        assert (flipped['gt_bboxes'] == np.array([[124, 1, 224,
                                                   101]])).all()
        assert (flipped['gt_seg_map'] == np.array([[1, 0], [3, 2]])).all()

        # diagonal flip
        flip = RandomFlip([1.0], ['diagonal'])
        flipped = flip.transform(copy.deepcopy(results))
        assert (flipped['gt_bboxes'] == np.array([[124, 123, 224,
                                                   223]])).all()
        assert (flipped['gt_seg_map'] == np.array([[3, 2], [1, 0]])).all()

        # vertical flip
        flip = RandomFlip([1.0], ['vertical'])
        flipped = flip.transform(copy.deepcopy(results))
        assert (flipped['gt_bboxes'] == np.array([[0, 123, 100,
                                                   223]])).all()
        assert (flipped['gt_seg_map'] == np.array([[2, 3], [0, 1]])).all()

        # direction defaults to horizontal when omitted
        flip = RandomFlip(1.0)
        flipped = flip.transform(copy.deepcopy(results))
        assert (flipped['gt_bboxes'] == np.array([[124, 1, 224,
                                                   101]])).all()
        assert (flipped['gt_seg_map'] == np.array([[1, 0], [3, 2]])).all()

        # horizontal flip with a swapped label pair
        flip = RandomFlip([1.0], ['horizontal'], swap_seg_labels=[[0, 1]])
        flipped = flip.transform(copy.deepcopy(results))
        assert (flipped['gt_seg_map'] == np.array([[0, 1], [3, 2]])).all()
        assert flipped['swap_seg_labels'] == [[0, 1]]

        # prob=0. leaves everything untouched
        flip = RandomFlip(0.0)
        flipped = flip.transform(copy.deepcopy(results))
        assert (flipped['gt_bboxes'] == np.array([[0, 1, 100, 101]])).all()
        assert (flipped['gt_seg_map'] == np.array([[0, 1], [2, 3]])).all()

        # invalid direction for bbox flipping
        with pytest.raises(ValueError):
            flip = RandomFlip(1.0)
            flip._flip_bbox(results['gt_bboxes'], (224, 224), 'invalid')

        # invalid direction for keypoint flipping
        with pytest.raises(ValueError):
            flip = RandomFlip(1.0)
            flip._flip_keypoints(results['gt_keypoints'], (224, 224),
                                 'invalid')

        # invalid swap_seg_labels
        with pytest.raises(AssertionError):
            flip = RandomFlip(1.0, swap_seg_labels='invalid')
            flip._flip_seg_map(results['gt_seg_map'], 'horizontal')

    def test_repr(self):
        assert isinstance(str(RandomFlip(0.1)), str)
class TestRandomResize:
    """Tests for ``RandomResize`` (random target-scale sampling on top of
    an inner ``Resize``)."""

    def test_init(self):
        resize = RandomResize(
            (224, 224),
            (1.0, 2.0),
        )
        assert resize.scale == (224, 224)

    def test_repr(self):
        resize = RandomResize(
            (224, 224),
            (1.0, 2.0),
        )
        assert isinstance(str(resize), str)

    def test_transform(self):

        # the sampled scale lies within base scale x ratio_range
        results = {}
        resize = RandomResize((224, 224), (1.0, 2.0))
        results_update = resize.transform(copy.deepcopy(results))
        assert 224 <= results_update['scale'][0] <= 448
        assert 224 <= results_update['scale'][1] <= 448

        # keep_ratio=True
        results = {
            'img': np.random.random((224, 224, 3)),
            'gt_seg_map': np.random.random((224, 224, 3)),
            'gt_bboxes': np.array([[0, 0, 112, 112]]),
            'gt_keypoints': np.array([[[112, 112]]])
        }

        resize = RandomResize((224, 224), (1.0, 2.0),
                              resize_type='Resize',
                              keep_ratio=True)
        results_update = resize.transform(copy.deepcopy(results))
        assert 224 <= results_update['img_shape'][0] <= 448
        assert 224 <= results_update['img_shape'][1] <= 448
        assert results_update['keep_ratio']
        # FIX: the original asserted `results['gt_bboxes'][0][2] >= 112`
        # and `<= 112` on the *untouched input dict* (the transform works
        # on a deepcopy), which is vacuously true since the value is
        # exactly 112. Check the transformed output instead: with a ratio
        # range of (1.0, 2.0), x2 = 112 * ratio must land in [112, 224].
        assert results_update['gt_bboxes'][0][2] >= 112
        assert results_update['gt_bboxes'][0][2] <= 224

        # keep_ratio=False (smoke test: must not raise)
        resize = RandomResize((224, 224), (1.0, 2.0),
                              resize_type='Resize',
                              keep_ratio=False)
        results_update = resize.transform(copy.deepcopy(results))

        # scale given as a list of tuples: target sampled between them
        results = {}
        resize = RandomResize([(224, 448), (112, 224)],
                              resize_type='Resize',
                              keep_ratio=True)
        results_update = resize.transform(copy.deepcopy(results))
        assert 224 <= results_update['scale'][1] <= 448
        assert 112 <= results_update['scale'][0] <= 224

        # mixing tuples and lists inside scale is not supported
        with pytest.raises(NotImplementedError):
            results = {}
            resize = RandomResize([(224, 448), [112, 224]],
                                  resize_type='Resize',
                                  keep_ratio=True)
            resize.transform(copy.deepcopy(results))
class TestTestTimeAug:
    """Tests for the ``TestTimeAug`` wrapper (cartesian expansion of
    per-stage transform choices)."""

    def test_init(self):
        subroutines = [[
            dict(type='Resize', scale=(1333, 800), keep_ratio=True),
            dict(type='Resize', scale=(1333, 400), keep_ratio=True)
        ], [
            dict(type='RandomFlip', prob=1.),
            dict(type='RandomFlip', prob=0.)
        ], [dict(type='Normalize', mean=(0, 0, 0), std=(1, 1, 1))]]

        tta = TestTimeAug(subroutines)
        built = tta.subroutines
        # 2 resizes x 2 flips x 1 normalize -> 4 composed branches
        assert len(built) == 4

        for branch in built[:2]:
            assert isinstance(branch.transforms[0], Resize)
            assert isinstance(branch.transforms[1], RandomFlip)
            assert isinstance(branch.transforms[2], Normalize)

    def test_transform(self):
        results = {
            'img': np.random.random((224, 224, 3)),
            'gt_bboxes': np.array([[0, 1, 100, 101]]),
            'gt_keypoints': np.array([[[100, 100, 1.0]]]),
            'gt_seg_map': np.random.random((224, 224, 3))
        }
        input_results = copy.deepcopy(results)
        transforms = [[
            dict(type='Resize', scale=(1333, 800), keep_ratio=True),
            dict(type='Resize', scale=(1333, 400), keep_ratio=True)
        ], [
            dict(type='RandomFlip', prob=0.),
            dict(type='RandomFlip', prob=1.)
        ], [dict(type='Normalize', mean=(0, 0, 0), std=(1, 1, 1))]]

        tta = TestTimeAug(transforms)
        results = tta.transform(results)
        assert len(results['img']) == 4

        resize1 = tta.subroutines[0].transforms[0]
        resize2 = tta.subroutines[2].transforms[0]
        flip1 = tta.subroutines[0].transforms[1]
        flip2 = tta.subroutines[1].transforms[1]
        normalize = tta.subroutines[0].transforms[2]

        # Recompute each expected view as resize -> flip -> normalize, in
        # the same branch order the wrapper enumerates them.
        combos = [(resize1, flip1), (resize1, flip2), (resize2, flip1),
                  (resize2, flip2)]
        for idx, (resize, flip) in enumerate(combos):
            expected = normalize.transform(
                flip.transform(
                    resize.transform(copy.deepcopy(input_results))))
            assert np.allclose(expected['img'], results['img'][idx])

    def test_repr(self):
        transforms = [[
            dict(type='Resize', scale=(1333, 800), keep_ratio=True),
            dict(type='Resize', scale=(1333, 400), keep_ratio=True)
        ], [
            dict(type='RandomFlip', prob=0.),
            dict(type='RandomFlip', prob=1.)
        ], [dict(type='Normalize', mean=(0, 0, 0), std=(1, 1, 1))]]

        tta = TestTimeAug(transforms)
        repr_str_list = repr(tta).split('\n')
        assert repr_str_list[0] == 'TestTimeAugtransforms='
        assert repr_str_list[1] == 'Compose('
        assert repr_str_list[2].startswith('    Resize(scale=(1333, 800)')
        assert repr_str_list[3].startswith('    RandomFlip(prob=0.0')
        assert repr_str_list[4].startswith('    Normalize(mean=[0. 0. 0.]')
0.]') diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_transforms/test_transforms_wrapper.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_transforms/test_transforms_wrapper.py new file mode 100644 index 0000000000000000000000000000000000000000..98feeb83e0788f21aa44c25a09bc65524e30598f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_transforms/test_transforms_wrapper.py @@ -0,0 +1,585 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import warnings + +import numpy as np +import pytest + +from mmcv.transforms.base import BaseTransform +from mmcv.transforms.builder import TRANSFORMS +from mmcv.transforms.utils import (avoid_cache_randomness, cache_random_params, + cache_randomness) +from mmcv.transforms.wrappers import (Compose, KeyMapper, RandomApply, + RandomChoice, TransformBroadcaster) + + +@TRANSFORMS.register_module() +class AddToValue(BaseTransform): + """Dummy transform to add a given addend to results['value']""" + + def __init__(self, addend=0) -> None: + super().__init__() + self.addend = addend + + def add(self, results, addend): + augend = results['value'] + + if isinstance(augend, list): + warnings.warn('value is a list', UserWarning) + if isinstance(augend, dict): + warnings.warn('value is a dict', UserWarning) + + def _add_to_value(augend, addend): + if isinstance(augend, list): + return [_add_to_value(v, addend) for v in augend] + if isinstance(augend, dict): + return {k: _add_to_value(v, addend) for k, v in augend.items()} + return augend + addend + + results['value'] = _add_to_value(results['value'], addend) + return results + + def transform(self, results): + return self.add(results, self.addend) + + def __repr__(self) -> str: + repr_str = self.__class__.__name__ + repr_str += f'addend = {self.addend}' + return repr_str + + +@TRANSFORMS.register_module() +class RandomAddToValue(AddToValue): + """Dummy transform to add a random addend to results['value']""" + + def __init__(self, repeat=1) -> None: + super().__init__(addend=None) 
+ self.repeat = repeat + + @cache_randomness + def get_random_addend(self): + return np.random.rand() + + def transform(self, results): + for _ in range(self.repeat): + results = self.add(results, addend=self.get_random_addend()) + return results + + def __repr__(self) -> str: + repr_str = self.__class__.__name__ + repr_str += f'repeat = {self.repeat}' + return repr_str + + +@TRANSFORMS.register_module() +class SumTwoValues(BaseTransform): + """Dummy transform to test transform wrappers.""" + + def transform(self, results): + if 'num_1' in results and 'num_2' in results: + results['sum'] = results['num_1'] + results['num_2'] + elif 'num_1' in results: + results['sum'] = results['num_1'] + elif 'num_2' in results: + results['sum'] = results['num_2'] + else: + results['sum'] = np.nan + return results + + def __repr__(self) -> str: + repr_str = self.__class__.__name__ + return repr_str + + +def test_compose(): + + # Case 1: build from cfg + pipeline = [dict(type='AddToValue')] + pipeline = Compose(pipeline) + _ = str(pipeline) + + # Case 2: build from transform list + pipeline = [AddToValue()] + pipeline = Compose(pipeline) + + # Case 3: invalid build arguments + pipeline = [[dict(type='AddToValue')]] + with pytest.raises(TypeError): + pipeline = Compose(pipeline) + + # Case 4: contain transform with None output + class DummyTransform(BaseTransform): + + def transform(self, results): + return None + + pipeline = Compose([DummyTransform()]) + results = pipeline({}) + assert results is None + + +def test_cache_random_parameters(): + + transform = RandomAddToValue() + + # Case 1: cache random parameters + assert hasattr(RandomAddToValue, '_methods_with_randomness') + assert 'get_random_addend' in RandomAddToValue._methods_with_randomness + + with cache_random_params(transform): + results_1 = transform(dict(value=0)) + results_2 = transform(dict(value=0)) + np.testing.assert_equal(results_1['value'], results_2['value']) + + # Case 2: do not cache random parameters + 
    # (continuation of test_cache_random_parameters; the "Case 2" header
    # comment immediately precedes this block)
    # Without the cache_random_params context, each call draws a fresh
    # random addend, so two runs on identical input are expected to differ.
    results_1 = transform(dict(value=0))
    results_2 = transform(dict(value=0))
    with pytest.raises(AssertionError):
        np.testing.assert_equal(results_1['value'], results_2['value'])

    # Case 3: allow to invoke random method 0 times
    transform = RandomAddToValue(repeat=0)
    with cache_random_params(transform):
        _ = transform(dict(value=0))

    # Case 4: NOT allow to invoke random method >1 times
    transform = RandomAddToValue(repeat=2)
    with pytest.raises(RuntimeError):
        with cache_random_params(transform):
            _ = transform(dict(value=0))

    # Case 5: apply on nested transforms
    # Caching must propagate through wrapper transforms such as Compose.
    transform = Compose([RandomAddToValue()])
    with cache_random_params(transform):
        results_1 = transform(dict(value=0))
        results_2 = transform(dict(value=0))
        np.testing.assert_equal(results_1['value'], results_2['value'])


def test_key_mapper():
    """Cover KeyMapper's mapping/remapping behaviours case by case."""
    # Case 0: only remap
    pipeline = KeyMapper(
        transforms=[AddToValue(addend=1)], remapping={'value': 'v_out'})

    results = dict(value=0)
    results = pipeline(results)

    np.testing.assert_equal(results['value'], 0)  # should be unchanged
    np.testing.assert_equal(results['v_out'], 1)

    # Case 1: simple remap
    pipeline = KeyMapper(
        transforms=[AddToValue(addend=1)],
        mapping={'value': 'v_in'},
        remapping={'value': 'v_out'})

    results = dict(value=0, v_in=1)
    results = pipeline(results)

    np.testing.assert_equal(results['value'], 0)  # should be unchanged
    np.testing.assert_equal(results['v_in'], 1)
    np.testing.assert_equal(results['v_out'], 2)

    # Case 2: collecting list
    # AddToValue warns "value is a list" when handed a collected list; the
    # pytest.warns match below depends on that exact message.
    pipeline = KeyMapper(
        transforms=[AddToValue(addend=2)],
        mapping={'value': ['v_in_1', 'v_in_2']},
        remapping={'value': ['v_out_1', 'v_out_2']})
    results = dict(value=0, v_in_1=1, v_in_2=2)

    with pytest.warns(UserWarning, match='value is a list'):
        results = pipeline(results)

    np.testing.assert_equal(results['value'], 0)  # should be unchanged
    np.testing.assert_equal(results['v_in_1'], 1)
    np.testing.assert_equal(results['v_in_2'], 2)

np.testing.assert_equal(results['v_out_1'], 3) + np.testing.assert_equal(results['v_out_2'], 4) + + # Case 3: collecting dict + pipeline = KeyMapper( + transforms=[AddToValue(addend=2)], + mapping={'value': { + 'v1': 'v_in_1', + 'v2': 'v_in_2' + }}, + remapping={'value': { + 'v1': 'v_out_1', + 'v2': 'v_out_2' + }}) + results = dict(value=0, v_in_1=1, v_in_2=2) + + with pytest.warns(UserWarning, match='value is a dict'): + results = pipeline(results) + + np.testing.assert_equal(results['value'], 0) # should be unchanged + np.testing.assert_equal(results['v_in_1'], 1) + np.testing.assert_equal(results['v_in_2'], 2) + np.testing.assert_equal(results['v_out_1'], 3) + np.testing.assert_equal(results['v_out_2'], 4) + + # Case 4: collecting list with auto_remap mode + pipeline = KeyMapper( + transforms=[AddToValue(addend=2)], + mapping=dict(value=['v_in_1', 'v_in_2']), + auto_remap=True) + results = dict(value=0, v_in_1=1, v_in_2=2) + + with pytest.warns(UserWarning, match='value is a list'): + results = pipeline(results) + + np.testing.assert_equal(results['value'], 0) + np.testing.assert_equal(results['v_in_1'], 3) + np.testing.assert_equal(results['v_in_2'], 4) + + # Case 5: collecting dict with auto_remap mode + pipeline = KeyMapper( + transforms=[AddToValue(addend=2)], + mapping=dict(value=dict(v1='v_in_1', v2='v_in_2')), + auto_remap=True) + results = dict(value=0, v_in_1=1, v_in_2=2) + + with pytest.warns(UserWarning, match='value is a dict'): + results = pipeline(results) + + np.testing.assert_equal(results['value'], 0) + np.testing.assert_equal(results['v_in_1'], 3) + np.testing.assert_equal(results['v_in_2'], 4) + + # Case 6: nested collection with auto_remap mode + pipeline = KeyMapper( + transforms=[AddToValue(addend=2)], + mapping=dict(value=['v1', dict(v2=['v21', 'v22'], v3='v3')]), + auto_remap=True) + results = dict(value=0, v1=1, v21=2, v22=3, v3=4) + + with pytest.warns(UserWarning, match='value is a list'): + results = pipeline(results) + + 
np.testing.assert_equal(results['value'], 0) + np.testing.assert_equal(results['v1'], 3) + np.testing.assert_equal(results['v21'], 4) + np.testing.assert_equal(results['v22'], 5) + np.testing.assert_equal(results['v3'], 6) + + # Case 7: output_map must be None if `auto_remap` is set True + with pytest.raises(ValueError): + pipeline = KeyMapper( + transforms=[AddToValue(addend=1)], + mapping=dict(value='v_in'), + remapping=dict(value='v_out'), + auto_remap=True) + + # Case 8: allow_nonexist_keys8 + pipeline = KeyMapper( + transforms=[SumTwoValues()], + mapping=dict(num_1='a', num_2='b'), + auto_remap=False, + allow_nonexist_keys=True) + + results = pipeline(dict(a=1, b=2)) + np.testing.assert_equal(results['sum'], 3) + + results = pipeline(dict(a=1)) + np.testing.assert_equal(results['sum'], 1) + + # Case 9: use wrapper as a transform + transform = KeyMapper(mapping=dict(b='a'), auto_remap=False) + results = transform(dict(a=1)) + # note that the original key 'a' will not be removed + assert results == dict(a=1, b=1) + + # Case 10: manually set keys ignored + pipeline = KeyMapper( + transforms=[SumTwoValues()], + mapping=dict(num_1='a', num_2=...), # num_2 (b) will be ignored + auto_remap=False, + # allow_nonexist_keys will not affect manually ignored keys + allow_nonexist_keys=False) + + results = pipeline(dict(a=1, b=2)) + np.testing.assert_equal(results['sum'], 1) + + # Test basic functions + pipeline = KeyMapper( + transforms=[AddToValue(addend=1)], + mapping=dict(value='v_in'), + remapping=dict(value='v_out')) + + # __iter__ + for _ in pipeline: + pass + + # __repr__ + assert repr(pipeline) == ( + 'KeyMapper(transforms = Compose(\n ' + 'AddToValueaddend = 1' + + '\n), mapping = {\'value\': \'v_in\'}, ' + + 'remapping = {\'value\': \'v_out\'}, auto_remap = False, ' + + 'allow_nonexist_keys = False)') + + +def test_transform_broadcaster(): + + # Case 1: apply to list in results + pipeline = TransformBroadcaster( + transforms=[AddToValue(addend=1)], + 
mapping=dict(value='values'), + auto_remap=True) + results = dict(values=[1, 2]) + + results = pipeline(results) + + np.testing.assert_equal(results['values'], [2, 3]) + + # Case 2: apply to multiple keys + pipeline = TransformBroadcaster( + transforms=[AddToValue(addend=1)], + mapping=dict(value=['v_1', 'v_2']), + auto_remap=True) + results = dict(v_1=1, v_2=2) + + results = pipeline(results) + + np.testing.assert_equal(results['v_1'], 2) + np.testing.assert_equal(results['v_2'], 3) + + # Case 3: apply to multiple groups of keys + pipeline = TransformBroadcaster( + transforms=[SumTwoValues()], + mapping=dict(num_1=['a_1', 'b_1'], num_2=['a_2', 'b_2']), + remapping=dict(sum=['a', 'b']), + auto_remap=False) + + results = dict(a_1=1, a_2=2, b_1=3, b_2=4) + results = pipeline(results) + + np.testing.assert_equal(results['a'], 3) + np.testing.assert_equal(results['b'], 7) + + # Case 3: apply to all keys + pipeline = TransformBroadcaster( + transforms=[SumTwoValues()], mapping=None, remapping=None) + results = dict(num_1=[1, 2, 3], num_2=[4, 5, 6]) + + results = pipeline(results) + + np.testing.assert_equal(results['sum'], [5, 7, 9]) + + # Case 4: inconsistent sequence length + with pytest.raises(ValueError): + pipeline = TransformBroadcaster( + transforms=[SumTwoValues()], + mapping=dict(num_1='list_1', num_2='list_2'), + auto_remap=False) + + results = dict(list_1=[1, 2], list_2=[1, 2, 3]) + _ = pipeline(results) + + # Case 5: share random parameter + pipeline = TransformBroadcaster( + transforms=[RandomAddToValue()], + mapping=dict(value='values'), + auto_remap=True, + share_random_params=True) + + results = dict(values=[0, 0]) + results = pipeline(results) + + np.testing.assert_equal(results['values'][0], results['values'][1]) + + # Case 6: partial broadcasting + pipeline = TransformBroadcaster( + transforms=[SumTwoValues()], + mapping=dict(num_1=['a_1', 'b_1'], num_2=['a_2', ...]), + remapping=dict(sum=['a', 'b']), + auto_remap=False) + + results = dict(a_1=1, 
a_2=2, b_1=3, b_2=4) + results = pipeline(results) + + np.testing.assert_equal(results['a'], 3) + np.testing.assert_equal(results['b'], 3) + + pipeline = TransformBroadcaster( + transforms=[SumTwoValues()], + mapping=dict(num_1=['a_1', 'b_1'], num_2=['a_2', 'b_2']), + remapping=dict(sum=['a', ...]), + auto_remap=False) + + results = dict(a_1=1, a_2=2, b_1=3, b_2=4) + results = pipeline(results) + + np.testing.assert_equal(results['a'], 3) + assert 'b' not in results + + # Test repr + assert repr(pipeline) == ( + 'TransformBroadcaster(transforms = Compose(\n' + ' SumTwoValues' + + '\n), mapping = {\'num_1\': [\'a_1\', \'b_1\'], ' + + '\'num_2\': [\'a_2\', \'b_2\']}, ' + + 'remapping = {\'sum\': [\'a\', Ellipsis]}, auto_remap = False, ' + + 'allow_nonexist_keys = False, share_random_params = False)') + + +def test_random_choice(): + + # Case 1: given probability + pipeline = RandomChoice( + transforms=[[AddToValue(addend=1.0)], [AddToValue(addend=2.0)]], + prob=[1.0, 0.0]) + + results = pipeline(dict(value=1)) + np.testing.assert_equal(results['value'], 2.0) + + # Case 2: default probability + pipeline = RandomChoice(transforms=[[AddToValue( + addend=1.0)], [AddToValue(addend=2.0)]]) + + _ = pipeline(dict(value=1)) + + # Case 3: nested RandomChoice in TransformBroadcaster + pipeline = TransformBroadcaster( + transforms=[ + RandomChoice( + transforms=[[AddToValue(addend=1.0)], + [AddToValue(addend=2.0)]], ), + ], + mapping={'value': 'values'}, + auto_remap=True, + share_random_params=True) + + results = dict(values=[0 for _ in range(10)]) + results = pipeline(results) + # check share_random_params=True works so that all values are same + values = results['values'] + assert all(map(lambda x: x == values[0], values)) + + # repr + assert repr(pipeline) == ( + 'TransformBroadcaster(transforms = Compose(\n' + + ' RandomChoice(transforms = [Compose(\n' + + ' AddToValueaddend = 1.0' + '\n), Compose(\n' + + ' AddToValueaddend = 2.0' + '\n)]prob = None)' + + '\n), mapping = 
{\'value\': \'values\'}, ' + + 'remapping = {\'value\': \'values\'}, auto_remap = True, ' + + 'allow_nonexist_keys = False, share_random_params = True)') + + +def test_random_apply(): + + # Case 1: simple use + pipeline = RandomApply(transforms=[AddToValue(addend=1.0)], prob=1.0) + results = pipeline(dict(value=1)) + np.testing.assert_equal(results['value'], 2.0) + + pipeline = RandomApply(transforms=[AddToValue(addend=1.0)], prob=0.0) + results = pipeline(dict(value=1)) + np.testing.assert_equal(results['value'], 1.0) + + # Case 2: nested RandomApply in TransformBroadcaster + pipeline = TransformBroadcaster( + transforms=[RandomApply(transforms=[AddToValue(addend=1)], prob=0.5)], + mapping={'value': 'values'}, + auto_remap=True, + share_random_params=True) + + results = dict(values=[0 for _ in range(10)]) + results = pipeline(results) + # check share_random_params=True works so that all values are same + values = results['values'] + assert all(map(lambda x: x == values[0], values)) + + # __iter__ + for _ in pipeline: + pass + + # repr + assert repr(pipeline) == ( + 'TransformBroadcaster(transforms = Compose(\n' + + ' RandomApply(transforms = Compose(\n' + + ' AddToValueaddend = 1' + '\n), prob = 0.5)' + + '\n), mapping = {\'value\': \'values\'}, ' + + 'remapping = {\'value\': \'values\'}, auto_remap = True, ' + + 'allow_nonexist_keys = False, share_random_params = True)') + + +def test_utils(): + # Test cache_randomness: normal case + class DummyTransform(BaseTransform): + + @cache_randomness + def func(self): + return np.random.rand() + + def transform(self, results): + _ = self.func() + return results + + transform = DummyTransform() + _ = transform({}) + with cache_random_params(transform): + _ = transform({}) + + # Test cache_randomness: invalid function type + with pytest.raises(TypeError): + + class DummyTransform(BaseTransform): + + @cache_randomness + @staticmethod + def func(): + return np.random.rand() + + def transform(self, results): + return results + 
    # (continuation of test_utils)
    # NOTE: DummyTransform is deliberately redefined for each case; only the
    # decorator/signature under test changes between definitions.
    # Test cache_randomness: invalid function argument list
    # cache_randomness requires an instance method whose first argument is
    # literally named 'self'; 'cls' must be rejected at class-creation time.
    with pytest.raises(TypeError):

        class DummyTransform(BaseTransform):

            @cache_randomness
            def func(cls):
                return np.random.rand()

            def transform(self, results):
                return results

    # Test avoid_cache_randomness: invalid mixture with cache_randomness
    # A class cannot both opt out of caching and declare cached methods.
    with pytest.raises(RuntimeError):

        @avoid_cache_randomness
        class DummyTransform(BaseTransform):

            @cache_randomness
            def func(self):
                pass

            def transform(self, results):
                return results

    # Test avoid_cache_randomness: raise error in cache_random_params
    # Entering cache_random_params on a decorated transform must fail.
    with pytest.raises(RuntimeError):

        @avoid_cache_randomness
        class DummyTransform(BaseTransform):

            def transform(self, results):
                return results

        transform = DummyTransform()
        with cache_random_params(transform):
            pass

    # Test avoid_cache_randomness: non-inheritable
    # Subclasses of a decorated class are NOT affected: entering the cache
    # context on the subclass instance succeeds without error.
    @avoid_cache_randomness
    class DummyBaseTransform(BaseTransform):

        def transform(self, results):
            return results

    class DummyTransform(DummyBaseTransform):
        pass

    transform = DummyTransform()
    with cache_random_params(transform):
        pass
diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_utils/test_env.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_utils/test_env.py
new file mode 100644
index 0000000000000000000000000000000000000000..74bafff3715d862394147f505adff77448108e11
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_utils/test_env.py
@@ -0,0 +1,34 @@
# Copyright (c) OpenMMLab. All rights reserved.
+import sys + +import pytest + +import mmcv + + +def test_collect_env(): + try: + import torch # noqa: F401 + except ModuleNotFoundError: + pytest.skip('skipping tests that require PyTorch') + + from mmcv.utils import collect_env + env_info = collect_env() + expected_keys = [ + 'sys.platform', 'Python', 'CUDA available', 'PyTorch', + 'PyTorch compiling details', 'OpenCV', 'MMCV', 'MMCV Compiler', 'GCC', + 'MMCV CUDA Compiler' + ] + for key in expected_keys: + assert key in env_info + + if env_info['CUDA available']: + for key in ['CUDA_HOME', 'NVCC']: + assert key in env_info + + if sys.platform == 'win32': + assert 'MSVC' in env_info + + assert env_info['sys.platform'] == sys.platform + assert env_info['Python'] == sys.version.replace('\n', '') + assert env_info['MMCV'] == mmcv.__version__ diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_utils/test_parrots_jit.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_utils/test_parrots_jit.py new file mode 100644 index 0000000000000000000000000000000000000000..921a4402de82f699b6b96566da6dfed12f0a2b5d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_utils/test_parrots_jit.py @@ -0,0 +1,278 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import pytest +import torch +from mmengine.utils.dl_utils import TORCH_VERSION + +import mmcv + +pytest.skip('this test not ready now', allow_module_level=True) +skip_no_parrots = pytest.mark.skipif( + TORCH_VERSION != 'parrots', reason='test case under parrots environment') + + +class TestJit: + + def test_add_dict(self): + + @mmcv.jit + def add_dict(oper): + rets = oper['x'] + oper['y'] + return {'result': rets} + + def add_dict_pyfunc(oper): + rets = oper['x'] + oper['y'] + return {'result': rets} + + a = torch.rand((3, 4)) + b = torch.rand((3, 4)) + oper = {'x': a, 'y': b} + + rets_t = add_dict(oper) + rets = add_dict_pyfunc(oper) + assert 'result' in rets + assert (rets_t['result'] == rets['result']).all() + + def test_add_list(self): + + @mmcv.jit + def add_list(oper, x, y): + rets = {} + for idx, pair in enumerate(oper): + rets[f'k{idx}'] = pair['x'] + pair['y'] + rets[f'k{len(oper)}'] = x + y + return rets + + def add_list_pyfunc(oper, x, y): + rets = {} + for idx, pair in enumerate(oper): + rets[f'k{idx}'] = pair['x'] + pair['y'] + rets[f'k{len(oper)}'] = x + y + return rets + + pair_num = 3 + oper = [] + for _ in range(pair_num): + oper.append({'x': torch.rand((3, 4)), 'y': torch.rand((3, 4))}) + a = torch.rand((3, 4)) + b = torch.rand((3, 4)) + rets = add_list_pyfunc(oper, x=a, y=b) + rets_t = add_list(oper, x=a, y=b) + for idx in range(pair_num + 1): + assert f'k{idx}' in rets_t + assert (rets[f'k{idx}'] == rets_t[f'k{idx}']).all() + + @skip_no_parrots + def test_jit_cache(self): + + @mmcv.jit + def func(oper): + if oper['const'] > 1: + return oper['x'] * 2 + oper['y'] + else: + return oper['x'] * 2 - oper['y'] + + def pyfunc(oper): + if oper['const'] > 1: + return oper['x'] * 2 + oper['y'] + else: + return oper['x'] * 2 - oper['y'] + + assert len(func._cache._cache) == 0 + + oper = {'const': 2, 'x': torch.rand((3, 4)), 'y': torch.rand((3, 4))} + rets_plus = pyfunc(oper) + rets_plus_t = func(oper) + assert (rets_plus == rets_plus_t).all() + assert 
len(func._cache._cache) == 1 + + oper['const'] = 0.5 + rets_minus = pyfunc(oper) + rets_minus_t = func(oper) + assert (rets_minus == rets_minus_t).all() + assert len(func._cache._cache) == 2 + + rets_a = (rets_minus_t + rets_plus_t) / 4 + assert torch.allclose(oper['x'], rets_a) + + @skip_no_parrots + def test_jit_shape(self): + + @mmcv.jit + def func(a): + return a + 1 + + assert len(func._cache._cache) == 0 + + a = torch.ones((3, 4)) + r = func(a) + assert r.shape == (3, 4) + assert (r == 2).all() + assert len(func._cache._cache) == 1 + + a = torch.ones((2, 3, 4)) + r = func(a) + assert r.shape == (2, 3, 4) + assert (r == 2).all() + assert len(func._cache._cache) == 2 + + @skip_no_parrots + def test_jit_kwargs(self): + + @mmcv.jit + def func(a, b): + return torch.mean((a - b) * (a - b)) + + assert len(func._cache._cache) == 0 + x = torch.rand((16, 32)) + y = torch.rand((16, 32)) + func(x, y) + assert len(func._cache._cache) == 1 + func(x, b=y) + assert len(func._cache._cache) == 1 + func(b=y, a=x) + assert len(func._cache._cache) == 1 + + def test_jit_derivate(self): + + @mmcv.jit(derivate=True) + def func(x, y): + return (x + 2) * (y - 2) + + a = torch.rand((3, 4)) + b = torch.rand((3, 4)) + a.requires_grad = True + + c = func(a, b) + assert c.requires_grad + d = torch.empty_like(c) + d.fill_(1.0) + c.backward(d) + assert torch.allclose(a.grad, (b - 2)) + assert b.grad is None + + a.grad = None + c = func(a, b) + assert c.requires_grad + d = torch.empty_like(c) + d.fill_(2.7) + c.backward(d) + assert torch.allclose(a.grad, 2.7 * (b - 2)) + assert b.grad is None + + def test_jit_optimize(self): + + @mmcv.jit(optimize=True) + def func(a, b): + return torch.mean((a - b) * (a - b)) + + def pyfunc(a, b): + return torch.mean((a - b) * (a - b)) + + a = torch.rand((16, 32)) + b = torch.rand((16, 32)) + + c = func(a, b) + d = pyfunc(a, b) + assert torch.allclose(c, d) + + @mmcv.skip_no_elena + def test_jit_coderize(self): + if not torch.cuda.is_available(): + return + + 
@mmcv.jit(coderize=True) + def func(a, b): + return (a + b) * (a - b) + + def pyfunc(a, b): + return (a + b) * (a - b) + + a = torch.rand((16, 32), device='cuda') + b = torch.rand((16, 32), device='cuda') + + c = func(a, b) + d = pyfunc(a, b) + assert torch.allclose(c, d) + + def test_jit_value_dependent(self): + + @mmcv.jit + def func(a, b): + torch.nonzero(a) + return torch.mean((a - b) * (a - b)) + + def pyfunc(a, b): + torch.nonzero(a) + return torch.mean((a - b) * (a - b)) + + a = torch.rand((16, 32)) + b = torch.rand((16, 32)) + + c = func(a, b) + d = pyfunc(a, b) + assert torch.allclose(c, d) + + @skip_no_parrots + def test_jit_check_input(self): + + def func(x): + y = torch.rand_like(x) + return x + y + + a = torch.ones((3, 4)) + with pytest.raises(AssertionError): + func = mmcv.jit(func, check_input=(a, )) + + @skip_no_parrots + def test_jit_partial_shape(self): + + @mmcv.jit(full_shape=False) + def func(a, b): + return torch.mean((a - b) * (a - b)) + + def pyfunc(a, b): + return torch.mean((a - b) * (a - b)) + + a = torch.rand((3, 4)) + b = torch.rand((3, 4)) + assert torch.allclose(func(a, b), pyfunc(a, b)) + assert len(func._cache._cache) == 1 + + a = torch.rand((6, 5)) + b = torch.rand((6, 5)) + assert torch.allclose(func(a, b), pyfunc(a, b)) + assert len(func._cache._cache) == 1 + + a = torch.rand((3, 4, 5)) + b = torch.rand((3, 4, 5)) + assert torch.allclose(func(a, b), pyfunc(a, b)) + assert len(func._cache._cache) == 2 + + a = torch.rand((1, 9, 8)) + b = torch.rand((1, 9, 8)) + assert torch.allclose(func(a, b), pyfunc(a, b)) + assert len(func._cache._cache) == 2 + + def test_instance_method(self): + + class T: + + def __init__(self, shape): + self._c = torch.rand(shape) + + @mmcv.jit + def test_method(self, x, y): + return (x * self._c) + y + + shape = (16, 32) + t = T(shape) + a = torch.rand(shape) + b = torch.rand(shape) + res = (a * t._c) + b + jit_res = t.test_method(a, b) + assert torch.allclose(res, jit_res) + + t = T(shape) + res = (a * 
def test_flowread():
    """``mmcv.flowread``: .flo files, ndarray pass-through, quantized JPEG
    storage, and the error cases for invalid inputs."""
    data_dir = osp.join(osp.dirname(__file__), '../data')
    expected_shape = (60, 80, 2)

    # plain .flo file
    decoded = mmcv.flowread(osp.join(data_dir, 'optflow.flo'))
    assert decoded.shape == expected_shape

    # passing an ndarray through is a no-op "pseudo read"
    assert_array_equal(decoded, mmcv.flowread(decoded))

    # quantized flow stored as a JPEG, concatenated vertically (default axis)
    quantized = mmcv.flowread(
        osp.join(data_dir, 'optflow_concat0.jpg'), quantize=True, denorm=True)
    assert quantized.shape == expected_shape

    # quantized flow concatenated horizontally
    quantized = mmcv.flowread(
        osp.join(data_dir, 'optflow_concat1.jpg'),
        quantize=True,
        concat_axis=1,
        denorm=True)
    assert quantized.shape == expected_shape

    # invalid inputs
    not_a_flow = osp.join(data_dir, 'color.jpg')
    with pytest.raises(TypeError):
        mmcv.flowread(1)
    with pytest.raises(IOError):
        mmcv.flowread(not_a_flow)
    with pytest.raises(IOError):
        mmcv.flowread(not_a_flow, quantize=True)
    with pytest.raises(ValueError):
        mmcv.flowread(np.zeros((100, 100, 1)))
def test_quantize_flow():
    """Compare ``mmcv.quantize_flow`` against a per-element reference
    implementation, with and without normalization by image size."""
    flow = (np.random.rand(10, 8, 2).astype(np.float32) - 0.5) * 15

    def reference(max_val, norm):
        # element-wise quantization: clip to [-max_val, max_val], then map
        # to uint8 buckets 0..254
        ref = np.zeros_like(flow, dtype=np.uint8)
        for i, j, k in np.ndindex(ref.shape):
            scale = (flow.shape[1] if k == 0 else flow.shape[0]) if norm else 1
            val = flow[i, j, k] / scale + max_val
            val = min(max(val, 0), 2 * max_val)
            ref[i, j, k] = min(np.floor(255 * val / (2 * max_val)), 254)
        return ref

    for max_val, norm in ((5.0, False), (0.5, True)):
        dx, dy = mmcv.quantize_flow(flow, max_val=max_val, norm=norm)
        ref = reference(max_val, norm)
        assert_array_equal(dx, ref[..., 0])
        assert_array_equal(dy, ref[..., 1])
def test_flow_warp():
    """``mmcv.flow_warp``: nearest/bilinear agreement on integer flow, a
    hand-computed bilinear case, and error handling for bad inputs."""
    # constant unit flow on a delta image: nearest and bilinear must agree
    img = np.zeros((5, 5, 3))
    img[2, 2, 0] = 1
    unit_flow = np.ones((5, 5, 2))
    assert_array_almost_equal(
        mmcv.flow_warp(img, unit_flow, interpolate_mode='nearest'),
        mmcv.flow_warp(img, unit_flow, interpolate_mode='bilinear'),
        decimal=5)

    # hand-computed bilinear interpolation at one displaced pixel
    img = np.zeros((5, 5, 1))
    img[2, 2, 0] = 1
    img[2, 3, 0] = 0.75
    flow = np.zeros((5, 5, 2))
    flow[2, 2, :] = [0.5, 0.7]
    expected = np.copy(img)
    expected[2, 2] = 0.5 * 0.3 + 0.75 * 0.5 * 0.3
    assert_array_almost_equal(
        expected,
        mmcv.flow_warp(img, flow, interpolate_mode='bilinear'),
        decimal=5)

    # unknown interpolation mode is rejected
    with pytest.raises(NotImplementedError):
        _ = mmcv.flow_warp(img, flow, interpolate_mode='xxx')
    # flow must be a 3-D array with two channels
    with pytest.raises(AssertionError):
        _ = mmcv.flow_warp(img, flow[:, :, 0], interpolate_mode='xxx')
], # noqa + [1. , 0.13333334, 0. ], # noqa + [1. , 0.2 , 0. ], # noqa + [1. , 0.26666668, 0. ], # noqa + [1. , 0.33333334, 0. ], # noqa + [1. , 0.4 , 0. ], # noqa + [1. , 0.46666667, 0. ], # noqa + [1. , 0.53333336, 0. ], # noqa + [1. , 0.6 , 0. ], # noqa + [1. , 0.6666667 , 0. ], # noqa + [1. , 0.73333335, 0. ], # noqa + [1. , 0.8 , 0. ], # noqa + [1. , 0.8666667 , 0. ], # noqa + [1. , 0.93333334, 0. ], # noqa + [1. , 1. , 0. ], # noqa + [0.8333333 , 1. , 0. ], # noqa + [0.6666667 , 1. , 0. ], # noqa + [0.5 , 1. , 0. ], # noqa + [0.33333334, 1. , 0. ], # noqa + [0.16666667, 1. , 0. ], # noqa + [0. , 1. , 0. ], # noqa + [0. , 1. , 0.25 ], # noqa + [0. , 1. , 0.5 ], # noqa + [0. , 1. , 0.75 ], # noqa + [0. , 1. , 1. ], # noqa + [0. , 0.90909094, 1. ], # noqa + [0. , 0.8181818 , 1. ], # noqa + [0. , 0.72727275, 1. ], # noqa + [0. , 0.6363636 , 1. ], # noqa + [0. , 0.54545456, 1. ], # noqa + [0. , 0.45454547, 1. ], # noqa + [0. , 0.36363637, 1. ], # noqa + [0. , 0.27272728, 1. ], # noqa + [0. , 0.18181819, 1. ], # noqa + [0. , 0.09090909, 1. ], # noqa + [0. , 0. , 1. ], # noqa + [0.07692308, 0. , 1. ], # noqa + [0.15384616, 0. , 1. ], # noqa + [0.23076923, 0. , 1. ], # noqa + [0.30769232, 0. , 1. ], # noqa + [0.3846154 , 0. , 1. ], # noqa + [0.46153846, 0. , 1. ], # noqa + [0.53846157, 0. , 1. ], # noqa + [0.61538464, 0. , 1. ], # noqa + [0.6923077 , 0. , 1. ], # noqa + [0.7692308 , 0. , 1. ], # noqa + [0.84615386, 0. , 1. ], # noqa + [0.9230769 , 0. , 1. ], # noqa + [1. , 0. , 1. ], # noqa + [1. , 0. , 0.8333333 ], # noqa + [1. , 0. , 0.6666667 ], # noqa + [1. , 0. , 0.5 ], # noqa + [1. , 0. , 0.33333334], # noqa + [1. , 0. , 0.16666667]], dtype=np.float32)) # noqa + + assert_array_equal( + color_wheel, + np.array([[1., 0. , 0. ], # noqa + [1. , 0.5, 0. ], # noqa + [1. , 1. , 0. ], # noqa + [0.5, 1. , 0. ], # noqa + [0. , 1. , 0. ], # noqa + [0. , 1. , 0.5], # noqa + [0. , 1. , 1. ], # noqa + [0. , 0.5, 1. ], # noqa + [0. , 0. , 1. ], # noqa + [0.5, 0. , 1. 
def test_flow_from_bytes():
    """``mmcv.flow_from_bytes`` must decode exactly what ``flowread``
    produces from the same .flo file on disk."""
    data_dir = osp.join(osp.dirname(__file__), '../data')
    flow_file = osp.join(data_dir, 'optflow.flo')

    from_file = mmcv.flowread(flow_file)
    with open(flow_file, 'rb') as f:
        from_bytes = mmcv.flow_from_bytes(f.read())

    assert from_bytes.shape == (60, 80, 2)
    assert np.all(from_bytes == from_file)
+import os +import os.path as osp +import platform +import tempfile + +import pytest + +import mmcv + + +class TestVideoEditor: + + @classmethod + def setup_class(cls): + cls.video_path = osp.join(osp.dirname(__file__), '../data/test.mp4') + cls.num_frames = 168 + + @pytest.mark.skipif(platform.system() == 'Windows', reason='skip windows') + def test_cut_concat_video(self): + part1_file = osp.join(tempfile.gettempdir(), '.mmcv_test1.mp4') + part2_file = osp.join(tempfile.gettempdir(), '.mmcv_test2.mp4') + mmcv.cut_video(self.video_path, part1_file, end=3, vcodec='h264') + mmcv.cut_video(self.video_path, part2_file, start=3, vcodec='h264') + v1 = mmcv.VideoReader(part1_file) + v2 = mmcv.VideoReader(part2_file) + assert len(v1) == 75 + assert len(v2) == self.num_frames - 75 + + out_file = osp.join(tempfile.gettempdir(), '.mmcv_test.mp4') + mmcv.concat_video([part1_file, part2_file], out_file) + v = mmcv.VideoReader(out_file) + assert len(v) == self.num_frames + os.remove(part1_file) + os.remove(part2_file) + os.remove(out_file) + + @pytest.mark.skipif(platform.system() == 'Windows', reason='skip windows') + def test_resize_video(self): + out_file = osp.join(tempfile.gettempdir(), '.mmcv_test.mp4') + mmcv.resize_video( + self.video_path, out_file, (200, 100), log_level='panic') + v = mmcv.VideoReader(out_file) + assert v.resolution == (200, 100) + os.remove(out_file) + mmcv.resize_video(self.video_path, out_file, ratio=2) + v = mmcv.VideoReader(out_file) + assert v.resolution == (294 * 2, 240 * 2) + os.remove(out_file) + mmcv.resize_video(self.video_path, out_file, (1000, 480), keep_ar=True) + v = mmcv.VideoReader(out_file) + assert v.resolution == (294 * 2, 240 * 2) + os.remove(out_file) + mmcv.resize_video( + self.video_path, out_file, ratio=(2, 1.5), keep_ar=True) + v = mmcv.VideoReader(out_file) + assert v.resolution == (294 * 2, 360) + os.remove(out_file) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_video/test_reader.py 
b/cv/distiller/CWD/pytorch/mmcv/tests/test_video/test_reader.py new file mode 100644 index 0000000000000000000000000000000000000000..c3bbdb7dcbbdd42e3c1e5ffefccbbf8b5c6c3897 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmcv/tests/test_video/test_reader.py @@ -0,0 +1,210 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import os +import os.path as osp +import shutil +import tempfile +from collections import OrderedDict + +import pytest + +import mmcv + + +class TestCache: + + def test_init(self): + with pytest.raises(ValueError): + mmcv.Cache(0) + cache = mmcv.Cache(100) + assert cache.capacity == 100 + assert cache.size == 0 + + def test_put(self): + cache = mmcv.Cache(3) + for i in range(1, 4): + cache.put(f'k{i}', i) + assert cache.size == i + assert cache._cache == OrderedDict([('k1', 1), ('k2', 2), ('k3', 3)]) + cache.put('k4', 4) + assert cache.size == 3 + assert cache._cache == OrderedDict([('k2', 2), ('k3', 3), ('k4', 4)]) + cache.put('k2', 2) + assert cache._cache == OrderedDict([('k2', 2), ('k3', 3), ('k4', 4)]) + + def test_get(self): + cache = mmcv.Cache(3) + assert cache.get('key_none') is None + assert cache.get('key_none', 0) == 0 + cache.put('k1', 1) + assert cache.get('k1') == 1 + + +class TestVideoReader: + + @classmethod + def setup_class(cls): + cls.video_path = osp.join(osp.dirname(__file__), '../data/test.mp4') + cls.num_frames = 168 + cls.video_url = 'https://download.openmmlab.com/mmcv/test_data/sample-mp4-file.mp4' # noqa: E501 + + def test_load(self): + # read from video file + v = mmcv.VideoReader(self.video_path) + assert v.width == 294 + assert v.height == 240 + assert v.fps == 25 + assert v.frame_cnt == self.num_frames + assert len(v) == self.num_frames + assert v.opened + import cv2 + assert isinstance(v.vcap, type(cv2.VideoCapture())) + + # read from video url + v = mmcv.VideoReader(self.video_url) + assert v.width == 320 + assert v.height == 240 + assert v.fps == 15 + assert v.frame_cnt == 1889 + assert len(v) == 1889 + assert 
v.opened + assert isinstance(v.vcap, type(cv2.VideoCapture())) + + def test_read(self): + v = mmcv.VideoReader(self.video_path) + img = v.read() + assert int(round(img.mean())) == 94 + img = v.get_frame(63) + assert int(round(img.mean())) == 94 + img = v[64] + assert int(round(img.mean())) == 205 + img = v[-104] + assert int(round(img.mean())) == 205 + img = v[63] + assert int(round(img.mean())) == 94 + img = v[-105] + assert int(round(img.mean())) == 94 + img = v.read() + assert int(round(img.mean())) == 205 + with pytest.raises(IndexError): + v.get_frame(self.num_frames + 1) + with pytest.raises(IndexError): + v[-self.num_frames - 1] + + def test_slice(self): + v = mmcv.VideoReader(self.video_path) + imgs = v[-105:-103] + assert int(round(imgs[0].mean())) == 94 + assert int(round(imgs[1].mean())) == 205 + assert len(imgs) == 2 + imgs = v[63:65] + assert int(round(imgs[0].mean())) == 94 + assert int(round(imgs[1].mean())) == 205 + assert len(imgs) == 2 + imgs = v[64:62:-1] + assert int(round(imgs[0].mean())) == 205 + assert int(round(imgs[1].mean())) == 94 + assert len(imgs) == 2 + imgs = v[:5] + assert len(imgs) == 5 + for img in imgs: + assert int(round(img.mean())) == 94 + imgs = v[165:] + assert len(imgs) == 3 + for img in imgs: + assert int(round(img.mean())) == 0 + imgs = v[-3:] + assert len(imgs) == 3 + for img in imgs: + assert int(round(img.mean())) == 0 + + def test_current_frame(self): + v = mmcv.VideoReader(self.video_path) + assert v.current_frame() is None + v.read() + img = v.current_frame() + assert int(round(img.mean())) == 94 + + def test_position(self): + v = mmcv.VideoReader(self.video_path) + assert v.position == 0 + for _ in range(10): + v.read() + assert v.position == 10 + v.get_frame(99) + assert v.position == 100 + + def test_iterator(self): + cnt = 0 + for img in mmcv.VideoReader(self.video_path): + cnt += 1 + assert img.shape == (240, 294, 3) + assert cnt == self.num_frames + + def test_with(self): + with 
mmcv.VideoReader(self.video_path) as v: + assert v.opened + assert not v.opened + + def test_cvt2frames(self): + v = mmcv.VideoReader(self.video_path) + frame_dir = tempfile.mkdtemp() + v.cvt2frames(frame_dir) + assert osp.isdir(frame_dir) + for i in range(self.num_frames): + filename = f'{frame_dir}/{i:06d}.jpg' + assert osp.isfile(filename) + os.remove(filename) + + v = mmcv.VideoReader(self.video_path) + v.cvt2frames(frame_dir, show_progress=False) + assert osp.isdir(frame_dir) + for i in range(self.num_frames): + filename = f'{frame_dir}/{i:06d}.jpg' + assert osp.isfile(filename) + os.remove(filename) + + v = mmcv.VideoReader(self.video_path) + v.cvt2frames( + frame_dir, + file_start=100, + filename_tmpl='{:03d}.JPEG', + start=100, + max_num=20) + assert osp.isdir(frame_dir) + for i in range(100, 120): + filename = f'{frame_dir}/{i:03d}.JPEG' + assert osp.isfile(filename) + os.remove(filename) + shutil.rmtree(frame_dir) + + def test_frames2video(self): + v = mmcv.VideoReader(self.video_path) + frame_dir = tempfile.mkdtemp() + v.cvt2frames(frame_dir) + assert osp.isdir(frame_dir) + for i in range(self.num_frames): + filename = f'{frame_dir}/{i:06d}.jpg' + assert osp.isfile(filename) + + out_filename = osp.join(tempfile.gettempdir(), 'mmcv_test.avi') + mmcv.frames2video(frame_dir, out_filename) + v = mmcv.VideoReader(out_filename) + assert v.fps == 30 + assert len(v) == self.num_frames + + mmcv.frames2video( + frame_dir, + out_filename, + fps=25, + start=10, + end=50, + show_progress=False) + + with mmcv.VideoReader(out_filename) as v: + assert v.fps == 25 + assert len(v) == 40 + + for i in range(self.num_frames): + filename = f'{frame_dir}/{i:06d}.jpg' + os.remove(filename) + shutil.rmtree(frame_dir) diff --git a/cv/distiller/CWD/pytorch/mmcv/tests/test_visualization.py b/cv/distiller/CWD/pytorch/mmcv/tests/test_visualization.py new file mode 100644 index 0000000000000000000000000000000000000000..82dd093bf8b6b97d196396d0ff79cde8d239b119 --- /dev/null +++ 
def test_color():
    """``mmcv.color_val`` accepts Color enums, color names, BGR tuples,
    grayscale ints and ndarrays, and rejects lists, floats and
    out-of-range channel values."""
    # every accepted input form maps to a BGR tuple
    assert mmcv.color_val(mmcv.Color.blue) == (255, 0, 0)
    assert mmcv.color_val('green') == (0, 255, 0)
    assert mmcv.color_val((1, 2, 3)) == (1, 2, 3)
    assert mmcv.color_val(100) == (100, 100, 100)
    assert mmcv.color_val(np.zeros(3, dtype=int)) == (0, 0, 0)

    # wrong container / scalar types
    with pytest.raises(TypeError):
        mmcv.color_val([255, 255, 255])
    with pytest.raises(TypeError):
        mmcv.color_val(1.0)
    # channel values must fit in the uint8 range
    with pytest.raises(AssertionError):
        mmcv.color_val((0, 0, 500))
+ config-path: .circleci/test.yml diff --git a/cv/distiller/CWD/pytorch/mmrazor/.circleci/docker/Dockerfile b/cv/distiller/CWD/pytorch/mmrazor/.circleci/docker/Dockerfile new file mode 100644 index 0000000000000000000000000000000000000000..d9cf8cc7712d5241975c3b748fb0d01a5545b4fd --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/.circleci/docker/Dockerfile @@ -0,0 +1,11 @@ +ARG PYTORCH="1.8.1" +ARG CUDA="10.2" +ARG CUDNN="7" + +FROM pytorch/pytorch:${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel + +# To fix GPG key error when running apt-get update +RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub +RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/7fa2af80.pub + +RUN apt-get update && apt-get install -y ninja-build libglib2.0-0 libsm6 libxrender-dev libxext6 libgl1-mesa-glx diff --git a/cv/distiller/CWD/pytorch/mmrazor/.circleci/test.yml b/cv/distiller/CWD/pytorch/mmrazor/.circleci/test.yml new file mode 100644 index 0000000000000000000000000000000000000000..9acc7fdfcd698784d55c1aff897367176b872a6f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/.circleci/test.yml @@ -0,0 +1,193 @@ +version: 2.1 + +# the default pipeline parameters, which will be updated according to +# the results of the path-filtering orb +parameters: + lint_only: + type: boolean + default: true + +jobs: + lint: + docker: + - image: cimg/python:3.7.4 + steps: + - checkout + - run: + name: Install pre-commit hook + command: | + pip install pre-commit + pre-commit install + - run: + name: Linting + command: pre-commit run --all-files + - run: + name: Check docstring coverage + command: | + pip install interrogate + interrogate -v --ignore-init-method --ignore-module --ignore-nested-functions --ignore-magic --ignore-regex "__repr__" --fail-under 80 mmrazor + build_cpu: + parameters: + # The python version must match available image tags in + # 
https://circleci.com/developer/images/image/cimg/python + python: + type: string + torch: + type: string + torchvision: + type: string + docker: + - image: cimg/python:<< parameters.python >> + resource_class: large + steps: + - checkout + - run: + name: Install Libraries + command: | + sudo apt-get update + sudo apt-get install -y ninja-build libglib2.0-0 libsm6 libxrender-dev libxext6 libgl1-mesa-glx libjpeg-dev zlib1g-dev libtinfo-dev libncurses5 + - run: + name: Configure Python & pip + command: | + pip install --upgrade pip + pip install wheel + - run: + name: Install PyTorch + command: | + python -V + pip install torch==<< parameters.torch >>+cpu torchvision==<< parameters.torchvision >>+cpu -f https://download.pytorch.org/whl/torch_stable.html + - when: + condition: + equal: ["3.9.0", << parameters.python >>] + steps: + - run: pip install "protobuf <= 3.20.1" && sudo apt-get update && sudo apt-get -y install libprotobuf-dev protobuf-compiler cmake + - run: + name: Install mmrazor dependencies + command: | + pip install git+https://github.com/open-mmlab/mmengine.git@main + pip install -U openmim + mim install 'mmcv >= 2.0.0rc1' + pip install git+https://github.com/open-mmlab/mmpretrain.git@mmcls-1.x + pip install git+https://github.com/open-mmlab/mmdetection.git@main + pip install git+https://github.com/open-mmlab/mmsegmentation.git@main + python -m pip install git+ssh://git@github.com/open-mmlab/mmpose.git@main + pip install -r requirements.txt + - run: + name: Build and install + command: | + pip install -e . 
+ - run: + name: Run unittests + command: | + coverage run --branch --source mmrazor -m pytest tests/ + coverage xml + coverage report -m + build_cuda: + parameters: + torch: + type: string + cuda: + type: enum + enum: ["10.1", "10.2", "11.1"] + cudnn: + type: integer + default: 7 + machine: + image: ubuntu-2004-cuda-11.4:202110-01 + # docker_layer_caching: true + resource_class: gpu.nvidia.small + steps: + - checkout + - run: + # Cloning repos in VM since Docker doesn't have access to the private key + name: Clone Repos + command: | + git clone -b main --depth 1 https://github.com/open-mmlab/mmengine.git /home/circleci/mmengine + git clone -b main --depth 1 https://github.com/open-mmlab/mmdetection.git /home/circleci/mmdetection + git clone -b 1.x --depth 1 https://github.com/open-mmlab/mmclassification.git /home/circleci/mmclassification + git clone -b main --depth 1 https://github.com/open-mmlab/mmsegmentation.git /home/circleci/mmsegmentation + - run: + name: Build Docker image + command: | + docker build .circleci/docker -t mmrazor:gpu --build-arg PYTORCH=<< parameters.torch >> --build-arg CUDA=<< parameters.cuda >> --build-arg CUDNN=<< parameters.cudnn >> + docker run --gpus all -t -d -v /home/circleci/project:/mmrazor -v /home/circleci/mmengine:/mmengine -v /home/circleci/mmdetection:/mmdetection -v /home/circleci/mmclassification:/mmclassification -v /home/circleci/mmsegmentation:/mmsegmentation -w /mmrazor --name mmrazor mmrazor:gpu + - run: + name: Install mmrazor dependencies + command: | + docker exec mmrazor pip install -e /mmengine + docker exec mmrazor pip install -U openmim + docker exec mmrazor mim install 'mmcv >= 2.0.0rc1' + docker exec mmrazor pip install -e /mmdetection + docker exec mmrazor pip install -e /mmclassification + docker exec mmrazor pip install -e /mmsegmentation + docker exec mmrazor pip install -r requirements.txt + - run: + name: Build and install + command: | + docker exec mmrazor pip install -e . 
# Map from the human-readable metric names used in the expectations to the
# internal metric keys logged by the test runs.
metric_mapping = {
    'Top 1 Accuracy': 'accuracy/top1',
    'Top 5 Accuracy': 'accuracy/top5',
    'box AP': 'coco/bbox_mAP',
    'mIoU': 'mIoU'
}


def compare_metric(result, metric):
    """Compute the signed error and absolute percentage error between the
    expected and actual value of one metric.

    Args:
        result (dict): ``{'expect': {...}, 'actual': {...}}``, where
            ``'actual'`` is keyed by the internal names in
            ``metric_mapping``.
        metric (str): Human-readable metric name (a ``metric_mapping`` key).

    Returns:
        tuple: ``(error, error_percent)`` rounded to the precision of the
        expected value, or ``(None, None)`` if the metric was not logged.
    """
    expected = result['expect'][metric]
    actual = result['actual'].get(metric_mapping[metric], None)
    if actual is None:
        return None, None
    # COCO bbox mAP is logged in [0, 1] but expectations are in [0, 100]
    if metric == 'box AP':
        actual *= 100
    # match the number of decimal places used by the expected value
    precision = len(str(expected).split('.')[-1])
    actual = round(actual, precision)
    error = round(actual - expected, precision)
    error_percent = round(abs(error) * 100 / expected, 3)
    return error, error_percent
def replace_to_ceph(cfg):
    """Rewrite the dataset/evaluator configs in ``cfg`` in place so data is
    loaded from the ceph (petrel) backend instead of the local filesystem.

    Args:
        cfg: A loaded config whose ``train/val/test_dataloader.dataset`` and
            ``val/test_evaluator`` entries are plain dict configs.
    """
    file_client_args = dict(
        backend='petrel',
        path_mapping=dict({
            './data/coco':
            's3://openmmlab/datasets/detection/coco',
            'data/coco':
            's3://openmmlab/datasets/detection/coco',
            './data/cityscapes':
            's3://openmmlab/datasets/segmentation/cityscapes',
            'data/cityscapes':
            's3://openmmlab/datasets/segmentation/cityscapes',
            './data/imagenet':
            's3://openmmlab/datasets/classification/imagenet',
            'data/imagenet':
            's3://openmmlab/datasets/classification/imagenet',
        }))

    def _process_dataset(dataset):
        # Point every file-loading pipeline step at the petrel backend.

        # NOTE: renamed from the original misspelling `replace_pipline`
        # (definition and its only call site are both local to this scope).
        def replace_pipeline(pipelines):
            for pipeline in pipelines:
                if pipeline['type'] in [
                        'LoadImageFromFile',
                        'LoadAnnotations',
                        'LoadPanopticAnnotations',
                ]:
                    pipeline['file_client_args'] = file_client_args

        # CityscapesDataset also needs the backend set on the dataset itself
        if dataset['type'] in ['CityscapesDataset']:
            dataset['file_client_args'] = file_client_args
        if 'pipeline' in dataset:
            replace_pipeline(dataset['pipeline'])
        # recurse into nested 'dataset' entries (wrapper-dataset configs)
        if 'dataset' in dataset:
            _process_dataset(dataset['dataset'])

    def _process_evaluator(evaluator):
        # only the panoptic metric reads annotation files directly
        if evaluator['type'] == 'CocoPanopticMetric':
            evaluator['file_client_args'] = file_client_args

    _process_dataset(cfg.train_dataloader.dataset)
    _process_dataset(cfg.val_dataloader.dataset)
    _process_dataset(cfg.test_dataloader.dataset)
    _process_evaluator(cfg.val_evaluator)
    _process_evaluator(cfg.test_evaluator)
Path(args.checkpoint_root) + checkpoint = checkpoint_root / model_info.weights[len(http_prefix):] + checkpoint.parent.mkdir(parents=True, exist_ok=True) + exists = checkpoint.exists() + if exists: + print(f'{checkpoint} already exists.') + else: + print(f'start downloading {fname}') + wget.download(model_info.weights, str(checkpoint)) + print(f'\nSaved in {checkpoint}.') + + job_name = f'{args.job_name}_{fname}' + work_dir = Path(args.work_dir) / fname + work_dir.mkdir(parents=True, exist_ok=True) + test_cfg_path = work_dir / 'config.py' + cfg.dump(test_cfg_path) + + if args.quotatype is not None: + quota_cfg = f'#SBATCH --quotatype {args.quotatype}\n' + else: + quota_cfg = '' + + launcher = 'none' if args.local else 'slurm' + runner = 'python' if args.local else 'srun python' + master_port = f'MASTER_PORT={port}' + + script_name = osp.join('tools', 'test.py') + job_script = (f'#!/bin/bash\n' + f'#SBATCH --output {work_dir}/job.%j.out\n' + f'#SBATCH --partition={args.partition}\n' + f'#SBATCH --job-name {job_name}\n' + f'#SBATCH --gres=gpu:{args.gpus}\n' + f'{quota_cfg}' + f'#SBATCH --ntasks-per-node={args.gpus}\n' + f'#SBATCH --ntasks={args.gpus}\n' + f'#SBATCH --cpus-per-task=5\n\n' + f'{master_port} {runner} -u {script_name} ' + f'{test_cfg_path} {checkpoint} ' + f'--work-dir {work_dir} ' + f'--launcher={launcher}\n') + + with open(work_dir / 'job.sh', 'w') as f: + f.write(job_script) + + commands.append(f'echo "{test_cfg_path}"') + if args.local: + commands.append(f'bash {work_dir}/job.sh') + else: + commands.append(f'sbatch {work_dir}/job.sh') + + return work_dir / 'job.sh' + + +def summary(args): + # parse model-index.yml + model_index_file = MMRAZOR_ROOT / 'model-index.yml' + model_index = load(str(model_index_file)) + model_index.build_models_with_collections() + models = OrderedDict({model.name: model for model in model_index.models}) + + if args.models: + patterns = [re.compile(pattern) for pattern in args.models] + filter_models = {} + for k, v in 
models.items(): + if any([re.match(pattern, k) for pattern in patterns]): + filter_models[k] = v + if len(filter_models) == 0: + print('No model found, please specify models in:') + print('\n'.join(models.keys())) + return + models = filter_models + + model_results = dict() + for model_info in models.values(): + model_name = model_info.name + work_dir = Path(args.work_dir) / model_name + sub_dirs = [p.name for p in work_dir.iterdir() if p.is_dir()] + + if len(sub_dirs) == 0: + print(f'{model_name} has no results.') + continue + + latest_time = sub_dirs[-1] + latest_json = work_dir / latest_time / f'{latest_time}.json' + + if not latest_json.exists(): + print(f'{model_name} has no results.') + continue + latest_result = mmengine.load(latest_json, 'json') + + expect_result = model_info.results[0].metrics + summary_result = { + 'expect': expect_result, + 'actual': {k: v + for k, v in latest_result.items()} + } + model_results[model_name] = summary_result + + mmengine.fileio.dump(model_results, + Path(args.work_dir) / 'summary.yml', 'yaml') + print(f'Summary results saved in {Path(args.work_dir)}/summary.yml') + + +def test(args): + # parse model-index.yml + model_index_file = MMRAZOR_ROOT / 'model-index.yml' + model_index = load(str(model_index_file)) + model_index.build_models_with_collections() + models = OrderedDict({model.name: model for model in model_index.models}) + + commands = [] + if args.models: + patterns = [re.compile(pattern) for pattern in args.models] + filter_models = {} + for k, v in models.items(): + if any([re.match(pattern, k) for pattern in patterns]): + filter_models[k] = v + if len(filter_models) == 0: + print('No model found, please specify models in:') + print('\n'.join(models.keys())) + return + models = filter_models + + preview_script = '' + port = args.port + for model_info in models.values(): + script_path = create_test_job_batch(commands, model_info, args, port) + preview_script = script_path or preview_script + port += 1 + command_str 
= '\n'.join(commands) + + preview = Table() + preview.add_column(str(preview_script)) + preview.add_column('Shell command preview') + preview.add_row( + Syntax.from_path( + preview_script, + background_color='default', + line_numbers=True, + word_wrap=True), + Syntax( + command_str, + 'bash', + background_color='default', + line_numbers=True, + word_wrap=True)) + console.print(preview) + + if args.run: + os.system(command_str) + else: + console.print('Please set "--run" to start the job') + + +def main(): + args = parse_args() + if args.summary: + summary(args) + else: + test(args) + + +if __name__ == '__main__': + main() diff --git a/cv/distiller/CWD/pytorch/mmrazor/.dev_scripts/benchmark_train.py b/cv/distiller/CWD/pytorch/mmrazor/.dev_scripts/benchmark_train.py new file mode 100644 index 0000000000000000000000000000000000000000..597e9af0c9fd92d0a192690411c8812f42054989 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/.dev_scripts/benchmark_train.py @@ -0,0 +1,338 @@ +import argparse +import logging +import os +import os.path as osp +import re +from collections import OrderedDict +from pathlib import Path + +import mmcv +import mmengine +from mmengine.logging import print_log +from modelindex.load_model_index import load +from rich.console import Console +from rich.syntax import Syntax +from rich.table import Table + +from mmrazor.testing import FastStopTrainingHook # noqa: F401 + +os.environ['MKL_THREADING_LAYER'] = 'GNU' + +console = Console() +MMRAZOR_ROOT = Path(__file__).absolute().parents[1] + +METRIC_MAPPINGS = { + 'accuracy/top1': 'Top 1 Accuracy', + 'accuracy/top5': 'Top 5 Accuracy' +} + + +def parse_args(): + parser = argparse.ArgumentParser( + description="Test all models' accuracy in model-index.yml") + parser.add_argument( + 'partition', type=str, help='Cluster partition to use.') + parser.add_argument( + '--job-name', + type=str, + default='razor-train-benchmark', + help='Slurm job name prefix') + parser.add_argument('--port', type=int, 
default=29666, help='dist port') + parser.add_argument( + '--models', nargs='+', type=str, help='Specify model names to run.') + parser.add_argument('--gpus', type=int, default=8, help='num gpus') + parser.add_argument( + '--work-dir', + default='work_dirs/benchmark_train', + help='the dir to save metric') + parser.add_argument('--amp', action='store_true', help='use amp') + parser.add_argument( + '--auto-scale-lr', action='store_true', help='use auto scale lr') + parser.add_argument( + '--auto-resume', action='store_true', help='use auto resume') + parser.add_argument( + '--replace-ceph', action='store_true', help='load data from ceph') + parser.add_argument( + '--early-stop', action='store_true', help='early stop training') + parser.add_argument( + '--run', action='store_true', help='run script directly') + parser.add_argument( + '--summary', action='store_true', help='collect results') + parser.add_argument( + '--local', + action='store_true', + help='run at local instead of cluster.') + parser.add_argument( + '--mail', type=str, help='Mail address to watch test status.') + parser.add_argument( + '--mail-type', + nargs='+', + default=['BEGIN'], + choices=['NONE', 'BEGIN', 'END', 'FAIL', 'REQUEUE', 'ALL'], + help='Mail address to watch test status.') + parser.add_argument( + '--quotatype', + default=None, + choices=['reserved', 'auto', 'spot'], + help='Quota type, only available for phoenix-slurm>=0.2') + + args = parser.parse_args() + return args + + +def replace_to_ceph(cfg): + + file_client_args = dict( + backend='petrel', + path_mapping=dict({ + './data/coco': + 's3://openmmlab/datasets/detection/coco', + 'data/coco': + 's3://openmmlab/datasets/detection/coco', + './data/cityscapes': + 's3://openmmlab/datasets/segmentation/cityscapes', + 'data/cityscapes': + 's3://openmmlab/datasets/segmentation/cityscapes', + './data/imagenet': + 's3://openmmlab/datasets/classification/imagenet', + 'data/imagenet': + 's3://openmmlab/datasets/classification/imagenet', + })) + 
+ def _process_pipeline(dataset, name): + + def replace_img(pipeline): + if pipeline['type'] == 'LoadImageFromFile': + pipeline['file_client_args'] = file_client_args + + def replace_ann(pipeline): + if pipeline['type'] == 'LoadAnnotations' or pipeline[ + 'type'] == 'LoadPanopticAnnotations': + pipeline['file_client_args'] = file_client_args + + if 'pipeline' in dataset: + replace_img(dataset.pipeline[0]) + replace_ann(dataset.pipeline[1]) + if 'dataset' in dataset: + # dataset wrapper + replace_img(dataset.dataset.pipeline[0]) + replace_ann(dataset.dataset.pipeline[1]) + else: + # dataset wrapper + replace_img(dataset.dataset.pipeline[0]) + replace_ann(dataset.dataset.pipeline[1]) + + def _process_evaluator(evaluator, name): + if evaluator['type'] == 'CocoPanopticMetric': + evaluator['file_client_args'] = file_client_args + + # half ceph + _process_pipeline(cfg.train_dataloader.dataset, cfg.filename) + _process_pipeline(cfg.val_dataloader.dataset, cfg.filename) + _process_pipeline(cfg.test_dataloader.dataset, cfg.filename) + _process_evaluator(cfg.val_evaluator, cfg.filename) + _process_evaluator(cfg.test_evaluator, cfg.filename) + + +def create_train_job_batch(commands, model_info, args, port): + + fname = model_info.name + + cfg_path = Path(model_info.config) + + cfg = mmengine.Config.fromfile(cfg_path) + + if args.replace_ceph: + replace_to_ceph(cfg) + + # enable automatically scaling LR + if args.auto_scale_lr: + if 'auto_scale_lr' in cfg and \ + 'enable' in cfg.auto_scale_lr and \ + 'base_batch_size' in cfg.auto_scale_lr: + cfg.auto_scale_lr.enable = True + else: + raise RuntimeError('Can not find "auto_scale_lr" or ' + '"auto_scale_lr.enable" or ' + '"auto_scale_lr.base_batch_size" in your' + ' configuration file.') + + # enable automatic-mixed-precision training + if args.amp is True: + optim_wrapper = cfg.optim_wrapper.type + if optim_wrapper == 'AmpOptimWrapper': + print_log( + 'AMP training is already enabled in your config.', + logger='current', + 
level=logging.WARNING) + else: + assert optim_wrapper == 'OptimWrapper', ( + '`--amp` is only supported when the optimizer wrapper type is ' + f'`OptimWrapper` but got {optim_wrapper}.') + cfg.optim_wrapper.type = 'AmpOptimWrapper' + cfg.optim_wrapper.loss_scale = 'dynamic' + + if args.auto_resume: + cfg.resume = True + + if args.early_stop: + if 'custom_hooks' in cfg: + cfg.custom_hooks.append(dict(type='mmrazor.FastStopTrainingHook')) + else: + custom_hooks = [dict(type='mmrazor.FastStopTrainingHook')] + cfg.custom_hooks = custom_hooks + + job_name = f'{args.job_name}_{fname}' + work_dir = Path(args.work_dir) / fname + work_dir.mkdir(parents=True, exist_ok=True) + + train_cfg_path = work_dir / 'config.py' + cfg.dump(train_cfg_path) + + if args.quotatype is not None: + quota_cfg = f'#SBATCH --quotatype {args.quotatype}\n' + else: + quota_cfg = '' + + launcher = 'none' if args.local else 'slurm' + runner = 'python' if args.local else 'srun python' + master_port = f'MASTER_PORT={port}' + + script_name = osp.join('tools', 'train.py') + job_script = (f'#!/bin/bash\n' + f'#SBATCH --output {work_dir}/job.%j.out\n' + f'#SBATCH --partition={args.partition}\n' + f'#SBATCH --job-name {job_name}\n' + f'#SBATCH --gres=gpu:{args.gpus}\n' + f'{quota_cfg}' + f'#SBATCH --ntasks-per-node={args.gpus}\n' + f'#SBATCH --ntasks={args.gpus}\n' + f'#SBATCH --cpus-per-task=5\n\n' + f'{master_port} {runner} -u {script_name} {train_cfg_path} ' + f'--work-dir {work_dir} ' + f'--launcher={launcher}\n') + + with open(work_dir / 'job.sh', 'w') as f: + f.write(job_script) + + commands.append(f'echo "{train_cfg_path}"') + if args.local: + commands.append(f'bash {work_dir}/job.sh') + else: + commands.append(f'sbatch {work_dir}/job.sh') + + return work_dir / 'job.sh' + + +def summary(args): + # parse model-index.yml + model_index_file = MMRAZOR_ROOT / 'model-index.yml' + model_index = load(str(model_index_file)) + model_index.build_models_with_collections() + models = OrderedDict({model.name: model 
for model in model_index.models}) + + if args.models: + patterns = [re.compile(pattern) for pattern in args.models] + filter_models = {} + for k, v in models.items(): + if any([re.match(pattern, k) for pattern in patterns]): + filter_models[k] = v + if len(filter_models) == 0: + print('No model found, please specify models in:') + print('\n'.join(models.keys())) + return + models = filter_models + + model_results = dict() + for model_info in models.values(): + model_name = model_info.name + work_dir = Path(args.work_dir) / model_name + sub_dirs = [p.name for p in work_dir.iterdir() if p.is_dir()] + + if len(sub_dirs) == 0: + print(f'{model_name} has no results.') + continue + + latest_time = sub_dirs[-1] + latest_json = work_dir / latest_time / f'{latest_time}.json' + + if not latest_json.exists(): + print(f'{model_name} has no results.') + continue + latest_result = mmcv.load(latest_json, 'json') + + expect_result = model_info.results[0].metrics + summary_result = { + 'expect': expect_result, + 'actual': + {METRIC_MAPPINGS[k]: v + for k, v in latest_result.items()} + } + model_results[model_name] = summary_result + + mmengine.fileio.dump(model_results, + Path(args.work_dir) / 'summary.yml', 'yaml') + print(f'Summary results saved in {Path(args.work_dir)}/summary.yml') + + +def train(args): + # parse model-index.yml + model_index_file = MMRAZOR_ROOT / 'model-index.yml' + model_index = load(str(model_index_file)) + model_index.build_models_with_collections() + models = OrderedDict({model.name: model for model in model_index.models}) + + commands = [] + if args.models: + patterns = [re.compile(pattern) for pattern in args.models] + filter_models = {} + for k, v in models.items(): + if any([re.match(pattern, k) for pattern in patterns]): + filter_models[k] = v + if len(filter_models) == 0: + print('No model found, please specify models in:') + print('\n'.join(models.keys())) + return + models = filter_models + + preview_script = '' + port = args.port + for model_info 
in models.values(): + script_path = create_train_job_batch(commands, model_info, args, port) + preview_script = script_path or preview_script + port += 1 + command_str = '\n'.join(commands) + + preview = Table() + preview.add_column(str(preview_script)) + preview.add_column('Shell command preview') + preview.add_row( + Syntax.from_path( + preview_script, + background_color='default', + line_numbers=True, + word_wrap=True), + Syntax( + command_str, + 'bash', + background_color='default', + line_numbers=True, + word_wrap=True)) + console.print(preview) + + if args.run: + os.system(command_str) + else: + console.print('Please set "--run" to start the job') + + +def main(): + args = parse_args() + if args.summary: + summary(args) + else: + train(args) + + +if __name__ == '__main__': + main() diff --git a/cv/distiller/CWD/pytorch/mmrazor/.dev_scripts/meta_files_test.py b/cv/distiller/CWD/pytorch/mmrazor/.dev_scripts/meta_files_test.py new file mode 100644 index 0000000000000000000000000000000000000000..92c0f2f0d07063fd4a8f81d28ded2ae5bb6e7d6d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/.dev_scripts/meta_files_test.py @@ -0,0 +1,58 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import os +import unittest +from pathlib import Path + +import requests +import yaml + +MMRAZOR_ROOT = Path(__file__).absolute().parents[1] + + +class TestMetafiles(unittest.TestCase): + + def get_metafiles(self, code_path): + """ + Function: get the metafile of all configs from model-index.yml + """ + metafile = os.path.join(code_path, 'model-index.yml') + with open(metafile, 'r') as f: + meta = yaml.safe_load(f) + return meta['Import'] + + def test_metafiles(self): + metafiles = self.get_metafiles(MMRAZOR_ROOT) + for mf in metafiles: + metafile = os.path.abspath(os.path.join(MMRAZOR_ROOT, mf)) + with open(metafile, 'r') as f: + meta = yaml.safe_load(f) + for model in meta['Models']: + # 1. 
weights url check + r = requests.head(model['Weights'], timeout=4) + assert r.status_code != 404, \ + f"can't connect url {model['Weights']} in " \ + f'metafile {metafile}' + + # 2. config check + dir_path = os.path.abspath(os.path.join(metafile, '../')) + # list all files which are in the same directory of + # current metafile + config_files = os.listdir(dir_path) + + if isinstance(model['Config'], list): + # TODO: 3. log error + continue + + assert (model['Config'].split('/')[-1] in config_files), \ + f"config error in {metafile} model {model['Name']}" + + # 4. name check + # erase '.py' + correct_name = model['Config'].split('/')[-1][:-3] + assert model['Name'] == correct_name, \ + f'name error in {metafile}, correct name should ' \ + f'be {correct_name}' + + +if __name__ == '__main__': + unittest.main() diff --git a/cv/distiller/CWD/pytorch/mmrazor/.pre-commit-config.yaml b/cv/distiller/CWD/pytorch/mmrazor/.pre-commit-config.yaml new file mode 100644 index 0000000000000000000000000000000000000000..cd73ef928652e8996c2e8622c3860fbdeb362401 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/.pre-commit-config.yaml @@ -0,0 +1,72 @@ + +exclude: ^tests/data/ +repos: + - repo: https://github.com/PyCQA/flake8 + rev: 4.0.1 + hooks: + - id: flake8 + - repo: https://github.com/PyCQA/isort + rev: 5.11.5 + hooks: + - id: isort + - repo: https://github.com/pre-commit/mirrors-yapf + rev: v0.30.0 + hooks: + - id: yapf + - repo: https://github.com/pre-commit/pre-commit-hooks + rev: v3.1.0 + hooks: + - id: trailing-whitespace + - id: check-yaml + - id: end-of-file-fixer + - id: requirements-txt-fixer + - id: double-quote-string-fixer + - id: check-merge-conflict + - id: fix-encoding-pragma + args: ["--remove"] + - id: mixed-line-ending + args: ["--fix=lf"] + - repo: https://github.com/codespell-project/codespell + rev: v2.1.0 + hooks: + - id: codespell + - repo: https://github.com/executablebooks/mdformat + rev: 0.7.14 + hooks: + - id: mdformat + args: ["--number"] + 
additional_dependencies: + - mdformat-gfm + - mdformat_frontmatter + - linkify-it-py + - repo: https://github.com/myint/docformatter + rev: v1.3.1 + hooks: + - id: docformatter + args: ["--in-place", "--wrap-descriptions", "79"] + - repo: https://github.com/executablebooks/mdformat + rev: 0.7.9 + hooks: + - id: mdformat + args: ["--number"] + additional_dependencies: + - mdformat-gfm + - mdformat_frontmatter + - linkify-it-py + - repo: https://github.com/open-mmlab/pre-commit-hooks + rev: v0.2.0 + hooks: + - id: check-algo-readme + - id: check-copyright + args: [ "mmrazor", "tests", "tools"] + - repo: https://github.com/pre-commit/mirrors-mypy + rev: v0.812 + hooks: + - id: mypy + exclude: |- + (?x)( + ^test + | ^docs + | ^configs + | ^.*/configs* + ) diff --git a/cv/distiller/CWD/pytorch/mmrazor/.readthedocs.yml b/cv/distiller/CWD/pytorch/mmrazor/.readthedocs.yml new file mode 100644 index 0000000000000000000000000000000000000000..6cfbf5d310f1436c971e053b96265210d2f683fa --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/.readthedocs.yml @@ -0,0 +1,9 @@ +version: 2 + +formats: all + +python: + version: 3.7 + install: + - requirements: requirements/docs.txt + - requirements: requirements/readthedocs.txt diff --git a/cv/distiller/CWD/pytorch/mmrazor/LICENSE b/cv/distiller/CWD/pytorch/mmrazor/LICENSE new file mode 100644 index 0000000000000000000000000000000000000000..f731325b2c07e508ca303c2c279991d11aa1e96f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/LICENSE @@ -0,0 +1,203 @@ +Copyright (c) OpenMMLab. All rights reserved + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. 
+ + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. 
+ + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of 
the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. 
Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. 
+ + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright 2020 MMClassification Authors. + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/cv/distiller/CWD/pytorch/mmrazor/MANIFEST.in b/cv/distiller/CWD/pytorch/mmrazor/MANIFEST.in new file mode 100644 index 0000000000000000000000000000000000000000..0aba33385b1075d5f80cd70b9c07f146f6c211b6 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/MANIFEST.in @@ -0,0 +1,6 @@ +include requirements/*.txt +include mmrazor/VERSION +include mmrazor/.mim/model-index.yml +include mmrazor/.mim/demo/*/* +recursive-include mmrazor/.mim/configs *.py *.yml +recursive-include mmrazor/.mim/tools *.sh *.py diff --git a/cv/distiller/CWD/pytorch/mmrazor/README.md b/cv/distiller/CWD/pytorch/mmrazor/README.md new file mode 100644 index 0000000000000000000000000000000000000000..4dbb364d5bd54d463222d64bc2cf28182e497abf --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/README.md @@ -0,0 +1,235 @@ +
+ +
 
+
+ OpenMMLab website + + + HOT + + +      + OpenMMLab platform + + + TRY IT OUT + + +
+
 
+ + + +[![PyPI](https://img.shields.io/pypi/v/mmrazor)](https://pypi.org/project/mmrazor) +[![docs](https://img.shields.io/badge/docs-latest-blue)](https://mmrazor.readthedocs.io/en/main/) +[![badge](https://github.com/open-mmlab/mmrazor/workflows/build/badge.svg)](https://github.com/open-mmlab/mmrazor/actions) +[![codecov](https://codecov.io/gh/open-mmlab/mmrazor/branch/master/graph/badge.svg)](https://codecov.io/gh/open-mmlab/mmrazor) +[![license](https://img.shields.io/github/license/open-mmlab/mmrazor.svg)](https://github.com/open-mmlab/mmrazor/blob/master/LICENSE) +[![open issues](https://isitmaintained.com/badge/open/open-mmlab/mmrazor.svg)](https://github.com/open-mmlab/mmrazor/issues) +[![issue resolution](https://isitmaintained.com/badge/resolution/open-mmlab/mmrazor.svg)](https://github.com/open-mmlab/mmrazor/issues) + + + + + +[📘Documentation](https://mmrazor.readthedocs.io/en/main/) | +[🛠️Installation](https://mmrazor.readthedocs.io/en/main/get_started/installation.html) | +[👀Model Zoo](https://mmrazor.readthedocs.io/en/main/get_started/model_zoo.html) | +[🤔Reporting Issues](https://github.com/open-mmlab/mmrazor/issues/new/choose) + +
+ + + +
+ +English | [简体中文](README_zh-CN.md) + +
+ + + + + + + + + + + +
+ +
+ +## Introduction + +MMRazor is a model compression toolkit for model slimming and AutoML, which includes 4 mainstream technologies: + +- Neural Architecture Search (NAS) +- Pruning +- Knowledge Distillation (KD) +- Quantization + +It is a part of the [OpenMMLab](https://openmmlab.com/) project. + +Major features: + +- **Compatibility** + + MMRazor can be easily applied to various projects in OpenMMLab, due to the similar architecture design of OpenMMLab as well as the decoupling of slimming algorithms and vision tasks. + +- **Flexibility** + + Different algorithms, e.g., NAS, pruning and KD, can be incorporated in a plug-n-play manner to build a more powerful system. + +- **Convenience** + + With better modular design, developers can implement new model compression algorithms with only a few codes, or even by simply modifying config files. + +About MMRazor's design and implementation, please refer to [tutorials](https://mmrazor.readthedocs.io/en/main/get_started/overview.html) for more details. + +## Latest Updates + +**The default branch is now `main` and the code on the branch has been upgraded to v1.0.0. The old `master` branch code now exists on the 0.x branch** + +MMRazor v1.0.0 was released in 2023-4-24, Major updates from 1.0.0rc2 include: + +1. MMRazor quantization is released. +2. Add a new pruning algorithm named GroupFisher. +3. Support distilling rtmdet with MMRazor. + +To know more about the updates in MMRazor 1.0, please refer to [Changelog](https://mmrazor.readthedocs.io/en/main/notes/changelog.html) for more details! + +## Benchmark and model zoo + +Results and models are available in the [model zoo](https://mmrazor.readthedocs.io/en/main/get_started/model_zoo.html). + +Supported algorithms: + +
+Neural Architecture Search + +- [x] [DARTS(ICLR'2019)](configs/nas/mmcls/darts) + +- [x] [DetNAS(NeurIPS'2019)](configs/nas/mmdet/detnas) + +- [x] [SPOS(ECCV'2020)](configs/nas/mmcls/spos) + +
+ +
+Pruning + +- [x] [AutoSlim(NeurIPS'2019)](/configs/pruning/mmcls/autoslim) + +- [x] [L1-norm](/configs/pruning/mmcls/l1-norm) + +- [x] [Group Fisher](/configs/pruning/base/group_fisher) + +- [x] [DMCP](/configs/pruning/mmcls/dmcp) + +
+ +
+Knowledge Distillation + +- [x] [CWD(ICCV'2021)](/configs/distill/mmdet/cwd) + +- [x] [WSLD(ICLR'2021)](/configs/distill/mmcls/wsld) + +- [x] [ABLoss](/configs/distill/mmcls/abloss) + +- [x] [BYOT](/configs/distill/mmcls/byot) + +- [x] [DAFL](/configs/distill/mmcls/dafl) + +- [x] [DFAD](/configs/distill/mmcls/dfad) + +- [x] [DKD](/configs/distill/mmcls/dkd) + +- [x] [Factor Transfer](/configs/distill/mmcls/factor_transfer) + +- [x] [FitNets](/configs/distill/mmcls/fitnets) + +- [x] [KD](/configs/distill/mmcls/kd) + +- [x] [OFD](/configs/distill/mmcls/ofd) + +- [x] [RKD](/configs/distill/mmcls/rkd) + +- [x] [ZSKT](/configs/distill/mmcls/zskt) + +- [x] [FBKD](/configs/distill/mmdet/fbkd) + +
+ +
+Quantization + +- [x] [PTQ](/configs/quantization/ptq/base) + +- [x] [QAT](/configs/quantization/qat/base) + +- [x] [LSQ](/configs/quantization/qat/lsq) + +
+ +## Installation + +MMRazor depends on [PyTorch](https://pytorch.org/), [MMCV](https://github.com/open-mmlab/mmcv) and [MMEngine](https://github.com/open-mmlab/mmengine). + +Please refer to [installation.md](https://mmrazor.readthedocs.io/en/main/get_started/installation.html) for more detailed instruction. + +## Getting Started + +Please refer to [user guides](https://mmrazor.readthedocs.io/en/main/user_guides/index.html) for the basic usage of MMRazor. There are also [advanced guides](https://mmrazor.readthedocs.io/en/main/advanced_guides/index.html): + +## Contributing + +We appreciate all contributions to improve MMRazor. +Please refer to [CONTRUBUTING.md](https://mmrazor.readthedocs.io/en/main/notes/contribution_guide.html) for the contributing guideline. + +## Acknowledgement + +MMRazor is an open source project that is contributed by researchers and engineers from various colleges and companies. We appreciate all the contributors who implement their methods or add new features, as well as users who give valuable feedbacks. +We wish that the toolbox and benchmark could serve the growing research community by providing a flexible toolkit to reimplement existing methods and develop their own new model compression methods. + +## Citation + +If you find this project useful in your research, please consider cite: + +```BibTeX +@misc{2021mmrazor, + title={OpenMMLab Model Compression Toolbox and Benchmark}, + author={MMRazor Contributors}, + howpublished = {\url{https://github.com/open-mmlab/mmrazor}}, + year={2021} +} +``` + +## License + +This project is released under the [Apache 2.0 license](LICENSE). + +## Projects in OpenMMLab + +- [MMCV](https://github.com/open-mmlab/mmcv): OpenMMLab foundational library for computer vision. +- [MIM](https://github.com/open-mmlab/mim): MIM installs OpenMMLab packages. +- [MMClassification](https://github.com/open-mmlab/mmclassification): OpenMMLab image classification toolbox and benchmark. 
+- [MMDetection](https://github.com/open-mmlab/mmdetection): OpenMMLab detection toolbox and benchmark. +- [MMDetection3D](https://github.com/open-mmlab/mmdetection3d): OpenMMLab's next-generation platform for general 3D object detection. +- [MMRotate](https://github.com/open-mmlab/mmrotate): OpenMMLab rotated object detection toolbox and benchmark. +- [MMYOLO](https://github.com/open-mmlab/mmyolo): OpenMMLab YOLO series toolbox and benchmark. +- [MMSegmentation](https://github.com/open-mmlab/mmsegmentation): OpenMMLab semantic segmentation toolbox and benchmark. +- [MMOCR](https://github.com/open-mmlab/mmocr): OpenMMLab text detection, recognition, and understanding toolbox. +- [MMPose](https://github.com/open-mmlab/mmpose): OpenMMLab pose estimation toolbox and benchmark. +- [MMHuman3D](https://github.com/open-mmlab/mmhuman3d): OpenMMLab 3D human parametric model toolbox and benchmark. +- [MMSelfSup](https://github.com/open-mmlab/mmselfsup): OpenMMLab self-supervised learning toolbox and benchmark. +- [MMRazor](https://github.com/open-mmlab/mmrazor): OpenMMLab model compression toolbox and benchmark. +- [MMFewShot](https://github.com/open-mmlab/mmfewshot): OpenMMLab fewshot learning toolbox and benchmark. +- [MMAction2](https://github.com/open-mmlab/mmaction2): OpenMMLab's next-generation action understanding toolbox and benchmark. +- [MMTracking](https://github.com/open-mmlab/mmtracking): OpenMMLab video perception toolbox and benchmark. +- [MMFlow](https://github.com/open-mmlab/mmflow): OpenMMLab optical flow toolbox and benchmark. +- [MMEditing](https://github.com/open-mmlab/mmediting): OpenMMLab image and video editing toolbox. +- [MMGeneration](https://github.com/open-mmlab/mmgeneration): OpenMMLab image and video generative models toolbox. +- [MMDeploy](https://github.com/open-mmlab/mmdeploy): OpenMMLab model deployment framework. 
diff --git a/cv/distiller/CWD/pytorch/mmrazor/README_zh-CN.md b/cv/distiller/CWD/pytorch/mmrazor/README_zh-CN.md new file mode 100644 index 0000000000000000000000000000000000000000..fc59086fb581ea1b1b98b00ea963f7e6329e6b9c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/README_zh-CN.md @@ -0,0 +1,226 @@ +
+ +
 
+
+ OpenMMLab 官网 + + + HOT + + +      + OpenMMLab 开放平台 + + + TRY IT OUT + + +
+
 
+ + + +[![PyPI](https://img.shields.io/pypi/v/mmrazor)](https://pypi.org/project/mmrazor) +[![docs](https://img.shields.io/badge/docs-latest-blue)](https://mmrazor.readthedocs.io/en/main/) +[![badge](https://github.com/open-mmlab/mmrazor/workflows/build/badge.svg)](https://github.com/open-mmlab/mmrazor/actions) +[![codecov](https://codecov.io/gh/open-mmlab/mmrazor/branch/master/graph/badge.svg)](https://codecov.io/gh/open-mmlab/mmrazor) +[![license](https://img.shields.io/github/license/open-mmlab/mmrazor.svg)](https://github.com/open-mmlab/mmrazor/blob/master/LICENSE) +[![open issues](https://isitmaintained.com/badge/open/open-mmlab/mmrazor.svg)](https://github.com/open-mmlab/mmrazor/issues) +[![issue resolution](https://isitmaintained.com/badge/resolution/open-mmlab/mmrazor.svg)](https://github.com/open-mmlab/mmrazor/issues) + + + + + +[📘使用文档](https://mmrazor.readthedocs.io/en/main/) | +[🛠️安装教程](https://mmrazor.readthedocs.io/en/main/get_started/installation.html) | +[👀👀模型库](https://mmrazor.readthedocs.io/en/main/get_started/model_zoo.html) | +[🤔报告问题](https://github.com/open-mmlab/mmrazor/issues/new/choose) + +
+ + + +
+ +[English](/README.md) | 简体中文 + +
+ +## 说明 + +MMRazor是一个可用于模型瘦身和AutoML的模型压缩工具箱,包含了4种主流的技术: + +- 网络结构搜索 (NAS) +- 模型剪枝 +- 知识蒸馏 (KD) +- 量化 + +MMRazor是[OpenMMLab](https://openmmlab.com/)项目的一部分。 + +主要特性 + +- **兼容性** + + MMRazor和OpenMMLab有着类似的架构设计,并且实现了轻量化算法和视觉任务间轻耦合,因此很容易应用于OpenMMLab中其他的项目。 + +- **灵活性** + + 多种轻量化算法可以以一种即插即用的方式来组合使用,从而搭建出功能更强大的系统。 + +- **便利性** + + 得益于更好的模块化设计,开发者仅用修改少量代码,甚至只用修改配置文件即可实现新的轻量化算法。 + +关于MMRazor设计和实现的概括图, 如果想了解更多的细节,请参考 [tutorials](/docs/en/tutorials/Tutorial_1_overview.md)。 + +## 近期更新 + +**默认分支目前为 main,且分支上的代码已经切换到 v1.0.0 版本。旧版 master 分支的代码现存在 0.x 分支上** + +## 更新日志 + +MMRazor v0.3.1 版本已经在 2022.5.4 发布。 + +## 基准测试和模型库 + +测试结果可以在 [模型库](https://mmrazor.readthedocs.io/en/main/get_started/model_zoo.html) 中找到. + +已经支持的算法: + +Neural Architecture Search + +- [x] [DARTS(ICLR'2019)](configs/nas/darts) + +- [x] [DetNAS(NeurIPS'2019)](configs/nas/detnas) + +- [x] [SPOS(ECCV'2020)](configs/nas/spos) + +Pruning + +- [x] [AutoSlim(NeurIPS'2019)](/configs/pruning/mmcls/autoslim) + +- [x] [L1-norm](/configs/pruning/mmcls/l1-norm) + +- [x] [Group Fisher](/configs/pruning/base/group_fisher) + +- [x] [DMCP](/configs/pruning/mmcls/dmcp) + +Knowledge Distillation + +- [x] [CWD(ICCV'2021)](/configs/distill/mmdet/cwd) + +- [x] [WSLD(ICLR'2021)](/configs/distill/mmcls/wsld) + +- [x] [ABLoss](/configs/distill/mmcls/abloss) + +- [x] [BYOT](/configs/distill/mmcls/byot) + +- [x] [DAFL](/configs/distill/mmcls/dafl) + +- [x] [DFAD](/configs/distill/mmcls/dfad) + +- [x] [DKD](/configs/distill/mmcls/dkd) + +- [x] [Factor Transfer](/configs/distill/mmcls/factor_transfer) + +- [x] [FitNets](/configs/distill/mmcls/fitnets) + +- [x] [KD](/configs/distill/mmcls/kd) + +- [x] [OFD](/configs/distill/mmcls/ofd) + +- [x] [RKD](/configs/distill/mmcls/rkd) + +- [x] [ZSKT](/configs/distill/mmcls/zskt) + +- [x] [FBKD](/configs/distill/mmdet/fbkd) + +
+Quantization + +- [x] [PTQ](/configs/quantization/ptq/base) + +- [x] [QAT](/configs/quantization/qat/base) + +- [x] [LSQ](/configs/quantization/qat/lsq) + +
+ +## 安装 + +MMRazor 依赖 [PyTorch](https://pytorch.org/) 和 [MMCV](https://github.com/open-mmlab/mmcv)。 + +请参考[安装教程](https://mmrazor.readthedocs.io/en/main/get_started/installation.html)获取更详细的安装指南。 + +## 快速入门 + +请参考 [用户指引](https://mmrazor.readthedocs.io/en/main/user_guides/index.html) 学习 MMRazor 的基本使用。 我们也提供了一些[进阶教程](https://mmrazor.readthedocs.io/en/main/advanced_guides/index.html): + +## 贡献指南 + +我们感谢所有的贡献者为改进和提升 MMRazor 所作出的努力。 +请参考[贡献指南](https://mmrazor.readthedocs.io/en/main/notes/contribution_guide.html)来了解参与项目贡献的相关指引。 + +## 致谢 + +MMRazor 是一款由来自不同高校和企业的研发人员共同参与贡献的开源项目。我们感谢所有为项目提供算法复现和新功能支持的贡献者,以及提供宝贵反馈的用户。 我们希望这个工具箱和基准测试可以为社区提供灵活的代码工具,供用户复现已有算法并开发自己的新模型压缩算法,从而不断为开源社区提供贡献。 + +## 引用 + +如果您发现此项目对您的研究有用,请考虑引用: + +```BibTeX +@misc{2021mmrazor, + title={OpenMMLab Model Compression Toolbox and Benchmark}, + author={MMRazor Contributors}, + howpublished = {\url{https://github.com/open-mmlab/mmrazor}}, + year={2021} +} +``` + +## 开源许可证 + +该项目采用 [Apache 2.0 开源许可证](LICENSE)。 + +## OpenMMLab 的其他项目 + +- [MMCV](https://github.com/open-mmlab/mmcv): OpenMMLab 计算机视觉基础库 +- [MIM](https://github.com/open-mmlab/mim): MIM 是 OpenMMlab 项目、算法、模型的统一入口 +- [MMClassification](https://github.com/open-mmlab/mmclassification): OpenMMLab 图像分类工具箱 +- [MMDetection](https://github.com/open-mmlab/mmdetection): OpenMMLab 目标检测工具箱 +- [MMDetection3D](https://github.com/open-mmlab/mmdetection3d): OpenMMLab 新一代通用 3D 目标检测平台 +- [MMRotate](https://github.com/open-mmlab/mmrotate): OpenMMLab 旋转框检测工具箱与测试基准 +- [MMYOLO](https://github.com/open-mmlab/mmyolo): OpenMMLab YOLO 系列工具箱与测试基准 +- [MMSegmentation](https://github.com/open-mmlab/mmsegmentation): OpenMMLab 语义分割工具箱 +- [MMOCR](https://github.com/open-mmlab/mmocr): OpenMMLab 全流程文字检测识别理解工具箱 +- [MMPose](https://github.com/open-mmlab/mmpose): OpenMMLab 姿态估计工具箱 +- [MMHuman3D](https://github.com/open-mmlab/mmhuman3d): OpenMMLab 人体参数化模型工具箱与测试基准 +- [MMSelfSup](https://github.com/open-mmlab/mmselfsup): OpenMMLab 自监督学习工具箱与测试基准 +- 
[MMRazor](https://github.com/open-mmlab/mmrazor): OpenMMLab 模型压缩工具箱与测试基准 +- [MMFewShot](https://github.com/open-mmlab/mmfewshot): OpenMMLab 少样本学习工具箱与测试基准 +- [MMAction2](https://github.com/open-mmlab/mmaction2): OpenMMLab 新一代视频理解工具箱 +- [MMTracking](https://github.com/open-mmlab/mmtracking): OpenMMLab 一体化视频目标感知平台 +- [MMFlow](https://github.com/open-mmlab/mmflow): OpenMMLab 光流估计工具箱与测试基准 +- [MMEditing](https://github.com/open-mmlab/mmediting): OpenMMLab 图像视频编辑工具箱 +- [MMGeneration](https://github.com/open-mmlab/mmgeneration): OpenMMLab 图片视频生成模型工具箱 +- [MMDeploy](https://github.com/open-mmlab/mmdeploy): OpenMMLab 模型部署框架 + +## 欢迎加入 OpenMMLab 社区 + +扫描下方的二维码可关注 OpenMMLab 团队的 [知乎官方账号](https://www.zhihu.com/people/openmmlab),加入 OpenMMLab 团队的 [官方交流 QQ 群](https://jq.qq.com/?_wv=1027&k=aCvMxdr3),添加OpenMMLab 官方小助手微信,加入 MMSelfSup 微信社区。 + +
+ +
+ +我们会在 OpenMMLab 社区为大家 + +- 📢 分享 AI 框架的前沿核心技术 +- 💻 解读 PyTorch 常用模块源码 +- 📰 发布 OpenMMLab 的相关新闻 +- 🚀 介绍 OpenMMLab 开发的前沿算法 +- 🏃 获取更高效的问题答疑和意见反馈 +- 🔥 提供与各行各业开发者充分交流的平台 + +干货满满 📘,等你来撩 💗,OpenMMLab 社区期待您的加入 👬 diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/datasets/mmcls/cifar100_bs16_auto_aug.py b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/datasets/mmcls/cifar100_bs16_auto_aug.py new file mode 100644 index 0000000000000000000000000000000000000000..46c31ac72f14f120453a53d21b5f5db2feeee0b0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/datasets/mmcls/cifar100_bs16_auto_aug.py @@ -0,0 +1,50 @@ +_base_ = ['./pipelines/auto_aug_cifar.py'] + +# dataset settings +dataset_type = 'CIFAR100' +preprocess_cfg = dict( + # RGB format normalization parameters + mean=[129.304, 124.070, 112.434], + std=[68.170, 65.392, 70.418], + # loaded images are already RGB format + to_rgb=False) + +train_pipeline = [ + dict(type='RandomCrop', crop_size=32, padding=4), + dict(type='RandomFlip', prob=0.5, direction='horizontal'), + dict(type='Cutout', shape=16, pad_val=0), + dict(type='AutoAugment', policies={{_base_.policy_cifar}}), + dict(type='PackClsInputs'), +] + +test_pipeline = [ + dict(type='PackClsInputs'), +] + +train_dataloader = dict( + batch_size=16, + num_workers=2, + dataset=dict( + type=dataset_type, + data_prefix='data/cifar100', + test_mode=False, + pipeline=train_pipeline), + sampler=dict(type='DefaultSampler', shuffle=True), + persistent_workers=True, +) + +val_dataloader = dict( + batch_size=16, + num_workers=2, + dataset=dict( + type=dataset_type, + data_prefix='data/cifar100/', + test_mode=True, + pipeline=test_pipeline), + sampler=dict(type='DefaultSampler', shuffle=False), + persistent_workers=True, +) +val_evaluator = dict(type='Accuracy', topk=(1, 5)) + +test_dataloader = val_dataloader +test_evaluator = val_evaluator diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/datasets/mmcls/pipelines/auto_aug_cifar.py 
b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/datasets/mmcls/pipelines/auto_aug_cifar.py new file mode 100644 index 0000000000000000000000000000000000000000..4767a8fe1084bd59cd1b2e8b67b131b4f8c9ef0b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/datasets/mmcls/pipelines/auto_aug_cifar.py @@ -0,0 +1,125 @@ +# Policy for CIFAR, refer to +# https://github.com/DeepVoltaire/AutoAugment/blame/master/autoaugment.py +policy_cifar = [ + # Group 1 + [ + dict(type='Invert', prob=0.1), + dict(type='Contrast', magnitude=0.5, prob=0.2) + ], + [ + dict(type='Rotate', angle=10., prob=0.7), + dict(type='Translate', magnitude=150 / 331, prob=0.3) + ], + [ + dict(type='Sharpness', magnitude=0.9, prob=0.8), + dict(type='Sharpness', magnitude=0.3, prob=0.9) + ], + [ + dict( + type='Shear', + magnitude=0.3 / 9 * 8, + direction='vertical', + prob=0.5), + dict( + type='Translate', + magnitude=150 / 331, + direction='vertical', + prob=0.3) + ], + [dict(type='AutoContrast', prob=0.5), + dict(type='Equalize', prob=0.9)], + # Group 2 + [ + dict( + type='Shear', + magnitude=0.3 / 9 * 7, + direction='vertical', + prob=0.2), + dict(type='Posterize', bits=5, prob=0.3) + ], + [ + dict(type='ColorTransform', magnitude=0.3, prob=0.4), + dict(type='Brightness', magnitude=0.7, prob=0.7) + ], + [ + dict(type='Sharpness', magnitude=1.0, prob=0.3), + dict(type='Brightness', magnitude=1.0, prob=0.7) + ], + [dict(type='Equalize', prob=0.6), + dict(type='Equalize', prob=0.5)], + [ + dict(type='Contrast', magnitude=0.6, prob=0.6), + dict(type='Sharpness', magnitude=0.4, prob=0.8), + ], + # Group 3 + [ + dict(type='ColorTransform', magnitude=0.6, prob=0.7), + dict(type='Translate', magnitude=150 / 331 / 9 * 7, prob=0.5) + ], + [dict(type='Equalize', prob=0.3), + dict(type='AutoContrast', prob=0.4)], + [ + dict( + type='Translate', + magnitude=150 / 331 / 9 * 2, + direction='vertical', + prob=0.4), + dict(type='Sharpness', magnitude=0.5, prob=0.2) + ], + [ + dict(type='Brightness', 
magnitude=0.5, prob=0.9), + dict(type='ColorTransform', magnitude=0.7, prob=0.2), + ], + [ + dict(type='Solarize', thr=256 / 9 * 7, prob=0.5), + dict(type='Invert', prob=0.0), + ], + # Group 4 + [dict(type='Equalize', prob=0.2), + dict(type='AutoContrast', prob=0.6)], + [dict(type='Equalize', prob=0.2), + dict(type='Equalize', prob=0.6)], + [ + dict(type='ColorTransform', magnitude=0.9, prob=0.9), + dict(type='Equalize', prob=0.6) + ], + [ + dict(type='AutoContrast', prob=0.8), + dict(type='Solarize', thr=256 / 9 * 1, prob=0.2), + ], + [ + dict(type='Brightness', magnitude=0.3, prob=0.1), + dict(type='ColorTransform', magnitude=0.0, prob=0.7) + ], + # Group 5 + [ + dict(type='Solarize', thr=256 / 9 * 4, prob=0.4), + dict(type='AutoContrast', prob=0.9) + ], + [ + dict( + type='Translate', + magnitude=150 / 331, + direction='vertical', + prob=0.9), + dict( + type='Translate', + magnitude=150 / 331, + direction='vertical', + prob=0.7) + ], + [ + dict(type='AutoContrast', prob=0.9), + dict(type='Solarize', thr=256 / 9 * 6, prob=0.8) + ], + [dict(type='Equalize', prob=0.8), + dict(type='Invert', prob=0.1)], + [ + dict( + type='Translate', + magnitude=150 / 331, + direction='vertical', + prob=0.7), + dict(type='AutoContrast', prob=0.9) + ] +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/nas_backbones/attentive_mobilenetv3_supernet.py b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/nas_backbones/attentive_mobilenetv3_supernet.py new file mode 100644 index 0000000000000000000000000000000000000000..5e5af29ad2bc538fd0caad1cb4ef0617cd958c62 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/nas_backbones/attentive_mobilenetv3_supernet.py @@ -0,0 +1,49 @@ +# search space +arch_setting = dict( + kernel_size=[ # [min_kernel_size, max_kernel_size, step] + [3, 5, 2], + [3, 5, 2], + [3, 5, 2], + [3, 5, 2], + [3, 5, 2], + [3, 5, 2], + [3, 5, 2], + ], + num_blocks=[ # [min_num_blocks, max_num_blocks, step] + [1, 2, 1], + [3, 5, 1], + [3, 6, 1], + 
[3, 6, 1], + [3, 8, 1], + [3, 8, 1], + [1, 2, 1], + ], + expand_ratio=[ # [min_expand_ratio, max_expand_ratio, step] + [1, 1, 1], + [4, 6, 1], + [4, 6, 1], + [4, 6, 1], + [4, 6, 1], + [6, 6, 1], + [6, 6, 1], + [6, 6, 1], # last layer + ], + num_out_channels=[ # [min_channel, max_channel, step] + [16, 24, 8], # first layer + [16, 24, 8], + [24, 32, 8], + [32, 40, 8], + [64, 72, 8], + [112, 128, 8], + [192, 216, 8], + [216, 224, 8], + [1792, 1984, 1984 - 1792], # last layer + ]) + +input_resizer_cfg = dict( + input_sizes=[[192, 192], [224, 224], [256, 256], [288, 288]]) + +nas_backbone = dict( + type='AttentiveMobileNetV3', + arch_setting=arch_setting, + norm_cfg=dict(type='DynamicBatchNorm2d', momentum=0.0)) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/nas_backbones/darts_supernet.py b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/nas_backbones/darts_supernet.py new file mode 100644 index 0000000000000000000000000000000000000000..36cec1328f566071a4f9bc655b5924b5c3bd02b1 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/nas_backbones/darts_supernet.py @@ -0,0 +1,31 @@ +mutable_cfg = dict( + type='mmrazor.DiffMutableOP', + candidates=dict( + zero=dict(type='mmrazor.DartsZero'), + skip_connect=dict(type='mmrazor.DartsSkipConnect', use_drop_path=True), + max_pool_3x3=dict( + type='mmrazor.DartsPoolBN', pool_type='max', use_drop_path=True), + avg_pool_3x3=dict( + type='mmrazor.DartsPoolBN', pool_type='avg', use_drop_path=True), + sep_conv_3x3=dict( + type='mmrazor.DartsSepConv', kernel_size=3, use_drop_path=True), + sep_conv_5x5=dict( + type='mmrazor.DartsSepConv', kernel_size=5, use_drop_path=True), + dil_conv_3x3=dict( + type='mmrazor.DartsDilConv', kernel_size=3, use_drop_path=True), + dil_conv_5x5=dict( + type='mmrazor.DartsDilConv', kernel_size=5, use_drop_path=True))) + +route_cfg = dict(type='mmrazor.DiffChoiceRoute', with_arch_param=True) + +nas_backbone = dict( + type='mmrazor.DartsBackbone', + in_channels=3, + 
base_channels=16, + num_layers=8, + num_nodes=4, + stem_multiplier=3, + out_indices=(7, ), + mutable_cfg=mutable_cfg, + route_cfg=route_cfg, + norm_cfg=dict(type='BN', affine=False)) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/nas_backbones/dsnas_shufflenet_supernet.py b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/nas_backbones/dsnas_shufflenet_supernet.py new file mode 100644 index 0000000000000000000000000000000000000000..f73c8b90edf9f090474f6dede679f250086f24cf --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/nas_backbones/dsnas_shufflenet_supernet.py @@ -0,0 +1,28 @@ +norm_cfg = dict(type='BN', eps=0.01) + +_STAGE_MUTABLE = dict( + type='mmrazor.OneHotMutableOP', + fix_threshold=0.3, + candidates=dict( + shuffle_3x3=dict( + type='ShuffleBlock', kernel_size=3, norm_cfg=norm_cfg), + shuffle_5x5=dict( + type='ShuffleBlock', kernel_size=5, norm_cfg=norm_cfg), + shuffle_7x7=dict( + type='ShuffleBlock', kernel_size=7, norm_cfg=norm_cfg), + shuffle_xception=dict(type='ShuffleXception', norm_cfg=norm_cfg))) + +arch_setting = [ + # Parameters to build layers. 3 parameters are needed to construct a + # layer, from left to right: channel, num_blocks, mutable_cfg. 
+ [64, 4, _STAGE_MUTABLE], + [160, 4, _STAGE_MUTABLE], + [320, 8, _STAGE_MUTABLE], + [640, 4, _STAGE_MUTABLE] +] + +nas_backbone = dict( + type='mmrazor.SearchableShuffleNetV2', + widen_factor=1.0, + arch_setting=arch_setting, + norm_cfg=norm_cfg) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/nas_backbones/ofa_mobilenetv3_supernet.py b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/nas_backbones/ofa_mobilenetv3_supernet.py new file mode 100644 index 0000000000000000000000000000000000000000..d39d7fdfbc99bd9b8128d77986a1f05747fda824 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/nas_backbones/ofa_mobilenetv3_supernet.py @@ -0,0 +1,57 @@ +# search space +arch_setting = dict( + kernel_size=[ # [min_kernel_size, max_kernel_size, step] + [3, 3, 1], + [3, 7, 2], + [3, 7, 2], + [3, 7, 2], + [3, 7, 2], + [3, 7, 2], + ], + num_blocks=[ # [min_num_blocks, max_num_blocks, step] + [1, 1, 1], + [2, 4, 1], + [2, 4, 1], + [2, 4, 1], + [2, 4, 1], + [2, 4, 1], + ], + expand_ratio=[ # [min_expand_ratio, max_expand_ratio, step] + [1, 1, 1], + [3, 6, 1], + [3, 6, 1], + [3, 6, 1], + [3, 6, 1], + [3, 6, 1], + [6, 6, 1], # last layer + ], + # [16, 16, 24, 40, 80, 112, 160, 960, 1280] + num_out_channels=[ # [min_channel, max_channel, step] + [16, 16, 8], # first layer + [16, 16, 8], + [16, 24, 8], + [24, 40, 8], + [40, 80, 8], + [80, 112, 8], + [112, 160, 8], + [1024, 1280, 1280 - 1024], # last layer + ]) + +input_resizer_cfg = dict( + input_sizes=[[128, 128], [140, 140], [144, 144], [152, 152], [192, 192], + [204, 204], [224, 224], [256, 256]]) + +nas_backbone = dict( + type='mmrazor.AttentiveMobileNetV3', + arch_setting=arch_setting, + out_indices=(6, ), + stride_list=[1, 2, 2, 2, 1, 2], + with_se_list=[False, False, True, False, True, True], + act_cfg_list=[ + 'HSwish', 'ReLU', 'ReLU', 'ReLU', 'HSwish', 'HSwish', 'HSwish', + 'HSwish', 'HSwish' + ], + conv_cfg=dict(type='OFAConv2d'), + norm_cfg=dict(type='mmrazor.DynamicBatchNorm2d', momentum=0.1), + 
fine_grained_mode=True, + with_attentive_shortcut=False) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/nas_backbones/spos_mobilenet_supernet.py b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/nas_backbones/spos_mobilenet_supernet.py new file mode 100644 index 0000000000000000000000000000000000000000..f65ef88d89cd1dd608897ce9cd17b3225662cf76 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/nas_backbones/spos_mobilenet_supernet.py @@ -0,0 +1,65 @@ +_STAGE_MUTABLE = dict( + _scope_='mmrazor', + type='OneShotMutableOP', + candidates=dict( + mb_k3e3=dict( + type='MBBlock', + kernel_size=3, + expand_ratio=3, + act_cfg=dict(type='ReLU6')), + mb_k5e3=dict( + type='MBBlock', + kernel_size=5, + expand_ratio=3, + act_cfg=dict(type='ReLU6')), + mb_k7e3=dict( + type='MBBlock', + kernel_size=7, + expand_ratio=3, + act_cfg=dict(type='ReLU6')), + mb_k3e6=dict( + type='MBBlock', + kernel_size=3, + expand_ratio=6, + act_cfg=dict(type='ReLU6')), + mb_k5e6=dict( + type='MBBlock', + kernel_size=5, + expand_ratio=6, + act_cfg=dict(type='ReLU6')), + mb_k7e6=dict( + type='MBBlock', + kernel_size=7, + expand_ratio=6, + act_cfg=dict(type='ReLU6')), + identity=dict(type='Identity'))) + +_FIRST_MUTABLE = dict( + _scope_='mmrazor', + type='OneShotMutableOP', + candidates=dict( + mb_k3e1=dict( + type='MBBlock', + kernel_size=3, + expand_ratio=1, + act_cfg=dict(type='ReLU6')))) + +arch_setting = [ + # Parameters to build layers. 3 parameters are needed to construct a + # layer, from left to right: channel, num_blocks, stride, mutable_cfg. 
+ [24, 1, 1, _FIRST_MUTABLE], + [32, 4, 2, _STAGE_MUTABLE], + [56, 4, 2, _STAGE_MUTABLE], + [112, 4, 2, _STAGE_MUTABLE], + [128, 4, 1, _STAGE_MUTABLE], + [256, 4, 2, _STAGE_MUTABLE], + [432, 1, 1, _STAGE_MUTABLE] +] + +nas_backbone = dict( + _scope_='mmrazor', + type='SearchableMobileNetV2', + first_channels=40, + last_channels=1728, + widen_factor=1.0, + arch_setting=arch_setting) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/nas_backbones/spos_shufflenet_supernet.py b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/nas_backbones/spos_shufflenet_supernet.py new file mode 100644 index 0000000000000000000000000000000000000000..6f57e8acf40307849d34bd9b756bc41cf670b0af --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/nas_backbones/spos_shufflenet_supernet.py @@ -0,0 +1,23 @@ +_STAGE_MUTABLE = dict( + _scope_='mmrazor', + type='OneShotMutableOP', + candidates=dict( + shuffle_3x3=dict(type='ShuffleBlock', kernel_size=3), + shuffle_5x5=dict(type='ShuffleBlock', kernel_size=5), + shuffle_7x7=dict(type='ShuffleBlock', kernel_size=7), + shuffle_xception=dict(type='ShuffleXception'))) + +arch_setting = [ + # Parameters to build layers. 3 parameters are needed to construct a + # layer, from left to right: channel, num_blocks, mutable_cfg. 
+ [64, 4, _STAGE_MUTABLE], + [160, 4, _STAGE_MUTABLE], + [320, 8, _STAGE_MUTABLE], + [640, 4, _STAGE_MUTABLE] +] + +nas_backbone = dict( + _scope_='mmrazor', + type='SearchableShuffleNetV2', + widen_factor=1.0, + arch_setting=arch_setting) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/settings/cifar10_darts_subnet.py b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/settings/cifar10_darts_subnet.py new file mode 100644 index 0000000000000000000000000000000000000000..5eacb9510d0820846069d5704be3a9ab26122453 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/settings/cifar10_darts_subnet.py @@ -0,0 +1,68 @@ +# dataset settings +dataset_type = 'mmcls.CIFAR10' +preprocess_cfg = dict( + # RGB format normalization parameters + mean=[125.307, 122.961, 113.8575], + std=[51.5865, 50.847, 51.255], + # loaded images are already RGB format + to_rgb=False) + +train_pipeline = [ + dict(type='mmcls.RandomCrop', crop_size=32, padding=4), + dict(type='mmcls.RandomFlip', prob=0.5, direction='horizontal'), + dict(type='mmcls.Cutout', shape=16, pad_val=0, prob=1.0), + dict(type='mmcls.PackClsInputs'), +] + +test_pipeline = [ + dict(type='mmcls.PackClsInputs'), +] + +train_dataloader = dict( + batch_size=96, + num_workers=2, + dataset=dict( + type=dataset_type, + data_prefix='data/cifar10', + test_mode=False, + pipeline=train_pipeline), + sampler=dict(type='mmcls.DefaultSampler', shuffle=True), + persistent_workers=True, +) + +val_dataloader = dict( + batch_size=16, + num_workers=2, + dataset=dict( + type=dataset_type, + data_prefix='data/cifar10/', + test_mode=True, + pipeline=test_pipeline), + sampler=dict(type='DefaultSampler', shuffle=False), + persistent_workers=True, +) +val_evaluator = dict(type='mmcls.Accuracy', topk=(1, )) + +test_dataloader = val_dataloader +test_evaluator = val_evaluator + +# optimizer +optim_wrapper = dict( + optimizer=dict(type='SGD', lr=0.025, momentum=0.9, weight_decay=3e-4), + clip_grad=dict(max_norm=5, norm_type=2)) + +# 
leanring policy +param_scheduler = [ + dict( + type='CosineAnnealingLR', + T_max=600, + by_epoch=True, + begin=0, + end=600, + ) +] + +# train, val, test setting +train_cfg = dict(by_epoch=True, max_epochs=600) +val_cfg = dict() # validate each epoch +test_cfg = dict() # dataset settings diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/settings/cifar10_darts_supernet.py b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/settings/cifar10_darts_supernet.py new file mode 100644 index 0000000000000000000000000000000000000000..66fb75fe489e89379c6327af0a5004db2c71442f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/settings/cifar10_darts_supernet.py @@ -0,0 +1,88 @@ +# dataset settings +dataset_type = 'mmcls.CIFAR10' +preprocess_cfg = dict( + # RGB format normalization parameters + mean=[125.307, 122.961, 113.8575], + std=[51.5865, 50.847, 51.255], + # loaded images are already RGB format + to_rgb=False) + +train_pipeline = [ + dict(type='mmcls.RandomCrop', crop_size=32, padding=4), + dict(type='mmcls.RandomFlip', prob=0.5, direction='horizontal'), + dict(type='mmcls.PackClsInputs'), +] + +test_pipeline = [ + dict(type='mmcls.PackClsInputs'), +] + +train_dataloader = dict( + batch_size=64, + num_workers=4, + dataset=dict( + type=dataset_type, + data_prefix='data/cifar10', + indices=-25000, + test_mode=False, + pipeline=train_pipeline), + sampler=dict(type='mmcls.DefaultSampler', shuffle=True), + persistent_workers=True, +) + +val_dataloader = dict( + batch_size=128, + num_workers=4, + dataset=dict( + type=dataset_type, + data_prefix='data/cifar10/', + test_mode=True, + pipeline=test_pipeline), + sampler=dict(type='mmcls.DefaultSampler', shuffle=False), + persistent_workers=True, +) +val_evaluator = dict(type='mmcls.Accuracy', topk=(1, )) + +test_dataloader = val_dataloader +test_evaluator = val_evaluator + +# optimizer +optim_wrapper = dict( + constructor='mmrazor.SeparateOptimWrapperConstructor', + architecture=dict( + optimizer=dict( + 
type='mmcls.SGD', lr=0.025, momentum=0.9, weight_decay=3e-4), + clip_grad=dict(max_norm=5, norm_type=2)), + mutator=dict( + optimizer=dict(type='mmcls.Adam', lr=3e-4, weight_decay=1e-3))) + +search_epochs = 50 +# leanring policy +param_scheduler = [ + dict( + type='mmcls.CosineAnnealingLR', + T_max=search_epochs, + eta_min=1e-3, + begin=0, + end=search_epochs), +] + +# train, val, test setting +train_cfg = dict( + type='mmrazor.DartsEpochBasedTrainLoop', + mutator_dataloader=dict( + batch_size=64, + num_workers=4, + dataset=dict( + type=dataset_type, + data_prefix='data/cifar10', + indices=25000, + test_mode=False, + pipeline=train_pipeline), + sampler=dict(type='mmcls.DefaultSampler', shuffle=True), + persistent_workers=True, + ), + max_epochs=search_epochs) + +val_cfg = dict() # validate each epoch +test_cfg = dict() # dataset settings diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/settings/imagenet_bs1024_dsnas.py b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/settings/imagenet_bs1024_dsnas.py new file mode 100644 index 0000000000000000000000000000000000000000..bf266c51c0b80bfc4fcc7653d6ac606d4b90e24e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/settings/imagenet_bs1024_dsnas.py @@ -0,0 +1,102 @@ +# dataset settings +dataset_type = 'mmcls.ImageNet' +data_preprocessor = dict( + type='mmcls.ClsDataPreprocessor', + # RGB format normalization parameters + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + # convert image from BGR to RGB + to_rgb=True, +) + +train_pipeline = [ + dict(type='mmcls.LoadImageFromFile'), + dict(type='mmcls.RandomResizedCrop', scale=224), + dict(type='mmcls.RandomFlip', prob=0.5, direction='horizontal'), + dict(type='mmcls.PackClsInputs'), +] + +test_pipeline = [ + dict(type='mmcls.LoadImageFromFile'), + dict(type='mmcls.ResizeEdge', scale=256, edge='short'), + dict(type='mmcls.CenterCrop', crop_size=224), + dict(type='mmcls.PackClsInputs'), +] + +train_dataloader = dict( + 
batch_size=128, + num_workers=4, + dataset=dict( + type=dataset_type, + data_root='data/imagenet', + ann_file='meta/train.txt', + data_prefix='train', + pipeline=train_pipeline), + sampler=dict(type='mmcls.DefaultSampler', shuffle=True), + persistent_workers=True, +) + +val_dataloader = dict( + batch_size=128, + num_workers=4, + dataset=dict( + type=dataset_type, + data_root='data/imagenet', + ann_file='meta/val.txt', + data_prefix='val', + pipeline=test_pipeline), + sampler=dict(type='mmcls.DefaultSampler', shuffle=False), + persistent_workers=True, +) +val_evaluator = dict(type='mmcls.Accuracy', topk=(1, 5)) + +# If you want standard test, please manually configure the test dataset +test_dataloader = val_dataloader +test_evaluator = val_evaluator + +# optimizer +paramwise_cfg = dict(bias_decay_mult=0.0, norm_decay_mult=0.0) + +optim_wrapper = dict( + constructor='mmrazor.SeparateOptimWrapperConstructor', + architecture=dict( + optimizer=dict( + type='mmcls.SGD', lr=0.5, momentum=0.9, weight_decay=4e-5), + paramwise_cfg=paramwise_cfg), + mutator=dict( + optimizer=dict( + type='mmcls.Adam', lr=0.001, weight_decay=0.0, betas=(0.5, + 0.999)))) + +search_epochs = 85 +# leanring policy +param_scheduler = dict( + architecture=[ + dict( + type='mmcls.LinearLR', + end=5, + start_factor=0.2, + by_epoch=True, + convert_to_iter_based=True), + dict( + type='mmcls.CosineAnnealingLR', + T_max=240, + begin=5, + end=search_epochs, + by_epoch=True, + convert_to_iter_based=True), + dict( + type='mmcls.CosineAnnealingLR', + T_max=160, + begin=search_epochs, + end=240, + eta_min=0.0, + by_epoch=True, + convert_to_iter_based=True) + ], + mutator=[]) + +# train, val, test setting +train_cfg = dict(by_epoch=True, max_epochs=240) +val_cfg = dict() +test_cfg = dict() diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/settings/imagenet_bs1024_spos.py b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/settings/imagenet_bs1024_spos.py new file mode 100644 index 
0000000000000000000000000000000000000000..1ae2fcd606a51fd2a4b1d9a7ee0a3f2a36dfc931 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/settings/imagenet_bs1024_spos.py @@ -0,0 +1,83 @@ +# dataset settings +dataset_type = 'mmcls.ImageNet' +preprocess_cfg = dict( + # RGB format normalization parameters + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + # convert image from BGR to RGB + to_rgb=True, +) + +train_pipeline = [ + dict(_scope_='mmcls', type='LoadImageFromFile'), + dict(_scope_='mmcls', type='RandomResizedCrop', scale=224), + dict( + _scope_='mmcls', + type='ColorJitter', + brightness=0.4, + contrast=0.4, + saturation=0.4), + dict(_scope_='mmcls', type='RandomFlip', prob=0.5, direction='horizontal'), + dict(_scope_='mmcls', type='PackClsInputs'), +] + +test_pipeline = [ + dict(_scope_='mmcls', type='LoadImageFromFile'), + dict(_scope_='mmcls', type='ResizeEdge', scale=256, edge='short'), + dict(_scope_='mmcls', type='CenterCrop', crop_size=224), + dict(_scope_='mmcls', type='PackClsInputs'), +] + +train_dataloader = dict( + batch_size=128, + num_workers=4, + dataset=dict( + type=dataset_type, + data_root='data/imagenet', + ann_file='meta/train.txt', + data_prefix='train', + pipeline=train_pipeline), + sampler=dict(type='DefaultSampler', shuffle=True), + persistent_workers=True, +) + +val_dataloader = dict( + batch_size=128, + num_workers=4, + dataset=dict( + type=dataset_type, + data_root='data/imagenet', + ann_file='meta/val.txt', + data_prefix='val', + pipeline=test_pipeline), + sampler=dict(type='DefaultSampler', shuffle=False), + persistent_workers=True, +) +val_evaluator = dict(type='Accuracy', topk=(1, 5), _scope_='mmcls') + +# If you want standard test, please manually configure the test dataset +test_dataloader = val_dataloader +test_evaluator = val_evaluator + +# optimizer +paramwise_cfg = dict( + bias_decay_mult=0.0, norm_decay_mult=0.0, dwconv_decay_mult=0.0) + +optim_wrapper = dict( + optimizer=dict(type='SGD', 
lr=0.5, momentum=0.9, weight_decay=4e-5), + paramwise_cfg=paramwise_cfg, + clip_grad=None) + +# leanring policy +param_scheduler = dict( + type='PolyLR', + power=1.0, + eta_min=0.0, + by_epoch=True, + end=300, + convert_to_iter_based=True) + +# train, val, test setting +train_cfg = dict(by_epoch=True, max_epochs=300) +val_cfg = dict() +test_cfg = dict() diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/settings/imagenet_bs2048_AdamW.py b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/settings/imagenet_bs2048_AdamW.py new file mode 100644 index 0000000000000000000000000000000000000000..7b7b2909716c66ec01b229f4c57eb19e70902b96 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/settings/imagenet_bs2048_AdamW.py @@ -0,0 +1,180 @@ +# dataset settings +dataset_type = 'mmcls.ImageNet' +preprocess_cfg = dict( + # RGB format normalization parameters + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + # convert image from BGR to RGB + to_rgb=True, +) + +bgr_mean = preprocess_cfg['mean'][::-1] +bgr_std = preprocess_cfg['std'][::-1] + +# Refers to `_RAND_INCREASING_TRANSFORMS` in pytorch-image-models +rand_increasing_policies = [ + dict(type='mmcls.AutoContrast'), + dict(type='mmcls.Equalize'), + dict(type='mmcls.Invert'), + dict(type='mmcls.Rotate', magnitude_key='angle', magnitude_range=(0, 30)), + dict(type='mmcls.Posterize', magnitude_key='bits', magnitude_range=(4, 0)), + dict(type='mmcls.Solarize', magnitude_key='thr', magnitude_range=(256, 0)), + dict( + type='mmcls.SolarizeAdd', + magnitude_key='magnitude', + magnitude_range=(0, 110)), + dict( + type='mmcls.ColorTransform', + magnitude_key='magnitude', + magnitude_range=(0, 0.9)), + dict( + type='mmcls.Contrast', + magnitude_key='magnitude', + magnitude_range=(0, 0.9)), + dict( + type='mmcls.Brightness', + magnitude_key='magnitude', + magnitude_range=(0, 0.9)), + dict( + type='mmcls.Sharpness', + magnitude_key='magnitude', + magnitude_range=(0, 0.9)), + dict( + 
type='mmcls.Shear', + magnitude_key='magnitude', + magnitude_range=(0, 0.3), + direction='horizontal'), + dict( + type='mmcls.Shear', + magnitude_key='magnitude', + magnitude_range=(0, 0.3), + direction='vertical'), + dict( + type='mmcls.Translate', + magnitude_key='magnitude', + magnitude_range=(0, 0.45), + direction='horizontal'), + dict( + type='mmcls.Translate', + magnitude_key='magnitude', + magnitude_range=(0, 0.45), + direction='vertical') +] + +train_pipeline = [ + dict(type='mmcls.LoadImageFromFile'), + dict( + type='mmcls.RandomResizedCrop', + scale=224, + backend='pillow', + interpolation='bicubic'), + dict(type='mmcls.RandomFlip', prob=0.5, direction='horizontal'), + dict( + type='mmcls.RandAugment', + policies=rand_increasing_policies, + num_policies=2, + total_level=10, + magnitude_level=9, + magnitude_std=0.5, + hparams=dict( + pad_val=[round(x) for x in bgr_mean], interpolation='bicubic')), + dict( + type='mmcls.RandomErasing', + erase_prob=0.25, + mode='rand', + min_area_ratio=0.02, + max_area_ratio=1 / 3, + fill_color=bgr_mean, + fill_std=bgr_std), + dict(type='mmcls.PackClsInputs'), +] + +test_pipeline = [ + dict(type='mmcls.LoadImageFromFile'), + dict( + type='mmcls.ResizeEdge', + scale=248, + edge='short', + backend='pillow', + interpolation='bicubic'), + dict(type='mmcls.CenterCrop', crop_size=224), + dict(type='mmcls.PackClsInputs') +] + +train_dataloader = dict( + batch_size=64, + num_workers=6, + dataset=dict( + type=dataset_type, + data_root='data/imagenet', + ann_file='meta/train.txt', + data_prefix='train', + pipeline=train_pipeline), + sampler=dict(type='mmcls.RepeatAugSampler'), + persistent_workers=True, +) + +val_dataloader = dict( + batch_size=256, + num_workers=6, + dataset=dict( + type=dataset_type, + data_root='data/imagenet', + ann_file='meta/val.txt', + data_prefix='val', + pipeline=test_pipeline), + sampler=dict(type='mmcls.DefaultSampler', shuffle=False), + persistent_workers=True, +) +val_evaluator = 
dict(type='mmcls.Accuracy', topk=(1, 5)) + +# If you want standard test, please manually configure the test dataset +test_dataloader = val_dataloader +test_evaluator = val_evaluator + +# optimizer +paramwise_cfg = dict( + bias_decay_mult=0.0, norm_decay_mult=0.0, dwconv_decay_mult=0.0) + +optim_wrapper = dict( + optimizer=dict( + type='AdamW', + lr=0.002, + weight_decay=0.05, + eps=1e-8, + betas=(0.9, 0.999)), + # specific to vit pretrain + paramwise_cfg=dict(custom_keys={ + '.cls_token': dict(decay_mult=0.0), + '.pos_embed': dict(decay_mult=0.0) + })) + +# leanring policy +param_scheduler = [ + # warm up learning rate scheduler + dict( + type='LinearLR', + start_factor=1e-3, + by_epoch=True, + begin=0, + # about 10000 iterations for ImageNet-1k + end=20, + # update by iter + convert_to_iter_based=True), + # main learning rate scheduler + dict( + type='CosineAnnealingLR', + T_max=500, + eta_min=1e-5, + by_epoch=True, + begin=20, + end=500, + convert_to_iter_based=True), +] + +# train, val, test setting +train_cfg = dict(by_epoch=True, max_epochs=500) +val_cfg = dict() +test_cfg = dict() + +auto_scale_lr = dict(base_batch_size=2048) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/settings/imagenet_bs2048_autoslim.py b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/settings/imagenet_bs2048_autoslim.py new file mode 100644 index 0000000000000000000000000000000000000000..1202587637eea0bac4bb0fb4830f0376ef9dc551 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/settings/imagenet_bs2048_autoslim.py @@ -0,0 +1,15 @@ +_base_ = [ + './imagenet_bs1024_spos.py', +] + +_RandomResizedCrop_cfg = _base_.train_dataloader.dataset.pipeline[1] +assert _RandomResizedCrop_cfg.type == 'RandomResizedCrop' +_RandomResizedCrop_cfg.crop_ratio_range = (0.25, 1.0) + +optim_wrapper = dict(optimizer=dict(weight_decay=1e-4, nesterov=True)) + +train_dataloader = dict(batch_size=256) + +val_dataloader = dict(batch_size=256) + +test_dataloader = dict(batch_size=256) 
diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/settings/imagenet_bs2048_autoslim_pil.py b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/settings/imagenet_bs2048_autoslim_pil.py new file mode 100644 index 0000000000000000000000000000000000000000..90a88cb1f14f0ffb1e1bbfb256e9a11c385b96c0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/settings/imagenet_bs2048_autoslim_pil.py @@ -0,0 +1,13 @@ +_base_ = 'imagenet_bs2048_autoslim.py' + +_RandomResizedCrop_cfg = _base_.train_dataloader.dataset.pipeline[1] +assert _RandomResizedCrop_cfg.type == 'RandomResizedCrop' +_RandomResizedCrop_cfg.backend = 'pillow' + +_ResizeEdge_cfg_val = _base_.val_dataloader.dataset.pipeline[1] +assert _ResizeEdge_cfg_val.type == 'ResizeEdge' +_ResizeEdge_cfg_val.backend = 'pillow' + +_ResizeEdge_cfg_test = _base_.test_dataloader.dataset.pipeline[1] +assert _ResizeEdge_cfg_test.type == 'ResizeEdge' +_ResizeEdge_cfg_test.backend = 'pillow' diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/settings/imagenet_bs2048_bignas.py b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/settings/imagenet_bs2048_bignas.py new file mode 100644 index 0000000000000000000000000000000000000000..617b72ef153bca20de428b9db59e1a38ffdcd96a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/settings/imagenet_bs2048_bignas.py @@ -0,0 +1,400 @@ +# dataset settings +dataset_type = 'mmcls.ImageNet' + +# data preprocessor +data_preprocessor = dict( + type='mmcls.ClsDataPreprocessor', + # RGB format normalization parameters + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + # convert image from BGR to RGB + to_rgb=True, +) + +bgr_mean = data_preprocessor['mean'][::-1] +bgr_std = data_preprocessor['std'][::-1] + +extra_params = dict( + translate_const=int(224 * 0.45), + img_mean=tuple(round(x) for x in data_preprocessor['mean']), +) +policies = [ + [ + dict( + type='mmrazor.EqualizeV2', + prob=0.8, + magnitude=1, + extra_params=extra_params), + dict( 
+ type='mmrazor.ShearY', + prob=0.8, + magnitude=4, + extra_params=extra_params), + ], + [ + dict( + type='mmrazor.Color', + prob=0.4, + magnitude=9, + extra_params=extra_params), + dict( + type='mmrazor.EqualizeV2', + prob=0.6, + magnitude=3, + extra_params=extra_params), + ], + [ + dict( + type='mmrazor.Color', + prob=0.4, + magnitude=1, + extra_params=extra_params), + dict( + type='mmrazor.RotateV2', + prob=0.6, + magnitude=8, + extra_params=extra_params), + ], + [ + dict( + type='mmrazor.SolarizeV2', + prob=0.8, + magnitude=3, + extra_params=extra_params), + dict( + type='mmrazor.EqualizeV2', + prob=0.4, + magnitude=7, + extra_params=extra_params), + ], + [ + dict( + type='mmrazor.SolarizeV2', + prob=0.4, + magnitude=2, + extra_params=extra_params), + dict( + type='mmrazor.SolarizeV2', + prob=0.6, + magnitude=2, + extra_params=extra_params), + ], + [ + dict( + type='mmrazor.Color', + prob=0.2, + magnitude=0, + extra_params=extra_params), + dict( + type='mmrazor.EqualizeV2', + prob=0.8, + magnitude=8, + extra_params=extra_params), + ], + [ + dict( + type='mmrazor.EqualizeV2', + prob=0.4, + magnitude=8, + extra_params=extra_params), + dict( + type='mmrazor.SolarizeAddV2', + prob=0.8, + magnitude=3, + extra_params=extra_params), + ], + [ + dict( + type='mmrazor.ShearX', + prob=0.2, + magnitude=9, + extra_params=extra_params), + dict( + type='mmrazor.RotateV2', + prob=0.6, + magnitude=8, + extra_params=extra_params), + ], + [ + dict( + type='mmrazor.Color', + prob=0.6, + magnitude=1, + extra_params=extra_params), + dict( + type='mmrazor.EqualizeV2', + prob=1.0, + magnitude=2, + extra_params=extra_params), + ], + [ + dict( + type='mmrazor.InvertV2', + prob=0.4, + magnitude=9, + extra_params=extra_params), + dict( + type='mmrazor.RotateV2', + prob=0.6, + magnitude=0, + extra_params=extra_params), + ], + [ + dict( + type='mmrazor.EqualizeV2', + prob=1.0, + magnitude=9, + extra_params=extra_params), + dict( + type='mmrazor.ShearY', + prob=0.6, + magnitude=3, + 
extra_params=extra_params), + ], + [ + dict( + type='mmrazor.Color', + prob=0.4, + magnitude=7, + extra_params=extra_params), + dict( + type='mmrazor.EqualizeV2', + prob=0.6, + magnitude=0, + extra_params=extra_params), + ], + [ + dict( + type='mmrazor.PosterizeV2', + prob=0.4, + magnitude=6, + extra_params=extra_params), + dict( + type='mmrazor.AutoContrastV2', + prob=0.4, + magnitude=7, + extra_params=extra_params), + ], + [ + dict( + type='mmrazor.SolarizeV2', + prob=0.6, + magnitude=8, + extra_params=extra_params), + dict( + type='mmrazor.Color', + prob=0.6, + magnitude=9, + extra_params=extra_params), + ], + [ + dict( + type='mmrazor.SolarizeV2', + prob=0.2, + magnitude=4, + extra_params=extra_params), + dict( + type='mmrazor.RotateV2', + prob=0.8, + magnitude=9, + extra_params=extra_params), + ], + [ + dict( + type='mmrazor.RotateV2', + prob=1.0, + magnitude=7, + extra_params=extra_params), + dict( + type='mmrazor.TranslateYRel', + prob=0.8, + magnitude=9, + extra_params=extra_params), + ], + [ + dict( + type='mmrazor.ShearX', + prob=0.0, + magnitude=0, + extra_params=extra_params), + dict( + type='mmrazor.SolarizeV2', + prob=0.8, + magnitude=4, + extra_params=extra_params), + ], + [ + dict( + type='mmrazor.ShearY', + prob=0.8, + magnitude=0, + extra_params=extra_params), + dict( + type='mmrazor.Color', + prob=0.6, + magnitude=4, + extra_params=extra_params), + ], + [ + dict( + type='mmrazor.Color', + prob=1.0, + magnitude=0, + extra_params=extra_params), + dict( + type='mmrazor.RotateV2', + prob=0.6, + magnitude=2, + extra_params=extra_params), + ], + [ + dict( + type='mmrazor.EqualizeV2', + prob=0.8, + magnitude=4, + extra_params=extra_params), + dict( + type='mmrazor.EqualizeV2', + prob=0.0, + magnitude=8, + extra_params=extra_params), + ], + [ + dict( + type='mmrazor.EqualizeV2', + prob=1.0, + magnitude=4, + extra_params=extra_params), + dict( + type='mmrazor.AutoContrastV2', + prob=0.6, + magnitude=2, + extra_params=extra_params), + ], + [ + dict( + 
type='mmrazor.ShearY', + prob=0.4, + magnitude=7, + extra_params=extra_params), + dict( + type='mmrazor.SolarizeAddV2', + prob=0.6, + magnitude=7, + extra_params=extra_params), + ], + [ + dict( + type='mmrazor.PosterizeV2', + prob=0.8, + magnitude=2, + extra_params=extra_params), + dict( + type='mmrazor.SolarizeV2', + prob=0.6, + magnitude=10, + extra_params=extra_params), + ], + [ + dict( + type='mmrazor.SolarizeV2', + prob=0.6, + magnitude=8, + extra_params=extra_params), + dict( + type='mmrazor.EqualizeV2', + prob=0.6, + magnitude=1, + extra_params=extra_params), + ], + [ + dict( + type='mmrazor.Color', + prob=0.8, + magnitude=6, + extra_params=extra_params), + dict( + type='mmrazor.RotateV2', + prob=0.4, + magnitude=5, + extra_params=extra_params), + ], +] + +train_pipeline = [ + dict(type='mmcls.LoadImageFromFile'), + dict(type='mmcls.RandomResizedCrop', scale=224, backend='pillow'), + dict(type='mmcls.RandomFlip', prob=0.5, direction='horizontal'), + dict(type='mmrazor.AutoAugmentV2', policies=policies), + dict(type='mmcls.PackClsInputs'), +] + +test_pipeline = [ + dict(type='mmcls.LoadImageFromFile'), + dict( + type='mmcls.ResizeEdge', + scale=256, + edge='short', + backend='pillow', + interpolation='bilinear'), + dict(type='mmcls.CenterCrop', crop_size=224), + dict(type='mmcls.PackClsInputs') +] + +train_dataloader = dict( + batch_size=64, + num_workers=16, + dataset=dict( + type=dataset_type, + data_root='data/imagenet', + ann_file='meta/train.txt', + data_prefix='train', + pipeline=train_pipeline), + sampler=dict(type='mmcls.RepeatAugSampler', shuffle=True), + persistent_workers=True, +) + +val_dataloader = dict( + batch_size=64, + num_workers=16, + dataset=dict( + type=dataset_type, + data_root='data/imagenet', + ann_file='meta/val.txt', + data_prefix='val', + pipeline=test_pipeline), + sampler=dict(type='mmcls.DefaultSampler', shuffle=False), + persistent_workers=True, +) +val_evaluator = dict(type='mmcls.Accuracy', topk=(1, 5)) + +# If you want 
standard test, please manually configure the test dataset +test_dataloader = val_dataloader +test_evaluator = val_evaluator + +# optimizer +optim_wrapper = dict( + optimizer=dict( + type='SGD', lr=0.8, momentum=0.9, weight_decay=0.00001, nesterov=True), + paramwise_cfg=dict(bias_decay_mult=0., norm_decay_mult=0.)) + +# learning policy +max_epochs = 360 +param_scheduler = [ + dict( + type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, + end=3125), + dict( + type='CosineAnnealingLR', + T_max=max_epochs, + eta_min=0, + by_epoch=True, + begin=0, + end=max_epochs, + convert_to_iter_based=True) +] + +# train, val, test setting +train_cfg = dict(by_epoch=True, max_epochs=max_epochs, val_interval=1) +val_cfg = dict(type='mmrazor.SubnetValLoop', calibrate_sample_num=4096) +test_cfg = dict(type='mmrazor.SubnetValLoop', calibrate_sample_num=4096) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/settings/imagenet_bs2048_dmcp.py b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/settings/imagenet_bs2048_dmcp.py new file mode 100644 index 0000000000000000000000000000000000000000..3532423fc55b9f1e1ca99bf46872dd5cb2c8e42b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/settings/imagenet_bs2048_dmcp.py @@ -0,0 +1,98 @@ +# dataset settings +dataset_type = 'mmcls.ImageNet' + +max_search_epochs = 100 +# learning rate setting +param_scheduler = [ + # warm up learning rate scheduler + dict( + type='LinearLR', + start_factor=0.5, + by_epoch=True, + begin=0, + end=10, + convert_to_iter_based=True), + dict( + type='CosineAnnealingLR', + T_max=max_search_epochs, + eta_min=0.08, + by_epoch=True, + begin=10, + end=max_search_epochs, + convert_to_iter_based=True), +] + +# optimizer setting +paramwise_cfg = dict(norm_decay_mult=0.0, bias_decay_mult=0.0) + +optim_wrapper = dict( + constructor='mmrazor.SeparateOptimWrapperConstructor', + architecture=dict( + type='OptimWrapper', + optimizer=dict(type='SGD', lr=0.5, momentum=0.9, weight_decay=3e-4), + 
paramwise_cfg=paramwise_cfg), + mutator=dict( + type='OptimWrapper', + optimizer=dict(type='Adam', lr=0.5, weight_decay=1e-3))) + +# data preprocessor +data_preprocessor = dict( + type='mmcls.ClsDataPreprocessor', + # RGB format normalization parameters + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + # convert image from BGR to RGB + to_rgb=True, +) +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='RandomResizedCrop', scale=224), + dict(type='ColorJitter', brightness=0.2, contrast=0.2, saturation=0.2), + dict(type='RandomFlip', prob=0.5, direction='horizontal'), + dict(type='PackClsInputs'), +] + +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='ResizeEdge', scale=256, edge='short'), + dict(type='CenterCrop', crop_size=224), + dict(type='PackClsInputs'), +] + +train_dataloader = dict( + batch_size=64, + num_workers=4, + dataset=dict( + type=dataset_type, + data_root='data/imagenet', + ann_file='meta/train.txt', + data_prefix='train', + pipeline=train_pipeline), + sampler=dict(type='DefaultSampler', shuffle=True, _scope_='mmcls'), + persistent_workers=True, +) + +val_dataloader = dict( + batch_size=64, + num_workers=4, + dataset=dict( + type=dataset_type, + data_root='data/imagenet', + ann_file='meta/val.txt', + data_prefix='val', + pipeline=test_pipeline), + sampler=dict(type='DefaultSampler', shuffle=True, _scope_='mmcls'), + persistent_workers=True, +) +val_evaluator = dict(type='mmcls.Accuracy', topk=(1, 5)) + +# If you want standard test, please manually configure the test dataset +test_dataloader = val_dataloader +test_evaluator = val_evaluator + +evaluation = dict(interval=1, metric='accuracy') + +train_cfg = dict(by_epoch=True, max_epochs=max_search_epochs, val_interval=1) +val_cfg = dict() +test_cfg = dict() +custom_hooks = [dict(type='DMCPSubnetHook')] diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/settings/imagenet_bs2048_ofa.py 
b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/settings/imagenet_bs2048_ofa.py new file mode 100644 index 0000000000000000000000000000000000000000..fe9ff75b450a02e436e3ef0c6fe53469a0739a90 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/settings/imagenet_bs2048_ofa.py @@ -0,0 +1,98 @@ +# dataset settings +dataset_type = 'mmcls.ImageNet' + +# data preprocessor +data_preprocessor = dict( + type='mmcls.ClsDataPreprocessor', + # RGB format normalization parameters + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + # convert image from BGR to RGB + to_rgb=True, +) + +bgr_mean = data_preprocessor['mean'][::-1] +bgr_std = data_preprocessor['std'][::-1] + +extra_params = dict( + translate_const=int(224 * 0.45), + img_mean=tuple(round(x) for x in data_preprocessor['mean']), +) + +train_pipeline = [ + dict(type='mmcls.LoadImageFromFile'), + dict(type='mmcls.RandomResizedCrop', scale=224), + dict(type='mmcls.RandomFlip', prob=0.5, direction='horizontal'), + dict(type='mmcls.ColorJitter', brightness=0.1254, saturation=0.5), + dict(type='mmcls.PackClsInputs'), +] + +test_pipeline = [ + dict(type='mmcls.LoadImageFromFile'), + dict( + type='mmcls.ResizeEdge', + scale=256, + edge='short', + backend='pillow', + interpolation='bilinear'), + dict(type='mmcls.CenterCrop', crop_size=224), + dict(type='mmcls.PackClsInputs') +] + +train_dataloader = dict( + batch_size=64, + num_workers=16, + dataset=dict( + type=dataset_type, + data_root='data/imagenet', + ann_file='meta/train.txt', + data_prefix='train', + pipeline=train_pipeline), + sampler=dict(type='mmcls.RepeatAugSampler', shuffle=True), + persistent_workers=True, +) + +val_dataloader = dict( + batch_size=64, + num_workers=16, + dataset=dict( + type=dataset_type, + data_root='data/imagenet', + ann_file='meta/val.txt', + data_prefix='val', + pipeline=test_pipeline), + sampler=dict(type='mmcls.DefaultSampler', shuffle=False), + persistent_workers=True, +) +val_evaluator = 
dict(type='mmcls.Accuracy', topk=(1, 5)) + +# If you want standard test, please manually configure the test dataset +test_dataloader = val_dataloader +test_evaluator = val_evaluator + +# optimizer +optim_wrapper = dict( + optimizer=dict( + type='SGD', lr=0.8, momentum=0.9, weight_decay=0.00001, nesterov=True), + paramwise_cfg=dict(bias_decay_mult=0., norm_decay_mult=0.)) + +# learning policy +max_epochs = 360 +param_scheduler = [ + dict( + type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, + end=3125), + dict( + type='CosineAnnealingLR', + T_max=max_epochs, + eta_min=0, + by_epoch=True, + begin=0, + end=max_epochs, + convert_to_iter_based=True) +] + +# train, val, test setting +train_cfg = dict(by_epoch=True, max_epochs=max_epochs, val_interval=1) +val_cfg = dict(type='mmrazor.SubnetValLoop', calibrate_sample_num=4096) +test_cfg = dict(type='mmrazor.SubnetValLoop', calibrate_sample_num=4096) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/vanilla_models/wrn16_2_cifar10.py b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/vanilla_models/wrn16_2_cifar10.py new file mode 100644 index 0000000000000000000000000000000000000000..4e0a83bf529d64ae1e4fadd34f82a163cb86b898 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/_base_/vanilla_models/wrn16_2_cifar10.py @@ -0,0 +1,20 @@ +model = dict( + _scope_='mmcls', + type='ImageClassifier', + backbone=dict( + _scope_='mmrazor', + type='WideResNet', + depth=16, + num_stages=3, + widen_factor=2, + ), + neck=dict(type='GlobalAveragePooling'), + head=dict( + type='LinearClsHead', + num_classes=10, + in_channels=128, + loss=dict(type='CrossEntropyLoss', loss_weight=1.0), + topk=(1, 5), + )) + +find_unused_parameters = True diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/abloss/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/abloss/README.md new file mode 100644 index 0000000000000000000000000000000000000000..1e6e1951252409179b8187bb07af2ef63b93ef9c --- 
/dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/abloss/README.md @@ -0,0 +1,69 @@ +# Activation Boundaries Loss (ABLoss) + +> [Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons](https://arxiv.org/pdf/1811.03233.pdf) + + + +## Abstract + +An activation boundary for a neuron refers to a separating hyperplane that determines whether the neuron is activated or deactivated. It has been long considered in neural networks that the activations of neurons, rather than their exact output values, play the most important role in forming classification friendly partitions of the hidden feature space. However, as far as we know, this aspect of neural networks has not been considered in the literature of knowledge transfer. In this pa- per, we propose a knowledge transfer method via distillation of activation boundaries formed by hidden neurons. For the distillation, we propose an activation transfer loss that has the minimum value when the boundaries generated by the stu- dent coincide with those by the teacher. Since the activation transfer loss is not differentiable, we design a piecewise differentiable loss approximating the activation transfer loss. By the proposed method, the student learns a separating bound- ary between activation region and deactivation region formed by each neuron in the teacher. 
Through the experiments in various aspects of knowledge transfer, it is verified that the proposed method outperforms the current state-of-the-art [link](https://github.com/bhheo/AB_distillation) + +pipeline + +## Results and models + +### Classification + +| Location | Dataset | Teacher | Student | Acc | Acc(T) | Acc(S) | Config | Download | +| :-----------------: | :------: | :----------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------: | :---: | :----: | :----: | :------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| backbone (pretrain) | ImageNet | [resnet50](https://github.com/open-mmlab/mmclassification/blob/master/configs/resnet/resnet50_8xb32_in1k.py) | [resnet18](https://github.com/open-mmlab/mmclassification/blob/master/configs/resnet/resnet18_8xb32_in1k.py) | | 76.55 | 69.90 | [pretrain_config](./abloss_pretrain_backbone_resnet50_resnet18_8xb32_in1k) | [teacher](https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth) \|[model](https://download.openmmlab.com/mmrazor/v1/ABLoss/abloss_pretrain_backbone_resnet50_resnet18_8xb32_in1k_20220830_165724-a6284e9f.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/ABLoss/abloss_pretrain_backbone_resnet50_resnet18_8xb32_in1k_20220830_165724-a6284e9f.json) | +| logits (train) | ImageNet | [resnet50](https://github.com/open-mmlab/mmclassification/blob/master/configs/resnet/resnet50_8xb32_in1k.py) | 
[resnet18](https://github.com/open-mmlab/mmclassification/blob/master/configs/resnet/resnet18_8xb32_in1k.py) | 69.94 | 76.55 | 69.90 | [train_config](./abloss_logits_resnet50_resnet18_8xb32_in1k.py) | [teacher](https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth) \|[model](https://download.openmmlab.com/mmrazor/v1/ABLoss/abloss_logits_resnet50_resnet18_8xb32_in1k_20220830_202129-f35edde8.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/ABLoss/abloss_logits_resnet50_resnet18_8xb32_in1k_20220830_202129-f35edde8.json) | + +## Citation + +```latex +@inproceedings{DBLP:conf/aaai/HeoLY019a, + author = {Byeongho Heo, Minsik Lee, Sangdoo Yun and Jin Young Choi}, + title = {Knowledge Transfer via Distillation of Activation Boundaries Formed + by Hidden Neurons}, + booktitle = {The Thirty-Third {AAAI} Conference on Artificial Intelligence, {AAAI} + 2019, The Thirty-First Innovative Applications of Artificial Intelligence + Conference, {IAAI} 2019, The Ninth {AAAI} Symposium on Educational + Advances in Artificial Intelligence, {EAAI} 2019, Honolulu, Hawaii, + USA, January 27 - February 1, 2019}, + pages = {3779--3787}, + publisher = {{AAAI} Press}, + year = {2019}, + url = {https://doi.org/10.1609/aaai.v33i01.33013779}, + doi = {10.1609/aaai.v33i01.33013779}, + timestamp = {Fri, 07 May 2021 11:57:04 +0200}, + biburl = {https://dblp.org/rec/conf/aaai/HeoLY019a.bib}, + bibsource = {dblp computer science bibliography, https://dblp.org} +} +``` + +## Getting Started + +### Pre-training. + +```bash +sh tools/dist_train.sh configs/distill/mmcls/abloss/abloss_pretrain_backbone_resnet50_resnet18_8xb32_in1k.py 8 +``` + +### Modify Distillation training config + +open file 'configs/distill/mmcls/abloss/abloss_logits_resnet50_resnet18_8xb32_in1k.py' + +```python +# Modify init_cfg in model settings. +# 'pretrain_work_dir' is same as the 'work_dir of pre-training'. +# 'last_epoch' defaults to 'epoch_20' in ABLoss. 
+init_cfg=dict( + type='Pretrained', checkpoint='pretrain_work_dir/last_epoch.pth'), +``` + +### Distillation training. + +```bash +sh tools/dist_train.sh configs/distill/mmcls/abloss/abloss_logits_resnet50_resnet18_8xb32_in1k.py 8 +``` diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/abloss/abloss_logits_resnet50_resnet18_8xb32_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/abloss/abloss_logits_resnet50_resnet18_8xb32_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..e1e9c3a7c2c24afde625daefa48ec2ade6cd8541 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/abloss/abloss_logits_resnet50_resnet18_8xb32_in1k.py @@ -0,0 +1,40 @@ +_base_ = [ + 'mmcls::_base_/datasets/imagenet_bs32.py', + 'mmcls::_base_/schedules/imagenet_bs256.py', + 'mmcls::_base_/default_runtime.py' +] + +# Modify pretrain_checkpoint before training. +pretrain_checkpoint = 'work_dir_of_abloss_pretrain/last_epoch.pth' +model = dict( + _scope_='mmrazor', + type='SingleTeacherDistill', + data_preprocessor=dict( + type='ImgDataPreprocessor', + # RGB format normalization parameters + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + # convert image from BGR to RGB + bgr_to_rgb=True), + architecture=dict( + cfg_path='mmcls::resnet/resnet18_8xb32_in1k.py', pretrained=False), + teacher=dict( + cfg_path='mmcls::resnet/resnet50_8xb32_in1k.py', pretrained=False), + init_cfg=dict(type='Pretrained', checkpoint=pretrain_checkpoint), + distiller=dict( + type='ConfigurableDistiller', + student_recorders=dict( + fc=dict(type='ModuleOutputs', source='head.fc')), + teacher_recorders=dict( + fc=dict(type='ModuleOutputs', source='head.fc')), + distill_losses=dict( + loss_kl=dict( + type='KLDivergence', loss_weight=6.25, reduction='mean')), + loss_forward_mappings=dict( + loss_kl=dict( + preds_S=dict(from_student=True, recorder='fc'), + preds_T=dict(from_student=False, recorder='fc'))))) + +find_unused_parameters = True 
+ +val_cfg = dict(_delete_=True, type='mmrazor.SingleTeacherDistillValLoop') diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/abloss/abloss_pretrain_backbone_resnet50_resnet18_8xb32_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/abloss/abloss_pretrain_backbone_resnet50_resnet18_8xb32_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..fe7fdc82c506f8ac2f827bd46ec05b59c03b6cc7 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/abloss/abloss_pretrain_backbone_resnet50_resnet18_8xb32_in1k.py @@ -0,0 +1,97 @@ +_base_ = [ + 'mmcls::_base_/datasets/imagenet_bs32.py', + 'mmcls::_base_/schedules/imagenet_bs256.py', + 'mmcls::_base_/default_runtime.py' +] + +teacher_ckpt = 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth' # noqa: E501 +model = dict( + _scope_='mmrazor', + type='SingleTeacherDistill', + data_preprocessor=dict( + type='ImgDataPreprocessor', + # RGB format normalization parameters + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + # convert image from BGR to RGB + bgr_to_rgb=True), + architecture=dict( + cfg_path='mmcls::resnet/resnet18_8xb32_in1k.py', pretrained=False), + teacher=dict( + cfg_path='mmcls::resnet/resnet50_8xb32_in1k.py', pretrained=True), + teacher_ckpt=teacher_ckpt, + calculate_student_loss=False, + distiller=dict( + type='ConfigurableDistiller', + student_recorders=dict( + bb_s4=dict(type='ModuleOutputs', source='backbone.layer4.1.conv2'), + bb_s3=dict(type='ModuleOutputs', source='backbone.layer3.1.conv2'), + bb_s2=dict(type='ModuleOutputs', source='backbone.layer2.1.conv2'), + bb_s1=dict(type='ModuleOutputs', + source='backbone.layer1.1.conv2')), + teacher_recorders=dict( + bb_s4=dict(type='ModuleOutputs', source='backbone.layer4.2.conv3'), + bb_s3=dict(type='ModuleOutputs', source='backbone.layer3.5.conv3'), + bb_s2=dict(type='ModuleOutputs', source='backbone.layer2.3.conv3'), + 
bb_s1=dict(type='ModuleOutputs', + source='backbone.layer1.2.conv3')), + distill_losses=dict( + loss_s4=dict(type='ABLoss', loss_weight=1.0), + loss_s3=dict(type='ABLoss', loss_weight=0.5), + loss_s2=dict(type='ABLoss', loss_weight=0.25), + loss_s1=dict(type='ABLoss', loss_weight=0.125)), + connectors=dict( + loss_s4_sfeat=dict( + type='ConvModuleConnector', + in_channel=512, + out_channel=2048, + norm_cfg=dict(type='BN'), + act_cfg=None), + loss_s3_sfeat=dict( + type='ConvModuleConnector', + in_channel=256, + out_channel=1024, + norm_cfg=dict(type='BN'), + act_cfg=None), + loss_s2_sfeat=dict( + type='ConvModuleConnector', + in_channel=128, + out_channel=512, + norm_cfg=dict(type='BN'), + act_cfg=None), + loss_s1_sfeat=dict( + type='ConvModuleConnector', + in_channel=64, + out_channel=256, + norm_cfg=dict(type='BN'), + act_cfg=None)), + loss_forward_mappings=dict( + loss_s4=dict( + s_feature=dict( + from_student=True, + recorder='bb_s4', + connector='loss_s4_sfeat'), + t_feature=dict(from_student=False, recorder='bb_s4')), + loss_s3=dict( + s_feature=dict( + from_student=True, + recorder='bb_s3', + connector='loss_s3_sfeat'), + t_feature=dict(from_student=False, recorder='bb_s3')), + loss_s2=dict( + s_feature=dict( + from_student=True, + recorder='bb_s2', + connector='loss_s2_sfeat'), + t_feature=dict(from_student=False, recorder='bb_s2')), + loss_s1=dict( + s_feature=dict( + from_student=True, + recorder='bb_s1', + connector='loss_s1_sfeat'), + t_feature=dict(from_student=False, recorder='bb_s1'))))) + +find_unused_parameters = True + +train_cfg = dict(by_epoch=True, max_epochs=20, val_interval=1) +val_cfg = dict(_delete_=True, type='mmrazor.SingleTeacherDistillValLoop') diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/abloss/metafile.yml b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/abloss/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..df230a9c9a3372fe728970f6120cde0e27a25c2e --- /dev/null +++ 
b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/abloss/metafile.yml @@ -0,0 +1,36 @@ +Collections: + - Name: ABLoss + Metadata: + Training Data: + - ImageNet-1k + Paper: + URL: https://arxiv.org/pdf/1811.03233.pdf + Title: Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons + README: configs/distill/mmcls/abloss/README.md + Converted From: + Code: + URL: https://github.com/bhheo/AB_distillation +Models: + - Name: abloss_logits_resnet50_resnet18_8xb32_in1k + In Collection: ABLoss + Metadata: + Location: logits + Student: + Config: mmcls::resnet/resnet18_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_8xb32_in1k_20210831-fbbb1da6.pth + Metrics: + Top 1 Accuracy: 69.90 + Top 5 Accuracy: 89.43 + Teacher: + Config: mmcls::resnet/resnet50_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth + Metrics: + Top 1 Accuracy: 76.55 + Top 5 Accuracy: 93.06 + Results: + - Task: Image Classification + Dataset: ImageNet-1k + Metrics: + Top 1 Accuracy: 69.94 + Config: configs/distill/mmcls/abloss/abloss_logits_resnet50_resnet18_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmrazor/v1/ABLoss/abloss_logits_resnet50_resnet18_8xb32_in1k_20220830_202129-f35edde8.pth diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/byot/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/byot/README.md new file mode 100644 index 0000000000000000000000000000000000000000..95d82ecb565bd71ae633cd001d6908514be6326f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/byot/README.md @@ -0,0 +1,57 @@ +# BYOT + +> [Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation](https://arxiv.org/abs/1905.08094) + + + +## Abstract + +Convolutional neural networks have been widely deployed in various application scenarios. 
In order to extend the applications' boundaries to some accuracy-crucial domains, researchers have been investigating approaches to boost accuracy through either deeper or wider network structures, which brings with them the exponential increment of the computational and storage cost, delaying the responding time. In this paper, we propose a general training framework named self distillation, which notably enhances the performance (accuracy) of convolutional neural networks through shrinking the size of the network rather than aggrandizing it. Different from traditional knowledge distillation - a knowledge transformation methodology among networks, which forces student neural networks to approximate the softmax layer outputs of pre-trained teacher neural networks, the proposed self distillation framework distills knowledge within network itself. The networks are firstly divided into several sections. Then the knowledge in the deeper portion of the networks is squeezed into the shallow ones. Experiments further prove the generalization of the proposed self distillation framework: enhancement of accuracy at average level is 2.65%, varying from 0.61% in ResNeXt as minimum to 4.07% in VGG19 as maximum. In addition, it can also provide flexibility of depth-wise scalable inference on resource-limited edge devices.Our codes will be released on github soon. 
[Unofficial code](https://github.com/luanyunteng/pytorch-be-your-own-teacher) + +## Pipeline + +![byot](https://user-images.githubusercontent.com/88702197/187422992-e7bd692d-b6d4-44d8-8b36-741e0cf1c4f6.png) + +## Results and models + +#### Classification + +| Location | Dataset | Model | Params(M) | Flops(G) | Top-1 (%) | Top-5 (%) | Download | +| :------: | :------: | :-------------------------------------------: | :-------: | :------: | :-------: | :-------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| logits | CIFAR100 | [R18_BYOT](./byot_resnet18_8xb16_cifar100.py) | 11.22 | 0.56 | 80.66 | 95.76 | [model](https://download.openmmlab.com/mmrazor/v1/byot/byot_resnet18_8xb16_cifar100_20220817_191217-0251084e.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/byot/byot_resnet18_8xb16_cifar100_20220817_191217-0251084e.json) | + +## Citation + +```latex +@ARTICLE{2019arXiv190508094Z, + author = {{Zhang}, Linfeng and {Song}, Jiebo and {Gao}, Anni and {Chen}, Jingwei and {Bao}, Chenglong and {Ma}, Kaisheng}, + title = {Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation}, + journal = {arXiv e-prints}, + keywords = {Computer Science - Machine Learning, Statistics - Machine Learning}, + year = 2019, + month = may, + eid = {arXiv:1905.08094}, + pages = {arXiv:1905.08094}, +archivePrefix = {arXiv}, + eprint = {1905.08094}, + primaryClass = {cs.LG}, + adsurl = {https://ui.adsabs.harvard.edu/abs/2019arXiv190508094Z}, + adsnote = {Provided by the SAO/NASA Astrophysics Data System} +} +``` + +## Get Started + +### Distillation training. 
+ +```bash +sh tools/dist_train.sh \ +    configs/distill/mmcls/byot/byot_resnet18_8xb16_cifar100.py 8 +``` + +### Test + +```bash +sh tools/dist_test.sh \ +    configs/distill/mmcls/byot/byot_resnet18_8xb16_cifar100.py \ +    ${CHECKPOINT_PATH} 8 +``` diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/byot/byot_resnet18_8xb16_cifar100.py b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/byot/byot_resnet18_8xb16_cifar100.py new file mode 100644 index 0000000000000000000000000000000000000000..9f5fc8b7f2002a532db6d287687da0cc9cfec794 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/byot/byot_resnet18_8xb16_cifar100.py @@ -0,0 +1,155 @@ +_base_ = [ +    '../../../_base_/datasets/mmcls/cifar100_bs16_auto_aug.py', +    'mmcls::_base_/schedules/cifar10_bs128.py', +    'mmcls::_base_/default_runtime.py' +] + +optim_wrapper = dict( +    optimizer=dict(type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0005)) +param_scheduler = dict( +    type='MultiStepLR', by_epoch=True, milestones=[80, 160, 240], gamma=0.1) +train_cfg = dict(by_epoch=True, max_epochs=250, val_interval=1) + +model = dict( +    _scope_='mmrazor', +    type='SelfDistill', +    data_preprocessor=dict( +        type='ImgDataPreprocessor', +        # RGB format normalization parameters +        mean=[129.304, 124.070, 112.434], +        std=[68.170, 65.392, 70.418], +        # convert image from BGR to RGB +        bgr_to_rgb=False), +    architecture=dict( +        type='mmcls.ImageClassifier', +        backbone=dict( +            type='mmcls.ResNet_CIFAR', +            depth=18, +            num_stages=4, +            out_indices=(3, ), +            style='pytorch'), +        neck=dict(type='mmcls.GlobalAveragePooling'), +        head=dict( +            type='mmcls.LinearClsHead', +            num_classes=100, +            in_channels=512, +            loss=dict(type='mmcls.CrossEntropyLoss', loss_weight=1.0))), +    distiller=dict( +        type='BYOTDistiller', +        student_recorders=dict( +            bb_s1=dict(type='ModuleOutputs', source='backbone.layer1.1.relu'), +            bb_s2=dict(type='ModuleOutputs', source='backbone.layer2.1.relu'), +            bb_s3=dict(type='ModuleOutputs',
source='backbone.layer3.1.relu')), + teacher_recorders=dict( + fc=dict(type='ModuleOutputs', source='head.fc'), + neck_gap=dict(type='ModuleOutputs', source='neck.gap'), + gt_labels=dict(type='ModuleInputs', source='head.loss_module')), + distill_losses=dict( + loss_fet_1=dict( + type='L2Loss', normalize=False, loss_weight=0.03, dist=True), + loss_label_1=dict(type='mmcls.CrossEntropyLoss', loss_weight=0.7), + loss_softl_1=dict(type='KLDivergence', tau=3, loss_weight=0.3), + loss_fet_2=dict( + type='L2Loss', normalize=False, loss_weight=0.03, dist=True), + loss_label_2=dict(type='mmcls.CrossEntropyLoss', loss_weight=0.7), + loss_softl_2=dict(type='KLDivergence', tau=3, loss_weight=0.3), + loss_fet_3=dict( + type='L2Loss', normalize=False, loss_weight=0., dist=True), + loss_label_3=dict(type='mmcls.CrossEntropyLoss', loss_weight=0.7), + loss_softl_3=dict(type='KLDivergence', tau=3, loss_weight=0.3)), + connectors=dict( + loss_s1_sfeat=dict( + type='BYOTConnector', + in_channel=64, + out_channel=512, + expansion=1, + kernel_size=3, + stride=2, + num_classes=100), + loss_s2_sfeat=dict( + type='BYOTConnector', + in_channel=128, + out_channel=512, + expansion=1, + kernel_size=3, + stride=2, + num_classes=100), + loss_s3_sfeat=dict( + type='BYOTConnector', + in_channel=256, + out_channel=512, + expansion=1, + kernel_size=3, + stride=2, + num_classes=100)), + loss_forward_mappings=dict( + loss_fet_1=dict( + s_feature=dict( + recorder='bb_s1', + from_student=True, + connector='loss_s1_sfeat', + connector_idx=0), + t_feature=dict(recorder='neck_gap', from_student=False)), + loss_label_1=dict( + cls_score=dict( + recorder='bb_s1', + from_student=True, + connector='loss_s1_sfeat', + connector_idx=1), + label=dict( + recorder='gt_labels', from_student=False, data_idx=1)), + loss_softl_1=dict( + preds_S=dict( + recorder='bb_s1', + from_student=True, + connector='loss_s1_sfeat', + connector_idx=1), + preds_T=dict(recorder='fc', from_student=False)), + loss_fet_2=dict( + 
s_feature=dict( + recorder='bb_s2', + from_student=True, + connector='loss_s2_sfeat', + connector_idx=0), + t_feature=dict(recorder='neck_gap', from_student=False)), + loss_label_2=dict( + cls_score=dict( + recorder='bb_s2', + from_student=True, + connector='loss_s2_sfeat', + connector_idx=1), + label=dict( + recorder='gt_labels', from_student=False, data_idx=1)), + loss_softl_2=dict( + preds_S=dict( + recorder='bb_s2', + from_student=True, + connector='loss_s2_sfeat', + connector_idx=1), + preds_T=dict(recorder='fc', from_student=False)), + loss_fet_3=dict( + s_feature=dict( + recorder='bb_s3', + from_student=True, + connector='loss_s3_sfeat', + connector_idx=0), + t_feature=dict(recorder='neck_gap', from_student=False)), + loss_label_3=dict( + cls_score=dict( + recorder='bb_s3', + from_student=True, + connector='loss_s3_sfeat', + connector_idx=1), + label=dict( + recorder='gt_labels', from_student=False, data_idx=1)), + loss_softl_3=dict( + preds_S=dict( + recorder='bb_s3', + from_student=True, + connector='loss_s3_sfeat', + connector_idx=1), + preds_T=dict(recorder='fc', from_student=False))))) + +find_unused_parameters = True + +val_cfg = dict(_delete_=True, type='mmrazor.SelfDistillValLoop') diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/byot/metafile.yml b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/byot/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..2855dd6e05170ed64eb487dcec39563800cf5331 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/byot/metafile.yml @@ -0,0 +1,31 @@ +Collections: + - Name: BYOT + Metadata: + Training Data: + - CIFAR100 + Paper: + URL: https://arxiv.org/pdf/2107.06916.pdf + Title: Training Compact CNNs for Image Classification using Dynamic-coded Filter Fusion + README: configs/distill/mmcls/byot/README.md + Converted From: + Code: + URL: https://github.com/luanyunteng/pytorch-be-your-own-teacher +Models: + - Name: byot_resnet18_8xb16_cifar100 + 
In Collection: BYOT + Metadata: + inference time (ms/im): + - value: 0.62 + hardware: V100 + backend: PyTorch + batch size: 16 + mode: FP32 + resolution: (32, 32) + Results: + - Task: Classification + Dataset: CIFAR100 + Metrics: + Top 1 Accuracy: 80.66 + Top 5 Accuracy: 95.76 + Weights: https://download.openmmlab.com/mmrazor/v1/byot/byot_resnet18_8xb16_cifar100_20220817_191217-0251084e.pth + Config: configs/distill/mmcls/byot/byot_resnet18_8xb16_cifar100.py diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/crd/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/crd/README.md new file mode 100644 index 0000000000000000000000000000000000000000..0f02f365eb6a8e7365780a9f9ac9b00932bebb01 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/crd/README.md @@ -0,0 +1,30 @@ +# CONTRASTIVE REPRESENTATION DISTILLATION + +> [CONTRASTIVE REPRESENTATION DISTILLATION](https://arxiv.org/abs/1910.10699) + +## Abstract + +Often we wish to transfer representational knowledge from one neural network to another. Examples include distilling a large network into a smaller one, transferring knowledge from one sensory modality to a second, or ensembling a collection of models into a single estimator. Knowledge distillation, the standard approach to these problems, minimizes the KL divergence between the probabilistic outputs of a teacher and student network. We demonstrate that this objective ignores important structural knowledge of the teacher network. This motivates an alternative objective by which we train a student to capture significantly more information in the teacher’s representation of the data. We formulate this objective as contrastive learning. Experiments demonstrate that our resulting new objective outperforms knowledge distillation and other cutting-edge distillers on a variety of knowledge transfer tasks, including single model compression, ensemble distillation, and cross-modal transfer. 
Our method sets a new state-of-the-art in many transfer tasks, and sometimes even outperforms the teacher network when combined with knowledge distillation.[Original code](http://github.com/HobbitLong/RepDistiller) + +![pipeline](../../../../docs/en/imgs/model_zoo/crd/pipeline.jpg) + +## Citation + +```latex +@article{tian2019contrastive, + title={Contrastive representation distillation}, + author={Tian, Yonglong and Krishnan, Dilip and Isola, Phillip}, + journal={arXiv preprint arXiv:1910.10699}, + year={2019} +} +``` + +## Results and models + +| Dataset | Model | Teacher | Top-1 (%) | Top-5 (%) | Configs | Download | +| ------- | --------- | --------- | --------- | --------- | ------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- | +| CIFAR10 | ResNet-18 | ResNet-50 | 94.79 | 99.86 | [config](crd_neck_r50_r18_8xb16_cifar10.py) | [teacher](https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_b16x8_cifar10_20210528-f54bfad9.pth) \|[model](<>) \| [log](<>) | + +## Acknowledgement + +Shout out to @chengshuang18 for his special contribution. 
diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/crd/crd_neck_r50_r18_8xb16_cifar10.py b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/crd/crd_neck_r50_r18_8xb16_cifar10.py new file mode 100644 index 0000000000000000000000000000000000000000..4e36e9a2ade5cd6cf72830fcc41c52b22b3ecd6a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/crd/crd_neck_r50_r18_8xb16_cifar10.py @@ -0,0 +1,108 @@ +_base_ = [ + 'mmcls::_base_/datasets/cifar10_bs16.py', + 'mmcls::_base_/schedules/cifar10_bs128.py', + 'mmcls::_base_/default_runtime.py' +] + +model = dict( + _scope_='mmrazor', + type='SingleTeacherDistill', + data_preprocessor=dict( + type='ImgDataPreprocessor', + # RGB format normalization parameters + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + # convert image from BGR to RGB + bgr_to_rgb=True), + architecture=dict( + cfg_path='mmcls::resnet/resnet18_8xb16_cifar10.py', pretrained=False), + teacher=dict( + cfg_path='mmcls::resnet/resnet50_8xb16_cifar10.py', pretrained=True), + teacher_ckpt='resnet50_b16x8_cifar10_20210528-f54bfad9.pth', + distiller=dict( + type='ConfigurableDistiller', + student_recorders=dict( + neck=dict(type='ModuleOutputs', source='neck.gap'), + data_samples=dict(type='ModuleInputs', source='')), + teacher_recorders=dict( + neck=dict(type='ModuleOutputs', source='neck.gap')), + distill_losses=dict(loss_crd=dict(type='CRDLoss', loss_weight=0.8)), + connectors=dict( + loss_crd_stu=dict(type='CRDConnector', dim_in=512, dim_out=128), + loss_crd_tea=dict(type='CRDConnector', dim_in=2048, dim_out=128)), + loss_forward_mappings=dict( + loss_crd=dict( + s_feats=dict( + from_student=True, + recorder='neck', + connector='loss_crd_stu'), + t_feats=dict( + from_student=False, + recorder='neck', + connector='loss_crd_tea'), + data_samples=dict( + from_student=True, recorder='data_samples', data_idx=1))))) + +find_unused_parameters = True + +val_cfg = dict(_delete_=True, 
type='mmrazor.SingleTeacherDistillValLoop') + +# change `CIFAR10` dataset to `CRDDataset` dataset. +dataset_type = 'CIFAR10' +train_pipeline = [ + dict(_scope_='mmcls', type='RandomCrop', crop_size=32, padding=4), + dict(_scope_='mmcls', type='RandomFlip', prob=0.5, direction='horizontal'), + dict(_scope_='mmrazor', type='PackCRDClsInputs'), +] + +test_pipeline = [ + dict(_scope_='mmrazor', type='PackCRDClsInputs'), +] + +ori_train_dataset = dict( + _scope_='mmcls', + type=dataset_type, + data_prefix='data/cifar10', + test_mode=False, + pipeline=train_pipeline) + +crd_train_dataset = dict( + _scope_='mmrazor', + type='CRDDataset', + dataset=ori_train_dataset, + neg_num=16384, + sample_mode='exact', + percent=1.0) + +ori_test_dataset = dict( + _scope_='mmcls', + type=dataset_type, + data_prefix='data/cifar10/', + test_mode=True, + pipeline=test_pipeline) + +crd_test_dataset = dict( + _scope_='mmrazor', + type='CRDDataset', + dataset=ori_test_dataset, + neg_num=16384, + sample_mode='exact', + percent=1.0) + +train_dataloader = dict( + _delete_=True, + batch_size=16, + num_workers=2, + dataset=crd_train_dataset, + sampler=dict(type='DefaultSampler', shuffle=True), + persistent_workers=True, +) + +val_dataloader = dict( + _delete_=True, + batch_size=16, + num_workers=2, + dataset=crd_test_dataset, + sampler=dict(type='DefaultSampler', shuffle=False), + persistent_workers=True, +) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/crd/datasets/crd_cifar10_bs16.py b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/crd/datasets/crd_cifar10_bs16.py new file mode 100644 index 0000000000000000000000000000000000000000..c7cb74c39e0779e9581457838e6a7d083b7b4521 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/crd/datasets/crd_cifar10_bs16.py @@ -0,0 +1,49 @@ +# dataset settings +dataset_type = 'CIFAR10' +preprocess_cfg = dict( + # RGB format normalization parameters + mean=[125.307, 122.961, 113.8575], + std=[51.5865, 50.847, 
51.255], + # loaded images are already RGB format + to_rgb=False) + +train_pipeline = [ + dict(type='RandomCrop', crop_size=32, padding=4), + dict(type='RandomFlip', prob=0.5, direction='horizontal'), + dict(type='PackClsInputs'), +] + +test_pipeline = [ + dict(type='PackClsInputs'), +] + +neg_num = 16384 +train_dataloader = dict( + batch_size=16, + num_workers=2, + dataset=dict( + type=dataset_type, + data_prefix='data/cifar10', + test_mode=False, + pipeline=train_pipeline, + neg_num=neg_num), + sampler=dict(type='DefaultSampler', shuffle=True), + persistent_workers=True, +) + +val_dataloader = dict( + batch_size=16, + num_workers=2, + dataset=dict( + type=dataset_type, + data_prefix='data/cifar10/', + test_mode=True, + pipeline=test_pipeline, + neg_num=neg_num), + sampler=dict(type='DefaultSampler', shuffle=False), + persistent_workers=True, +) +val_evaluator = dict(type='Accuracy', topk=(1, )) + +test_dataloader = val_dataloader +test_evaluator = val_evaluator diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/dafl/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/dafl/README.md new file mode 100644 index 0000000000000000000000000000000000000000..76bf6dee6e5f156db4e73dc5f91b8206b5537b1d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/dafl/README.md @@ -0,0 +1,38 @@ +# Data-Free Learning of Student Networks (DAFL) + +> [Data-Free Learning of Student Networks](https://doi.org/10.1109/ICCV.2019.00361) + + + +## Abstract + +Learning portable neural networks is very essential for computer vision for the purpose that pre-trained heavy deep models can be well applied on edge devices such as mobile phones and micro sensors. Most existing deep neural network compression and speed-up methods are very effective for training compact deep models, when we can directly access the training dataset. However, training data for the given deep network are often unavailable due to some practice problems (e.g. 
privacy, legal issue, and transmission), and the architecture of the given network are also unknown except some interfaces. To this end, we propose a novel framework for training efficient deep neural networks by exploiting generative adversarial networks (GANs). To be specific, the pre-trained teacher networks are regarded as a fixed discriminator and the generator is utilized for deviating training samples which can obtain the maximum response on the discriminator. Then, an efficient network with smaller model size and computational complexity is trained using the generated data and the teacher network, simultaneously. Efficient student networks learned using the pro- posed Data-Free Learning (DAFL) method achieve 92.22% and 74.47% accuracies using ResNet-18 without any training data on the CIFAR-10 and CIFAR-100 datasets, respectively. Meanwhile, our student network obtains an 80.56% accuracy on the CelebA benchmark. + +pipeline + +## Results and models + +### Classification + +| Location | Dataset | Teacher | Student | Acc | Acc(T) | Acc(S) | Config | Download | +| :---------------: | :-----: | :-------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------: | :---: | :----: | :----: | :---------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| backbone & logits | Cifar10 | [resnet34](https://github.com/open-mmlab/mmclassification/blob/master/configs/resnet/resnet34_8xb16_cifar10.py) | 
[resnet18](https://github.com/open-mmlab/mmclassification/blob/master/configs/resnet/resnet18_8xb16_cifar10.py) | 93.27 | 95.34 | 94.82 | [config](./dafl_logits_resnet34_resnet18_8xb256_cifar10.py) | [teacher](https://download.openmmlab.com/mmclassification/v0/resnet/resnet34_b16x8_cifar10_20210528-a8aa36a6.pth) \|[model](https://download.openmmlab.com/mmrazor/v1/DAFL/dafl_logits_resnet34_resnet18_8xb256_cifar10_20220815_202654-67142167.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/DAFL/dafl_logits_resnet34_resnet18_8xb256_cifar10_20220815_202654-67142167.json) | + +## Citation + +```latex +@inproceedings{DBLP:conf/iccv/ChenW0YLSXX019, + author = {Hanting Chen and Yunhe Wang and Chang Xu and Zhaohui Yang and Chuanjian Liu and + Boxin Shi and Chunjing Xu and Chao Xu and Qi Tian}, + title = {Data-Free Learning of Student Networks}, + booktitle = {2019 {IEEE/CVF} International Conference on Computer Vision, {ICCV} + 2019, Seoul, Korea (South), October 27 - November 2, 2019}, + pages = {3513--3521}, + publisher = {{IEEE}}, + year = {2019}, + url = {https://doi.org/10.1109/ICCV.2019.00361}, + doi = {10.1109/ICCV.2019.00361}, + timestamp = {Mon, 17 May 2021 08:18:18 +0200}, + biburl = {https://dblp.org/rec/conf/iccv/ChenW0YLSXX019.bib}, + bibsource = {dblp computer science bibliography, https://dblp.org} +} +``` diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/dafl/dafl_logits_resnet34_resnet18_8xb256_cifar10.py b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/dafl/dafl_logits_resnet34_resnet18_8xb256_cifar10.py new file mode 100644 index 0000000000000000000000000000000000000000..5a3a3f264cf658663ede6b05a72a1b638f42f1de --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/dafl/dafl_logits_resnet34_resnet18_8xb256_cifar10.py @@ -0,0 +1,105 @@ +_base_ = [ +    'mmcls::_base_/datasets/cifar10_bs16.py', +    'mmcls::_base_/schedules/cifar10_bs128.py', +    'mmcls::_base_/default_runtime.py' +] + +res34_ckpt_path =
'https://download.openmmlab.com/mmclassification/v0/resnet/resnet34_b16x8_cifar10_20210528-a8aa36a6.pth' # noqa: E501 +model = dict( + _scope_='mmrazor', + type='DAFLDataFreeDistillation', + data_preprocessor=dict( + type='ImgDataPreprocessor', + # RGB format normalization parameters + mean=[125.307, 122.961, 113.8575], + std=[51.5865, 50.847, 51.255], + # convert image from BGR to RGB + bgr_to_rgb=False), + architecture=dict( + cfg_path='mmcls::resnet/resnet18_8xb16_cifar10.py', pretrained=False), + teachers=dict( + res34=dict( + build_cfg=dict( + cfg_path='mmcls::resnet/resnet34_8xb16_cifar10.py', + pretrained=True), + ckpt_path=res34_ckpt_path)), + generator=dict( + type='DAFLGenerator', + img_size=32, + latent_dim=1000, + hidden_channels=128), + distiller=dict( + type='ConfigurableDistiller', + student_recorders=dict( + fc=dict(type='ModuleOutputs', source='head.fc')), + teacher_recorders=dict( + res34_fc=dict(type='ModuleOutputs', source='res34.head.fc')), + distill_losses=dict( + loss_kl=dict(type='KLDivergence', tau=6, loss_weight=1)), + loss_forward_mappings=dict( + loss_kl=dict( + preds_S=dict(from_student=True, recorder='fc'), + preds_T=dict(from_student=False, recorder='res34_fc')))), + generator_distiller=dict( + type='ConfigurableDistiller', + teacher_recorders=dict( + res34_neck_gap=dict(type='ModuleOutputs', source='res34.neck.gap'), + res34_fc=dict(type='ModuleOutputs', source='res34.head.fc')), + distill_losses=dict( + loss_res34_oh=dict(type='OnehotLikeLoss', loss_weight=0.05), + loss_res34_ie=dict(type='InformationEntropyLoss', loss_weight=5), + loss_res34_ac=dict(type='ActivationLoss', loss_weight=0.01)), + loss_forward_mappings=dict( + loss_res34_oh=dict( + preds_T=dict(from_student=False, recorder='res34_fc')), + loss_res34_ie=dict( + preds_T=dict(from_student=False, recorder='res34_fc')), + loss_res34_ac=dict( + feat_T=dict(from_student=False, recorder='res34_neck_gap'))))) + +# model wrapper +model_wrapper_cfg = dict( + 
type='mmengine.MMSeparateDistributedDataParallel', + broadcast_buffers=False, + find_unused_parameters=False) + +find_unused_parameters = True + +# optimizer wrapper +optim_wrapper = dict( + _delete_=True, + constructor='mmrazor.SeparateOptimWrapperConstructor', + architecture=dict(optimizer=dict(type='AdamW', lr=1e-1)), + generator=dict(optimizer=dict(type='AdamW', lr=1e-3))) + +auto_scale_lr = dict(base_batch_size=256) + +param_scheduler = dict( + _delete_=True, + architecture=[ + dict(type='LinearLR', end=500, by_epoch=False, start_factor=0.0001), + dict( + type='MultiStepLR', + begin=500, + milestones=[100 * 120, 200 * 120], + by_epoch=False) + ], + generator=dict( + type='LinearLR', end=500, by_epoch=False, start_factor=0.0001)) + +train_cfg = dict( + _delete_=True, by_epoch=False, max_iters=250 * 120, val_interval=150) + +train_dataloader = dict( + batch_size=256, sampler=dict(type='InfiniteSampler', shuffle=True)) +val_dataloader = dict(batch_size=256) +val_evaluator = dict(type='Accuracy', topk=(1, 5)) + +default_hooks = dict( + logger=dict(type='LoggerHook', interval=75, log_metric_by_epoch=False), + checkpoint=dict( + type='CheckpointHook', by_epoch=False, interval=150, max_keep_ckpts=2)) + +log_processor = dict(by_epoch=False) +# Must set diff_rank_seed to True! 
+randomness = dict(seed=None, diff_rank_seed=True) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/dafl/metafile.yml b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/dafl/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..9438b8781c8cd622cc44852f34077aef847215fa --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/dafl/metafile.yml @@ -0,0 +1,43 @@ +Collections: + - Name: DAFL + Metadata: + Training Data: + - CIFAR-10 + Paper: + URL: https://doi.org/10.1109/ICCV.2019.00361 + Title: Data-Free Learning of Student Networks + README: configs/distill/mmcls/dafl/README.md + Converted From: + Code: + URL: https://github.com/huawei-noah/Efficient-Computing/tree/master/Data-Efficient-Model-Compression/DAFL +Models: + - Name: dafl_logits_resnet34_resnet18_8xb256_cifar10 + In Collection: DAFL + Metadata: + inference time (ms/im): + - value: 0.34 + hardware: NVIDIA A100-SXM4-80GB + backend: PyTorch + batch size: 256 + mode: FP32 + resolution: (32, 32) + Location: logits + Student: + Config: mmcls::resnet/resnet18_8xb16_cifar10.py + Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_b16x8_cifar10_20210528-bd6371c8.pth + Metrics: + Top 1 Accuracy: 94.82 + Top 5 Accuracy: 99.87 + Teacher: + Config: mmcls::resnet/resnet34_8xb16_cifar10.py + Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet34_b16x8_cifar10_20210528-a8aa36a6.pth + Metrics: + Top 1 Accuracy: 95.34 + Top 5 Accuracy: 99.87 + Results: + - Task: Image Classification + Dataset: CIFAR-10 + Metrics: + Top 1 Accuracy: 93.27 + Config: configs/distill/mmcls/dafl/dafl_logits_resnet34_resnet18_8xb256_cifar10.py + Weights: https://download.openmmlab.com/mmrazor/v1/DAFL/dafl_logits_resnet34_resnet18_8xb256_cifar10_20220815_202654-67142167.pth diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/deit/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/deit/README.md new file 
mode 100644 index 0000000000000000000000000000000000000000..4ccfa8cc91296842c490b26d8cc6f5d893374da5 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/deit/README.md @@ -0,0 +1,45 @@ +# DeiT + +> [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) +> + + + +## Abstract + +Recently, neural networks purely based on attention were shown to address image understanding tasks such as image classification. However, these visual transformers are pre-trained with hundreds of millions of images using an expensive infrastructure, thereby limiting their adoption. In this work, we produce a competitive convolution-free transformer by training on Imagenet only. We train them on a single computer in less than 3 days. Our reference vision transformer (86M parameters) achieves top-1 accuracy of 83.1% (single-crop evaluation) on ImageNet with no external data. More importantly, we introduce a teacher-student strategy specific to transformers. It relies on a distillation token ensuring that the student learns from the teacher through attention. We show the interest of this token-based distillation, especially when using a convnet as a teacher. This leads us to report results competitive with convnets for both Imagenet (where we obtain up to 85.2% accuracy) and when transferring to other tasks. We share our code and models. + +
+ +
+ +## Results and models + +### Classification + +| Dataset | Model | Teacher | Top-1 (%) | Top-5 (%) | Configs | Download | +| -------- | --------- | ----------- | --------- | --------- | ------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| ImageNet | Deit-base | RegNety-160 | 83.24 | 96.33 | [config](deit-base_regnety160_pt-16xb64_in1k.py) | [model](https://download.openmmlab.com/mmrazor/v1/deit/deit-base/deit-base_regnety160_pt-16xb64_in1k_20221011_113403-a67bf475.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/deit/deit-base/deit-base_regnety160_pt-16xb64_in1k_20221011_113403-a67bf475.json) | + +```{warning} +Before training, please first install `timm`. + +pip install timm +or +git clone https://github.com/rwightman/pytorch-image-models +cd pytorch-image-models && pip install -e . 
+``` + +## Citation + +``` +@InProceedings{pmlr-v139-touvron21a, + title = {Training data-efficient image transformers & distillation through attention}, + author = {Touvron, Hugo and Cord, Matthieu and Douze, Matthijs and Massa, Francisco and Sablayrolles, Alexandre and Jegou, Herve}, + booktitle = {International Conference on Machine Learning}, + pages = {10347--10357}, + year = {2021}, + volume = {139}, + month = {July} +} +``` diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/deit/deit-base_regnety160_pt-16xb64_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/deit/deit-base_regnety160_pt-16xb64_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..c2cfaf56a8603353b3e1ac857e98a5b69d0f8819 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/deit/deit-base_regnety160_pt-16xb64_in1k.py @@ -0,0 +1,64 @@ +_base_ = ['mmcls::deit/deit-base_pt-16xb64_in1k.py'] + +# student settings +student = _base_.model +student.backbone.type = 'DistilledVisionTransformer' +student.head = dict( + type='mmrazor.DeiTClsHead', + num_classes=1000, + in_channels=768, + loss=dict( + type='mmcls.LabelSmoothLoss', + label_smooth_val=0.1, + mode='original', + loss_weight=0.5)) + +data_preprocessor = dict( + type='mmcls.ClsDataPreprocessor', batch_augments=student.train_cfg) + +# teacher settings +checkpoint_path = 'https://dl.fbaipublicfiles.com/deit/regnety_160-a5fe301d.pth' # noqa: E501 +teacher = dict( + _scope_='mmcls', + type='ImageClassifier', + backbone=dict( + type='TIMMBackbone', model_name='regnety_160', pretrained=True), + neck=dict(type='GlobalAveragePooling'), + head=dict( + type='LinearClsHead', + num_classes=1000, + in_channels=3024, + loss=dict( + type='LabelSmoothLoss', + label_smooth_val=0.1, + mode='original', + loss_weight=0.5), + topk=(1, 5), + init_cfg=dict( + type='Pretrained', checkpoint=checkpoint_path, prefix='head.'))) + +model = dict( + _scope_='mmrazor', + _delete_=True, + 
type='SingleTeacherDistill', + architecture=student, + teacher=teacher, + distiller=dict( + type='ConfigurableDistiller', + student_recorders=dict( + fc=dict(type='ModuleOutputs', source='head.layers.head_dist')), + teacher_recorders=dict( + fc=dict(type='ModuleOutputs', source='head.fc')), + distill_losses=dict( + loss_distill=dict( + type='CrossEntropyLoss', + loss_weight=0.5, + )), + loss_forward_mappings=dict( + loss_distill=dict( + preds_S=dict(from_student=True, recorder='fc'), + preds_T=dict(from_student=False, recorder='fc'))))) + +find_unused_parameters = True + +val_cfg = dict(_delete_=True, type='mmrazor.SingleTeacherDistillValLoop') diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/deit/metafile.yml b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/deit/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..6fe41c3a9af3061bde87fbc7b55c2054b7653b2b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/deit/metafile.yml @@ -0,0 +1,34 @@ +Collections: + - Name: DEIT + Metadata: + Training Data: + - ImageNet-1k + Paper: + URL: https://arxiv.org/abs/2012.12877 + Title: Training data-efficient image transformers & distillation through attention + README: configs/distill/mmcls/deit/README.md + +Models: + - Name: deit-base_regnety160_pt-16xb64_in1k + In Collection: DEIT + Metadata: + Student: + Config: mmcls::deit/deit-base_pt-16xb64_in1k.py + Weights: https://download.openmmlab.com/mmclassification/v0/deit/deit-base_pt-16xb64_in1k_20220216-db63c16c.pth + Metrics: + Top 1 Accuracy: 81.76 + Top 5 Accuracy: 95.81 + Teacher: + Config: mmrazor::distill/mmcls/deit/deit-base_regnety160_pt-16xb64_in1k.py + Weights: https://dl.fbaipublicfiles.com/deit/regnety_160-a5fe301d.pth + Metrics: + Top 1 Accuracy: 82.83 + Top 5 Accuracy: 96.42 + Results: + - Task: Classification + Dataset: ImageNet-1k + Metrics: + Top 1 Accuracy: 83.24 + Top 5 Accuracy: 96.33 + Weights: 
https://download.openmmlab.com/mmrazor/v1/deit/deit-base/deit-base_regnety160_pt-16xb64_in1k_20221011_113403-a67bf475.pth + Config: configs/distill/mmcls/deit/deit-base_regnety160_pt-16xb64_in1k.py diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/dfad/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/dfad/README.md new file mode 100644 index 0000000000000000000000000000000000000000..4f81fcc471381864e7b4966c65a2488166ba6ac8 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/dfad/README.md @@ -0,0 +1,30 @@ +# Data-Free Adversarial Distillation (DFAD) + +> [Data-Free Adversarial Distillation](https://arxiv.org/pdf/1912.11006.pdf) + + + +## Abstract + +Knowledge Distillation (KD) has made remarkable progress in the last few years and become a popular paradigm for model compression and knowledge transfer. However, almost all existing KD algorithms are data-driven, i.e., relying on a large amount of original training data or alternative data, which is usually unavailable in real-world scenarios. In this paper, we devote ourselves to this challenging problem and propose a novel adversarial distillation mechanism to craft a compact student model without any real-world data. We introduce a model discrepancy to quantificationally measure the difference between student and teacher models and construct an optimizable upper bound. In our work, the student and the teacher jointly act the role of the discriminator to reduce this discrepancy, when a generator adversarially produces some "hard samples" to enlarge it. Extensive experiments demonstrate that the proposed data-free method yields comparable performance to existing data-driven methods. More strikingly, our approach can be directly extended to semantic segmentation, which is more complicated than classification, and our approach achieves state-of-the-art results. 
+ +pipeline + +## Results and models + +### Classification + +| Location | Dataset | Teacher | Student | Acc | Acc(T) | Acc(S) | Config | Download | +| :------: | :-----: | :-------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------: | :---: | :----: | :----: | :--------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| logits | Cifar10 | [resnet34](https://github.com/open-mmlab/mmclassification/blob/master/configs/resnet/resnet34_8xb16_cifar10.py) | [resnet18](https://github.com/open-mmlab/mmclassification/blob/master/configs/resnet/resnet18_8xb16_cifar10.py) | 92.80 | 95.34 | 94.82 | [config](./dfad_logits_resnet34_resnet18_8xb32_cifar10.py) | [teacher](https://download.openmmlab.com/mmclassification/v0/resnet/resnet34_b16x8_cifar10_20210528-a8aa36a6.pth) \|[model](https://download.openmmlab.com/mmrazor/v1/DFAD/dfad_logits_resnet34_resnet18_8xb32_cifar10_20220819_051141-961a5b09.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/DFAD/dfad_logits_resnet34_resnet18_8xb32_cifar10_20220819_051141-961a5b09.json) | + +## Citation + +```latex +@article{fang2019data, + title={Data-free adversarial distillation}, + author={Fang, Gongfan and Song, Jie and Shen, Chengchao and Wang, Xinchao and Chen, Da and Song, Mingli}, + journal={arXiv preprint arXiv:1912.11006}, + year={2019} +} +``` diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/dfad/dfad_logits_resnet34_resnet18_8xb32_cifar10.py 
b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/dfad/dfad_logits_resnet34_resnet18_8xb32_cifar10.py new file mode 100644 index 0000000000000000000000000000000000000000..59bc4d325c5a18f6808c2c51255800e0bfc2ef69 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/dfad/dfad_logits_resnet34_resnet18_8xb32_cifar10.py @@ -0,0 +1,100 @@ +_base_ = [ + 'mmcls::_base_/datasets/cifar10_bs16.py', + 'mmcls::_base_/schedules/cifar10_bs128.py', + 'mmcls::_base_/default_runtime.py' +] + +res34_ckpt_path = 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet34_b16x8_cifar10_20210528-a8aa36a6.pth' # noqa: E501 +model = dict( + _scope_='mmrazor', + type='DataFreeDistillation', + data_preprocessor=dict( + type='ImgDataPreprocessor', + # RGB format normalization parameters + mean=[125.307, 122.961, 113.8575], + std=[51.5865, 50.847, 51.255], + # convert image from BGR to RGB + bgr_to_rgb=False), + architecture=dict( + cfg_path='mmcls::resnet/resnet18_8xb16_cifar10.py', pretrained=False), + teachers=dict( + res34=dict( + build_cfg=dict( + cfg_path='mmcls::resnet/resnet34_8xb16_cifar10.py', + pretrained=True), + ckpt_path=res34_ckpt_path)), + generator=dict( + type='DAFLGenerator', + img_size=32, + latent_dim=256, + hidden_channels=128, + bn_eps=1e-5), + distiller=dict( + type='ConfigurableDistiller', + student_recorders=dict( + fc=dict(type='ModuleOutputs', source='head.fc')), + teacher_recorders=dict( + res34_fc=dict(type='ModuleOutputs', source='res34.head.fc')), + distill_losses=dict(loss_kl=dict(type='L1Loss', loss_weight=1.0)), + loss_forward_mappings=dict( + loss_kl=dict( + s_feature=dict(from_student=True, recorder='fc'), + t_feature=dict(from_student=False, recorder='res34_fc')))), + generator_distiller=dict( + type='ConfigurableDistiller', + student_recorders=dict( + fc=dict(type='ModuleOutputs', source='head.fc')), + teacher_recorders=dict( + res34_fc=dict(type='ModuleOutputs', source='res34.head.fc')), + 
distill_losses=dict(loss_l1=dict(type='L1Loss', loss_weight=-1.0)), + loss_forward_mappings=dict( + loss_l1=dict( + s_feature=dict(from_student=True, recorder='fc'), + t_feature=dict(from_student=False, recorder='res34_fc')))), + student_iter=5, + student_train_first=True) + +# model wrapper +model_wrapper_cfg = dict( + type='mmengine.MMSeparateDistributedDataParallel', + broadcast_buffers=False, + find_unused_parameters=True) + +# optimizer wrapper +optim_wrapper = dict( + _delete_=True, + constructor='mmrazor.SeparateOptimWrapperConstructor', + architecture=dict( + optimizer=dict(type='SGD', lr=0.1, weight_decay=5e-4, momentum=0.9)), + generator=dict(optimizer=dict(type='AdamW', lr=1e-3))) + +auto_scale_lr = dict(base_batch_size=32) + +iter_size = 50 +param_scheduler = dict( + _delete_=True, + architecture=dict( + type='MultiStepLR', + milestones=[100 * iter_size, 200 * iter_size], + by_epoch=False), + generator=dict( + type='MultiStepLR', + milestones=[100 * iter_size, 200 * iter_size], + by_epoch=False)) + +train_cfg = dict( + _delete_=True, by_epoch=False, max_iters=500 * iter_size, val_interval=250) + +train_dataloader = dict( + batch_size=32, sampler=dict(type='InfiniteSampler', shuffle=True)) +val_dataloader = dict(batch_size=32) +val_evaluator = dict(type='Accuracy', topk=(1, 5)) + +default_hooks = dict( + logger=dict(type='LoggerHook', interval=50, log_metric_by_epoch=False), + checkpoint=dict( + type='CheckpointHook', by_epoch=False, interval=100, max_keep_ckpts=2)) + +log_processor = dict(by_epoch=False) +# Must set diff_rank_seed to True! 
+randomness = dict(seed=None, diff_rank_seed=True) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/dfad/metafile.yml b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/dfad/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..0601f895cafe4dd70c11146542ee7bb5867a3d5e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/dfad/metafile.yml @@ -0,0 +1,43 @@ +Collections: + - Name: DFAD + Metadata: + Training Data: + - CIFAR-10 + Paper: + URL: https://arxiv.org/pdf/1912.11006.pdf + Title: Data-Free Adversarial Distillation + README: configs/distill/mmcls/dfad/README.md + Converted From: + Code: + URL: https://github.com/VainF/Data-Free-Adversarial-Distillation +Models: + - Name: dfad_logits_resnet34_resnet18_8xb32_cifar10 + In Collection: DFAD + Metadata: + inference time (ms/im): + - value: 0.38 + hardware: NVIDIA A100-SXM4-80GB + backend: PyTorch + batch size: 32 + mode: FP32 + resolution: (32, 32) + Location: logits + Student: + Config: mmcls::resnet/resnet18_8xb16_cifar10.py + Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_b16x8_cifar10_20210528-bd6371c8.pth + Metrics: + Top 1 Accuracy: 94.82 + Top 5 Accuracy: 99.87 + Teacher: + Config: mmcls::resnet/resnet34_8xb16_cifar10.py + Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet34_b16x8_cifar10_20210528-a8aa36a6.pth + Metrics: + Top 1 Accuracy: 95.34 + Top 5 Accuracy: 99.87 + Results: + - Task: Image Classification + Dataset: CIFAR-10 + Metrics: + Top 1 Accuracy: 92.80 + Config: configs/distill/mmcls/dfad/dfad_logits_resnet34_resnet18_8xb32_cifar10.py + Weights: https://download.openmmlab.com/mmrazor/v1/DFAD/dfad_logits_resnet34_resnet18_8xb32_cifar10_20220819_051141-961a5b09.pth diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/dkd/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/dkd/README.md new file mode 100644 index 
0000000000000000000000000000000000000000..e040d54b4f86c30c629a8d444a0ac5a45246f994 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/dkd/README.md @@ -0,0 +1,47 @@ +# Decoupled Knowledge Distillation + +> [Decoupled Knowledge Distillation](https://arxiv.org/pdf/2203.08679.pdf) + + + +## Abstract + +State-of-the-art distillation methods are mainly based on distilling deep features from intermediate layers, while the significance of logit distillation is greatly overlooked. To provide a novel viewpoint to study logit distillation, we reformulate the classical KD loss into two parts, i.e., target class knowledge distillation (TCKD) and non-target class knowledge distillation (NCKD). We empirically investigate and prove the effects of the two parts: TCKD transfers knowledge concerning the "difficulty" of training samples, while NCKD is the prominent reason why logit distillation works. More importantly, we reveal that the classical KD loss is a coupled formulation, which (1) suppresses the effectiveness of NCKD and (2) limits the flexibility to balance these two parts. To address these issues, we present Decoupled Knowledge Distillation (DKD), enabling TCKD and NCKD to play their roles more efficiently and flexibly. Compared with complex feature-based methods, our DKD achieves comparable or even better results and has better training efficiency on CIFAR-100, ImageNet, and MS-COCO datasets for image classification and object detection tasks. This paper proves the great potential of logit distillation, and we hope it will be helpful for future research. The code is available at https://github.com/megvii-research/mdistiller. 
+ +dkd + +## Results and models + +### Classification + +| Dataset | Model | Teacher | Top-1 (%) | Top-5 (%) | Configs | Download | +| -------- | --------- | --------- | --------- | --------- | --------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| ImageNet | ResNet-18 | ResNet-34 | 71.368 | 90.256 | [config](dkd_resnet34_resnet18_8xb32_in1k.py) | [model](https://download.openmmlab.com/mmrazor/v1/dkd/dkd_resnet34_resnet18_8xb32_in1k_20220804_202619-f9519768.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/dkd/dkd_resnet34_resnet18_8xb32_in1k_20220804_202619-f9519768.json) | + +## Citation + +```latex +@article{zhao2022decoupled, + title={Decoupled Knowledge Distillation}, + author={Zhao, Borui and Cui, Quan and Song, Renjie and Qiu, Yiyu and Liang, Jiajun}, + journal={arXiv preprint arXiv:2203.08679}, + year={2022} +} +``` + +## Getting Started + +### Download the teacher checkpoint + +https://mmclassification.readthedocs.io/en/latest/papers/resnet.html + +### Distillation training. + +```bash +sh tools/dist_train.sh \ + configs/distill/mmcls/dkd/dkd_resnet34_resnet18_8xb32_in1k.py 8 +``` + +## Acknowledgement + +Shout out to Davidgzx for his special contribution. 
diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/dkd/dkd_resnet34_resnet18_8xb32_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/dkd/dkd_resnet34_resnet18_8xb32_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..391ccf5b6f990152265485b37d5deee93f421caa --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/dkd/dkd_resnet34_resnet18_8xb32_in1k.py @@ -0,0 +1,47 @@ +_base_ = [ + 'mmcls::_base_/datasets/imagenet_bs32.py', + 'mmcls::_base_/schedules/imagenet_bs256.py', + 'mmcls::_base_/default_runtime.py' +] + +teacher_ckpt = 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet34_8xb32_in1k_20210831-f257d4e6.pth' # noqa: E501 + +model = dict( + _scope_='mmrazor', + type='SingleTeacherDistill', + data_preprocessor=dict( + type='ImgDataPreprocessor', + # RGB format normalization parameters + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + # convert image from BGR to RGB + bgr_to_rgb=True), + architecture=dict( + cfg_path='mmcls::resnet/resnet18_8xb32_in1k.py', pretrained=False), + teacher=dict( + cfg_path='mmcls::resnet/resnet34_8xb32_in1k.py', pretrained=True), + teacher_ckpt=teacher_ckpt, + distiller=dict( + type='ConfigurableDistiller', + student_recorders=dict( + fc=dict(type='ModuleOutputs', source='head.fc'), + gt_labels=dict(type='ModuleInputs', source='head.loss_module')), + teacher_recorders=dict( + fc=dict(type='ModuleOutputs', source='head.fc')), + distill_losses=dict( + loss_dkd=dict( + type='DKDLoss', + tau=1, + beta=0.5, + loss_weight=1, + reduction='mean')), + loss_forward_mappings=dict( + loss_dkd=dict( + preds_S=dict(from_student=True, recorder='fc'), + preds_T=dict(from_student=False, recorder='fc'), + gt_labels=dict( + recorder='gt_labels', from_student=True, data_idx=1))))) + +find_unused_parameters = True + +val_cfg = dict(_delete_=True, type='mmrazor.SingleTeacherDistillValLoop') diff --git 
a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/dkd/metafile.yml b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/dkd/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..72d65b96a4a20c00b864d00c9e66cdd9ed36bdf6 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/dkd/metafile.yml @@ -0,0 +1,43 @@ +Collections: + - Name: DKD + Metadata: + Training Data: + - ImageNet-1k + Paper: + URL: https://arxiv.org/pdf/2203.08679.pdf + Title: Decoupled Knowledge Distillation + README: configs/distill/mmcls/dkd/README.md + Converted From: + Code: + URL: https://github.com/megvii-research/mdistiller +Models: + - Name: dkd_resnet34_resnet18_8xb32_in1k + In Collection: DKD + Metadata: + inference time (ms/im): + - value: 0.75 + hardware: V100 + backend: PyTorch + batch size: 16 + mode: FP32 + resolution: (224, 224) + Student: + Config: mmcls::resnet/resnet18_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_8xb32_in1k_20210831-fbbb1da6.pth + Metrics: + Top 1 Accuracy: 69.90 + Top 5 Accuracy: 89.43 + Teacher: + Config: mmcls::resnet/resnet34_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet34_8xb32_in1k_20210831-f257d4e6.pth + Metrics: + Top 1 Accuracy: 73.62 + Top 5 Accuracy: 91.59 + Results: + - Task: Classification + Dataset: ImageNet-1k + Metrics: + Top 1 Accuracy: 71.368 + Top 5 Accuracy: 90.256 + Weights: https://download.openmmlab.com/mmrazor/v1/dkd/dkd_resnet34_resnet18_8xb32_in1k_20220804_202619-f9519768.pth + Config: configs/distill/mmcls/dkd/dkd_resnet34_resnet18_8xb32_in1k.py diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/factor_transfer/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/factor_transfer/README.md new file mode 100644 index 0000000000000000000000000000000000000000..795df563c2608bbeb4a559eb0365bade516c8ad9 --- /dev/null +++ 
b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/factor_transfer/README.md @@ -0,0 +1,53 @@ +# Paraphrasing Complex Network: Network Compression via Factor Transfer + +> [Paraphrasing Complex Network: Network Compression via Factor Transfer](https://arxiv.org/abs/1802.04977) + + + +## Abstract + +Many researchers have sought ways of model compression to reduce the size of a deep neural network (DNN) with minimal performance degradation in order to use DNNs in embedded systems. Among the model compression methods, a method called knowledge transfer is to train a student network with a stronger teacher network. In this paper, we propose a novel knowledge transfer method which uses convolutional operations to paraphrase teacher’s knowledge and to translate it for the student. This is done by two convolutional modules, which are called a paraphraser and a translator. The paraphraser is trained in an unsupervised manner to extract the teacher factors which are defined as paraphrased information of the teacher network. The translator located at the student network extracts the student factors and helps to translate the teacher factors by mimicking them. We observed that our student network trained with the proposed factor transfer method outperforms the ones trained with conventional knowledge transfer methods. 
The original code is available at this [link](https://github.com/Jangho-Kim/Factor-Transfer-pytorch) + +## Results and models + +| Dataset | Model | Teacher | Top-1 (%) | Top-5 (%) | Configs | Download | +| ------- | --------- | --------- | --------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| CIFAR10 | ResNet-18 | ResNet-50 | 94.86 | 99.88 | [pretrain](./factor-transfer_backbone_resnet50_resnet18_8xb32_cifar10_pretrain.py) \| [train](./factor-transfer_backbone_resnet50_resnet18_8xb32_cifar10_train.py) | [pretrain model](https://download.openmmlab.com/mmrazor/v1/factor_transfer/factor-transfer_backbone_resnet50_resnet18_8xb16_cifar10_pretrain_20220831_173259-ebdb09e2.pth) \| [pretrain log](https://download.openmmlab.com/mmrazor/v1/factor_transfer/factor-transfer_backbone_resnet50_resnet18_8xb16_cifar10_pretrain_20220831_173259-ebdb09e2.json) \| [train model](https://download.openmmlab.com/mmrazor/v1/factor_transfer/factor-transfer_backbone_resnet50_resnet18_8xb16_cifar10_train_20220831_201322-943df33f.pth) \| [train 
log](https://download.openmmlab.com/mmrazor/v1/factor_transfer/factor-transfer_backbone_resnet50_resnet18_8xb16_cifar10_train_20220831_201322-943df33f.json) | + +## Getting Started + +### Connectors pre-training. + +```bash +sh tools/dist_train.sh $PARTITION $JOB_NAME \ + configs/distill/mmcls/factor_transfer/factor-transfer_backbone_resnet50_resnet18_8xb32_cifar10_pretrain.py \ + $PRETRAIN_WORK_DIR +``` + +### Distillation training. + +```bash +sh tools/dist_train.sh $PARTITION $JOB_NAME \ + configs/distill/mmcls/factor_transfer/factor-transfer_backbone_resnet50_resnet18_8xb32_cifar10_train.py \ + $DISTILLATION_WORK_DIR +``` + +### Test + +```bash +sh tools/dist_test.sh $PARTITION $JOB_NAME \ + configs/distill/mmcls/factor_transfer/factor-transfer_backbone_resnet50_resnet18_8xb32_cifar10_train.py \ + $DISTILLATION_WORK_DIR/latest.sh --eval $EVAL_SETTING +``` + +## Citation + +```latex +@inproceedings{kim2018paraphrasing, + title={Paraphrasing complex network: network compression via factor transfer}, + author={Kim, Jangho and Park, SeongUk and Kwak, Nojun}, + booktitle={Proceedings of the 32nd International Conference on Neural Information Processing Systems}, + pages={2765--2774}, + year={2018} +} +``` diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/factor_transfer/factor-transfer_backbone_resnet50_resnet18_8xb16_cifar10_pretrain.py b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/factor_transfer/factor-transfer_backbone_resnet50_resnet18_8xb16_cifar10_pretrain.py new file mode 100644 index 0000000000000000000000000000000000000000..33903cdf2ea8e317f2da0303bc8ab34a6457089d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/factor_transfer/factor-transfer_backbone_resnet50_resnet18_8xb16_cifar10_pretrain.py @@ -0,0 +1,59 @@ +_base_ = [ + 'mmcls::_base_/datasets/cifar10_bs16.py', + 'mmcls::_base_/schedules/cifar10_bs128.py', + 'mmcls::_base_/default_runtime.py' +] + +train_cfg = dict(by_epoch=True, max_epochs=20, 
val_interval=1) + +model = dict( + _scope_='mmrazor', + type='SingleTeacherDistill', + data_preprocessor=dict( + type='ImgDataPreprocessor', + # RGB format normalization parameters + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + # convert image from BGR to RGB + bgr_to_rgb=True), + architecture=dict( + cfg_path='mmcls::resnet/resnet18_8xb16_cifar10.py', pretrained=False), + teacher=dict( + cfg_path='mmcls::resnet/resnet50_8xb16_cifar10.py', pretrained=True), + teacher_ckpt= # noqa: E251 + 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_b16x8_cifar10_20210528-f54bfad9.pth', # noqa: E501 + calculate_student_loss=False, + student_trainable=False, + distiller=dict( + type='ConfigurableDistiller', + student_recorders=dict( + bb_s4=dict(type='ModuleOutputs', + source='backbone.layer4.1.conv2')), + teacher_recorders=dict( + bb_s4=dict(type='ModuleOutputs', + source='backbone.layer4.2.conv3')), + distill_losses=dict( + loss_s4_pretrain=dict(type='L2Loss', loss_weight=1.0)), + connectors=dict( + loss_s4_sfeat=dict( + type='Translator', in_channel=512, out_channel=1024), + loss_s4_tfeat=dict( + type='Paraphraser', + phase='pretrain', + in_channel=2048, + out_channel=1024)), + loss_forward_mappings=dict( + loss_s4_pretrain=dict( + s_feature=dict( + # it actually is t_feature + from_student=False, + recorder='bb_s4'), + t_feature=dict( + from_student=False, + recorder='bb_s4', + connector='loss_s4_tfeat'), + )))) + +find_unused_parameters = True + +val_cfg = dict(_delete_=True, type='mmrazor.SingleTeacherDistillValLoop') diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/factor_transfer/factor-transfer_backbone_resnet50_resnet18_8xb16_cifar10_train.py b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/factor_transfer/factor-transfer_backbone_resnet50_resnet18_8xb16_cifar10_train.py new file mode 100644 index 0000000000000000000000000000000000000000..ba1b9f25888ce227f9abdbc62121ce9142e708e5 --- /dev/null +++ 
b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/factor_transfer/factor-transfer_backbone_resnet50_resnet18_8xb16_cifar10_train.py @@ -0,0 +1,29 @@ +_base_ = [ + './factor-transfer_backbone_resnet50_resnet18_8xb16_cifar10_pretrain.py' +] + +train_cfg = dict(by_epoch=True, max_epochs=200, val_interval=1) + +model = dict( + calculate_student_loss=True, + student_trainable=True, + distiller=dict( + distill_losses=dict(loss_s4=dict(type='FTLoss', loss_weight=1.0)), + connectors=dict(loss_s4_tfeat=dict(phase='train')), + loss_forward_mappings=dict( + _delete_=True, + loss_s4=dict( + s_feature=dict( + from_student=True, + recorder='bb_s4', + connector='loss_s4_sfeat'), + t_feature=dict( + from_student=False, + recorder='bb_s4', + connector='loss_s4_tfeat'), + ))), + init_cfg=dict( + type='Pretrained', + checkpoint= # noqa: E251 + 'https://download.openmmlab.com/mmrazor/v1/factor_transfer/factor-transfer_backbone_resnet50_resnet18_8xb16_cifar10_pretrain_20220831_173259-ebdb09e2.pth' # noqa: E501 + )) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/factor_transfer/metafile.yml b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/factor_transfer/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..615dc34999ee4bd03eda4ca24b2df00c78c1763c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/factor_transfer/metafile.yml @@ -0,0 +1,38 @@ +Collections: + - Name: FactorTransfer + Metadata: + Training Data: + - CIFAR-10 + Paper: + URL: https://arxiv.org/abs/1802.04977 + Title: 'Paraphrasing Complex Network: Network Compression via Factor Transfer' + README: configs/distill/mmcls/factor_transfer/README.md + Code: + URL: https://github.com/open-mmlab/mmrazor/blob/dev-1.x/mmrazor/models/losses/factor_transfer_loss.py + Version: v2.0.0 + Converted From: + Code: https://github.com/Jangho-Kim/Factor-Transfer-pytorch +Models: + - Name: factor-transfer_backbone_resnet50_resnet18_8xb16_cifar10_train + In 
Collection: FactorTransfer + Metadata: + Location: backbone + Student: + Config: mmcls::resnet/resnet18_8xb16_cifar10.py + Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_b16x8_cifar10_20210528-bd6371c8.pth + Metrics: + Top 1 Accuracy: 94.82 + Top 5 Accuracy: 99.87 + Teacher: + Config: mmcls::resnet/resnet50_8xb16_cifar10.py + Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_b16x8_cifar10_20210528-f54bfad9.pth + Metrics: + Top 1 Accuracy: 95.55 + Top 5 Accuracy: 99.91 + Results: + - Task: Image Classification + Dataset: CIFAR-10 + Metrics: + Top 1 Accuracy: 94.8800 + Config: configs/distill/mmcls/factor_transfer/factor-transfer_backbone_resnet50_resnet18_8xb16_cifar10_train.py + Weights: https://download.openmmlab.com/mmrazor/v1/factor_transfer/factor-transfer_backbone_resnet50_resnet18_8xb16_cifar10_train_20220831_201322-943df33f.pth diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/fitnets/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/fitnets/README.md new file mode 100644 index 0000000000000000000000000000000000000000..8a6515a3c6f89717b3e821c58b281d3c36451276 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/fitnets/README.md @@ -0,0 +1,48 @@ +# FitNets + +> [FitNets: Hints for Thin Deep Nets](https://arxiv.org/abs/1412.6550) + + + +## Abstract + +While depth tends to improve network performances, it also makes gradient-based +training more difficult since deeper networks tend to be more non-linear. The recently +proposed knowledge distillation approach is aimed at obtaining small and fast-to-execute +models, and it has shown that a student network could imitate the soft output of a larger +teacher network or ensemble of networks. 
In this paper, we extend this idea to allow the +training of a student that is deeper and thinner than the teacher, using not only the outputs +but also the intermediate representations learned by the teacher as hints to improve the +training process and final performance of the student. Because the student intermediate hidden +layer will generally be smaller than the teacher's intermediate hidden layer, additional parameters +are introduced to map the student hidden layer to the prediction of the teacher hidden layer. This +allows one to train deeper students that can generalize better or run faster, a trade-off that is +controlled by the chosen student capacity. For example, on CIFAR-10, a deep student network with +almost 10.4 times less parameters outperforms a larger, state-of-the-art teacher network. + +pipeline + +## Results and models + +### Classification + +| Location | Dataset | Teacher | Student | Acc | Acc(T) | Acc(S) | Config | Download | +| :---------------: | :------: | :----------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------: | :---: | :----: | :----: | :-----------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| backbone & logits | ImageNet | [resnet50](https://github.com/open-mmlab/mmclassification/blob/master/configs/resnet/resnet50_8xb32_in1k.py) | [resnet18](https://github.com/open-mmlab/mmclassification/blob/master/configs/resnet/resnet18_8xb32_in1k.py) | 70.58 | 76.55 | 69.90 | 
[config](./fitnets_backbone_logits_resnet50_resnet18_8xb32_in1k.py) | [teacher](https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth) \|[model](https://download.openmmlab.com/mmrazor/v1/FieNets/fitnets_backbone_logits_resnet50_resnet18_8xb32_in1k_20220830_155608-00ccdbe2.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/FieNets/fitnets_backbone_logits_resnet50_resnet18_8xb32_in1k_20220830_155608-00ccdbe2.json) | + +## Citation + +```latex +@inproceedings{DBLP:journals/corr/RomeroBKCGB14, + author = {Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta and Yoshua Bengio}, + editor = {Yoshua Bengio and Yann LeCun}, + title = {FitNets: Hints for Thin Deep Nets}, + booktitle = {3rd International Conference on Learning Representations, {ICLR} 2015, + San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings}, + year = {2015}, + url = {http://arxiv.org/abs/1412.6550}, + timestamp = {Thu, 25 Jul 2019 14:25:38 +0200}, + biburl = {https://dblp.org/rec/journals/corr/RomeroBKCGB14.bib}, + bibsource = {dblp computer science bibliography, https://dblp.org} +} +``` diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/fitnets/fitnets_backbone_logits_resnet50_resnet18_8xb32_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/fitnets/fitnets_backbone_logits_resnet50_resnet18_8xb32_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..5947684a6ea806f297a22202c30802de67cdb3f7 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/fitnets/fitnets_backbone_logits_resnet50_resnet18_8xb32_in1k.py @@ -0,0 +1,72 @@ +_base_ = [ + 'mmcls::_base_/datasets/imagenet_bs32.py', + 'mmcls::_base_/schedules/imagenet_bs256.py', + 'mmcls::_base_/default_runtime.py' +] + +teacher_ckpt = 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth' # noqa: E501 +model = dict( + _scope_='mmrazor', + 
type='SingleTeacherDistill', + data_preprocessor=dict( + type='ImgDataPreprocessor', + # RGB format normalization parameters + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + # convert image from BGR to RGB + bgr_to_rgb=True), + architecture=dict( + cfg_path='mmcls::resnet/resnet18_8xb32_in1k.py', pretrained=False), + teacher=dict( + cfg_path='mmcls::resnet/resnet50_8xb32_in1k.py', pretrained=True), + teacher_ckpt=teacher_ckpt, + distiller=dict( + type='ConfigurableDistiller', + student_recorders=dict( + bb_s4=dict(type='ModuleOutputs', source='backbone.layer4.1.relu'), + bb_s3=dict(type='ModuleOutputs', source='backbone.layer3.1.relu'), + fc=dict(type='ModuleOutputs', source='head.fc')), + teacher_recorders=dict( + bb_s4=dict(type='ModuleOutputs', source='backbone.layer4.2.relu'), + bb_s3=dict(type='ModuleOutputs', source='backbone.layer3.5.relu'), + fc=dict(type='ModuleOutputs', source='head.fc')), + distill_losses=dict( + loss_s4=dict(type='L2Loss', loss_weight=10), + loss_s3=dict(type='L2Loss', loss_weight=10), + loss_kl=dict( + type='KLDivergence', tau=6, loss_weight=10, reduction='mean')), + connectors=dict( + loss_s4_sfeat=dict( + type='ConvModuleConnector', + in_channel=512, + out_channel=2048, + norm_cfg=dict(type='BN')), + loss_s3_sfeat=dict( + type='ConvModuleConnector', + in_channel=256, + out_channel=1024, + norm_cfg=dict(type='BN'))), + loss_forward_mappings=dict( + loss_s4=dict( + s_feature=dict( + from_student=True, + recorder='bb_s4', + record_idx=1, + connector='loss_s4_sfeat'), + t_feature=dict( + from_student=False, recorder='bb_s4', record_idx=2)), + loss_s3=dict( + s_feature=dict( + from_student=True, + recorder='bb_s3', + record_idx=1, + connector='loss_s3_sfeat'), + t_feature=dict( + from_student=False, recorder='bb_s3', record_idx=2)), + loss_kl=dict( + preds_S=dict(from_student=True, recorder='fc'), + preds_T=dict(from_student=False, recorder='fc'))))) + +find_unused_parameters = True + +val_cfg = dict(_delete_=True, 
type='mmrazor.SingleTeacherDistillValLoop') diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/fitnets/metafile.yml b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/fitnets/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..4dd7eb85db33f10780771961d251b4c895555a78 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/fitnets/metafile.yml @@ -0,0 +1,40 @@ +Collections: + - Name: FitNets + Metadata: + Training Data: + - ImageNet-1k + Paper: + URL: https://arxiv.org/abs/1412.6550 + Title: 'FitNets: Hints for Thin Deep Nets' + README: configs/distill/mmcls/fitnets/README.md +Models: + - Name: fitnets_backbone_logits_resnet50_resnet18_8xb32_in1k + In Collection: FitNets + Metadata: + inference time (ms/im): + - value: 0.18 + hardware: NVIDIA A100-SXM4-80GB + backend: PyTorch + batch size: 32 + mode: FP32 + resolution: (224, 224) + Location: backbone & logits + Student: + Config: mmcls::resnet/resnet18_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_8xb32_in1k_20210831-fbbb1da6.pth + Metrics: + Top 1 Accuracy: 69.90 + Top 5 Accuracy: 89.43 + Teacher: + Config: mmcls::resnet/resnet50_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth + Metrics: + Top 1 Accuracy: 76.55 + Top 5 Accuracy: 93.06 + Results: + - Task: Image Classification + Dataset: ImageNet-1k + Metrics: + Top 1 Accuracy: 70.58 + Config: configs/distill/mmcls/fitnets/fitnets_backbone_logits_resnet50_resnet18_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmrazor/v1/FieNets/fitnets_backbone_logits_resnet50_resnet18_8xb32_in1k_20220830_155608-00ccdbe2.pth diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/kd/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/kd/README.md new file mode 100644 index 0000000000000000000000000000000000000000..0fbe7bd9a75a2daeb50ee3aa614660bd5991e865 
--- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/kd/README.md @@ -0,0 +1,34 @@ +# KD + +> [Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531) + + + +## Abstract + +A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions. Unfortunately, making predictions using a whole ensemble of models is cumbersome and may be too computationally expensive to allow deployment to a large number of users, especially if the individual models are large neural nets. Caruana and his collaborators have shown that it is possible to compress the knowledge in an ensemble into a single model which is much easier to deploy and we develop this approach further using a different compression technique. We achieve some surprising results on MNIST and we show that we can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model. We also introduce a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse. Unlike a mixture of experts, these specialist models can be trained rapidly and in parallel. 
+ +![pipeline](https://user-images.githubusercontent.com/88702197/187423762-e932dd3e-16cb-4714-a85f-cddfc906c1b7.png) + +## Results and models + +### Classification + +| Location | Dataset | Teacher | Student | Acc | Acc(T) | Acc(S) | Config | Download | +| :------: | :------: | :-----------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------: | :---: | :----: | :----: | :------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| logits | ImageNet | [resnet34](https://github.com/open-mmlab/mmclassification/blob/dev-1.x/configs/resnet/resnet34_8xb32_in1k.py) | [resnet18](https://github.com/open-mmlab/mmclassification/blob/dev-1.x/configs/resnet/resnet18_8xb32_in1k.py) | 71.81 | 73.62 | 69.90 | [config](./kd_logits_resnet34_resnet18_8xb32_in1k.py) | [teacher](https://download.openmmlab.com/mmclassification/v0/resnet/resnet34_8xb32_in1k_20210831-f257d4e6.pth) \|[model](https://download.openmmlab.com/mmrazor/v1/kd/kl_r18_w3/kd_logits_resnet34_resnet18_8xb32_in1k_w3_20221011_181115-5c6a834d.pth?versionId=CAEQThiBgID1_Me0oBgiIDE3NTk3MDgxZmU2YjRlMjVhMzg1ZTQwMmRhNmYyNGU2) \| 
[log](https://download.openmmlab.com/mmrazor/v1/kd/kl_r18_w3/kd_logits_resnet34_resnet18_8xb32_in1k_w3_20221011_181115-5c6a834d.json?versionId=CAEQThiBgMDx_se0oBgiIDQxNTM2MWZjZGRhNjRhZDZiZTIzY2Y0NDU3NDA4ODBl) | +| logits | ImageNet | [resnet50](https://github.com/open-mmlab/mmclassification/blob/dev-1.x/configs/resnet/resnet50_8xb32_in1k.py) | [mobilenet-v2](https://github.com/open-mmlab/mmclassification/blob/dev-1.x/configs/mobilenet_v2/mobilenet-v2_8xb32_in1k.py) | 73.56 | 76.55 | 71.86 | [config](./kd_logits_resnet50_mobilenet-v2_8xb32_in1k.py) | [teacher](https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth) \|[model](https://download.openmmlab.com/mmrazor/v1/kd/kl_mbv2_w3t1/kd_logits_resnet50_mobilenet-v2_8xb32_in1k_20221025_212407-6ea9e2a5.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/kd/kl_mbv2_w3t1/kd_logits_resnet50_mobilenet-v2_8xb32_in1k_20221025_212407-6ea9e2a5.json) | +| logits | ImageNet | [resnet50](https://github.com/open-mmlab/mmclassification/blob/dev-1.x/configs/resnet/resnet50_8xb32_in1k.py) | [shufflenet-v2](https://github.com/open-mmlab/mmclassification/blob/dev-1.x/configs/shufflenet_v2/shufflenet-v2-1x_16xb64_in1k.py) | 70.87 | 76.55 | 69.55 | [config](./kd_logits_resnet50_shufflenet-v2-1x_16xb64_in1k.py) | [teacher](https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth) \|[model](https://download.openmmlab.com/mmrazor/v1/kd/kl_shuffle_w3t1/kd_logits_resnet50_shufflenet-v2-1x_16xb64_in1k_20221025_224424-5d748c1b.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/kd/kl_shuffle_w3t1/kd_logits_resnet50_shufflenet-v2-1x_16xb64_in1k_20221025_224424-5d748c1b.json) | + +## Citation + +```latex +@article{hinton2015distilling, + title={Distilling the knowledge in a neural network}, + author={Hinton, Geoffrey and Vinyals, Oriol and Dean, Jeff and others}, + journal={arXiv preprint arXiv:1503.02531}, + volume={2}, + number={7}, + year={2015} +} 
+``` diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/kd/kd_logits_resnet34_resnet18_8xb32_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/kd/kd_logits_resnet34_resnet18_8xb32_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..35921c03b3b9d39ec44b0dee36c57837b91adad1 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/kd/kd_logits_resnet34_resnet18_8xb32_in1k.py @@ -0,0 +1,39 @@ +_base_ = [ + 'mmcls::_base_/datasets/imagenet_bs32.py', + 'mmcls::_base_/schedules/imagenet_bs256.py', + 'mmcls::_base_/default_runtime.py' +] + +teacher_ckpt = 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet34_8xb32_in1k_20210831-f257d4e6.pth' # noqa: E501 + +model = dict( + _scope_='mmrazor', + type='SingleTeacherDistill', + data_preprocessor=dict( + type='ImgDataPreprocessor', + # RGB format normalization parameters + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + # convert image from BGR to RGB + bgr_to_rgb=True), + architecture=dict( + cfg_path='mmcls::resnet/resnet18_8xb32_in1k.py', pretrained=False), + teacher=dict( + cfg_path='mmcls::resnet/resnet34_8xb32_in1k.py', pretrained=False), + teacher_ckpt=teacher_ckpt, + distiller=dict( + type='ConfigurableDistiller', + student_recorders=dict( + fc=dict(type='ModuleOutputs', source='head.fc')), + teacher_recorders=dict( + fc=dict(type='ModuleOutputs', source='head.fc')), + distill_losses=dict( + loss_kl=dict(type='KLDivergence', tau=1, loss_weight=3)), + loss_forward_mappings=dict( + loss_kl=dict( + preds_S=dict(from_student=True, recorder='fc'), + preds_T=dict(from_student=False, recorder='fc'))))) + +find_unused_parameters = True + +val_cfg = dict(_delete_=True, type='mmrazor.SingleTeacherDistillValLoop') diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/kd/kd_logits_resnet50_mobilenet-v2_8xb32_in1k.py 
b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/kd/kd_logits_resnet50_mobilenet-v2_8xb32_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..4f82fb3b066682be97a10a8712d3d265a29916a1 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/kd/kd_logits_resnet50_mobilenet-v2_8xb32_in1k.py @@ -0,0 +1,37 @@ +_base_ = ['mmcls::mobilenet_v2/mobilenet-v2_8xb32_in1k.py'] + +student = _base_.model + +teacher_ckpt = 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth' # noqa: E501 + +model = dict( + _scope_='mmrazor', + _delete_=True, + type='SingleTeacherDistill', + data_preprocessor=dict( + type='ImgDataPreprocessor', + # RGB format normalization parameters + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + # convert image from BGR to RGB + bgr_to_rgb=True), + architecture=student, + teacher=dict( + cfg_path='mmcls::resnet/resnet50_8xb32_in1k.py', pretrained=False), + teacher_ckpt=teacher_ckpt, + distiller=dict( + type='ConfigurableDistiller', + student_recorders=dict( + fc=dict(type='ModuleOutputs', source='head.fc')), + teacher_recorders=dict( + fc=dict(type='ModuleOutputs', source='head.fc')), + distill_losses=dict( + loss_kl=dict(type='KLDivergence', tau=1, loss_weight=3)), + loss_forward_mappings=dict( + loss_kl=dict( + preds_S=dict(from_student=True, recorder='fc'), + preds_T=dict(from_student=False, recorder='fc'))))) + +find_unused_parameters = True + +val_cfg = dict(_delete_=True, type='mmrazor.SingleTeacherDistillValLoop') diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/kd/kd_logits_resnet50_shufflenet-v2-1x_16xb64_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/kd/kd_logits_resnet50_shufflenet-v2-1x_16xb64_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..fe9dd5891b33270bf707de1a055c6426a7db4958 --- /dev/null +++ 
b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/kd/kd_logits_resnet50_shufflenet-v2-1x_16xb64_in1k.py @@ -0,0 +1,37 @@ +_base_ = ['mmcls::shufflenet_v2/shufflenet-v2-1x_16xb64_in1k.py'] + +student = _base_.model + +teacher_ckpt = 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth' # noqa: E501 + +model = dict( + _scope_='mmrazor', + _delete_=True, + type='SingleTeacherDistill', + data_preprocessor=dict( + type='ImgDataPreprocessor', + # RGB format normalization parameters + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + # convert image from BGR to RGB + bgr_to_rgb=True), + architecture=student, + teacher=dict( + cfg_path='mmcls::resnet/resnet50_8xb32_in1k.py', pretrained=False), + teacher_ckpt=teacher_ckpt, + distiller=dict( + type='ConfigurableDistiller', + student_recorders=dict( + fc=dict(type='ModuleOutputs', source='head.fc')), + teacher_recorders=dict( + fc=dict(type='ModuleOutputs', source='head.fc')), + distill_losses=dict( + loss_kl=dict(type='KLDivergence', tau=1, loss_weight=3)), + loss_forward_mappings=dict( + loss_kl=dict( + preds_S=dict(from_student=True, recorder='fc'), + preds_T=dict(from_student=False, recorder='fc'))))) + +find_unused_parameters = True + +val_cfg = dict(_delete_=True, type='mmrazor.SingleTeacherDistillValLoop') diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/kd/metafile.yml b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/kd/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..a208de78396b342d26104e737e27a6750d98de2a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/kd/metafile.yml @@ -0,0 +1,82 @@ +Collections: + - Name: KD + Metadata: + Training Data: + - ImageNet-1k + Paper: + URL: https://arxiv.org/abs/1503.02531 + Title: Distilling the Knowledge in a Neural Network + README: configs/distill/mmcls/kd/README.md + +Models: + - Name: kd_logits_resnet34_resnet18_8xb32_in1k + 
In Collection: KD + Metadata: + Location: logits + Student: + Config: mmcls::resnet/resnet18_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_8xb32_in1k_20210831-fbbb1da6.pth + Metrics: + Top 1 Accuracy: 69.90 + Top 5 Accuracy: 89.43 + Teacher: + Config: mmcls::resnet/resnet34_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet34_8xb32_in1k_20210831-f257d4e6.pth + Metrics: + Top 1 Accuracy: 73.62 + Top 5 Accuracy: 91.59 + Results: + - Task: Image Classification + Dataset: ImageNet-1k + Metrics: + Top 1 Accuracy: 71.81 + Config: configs/distill/mmcls/kd/kd_logits_resnet34_resnet18_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmrazor/v1/kd/kl_r18_w3/kd_logits_resnet34_resnet18_8xb32_in1k_w3_20221011_181115-5c6a834d.pth?versionId=CAEQThiBgID1_Me0oBgiIDE3NTk3MDgxZmU2YjRlMjVhMzg1ZTQwMmRhNmYyNGU2 + + - Name: kd_logits_resnet50_mobilenet-v2_8xb32_in1k + In Collection: KD + Metadata: + Location: logits + Student: + Config: mmcls::mobilenet_v2/mobilenet-v2_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmclassification/v0/mobilenet_v2/mobilenet_v2_batch256_imagenet_20200708-3b2dc3af.pth + Metrics: + Top 1 Accuracy: 71.86 + Top 5 Accuracy: 90.42 + Teacher: + Config: mmcls::resnet/resnet50_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth + Metrics: + Top 1 Accuracy: 76.55 + Top 5 Accuracy: 93.06 + Results: + - Task: Image Classification + Dataset: ImageNet-1k + Metrics: + Top 1 Accuracy: 73.56 + Config: configs/distill/mmcls/kd/kd_logits_resnet50_mobilenet-v2_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmrazor/v1/kd/kl_mbv2_w3t1/kd_logits_resnet50_mobilenet-v2_8xb32_in1k_20221025_212407-6ea9e2a5.pth + + - Name: kd_logits_resnet50_shufflenet-v2-1x_16xb64_in1k + In Collection: KD + Metadata: + Location: logits + Student: + Config: mmcls::shufflenet_v2/shufflenet-v2-1x_16xb64_in1k.py + 
Weights: https://download.openmmlab.com/mmclassification/v0/shufflenet_v2/shufflenet_v2_batch1024_imagenet_20200812-5bf4721e.pth + Metrics: + Top 1 Accuracy: 69.55 + Top 5 Accuracy: 88.92 + Teacher: + Config: mmcls::resnet/resnet50_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth + Metrics: + Top 1 Accuracy: 76.55 + Top 5 Accuracy: 93.06 + Results: + - Task: Image Classification + Dataset: ImageNet-1k + Metrics: + Top 1 Accuracy: 70.87 + Config: configs/distill/mmcls/kd/kd_logits_resnet50_shufflenet-v2-1x_16xb64_in1k.py + Weights: https://download.openmmlab.com/mmrazor/v1/kd/kl_shuffle_w3t1/kd_logits_resnet50_shufflenet-v2-1x_16xb64_in1k_20221025_224424-5d748c1b.pth diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/ofd/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/ofd/README.md new file mode 100644 index 0000000000000000000000000000000000000000..eb789e840baac3b25a7cbc63af54f31fffce3bbf --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/ofd/README.md @@ -0,0 +1,63 @@ +# Overhaul + +> [A Comprehensive Overhaul of Feature Distillation](https://arxiv.org/abs/1904.01866) + + + +## Abstract + +We investigate the design aspects of feature distillation methods achieving network compression and propose a novel feature distillation method in which the distillation loss is designed to make a synergy among various aspects: teacher transform, student transform, distillation feature position and distance function. Our proposed distillation loss includes a feature transform with a newly designed margin ReLU, a new distillation feature position, and a partial L2 distance function to skip redundant information giving adverse effects to the compression of student. In ImageNet, our proposed method achieves 21.65% of top-1 error with ResNet50, which outperforms the performance of the teacher network, ResNet152. 
Our proposed method is evaluated on various tasks such as image classification, object detection and semantic segmentation and achieves a significant performance improvement in all tasks. The code is available at [link](https://sites.google.com/view/byeongho-heo/overhaul) + +### Feature-based Distillation + +![feature_base](https://user-images.githubusercontent.com/88702197/187423965-bb3bde16-c71a-43c6-903c-69aff1005415.png) + +### Margin ReLU + +![margin_relu](https://user-images.githubusercontent.com/88702197/187423981-67106ac2-48d9-4002-8b32-b92a90b1dacd.png) + +## Results and models + +### 1. Classification + +#### Vanilla + +| Dataset | Model | Top-1 (%) | Download | +| ------- | ----------------------------------------------------------------------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| CIFAR10 | [WRN16-2](../../../vanilla/mmcls/wide-resnet/wrn16-w2_b16x8_cifar10.py) | 93.43 | [model](https://download.openmmlab.com/mmrazor/v1/wide_resnet/wrn16_2_b16x8_cifar10_20220831_204709-446b466e.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/wide_resnet/wrn16_2_b16x8_cifar10_20220831_204709-446b466e.json) | +| CIFAR10 | [WRN28-4](../../../vanilla/mmcls/wide-resnet/wrn28-w4_b16x8_cifar10.py) | 95.49 | [model](https://download.openmmlab.com/mmrazor/v1/wide_resnet/wrn28_4_b16x8_cifar10_20220831_173536-d6f8725c.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/wide_resnet/wrn28_4_b16x8_cifar10_20220831_173536-d6f8725c.json) | + +#### Distillation + +| Dataset | Model | Flops(M) | Teacher | Top-1 (%) | Configs | Download | +| ------- | ------- | -------- | ------- | --------- | ----------------------------------------------------------- | 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| CIFAR10 | WRN16-2 | 101 | WRN28-4 | 94.21 | [config](./ofd_backbone_resnet50_resnet18_8xb16_cifar10.py) | [model](https://download.openmmlab.com/mmrazor/v1/overhaul/ofd_backbone_resnet50_resnet18_8xb16_cifar10_20230417_192216-ace2908f.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/overhaul/ofd_backbone_resnet50_resnet18_8xb16_cifar10_20230417_192216-ace2908f.log) | + +## Getting Started + +### Distillation training. + +```bash +sh tools/slurm_train.sh $PARTITION $JOB_NAME \ + configs/distill/mmcls/ofd/ofd_backbone_resnet50_resnet18_8xb16_cifar10.py \ + $DISTILLATION_WORK_DIR +``` + +### Test + +```bash +sh tools/slurm_test.sh $PARTITION $JOB_NAME \ + configs/distill/mmcls/ofd/ofd_backbone_resnet50_resnet18_8xb16_cifar10.py \ + $DISTILLATION_WORK_DIR/latest.pth --eval $EVAL_SETTING +``` + +## Citation + +```latex +@inproceedings{heo2019overhaul, + title={A Comprehensive Overhaul of Feature Distillation}, + author={Heo, Byeongho and Kim, Jeesoo and Yun, Sangdoo and Park, Hyojin and Kwak, Nojun and Choi, Jin Young}, + booktitle = {International Conference on Computer Vision (ICCV)}, + year={2019} +} +``` diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/ofd/metafile.yml b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/ofd/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..cb176b1c33a3af69697a2c822e76084ac3bafcda --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/ofd/metafile.yml @@ -0,0 +1,38 @@ +Collections: + - Name: OFD + Metadata: + Training Data: + - CIFAR-10 + Paper: + URL: https://arxiv.org/abs/1904.01866 + Title: A Comprehensive Overhaul of Feature Distillation + README: 
configs/distill/mmcls/ofd/README.md + Code: + URL: https://github.com/open-mmlab/mmrazor/blob/dev-1.x/mmrazor/models/algorithms/distill/configurable/overhaul_feature_distillation.py + Version: v2.0.0 + Converted From: + Code: https://github.com/clovaai/overhaul-distillation +Models: + - Name: ofd_backbone_resnet50_resnet18_8xb16_cifar10 + In Collection: OFD + Metadata: + Location: backbone + Student: + Config: mmrazor::vanilla/mmcls/wide-resnet/wrn16-w2_b16x8_cifar10.py + Weights: https://download.openmmlab.com/mmrazor/v1/wide_resnet/wrn16_2_b16x8_cifar10_20220831_204709-446b466e.pth + Metrics: + Top 1 Accuracy: 93.2600 + Top 5 Accuracy: 99.8000 + Teacher: + Config: mmrazor::vanilla/mmcls/wide-resnet/wrn28-w4_b16x8_cifar10.py + Weights: https://download.openmmlab.com/mmrazor/v1/wide_resnet/wrn28_4_b16x8_cifar10_20220831_173536-d6f8725c.pth + Metrics: + Top 1 Accuracy: 95.4400 + Top 5 Accuracy: 99.8200 + Results: + - Task: Image Classification + Dataset: CIFAR-10 + Metrics: + Top 1 Accuracy: 94.21 + Config: configs/distill/mmcls/ofd/ofd_backbone_resnet50_resnet18_8xb16_cifar10.py + Weights: https://download.openmmlab.com/mmrazor/v1/overhaul/ofd_backbone_resnet50_resnet18_8xb16_cifar10_20230417_192216-ace2908f.pth diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/ofd/ofd_backbone_resnet50_resnet18_8xb16_cifar10.py b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/ofd/ofd_backbone_resnet50_resnet18_8xb16_cifar10.py new file mode 100644 index 0000000000000000000000000000000000000000..d47088ee7b0415052baa19cab5fe9b94874a8522 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/ofd/ofd_backbone_resnet50_resnet18_8xb16_cifar10.py @@ -0,0 +1,100 @@ +_base_ = [ + 'mmcls::_base_/datasets/cifar10_bs16.py', + 'mmcls::_base_/schedules/cifar10_bs128.py', + 'mmcls::_base_/default_runtime.py' +] + +model = dict( + _scope_='mmrazor', + type='OverhaulFeatureDistillation', + data_preprocessor=dict( + type='ImgDataPreprocessor', + # RGB 
format normalization parameters + mean=[125.307, 122.961, 113.8575], + std=[51.5865, 50.847, 51.255], + # convert image from BGR to RGB + bgr_to_rgb=False), + architecture=dict( + cfg_path= # noqa: E251 + 'mmrazor::vanilla/mmcls/wide-resnet/wrn16-w2_b16x8_cifar10.py', + pretrained=False), + teacher=dict( + cfg_path= # noqa: E251 + 'mmrazor::vanilla/mmcls/wide-resnet/wrn28-w4_b16x8_cifar10.py', + pretrained=False), + teacher_ckpt= # noqa: E251 + 'https://download.openmmlab.com/mmrazor/v1/wide_resnet/wrn28_4_b16x8_cifar10_20220831_173536-d6f8725c.pth', # noqa: E501 + calculate_student_loss=True, + student_trainable=True, + distiller=dict( + type='OFDDistiller', + student_recorders=dict( + bb_1=dict(type='ModuleOutputs', source='backbone.layer2.0.bn1'), + bb_2=dict(type='ModuleOutputs', source='backbone.layer3.0.bn1'), + bb_3=dict(type='ModuleOutputs', source='backbone.bn1')), + teacher_recorders=dict( + bb_1=dict(type='ModuleOutputs', source='backbone.layer2.0.bn1'), + bb_2=dict(type='ModuleOutputs', source='backbone.layer3.0.bn1'), + bb_3=dict(type='ModuleOutputs', source='backbone.bn1')), + distill_losses=dict( + loss_1=dict(type='OFDLoss', loss_weight=0.25), + loss_2=dict(type='OFDLoss', loss_weight=0.5), + loss_3=dict(type='OFDLoss', loss_weight=1.0)), + connectors=dict( + loss_1_sfeat=dict( + type='ConvModuleConnector', + in_channel=32, + out_channel=64, + norm_cfg=dict(type='BN'), + act_cfg=None), + loss_1_tfeat=dict(type='OFDTeacherConnector'), + loss_2_sfeat=dict( + type='ConvModuleConnector', + in_channel=64, + out_channel=128, + norm_cfg=dict(type='BN'), + act_cfg=None), + loss_2_tfeat=dict(type='OFDTeacherConnector'), + loss_3_sfeat=dict( + type='ConvModuleConnector', + in_channel=128, + out_channel=256, + norm_cfg=dict(type='BN'), + act_cfg=None), + loss_3_tfeat=dict(type='OFDTeacherConnector')), + loss_forward_mappings=dict( + loss_1=dict( + s_feature=dict( + from_student=True, + recorder='bb_1', + connector='loss_1_sfeat'), + t_feature=dict( + 
from_student=False, + recorder='bb_1', + connector='loss_1_tfeat'), + ), + loss_2=dict( + s_feature=dict( + from_student=True, + recorder='bb_2', + connector='loss_2_sfeat'), + t_feature=dict( + from_student=False, + recorder='bb_2', + connector='loss_2_tfeat'), + ), + loss_3=dict( + s_feature=dict( + from_student=True, + recorder='bb_3', + connector='loss_3_sfeat'), + t_feature=dict( + from_student=False, + recorder='bb_3', + connector='loss_3_tfeat'), + ), + ))) + +find_unused_parameters = True + +val_cfg = dict(_delete_=True, type='mmrazor.SingleTeacherDistillValLoop') diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/rkd/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/rkd/README.md new file mode 100644 index 0000000000000000000000000000000000000000..3b25e5e7d22af3f98357bfebb8173667c60f206f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/rkd/README.md @@ -0,0 +1,43 @@ +# RKD + +> [Relational Knowledge Distillation](https://arxiv.org/abs/1904.05068) + + + +## Abstract + +Knowledge distillation aims at transferring knowledge acquired +in one model (a teacher) to another model (a student) that is +typically smaller. Previous approaches can be expressed as +a form of training the student to mimic output activations of +individual data examples represented by the teacher. We introduce +a novel approach, dubbed relational knowledge distillation (RKD), +that transfers mutual relations of data examples instead. +For concrete realizations of RKD, we propose distance-wise and +angle-wise distillation losses that penalize structural differences +in relations. Experiments conducted on different tasks show that the +proposed method improves educated student models with a significant margin. +In particular for metric learning, it allows students to outperform their +teachers' performance, achieving the state of the arts on standard benchmark datasets. 
+ +![pipeline](https://user-images.githubusercontent.com/88702197/187424092-b58742aa-6724-4a89-8d28-62960efb58b4.png) + +## Results and models + +### Classification + +| Location | Dataset | Teacher | Student | Acc | Acc(T) | Acc(S) | Config | Download | +| :------: | :------: | :----------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------: | :---: | :----: | :----: | :--------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| neck | ImageNet | [resnet34](https://github.com/open-mmlab/mmclassification/blob/master/configs/resnet/resnet34_8xb32_in1k.py) | [resnet18](https://github.com/open-mmlab/mmclassification/blob/master/configs/resnet/resnet18_8xb32_in1k.py) | 70.23 | 73.62 | 69.90 | [config](./rkd_neck_resnet34_resnet18_8xb32_in1k.py) | [teacher](https://download.openmmlab.com/mmclassification/v0/resnet/resnet34_b16x8_cifar10_20210528-a8aa36a6.pth) \|[model](https://download.openmmlab.com/mmrazor/v1/rkd/rkd_neck_resnet34_resnet18_8xb32_in1k_acc-70.23_20220401-a91e223f.pth) \| [log](https://download.openmmlab.com/mmrazor/v0.3/distill/rkd/rkd_neck_resnet34_resnet18_8xb32_in1k_20220312_130419.log.json) | + +## Citation + +```latex +@inproceedings{park2019relational, + title={Relational knowledge distillation}, + author={Park, Wonpyo and Kim, Dongju and Lu, Yan and Cho, Minsu}, + booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, + pages={3967--3976}, + year={2019} +} +``` diff --git 
a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/rkd/metafile.yml b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/rkd/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..820d81ce92738446313056d2f4a0c5bb2d60656a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/rkd/metafile.yml @@ -0,0 +1,38 @@ +Collections: + - Name: RKD + Metadata: + Training Data: + - ImageNet-1k + Paper: + URL: https://arxiv.org/abs/1904.05068 + Title: Relational Knowledge Distillation + README: configs/distill/mmcls/rkd/README.md + Code: + URL: https://github.com/open-mmlab/mmrazor/blob/v0.3.0/mmrazor/models/losses/relation_kd.py + Version: v0.3.0 + Converted From: + Code: https://github.com/lenscloth/RKD +Models: + - Name: rkd_neck_resnet34_resnet18_8xb32_in1k + In Collection: RKD + Metadata: + Location: neck + Student: + Config: mmcls::resnet/resnet18_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_8xb32_in1k_20210831-fbbb1da6.pth + Metrics: + Top 1 Accuracy: 69.90 + Top 5 Accuracy: 89.43 + Teacher: + Config: mmcls::resnet/resnet34_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet34_8xb32_in1k_20210831-f257d4e6.pth + Metrics: + Top 1 Accuracy: 73.62 + Top 5 Accuracy: 91.59 + Results: + - Task: Image Classification + Dataset: ImageNet-1k + Metrics: + Top 1 Accuracy: 70.23 + Config: configs/distill/mmcls/rkd/rkd_neck_resnet34_resnet18_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmrazor/v1/rkd/rkd_neck_resnet34_resnet18_8xb32_in1k_acc-70.23_20220401-a91e223f.pth diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/rkd/rkd_neck_resnet34_resnet18_8xb32_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/rkd/rkd_neck_resnet34_resnet18_8xb32_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..6d5e878d57490459ccb0c41faecd7a248258d769 --- /dev/null +++ 
b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/rkd/rkd_neck_resnet34_resnet18_8xb32_in1k.py @@ -0,0 +1,44 @@ +_base_ = [ + 'mmcls::_base_/datasets/imagenet_bs32.py', + 'mmcls::_base_/schedules/imagenet_bs256.py', + 'mmcls::_base_/default_runtime.py' +] + +model = dict( + _scope_='mmrazor', + type='SingleTeacherDistill', + data_preprocessor=dict( + type='ImgDataPreprocessor', + # RGB format normalization parameters + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + # convert image from BGR to RGB + bgr_to_rgb=True), + architecture=dict( + cfg_path='mmcls::resnet/resnet18_8xb32_in1k.py', pretrained=False), + teacher=dict( + cfg_path='mmcls::resnet/resnet34_8xb32_in1k.py', pretrained=True), + teacher_ckpt= # noqa + 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet34_8xb32_in1k_20210831-f257d4e6.pth', # noqa + distiller=dict( + type='ConfigurableDistiller', + student_recorders=dict( + feat=dict(type='ModuleOutputs', source='neck.gap')), + teacher_recorders=dict( + feat=dict(type='ModuleOutputs', source='neck.gap')), + distill_losses=dict( + loss_dw=dict( + type='DistanceWiseRKD', with_l2_norm=True, loss_weight=25), + loss_aw=dict( + type='AngleWiseRKD', with_l2_norm=True, loss_weight=50)), + loss_forward_mappings=dict( + loss_dw=dict( + preds_S=dict(from_student=True, recorder='feat'), + preds_T=dict(from_student=False, recorder='feat')), + loss_aw=dict( + preds_S=dict(from_student=True, recorder='feat'), + preds_T=dict(from_student=False, recorder='feat'))))) + +find_unused_parameters = True + +val_cfg = dict(_delete_=True, type='mmrazor.SingleTeacherDistillValLoop') diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/wsld/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/wsld/README.md new file mode 100644 index 0000000000000000000000000000000000000000..be35af6cdba34e679826fece9c015878887569b3 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/wsld/README.md @@ -0,0 
+1,43 @@ +# WSLD + +> [Rethinking Soft Labels for Knowledge Distillation: A Bias-Variance Tradeoff Perspective](https://arxiv.org/abs/2102.00650) + + + +## Abstract + +Knowledge distillation is an effective approach to leverage a well-trained network +or an ensemble of them, named as the teacher, to guide the training of a student +network. The outputs from the teacher network are used as soft labels for supervising the training of a new network. Recent studies (Müller et al., 2019; Yuan +et al., 2020) revealed an intriguing property of the soft labels that making labels +soft serves as a good regularization to the student network. From the perspective of statistical learning, regularization aims to reduce the variance, however +how bias and variance change is not clear for training with soft labels. In this +paper, we investigate the bias-variance tradeoff brought by distillation with soft +labels. Specifically, we observe that during training the bias-variance tradeoff +varies sample-wisely. Further, under the same distillation temperature setting, we +observe that the distillation performance is negatively associated with the number of some specific samples, which are named as regularization samples since +these samples lead to bias increasing and variance decreasing. Nevertheless, we +empirically find that completely filtering out regularization samples also deteriorates distillation performance. Our discoveries inspired us to propose the novel +weighted soft labels to help the network adaptively handle the sample-wise bias-variance tradeoff. Experiments on standard evaluation benchmarks validate the +effectiveness of our method. 
+ +pipeline + +## Results and models + +### Classification + +| Location | Dataset | Teacher | Student | Acc | Acc(T) | Acc(S) | Config | Download | +| :------: | :------: | :----------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------: | :---: | :----: | :----: | :-------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| cls head | ImageNet | [resnet34](https://github.com/open-mmlab/mmclassification/blob/master/configs/resnet/resnet34_8xb32_in1k.py) | [resnet18](https://github.com/open-mmlab/mmclassification/blob/master/configs/resnet/resnet18_8xb32_in1k.py) | 71.54 | 73.62 | 69.90 | [config](./wsld_cls_head_resnet34_resnet18_8xb32_in1k.py) | [teacher](https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth) \|[model](https://download.openmmlab.com/mmrazor/v1/wsld/wsld_cls_head_resnet34_resnet18_8xb32_in1k/wsld_cls_head_resnet34_resnet18_8xb32_in1k_acc-71.54_20211222-57925cbf.pth) \| [log](https://openmmlab-share.oss-cn-hangzhou.aliyuncs.com/mmrazor/v0.1/distill/wsld/wsld_cls_head_resnet34_resnet18_8xb32_in1k/wsld_cls_head_resnet34_resnet18_8xb32_in1k_20211221_181516.log.json?versionId=CAEQHxiBgIDLmemK7xciIGNkM2FiN2Y4N2E5YjRhNDE4NDVlNmExNDczZDIxN2E5) | + +## Citation + +```latex +@inproceedings{zhou2021wsl, + 
title={Rethinking soft labels for knowledge distillation: a bias-variance tradeoff perspective}, + author={Helong, Zhou and Liangchen, Song and Jiajie, Chen and Ye, Zhou and Guoli, Wang and Junsong, Yuan and Qian Zhang}, + booktitle = {International Conference on Learning Representations (ICLR)}, + year={2021} +} +``` diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/wsld/metafile.yml b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/wsld/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..0ba40d52328c5ddcee33b5729472ad05fadf1cb0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/wsld/metafile.yml @@ -0,0 +1,38 @@ +Collections: + - Name: WSLD + Metadata: + Training Data: + - ImageNet-1k + Paper: + URL: https://arxiv.org/abs/2102.00650 + Title: 'Rethinking Soft Labels for Knowledge Distillation: A Bias-Variance Tradeoff Perspective' + README: configs/distill/mmcls/wsld/README.md + Code: + URL: https://github.com/open-mmlab/mmrazor/blob/v0.1.0/mmrazor/models/losses/weighted_soft_label_distillation.py + Version: v0.1.0 + Converted From: + Code: https://github.com/bellymonster/Weighted-Soft-Label-Distillation +Models: + - Name: wsld_logits_resnet34_resnet18_8xb32_in1k + In Collection: WSLD + Metadata: + Location: logits + Student: + Config: mmcls::resnet/resnet18_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_8xb32_in1k_20210831-fbbb1da6.pth + Metrics: + Top 1 Accuracy: 69.90 + Top 5 Accuracy: 89.43 + Teacher: + Config: mmcls::resnet/resnet34_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet34_8xb32_in1k_20210831-f257d4e6.pth + Metrics: + Top 1 Accuracy: 73.62 + Top 5 Accuracy: 91.59 + Results: + - Task: Image Classification + Dataset: ImageNet-1k + Metrics: + Top 1 Accuracy: 71.54 + Config: configs/distill/mmcls/wsld/wsld_logits_resnet34_resnet18_8xb32_in1k.py + Weights: 
https://download.openmmlab.com/mmrazor/v1/wsld/wsld_cls_head_resnet34_resnet18_8xb32_in1k/wsld_cls_head_resnet34_resnet18_8xb32_in1k_acc-71.54_20211222-57925cbf.pth diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/wsld/wsld_logits_resnet34_resnet18_8xb32_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/wsld/wsld_logits_resnet34_resnet18_8xb32_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..4cf8106b2acfacc933def1d9b3e1d06f22a575e8 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/wsld/wsld_logits_resnet34_resnet18_8xb32_in1k.py @@ -0,0 +1,41 @@ +_base_ = [ + 'mmcls::_base_/datasets/imagenet_bs32.py', + 'mmcls::_base_/schedules/imagenet_bs256.py', + 'mmcls::_base_/default_runtime.py' +] + +model = dict( + _scope_='mmrazor', + type='SingleTeacherDistill', + data_preprocessor=dict( + type='ImgDataPreprocessor', + # RGB format normalization parameters + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + # convert image from BGR to RGB + bgr_to_rgb=True), + architecture=dict( + cfg_path='mmcls::resnet/resnet18_8xb32_in1k.py', pretrained=False), + teacher=dict( + cfg_path='mmcls::resnet/resnet34_8xb32_in1k.py', pretrained=True), + teacher_ckpt= # noqa + 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet34_8xb32_in1k_20210831-f257d4e6.pth', # noqa + distiller=dict( + type='ConfigurableDistiller', + student_recorders=dict( + fc=dict(type='ModuleOutputs', source='head.fc'), + gt_labels=dict(type='ModuleInputs', source='head.loss_module')), + teacher_recorders=dict( + fc=dict(type='ModuleOutputs', source='head.fc')), + distill_losses=dict( + loss_wsld=dict(type='WSLD', tau=2, loss_weight=2.5)), + loss_forward_mappings=dict( + loss_wsld=dict( + student=dict(recorder='fc', from_student=True), + teacher=dict(recorder='fc', from_student=False), + gt_labels=dict( + recorder='gt_labels', from_student=True, data_idx=1))))) + +find_unused_parameters = True + +val_cfg 
= dict(_delete_=True, type='mmrazor.SingleTeacherDistillValLoop') diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/zskt/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/zskt/README.md new file mode 100644 index 0000000000000000000000000000000000000000..fff1236172eb34d3a95c1886023eb98168f81815 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/zskt/README.md @@ -0,0 +1,37 @@ +# Zero-shot Knowledge Transfer via Adversarial Belief Matching (ZSKT) + +> [Zero-shot Knowledge Transfer via Adversarial Belief Matching](https://arxiv.org/abs/1905.09768) + + + +## Abstract + +Performing knowledge transfer from a large teacher network to a smaller student is a popular task in modern deep learning applications. However, due to growing dataset sizes and stricter privacy regulations, it is increasingly common not to have access to the data that was used to train the teacher. We propose a novel method which trains a student to match the predictions of its teacher without using any data or metadata. We achieve this by training an adversarial generator to search for images on which the student poorly matches the teacher, and then using them to train the student. Our resulting student closely approximates its teacher for simple datasets like SVHN, and on CIFAR10 we improve on the state-of-the-art for few-shot distillation (with 100 images per class), despite using no data. Finally, we also propose a metric to quantify the degree of belief matching between teacher and student in the vicinity of decision boundaries, and observe a significantly higher match between our zero-shot student and the teacher, than between a student distilled with real data and the teacher. 
Code available at: https://github.com/polo5/ZeroShotKnowledgeTransfer + +## The teacher and student decision boundaries + +distribution + +## Pseudo images sampled from the generator + +synthesis + +## Results and models + +### Classification + +| Location | Dataset | Teacher | Student | Acc | Acc(T) | Acc(S) | Config | Download | +| :---------------: | :-----: | :-------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------: | :---: | :----: | :----: | :-----------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| backbone & logits | Cifar10 | [resnet34](https://github.com/open-mmlab/mmclassification/blob/master/configs/resnet/resnet34_8xb16_cifar10.py) | [resnet18](https://github.com/open-mmlab/mmclassification/blob/master/configs/resnet/resnet18_8xb16_cifar10.py) | 93.05 | 95.34 | 94.82 | [config](./zskt_backbone_logits_resnet34_resnet18_8xb16_cifar10.py) | [teacher](https://download.openmmlab.com/mmclassification/v0/resnet/resnet34_b16x8_cifar10_20210528-a8aa36a6.pth) \|[model](https://download.openmmlab.com/mmrazor/v1/ZSKT/zskt_backbone_logits_resnet34_resnet18_8xb16_cifar10_20220823_114006-28584c2e.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/ZSKT/zskt_backbone_logits_resnet34_resnet18_8xb16_cifar10_20220823_114006-28584c2e.json) | + +## Citation + +```latex +@article{micaelli2019zero, + title={Zero-shot knowledge transfer via adversarial belief matching}, + author={Micaelli, Paul and 
Storkey, Amos J}, + journal={Advances in Neural Information Processing Systems}, + volume={32}, + year={2019} +} +``` diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/zskt/metafile.yml b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/zskt/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..54494fa701ba0640e4b2b2c1685056733d7feae3 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/zskt/metafile.yml @@ -0,0 +1,43 @@ +Collections: + - Name: ZSKT + Metadata: + Training Data: + - CIFAR-10 + Paper: + URL: https://arxiv.org/abs/1905.09768 + Title: Zero-shot Knowledge Transfer via Adversarial Belief Matching + README: configs/distill/mmcls/zskt/README.md + Converted From: + Code: + URL: https://github.com/polo5/ZeroShotKnowledgeTransfer +Models: + - Name: zskt_backbone_logits_resnet34_resnet18_8xb16_cifar10 + In Collection: ZSKT + Metadata: + inference time (ms/im): + - value: 0.12 + hardware: NVIDIA A100-SXM4-80GB + backend: PyTorch + batch size: 16 + mode: FP32 + resolution: (32, 32) + Location: backbone & logits + Student: + Config: mmcls::resnet/resnet18_8xb16_cifar10.py + Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_b16x8_cifar10_20210528-bd6371c8.pth + Metrics: + Top 1 Accuracy: 94.82 + Top 5 Accuracy: 99.87 + Teacher: + Config: mmcls::resnet/resnet34_8xb16_cifar10.py + Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet34_b16x8_cifar10_20210528-a8aa36a6.pth + Metrics: + Top 1 Accuracy: 95.34 + Top 5 Accuracy: 99.87 + Results: + - Task: Image Classification + Dataset: CIFAR-10 + Metrics: + Top 1 Accuracy: 93.05 + Config: configs/distill/mmcls/zskt/zskt_backbone_logits_resnet34_resnet18_8xb16_cifar10.py + Weights: https://download.openmmlab.com/mmrazor/v1/ZSKT/zskt_backbone_logits_resnet34_resnet18_8xb16_cifar10_20220823_114006-28584c2e.pth diff --git 
a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/zskt/zskt_backbone_logits_resnet34_resnet18_8xb16_cifar10.py b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/zskt/zskt_backbone_logits_resnet34_resnet18_8xb16_cifar10.py new file mode 100644 index 0000000000000000000000000000000000000000..5c1ab424280242002a79dbb6ec814355c04022c0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmcls/zskt/zskt_backbone_logits_resnet34_resnet18_8xb16_cifar10.py @@ -0,0 +1,139 @@ +_base_ = [ + 'mmcls::_base_/datasets/cifar10_bs16.py', + 'mmcls::_base_/schedules/cifar10_bs128.py', + 'mmcls::_base_/default_runtime.py' +] + +res34_ckpt_path = 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet34_b16x8_cifar10_20210528-a8aa36a6.pth' # noqa: E501 +model = dict( + _scope_='mmrazor', + type='DataFreeDistillation', + data_preprocessor=dict( + type='ImgDataPreprocessor', + # RGB format normalization parameters + mean=[125.307, 122.961, 113.8575], + std=[51.5865, 50.847, 51.255], + # convert image from BGR to RGB + bgr_to_rgb=False), + architecture=dict( + cfg_path='mmcls::resnet/resnet18_8xb16_cifar10.py', pretrained=False), + teachers=dict( + res34=dict( + build_cfg=dict( + cfg_path='mmcls::resnet/resnet34_8xb16_cifar10.py', + pretrained=True), + ckpt_path=res34_ckpt_path)), + generator=dict( + type='ZSKTGenerator', img_size=32, latent_dim=256, + hidden_channels=128), + distiller=dict( + type='ConfigurableDistiller', + student_recorders=dict( + bb_s1=dict(type='ModuleOutputs', source='backbone.layer1.1.relu'), + bb_s2=dict(type='ModuleOutputs', source='backbone.layer2.1.relu'), + bb_s3=dict(type='ModuleOutputs', source='backbone.layer3.1.relu'), + bb_s4=dict(type='ModuleOutputs', source='backbone.layer4.1.relu'), + fc=dict(type='ModuleOutputs', source='head.fc')), + teacher_recorders=dict( + res34_bb_s1=dict( + type='ModuleOutputs', source='res34.backbone.layer1.2.relu'), + res34_bb_s2=dict( + type='ModuleOutputs', 
source='res34.backbone.layer2.3.relu'), + res34_bb_s3=dict( + type='ModuleOutputs', source='res34.backbone.layer3.5.relu'), + res34_bb_s4=dict( + type='ModuleOutputs', source='res34.backbone.layer4.2.relu'), + res34_fc=dict(type='ModuleOutputs', source='res34.head.fc')), + distill_losses=dict( + loss_s1=dict(type='ATLoss', loss_weight=250.0), + loss_s2=dict(type='ATLoss', loss_weight=250.0), + loss_s3=dict(type='ATLoss', loss_weight=250.0), + loss_s4=dict(type='ATLoss', loss_weight=250.0), + loss_kl=dict( + type='KLDivergence', loss_weight=2.0, reduction='mean')), + loss_forward_mappings=dict( + loss_s1=dict( + s_feature=dict( + from_student=True, recorder='bb_s1', record_idx=1), + t_feature=dict( + from_student=False, recorder='res34_bb_s1', record_idx=1)), + loss_s2=dict( + s_feature=dict( + from_student=True, recorder='bb_s2', record_idx=1), + t_feature=dict( + from_student=False, recorder='res34_bb_s2', record_idx=1)), + loss_s3=dict( + s_feature=dict( + from_student=True, recorder='bb_s3', record_idx=1), + t_feature=dict( + from_student=False, recorder='res34_bb_s3', record_idx=1)), + loss_s4=dict( + s_feature=dict( + from_student=True, recorder='bb_s4', record_idx=1), + t_feature=dict( + from_student=False, recorder='res34_bb_s4', record_idx=1)), + loss_kl=dict( + preds_S=dict(from_student=True, recorder='fc'), + preds_T=dict(from_student=False, recorder='res34_fc')))), + generator_distiller=dict( + type='ConfigurableDistiller', + student_recorders=dict( + fc=dict(type='ModuleOutputs', source='head.fc')), + teacher_recorders=dict( + res34_fc=dict(type='ModuleOutputs', source='res34.head.fc')), + distill_losses=dict( + loss_kl=dict( + type='KLDivergence', + loss_weight=-2.0, + reduction='mean', + teacher_detach=False)), + loss_forward_mappings=dict( + loss_kl=dict( + preds_S=dict(from_student=True, recorder='fc'), + preds_T=dict(from_student=False, recorder='res34_fc')))), + student_iter=10) + +# model wrapper +model_wrapper_cfg = dict( + 
type='mmengine.MMSeparateDistributedDataParallel', + broadcast_buffers=False, + find_unused_parameters=True) + +# optimizer wrapper +optim_wrapper = dict( + _delete_=True, + constructor='mmrazor.SeparateOptimWrapperConstructor', + architecture=dict( + optimizer=dict(type='SGD', lr=0.1, weight_decay=0.0005, momentum=0.9)), + generator=dict(optimizer=dict(type='Adam', lr=1e-3))) +auto_scale_lr = dict(base_batch_size=16) + +iter_size = 50 + +param_scheduler = dict( + _delete_=True, + architecture=dict( + type='MultiStepLR', + milestones=[100 * iter_size, 200 * iter_size], + by_epoch=False), + generator=dict( + type='MultiStepLR', + milestones=[100 * iter_size, 200 * iter_size], + by_epoch=False)) + +train_cfg = dict( + _delete_=True, by_epoch=False, max_iters=500 * iter_size, val_interval=250) + +train_dataloader = dict( + batch_size=16, sampler=dict(type='InfiniteSampler', shuffle=True)) +val_dataloader = dict(batch_size=16) +val_evaluator = dict(type='Accuracy', topk=(1, 5)) + +default_hooks = dict( + logger=dict(type='LoggerHook', interval=50, log_metric_by_epoch=False), + checkpoint=dict( + type='CheckpointHook', by_epoch=False, interval=100, max_keep_ckpts=2)) + +log_processor = dict(by_epoch=False) +# Must set diff_rank_seed to True! +randomness = dict(seed=None, diff_rank_seed=True) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/cwd/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/cwd/README.md new file mode 100644 index 0000000000000000000000000000000000000000..4805c0696573883a43685629a84c4a4f0e619077 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/cwd/README.md @@ -0,0 +1,37 @@ +# CWD + +> [Channel-wise Knowledge Distillation for Dense Prediction](https://arxiv.org/abs/2011.13256) + + + +## Abstract + +Knowledge distillation (KD) has been proven to be a simple and effective tool for training compact models. 
Almost all KD variants for dense prediction tasks align the student and teacher networks' feature maps in the spatial domain, typically by minimizing point-wise and/or pair-wise discrepancy. Observing that in semantic segmentation, some layers' feature activations of each channel tend to encode saliency of scene categories (analogue to class activation mapping), we propose to align features channel-wise between the student and teacher networks. To this end, we first transform the feature map of each channel into a probability map using softmax normalization, and then minimize the Kullback-Leibler (KL) divergence of the corresponding channels of the two networks. By doing so, our method focuses on mimicking the soft distributions of channels between networks. In particular, the KL divergence enables learning to pay more attention to the most salient regions of the channel-wise maps, presumably corresponding to the most useful signals for semantic segmentation. Experiments demonstrate that our channel-wise distillation outperforms almost all existing spatial distillation methods for semantic segmentation considerably, and requires less computational cost during training. We consistently achieve superior performance on three benchmarks with various network structures. 
+ +![pipeline](https://user-images.githubusercontent.com/88702197/187424502-d8efb7a3-c40c-4e53-a36c-bd947de464a4.png) + +## Results and models + +### Segmentation + +| Location | Dataset | Teacher | Student | mIoU | mIoU(T) | mIou(S) | Config | Download | +| :------: | :--------: | :------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------: | :---: | :-----: | :-----: | :----------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| logits | cityscapes | [pspnet_r101](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/pspnet/pspnet_r101-d8_512x1024_80k_cityscapes.py) | [pspnet_r18](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/pspnet/pspnet_r18-d8_512x1024_80k_cityscapes.py) | 75.54 | 79.76 | 74.87 | [config](<>) | [teacher](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r101-d8_512x1024_80k_cityscapes/pspnet_r101-d8_512x1024_80k_cityscapes_20200606_112211-e1e1100f.pth) \|[model](https://download.openmmlab.com/mmrazor/v1/cwd/cwd_cls_head_pspnet_r101_d8_pspnet_r18_d8_512x1024_cityscapes_80k/cwd_cls_head_pspnet_r101_d8_pspnet_r18_d8_512x1024_cityscapes_80k_mIoU-75.54_20211222-3e643f6f.pth) \| 
[log](https://download.openmmlab.com/mmrazor/v0.1/distill/cwd/cwd_cls_head_pspnet_r101_d8_pspnet_r18_d8_512x1024_cityscapes_80k/cwd_cls_head_pspnet_r101_d8_pspnet_r18_d8_512x1024_cityscapes_80k_20211212_205711.log.json) | + +### Detection + +| Location | Dataset | Teacher | Student | mAP | mAP(T) | mAP(S) | Config | Download | +| :------: | :-----: | :--------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------: | :--: | :----: | :----: | :----------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| cls head | COCO | [gfl_r101_2x](https://github.com/open-mmlab/mmdetection/tree/master/configs/gfl/gfl_r101_fpn_mstrain_2x_coco.py) | [gfl_r50_1x](https://github.com/open-mmlab/mmdetection/tree/master/configs/gfl/gfl_r50_fpn_1x_coco.py) | 41.9 | 44.7 | 40.2 | [config](<>) | [teacher](https://download.openmmlab.com/mmdetection/v2.0/gfl/gfl_r101_fpn_mstrain_2x_coco/gfl_r101_fpn_mstrain_2x_coco_20200629_200126-dd12f847.pth) \|[model](https://download.openmmlab.com/mmrazor/v1/cwd/cwd_cls_head_gfl_r101_fpn_gfl_r50_fpn_1x_coco/cwd_cls_head_gfl_r101_fpn_gfl_r50_fpn_1x_coco_20211222-c134bb21.pth) \| [log](https://download.openmmlab.com/mmrazor/v0.1/distill/cwd/cwd_cls_head_gfl_r101_fpn_gfl_r50_fpn_1x_coco/cwd_cls_head_gfl_r101_fpn_gfl_r50_fpn_1x_coco_20211212_205444.log.json) | + +## Citation + +```latex +@inproceedings{shu2021channel, + title={Channel-Wise Knowledge 
Distillation for Dense Prediction}, + author={Shu, Changyong and Liu, Yifan and Gao, Jianfei and Yan, Zheng and Shen, Chunhua}, + booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision}, + pages={5311--5320}, + year={2021} +} +``` diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/cwd/cwd_cls_head_gfl_r101_fpn_gfl_r50_fpn_1x_coco.py b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/cwd/cwd_cls_head_gfl_r101_fpn_gfl_r50_fpn_1x_coco.py new file mode 100644 index 0000000000000000000000000000000000000000..92612046f839f6cb20bbd0af35e6b68ac3a55b1b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/cwd/cwd_cls_head_gfl_r101_fpn_gfl_r50_fpn_1x_coco.py @@ -0,0 +1,9 @@ +_base_ = ['./cwd_fpn_retina_r101_retina_r50_1x_coco.py'] + +teacher_ckpt = 'https://download.openmmlab.com/mmdetection/v2.0/gfl/gfl_r101_fpn_mstrain_2x_coco/gfl_r101_fpn_mstrain_2x_coco_20200629_200126-dd12f847.pth' # noqa: E501 +model = dict( + architecture=dict( + cfg_path='mmdet::gfl/gfl_r50_fpn_1x_coco.py', pretrained=False), + teacher=dict( + cfg_path='mmdet::gfl/gfl_r101_fpn_ms-2x_coco.py', pretrained=True), + teacher_ckpt=teacher_ckpt) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/cwd/cwd_fpn_frcnn_r101_frcnn_r50_1x_coco.py b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/cwd/cwd_fpn_frcnn_r101_frcnn_r50_1x_coco.py new file mode 100644 index 0000000000000000000000000000000000000000..56756d426c9fc07f9fc3041136415f032dc65a9d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/cwd/cwd_fpn_frcnn_r101_frcnn_r50_1x_coco.py @@ -0,0 +1,54 @@ +_base_ = [ + 'mmdet::_base_/datasets/coco_detection.py', + 'mmdet::_base_/schedules/schedule_1x.py', + 'mmdet::_base_/default_runtime.py' +] + +# default_scope = 'mmrazor' +teacher_ckpt = 'faster_rcnn_r101_fpn_2x_coco_bbox_mAP-0.398_20200504_210455-1d2dac9c.pth' # noqa: E501 +model = dict( + _scope_='mmrazor', + type='FpnTeacherDistill', + 
architecture=dict( + cfg_path='mmdet::faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py', + pretrained=False), + teacher=dict( + cfg_path='mmdet::faster_rcnn/faster_rcnn_r101_fpn_2x_coco.py', + pretrained=False), + teacher_ckpt=teacher_ckpt, + distiller=dict( + type='ConfigurableDistiller', + student_recorders=dict(fpn=dict(type='ModuleOutputs', source='neck')), + teacher_recorders=dict(fpn=dict(type='ModuleOutputs', source='neck')), + distill_losses=dict( + loss_cwd_fpn0=dict( + type='ChannelWiseDivergence', tau=1, loss_weight=10), + loss_cwd_fpn1=dict( + type='ChannelWiseDivergence', tau=1, loss_weight=10), + loss_cwd_fpn2=dict( + type='ChannelWiseDivergence', tau=1, loss_weight=10), + loss_cwd_fpn3=dict( + type='ChannelWiseDivergence', tau=1, loss_weight=10), + loss_cwd_fpn4=dict( + type='ChannelWiseDivergence', tau=1, loss_weight=10)), + loss_forward_mappings=dict( + loss_cwd_fpn0=dict( + preds_S=dict(from_student=True, recorder='fpn', data_idx=0), + preds_T=dict(from_student=False, recorder='fpn', data_idx=0)), + loss_cwd_fpn1=dict( + preds_S=dict(from_student=True, recorder='fpn', data_idx=1), + preds_T=dict(from_student=False, recorder='fpn', data_idx=1)), + loss_cwd_fpn2=dict( + preds_S=dict(from_student=True, recorder='fpn', data_idx=2), + preds_T=dict(from_student=False, recorder='fpn', data_idx=2)), + loss_cwd_fpn3=dict( + preds_S=dict(from_student=True, recorder='fpn', data_idx=3), + preds_T=dict(from_student=False, recorder='fpn', data_idx=3)), + loss_cwd_fpn4=dict( + preds_S=dict(from_student=True, recorder='fpn', data_idx=4), + preds_T=dict(from_student=False, recorder='fpn', + data_idx=4))))) + +find_unused_parameters = True + +val_cfg = dict(_delete_=True, type='mmrazor.SingleTeacherDistillValLoop') diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/cwd/cwd_fpn_retina_r101_retina_r50_1x_coco.py b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/cwd/cwd_fpn_retina_r101_retina_r50_1x_coco.py new file mode 100644 index 
0000000000000000000000000000000000000000..682a5cfc873643508d58c0fe196907da9dea64b0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/cwd/cwd_fpn_retina_r101_retina_r50_1x_coco.py @@ -0,0 +1,13 @@ +_base_ = ['./cwd_fpn_frcnn_r101_frcnn_r50_1x_coco.py'] + +model = dict( + architecture=dict( + cfg_path='mmdet::retinanet/retinanet_r50_fpn_1x_coco.py', + pretrained=False), + teacher=dict( + cfg_path='mmdet::retinanet/retinanet_r101_fpn_2x_coco.py', + pretrained=True)) + +# optimizer +optim_wrapper = dict( + optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/cwd/cwd_fpn_retina_r101_retina_r50_1x_coco_visualization.py b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/cwd/cwd_fpn_retina_r101_retina_r50_1x_coco_visualization.py new file mode 100644 index 0000000000000000000000000000000000000000..13947952a6e0603f213f25cf804631dbd92c38ca --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/cwd/cwd_fpn_retina_r101_retina_r50_1x_coco_visualization.py @@ -0,0 +1,21 @@ +_base_ = ['./cwd_fpn_retina_r101_retina_r50_1x_coco.py'] + +default_hooks = dict( + checkpoint=dict(type='CheckpointHook', interval=-1), + visualization=dict( + _scope_='mmrazor', + type='RazorVisualizationHook', + enabled=True, + recorders=dict( + # todo: Maybe it is hard for users to understand why to add a + # prefix `architecture.` + neck=dict( + _scope_='mmrazor', + type='ModuleOutputs', + source='architecture.neck')), + mappings=dict( + p3=dict(recorder='neck', data_idx=0), + p4=dict(recorder='neck', data_idx=1), + p5=dict(recorder='neck', data_idx=2), + p6=dict(recorder='neck', data_idx=3)), + out_dir='retina_vis')) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/cwd/metafile.yml b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/cwd/metafile.yml new file mode 100644 index 
0000000000000000000000000000000000000000..f2375c36666dbd7c697075755ade101de4b5b0d0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/cwd/metafile.yml @@ -0,0 +1,23 @@ + +Models: + - Name: cwd_cls_head_gfl_r101_fpn_gfl_r50_fpn_1x_coco + In Collection: CWD + Metadata: + Location: cls head + Student: + Metrics: + box AP: 40.2 + Config: mmdet::gfl/gfl_r50_fpn_1x_coco.py + Weights: https://download.openmmlab.com/mmdetection/v2.0/gfl/gfl_r50_fpn_1x_coco/gfl_r50_fpn_1x_coco_20200629_121244-25944287.pth + Teacher: + Metrics: + box AP: 44.7 + Config: mmdet::gfl/gfl_r101_fpn_ms-2x_coco.py + Weights: https://download.openmmlab.com/mmdetection/v2.0/gfl/gfl_r101_fpn_mstrain_2x_coco/gfl_r101_fpn_mstrain_2x_coco_20200629_200126-dd12f847.pth + Results: + - Task: Object Detection + Dataset: COCO + Metrics: + box AP: 41.9 + Config: configs/distill/mmdet/cwd/cwd_cls_head_gfl_r101_fpn_gfl_r50_fpn_1x_coco.py + Weights: https://download.openmmlab.com/mmrazor/v1/cwd/cwd_cls_head_gfl_r101_fpn_gfl_r50_fpn_1x_coco/cwd_cls_head_gfl_r101_fpn_gfl_r50_fpn_1x_coco_20211222-c134bb21.pth diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/fbkd/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/fbkd/README.md new file mode 100644 index 0000000000000000000000000000000000000000..cded5983db34433e7fd79431894c5f58f13bd4c6 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/fbkd/README.md @@ -0,0 +1,37 @@ +# IMPROVE OBJECT DETECTION WITH FEATURE-BASED KNOWLEDGE DISTILLATION: TOWARDS ACCURATE AND EFFICIENT DETECTORS (FBKD) + +> [IMPROVE OBJECT DETECTION WITH FEATURE-BASED KNOWLEDGE DISTILLATION: TOWARDS ACCURATE AND EFFICIENT DETECTORS](https://openreview.net/pdf?id=uKhGRvM8QNH) + + + +## Abstract + +Knowledge distillation, in which a student model is trained to mimic a teacher model, has been proved as an effective technique for model compression and model accuracy boosting. 
However, most knowledge distillation methods, designed for image classification, have failed on more challenging tasks, such as object detection. In this paper, we suggest that the failure of knowledge distillation on object detection is mainly caused by two reasons: (1) the imbalance between pixels of foreground and background and (2) lack of distillation on the relation between different pixels. Observing the above reasons, we propose attention-guided distillation and non-local distillation to address the two problems, respectively. Attention-guided distillation is proposed to find the crucial pixels of foreground objects with attention mechanism and then make the students take more effort to learn their features. Non-local distillation is proposed to enable students to learn not only the feature of an individual pixel but also the relation between different pixels captured by non-local modules. Experiments show that our methods achieve excellent AP improvements on both one-stage and two-stage, both anchor-based and anchor-free detectors. For example, Faster RCNN (ResNet101 backbone) with our distillation achieves 43.9 AP on COCO2017, which is 4.1 higher than the baseline. 
+ +pipeline + +## Results and models + +### Detection + +| Location | Dataset | Teacher | Student | box AP | box AP(T) | box AP(S) | Config | Download | +| :------: | :-----: | :--------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------: | :----: | :-------: | :-------: | :------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| neck | COCO | [faster-rcnn_resnet101](https://github.com/open-mmlab/mmdetection/blob/master/configs/faster_rcnn/faster_rcnn_r101_fpn_1x_coco.py) | [faster-rcnn_resnet50](https://github.com/open-mmlab/mmdetection/blob/master/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py) | 39.3 | 39.4 | 37.4 | [config](./fbkd_fpn_frcnn_resnet101_frcnn_resnet50_1x_coco.py) | [teacher](https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r101_fpn_1x_coco/faster_rcnn_r101_fpn_1x_coco_20200130-f513f705.pth) \|[model](https://download.openmmlab.com/mmrazor/v1/FBKD/fbkd_fpn_frcnn_resnet101_frcnn_resnet50_1x_coco_20220830_121522-8d7e11df.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/FBKD/fbkd_fpn_frcnn_resnet101_frcnn_resnet50_1x_coco_20220830_121522-8d7e11df.json) | + +## Citation + +```latex +@inproceedings{DBLP:conf/iclr/ZhangM21, + author = {Linfeng Zhang and Kaisheng Ma}, + title = {Improve Object Detection with Feature-based Knowledge Distillation: + Towards Accurate and Efficient Detectors}, + booktitle 
= {9th International Conference on Learning Representations, {ICLR} 2021, + Virtual Event, Austria, May 3-7, 2021}, + publisher = {OpenReview.net}, + year = {2021}, + url = {https://openreview.net/forum?id=uKhGRvM8QNH}, + timestamp = {Wed, 23 Jun 2021 17:36:39 +0200}, + biburl = {https://dblp.org/rec/conf/iclr/ZhangM21.bib}, + bibsource = {dblp computer science bibliography, https://dblp.org} +} +``` diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/fbkd/fbkd_fpn_faster-rcnn_r101_faster-rcnn_r50_1x_coco.py b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/fbkd/fbkd_fpn_faster-rcnn_r101_faster-rcnn_r50_1x_coco.py new file mode 100644 index 0000000000000000000000000000000000000000..3203e3fea9aad66e703c571c757b8443d9d7d6ff --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/fbkd/fbkd_fpn_faster-rcnn_r101_faster-rcnn_r50_1x_coco.py @@ -0,0 +1,126 @@ +_base_ = [ + 'mmdet::_base_/datasets/coco_detection.py', + 'mmdet::_base_/schedules/schedule_1x.py', + 'mmdet::_base_/default_runtime.py' +] + +teacher_ckpt = 'https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r101_fpn_1x_coco/faster_rcnn_r101_fpn_1x_coco_20200130-f513f705.pth' # noqa: E501 +model = dict( + _scope_='mmrazor', + type='SingleTeacherDistill', + architecture=dict( + cfg_path='mmdet::faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py', + pretrained=True), + teacher=dict( + cfg_path='mmdet::faster_rcnn/faster-rcnn_r101_fpn_1x_coco.py', + pretrained=False), + teacher_ckpt=teacher_ckpt, + distiller=dict( + type='ConfigurableDistiller', + student_recorders=dict( + neck_s0=dict(type='ModuleOutputs', source='neck.fpn_convs.0.conv'), + neck_s1=dict(type='ModuleOutputs', source='neck.fpn_convs.1.conv'), + neck_s2=dict(type='ModuleOutputs', source='neck.fpn_convs.2.conv'), + neck_s3=dict(type='ModuleOutputs', + source='neck.fpn_convs.3.conv')), + teacher_recorders=dict( + neck_s0=dict(type='ModuleOutputs', source='neck.fpn_convs.0.conv'), + 
neck_s1=dict(type='ModuleOutputs', source='neck.fpn_convs.1.conv'), + neck_s2=dict(type='ModuleOutputs', source='neck.fpn_convs.2.conv'), + neck_s3=dict(type='ModuleOutputs', + source='neck.fpn_convs.3.conv')), + distill_losses=dict( + loss_s0=dict(type='FBKDLoss'), + loss_s1=dict(type='FBKDLoss'), + loss_s2=dict(type='FBKDLoss'), + loss_s3=dict(type='FBKDLoss')), + connectors=dict( + loss_s0_sfeat=dict( + type='FBKDStudentConnector', + in_channels=256, + reduction=4, + mode='dot_product', + sub_sample=True, + maxpool_stride=8), + loss_s0_tfeat=dict( + type='FBKDTeacherConnector', + in_channels=256, + reduction=4, + mode='dot_product', + sub_sample=True, + maxpool_stride=8), + loss_s1_sfeat=dict( + type='FBKDStudentConnector', + in_channels=256, + reduction=4, + mode='dot_product', + sub_sample=True, + maxpool_stride=4), + loss_s1_tfeat=dict( + type='FBKDTeacherConnector', + in_channels=256, + reduction=4, + mode='dot_product', + sub_sample=True, + maxpool_stride=4), + loss_s2_sfeat=dict( + type='FBKDStudentConnector', + in_channels=256, + mode='dot_product', + sub_sample=True), + loss_s2_tfeat=dict( + type='FBKDTeacherConnector', + in_channels=256, + mode='dot_product', + sub_sample=True), + loss_s3_sfeat=dict( + type='FBKDStudentConnector', + in_channels=256, + mode='dot_product', + sub_sample=True), + loss_s3_tfeat=dict( + type='FBKDTeacherConnector', + in_channels=256, + mode='dot_product', + sub_sample=True)), + loss_forward_mappings=dict( + loss_s0=dict( + s_input=dict( + from_student=True, + recorder='neck_s0', + connector='loss_s0_sfeat'), + t_input=dict( + from_student=False, + recorder='neck_s0', + connector='loss_s0_tfeat')), + loss_s1=dict( + s_input=dict( + from_student=True, + recorder='neck_s1', + connector='loss_s1_sfeat'), + t_input=dict( + from_student=False, + recorder='neck_s1', + connector='loss_s1_tfeat')), + loss_s2=dict( + s_input=dict( + from_student=True, + recorder='neck_s2', + connector='loss_s2_sfeat'), + t_input=dict( + 
from_student=False, + recorder='neck_s2', + connector='loss_s2_tfeat')), + loss_s3=dict( + s_input=dict( + from_student=True, + recorder='neck_s3', + connector='loss_s3_sfeat'), + t_input=dict( + from_student=False, + recorder='neck_s3', + connector='loss_s3_tfeat'))))) + +find_unused_parameters = True + +val_cfg = dict(_delete_=True, type='mmrazor.SingleTeacherDistillValLoop') diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/fbkd/metafile.yml b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/fbkd/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..aa8c1eaeafac5d95bb4ce6a7e8baace1b72e8aad --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/fbkd/metafile.yml @@ -0,0 +1,41 @@ +Collections: + - Name: FBKD + Metadata: + Training Data: + - COCO + Paper: + URL: https://openreview.net/pdf?id=uKhGRvM8QNH + Title: IMPROVE OBJECT DETECTION WITH FEATURE-BASED KNOWLEDGE DISTILLATION- TOWARDS ACCURATE AND EFFICIENT DETECTORS + README: configs/distill/mmdet/fbkd/README.md + Converted From: + Code: + URL: https://github.com/ArchipLab-LinfengZhang/Object-Detection-Knowledge-Distillation-ICLR2021 +Models: + - Name: fbkd_fpn_faster-rcnn_r101_faster-rcnn_r50_1x_coco + In Collection: FBKD + Metadata: + inference time (ms/im): + - value: 0.32 + hardware: NVIDIA A100-SXM4-80GB + backend: PyTorch + batch size: 2 + mode: FP32 + resolution: (1333, 800) + Location: fpn + Student: + Metrics: + box AP: 37.4 + Config: mmdet::faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py + Weights: https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth + Teacher: + Metrics: + box AP: 39.4 + Config: mmdet::faster_rcnn/faster-rcnn_r101_fpn_1x_coco.py + Weights: https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r101_fpn_1x_coco/faster_rcnn_r101_fpn_1x_coco_20200130-f513f705.pth + Results: + - Task: Object Detection + Dataset: COCO + 
Metrics: + box AP: 39.3 + Config: configs/distill/mmdet/fbkd/fbkd_fpn_faster-rcnn_r101_faster-rcnn_r50_1x_coco.py + Weights: https://download.openmmlab.com/mmrazor/v1/FBKD/fbkd_fpn_frcnn_resnet101_frcnn_resnet50_1x_coco_20220830_121522-8d7e11df.pth diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/mgd/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/mgd/README.md new file mode 100644 index 0000000000000000000000000000000000000000..19c52ac4b7e3c3924417ad373b90b758fc7f7144 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/mgd/README.md @@ -0,0 +1,30 @@ +# MGD + +> [Masked Generative Distillation](https://arxiv.org/abs/2205.01529) + + + +## Abstract + +Knowledge distillation has been applied to various tasks successfully. The current distillation algorithm usually improves students' performance by imitating the output of the teacher. This paper shows that teachers can also improve students' representation power by guiding students' feature recovery. From this point of view, we propose Masked Generative Distillation (MGD), which is simple: we mask random pixels of the student's feature and force it to generate the teacher's full feature through a simple block. MGD is a truly general feature-based distillation method, which can be utilized on various tasks, including image classification, object detection, semantic segmentation and instance segmentation. We experiment on different models with extensive datasets and the results show that all the students achieve excellent improvements. Notably, we boost ResNet-18 from 69.90% to 71.69% ImageNet top-1 accuracy, RetinaNet with ResNet-50 backbone from 37.4 to 41.0 Boundingbox mAP, SOLO based on ResNet-50 from 33.1 to 36.2 Mask mAP and DeepLabV3 based on ResNet-18 from 73.20 to 76.02 mIoU. 
+ +![pipeline](https://github.com/yzd-v/MGD/raw/master/architecture.png) + +## Results and models + +### Detection + +| Location | Dataset | Teacher | Student | Lr schd | mAP | mAP(T) | mAP(S) | Config | Download | +| :------: | :-----: | :----------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------: | :-----: | :--: | :----: | :----: | :-------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| FPN | COCO | [RetinaNet-X101](https://github.com/open-mmlab/mmdetection/blob/dev-3.x/configs/retinanet/retinanet_x101-64x4d_fpn_1x_coco.py) | [RetinaNet-R50](https://github.com/open-mmlab/mmdetection/blob/dev-3.x/configs/retinanet/retinanet_r50_fpn_2x_coco.py) | 2x | 41.0 | 41.0 | 37.4 | [config](mgd_fpn_retina_x101_retina_r50_2x_coco.py) | [teacher](https://download.openmmlab.com/mmdetection/v2.0/retinanet/retinanet_x101_64x4d_fpn_1x_coco/retinanet_x101_64x4d_fpn_1x_coco_20200130-366f5af1.pth) \|[model](https://download.openmmlab.com/mmrazor/v1/mgd/mgd_fpn_retina_x101_retina_r50_2x_coco_20221209_191847-87141529.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/mgd/mgd_fpn_retina_x101_retina_r50_2x_coco_20221209_191847-87141529.log) | + +## Citation + +```latex +@article{yang2022masked, + title={Masked Generative Distillation}, + author={Yang, Zhendong and Li, Zhe and Shao, Mingqi and Shi, Dachuan and Yuan, Zehuan and Yuan, Chun}, + journal={arXiv preprint arXiv:2205.01529}, + year={2022} 
+} +``` diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/mgd/mgd_fpn_retina_x101_retina_r50_2x_coco.py b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/mgd/mgd_fpn_retina_x101_retina_r50_2x_coco.py new file mode 100644 index 0000000000000000000000000000000000000000..9a416cd8b860af64a2430360c79d57a6ba3569f2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/mgd/mgd_fpn_retina_x101_retina_r50_2x_coco.py @@ -0,0 +1,118 @@ +_base_ = ['mmdet::retinanet/retinanet_r50_fpn_2x_coco.py'] + +teacher_ckpt = 'https://download.openmmlab.com/mmdetection/v2.0/retinanet/retinanet_x101_64x4d_fpn_1x_coco/retinanet_x101_64x4d_fpn_1x_coco_20200130-366f5af1.pth' # noqa: E501 + +student = _base_.model +student.neck.init_cfg = dict( + type='Pretrained', prefix='neck.', checkpoint=teacher_ckpt) +student.bbox_head.init_cfg = dict( + type='Pretrained', prefix='bbox_head.', checkpoint=teacher_ckpt) + +model = dict( + _scope_='mmrazor', + _delete_=True, + type='FpnTeacherDistill', + architecture=student, + teacher=dict( + cfg_path='mmdet::retinanet/retinanet_x101-64x4d_fpn_1x_coco.py', + pretrained=False), + teacher_ckpt=teacher_ckpt, + distiller=dict( + type='ConfigurableDistiller', + student_recorders=dict( + fpn0=dict(type='ModuleOutputs', source='neck.fpn_convs.0.conv'), + fpn1=dict(type='ModuleOutputs', source='neck.fpn_convs.1.conv'), + fpn2=dict(type='ModuleOutputs', source='neck.fpn_convs.2.conv'), + fpn3=dict(type='ModuleOutputs', source='neck.fpn_convs.3.conv'), + fpn4=dict(type='ModuleOutputs', source='neck.fpn_convs.4.conv')), + teacher_recorders=dict( + fpn0=dict(type='ModuleOutputs', source='neck.fpn_convs.0.conv'), + fpn1=dict(type='ModuleOutputs', source='neck.fpn_convs.1.conv'), + fpn2=dict(type='ModuleOutputs', source='neck.fpn_convs.2.conv'), + fpn3=dict(type='ModuleOutputs', source='neck.fpn_convs.3.conv'), + fpn4=dict(type='ModuleOutputs', source='neck.fpn_convs.4.conv')), + connectors=dict( + s_fpn0_connector=dict( + 
type='MGDConnector', + student_channels=256, + teacher_channels=256, + lambda_mgd=0.65), + s_fpn1_connector=dict( + type='MGDConnector', + student_channels=256, + teacher_channels=256, + lambda_mgd=0.65), + s_fpn2_connector=dict( + type='MGDConnector', + student_channels=256, + teacher_channels=256, + lambda_mgd=0.65), + s_fpn3_connector=dict( + type='MGDConnector', + student_channels=256, + teacher_channels=256, + lambda_mgd=0.65), + s_fpn4_connector=dict( + type='MGDConnector', + student_channels=256, + teacher_channels=256, + lambda_mgd=0.65)), + distill_losses=dict( + loss_mgd_fpn0=dict(type='MGDLoss', alpha_mgd=0.00002), + loss_mgd_fpn1=dict(type='MGDLoss', alpha_mgd=0.00002), + loss_mgd_fpn2=dict(type='MGDLoss', alpha_mgd=0.00002), + loss_mgd_fpn3=dict(type='MGDLoss', alpha_mgd=0.00002), + loss_mgd_fpn4=dict(type='MGDLoss', alpha_mgd=0.00002)), + loss_forward_mappings=dict( + loss_mgd_fpn0=dict( + preds_S=dict( + from_student=True, + recorder='fpn0', + connector='s_fpn0_connector'), + preds_T=dict(from_student=False, recorder='fpn0')), + loss_mgd_fpn1=dict( + preds_S=dict( + from_student=True, + recorder='fpn1', + connector='s_fpn1_connector'), + preds_T=dict(from_student=False, recorder='fpn1')), + loss_mgd_fpn2=dict( + preds_S=dict( + from_student=True, + recorder='fpn2', + connector='s_fpn2_connector'), + preds_T=dict(from_student=False, recorder='fpn2')), + loss_mgd_fpn3=dict( + preds_S=dict( + from_student=True, + recorder='fpn3', + connector='s_fpn3_connector'), + preds_T=dict(from_student=False, recorder='fpn3')), + loss_mgd_fpn4=dict( + preds_S=dict( + from_student=True, + recorder='fpn4', + connector='s_fpn4_connector'), + preds_T=dict(from_student=False, recorder='fpn4'))))) + +find_unused_parameters = True + +val_cfg = dict(_delete_=True, type='mmrazor.SingleTeacherDistillValLoop') + +optimizer_config = dict( + _delete_=True, grad_clip=dict(max_norm=35, norm_type=2)) + +param_scheduler = [ + dict( + type='LinearLR', start_factor=0.001, 
by_epoch=False, begin=0, end=500), + dict( + type='MultiStepLR', + begin=0, + end=24, + by_epoch=True, + milestones=[16, 22], + gamma=0.1) +] + +optim_wrapper = dict( + optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/pkd/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/pkd/README.md new file mode 100644 index 0000000000000000000000000000000000000000..a25dc5ae8d8beac634f4d5b43eddfaf1467b2925 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/pkd/README.md @@ -0,0 +1,34 @@ +# PKD + +> [PKD: General Distillation Framework for Object Detectors via Pearson Correlation Coefficient](https://arxiv.org/abs/2207.02039) + + + +## Abstract + +Knowledge distillation(KD) is a widely-used technique to train compact models in object detection. However, there is still a lack of study on how to distill between heterogeneous detectors. In this paper, we empirically find that better FPN features from a heterogeneous teacher detector can help the student although their detection heads and label assignments are different. However, directly aligning the feature maps to distill detectors suffers from two problems. First, the difference in feature magnitude between the teacher and the student could enforce overly strict constraints on the student. Second, the FPN stages and channels with large feature magnitude from the teacher model could dominate the gradient of distillation loss, which will overwhelm the effects of other features in KD and introduce much noise. To address the above issues, we propose to imitate features with Pearson Correlation Coefficient to focus on the relational information from the teacher and relax constraints on the magnitude of the features. Our method consistently outperforms the existing detection KD methods and works for both homogeneous and heterogeneous student-teacher pairs. Furthermore, it converges faster. 
With a powerful MaskRCNN-Swin detector as the teacher, ResNet-50 based RetinaNet and FCOS achieve 41.5% and 43.9% mAP on COCO2017, which are 4.1% and 4.8% higher than the baseline, respectively. + +![pipeline](https://user-images.githubusercontent.com/41630003/197719796-76fa5f33-1d54-4927-8a08-86f5c6e33879.png) + +## Results and models + +### Detection + +| Location | Dataset | Teacher | Student | Lr schd | mAP | mAP(T) | mAP(S) | Config | Download | +| :------: | :-----: | :--------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------: | :-----: | :--: | :----: | :----: | :-----------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| FPN | COCO | [FCOS-X101](https://github.com/open-mmlab/mmdetection/blob/dev-3.x/configs/fcos/fcos_x101-64x4d_fpn_gn-head_ms-640-800-2x_coco.py) | [RetinaNet-R50](https://github.com/open-mmlab/mmdetection/blob/dev-3.x/configs/retinanet/retinanet_r50_fpn_1x_coco.py) | 1x | 40.3 | 42.6 | 36.5 | [config](pkd_fpn_fcos_x101_retina_r50_1x_coco.py) | 
[teacher](https://download.openmmlab.com/mmdetection/v2.0/fcos/fcos_x101_64x4d_fpn_gn-head_mstrain_640-800_2x_coco/fcos_x101_64x4d_fpn_gn-head_mstrain_640-800_2x_coco-ede514a8.pth) \|[model](https://download.openmmlab.com/mmrazor/v1/pkd/pkd_fcos_retina/pkd_fpn_fcos_x101_retina_r50_1x_coco_20220925_181547-9cac5059.pth?versionId=CAEQThiBgMCLyNC0oBgiIDBjY2FkY2JlNGFiYzRmM2RiZGUyYzM1NjQxYzQxODA4) \| [log](https://download.openmmlab.com/mmrazor/v1/pkd/pkd_fcos_retina/pkd_fpn_fcos_x101_retina_r50_1x_coco_20220925_181547-9cac5059.json?versionId=CAEQThiBgMDA0dS0oBgiIDM4ZjZlZmVkMzc4MjQxMGJiN2FlMDFlOTA2NGIzZGQ4) | +| FPN | COCO | [Faster-Rcnn-R101](https://github.com/open-mmlab/mmdetection/blob/dev-3.x/configs/faster_rcnn/faster-rcnn_r101_fpn_2x_coco.py) | [Faster-rcnn-R50](https://github.com/open-mmlab/mmdetection/blob/dev-3.x/configs/faster_rcnn/faster-rcnn_r50_fpn_2x_coco.py) | 2x | 40.3 | 39.8 | 38.4 | [config](pkd_fpn_faster-rcnn_r101_faster-rcnn_r50_2x_coco.py) | [teacher](https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r101_fpn_2x_coco/faster_rcnn_r101_fpn_2x_coco_bbox_mAP-0.398_20200504_210455-1d2dac9c.pth) \|[model](https://download.openmmlab.com/mmrazor/v1/pkd/pkd_frcnn/pkd_fpn_faster-rcnn_r101_faster-rcnn_r50_2x_coco_20221014_103040-3efbd439.pth?versionId=CAEQThiBgMDQr9C0oBgiIDMyZWE1Y2ZlMDA2ZDQ2ZGNhZmQ3NzMxODk3YzgzYWFl) \| [log](https://download.openmmlab.com/mmrazor/v1/pkd/pkd_frcnn/pkd_fpn_faster-rcnn_r101_faster-rcnn_r50_2x_coco_20221014_103040-3efbd439.json?versionId=CAEQThiBgICYsNC0oBgiIDdhNWY5ZjZlYjUyNzRjMGU4NGFhYzk4NzQwZDAxY2Rj) | +| FPN | COCO | [Mask-Rcnn-Swin](https://github.com/open-mmlab/mmdetection/blob/dev-3.x/configs/swin/mask-rcnn_swin-s-p4-w7_fpn_amp-ms-crop-3x_coco.py) | [RetinaNet-R50](https://github.com/open-mmlab/mmdetection/blob/dev-3.x/configs/retinanet/retinanet_r50_fpn_2x_coco.py) | 2x | 41.5 | 48.2 | 37.4 | [config](pkd_fpn_mask-rcnn_swin_retina_r50_2x_coco.py) | 
[teacher](https://download.openmmlab.com/mmdetection/v2.0/swin/mask_rcnn_swin-s-p4-w7_fpn_fp16_ms-crop-3x_coco/mask_rcnn_swin-s-p4-w7_fpn_fp16_ms-crop-3x_coco_20210903_104808-b92c91f1.pth) \|[model](https://download.openmmlab.com/mmrazor/v1/pkd/pkd_swin_retina/pkd_fpn_mask_rcnn_swin_retina_r50_2x_coco_20220925_142555-edec7433.pth?versionId=CAEQThiBgIDWqNC0oBgiIDViOGE0ZDU4ODgxNzQ5YmE5OGU3MzRkMjFiZGRjZmRm) \| [log](https://download.openmmlab.com/mmrazor/v1/pkd/pkd_swin_retina/pkd_fpn_mask_rcnn_swin_retina_r50_2x_coco_20220925_142555-edec7433.json?versionId=CAEQThiBgIDVqdC0oBgiIDU3YzFjOWRmNWY3NTRmYjFhMDdmNzU2ODE3MzdlZThk) | +| FPN | COCO | [Reppoints-X101-dcn](https://github.com/open-mmlab/mmdetection/blob/dev-3.x/configs/reppoints/reppoints-moment_x101-dconv-c3-c5_fpn-gn_head-gn_2x_coco.py) | [Reppoints-R50](https://github.com/open-mmlab/mmdetection/blob/dev-3.x/configs/reppoints/reppoints-moment_r50_fpn-gn_head-gn_2x_coco.py) | 2x | 42.3 | 44.2 | 38.6 | [config](pkd_fpn_reppoints_x101-dcn_reppoints_r50_2x_coco.py) | [teacher](https://download.openmmlab.com/mmdetection/v2.0/reppoints/reppoints_moment_x101_fpn_dconv_c3-c5_gn-neck%2Bhead_2x_coco/reppoints_moment_x101_fpn_dconv_c3-c5_gn-neck%2Bhead_2x_coco_20200329-f87da1ea.pth) \|[model](https://download.openmmlab.com/mmrazor/v1/pkd/pkd_reppoints/pkd_fpn_reppoints_x101_dcn_reppoints_r50_2x_coco_20220926_145818-f8932e12.pth?versionId=CAEQThiBgIC8rNC0oBgiIGU2N2IxM2NkMjNlMjQyN2E4YmVlNmViNGI2MDY3OTE5) \| [log](https://download.openmmlab.com/mmrazor/v1/pkd/pkd_reppoints/pkd_fpn_reppoints_x101_dcn_reppoints_r50_2x_coco_20220926_145818-f8932e12.json?versionId=CAEQThiBgICordC0oBgiIDJhMjBjOGZiN2UxNjQxYmI5MzE3NWVhZDgxZDE2NmJm) | +| FPN | COCO | [RetinaNet-X101](https://github.com/open-mmlab/mmdetection/blob/dev-3.x/configs/retinanet/retinanet_x101-64x4d_fpn_1x_coco.py) | [RetinaNet-R50](https://github.com/open-mmlab/mmdetection/blob/dev-3.x/configs/retinanet/retinanet_r50_fpn_2x_coco.py) | 2x | 40.8 | 41.0 | 37.4 | 
[config](pkd_fpn_retina_x101_retina_r50_2x_coco.py) | [teacher](https://download.openmmlab.com/mmdetection/v2.0/retinanet/retinanet_x101_64x4d_fpn_1x_coco/retinanet_x101_64x4d_fpn_1x_coco_20200130-366f5af1.pth) \|[model](https://download.openmmlab.com/mmrazor/v1/pkd/pkd_retinax_retina/pkd_fpn_retina_x101_retina_r50_2x_coco_20221014_232526-4c0f8d96.pth?versionId=CAEQThiBgIDQqdC0oBgiIGFmZjNmZmE4NDFiMDQ4MzhiMzdjOGI2NzI4MTQxMjFi) \| [log](https://download.openmmlab.com/mmrazor/v1/pkd/pkd_retinax_retina/pkd_fpn_retina_x101_retina_r50_2x_coco_20221014_232526-4c0f8d96.json?versionId=CAEQThiBgMC2qdC0oBgiIGRkMTIzODYwMzliMDQ3M2JiYjNlYjA5N2I4Y2QzMGFl) | + +## Citation + +```latex +@article{cao2022pkd, + title={PKD: General Distillation Framework for Object Detectors via Pearson Correlation Coefficient}, + author={Cao, Weihan and Zhang, Yifan and Gao, Jianfei and Cheng, Anda and Cheng, Ke and Cheng, Jian}, + journal={arXiv preprint arXiv:2207.02039}, + year={2022} +} +``` diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/pkd/metafile.yml b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/pkd/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..6ea2347b54e2d027f8815c1fec3fc6091fb84609 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/pkd/metafile.yml @@ -0,0 +1,110 @@ +Models: + - Name: pkd_fpn_fcos_x101_retina_r50_1x_coco + In Collection: PKD + Metadata: + Location: FPN + Student: + Metrics: + box AP: 36.5 + Config: mmdet::retinanet/retinanet_r50_fpn_1x_coco.py + Weights: https://download.openmmlab.com/mmdetection/v2.0/retinanet/retinanet_r50_fpn_1x_coco/retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth + Teacher: + Metrics: + box AP: 42.6 + Config: mmdet::fcos/fcos_x101-64x4d_fpn_gn-head_ms-640-800-2x_coco.py + Weights: https://download.openmmlab.com/mmdetection/v2.0/fcos/fcos_x101_64x4d_fpn_gn-head_mstrain_640-800_2x_coco/fcos_x101_64x4d_fpn_gn-head_mstrain_640-800_2x_coco-ede514a8.pth + Results: + - 
Task: Object Detection + Dataset: COCO + Metrics: + box AP: 40.3 + Config: configs/distill/mmdet/pkd/pkd_fpn_fcos_x101_retina_r50_1x_coco.py + Weights: https://download.openmmlab.com/mmrazor/v1/pkd/pkd_fcos_retina/pkd_fpn_fcos_x101_retina_r50_1x_coco_20220925_181547-9cac5059.pth?versionId=CAEQThiBgMCLyNC0oBgiIDBjY2FkY2JlNGFiYzRmM2RiZGUyYzM1NjQxYzQxODA4 + + - Name: pkd_fpn_faster-rcnn_r101_faster-rcnn_r50_2x_coco + In Collection: PKD + Metadata: + Location: FPN + Student: + Metrics: + box AP: 38.4 + Config: mmdet::faster_rcnn/faster-rcnn_r50_fpn_2x_coco.py + Weights: https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth + Teacher: + Metrics: + box AP: 39.8 + Config: mmdet::faster_rcnn/faster-rcnn_r101_fpn_2x_coco.py + Weights: https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r101_fpn_2x_coco/faster_rcnn_r101_fpn_2x_coco_bbox_mAP-0.398_20200504_210455-1d2dac9c.pth + Results: + - Task: Object Detection + Dataset: COCO + Metrics: + box AP: 40.4 + Config: configs/distill/mmdet/pkd/pkd_fpn_faster-rcnn_r101_faster-rcnn_r50_2x_coco.py + Weights: https://download.openmmlab.com/mmrazor/v1/pkd/pkd_frcnn/pkd_fpn_faster-rcnn_r101_faster-rcnn_r50_2x_coco_20221014_103040-3efbd439.pth?versionId=CAEQThiBgMDQr9C0oBgiIDMyZWE1Y2ZlMDA2ZDQ2ZGNhZmQ3NzMxODk3YzgzYWFl + + - Name: pkd_fpn_mask-rcnn_swin_retina_r50_2x_coco + In Collection: PKD + Metadata: + Location: FPN + Student: + Metrics: + box AP: 37.4 + Config: mmdet::retinanet/retinanet_r50_fpn_2x_coco.py + Weights: https://download.openmmlab.com/mmdetection/v2.0/retinanet/retinanet_r50_fpn_2x_coco/retinanet_r50_fpn_2x_coco_20200131-fdb43119.pth + Teacher: + Metrics: + box AP: 48.2 + Config: mmdet::swin/mask-rcnn_swin-s-p4-w7_fpn_amp-ms-crop-3x_coco.py + Weights: 
https://download.openmmlab.com/mmdetection/v2.0/swin/mask_rcnn_swin-s-p4-w7_fpn_fp16_ms-crop-3x_coco/mask_rcnn_swin-s-p4-w7_fpn_fp16_ms-crop-3x_coco_20210903_104808-b92c91f1.pth + Results: + - Task: Object Detection + Dataset: COCO + Metrics: + box AP: 41.5 + Config: configs/distill/mmdet/pkd/pkd_fpn_mask-rcnn_swin_retina_r50_2x_coco.py + Weights: https://download.openmmlab.com/mmrazor/v1/pkd/pkd_swin_retina/pkd_fpn_mask_rcnn_swin_retina_r50_2x_coco_20220925_142555-edec7433.pth?versionId=CAEQThiBgIDWqNC0oBgiIDViOGE0ZDU4ODgxNzQ5YmE5OGU3MzRkMjFiZGRjZmRm + + - Name: pkd_fpn_reppoints_x101-dcn_reppoints_r50_2x_coco + In Collection: PKD + Metadata: + Location: FPN + Student: + Metrics: + box AP: 38.6 + Config: mmdet::reppoints/reppoints-moment_r50_fpn-gn_head-gn_2x_coco.py + Weights: https://download.openmmlab.com/mmdetection/v2.0/reppoints/reppoints_moment_r50_fpn_gn-neck%2Bhead_2x_coco/reppoints_moment_r50_fpn_gn-neck%2Bhead_2x_coco_20200329-91babaa2.pth + Teacher: + Metrics: + box AP: 44.2 + Config: mmdet::reppoints/reppoints-moment_x101-dconv-c3-c5_fpn-gn_head-gn_2x_coco.py + Weights: https://download.openmmlab.com/mmdetection/v2.0/reppoints/reppoints_moment_x101_fpn_dconv_c3-c5_gn-neck%2Bhead_2x_coco/reppoints_moment_x101_fpn_dconv_c3-c5_gn-neck%2Bhead_2x_coco_20200329-f87da1ea.pth + Results: + - Task: Object Detection + Dataset: COCO + Metrics: + box AP: 42.3 + Config: configs/distill/mmdet/pkd/pkd_fpn_reppoints_x101-dcn_reppoints_r50_2x_coco.py + Weights: https://download.openmmlab.com/mmrazor/v1/pkd/pkd_reppoints/pkd_fpn_reppoints_x101_dcn_reppoints_r50_2x_coco_20220926_145818-f8932e12.pth?versionId=CAEQThiBgIC8rNC0oBgiIGU2N2IxM2NkMjNlMjQyN2E4YmVlNmViNGI2MDY3OTE5 + + - Name: pkd_fpn_retina_x101_retina_r50_2x_coco + In Collection: PKD + Metadata: + Location: FPN + Student: + Metrics: + box AP: 37.4 + Config: mmdet::retinanet/retinanet_r50_fpn_2x_coco.py + Weights: 
https://download.openmmlab.com/mmdetection/v2.0/retinanet/retinanet_r50_fpn_2x_coco/retinanet_r50_fpn_2x_coco_20200131-fdb43119.pth + Teacher: + Metrics: + box AP: 41.0 + Config: mmdet::retinanet/retinanet_x101-64x4d_fpn_1x_coco.py + Weights: https://download.openmmlab.com/mmdetection/v2.0/retinanet/retinanet_x101_64x4d_fpn_1x_coco/retinanet_x101_64x4d_fpn_1x_coco_20200130-366f5af1.pth + Results: + - Task: Object Detection + Dataset: COCO + Metrics: + box AP: 40.8 + Config: configs/distill/mmdet/pkd/pkd_fpn_retina_x101_retina_r50_2x_coco.py + Weights: https://download.openmmlab.com/mmrazor/v1/pkd/pkd_retinax_retina/pkd_fpn_retina_x101_retina_r50_2x_coco_20221014_232526-4c0f8d96.pth?versionId=CAEQThiBgIDQqdC0oBgiIGFmZjNmZmE4NDFiMDQ4MzhiMzdjOGI2NzI4MTQxMjFi diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/pkd/pkd_fpn_faster-rcnn_r101_faster-rcnn_r50_2x_coco.py b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/pkd/pkd_fpn_faster-rcnn_r101_faster-rcnn_r50_2x_coco.py new file mode 100644 index 0000000000000000000000000000000000000000..c9496792dfb9d60b6d8ef3fb7cf1f10302381ce3 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/pkd/pkd_fpn_faster-rcnn_r101_faster-rcnn_r50_2x_coco.py @@ -0,0 +1,45 @@ +_base_ = [ + 'mmdet::_base_/datasets/coco_detection.py', + 'mmdet::_base_/schedules/schedule_2x.py', + 'mmdet::_base_/default_runtime.py' +] + +teacher_ckpt = 'https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r101_fpn_2x_coco/faster_rcnn_r101_fpn_2x_coco_bbox_mAP-0.398_20200504_210455-1d2dac9c.pth' # noqa: E501 + +model = dict( + _scope_='mmrazor', + type='FpnTeacherDistill', + architecture=dict( + cfg_path='mmdet::faster_rcnn/faster-rcnn_r50_fpn_2x_coco.py', + pretrained=False), + teacher=dict( + cfg_path='mmdet::faster_rcnn/faster-rcnn_r101_fpn_2x_coco.py', + pretrained=False), + teacher_ckpt=teacher_ckpt, + distiller=dict( + type='ConfigurableDistiller', + 
student_recorders=dict(fpn=dict(type='ModuleOutputs', source='neck')), + teacher_recorders=dict(fpn=dict(type='ModuleOutputs', source='neck')), + distill_losses=dict( + loss_pkd_fpn0=dict(type='PKDLoss', loss_weight=6), + loss_pkd_fpn1=dict(type='PKDLoss', loss_weight=6), + loss_pkd_fpn2=dict(type='PKDLoss', loss_weight=6), + loss_pkd_fpn3=dict(type='PKDLoss', loss_weight=6)), + loss_forward_mappings=dict( + loss_pkd_fpn0=dict( + preds_S=dict(from_student=True, recorder='fpn', data_idx=0), + preds_T=dict(from_student=False, recorder='fpn', data_idx=0)), + loss_pkd_fpn1=dict( + preds_S=dict(from_student=True, recorder='fpn', data_idx=1), + preds_T=dict(from_student=False, recorder='fpn', data_idx=1)), + loss_pkd_fpn2=dict( + preds_S=dict(from_student=True, recorder='fpn', data_idx=2), + preds_T=dict(from_student=False, recorder='fpn', data_idx=2)), + loss_pkd_fpn3=dict( + preds_S=dict(from_student=True, recorder='fpn', data_idx=3), + preds_T=dict(from_student=False, recorder='fpn', + data_idx=3))))) + +find_unused_parameters = True + +val_cfg = dict(_delete_=True, type='mmrazor.SingleTeacherDistillValLoop') diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/pkd/pkd_fpn_fcos_x101_retina_r50_1x_coco.py b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/pkd/pkd_fpn_fcos_x101_retina_r50_1x_coco.py new file mode 100644 index 0000000000000000000000000000000000000000..3a8059acca3d0d5fa8ff5d56de0f673469f9594a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/pkd/pkd_fpn_fcos_x101_retina_r50_1x_coco.py @@ -0,0 +1,27 @@ +_base_ = ['./pkd_fpn_retina_x101_retina_r50_2x_coco.py'] + +teacher_ckpt = 'https://download.openmmlab.com/mmdetection/v2.0/fcos/fcos_x101_64x4d_fpn_gn-head_mstrain_640-800_2x_coco/fcos_x101_64x4d_fpn_gn-head_mstrain_640-800_2x_coco-ede514a8.pth' # noqa: E501 + +model = dict( + architecture=dict( + cfg_path='mmdet::retinanet/retinanet_r50_fpn_1x_coco.py'), + teacher=dict( + cfg_path= # noqa: E251 + 
'mmdet::fcos/fcos_x101-64x4d_fpn_gn-head_ms-640-800-2x_coco.py'), + teacher_ckpt=teacher_ckpt) + +# training schedule for 1x +train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=12, val_interval=1) + +# learning rate +param_scheduler = [ + dict( + type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), + dict( + type='MultiStepLR', + begin=0, + end=12, + by_epoch=True, + milestones=[8, 11], + gamma=0.1) +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/pkd/pkd_fpn_mask-rcnn_swin_retina_r50_2x_coco.py b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/pkd/pkd_fpn_mask-rcnn_swin_retina_r50_2x_coco.py new file mode 100644 index 0000000000000000000000000000000000000000..3ba6727f5dd6bf65092da61eb60e4adc22244c0d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/pkd/pkd_fpn_mask-rcnn_swin_retina_r50_2x_coco.py @@ -0,0 +1,53 @@ +_base_ = [ + 'mmdet::_base_/datasets/coco_instance.py', + 'mmdet::_base_/schedules/schedule_2x.py', + 'mmdet::_base_/default_runtime.py' +] + +teacher_ckpt = 'https://download.openmmlab.com/mmdetection/v2.0/swin/mask_rcnn_swin-s-p4-w7_fpn_fp16_ms-crop-3x_coco/mask_rcnn_swin-s-p4-w7_fpn_fp16_ms-crop-3x_coco_20210903_104808-b92c91f1.pth' # noqa: E501 + +model = dict( + _scope_='mmrazor', + type='FpnTeacherDistill', + architecture=dict( + cfg_path='mmdet::retinanet/retinanet_r50_fpn_2x_coco.py', + pretrained=False), + teacher=dict( + cfg_path= # noqa: E251 + 'mmdet::swin/mask-rcnn_swin-s-p4-w7_fpn_amp-ms-crop-3x_coco.py', + pretrained=False), + teacher_ckpt=teacher_ckpt, + distiller=dict( + type='ConfigurableDistiller', + student_recorders=dict(fpn=dict(type='ModuleOutputs', source='neck')), + teacher_recorders=dict(fpn=dict(type='ModuleOutputs', source='neck')), + distill_losses=dict( + loss_pkd_fpn0=dict(type='PKDLoss', loss_weight=6), + loss_pkd_fpn1=dict(type='PKDLoss', loss_weight=6), + loss_pkd_fpn2=dict(type='PKDLoss', loss_weight=6), + loss_pkd_fpn3=dict(type='PKDLoss', 
loss_weight=6)), + loss_forward_mappings=dict( + loss_pkd_fpn0=dict( + preds_S=dict(from_student=True, recorder='fpn', data_idx=0), + preds_T=dict(from_student=False, recorder='fpn', data_idx=0)), + loss_pkd_fpn1=dict( + preds_S=dict(from_student=True, recorder='fpn', data_idx=1), + preds_T=dict(from_student=False, recorder='fpn', data_idx=1)), + loss_pkd_fpn2=dict( + preds_S=dict(from_student=True, recorder='fpn', data_idx=2), + preds_T=dict(from_student=False, recorder='fpn', data_idx=2)), + loss_pkd_fpn3=dict( + preds_S=dict(from_student=True, recorder='fpn', data_idx=3), + preds_T=dict(from_student=False, recorder='fpn', + data_idx=3))))) + +find_unused_parameters = True + +val_cfg = dict(_delete_=True, type='mmrazor.SingleTeacherDistillValLoop') + +# optimizer +optim_wrapper = dict(optimizer=dict(lr=0.01)) + +# dataset +val_evaluator = dict(metric=['bbox']) +test_evaluator = val_evaluator diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/pkd/pkd_fpn_reppoints_x101-dcn_reppoints_r50_2x_coco.py b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/pkd/pkd_fpn_reppoints_x101-dcn_reppoints_r50_2x_coco.py new file mode 100644 index 0000000000000000000000000000000000000000..ecf06bd37b0242a876a4e029f56efea033395d1f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/pkd/pkd_fpn_reppoints_x101-dcn_reppoints_r50_2x_coco.py @@ -0,0 +1,13 @@ +_base_ = ['./pkd_fpn_retina_x101_retina_r50_2x_coco.py'] + +teacher_ckpt = 'https://download.openmmlab.com/mmdetection/v2.0/reppoints/reppoints_moment_x101_fpn_dconv_c3-c5_gn-neck%2Bhead_2x_coco/reppoints_moment_x101_fpn_dconv_c3-c5_gn-neck%2Bhead_2x_coco_20200329-f87da1ea.pth' # noqa: E501 + +model = dict( + architecture=dict( + cfg_path= # noqa: E251 + 'mmdet::reppoints/reppoints-moment_r50_fpn-gn_head-gn_2x_coco.py'), + teacher=dict( + cfg_path= # noqa: E251 + 'mmdet::reppoints/reppoints-moment_x101-dconv-c3-c5_fpn-gn_head-gn_2x_coco.py' # noqa: E501 + ), + teacher_ckpt=teacher_ckpt) 
diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/pkd/pkd_fpn_retina_x101_retina_r50_2x_coco.py b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/pkd/pkd_fpn_retina_x101_retina_r50_2x_coco.py new file mode 100644 index 0000000000000000000000000000000000000000..ef2844e96a420a1a02769eac3fafb393eae3dbd8 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet/pkd/pkd_fpn_retina_x101_retina_r50_2x_coco.py @@ -0,0 +1,19 @@ +_base_ = ['./pkd_fpn_faster-rcnn_r101_faster-rcnn_r50_2x_coco.py'] + +teacher_ckpt = 'https://download.openmmlab.com/mmdetection/v2.0/retinanet/retinanet_x101_64x4d_fpn_1x_coco/retinanet_x101_64x4d_fpn_1x_coco_20200130-366f5af1.pth' # noqa: E501 + +model = dict( + architecture=dict( + cfg_path='mmdet::retinanet/retinanet_r50_fpn_2x_coco.py'), + teacher=dict( + cfg_path='mmdet::retinanet/retinanet_x101-64x4d_fpn_1x_coco.py'), + teacher_ckpt=teacher_ckpt, + distiller=dict( + distill_losses=dict( + loss_pkd_fpn0=dict(loss_weight=10), + loss_pkd_fpn1=dict(loss_weight=10), + loss_pkd_fpn2=dict(loss_weight=10), + loss_pkd_fpn3=dict(loss_weight=10)))) + +# optimizer +optim_wrapper = dict(optimizer=dict(lr=0.01)) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet3d/pkd/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet3d/pkd/README.md new file mode 100644 index 0000000000000000000000000000000000000000..fdd191f69881be352d5167f4c093e340f1a22d75 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet3d/pkd/README.md @@ -0,0 +1,30 @@ +# PKD + +> [PKD: General Distillation Framework for Object Detectors via Pearson Correlation Coefficient](https://arxiv.org/abs/2207.02039) + + + +## Abstract + +Knowledge distillation(KD) is a widely-used technique to train compact models in object detection. However, there is still a lack of study on how to distill between heterogeneous detectors. 
In this paper, we empirically find that better FPN features from a heterogeneous teacher detector can help the student although their detection heads and label assignments are different. However, directly aligning the feature maps to distill detectors suffers from two problems. First, the difference in feature magnitude between the teacher and the student could enforce overly strict constraints on the student. Second, the FPN stages and channels with large feature magnitude from the teacher model could dominate the gradient of distillation loss, which will overwhelm the effects of other features in KD and introduce much noise. To address the above issues, we propose to imitate features with Pearson Correlation Coefficient to focus on the relational information from the teacher and relax constraints on the magnitude of the features. Our method consistently outperforms the existing detection KD methods and works for both homogeneous and heterogeneous student-teacher pairs. Furthermore, it converges faster. With a powerful MaskRCNN-Swin detector as the teacher, ResNet-50 based RetinaNet and FCOS achieve 41.5% and 43.9% mAP on COCO2017, which are 4.1% and 4.8% higher than the baseline, respectively. 
+ +![pipeline](https://user-images.githubusercontent.com/88702197/187424502-d8efb7a3-c40c-4e53-a36c-bd947de464a4.png) + +## Results and models + +### Detection + +| Location | Dataset | Teacher | Student | Lr schd | mAP | mAP(T) | mAP(S) | Config | Download | +| :------: | :--------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------: | :-----: | :--: | :----: | :----: | :------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| FPN | nus-mono3d | [FCOS3d-R101](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/configs/fcos3d/fcos3d_r101-caffe-dcn_fpn_head-gn_8xb2-1x_nus-mono3d_finetune.py) | [FCOS3d-R50](<>) | 1x | 29.3 | 32.1 | 26.8 | [config](pkd_fpn_fcos3d_r101_fcos3d_r50_8xb2-1x_nus-mono3d.py) | [teacher](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/fcos3d/fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_nus-mono3d_finetune/fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_nus-mono3d_finetune_20210717_095645-8d806dc2.pth) \|[model](https://download.openmmlab.com/mmrazor/v1/pkd/pkd_fcos3d_w10/pkd_fpn_fcos3d_r101_fcos3d_r50_8xb2-1x_nus-mono3d_20220928_234557-0b51b62e.pth?versionId=CAEQThiBgMC8sdC0oBgiIDAwOWE2OWUyNDU1NTQ1MjBhZTY1NmNjODZmMDZkZTM2) \| 
[log](https://download.openmmlab.com/mmrazor/v1/pkd/pkd_fcos3d_w10/pkd_fpn_fcos3d_r101_fcos3d_r50_8xb2-1x_nus-mono3d_20220928_234557-0b51b62e.json?versionId=CAEQThiBgIDrvdC0oBgiIDNmNGNkNDZhM2RmNjQ1MmI4ZDM0OGNmYmFkYjk5ZjFi) | + +## Citation + +```latex +@article{cao2022pkd, + title={PKD: General Distillation Framework for Object Detectors via Pearson Correlation Coefficient}, + author={Cao, Weihan and Zhang, Yifan and Gao, Jianfei and Cheng, Anda and Cheng, Ke and Cheng, Jian}, + journal={arXiv preprint arXiv:2207.02039}, + year={2022} +} +``` diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet3d/pkd/metafile.yml b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet3d/pkd/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..1f60cffd70311e2dc2c661e2550e66b769fa93fa --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet3d/pkd/metafile.yml @@ -0,0 +1,22 @@ +Models: + - Name: pkd_fpn_fcos3d_r101_fcos3d_r50_8xb2-1x_nus-mono3d + In Collection: PKD + Metadata: + Location: FPN + Student: + Metrics: + box AP: 26.8 + Config: + Weights: + Teacher: + Metrics: + box AP: 32.1 + Config: mmdet3d::fcos3d/fcos3d_r101-caffe-dcn_fpn_head-gn_8xb2-1x_nus-mono3d_finetune.py + Weights: https://download.openmmlab.com/mmdetection3d/v0.1.0_models/fcos3d/fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_nus-mono3d_finetune/fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_nus-mono3d_finetune_20210717_095645-8d806dc2.pth + Results: + - Task: Object Detection + Dataset: NuScenes + Metrics: + box AP: 29.3 + Config: configs/distill/mmdet3d/pkd/pkd_fpn_fcos3d_r101_fcos3d_r50_8xb2-1x_nus-mono3d.py + Weights: https://download.openmmlab.com/mmrazor/v1/pkd/pkd_fcos3d_w10/pkd_fpn_fcos3d_r101_fcos3d_r50_8xb2-1x_nus-mono3d_20220928_234557-0b51b62e.pth?versionId=CAEQThiBgMC8sdC0oBgiIDAwOWE2OWUyNDU1NTQ1MjBhZTY1NmNjODZmMDZkZTM2 diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet3d/pkd/pkd_fpn_fcos3d_r101_fcos3d_r50_8xb2-1x_nus-mono3d.py 
b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet3d/pkd/pkd_fpn_fcos3d_r101_fcos3d_r50_8xb2-1x_nus-mono3d.py new file mode 100644 index 0000000000000000000000000000000000000000..cbbedf7e0c0468a10a1e5b042fdb19a7989a805f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmdet3d/pkd/pkd_fpn_fcos3d_r101_fcos3d_r50_8xb2-1x_nus-mono3d.py @@ -0,0 +1,49 @@ +_base_ = [ + 'mmdet3d::fcos3d/fcos3d_r101-caffe-dcn_fpn_head-gn_8xb2-1x_nus-mono3d.py', +] + +train_dataloader = dict(num_workers=4) + +student = _base_.model +student.backbone.depth = 50 # using original ResNet50 +student.backbone.dcn = None # no dcn in backbone +student.backbone.stage_with_dcn = (False, False, False, False) +student.backbone.init_cfg.checkpoint = 'open-mmlab://detectron2/resnet50_caffe' + +teacher_ckpt = 'https://download.openmmlab.com/mmdetection3d/v0.1.0_models/fcos3d/fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_nus-mono3d_finetune/fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_nus-mono3d_finetune_20210717_095645-8d806dc2.pth' # noqa: E501 +model = dict( + _scope_='mmrazor', + _delete_=True, + type='FpnTeacherDistill', + architecture=student, + teacher=dict( + cfg_path= # noqa: E251 + 'mmdet3d::fcos3d/fcos3d_r101-caffe-dcn_fpn_head-gn_8xb2-1x_nus-mono3d_finetune.py', # noqa: E501 + pretrained=False), + teacher_ckpt=teacher_ckpt, + distiller=dict( + type='ConfigurableDistiller', + student_recorders=dict(fpn=dict(type='ModuleOutputs', source='neck')), + teacher_recorders=dict(fpn=dict(type='ModuleOutputs', source='neck')), + distill_losses=dict( + loss_pkd_fpn0=dict(type='PKDLoss', loss_weight=10), + loss_pkd_fpn1=dict(type='PKDLoss', loss_weight=10), + loss_pkd_fpn2=dict(type='PKDLoss', loss_weight=10), + loss_pkd_fpn3=dict(type='PKDLoss', loss_weight=10)), + loss_forward_mappings=dict( + loss_pkd_fpn0=dict( + preds_S=dict(from_student=True, recorder='fpn', data_idx=0), + preds_T=dict(from_student=False, recorder='fpn', data_idx=0)), + loss_pkd_fpn1=dict( + 
preds_S=dict(from_student=True, recorder='fpn', data_idx=1), + preds_T=dict(from_student=False, recorder='fpn', data_idx=1)), + loss_pkd_fpn2=dict( + preds_S=dict(from_student=True, recorder='fpn', data_idx=2), + preds_T=dict(from_student=False, recorder='fpn', data_idx=2)), + loss_pkd_fpn3=dict( + preds_S=dict(from_student=True, recorder='fpn', data_idx=3), + preds_T=dict(from_student=False, recorder='fpn', + data_idx=3))))) + +find_unused_parameters = True +train_cfg = dict(val_interval=12) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmseg/cwd/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmseg/cwd/README.md new file mode 100644 index 0000000000000000000000000000000000000000..4805c0696573883a43685629a84c4a4f0e619077 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmseg/cwd/README.md @@ -0,0 +1,37 @@ +# CWD + +> [Channel-wise Knowledge Distillation for Dense Prediction](https://arxiv.org/abs/2011.13256) + + + +## Abstract + +Knowledge distillation (KD) has been proven to be a simple and effective tool for training compact models. Almost all KD variants for dense prediction tasks align the student and teacher networks' feature maps in the spatial domain, typically by minimizing point-wise and/or pair-wise discrepancy. Observing that in semantic segmentation, some layers' feature activations of each channel tend to encode saliency of scene categories (analogue to class activation mapping), we propose to align features channel-wise between the student and teacher networks. To this end, we first transform the feature map of each channel into a probability map using softmax normalization, and then minimize the Kullback-Leibler (KL) divergence of the corresponding channels of the two networks. By doing so, our method focuses on mimicking the soft distributions of channels between networks. 
In particular, the KL divergence enables learning to pay more attention to the most salient regions of the channel-wise maps, presumably corresponding to the most useful signals for semantic segmentation. Experiments demonstrate that our channel-wise distillation outperforms almost all existing spatial distillation methods for semantic segmentation considerably, and requires less computational cost during training. We consistently achieve superior performance on three benchmarks with various network structures. + +![pipeline](https://user-images.githubusercontent.com/88702197/187424502-d8efb7a3-c40c-4e53-a36c-bd947de464a4.png) + +## Results and models + +### Segmentation + +| Location | Dataset | Teacher | Student | mIoU | mIoU(T) | mIou(S) | Config | Download | +| :------: | :--------: | :------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------: | :---: | :-----: | :-----: | :----------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| logits | cityscapes | [pspnet_r101](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/pspnet/pspnet_r101-d8_512x1024_80k_cityscapes.py) | 
[pspnet_r18](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/pspnet/pspnet_r18-d8_512x1024_80k_cityscapes.py) | 75.54 | 79.76 | 74.87 | [config](<>) | [teacher](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r101-d8_512x1024_80k_cityscapes/pspnet_r101-d8_512x1024_80k_cityscapes_20200606_112211-e1e1100f.pth) \|[model](https://download.openmmlab.com/mmrazor/v1/cwd/cwd_cls_head_pspnet_r101_d8_pspnet_r18_d8_512x1024_cityscapes_80k/cwd_cls_head_pspnet_r101_d8_pspnet_r18_d8_512x1024_cityscapes_80k_mIoU-75.54_20211222-3e643f6f.pth) \| [log](https://download.openmmlab.com/mmrazor/v0.1/distill/cwd/cwd_cls_head_pspnet_r101_d8_pspnet_r18_d8_512x1024_cityscapes_80k/cwd_cls_head_pspnet_r101_d8_pspnet_r18_d8_512x1024_cityscapes_80k_20211212_205711.log.json) | + +### Detection + +| Location | Dataset | Teacher | Student | mAP | mAP(T) | mAP(S) | Config | Download | +| :------: | :-----: | :--------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------: | :--: | :----: | :----: | :----------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| cls head | COCO | [gfl_r101_2x](https://github.com/open-mmlab/mmdetection/tree/master/configs/gfl/gfl_r101_fpn_mstrain_2x_coco.py) | [gfl_r50_1x](https://github.com/open-mmlab/mmdetection/tree/master/configs/gfl/gfl_r50_fpn_1x_coco.py) | 41.9 | 44.7 | 40.2 | [config](<>) | 
[teacher](https://download.openmmlab.com/mmdetection/v2.0/gfl/gfl_r101_fpn_mstrain_2x_coco/gfl_r101_fpn_mstrain_2x_coco_20200629_200126-dd12f847.pth) \|[model](https://download.openmmlab.com/mmrazor/v1/cwd/cwd_cls_head_gfl_r101_fpn_gfl_r50_fpn_1x_coco/cwd_cls_head_gfl_r101_fpn_gfl_r50_fpn_1x_coco_20211222-c134bb21.pth) \| [log](https://download.openmmlab.com/mmrazor/v0.1/distill/cwd/cwd_cls_head_gfl_r101_fpn_gfl_r50_fpn_1x_coco/cwd_cls_head_gfl_r101_fpn_gfl_r50_fpn_1x_coco_20211212_205444.log.json) | + +## Citation + +```latex +@inproceedings{shu2021channel, + title={Channel-Wise Knowledge Distillation for Dense Prediction}, + author={Shu, Changyong and Liu, Yifan and Gao, Jianfei and Yan, Zheng and Shen, Chunhua}, + booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision}, + pages={5311--5320}, + year={2021} +} +``` diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmseg/cwd/cwd_logits_pspnet_r101-d8_pspnet_r18-d8_4xb2-80k_cityscapes-512x1024.py b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmseg/cwd/cwd_logits_pspnet_r101-d8_pspnet_r18-d8_4xb2-80k_cityscapes-512x1024.py new file mode 100644 index 0000000000000000000000000000000000000000..e3904549559f2d5d4f6006c546921692343117ca --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmseg/cwd/cwd_logits_pspnet_r101-d8_pspnet_r18-d8_4xb2-80k_cityscapes-512x1024.py @@ -0,0 +1,31 @@ +_base_ = [ + 'mmseg::_base_/datasets/cityscapes.py', + 'mmseg::_base_/schedules/schedule_80k.py', + 'mmseg::_base_/default_runtime.py' +] + +teacher_ckpt = 'https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r101-d8_512x1024_80k_cityscapes/pspnet_r101-d8_512x1024_80k_cityscapes_20200606_112211-e1e1100f.pth' # noqa: E501 +teacher_cfg_path = 'mmseg::pspnet/pspnet_r101-d8_4xb2-80k_cityscapes-512x1024.py' # noqa: E501 +student_cfg_path = 'mmseg::pspnet/pspnet_r18-d8_4xb2-80k_cityscapes-512x1024.py' # noqa: E501 +model = dict( + _scope_='mmrazor', + 
type='SingleTeacherDistill', + architecture=dict(cfg_path=student_cfg_path, pretrained=False), + teacher=dict(cfg_path=teacher_cfg_path, pretrained=False), + teacher_ckpt=teacher_ckpt, + distiller=dict( + type='ConfigurableDistiller', + distill_losses=dict( + loss_cwd=dict(type='ChannelWiseDivergence', tau=1, loss_weight=5)), + student_recorders=dict( + logits=dict(type='ModuleOutputs', source='decode_head.conv_seg')), + teacher_recorders=dict( + logits=dict(type='ModuleOutputs', source='decode_head.conv_seg')), + loss_forward_mappings=dict( + loss_cwd=dict( + preds_S=dict(from_student=True, recorder='logits'), + preds_T=dict(from_student=False, recorder='logits'))))) + +find_unused_parameters = True + +val_cfg = dict(_delete_=True, type='mmrazor.SingleTeacherDistillValLoop') diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmseg/cwd/metafile.yml b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmseg/cwd/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..2bb293275f923989ae3f5ea1535c83578b41618b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/distill/mmseg/cwd/metafile.yml @@ -0,0 +1,41 @@ +Collections: + - Name: CWD + Metadata: + Training Data: + - Cityscapes + - COCO + Paper: + URL: https://arxiv.org/abs/2011.13256 + Title: Channel-wise Knowledge Distillation for Dense Prediction + README: configs/distill/mmseg/cwd/README.md + Code: + URL: https://github.com/open-mmlab/mmrazor/blob/v0.1.0/mmrazor/models/losses/cwd.py#L10 + Version: v0.1.0 + Converted From: + Code: + - https://github.com/pppppM/mmsegmentation-distiller + - https://github.com/pppppM/mmdetection-distiller +Models: + - Name: cwd_logits_pspnet_r101-d8_pspnet_r18-d8_4xb2-80k_cityscapes-512x1024 + In Collection: CWD + Metadata: + Location: logits + Student: + Metrics: + mIoU: 74.87 + mIoU(ms+flip): 76.04 + Config: mmseg::pspnet/pspnet_r18-d8_512x1024_80k_cityscapes.py + Weights: 
https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r18-d8_512x1024_80k_cityscapes/pspnet_r18-d8_512x1024_80k_cityscapes_20201225_021458-09ffa746.pth + Teacher: + Metrics: + mIoU: 79.76 + mIoU(ms+flip): 81.01 + Config: mmseg::pspnet/pspnet_r101-d8_512x1024_80k_cityscapes.py + Weights: https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r101-d8_512x1024_80k_cityscapes/pspnet_r101-d8_512x1024_80k_cityscapes_20200606_112211-e1e1100f.pth + Results: + - Task: Semantic Segmentation + Dataset: Cityscapes + Metrics: + mIoU: 75.54 + Config: configs/distill/mmseg/cwd/cwd_logits_pspnet_r101-d8_pspnet_r18-d8_4xb2-80k_cityscapes-512x1024.py + Weights: https://download.openmmlab.com/mmrazor/v1/cwd/cwd_cls_head_pspnet_r101_d8_pspnet_r18_d8_512x1024_cityscapes_80k/cwd_cls_head_pspnet_r101_d8_pspnet_r18_d8_512x1024_cityscapes_80k_mIoU-75.54_20211222-3e643f6f.pth diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoformer/AUTOFORMER_SUBNET_B.yaml b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoformer/AUTOFORMER_SUBNET_B.yaml new file mode 100644 index 0000000000000000000000000000000000000000..e3672feaf8c19f01a22db1fd8bb1e96720937100 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoformer/AUTOFORMER_SUBNET_B.yaml @@ -0,0 +1,134 @@ +backbone.base_embed_dims: + chosen: 64 +backbone.blocks.0.attn.mutable_attrs.num_heads: + chosen: 10 +backbone.blocks.0.middle_channels: + chosen: 3.5 +backbone.blocks.0.mutable_mlp_ratios: + chosen: 3.5 +backbone.blocks.0.mutable_q_embed_dims: + chosen: 10 +backbone.blocks.1.attn.mutable_attrs.num_heads: + chosen: 10 +backbone.blocks.1.middle_channels: + chosen: 3.5 +backbone.blocks.1.mutable_mlp_ratios: + chosen: 3.5 +backbone.blocks.1.mutable_q_embed_dims: + chosen: 64 +backbone.blocks.10.attn.mutable_attrs.num_heads: + chosen: 10 +backbone.blocks.10.middle_channels: + chosen: 4.0 +backbone.blocks.10.mutable_mlp_ratios: + chosen: 4.0 +backbone.blocks.10.mutable_q_embed_dims: + 
chosen: 64 +backbone.blocks.11.attn.mutable_attrs.num_heads: + chosen: 10 +backbone.blocks.11.middle_channels: + chosen: 576 +backbone.blocks.11.mutable_mlp_ratios: + chosen: 4.0 +backbone.blocks.11.mutable_q_embed_dims: + chosen: 10 +backbone.blocks.12.attn.mutable_attrs.num_heads: + chosen: 9 +backbone.blocks.12.middle_channels: + chosen: 4.0 +backbone.blocks.12.mutable_mlp_ratios: + chosen: 4.0 +backbone.blocks.12.mutable_q_embed_dims: + chosen: 9 +backbone.blocks.13.attn.mutable_attrs.num_heads: + chosen: 10 +backbone.blocks.13.middle_channels: + chosen: 4.0 +backbone.blocks.13.mutable_mlp_ratios: + chosen: 4.0 +backbone.blocks.13.mutable_q_embed_dims: + chosen: 10 +backbone.blocks.14.attn.mutable_attrs.num_heads: + chosen: 8 +backbone.blocks.14.middle_channels: + chosen: 576 +backbone.blocks.14.mutable_mlp_ratios: + chosen: 3.5 +backbone.blocks.14.mutable_q_embed_dims: + chosen: 8 +backbone.blocks.15.attn.mutable_attrs.num_heads: + chosen: 10 +backbone.blocks.15.middle_channels: + chosen: 3.0 +backbone.blocks.15.mutable_mlp_ratios: + chosen: 3.0 +backbone.blocks.15.mutable_q_embed_dims: + chosen: 10 +backbone.blocks.2.attn.mutable_attrs.num_heads: + chosen: 10 +backbone.blocks.2.middle_channels: + chosen: 576 +backbone.blocks.2.mutable_mlp_ratios: + chosen: 3.5 +backbone.blocks.2.mutable_q_embed_dims: + chosen: 10 +backbone.blocks.3.attn.mutable_attrs.num_heads: + chosen: 8 +backbone.blocks.3.middle_channels: + chosen: 4.0 +backbone.blocks.3.mutable_mlp_ratios: + chosen: 4.0 +backbone.blocks.3.mutable_q_embed_dims: + chosen: 8 +backbone.blocks.4.attn.mutable_attrs.num_heads: + chosen: 10 +backbone.blocks.4.middle_channels: + chosen: 576 +backbone.blocks.4.mutable_mlp_ratios: + chosen: 3.0 +backbone.blocks.4.mutable_q_embed_dims: + chosen: 10 +backbone.blocks.5.attn.mutable_attrs.num_heads: + chosen: 9 +backbone.blocks.5.middle_channels: + chosen: 3.0 +backbone.blocks.5.mutable_mlp_ratios: + chosen: 3.0 +backbone.blocks.5.mutable_q_embed_dims: + chosen: 9 
+backbone.blocks.6.attn.mutable_attrs.num_heads: + chosen: 8 +backbone.blocks.6.middle_channels: + chosen: 576 +backbone.blocks.6.mutable_mlp_ratios: + chosen: 3.5 +backbone.blocks.6.mutable_q_embed_dims: + chosen: 8 +backbone.blocks.7.attn.mutable_attrs.num_heads: + chosen: 8 +backbone.blocks.7.middle_channels: + chosen: 3.5 +backbone.blocks.7.mutable_mlp_ratios: + chosen: 3.5 +backbone.blocks.7.mutable_q_embed_dims: + chosen: 8 +backbone.blocks.8.attn.mutable_attrs.num_heads: + chosen: 9 +backbone.blocks.8.middle_channels: + chosen: 576 +backbone.blocks.8.mutable_mlp_ratios: + chosen: 4.0 +backbone.blocks.8.mutable_q_embed_dims: + chosen: 9 +backbone.blocks.9.attn.mutable_attrs.num_heads: + chosen: 8 +backbone.blocks.9.middle_channels: + chosen: 576 +backbone.blocks.9.mutable_mlp_ratios: + chosen: 4.0 +backbone.blocks.9.mutable_q_embed_dims: + chosen: 8 +backbone.mutable_depth: + chosen: 14 +backbone.mutable_embed_dims: + chosen: 576 diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoformer/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoformer/README.md new file mode 100644 index 0000000000000000000000000000000000000000..161e83a5666e3c02df3316893b85108cd044cdb0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoformer/README.md @@ -0,0 +1,75 @@ +# AutoFormer + +> [Searching Transformers for Visual Recognition](https://arxiv.org/abs/2107.00651) + + + +## Abstract + +Recently, pure transformer-based models have shown +great potentials for vision tasks such as image classification and detection. However, the design of transformer networks is challenging. It has been observed that the depth, +embedding dimension, and number of heads can largely affect the performance of vision transformers. Previous models configure these dimensions based upon manual crafting. In this work, we propose a new one-shot architecture +search framework, namely AutoFormer, dedicated to vision +transformer search. 
AutoFormer entangles the weights of +different blocks in the same layers during supernet training. Benefiting from the strategy, the trained supernet allows thousands of subnets to be very well-trained. Specifically, the performance of these subnets with weights inherited from the supernet is comparable to those retrained +from scratch. Besides, the searched models, which we refer to AutoFormers, surpass the recent state-of-the-arts such +as ViT and DeiT. In particular, AutoFormer-tiny/small/base +achieve 74.7%/81.7%/82.4% top-1 accuracy on ImageNet +with 5.7M/22.9M/53.7M parameters, respectively. Lastly, +we verify the transferability of AutoFormer by providing +the performance on downstream benchmarks and distillation experiments. + +![pipeline](/docs/en/imgs/model_zoo/autoformer/pipeline.png) + +## Get Started + +### Step 1: Supernet pre-training on ImageNet + +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh \ + configs/nas/mmcls/autoformer/autoformer_supernet_32xb256_in1k.py 4 \ + --work-dir $WORK_DIR +``` + +### Step 2: Search for subnet on the trained supernet + +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh \ + configs/nas/mmcls/autoformer/autoformer_search_8xb128_in1k.py 4 \ + --work-dir $WORK_DIR --cfg-options load_from=$STEP1_CKPT +``` + +### Step 3: Subnet inference on ImageNet + +```bash +CUDA_VISIBLE_DEVICES=0 PORT=29500 ./tools/dist_test.sh \ + configs/nas/mmcls/autoformer/autoformer_subnet_8xb128_in1k.py \ + none 1 --work-dir $WORK_DIR \ + --cfg-options model.init_cfg.checkpoint=$STEP1_CKPT model.init_weight_from_supernet=True + +``` + +## Results and models + +| Dataset | Supernet | Subnet | Params(M) | Flops(G) | Top-1 (%) | Top-5 (%) | Config | Download | Remarks | +| :------: | :------: | :----------------------------------------------------------------: | :-------: | :------: | :-------: | :-------: | :---------------------------------------------: | 
:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :--------------: | +| ImageNet | vit | [mutable](./configs/nas/mmcls/autoformer/AUTOFORMER_SUBNET_B.yaml) | 54.319 | 10.57 | 82.47 | 95.99 | [config](./autoformer_supernet_32xb256_in1k.py) | [model](https://download.openmmlab.com/mmrazor/v1/autoformer/autoformer_supernet_32xb256_in1k_20220919_110144-c658ce8f.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/autoformer/autoformer_supernet_32xb256_in1k_20220919_110144-c658ce8f.json) | MMRazor searched | + +**Note**: + +1. There are some small differences in our experiment in order to be consistent with the mmrazor repo. For example, we set the max value of embed_channels to 624 while the original repo set it to 640. However, the original repo only searches over 528, 576 and 624 embed_channels, so setting 624 still reproduces the result of the original paper. +2. The original paper gets 82.4 top-1 acc with 53.7M Params while we get 82.48 top-1 acc with 52.47M Params. 
+ +## Citation + +```latex +@inproceedings{chen2021autoformer, + title={Autoformer: Searching transformers for visual recognition}, + author={Chen, Minghao and Peng, Houwen and Fu, Jianlong and Ling, Haibin}, + booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision}, + pages={12270--12280}, + year={2021} +} +``` + +Footer diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoformer/autoformer_search_8xb128_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoformer/autoformer_search_8xb128_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..be1d4660d1f7431f2276d5587698e44e2e504e1a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoformer/autoformer_search_8xb128_in1k.py @@ -0,0 +1,17 @@ +_base_ = ['./autoformer_supernet_32xb256_in1k.py'] + +custom_hooks = None + +train_cfg = dict( + _delete_=True, + type='mmrazor.EvolutionSearchLoop', + dataloader=_base_.val_dataloader, + evaluator=_base_.val_evaluator, + max_epochs=20, + num_candidates=20, + top_k=10, + num_mutation=5, + num_crossover=5, + mutate_prob=0.2, + constraints_range=dict(params=(0, 55)), + score_key='accuracy/top1') diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoformer/autoformer_subnet_8xb256_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoformer/autoformer_subnet_8xb256_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..f4d53ae769bef0d5ac0da031f3c0e825895ad502 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoformer/autoformer_subnet_8xb256_in1k.py @@ -0,0 +1,17 @@ +_base_ = 'autoformer_supernet_32xb256_in1k.py' + +model = dict( + _scope_='mmrazor', + type='sub_model', + cfg=_base_.supernet, + # NOTE: You can replace the yaml with the mutable_cfg searched by yourself + fix_subnet='configs/nas/mmcls/autoformer/AUTOFORMER_SUBNET_B.yaml', + # You can also load the checkpoint of supernet instead of the specific + # subnet by 
modifying the `checkpoint`(path) in the following `init_cfg` + # with `init_weight_from_supernet = True`. + init_weight_from_supernet=False, + init_cfg=dict( + type='Pretrained', + checkpoint= # noqa: E251 + 'https://download.openmmlab.com/mmrazor/v1/autoformer/autoformer_supernet_32xb256_in1k_20220919_110144-c658ce8f.pth', # noqa: E501 + prefix='architecture.')) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoformer/autoformer_supernet_32xb256_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoformer/autoformer_supernet_32xb256_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..21555bfe7a5a9942d47ff8866a877381e9b7e0b2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoformer/autoformer_supernet_32xb256_in1k.py @@ -0,0 +1,68 @@ +_base_ = [ + 'mmrazor::_base_/settings/imagenet_bs2048_AdamW.py', + 'mmcls::_base_/default_runtime.py', +] + +# data preprocessor +data_preprocessor = dict( + _scope_='mmcls', + type='ClsDataPreprocessor', + # RGB format normalization parameters + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + # convert image from BGR to RGB + to_rgb=True, + num_classes=1000, + batch_augments=dict( + augments=[ + dict(type='Mixup', alpha=0.2), + dict(type='CutMix', alpha=1.0) + ], + probs=[0.5, 0.5])) + +arch_setting = dict( + mlp_ratios=[3.0, 3.5, 4.0], + num_heads=[8, 9, 10], + depth=[14, 15, 16], + embed_dims=[528, 576, 624]) + +supernet = dict( + _scope_='mmrazor', + type='SearchableImageClassifier', + data_preprocessor=data_preprocessor, + backbone=dict( + _scope_='mmrazor', + type='AutoformerBackbone', + arch_setting=arch_setting), + neck=None, + head=dict( + type='DynamicLinearClsHead', + num_classes=1000, + in_channels=624, + loss=dict( + type='mmcls.LabelSmoothLoss', + mode='original', + num_classes=1000, + label_smooth_val=0.1, + loss_weight=1.0), + topk=(1, 5)), + connect_head=dict(connect_with_backbone='backbone.last_mutable'), +) + +model = 
dict( + type='mmrazor.Autoformer', + architecture=supernet, + mutator=dict(type='mmrazor.NasMutator')) + +# runtime setting +custom_hooks = [dict(type='EMAHook', momentum=4e-5, priority='ABOVE_NORMAL')] + +# checkpoint saving +_base_.default_hooks.checkpoint = dict( + type='CheckpointHook', + interval=2, + by_epoch=True, + save_best='accuracy/top1', + max_keep_ckpts=3) + +find_unused_parameters = True diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoslim/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoslim/README.md new file mode 100644 index 0000000000000000000000000000000000000000..37a651ddbdf48996ce8792f750f09f50db983f8d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoslim/README.md @@ -0,0 +1,87 @@ +# AutoSlim + +> [AutoSlim: Towards One-Shot Architecture Search for Channel Numbers](https://arxiv.org/abs/1903.11728) + + + +## Abstract + +We study how to set channel numbers in a neural network to achieve better accuracy under constrained resources (e.g., FLOPs, latency, memory footprint or model size). A simple and one-shot solution, named AutoSlim, is presented. Instead of training many network samples and searching with reinforcement learning, we train a single slimmable network to approximate the network accuracy of different channel configurations. We then iteratively evaluate the trained slimmable model and greedily slim the layer with minimal accuracy drop. By this single pass, we can obtain the optimized channel configurations under different resource constraints. We present experiments with MobileNet v1, MobileNet v2, ResNet-50 and RL-searched MNasNet on ImageNet classification. We show significant improvements over their default channel configurations. We also achieve better accuracy than recent channel pruning methods and neural architecture search methods. 
+Notably, by setting optimized channel numbers, our AutoSlim-MobileNet-v2 at 305M FLOPs achieves 74.2% top-1 accuracy, 2.4% better than default MobileNet-v2 (301M FLOPs), and even 0.2% better than RL-searched MNasNet (317M FLOPs). Our AutoSlim-ResNet-50 at 570M FLOPs, without depthwise convolutions, achieves 1.3% better accuracy than MobileNet-v1 (569M FLOPs). + +![pipeline](https://user-images.githubusercontent.com/88702197/187425354-d90e4b36-e033-4dc0-b951-64a536e61b71.png) + +## Get Started + +### Supernet pre-training on ImageNet + +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh \ + configs/nas/autoslim/autoslim_mbv2_1.5x_supernet_8xb256_in1k.py 4 \ + --work-dir $WORK_DIR +``` + +### Search for subnet on the trained supernet + +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh \ + configs/nas/autoslim/autoslim_mbv2_1.5x_search_8xb256_in1k.py 4 \ + --work-dir $WORK_DIR --cfg-options load_from=$STEP1_CKPT +``` + +### Subnet retraining on ImageNet + +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh \ + configs/nas/autoslim/autoslim_mbv2_subnet_8xb256_in1k.py 4 \ + --work-dir $WORK_DIR \ + --cfg-options algorithm.channel_cfg=configs/nas/autoslim/AUTOSLIM_MBV2_530M_OFFICIAL.yaml,configs/nas/autoslim/AUTOSLIM_MBV2_320M_OFFICIAL.yaml,configs/nas/autoslim/AUTOSLIM_MBV2_220M_OFFICIAL.yaml +``` + +### Split checkpoint + +```bash +python ./tools/model_converters/split_checkpoint.py \ + configs/nas/autoslim/autoslim_mbv2_subnet_8xb256_in1k.py \ + $RETRAINED_CKPT \ + --channel-cfgs configs/nas/autoslim/AUTOSLIM_MBV2_530M_OFFICIAL.yaml configs/nas/autoslim/AUTOSLIM_MBV2_320M_OFFICIAL.yaml configs/nas/autoslim/AUTOSLIM_MBV2_220M_OFFICIAL.yaml +``` + +### Subnet inference + +```bash +CUDA_VISIBLE_DEVICES=0 PORT=29500 ./tools/dist_test.sh \ + configs/nas/autoslim/autoslim_mbv2_subnet_8xb256_in1k.py \ + $SEARCHED_CKPT 1 --work-dir $WORK_DIR \ + --cfg-options 
algorithm.channel_cfg=configs/nas/autoslim/AUTOSLIM_MBV2_530M_OFFICIAL.yaml # or modify the config directly +``` + +## Results and models + +### Subnet retrain + +| Supernet | Params(M) | Flops(G) | Top-1 (%) | Top-5 (%) | Config | Download | Subnet | Remark | +| :----------------- | :-------: | -------: | :-------: | :-------: | :---------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------: | +| MobileNet v2(x1.5) | 6.5 | 0.53 | 74.23 | 91.74 | [config](./autoslim_mbv2_subnet_8xb256_in1k.py) | [model](https://download.openmmlab.com/mmrazor/v1/autoslim/autoslim_mbv2_subnet_8xb256_in1k_flops-530M_acc-74.23_20220715-aa8754fe.pth) \| [log](https://download.openmmlab.com/mmrazor/v0.1/pruning/autoslim/autoslim_mbv2_subnet_8xb256_in1k/autoslim_mbv2_subnet_8xb256_in1kautoslim_mbv2_subnet_8xb256_in1k_paper_channel_cfg.log.json) | [channel](https://download.openmmlab.com/mmrazor/v0.1/pruning/autoslim/autoslim_mbv2_subnet_8xb256_in1k/autoslim_mbv2_subnet_8xb256_in1k_flops-0.53M_acc-74.23_20211222-e5208bbd_channel_cfg.yaml) | official channel cfg | +| MobileNet v2(x1.5) | 5.77 | 0.32 | 72.73 | 90.83 | [config](./autoslim_mbv2_subnet_8xb256_in1k.py) | [model](https://download.openmmlab.com/mmrazor/v1/autoslim/autoslim_mbv2_subnet_8xb256_in1k_flops-320M_acc-72.73_20220715-9aa8f8ae.pth) \| 
[log](https://download.openmmlab.com/mmrazor/v0.1/pruning/autoslim/autoslim_mbv2_subnet_8xb256_in1k/autoslim_mbv2_subnet_8xb256_in1kautoslim_mbv2_subnet_8xb256_in1k_paper_channel_cfg.log.json) | [channel](https://download.openmmlab.com/mmrazor/v0.1/pruning/autoslim/autoslim_mbv2_subnet_8xb256_in1k/autoslim_mbv2_subnet_8xb256_in1k_flops-0.32M_acc-72.73_20211222-b5b0b33c_channel_cfg.yaml) | official channel cfg | +| MobileNet v2(x1.5) | 4.13 | 0.22 | 71.39 | 90.08 | [config](./autoslim_mbv2_subnet_8xb256_in1k.py) | [model](https://download.openmmlab.com/mmrazor/v1/autoslim/autoslim_mbv2_subnet_8xb256_in1k_flops-220M_acc-71.4_20220715-9c288f3b.pth) \| [log](https://download.openmmlab.com/mmrazor/v0.1/pruning/autoslim/autoslim_mbv2_subnet_8xb256_in1k/autoslim_mbv2_subnet_8xb256_in1kautoslim_mbv2_subnet_8xb256_in1k_paper_channel_cfg.log.json) | [channel](https://download.openmmlab.com/mmrazor/v0.1/pruning/autoslim/autoslim_mbv2_subnet_8xb256_in1k/autoslim_mbv2_subnet_8xb256_in1k_flops-0.22M_acc-71.39_20211222-43117c7b_channel_cfg.yaml) | official channel cfg | + +Note that we ran the official code and the Top-1 Acc of the models with official +channel cfg are 73.8%, 72.5% and 71.1%. And there are 3 differences between our +implementation and the official one. + +1. The implementation of Label Smooth is slightly different. +2. Lighting is not used in our data pipeline. (Lighting is a kind of data + augmentation which adjust images lighting using AlexNet-style PCA jitter.) +3. We do not recalibrating BN statistics after training. 
+ +## Citation + +```latex +@article{yu2019autoslim, + title={Autoslim: Towards one-shot architecture search for channel numbers}, + author={Yu, Jiahui and Huang, Thomas}, + journal={arXiv preprint arXiv:1903.11728}, + year={2019} +} +``` diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoslim/autoslim_mbv2_1.5x_search_8xb256_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoslim/autoslim_mbv2_1.5x_search_8xb256_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..e5b3ba21d05caae91bfcd402cec268e516bf3168 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoslim/autoslim_mbv2_1.5x_search_8xb256_in1k.py @@ -0,0 +1,19 @@ +_base_ = ['./autoslim_mbv2_1.5x_supernet_8xb256_in1k.py'] + +model = dict(bn_training_mode=True) + +train_cfg = None +optim_wrapper = None +param_scheduler = None +train_dataloader = None + +val_cfg = None +val_dataloader = None +val_evaluator = None + +test_cfg = dict( + _delete_=True, + type='mmrazor.AutoSlimGreedySearchLoop', + dataloader=_base_.test_dataloader, + evaluator=_base_.test_evaluator, + target_flops=(500., 300., 200.)) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoslim/autoslim_mbv2_1.5x_slimmable_subnet_8xb256_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoslim/autoslim_mbv2_1.5x_slimmable_subnet_8xb256_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..61d64a22630c358d1d03d9808b526e185a93628c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoslim/autoslim_mbv2_1.5x_slimmable_subnet_8xb256_in1k.py @@ -0,0 +1,46 @@ +_base_ = [ + 'mmrazor::_base_/settings/imagenet_bs2048_autoslim_pil.py', + 'mmcls::_base_/models/mobilenet_v2_1x.py', + 'mmcls::_base_/default_runtime.py', +] + +supernet = _base_.model +supernet.backbone.widen_factor = 1.5 +supernet.head.in_channels = 1920 + +# !dataset config +# ========================================================================== +# 
data preprocessor +data_preprocessor = dict( + type='ImgDataPreprocessor', + # RGB format normalization parameters + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + # convert image from BGR to RGB + bgr_to_rgb=True) + +# !autoslim algorithm config +# ========================================================================== + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='SlimmableNetwork', + architecture=supernet, + data_preprocessor=data_preprocessor, + mutator=dict( + type='SlimmableChannelMutator', + channel_unit_cfg=dict( + type='SlimmableChannelUnit', + units='tests/data/MBV2_slimmable_config.json'), + parse_cfg=dict( + type='ChannelAnalyzer', + demo_input=(1, 3, 224, 224), + tracer_type='BackwardTracer'))) + +model_wrapper_cfg = dict( + type='mmrazor.SlimmableNetworkDDP', + broadcast_buffers=False, + find_unused_parameters=True) + +val_cfg = dict(type='mmrazor.SlimmableValLoop') diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoslim/autoslim_mbv2_1.5x_subnet_8xb256_in1k_flops-220M.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoslim/autoslim_mbv2_1.5x_subnet_8xb256_in1k_flops-220M.py new file mode 100644 index 0000000000000000000000000000000000000000..2ed71daf55c4e0d4a54efc92218797d4b7828906 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoslim/autoslim_mbv2_1.5x_subnet_8xb256_in1k_flops-220M.py @@ -0,0 +1,3 @@ +_base_ = 'autoslim_mbv2_1.5x_slimmable_subnet_8xb256_in1k.py' + +model = dict(deploy_index=0) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoslim/autoslim_mbv2_1.5x_subnet_8xb256_in1k_flops-320M.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoslim/autoslim_mbv2_1.5x_subnet_8xb256_in1k_flops-320M.py new file mode 100644 index 0000000000000000000000000000000000000000..e53aae1bce95b1010ce063fb75092c8235e84a68 --- /dev/null +++ 
b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoslim/autoslim_mbv2_1.5x_subnet_8xb256_in1k_flops-320M.py @@ -0,0 +1,3 @@ +_base_ = 'autoslim_mbv2_1.5x_slimmable_subnet_8xb256_in1k.py' + +model = dict(deploy_index=1) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoslim/autoslim_mbv2_1.5x_subnet_8xb256_in1k_flops-530M.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoslim/autoslim_mbv2_1.5x_subnet_8xb256_in1k_flops-530M.py new file mode 100644 index 0000000000000000000000000000000000000000..218a9b036becd7f7d30d37557b199b2c91cde664 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoslim/autoslim_mbv2_1.5x_subnet_8xb256_in1k_flops-530M.py @@ -0,0 +1,3 @@ +_base_ = 'autoslim_mbv2_1.5x_slimmable_subnet_8xb256_in1k.py' + +model = dict(deploy_index=2) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoslim/autoslim_mbv2_1.5x_supernet_8xb256_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoslim/autoslim_mbv2_1.5x_supernet_8xb256_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..91cd9d2de74c99cc1ee893486eb185795f9e46fb --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoslim/autoslim_mbv2_1.5x_supernet_8xb256_in1k.py @@ -0,0 +1,68 @@ +_base_ = [ + '../../../_base_/settings/imagenet_bs2048_autoslim_pil.py', + 'mmcls::_base_/models/mobilenet_v2_1x.py', + 'mmcls::_base_/default_runtime.py', +] + +supernet = _base_.model +supernet.backbone.widen_factor = 1.5 +supernet.head.in_channels = 1920 + +# !dataset config +# ========================================================================== +# data preprocessor +data_preprocessor = dict( + type='ImgDataPreprocessor', + # RGB format normalization parameters + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + # convert image from BGR to RGB + bgr_to_rgb=True, +) + +# !autoslim algorithm config +num_random_samples = 2 +model = dict( + _delete_=True, + _scope_='mmrazor', + 
type='AutoSlim', + num_random_samples=num_random_samples, + architecture=supernet, + data_preprocessor=data_preprocessor, + distiller=dict( + type='ConfigurableDistiller', + teacher_recorders=dict( + fc=dict(type='ModuleOutputs', source='head.fc')), + student_recorders=dict( + fc=dict(type='ModuleOutputs', source='head.fc')), + distill_losses=dict( + loss_kl=dict(type='KLDivergence', tau=1, loss_weight=1)), + loss_forward_mappings=dict( + loss_kl=dict( + preds_S=dict(recorder='fc', from_student=True), + preds_T=dict(recorder='fc', from_student=False)))), + mutator=dict( + type='OneShotChannelMutator', + channel_unit_cfg=dict( + type='OneShotMutableChannelUnit', + default_args=dict( + candidate_choices=list(i / 12 for i in range(2, 13)), + choice_mode='ratio', + divisor=8)), + parse_cfg=dict( + type='ChannelAnalyzer', + demo_input=(1, 3, 224, 224), + tracer_type='BackwardTracer'))) + +model_wrapper_cfg = dict( + type='mmrazor.AutoSlimDDP', + broadcast_buffers=False, + find_unused_parameters=False) + +# learning policy +max_epochs = 50 +param_scheduler = dict(end=max_epochs) + +# train, val, test setting +train_cfg = dict(max_epochs=max_epochs) +val_cfg = dict(type='mmrazor.SubnetValLoop') diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoslim/metafile.yml b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoslim/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..c46cd97b9e8c4589fb6b442dfe7409e211c654d0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/autoslim/metafile.yml @@ -0,0 +1,60 @@ +Collections: + - Name: AutoSlim + Metadata: + Training Data: + - ImageNet-1k + Paper: + URL: https://arxiv.org/abs/1903.11728 + Title: AutoSlim:Towards One-Shot Architecture Search for Channel Numbers + README: configs/nas/mmcls/autoslim/README.md + Code: + URL: https://github.com/open-mmlab/mmrazor/blob/v0.1.0/mmrazor/models/algorithms/autoslim.py + Version: v0.1.0 + Converted From: + Code: 
https://github.com/JiahuiYu/slimmable_networks +Models: + - Name: autoslim_mbv2_1.5x_subnet_8xb256_in1k_flops-530M + In Collection: AutoSlim + Metadata: + Flops(G): 0.53 + Params(M): 6.5 + Supernet: MobileNet v2(x1.5) + Channel: https://download.openmmlab.com/mmrazor/v1/autoslim/autoslim_mbv2_subnet_8xb256_in1k_flops-530M_acc-74.23_20220715-aa8754fe_subnet_cfg.yaml + Results: + - Task: Image Classification + Dataset: ImageNet-1k + Metrics: + Top 1 Accuracy: 74.23 + Top 5 Accuracy: 91.73 + Config: configs/nas/mmcls/autoslim/autoslim_mbv2_1.5x_subnet_8xb256_in1k_flops-530M.py + Weights: https://download.openmmlab.com/mmrazor/v1/autoslim/autoslim_mbv2_subnet_8xb256_in1k_flops-530M_acc-74.23_20220715-aa8754fe.pth + - Name: autoslim_mbv2_1.5x_subnet_8xb256_in1k_flops-320M + In Collection: AutoSlim + Metadata: + Flops(G): 0.32 + Params(M): 5.77 + Supernet: MobileNet v2(x1.5) + Channel: https://download.openmmlab.com/mmrazor/v1/autoslim/autoslim_mbv2_subnet_8xb256_in1k_flops-320M_acc-72.73_20220715-9aa8f8ae_subnet_cfg.yaml + Results: + - Task: Image Classification + Dataset: ImageNet-1k + Metrics: + Top 1 Accuracy: 72.73 + Top 5 Accuracy: 90.84 + Config: configs/nas/mmcls/autoslim/autoslim_mbv2_1.5x_subnet_8xb256_in1k_flops-320M.py + Weights: https://download.openmmlab.com/mmrazor/v1/autoslim/autoslim_mbv2_subnet_8xb256_in1k_flops-320M_acc-72.73_20220715-9aa8f8ae.pth + - Name: autoslim_mbv2_1.5x_subnet_8xb256_in1k_flops-220M + In Collection: AutoSlim + Metadata: + Flops(G): 0.22 + Params(M): 4.13 + Supernet: MobileNet v2(x1.5) + Channel: https://download.openmmlab.com/mmrazor/v1/autoslim/autoslim_mbv2_subnet_8xb256_in1k_flops-220M_acc-71.4_20220715-9c288f3b_subnet_cfg.yaml + Results: + - Task: Image Classification + Dataset: ImageNet-1k + Metrics: + Top 1 Accuracy: 71.4 + Top 5 Accuracy: 90.08 + Config: configs/nas/mmcls/autoslim/autoslim_mbv2_1.5x_subnet_8xb256_in1k_flops-220M.py + Weights: 
https://download.openmmlab.com/mmrazor/v1/autoslim/autoslim_mbv2_subnet_8xb256_in1k_flops-220M_acc-71.4_20220715-9c288f3b.pth diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/ATTENTIVE_SUBNET_A0.yaml b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/ATTENTIVE_SUBNET_A0.yaml new file mode 100644 index 0000000000000000000000000000000000000000..e926d4b035b056aa478d1f6423e41673233d2f33 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/ATTENTIVE_SUBNET_A0.yaml @@ -0,0 +1,64 @@ +backbone.first_channels: + chosen: 16 +backbone.last_channels: + chosen: 1792 +backbone.layers.1.kernel_size: + chosen: 3 +backbone.layers.1.expand_ratio: + chosen: 1 +backbone.layers.1.depth: + chosen: 1 +backbone.layers.1.out_channels: + chosen: 16 +backbone.layers.2.kernel_size: + chosen: 3 +backbone.layers.2.expand_ratio: + chosen: 4 +backbone.layers.2.depth: + chosen: 3 +backbone.layers.2.out_channels: + chosen: 24 +backbone.layers.3.kernel_size: + chosen: 3 +backbone.layers.3.expand_ratio: + chosen: 4 +backbone.layers.3.depth: + chosen: 3 +backbone.layers.3.out_channels: + chosen: 32 +backbone.layers.4.kernel_size: + chosen: 3 +backbone.layers.4.expand_ratio: + chosen: 4 +backbone.layers.4.depth: + chosen: 3 +backbone.layers.4.out_channels: + chosen: 64 +backbone.layers.5.kernel_size: + chosen: 3 +backbone.layers.5.expand_ratio: + chosen: 4 +backbone.layers.5.depth: + chosen: 3 +backbone.layers.5.out_channels: + chosen: 112 +backbone.layers.6.kernel_size: + chosen: 3 +backbone.layers.6.expand_ratio: + chosen: 6 +backbone.layers.6.depth: + chosen: 3 +backbone.layers.6.out_channels: + chosen: 192 +backbone.layers.7.kernel_size: + chosen: 3 +backbone.layers.7.expand_ratio: + chosen: 6 +backbone.layers.7.depth: + chosen: 1 +backbone.layers.7.out_channels: + chosen: 216 +input_shape: + chosen: + - 192 + - 192 diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/ATTENTIVE_SUBNET_A1.yaml 
b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/ATTENTIVE_SUBNET_A1.yaml new file mode 100644 index 0000000000000000000000000000000000000000..b4b78c7a28ac635f64f988ca4fd48d22aceacbe2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/ATTENTIVE_SUBNET_A1.yaml @@ -0,0 +1,64 @@ +backbone.first_channels: + chosen: 16 +backbone.last_channels: + chosen: 1984 +backbone.layers.1.kernel_size: + chosen: 3 +backbone.layers.1.expand_ratio: + chosen: 1 +backbone.layers.1.depth: + chosen: 1 +backbone.layers.1.out_channels: + chosen: 16 +backbone.layers.2.kernel_size: + chosen: 3 +backbone.layers.2.expand_ratio: + chosen: 4 +backbone.layers.2.depth: + chosen: 3 +backbone.layers.2.out_channels: + chosen: 24 +backbone.layers.3.kernel_size: + chosen: 3 +backbone.layers.3.expand_ratio: + chosen: 4 +backbone.layers.3.depth: + chosen: 3 +backbone.layers.3.out_channels: + chosen: 32 +backbone.layers.4.kernel_size: + chosen: 5 +backbone.layers.4.expand_ratio: + chosen: 4 +backbone.layers.4.depth: + chosen: 3 +backbone.layers.4.out_channels: + chosen: 64 +backbone.layers.5.kernel_size: + chosen: 3 +backbone.layers.5.expand_ratio: + chosen: 4 +backbone.layers.5.depth: + chosen: 3 +backbone.layers.5.out_channels: + chosen: 112 +backbone.layers.6.kernel_size: + chosen: 5 +backbone.layers.6.expand_ratio: + chosen: 6 +backbone.layers.6.depth: + chosen: 3 +backbone.layers.6.out_channels: + chosen: 192 +backbone.layers.7.kernel_size: + chosen: 3 +backbone.layers.7.expand_ratio: + chosen: 6 +backbone.layers.7.depth: + chosen: 1 +backbone.layers.7.out_channels: + chosen: 216 +input_shape: + chosen: + - 224 + - 224 diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/ATTENTIVE_SUBNET_A2.yaml b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/ATTENTIVE_SUBNET_A2.yaml new file mode 100644 index 0000000000000000000000000000000000000000..704b7905122a756c8abc7744f6b5fb2ef6fb6bc5 --- /dev/null +++ 
b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/ATTENTIVE_SUBNET_A2.yaml @@ -0,0 +1,64 @@ +backbone.first_channels: + chosen: 16 +backbone.last_channels: + chosen: 1984 +backbone.layers.1.kernel_size: + chosen: 3 +backbone.layers.1.expand_ratio: + chosen: 1 +backbone.layers.1.depth: + chosen: 1 +backbone.layers.1.out_channels: + chosen: 16 +backbone.layers.2.kernel_size: + chosen: 3 +backbone.layers.2.expand_ratio: + chosen: 4 +backbone.layers.2.depth: + chosen: 3 +backbone.layers.2.out_channels: + chosen: 24 +backbone.layers.3.kernel_size: + chosen: 3 +backbone.layers.3.expand_ratio: + chosen: 5 +backbone.layers.3.depth: + chosen: 3 +backbone.layers.3.out_channels: + chosen: 32 +backbone.layers.4.kernel_size: + chosen: 3 +backbone.layers.4.expand_ratio: + chosen: 4 +backbone.layers.4.depth: + chosen: 3 +backbone.layers.4.out_channels: + chosen: 64 +backbone.layers.5.kernel_size: + chosen: 3 +backbone.layers.5.expand_ratio: + chosen: 4 +backbone.layers.5.depth: + chosen: 3 +backbone.layers.5.out_channels: + chosen: 112 +backbone.layers.6.kernel_size: + chosen: 5 +backbone.layers.6.expand_ratio: + chosen: 6 +backbone.layers.6.depth: + chosen: 4 +backbone.layers.6.out_channels: + chosen: 200 +backbone.layers.7.kernel_size: + chosen: 3 +backbone.layers.7.expand_ratio: + chosen: 6 +backbone.layers.7.depth: + chosen: 1 +backbone.layers.7.out_channels: + chosen: 224 +input_shape: + chosen: + - 224 + - 224 diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/ATTENTIVE_SUBNET_A3.yaml b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/ATTENTIVE_SUBNET_A3.yaml new file mode 100644 index 0000000000000000000000000000000000000000..45f4d8411774da9e26b6dae335fe40393ee5b5f2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/ATTENTIVE_SUBNET_A3.yaml @@ -0,0 +1,64 @@ +backbone.first_channels: + chosen: 16 +backbone.last_channels: + chosen: 1984 +backbone.layers.1.kernel_size: + chosen: 3 
+backbone.layers.1.expand_ratio: + chosen: 1 +backbone.layers.1.depth: + chosen: 2 +backbone.layers.1.out_channels: + chosen: 16 +backbone.layers.2.kernel_size: + chosen: 3 +backbone.layers.2.expand_ratio: + chosen: 4 +backbone.layers.2.depth: + chosen: 3 +backbone.layers.2.out_channels: + chosen: 24 +backbone.layers.3.kernel_size: + chosen: 3 +backbone.layers.3.expand_ratio: + chosen: 4 +backbone.layers.3.depth: + chosen: 3 +backbone.layers.3.out_channels: + chosen: 32 +backbone.layers.4.kernel_size: + chosen: 3 +backbone.layers.4.expand_ratio: + chosen: 4 +backbone.layers.4.depth: + chosen: 4 +backbone.layers.4.out_channels: + chosen: 64 +backbone.layers.5.kernel_size: + chosen: 5 +backbone.layers.5.expand_ratio: + chosen: 4 +backbone.layers.5.depth: + chosen: 3 +backbone.layers.5.out_channels: + chosen: 112 +backbone.layers.6.kernel_size: + chosen: 3 +backbone.layers.6.expand_ratio: + chosen: 6 +backbone.layers.6.depth: + chosen: 5 +backbone.layers.6.out_channels: + chosen: 208 +backbone.layers.7.kernel_size: + chosen: 3 +backbone.layers.7.expand_ratio: + chosen: 6 +backbone.layers.7.depth: + chosen: 1 +backbone.layers.7.out_channels: + chosen: 224 +input_shape: + chosen: + - 224 + - 224 diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/ATTENTIVE_SUBNET_A4.yaml b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/ATTENTIVE_SUBNET_A4.yaml new file mode 100644 index 0000000000000000000000000000000000000000..e15485bc304453fed2c0fee6c7e18af533aea344 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/ATTENTIVE_SUBNET_A4.yaml @@ -0,0 +1,64 @@ +backbone.first_channels: + chosen: 16 +backbone.last_channels: + chosen: 1984 +backbone.layers.1.kernel_size: + chosen: 3 +backbone.layers.1.expand_ratio: + chosen: 1 +backbone.layers.1.depth: + chosen: 1 +backbone.layers.1.out_channels: + chosen: 16 +backbone.layers.2.kernel_size: + chosen: 3 +backbone.layers.2.expand_ratio: + chosen: 4 +backbone.layers.2.depth: + chosen: 3 
+backbone.layers.2.out_channels: + chosen: 24 +backbone.layers.3.kernel_size: + chosen: 3 +backbone.layers.3.expand_ratio: + chosen: 4 +backbone.layers.3.depth: + chosen: 3 +backbone.layers.3.out_channels: + chosen: 32 +backbone.layers.4.kernel_size: + chosen: 5 +backbone.layers.4.expand_ratio: + chosen: 5 +backbone.layers.4.depth: + chosen: 4 +backbone.layers.4.out_channels: + chosen: 64 +backbone.layers.5.kernel_size: + chosen: 3 +backbone.layers.5.expand_ratio: + chosen: 4 +backbone.layers.5.depth: + chosen: 3 +backbone.layers.5.out_channels: + chosen: 112 +backbone.layers.6.kernel_size: + chosen: 5 +backbone.layers.6.expand_ratio: + chosen: 6 +backbone.layers.6.depth: + chosen: 5 +backbone.layers.6.out_channels: + chosen: 192 +backbone.layers.7.kernel_size: + chosen: 3 +backbone.layers.7.expand_ratio: + chosen: 6 +backbone.layers.7.depth: + chosen: 1 +backbone.layers.7.out_channels: + chosen: 216 +input_shape: + chosen: + - 256 + - 256 diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/ATTENTIVE_SUBNET_A5.yaml b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/ATTENTIVE_SUBNET_A5.yaml new file mode 100644 index 0000000000000000000000000000000000000000..c0b789079a8f86ce10e1a919df1b5e9a57761b79 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/ATTENTIVE_SUBNET_A5.yaml @@ -0,0 +1,64 @@ +backbone.first_channels: + chosen: 16 +backbone.last_channels: + chosen: 1792 +backbone.layers.1.kernel_size: + chosen: 3 +backbone.layers.1.expand_ratio: + chosen: 1 +backbone.layers.1.depth: + chosen: 1 +backbone.layers.1.out_channels: + chosen: 16 +backbone.layers.2.kernel_size: + chosen: 3 +backbone.layers.2.expand_ratio: + chosen: 4 +backbone.layers.2.depth: + chosen: 3 +backbone.layers.2.out_channels: + chosen: 24 +backbone.layers.3.kernel_size: + chosen: 3 +backbone.layers.3.expand_ratio: + chosen: 5 +backbone.layers.3.depth: + chosen: 3 +backbone.layers.3.out_channels: + chosen: 32 +backbone.layers.4.kernel_size: + 
chosen: 5 +backbone.layers.4.expand_ratio: + chosen: 4 +backbone.layers.4.depth: + chosen: 3 +backbone.layers.4.out_channels: + chosen: 64 +backbone.layers.5.kernel_size: + chosen: 3 +backbone.layers.5.expand_ratio: + chosen: 4 +backbone.layers.5.depth: + chosen: 4 +backbone.layers.5.out_channels: + chosen: 112 +backbone.layers.6.kernel_size: + chosen: 3 +backbone.layers.6.expand_ratio: + chosen: 6 +backbone.layers.6.depth: + chosen: 6 +backbone.layers.6.out_channels: + chosen: 192 +backbone.layers.7.kernel_size: + chosen: 3 +backbone.layers.7.expand_ratio: + chosen: 6 +backbone.layers.7.depth: + chosen: 1 +backbone.layers.7.out_channels: + chosen: 224 +input_shape: + chosen: + - 256 + - 256 diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/ATTENTIVE_SUBNET_A6.yaml b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/ATTENTIVE_SUBNET_A6.yaml new file mode 100644 index 0000000000000000000000000000000000000000..91f4d6a35e906a0db0d03eb67b1721ad944905a1 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/ATTENTIVE_SUBNET_A6.yaml @@ -0,0 +1,64 @@ +backbone.first_channels: + chosen: 16 +backbone.last_channels: + chosen: 1984 +backbone.layers.1.kernel_size: + chosen: 3 +backbone.layers.1.depth: + chosen: 1 +backbone.layers.1.expand_ratio: + chosen: 1 +backbone.layers.1.out_channels: + chosen: 24 +backbone.layers.2.kernel_size: + chosen: 3 +backbone.layers.2.depth: + chosen: 3 +backbone.layers.2.expand_ratio: + chosen: 4 +backbone.layers.2.out_channels: + chosen: 32 +backbone.layers.3.kernel_size: + chosen: 3 +backbone.layers.3.depth: + chosen: 3 +backbone.layers.3.expand_ratio: + chosen: 6 +backbone.layers.3.out_channels: + chosen: 40 +backbone.layers.4.kernel_size: + chosen: 3 +backbone.layers.4.depth: + chosen: 4 +backbone.layers.4.expand_ratio: + chosen: 5 +backbone.layers.4.out_channels: + chosen: 72 +backbone.layers.5.kernel_size: + chosen: 3 +backbone.layers.5.depth: + chosen: 4 +backbone.layers.5.expand_ratio: + 
chosen: 4 +backbone.layers.5.out_channels: + chosen: 128 +backbone.layers.6.kernel_size: + chosen: 5 +backbone.layers.6.depth: + chosen: 6 +backbone.layers.6.expand_ratio: + chosen: 6 +backbone.layers.6.out_channels: + chosen: 216 +backbone.layers.7.kernel_size: + chosen: 3 +backbone.layers.7.depth: + chosen: 1 +backbone.layers.7.expand_ratio: + chosen: 6 +backbone.layers.7.out_channels: + chosen: 224 +input_shape: + chosen: + - 288 + - 288 diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/README.md new file mode 100644 index 0000000000000000000000000000000000000000..d6b117f11c58a2ae85f8991bc56b82f9c911f4ac --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/README.md @@ -0,0 +1,69 @@ +# BigNAS + +> [BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models](https://arxiv.org/abs/2003.11142) + + + +## Abstract + +Neural architecture search (NAS) has shown promising results discovering models that are both accurate and fast. For NAS, training a one-shot model has become a popular strategy to rank the relative quality of different architectures (child models) using a single set of shared weights. However, while one-shot model weights can effectively rank different network architectures, the absolute accuracies from these shared weights are typically far below those obtained from stand-alone training. To compensate, existing methods assume that the weights must be retrained, finetuned, or otherwise post-processed after the search is completed. These steps significantly increase the compute requirements and complexity of the architecture search and model deployment. In this work, we propose BigNAS, an approach that challenges the conventional wisdom that post-processing of the weights is necessary to get good prediction accuracies. 
Without extra retraining or post-processing steps, we are able to train a single set of shared weights on ImageNet and use these weights to obtain child models whose sizes range from 200 to 1000 MFLOPs. Our discovered model family, BigNASModels, achieve top1 accuracies ranging from 76.5% to 80.9%, surpassing state-of-the-art models in this range including EfficientNets and Once-for-All networks without extra retraining or post-processing. We present ablative study and analysis to further understand the proposed BigNASModels. + +## Get Started + +### Step 1: Supernet pre-training on ImageNet + +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh \ + configs/nas/mmcls/bignas/attentive_mobilenet_supernet_32xb64_in1k.py 4 \ + --work-dir $WORK_DIR +``` + +### Step 2: Search for subnet on the trained supernet + +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh \ + configs/nas/mmcls/bignas/attentive_mobilenet_search_8xb128_in1k.py 4 \ + --work-dir $WORK_DIR --cfg-options load_from=$STEP1_CKPT +``` + +### Step 3: Subnet inference on ImageNet + +```bash +CUDA_VISIBLE_DEVICES=0 PORT=29500 ./tools/dist_test.sh \ + configs/nas/mmcls/bignas/attentive_mobilenet_subnet_8xb256_in1k.py \ + none 1 --work-dir $WORK_DIR \ + --cfg-options model.init_cfg.checkpoint=$STEP2_CKPT model.init_weight_from_supernet=False +``` + +## Results and models + +| Dataset | Supernet | Subnet | Params(M) | Flops(G) | Top-1 | Config | Download | Remarks | +| :------: | :------------------: | :-------------------------------------------------------------------------------------------------------------------------------: | :--------------------: | :------------------: | :---------------------: | :-------------------------------------------------------------------------------------------------------------------------------: | 
:--------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------: | +| ImageNet | AttentiveMobileNetV3 | [search space](https://github.com/open-mmlab/mmrazor/blob/dev-1.x/configs/_base_/nas_backbones/attentive_mobilenetv3_supernet.py) | 8.854(min) / 23.3(max) | 212(min) / 1944(max) | 77.19(min) / 81.42(max) | [config](https://github.com/open-mmlab/mmrazor/blob/dev-1.x/configs/nas/mmcls/bignas/attentive_mobilenet_supernet_32xb64_in1k.py) | [model\*](https://download.openmmlab.com/mmrazor/v1/bignas/attentive_mobilenet_supernet_32xb64_in1k_flops-2G_acc-81.72_20221229_200440-954772a3.pth) | [log](https://download.openmmlab.com/mmrazor/v1/bignas/attentive_mobilenet_supernet_32xb64_in1k_20221227_175800-bcf94eaa.json) (`sandwich rule`) | +| ImageNet | AttentiveMobileNetV3 | [AttentiveNAS-A0\*](https://download.openmmlab.com/mmrazor/v1/bignas/ATTENTIVE_SUBNET_A0.yaml) | 8.854 | 212 | 77.19 | [config](https://github.com/open-mmlab/mmrazor/blob/dev-1.x/configs/nas/mmcls/bignas/attentive_mobilenet_subnet_8xb256_in1k.py) | [model](https://download.openmmlab.com/mmrazor/v1/bignas/attentive_mobilenet_subnet_8xb256_in1k_flops-0.21G_acc-77.19_20221229_200440-282a1f70.pth) | Converted from the repo | +| ImageNet | AttentiveMobileNetV3 | [AttentiveNAS-A6\*](https://download.openmmlab.com/mmrazor/v1/bignas/ATTENTIVE_SUBNET_A6.yaml) | 15.594 | 927 | 80.81 | [config](https://github.com/open-mmlab/mmrazor/blob/dev-1.x/configs/nas/mmcls/bignas/attentive_mobilenet_subnet_8xb256_in1k.py) | [model](https://download.openmmlab.com/mmrazor/v1/bignas/attentive_mobilenet_subnet_8xb256_in1k_flops-0.93G_acc-80.81_20221229_200440-73d92cc6.pth) | Converted from the repo | + +*Models with * are converted from the [official 
repo](https://github.com/facebookresearch/AttentiveNAS). The config files of these models +are only for inference. We support training the supernet by `sandwich rule`, which is different from `rejection sampling` in [official repo](https://github.com/facebookresearch/AttentiveNAS), and welcome you to contribute your reproduction results.* + +**Note**: In the official `AttentiveNAS` code, the `AutoAugmentation` in the Calib-BN subnet is recommended to use a large batch size (e.g. `256`) for evaluation, which leads to higher performance. Compared with the original configuration file, this configuration has been modified as follows: + +- modified the settings related to `batchsize` in `train_pipeline` and `test_pipeline`, e.g. setting `train_dataloader.batch_size=256`, `val_dataloader.batch_size=256`, `test_cfg.calibrate_sample_num=16384` and `collate_fn=dict(type='default_collate')` in train_dataloader. +- setting `dict(type='mmrazor.AutoAugment', policies='original')` instead of `dict(type='mmrazor.AutoAugmentV2', policies=policies)` in train_pipeline. + +1. We use the search space from AttentiveNAS, which is different from that in the BigNAS paper. +2. The Top-1 Acc is unstable and may fluctuate by about 0.1. Convert [the official weight](https://download.openmmlab.com/mmrazor/v1/bignas/attentive_mobilenet_supernet_32xb64_in1k_flops-2G_acc-81.72_20221229_200440-954772a3.pth) according to the [converter script](../../../../tools/model_converters/convert_attentivenas_nas_ckpt.py). A Calib-BN model will be released later. +3. We have observed that the searchable model has been officially released. We will also provide the completed version of supernet training configuration in the future.
+ +## Citation + +```latex +@inproceedings{yu2020bignas, + title={Bignas: Scaling up neural architecture search with big single-stage models}, + author={Yu, Jiahui and Jin, Pengchong and Liu, Hanxiao and Bender, Gabriel and Kindermans, Pieter-Jan and Tan, Mingxing and Huang, Thomas and Song, Xiaodan and Pang, Ruoming and Le, Quoc}, + booktitle={European Conference on Computer Vision}, + pages={702--717}, + year={2020}, + organization={Springer} +} +``` diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/attentive_mobilenet_search_8xb128_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/attentive_mobilenet_search_8xb128_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..8ee2f9578159ed2ff09b0ea53810c3b6ecb8ecec --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/attentive_mobilenet_search_8xb128_in1k.py @@ -0,0 +1,16 @@ +_base_ = ['./attentive_mobilenet_supernet_32xb64_in1k.py'] + +train_cfg = dict( + _delete_=True, + type='mmrazor.EvolutionSearchLoop', + dataloader=_base_.val_dataloader, + evaluator=_base_.val_evaluator, + max_epochs=20, + num_candidates=50, + top_k=10, + num_mutation=25, + num_crossover=25, + mutate_prob=0.1, + calibrate_sample_num=4096, + constraints_range=dict(flops=(0., 700.)), + score_key='accuracy/top1') diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/attentive_mobilenet_subnet_8xb256_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/attentive_mobilenet_subnet_8xb256_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..6ce2cfec680d351c80c0b58af2dff030d49e48d9 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/attentive_mobilenet_subnet_8xb256_in1k.py @@ -0,0 +1,24 @@ +_base_ = 'attentive_mobilenet_supernet_32xb64_in1k.py' + +model = dict( + _scope_='mmrazor', + type='sub_model', + cfg=_base_.supernet, + # NOTE: You can replace the yaml with the mutable_cfg searched 
by yourself + fix_subnet='configs/nas/mmcls/bignas/ATTENTIVE_SUBNET_A0.yaml', + # You can load the checkpoint of supernet instead of the specific + # subnet by modifying the `checkpoint`(path) in the following `init_cfg` + # with `init_weight_from_supernet = True`. + init_weight_from_supernet=True, + init_cfg=dict( + type='Pretrained', + checkpoint= # noqa: E251 + 'https://download.openmmlab.com/mmrazor/v1/bignas/attentive_mobilenet_supernet_32xb64_in1k_flops-2G_acc-81.72_20221229_200440-954772a3.pth', # noqa: E501 + prefix='architecture.')) + +model_wrapper_cfg = None +find_unused_parameters = True + +test_cfg = dict(evaluate_fixed_subnet=True) + +default_hooks = dict(checkpoint=None) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/attentive_mobilenet_supernet_32xb64_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/attentive_mobilenet_supernet_32xb64_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..2683b7e5b9cc7f51feaf4d3ceab3a8c60c3612ff --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/bignas/attentive_mobilenet_supernet_32xb64_in1k.py @@ -0,0 +1,59 @@ +_base_ = [ + 'mmcls::_base_/default_runtime.py', + 'mmrazor::_base_/settings/imagenet_bs2048_bignas.py', + 'mmrazor::_base_/nas_backbones/attentive_mobilenetv3_supernet.py', +] + +supernet = dict( + _scope_='mmrazor', + type='SearchableImageClassifier', + data_preprocessor=_base_.data_preprocessor, + backbone=_base_.nas_backbone, + neck=dict(type='SqueezeMeanPoolingWithDropout', drop_ratio=0.2), + head=dict( + type='DynamicLinearClsHead', + num_classes=1000, + in_channels=1984, + loss=dict( + type='mmcls.LabelSmoothLoss', + num_classes=1000, + label_smooth_val=0.1, + mode='original', + loss_weight=1.0), + topk=(1, 5)), + input_resizer_cfg=_base_.input_resizer_cfg, + connect_head=dict(connect_with_backbone='backbone.last_mutable_channels'), +) + +model = dict( + _scope_='mmrazor', + type='BigNAS', + drop_path_rate=0.2, + 
num_random_samples=2, + backbone_dropout_stages=[6, 7], + architecture=supernet, + distiller=dict( + type='ConfigurableDistiller', + teacher_recorders=dict( + fc=dict(type='ModuleOutputs', source='head.fc')), + student_recorders=dict( + fc=dict(type='ModuleOutputs', source='head.fc')), + distill_losses=dict( + loss_kl=dict(type='KLDivergence', tau=1, loss_weight=1)), + loss_forward_mappings=dict( + loss_kl=dict( + preds_S=dict(recorder='fc', from_student=True), + preds_T=dict(recorder='fc', from_student=False)))), + mutator=dict(type='mmrazor.NasMutator')) + +model_wrapper_cfg = dict( + type='mmrazor.BigNASDDP', + broadcast_buffers=False, + find_unused_parameters=True) + +optim_wrapper_cfg = dict( + type='OptimWrapper', clip_grad=dict(type='value', clip_value=0.2)) + +default_hooks = dict( + checkpoint=dict( + type='CheckpointHook', interval=1, max_keep_ckpts=1, save_best='auto')) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/darts/DARTS_SUBNET_CIFAR_MMRAZOR_97.32.yaml b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/darts/DARTS_SUBNET_CIFAR_MMRAZOR_97.32.yaml new file mode 100644 index 0000000000000000000000000000000000000000..9a3610daf7aff6f20f0fa747610e2a92070218ea --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/darts/DARTS_SUBNET_CIFAR_MMRAZOR_97.32.yaml @@ -0,0 +1,80 @@ +normal_n2: + chosen: + - normal_n2_p0 + - normal_n2_p1 +normal_n2_p0: + chosen: + - sep_conv_3x3 +normal_n2_p1: + chosen: + - sep_conv_3x3 +normal_n3: + chosen: + - normal_n3_p0 + - normal_n3_p1 +normal_n3_p0: + chosen: + - skip_connect +normal_n3_p1: + chosen: + - sep_conv_5x5 +normal_n4: + chosen: + - normal_n4_p0 + - normal_n4_p1 +normal_n4_p0: + chosen: + - sep_conv_3x3 +normal_n4_p1: + chosen: + - skip_connect +normal_n5: + chosen: + - normal_n5_p0 + - normal_n5_p1 +normal_n5_p0: + chosen: + - skip_connect +normal_n5_p1: + chosen: + - skip_connect +reduce_n2: + chosen: + - reduce_n2_p0 + - reduce_n2_p1 +reduce_n2_p0: + chosen: + - max_pool_3x3 
+reduce_n2_p1: + chosen: + - sep_conv_3x3 +reduce_n3: + chosen: + - reduce_n3_p0 + - reduce_n3_p2 +reduce_n3_p0: + chosen: + - max_pool_3x3 +reduce_n3_p2: + chosen: + - dil_conv_5x5 +reduce_n4: + chosen: + - reduce_n4_p0 + - reduce_n4_p2 +reduce_n4_p0: + chosen: + - max_pool_3x3 +reduce_n4_p2: + chosen: + - skip_connect +reduce_n5: + chosen: + - reduce_n5_p0 + - reduce_n5_p2 +reduce_n5_p0: + chosen: + - max_pool_3x3 +reduce_n5_p2: + chosen: + - skip_connect diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/darts/DARTS_SUBNET_CIFAR_PAPER_ALIAS.yaml b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/darts/DARTS_SUBNET_CIFAR_PAPER_ALIAS.yaml new file mode 100644 index 0000000000000000000000000000000000000000..347874e6449c69abc6fb10e4f7020c405a1875ea --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/darts/DARTS_SUBNET_CIFAR_PAPER_ALIAS.yaml @@ -0,0 +1,116 @@ +normal_n2: + chosen: + - normal_n2_p1 + - normal_n2_p0 +normal_n2_p0: + chosen: + - sep_conv_3x3 +normal_n2_p1: + chosen: + - sep_conv_3x3 +normal_n3: + chosen: + - normal_n3_p0 + - normal_n3_p1 +normal_n3_p0: + chosen: + - sep_conv_3x3 +normal_n3_p1: + chosen: + - sep_conv_3x3 +normal_n3_p2: + chosen: + - sep_conv_3x3 +normal_n4: + chosen: + - normal_n4_p0 + - normal_n4_p1 +normal_n4_p0: + chosen: + - skip_connect +normal_n4_p1: + chosen: + - sep_conv_3x3 +normal_n4_p2: + chosen: + - skip_connect +normal_n4_p3: + chosen: + - sep_conv_3x3 +normal_n5: + chosen: + - normal_n5_p2 + - normal_n5_p0 +normal_n5_p0: + chosen: + - skip_connect +normal_n5_p1: + chosen: + - skip_connect +normal_n5_p2: + chosen: + - dil_conv_3x3 +normal_n5_p3: + chosen: + - skip_connect +normal_n5_p4: + chosen: + - skip_connect +reduce_n2: + chosen: + - reduce_n2_p0 + - reduce_n2_p1 +reduce_n2_p0: + chosen: + - max_pool_3x3 +reduce_n2_p1: + chosen: + - max_pool_3x3 +reduce_n3: + chosen: + - reduce_n3_p1 + - reduce_n3_p2 +reduce_n3_p0: + chosen: + - max_pool_3x3 +reduce_n3_p1: + chosen: + - max_pool_3x3 
+reduce_n3_p2: + chosen: + - skip_connect +reduce_n4: + chosen: + - reduce_n4_p2 + - reduce_n4_p0 +reduce_n4_p0: + chosen: + - max_pool_3x3 +reduce_n4_p1: + chosen: + - max_pool_3x3 +reduce_n4_p2: + chosen: + - skip_connect +reduce_n4_p3: + chosen: + - skip_connect +reduce_n5: + chosen: + - reduce_n5_p1 + - reduce_n5_p2 +reduce_n5_p0: + chosen: + - max_pool_3x3 +reduce_n5_p1: + chosen: + - max_pool_3x3 +reduce_n5_p2: + chosen: + - skip_connect +reduce_n5_p3: + chosen: + - skip_connect +reduce_n5_p4: + chosen: + - skip_connect diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/darts/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/darts/README.md new file mode 100644 index 0000000000000000000000000000000000000000..5d3da3d8e2f1a945f315c4e23b9e53149b6335ed --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/darts/README.md @@ -0,0 +1,65 @@ +# DARTS + +> [DARTS: Differentiable Architecture Search](https://arxiv.org/abs/1806.09055) + + + +## Abstract + +This paper addresses the scalability challenge of architecture search by formulating the task in a differentiable manner. Unlike conventional approaches of applying evolution or reinforcement learning over a discrete and non-differentiable search space, our method is based on the continuous relaxation of the architecture representation, allowing efficient search of the architecture using gradient descent. Extensive experiments on CIFAR-10, ImageNet, Penn Treebank and WikiText-2 show that our algorithm excels in discovering high-performance convolutional architectures for image classification and recurrent architectures for language modeling, while being orders of magnitude faster than state-of-the-art non-differentiable techniques. Our implementation has been made publicly available to facilitate further research on efficient architecture search algorithms. 
+ +![pipeline](https://user-images.githubusercontent.com/88702197/187425171-2dfe7fbf-7c2c-4c22-9219-2234aa83e47d.png) + +## Get Started + +### Step 1: Supernet training on Cifar-10 + +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh \ + configs/nas/mmcls/darts/darts_supernet_unroll_1xb96_cifar10.py 4 \ + --work-dir $WORK_DIR +``` + +## Step 2: Subnet retraining on Cifar-10 + +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh \ + configs/nas/mmcls/darts/darts_subnet_1xb96_cifar10_2.0.py 4 \ + --work-dir $WORK_DIR \ + --cfg-options model.init_cfg.checkpoint=$STEP2_CKPT +``` + +## Step 3: Subnet inference on Cifar-10 + +```bash +CUDA_VISIBLE_DEVICES=0 PORT=29500 ./tools/dist_test.sh \ + configs/nas/mmcls/darts/darts_subnet_1xb96_cifar10_2.0.py \ + none 1 --work-dir $WORK_DIR \ + --cfg-options model.init_cfg.checkpoint=$STEP2_CKPT +``` + +## Results and models + +### Supernet + +| Dataset | Unroll | Config | Download | +| :-----: | :----: | :------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| Cifar10 | True | [config](./darts_supernet_unroll_1xb64_cifar10.py) | [model](https://openmmlab-share.oss-cn-hangzhou.aliyuncs.com/mmrazor/v0.1/nas/darts/darts_supernet_unroll_1xb64_cifar10/darts_supernet_unroll_1xb64_cifar10_20211222-a923a040.pth?versionId=CAEQHxiBgID6mLuL7xciIDhjYzA2NGViNzY5ZDQxODk5MTY3ZjBiMGUyMGNlYzlk) \| 
[log](https://openmmlab-share.oss-cn-hangzhou.aliyuncs.com/mmrazor/v0.1/nas/darts/darts_supernet_unroll_1xb64_cifar10/darts_supernet_unroll_1xb64_cifar10_20211220_133123.log.json?versionId=CAEQHxiBgIDmmLuL7xciIGQwN2RlZWUwNmZkYjQwMzU4MGRiMTA3NGY4NTU5N2Nm) | + +### Subnet + +| Dataset | Params(M) | Flops(G) | Top-1 Acc | Top-5 Acc | Subnet | Config | Download | Remarks | +| :-----: | :-------: | :------: | :-------: | :-------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------: | +| Cifar10 | 3.42 | 0.48 | 97.32 | 99.94 | [mutable](https://download.openmmlab.com/mmrazor/v1/darts/darts_subnetnet_1xb96_cifar10_acc-97.32_20211222-e5727921_mutable_cfg.yaml) | [config](./darts_subnet_1xb96_cifar10_2.0_mmrazor.py) | [model](https://openmmlab-share.oss-cn-hangzhou.aliyuncs.com/mmrazor/v1/darts/darts_subnetnet_1xb96_cifar10_acc-97.32_20211222-e5727921_latest.pth) \| [log](https://download.openmmlab.com/mmrazor/v0.1/nas/darts/darts_subnetnet_1xb96_cifar10/darts_subnetnet_1xb96_cifar10_20211222-e5727921.log.json) | MMRazor searched | +| Cifar10 | 3.83 | 0.55 | 97.27 | 99.98 | [mutable](https://download.openmmlab.com/mmrazor/v0.1/nas/darts/darts_subnetnet_1xb96_cifar10/darts_subnetnet_1xb96_cifar10_acc-97.27_20211222-17e42600_mutable_cfg.yaml) | [config](./darts_subnet_1xb96_cifar10_2.0.py) | [model](https://openmmlab-share.oss-cn-hangzhou.aliyuncs.com/mmrazor/v1/darts/darts_subnetnet_1xb96_cifar10_acc-97.27_20211222-17e42600_latest.pth) \| 
[log](https://download.openmmlab.com/mmrazor/v0.1/nas/darts/darts_subnetnet_1xb96_cifar10/darts_subnetnet_1xb96_cifar10_20211222-17e42600.log.json) | official | + +## Citation + +```latex +@inproceedings{liu2018darts, + title={DARTS: Differentiable Architecture Search}, + author={Liu, Hanxiao and Simonyan, Karen and Yang, Yiming}, + booktitle={International Conference on Learning Representations}, + year={2018} +} +``` diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/darts/darts_subnet_1xb96_cifar10_2.0.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/darts/darts_subnet_1xb96_cifar10_2.0.py new file mode 100644 index 0000000000000000000000000000000000000000..c05a3b43504266799dcd6e53651e5e8016993aab --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/darts/darts_subnet_1xb96_cifar10_2.0.py @@ -0,0 +1,44 @@ +_base_ = [ + 'mmrazor::_base_/settings/cifar10_darts_subnet.py', + 'mmrazor::_base_/nas_backbones/darts_supernet.py', + 'mmcls::_base_/default_runtime.py', +] + +subnet_backbone = _base_.nas_backbone +subnet_backbone.base_channels = 36 +subnet_backbone.num_layers = 20 +subnet_backbone.auxliary = True +subnet_backbone.aux_channels = 128 +subnet_backbone.aux_out_channels = 768 +subnet_backbone.out_indices = (19, ) +subnet_backbone.norm_cfg = norm_cfg = dict(type='BN', affine=True) + +# model +supernet = dict( + type='ImageClassifier', + data_preprocessor=_base_.preprocess_cfg, + backbone=subnet_backbone, + neck=dict(type='GlobalAveragePooling'), + head=dict( + type='mmrazor.DartsSubnetClsHead', + num_classes=10, + in_channels=576, + aux_in_channels=768, + loss=dict(type='CrossEntropyLoss', loss_weight=1.0), + aux_loss=dict(type='CrossEntropyLoss', loss_weight=0.4), + topk=(1, 5), + cal_acc=True)) + +model = dict( + type='mmrazor.sub_model', + cfg=supernet, + # NOTE: You can replace the yaml with the mutable_cfg searched by yourself + fix_subnet='configs/nas/mmcls/darts/DARTS_SUBNET_CIFAR_PAPER_ALIAS.yaml', + init_cfg=dict( + 
type='Pretrained', + checkpoint= # noqa: E251 + 'https://download.openmmlab.com/mmrazor/v1/darts/darts_subnetnet_1xb96_cifar10_acc-97.27_20211222-17e42600_latest.pth', # noqa: E501 + prefix='architecture.')) + +model_wrapper_cfg = None +find_unused_parameters = True diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/darts/darts_subnet_1xb96_cifar10_2.0_mmrazor.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/darts/darts_subnet_1xb96_cifar10_2.0_mmrazor.py new file mode 100644 index 0000000000000000000000000000000000000000..2085c21305ad1d3b6560cdbe1058bfffbc5f1596 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/darts/darts_subnet_1xb96_cifar10_2.0_mmrazor.py @@ -0,0 +1,39 @@ +_base_ = [ + 'mmrazor::_base_/settings/cifar10_darts_subnet.py', + 'mmrazor::_base_/nas_backbones/darts_supernet.py', + 'mmcls::_base_/default_runtime.py', +] + +subnet_backbone = _base_.nas_backbone +subnet_backbone.base_channels = 36 +subnet_backbone.num_layers = 20 +subnet_backbone.auxliary = True +subnet_backbone.aux_channels = 128 +subnet_backbone.aux_out_channels = 768 +subnet_backbone.out_indices = (19, ) +subnet_backbone.norm_cfg = norm_cfg = dict(type='BN', affine=True) + +# model +supernet = dict( + type='ImageClassifier', + data_preprocessor=_base_.preprocess_cfg, + backbone=subnet_backbone, + neck=dict(type='GlobalAveragePooling'), + head=dict( + type='mmrazor.DartsSubnetClsHead', + num_classes=10, + in_channels=576, + aux_in_channels=768, + loss=dict(type='CrossEntropyLoss', loss_weight=1.0), + aux_loss=dict(type='CrossEntropyLoss', loss_weight=0.4), + topk=(1, 5), + cal_acc=True)) + +model = dict( + type='mmrazor.sub_model', + cfg=supernet, + # NOTE: You can replace the yaml with the mutable_cfg searched by yourself + fix_subnet='configs/nas/mmcls/darts/DARTS_SUBNET_CIFAR_MMRAZOR_97.32.yaml') + +_base_.model_wrapper_cfg = None +find_unused_parameters = True diff --git 
a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/darts/darts_supernet_unroll_1xb96_cifar10.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/darts/darts_supernet_unroll_1xb96_cifar10.py new file mode 100644 index 0000000000000000000000000000000000000000..bcbd2dfe0acd0dcda1b70921aae1074a769d282f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/darts/darts_supernet_unroll_1xb96_cifar10.py @@ -0,0 +1,33 @@ +_base_ = [ + 'mmrazor::_base_/settings/cifar10_darts_supernet.py', + 'mmrazor::_base_/nas_backbones/darts_supernet.py', + 'mmcls::_base_/default_runtime.py', +] + +custom_hooks = [ + dict(type='mmrazor.DumpSubnetHook', interval=10, by_epoch=True) +] + +# model +model = dict( + type='mmrazor.Darts', + architecture=dict( + type='ImageClassifier', + backbone=_base_.nas_backbone, + neck=dict(type='GlobalAveragePooling'), + head=dict( + type='LinearClsHead', + num_classes=10, + in_channels=256, + loss=dict(type='CrossEntropyLoss', loss_weight=1.0), + topk=(1, 5), + cal_acc=True)), + mutator=dict(type='mmrazor.NasMutator'), + unroll=True) + +model_wrapper_cfg = dict( + type='mmrazor.DartsDDP', + broadcast_buffers=False, + find_unused_parameters=False) + +find_unused_parameter = False diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/darts/metafile.yml b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/darts/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..b262a6960cde6557e25084c60f0cabab521a1e0a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/darts/metafile.yml @@ -0,0 +1,28 @@ +Collections: + - Name: Darts + Metadata: + Training Data: + - CIFAR-10 + Paper: + URL: https://arxiv.org/abs/1806.09055 + Title: DARTS:Differentiable Architecture Search + README: configs/nas/mmcls/darts/README.md + Code: + URL: https://github.com/open-mmlab/mmrazor/blob/v0.1.0/mmrazor/models/algorithms/darts.py + Version: v0.1.0 + Converted From: + Code: https://github.com/quark0/darts 
+Models: + - Name: darts_subnet_1xb96_cifar10_2.0 + In Collection: Darts + Metadata: + Params(M): 3.42 + Mutable: https://download.openmmlab.com/mmrazor/v1/darts/darts_subnetnet_1xb96_cifar10_acc-97.32_20211222-e5727921_mutable_cfg.yaml + Results: + - Task: Image Classification + Dataset: CIFAR-10 + Metrics: + Top 1 Accuracy: 97.32 + Top 5 Accuracy: 99.94 + Config: configs/nas/mmcls/darts/darts_subnet_1xb96_cifar10_2.0.py + Weights: https://download.openmmlab.com/mmrazor/v1/darts/darts_subnetnet_1xb96_cifar10_acc-97.32_20211222-e5727921_latest.pth diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/dsnas/DSNAS_SUBNET_IMAGENET_PAPER_ALIAS.yaml b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/dsnas/DSNAS_SUBNET_IMAGENET_PAPER_ALIAS.yaml new file mode 100644 index 0000000000000000000000000000000000000000..0c35c01b57319aaf452a6e2bb947f415adb57905 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/dsnas/DSNAS_SUBNET_IMAGENET_PAPER_ALIAS.yaml @@ -0,0 +1,40 @@ +backbone.layers.0.0: + chosen: shuffle_3x3 +backbone.layers.0.1: + chosen: shuffle_7x7 +backbone.layers.0.2: + chosen: shuffle_3x3 +backbone.layers.0.3: + chosen: shuffle_5x5 +backbone.layers.1.0: + chosen: shuffle_3x3 +backbone.layers.1.1: + chosen: shuffle_3x3 +backbone.layers.1.2: + chosen: shuffle_3x3 +backbone.layers.1.3: + chosen: shuffle_7x7 +backbone.layers.2.0: + chosen: shuffle_xception +backbone.layers.2.1: + chosen: shuffle_3x3 +backbone.layers.2.2: + chosen: shuffle_3x3 +backbone.layers.2.3: + chosen: shuffle_5x5 +backbone.layers.2.4: + chosen: shuffle_3x3 +backbone.layers.2.5: + chosen: shuffle_5x5 +backbone.layers.2.6: + chosen: shuffle_7x7 +backbone.layers.2.7: + chosen: shuffle_7x7 +backbone.layers.3.0: + chosen: shuffle_xception +backbone.layers.3.1: + chosen: shuffle_3x3 +backbone.layers.3.2: + chosen: shuffle_7x7 +backbone.layers.3.3: + chosen: shuffle_3x3 diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/dsnas/README.md 
b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/dsnas/README.md new file mode 100644 index 0000000000000000000000000000000000000000..222e731de83d60abf6b95b5ffec5c7248a63eada --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/dsnas/README.md @@ -0,0 +1,61 @@ +# DSNAS + +> [DSNAS: Direct Neural Architecture Search without Parameter Retraining](https://arxiv.org/abs/2002.09128.pdf) + + + +## Abstract + +Most existing NAS methods require two-stage parameter optimization. +However, performance of the same architecture in the two stages correlates poorly. +Based on this observation, DSNAS proposes a task-specific end-to-end differentiable NAS framework that simultaneously optimizes architecture and parameters with a low-biased Monte Carlo estimate. Child networks derived from DSNAS can be deployed directly without parameter retraining. + +![pipeline](/docs/en/imgs/model_zoo/dsnas/pipeline.jpg) + +## Get Started + +### Supernet training on ImageNet + +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh \ + configs/nas/mmcls/dsnas/dsnas_supernet_8xb128_in1k.py 4 \ + --work-dir $WORK_DIR +``` + +## Subnet inference on ImageNet + +```bash +CUDA_VISIBLE_DEVICES=0 PORT=29500 ./tools/dist_test.sh \ + configs/nas/mmcls/dsnas/dsnas_subnet_8xb128_in1k.py \ + $STEP1_CKPT 1 --work-dir $WORK_DIR +``` + +## Results and models + +### Supernet + +| Dataset | Params(M) | FLOPs (G) | Top-1 Acc (%) | Top-5 Acc (%) | Config | Download | Remarks | +| :------: | :-------: | :-------: | :-----------: | :-----------: | :---------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------: | +| ImageNet | 3.33 | 0.299 | 73.56 | 91.24 | [config](./dsnas_supernet_8xb128_in1k.py) | 
[model](https://openmmlab-share.oss-cn-hangzhou.aliyuncs.com/mmrazor/v1/dsnas/dsnas_supernet_8xb128_in1k_20220926_171954-29b87e3a.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/dsnas/dsnas_supernet_8xb128_in1k_20220926_171954-29b87e3a.log) | MMRazor searched | + +**Note**: + +1. There **might be(not all the case)** some small differences in our experiment in order to be consistent with other repos in OpenMMLab. For example, + normalize images in data preprocessing; resize by cv2 rather than PIL in training; dropout is not used in network. **Please refer to corresponding config for details.** +2. We convert the official searched checkpoint DSNASsearch240.pth into mmrazor-style and evaluate with pytorch1.8_cuda11.0, Top-1 is 74.1 and Top-5 is 91.51. +3. The implementation of ShuffleNetV2 in official DSNAS is different from OpenMMLab's and we follow the structure design in OpenMMLab. Note that with the + origin ShuffleNetV2 design in official DSNAS, the Top-1 is 73.92 and Top-5 is 91.59. +4. The finetune stage in our implementation refers to the 'search-from-search' stage mentioned in official DSNAS. +5. We obtain params and FLOPs using `mmrazor.ResourceEstimator`, which may be different from the origin repo. 
+ +## Citation + +```latex +@inproceedings{hu2020dsnas, + title={Dsnas: Direct neural architecture search without parameter retraining}, + author={Hu, Shoukang and Xie, Sirui and Zheng, Hehui and Liu, Chunxiao and Shi, Jianping and Liu, Xunying and Lin, Dahua}, + booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, + pages={12084--12092}, + year={2020} +} +``` diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/dsnas/dsnas_subnet_8xb128_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/dsnas/dsnas_subnet_8xb128_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..beafd5638e0bf59fcfb07480ded2df3ea848c411 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/dsnas/dsnas_subnet_8xb128_in1k.py @@ -0,0 +1,12 @@ +_base_ = ['./dsnas_supernet_8xb128_in1k.py'] + +model = dict( + _scope_='mmrazor', + type='sub_model', + cfg=_base_.supernet, + # NOTE: You can replace the yaml with the mutable_cfg searched by yourself + fix_subnet= # noqa: E251 + 'configs/nas/mmcls/dsnas/DSNAS_SUBNET_IMAGENET_PAPER_ALIAS.yaml' +) # noqa: E501 + +find_unused_parameters = False diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/dsnas/dsnas_supernet_8xb128_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/dsnas/dsnas_supernet_8xb128_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..869dede5bbb776eaed3c38f1e9bca8f2f1e55c2b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/dsnas/dsnas_supernet_8xb128_in1k.py @@ -0,0 +1,43 @@ +_base_ = [ + 'mmrazor::_base_/settings/imagenet_bs1024_dsnas.py', + 'mmrazor::_base_/nas_backbones/dsnas_shufflenet_supernet.py', + 'mmcls::_base_/default_runtime.py', +] + +custom_hooks = [ + dict(type='mmrazor.DumpSubnetHook', interval=10, by_epoch=True) +] + +supernet = dict( + _scope_='mmcls', + type='ImageClassifier', + data_preprocessor=_base_.data_preprocessor, + backbone=_base_.nas_backbone, + 
neck=dict(type='GlobalAveragePooling'), + head=dict( + type='LinearClsHead', + num_classes=1000, + in_channels=1024, + loss=dict( + type='LabelSmoothLoss', + num_classes=1000, + label_smooth_val=0.1, + mode='original', + loss_weight=1.0), + topk=(1, 5))) + +# model +model = dict( + type='mmrazor.DSNAS', + architecture=supernet, + mutator=dict(type='mmrazor.NasMutator'), + pretrain_epochs=15, + finetune_epochs=_base_.search_epochs, +) + +model_wrapper_cfg = dict( + type='mmrazor.DSNASDDP', + broadcast_buffers=False, + find_unused_parameters=True) + +randomness = dict(seed=48, diff_rank_seed=True) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/onceforall/OFA_SUBNET_NOTE8_LAT22.yaml b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/onceforall/OFA_SUBNET_NOTE8_LAT22.yaml new file mode 100644 index 0000000000000000000000000000000000000000..144342caadfb3fd74b5cef3ef89e5035d18be2b1 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/onceforall/OFA_SUBNET_NOTE8_LAT22.yaml @@ -0,0 +1,140 @@ +backbone.first_channels: + chosen: 16 +backbone.last_channels: + chosen: 1280 +backbone.layers.1.0.expand_ratio: + chosen: 1 +backbone.layers.1.0.kernel_size: + chosen: 3 +backbone.layers.1.depth: + chosen: 1 +backbone.layers.1.out_channels: + chosen: 16 +backbone.layers.2.0.expand_ratio: + chosen: 3 +backbone.layers.2.0.kernel_size: + chosen: 3 +backbone.layers.2.1.expand_ratio: + chosen: 3 +backbone.layers.2.1.kernel_size: + chosen: 3 +backbone.layers.2.2.expand_ratio: + chosen: 3 +backbone.layers.2.2.kernel_size: + chosen: 3 +backbone.layers.2.3.expand_ratio: + chosen: 3 +backbone.layers.2.3.kernel_size: + chosen: 3 +backbone.layers.2.depth: + chosen: 2 +backbone.layers.2.out_channels: + chosen: 24 +backbone.layers.3.0.expand_ratio: + chosen: 3 +backbone.layers.3.0.expand_ratio_se: + chosen: 4 +backbone.layers.3.0.kernel_size: + chosen: 5 +backbone.layers.3.1.expand_ratio: + chosen: 3 +backbone.layers.3.1.expand_ratio_se: + chosen: 3 
+backbone.layers.3.1.kernel_size: + chosen: 5 +backbone.layers.3.2.expand_ratio: + chosen: 3 +backbone.layers.3.2.expand_ratio_se: + chosen: 3 +backbone.layers.3.2.kernel_size: + chosen: 3 +backbone.layers.3.3.expand_ratio: + chosen: 3 +backbone.layers.3.3.expand_ratio_se: + chosen: 3 +backbone.layers.3.3.kernel_size: + chosen: 3 +backbone.layers.3.depth: + chosen: 2 +backbone.layers.3.out_channels: + chosen: 40 +backbone.layers.4.0.expand_ratio: + chosen: 4 +backbone.layers.4.0.kernel_size: + chosen: 7 +backbone.layers.4.1.expand_ratio: + chosen: 3 +backbone.layers.4.1.kernel_size: + chosen: 3 +backbone.layers.4.2.expand_ratio: + chosen: 3 +backbone.layers.4.2.kernel_size: + chosen: 3 +backbone.layers.4.3.expand_ratio: + chosen: 3 +backbone.layers.4.3.kernel_size: + chosen: 3 +backbone.layers.4.depth: + chosen: 2 +backbone.layers.4.out_channels: + chosen: 80 +backbone.layers.5.0.expand_ratio: + chosen: 3 +backbone.layers.5.0.expand_ratio_se: + chosen: 3 +backbone.layers.5.0.kernel_size: + chosen: 5 +backbone.layers.5.1.expand_ratio: + chosen: 4 +backbone.layers.5.1.expand_ratio_se: + chosen: 4 +backbone.layers.5.1.kernel_size: + chosen: 3 +backbone.layers.5.2.expand_ratio: + chosen: 3 +backbone.layers.5.2.expand_ratio_se: + chosen: 3 +backbone.layers.5.2.kernel_size: + chosen: 3 +backbone.layers.5.3.expand_ratio: + chosen: 3 +backbone.layers.5.3.expand_ratio_se: + chosen: 3 +backbone.layers.5.3.kernel_size: + chosen: 3 +backbone.layers.5.depth: + chosen: 2 +backbone.layers.5.out_channels: + chosen: 112 +backbone.layers.6.0.expand_ratio: + chosen: 6 +backbone.layers.6.0.expand_ratio_se: + chosen: 6 +backbone.layers.6.0.kernel_size: + chosen: 3 +backbone.layers.6.1.expand_ratio: + chosen: 6 +backbone.layers.6.1.expand_ratio_se: + chosen: 6 +backbone.layers.6.1.kernel_size: + chosen: 7 +backbone.layers.6.2.expand_ratio: + chosen: 3 +backbone.layers.6.2.expand_ratio_se: + chosen: 6 +backbone.layers.6.2.kernel_size: + chosen: 3 +backbone.layers.6.3.expand_ratio: + 
chosen: 6 +backbone.layers.6.3.expand_ratio_se: + chosen: 6 +backbone.layers.6.3.kernel_size: + chosen: 3 +backbone.layers.6.depth: + chosen: 2 +backbone.layers.6.out_channels: + chosen: 160 +input_shape: + chosen: + - 140 + - 140 diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/onceforall/OFA_SUBNET_NOTE8_LAT31.yaml b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/onceforall/OFA_SUBNET_NOTE8_LAT31.yaml new file mode 100644 index 0000000000000000000000000000000000000000..b8f752d9f2807f8ed37e3f624c7354509de1eb5a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/onceforall/OFA_SUBNET_NOTE8_LAT31.yaml @@ -0,0 +1,148 @@ +backbone.first_channels: + chosen: 16 +backbone.last_channels: + chosen: 1280 +backbone.layers.1.0.expand_ratio: + chosen: 1 +backbone.layers.1.0.kernel_size: + chosen: 3 +backbone.layers.1.depth: + chosen: 1 +backbone.layers.1.out_channels: + chosen: 16 +backbone.layers.2.0.expand_ratio: + chosen: 3 +backbone.layers.2.0.kernel_size: + chosen: 5 +backbone.layers.2.1.expand_ratio: + chosen: 3 +backbone.layers.2.1.kernel_size: + chosen: 3 +backbone.layers.2.2.expand_ratio: + chosen: 3 +backbone.layers.2.2.kernel_size: + chosen: 3 +backbone.layers.2.3.expand_ratio: + chosen: 3 +backbone.layers.2.3.kernel_size: + chosen: 3 +backbone.layers.2.depth: + chosen: 2 +backbone.layers.2.out_channels: + chosen: 24 +backbone.layers.3.0.expand_ratio: + chosen: 4 +backbone.layers.3.0.expand_ratio_se: + chosen: 4 +backbone.layers.3.0.kernel_size: + chosen: 5 +backbone.layers.3.1.expand_ratio: + chosen: 3 +backbone.layers.3.1.expand_ratio_se: + chosen: 3 +backbone.layers.3.1.kernel_size: + chosen: 5 +backbone.layers.3.2.expand_ratio: + chosen: 3 +backbone.layers.3.2.expand_ratio_se: + chosen: 3 +backbone.layers.3.2.kernel_size: + chosen: 3 +backbone.layers.3.3.expand_ratio: + chosen: 3 +backbone.layers.3.3.expand_ratio_se: + chosen: 3 +backbone.layers.3.3.kernel_size: + chosen: 3 +backbone.layers.3.depth: + chosen: 2 
+backbone.layers.3.out_channels: + chosen: 40 +backbone.layers.4.0.expand_ratio: + chosen: 4 +backbone.layers.4.0.expand_ratio_se: + chosen: 4 +backbone.layers.4.0.kernel_size: + chosen: 3 +backbone.layers.4.1.expand_ratio: + chosen: 4 +backbone.layers.4.1.expand_ratio_se: + chosen: 4 +backbone.layers.4.1.kernel_size: + chosen: 3 +backbone.layers.4.2.expand_ratio: + chosen: 4 +backbone.layers.4.2.expand_ratio_se: + chosen: 4 +backbone.layers.4.2.kernel_size: + chosen: 5 +backbone.layers.4.3.expand_ratio: + chosen: 4 +backbone.layers.4.3.expand_ratio_se: + chosen: 4 +backbone.layers.4.3.kernel_size: + chosen: 3 +backbone.layers.4.depth: + chosen: 3 +backbone.layers.4.out_channels: + chosen: 80 +backbone.layers.5.0.expand_ratio: + chosen: 4 +backbone.layers.5.0.expand_ratio_se: + chosen: 4 +backbone.layers.5.0.kernel_size: + chosen: 3 +backbone.layers.5.1.expand_ratio: + chosen: 3 +backbone.layers.5.1.expand_ratio_se: + chosen: 3 +backbone.layers.5.1.kernel_size: + chosen: 5 +backbone.layers.5.2.expand_ratio: + chosen: 4 +backbone.layers.5.2.expand_ratio_se: + chosen: 4 +backbone.layers.5.2.kernel_size: + chosen: 7 +backbone.layers.5.3.expand_ratio: + chosen: 3 +backbone.layers.5.3.expand_ratio_se: + chosen: 3 +backbone.layers.5.3.kernel_size: + chosen: 3 +backbone.layers.5.depth: + chosen: 3 +backbone.layers.5.out_channels: + chosen: 112 +backbone.layers.6.0.expand_ratio: + chosen: 6 +backbone.layers.6.0.expand_ratio_se: + chosen: 6 +backbone.layers.6.0.kernel_size: + chosen: 3 +backbone.layers.6.1.expand_ratio: + chosen: 3 +backbone.layers.6.1.expand_ratio_se: + chosen: 3 +backbone.layers.6.1.kernel_size: + chosen: 3 +backbone.layers.6.2.expand_ratio: + chosen: 3 +backbone.layers.6.2.expand_ratio_se: + chosen: 3 +backbone.layers.6.2.kernel_size: + chosen: 5 +backbone.layers.6.3.expand_ratio: + chosen: 3 +backbone.layers.6.3.expand_ratio_se: + chosen: 3 +backbone.layers.6.3.kernel_size: + chosen: 5 +backbone.layers.6.depth: + chosen: 4 
+backbone.layers.6.out_channels: + chosen: 160 +input_shape: + chosen: + - 152 + - 152 diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/onceforall/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/onceforall/README.md new file mode 100644 index 0000000000000000000000000000000000000000..92270f708892d1720194cec092bbfd6574bed5de --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/onceforall/README.md @@ -0,0 +1,50 @@ +# Once-For-All + +> [ONCE-FOR-ALL: TRAIN ONE NETWORK AND SPECIALIZE IT FOR EFFICIENT DEPLOYMENT](https://arxiv.org/abs/1908.09791) + + + +## Abstract + +We address the challenging problem of efficient inference across many devices and resource constraints, especially on edge devices. Conventional approaches either manually design or use neural architecture search (NAS) to find a specialized neural network and train it from scratch for each case, which is computationally prohibitive (causing CO2 emission as much as 5 cars’ lifetime Strubell et al. (2019)) thus unscalable. In this work, we propose to train a once-for-all (OFA) network that supports diverse architectural settings by decoupling training and search, to reduce the cost. We can quickly get a specialized sub-network by selecting from the OFA network without additional training. To efficiently train OFA networks, we also propose a novel progressive shrinking algorithm, a generalized pruning method that reduces the model size across many more dimensions than pruning (depth, width, kernel size, and resolution). It can obtain a surprisingly large number of sub-networks (> 10^19) that can fit different hardware platforms and latency constraints while maintaining the same level of accuracy as training independently. 
On diverse edge devices, OFA consistently outperforms state-of-the-art (SOTA) NAS methods (up to 4.0% ImageNet top1 accuracy improvement over MobileNetV3, or same accuracy but 1.5× faster than MobileNetV3, 2.6× faster than EfficientNet w.r.t measured latency) while reducing many orders of magnitude GPU hours and CO2 emission. In particular, OFA achieves a new SOTA 80.0% ImageNet top-1 accuracy under the mobile setting (\<600M MACs). OFA is the winning solution for the 3rd Low Power Computer Vision Challenge (LPCVC), DSP classification track and the 4th LPCVC, both classification track and detection track. + +## Get Started + +We product inference models which are published by official Once-For-All repo and converted by MMRazor. + +### Subnet test on ImageNet + +```bash +CUDA_VISIBLE_DEVICES=0 PORT=29500 ./tools/dist_test.sh \ + configs/nas/mmcls/onceforall/ofa_mobilenet_subnet_8xb256_in1k.py \ + none 1 --work-dir $WORK_DIR \ + --cfg-options model.init_cfg.checkpoint=$OFA_CKPT model.init_weight_from_supernet=False +``` + +## Results and models + +| Dataset | Supernet | Subnet | Params(M) | Flops(G) | Top-1 | Config | Download | Remarks | +| :------: | :------------------: | :-------------------------------------------------------------------------------------------------------------------------: | :-------: | :------: | :---: | :-----------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------: | +| ImageNet | AttentiveMobileNetV3 | [search space](https://github.com/open-mmlab/mmrazor/blob/dev-1.x/configs/_base_/nas_backbones/ofa_mobilenetv3_supernet.py) | 7.6 | 747.8 | 77.5 | [config](https://github.com/open-mmlab/mmrazor/blob/dev-1.x/configs/nas/mmcls/onceforall/ofa_mobilenet_supernet_32xb64_in1k.py) | 
[model](https://download.openmmlab.com/mmrazor/v1/ofa/ofa_mobilenet_supernet_d234_e346_k357_w1_0.py_20221214_0940-d0ebc66f.pth) | Converted from the repo | +| ImageNet | AttentiveMobileNetV3 | note8_lat@22ms_top1@70.4_finetune@25 | 4.3 | 70.9 | 70.3 | [config](https://download.openmmlab.com/mmrazor/v1/ofa/rtmdet/OFA_SUBNET_NOTE8_LAT22.yaml) | [model](https://download.openmmlab.com/mmrazor/v1/ofa/ofa_mobilenet_subnet_8xb256_in1k_note8_lat%4022ms_top1%4070.4_finetune%4025.py_20221214_0938-fb7fb84f.pth) | Converted from the repo | +| ImageNet | AttentiveMobileNetV3 | note8_lat@31ms_top1@72.8_finetune@25 | 4.6 | 105.4 | 72.6 | [config](https://download.openmmlab.com/mmrazor/v1/ofa/rtmdet/OFA_SUBNET_NOTE8_LAT31.yaml) | [model](https://download.openmmlab.com/mmrazor/v1/ofa/ofa_mobilenet_subnet_8xb256_in1k_note8_lat%4031ms_top1%4072.8_finetune%4025.py_20221214_0939-981a8b2a.pth) | Converted from the repo | + +**Note**: + +1. OFA provides a more fine-grained search mode, which searches expand ratios & kernel size for each block in every layer of the defined supernet, therefore the subnet configs (format as .yaml) are more complex than those of BigNAS/AttentiveNAS. +2. We provide the [ofa script](../../../../tools/model_converters/convert_ofa_ckpt.py) to convert the official weight into MMRazor-style. The layer depth of a specific subnet is required when converting keys. +3. The models above are converted from the [once-for-all official repo](https://github.com/mit-han-lab/once-for-all). The config files of these models + are only for inference. We don't ensure training accuracy of these config files and you are welcome to contribute your reproduction results. 
+ +## Citation + +```latex +@inproceedings{ + cai2020once, + title={Once for All: Train One Network and Specialize it for Efficient Deployment}, + author={Han Cai and Chuang Gan and Tianzhe Wang and Zhekai Zhang and Song Han}, + booktitle={International Conference on Learning Representations}, + year={2020}, + url={https://arxiv.org/pdf/1908.09791.pdf} +} +``` diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/onceforall/ofa_mobilenet_search_8xb128_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/onceforall/ofa_mobilenet_search_8xb128_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..75d52ad58e61b4fa178631943fb40da4d903a5ac --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/onceforall/ofa_mobilenet_search_8xb128_in1k.py @@ -0,0 +1,16 @@ +_base_ = ['./ofa_mobilenet_supernet_32xb64_in1k.py'] + +train_cfg = dict( + _delete_=True, + type='mmrazor.EvolutionSearchLoop', + dataloader=_base_.val_dataloader, + evaluator=_base_.val_evaluator, + max_epochs=1, + num_candidates=2, + top_k=1, + num_mutation=1, + num_crossover=1, + mutate_prob=0.1, + calibrate_sample_num=4096, + constraints_range=dict(flops=(0., 700.)), + score_key='accuracy/top1') diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/onceforall/ofa_mobilenet_subnet_8xb256_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/onceforall/ofa_mobilenet_subnet_8xb256_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..9b033b3e3b8824b4ced3939dde268d69890b2820 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/onceforall/ofa_mobilenet_subnet_8xb256_in1k.py @@ -0,0 +1,22 @@ +_base_ = 'ofa_mobilenet_supernet_32xb64_in1k.py' + +model = dict( + _scope_='mmrazor', + type='sub_model', + cfg=_base_.supernet, + # NOTE: You can replace the yaml with the mutable_cfg searched by yourself + fix_subnet='configs/nas/mmcls/onceforall/OFA_SUBNET_NOTE8_LAT31.yaml', + # You can also load the checkpoint 
of supernet instead of the specific + # subnet by modifying the `checkpoint`(path) in the following `init_cfg` + # with `init_weight_from_supernet = True`. + init_weight_from_supernet=False, + init_cfg=dict( + type='Pretrained', + checkpoint= # noqa: E251 + 'https://download.openmmlab.com/mmrazor/v1/ofa/ofa_mobilenet_subnet_8xb256_in1k_note8_lat%4031ms_top1%4072.8_finetune%4025.py_20221214_0939-981a8b2a.pth', # noqa: E501 + prefix='architecture.')) + +model_wrapper_cfg = None +find_unused_parameters = True + +test_cfg = dict(evaluate_fixed_subnet=True) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/onceforall/ofa_mobilenet_supernet_32xb64_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/onceforall/ofa_mobilenet_supernet_32xb64_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..341f4bda969cdd7625e1da7e3e5ff0c36e6fee57 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/onceforall/ofa_mobilenet_supernet_32xb64_in1k.py @@ -0,0 +1,51 @@ +_base_ = [ + 'mmcls::_base_/default_runtime.py', + 'mmrazor::_base_/settings/imagenet_bs2048_ofa.py', + 'mmrazor::_base_/nas_backbones/ofa_mobilenetv3_supernet.py', +] + +supernet = dict( + _scope_='mmrazor', + type='SearchableImageClassifier', + data_preprocessor=_base_.data_preprocessor, + backbone=_base_.nas_backbone, + neck=dict(type='mmcls.GlobalAveragePooling'), + head=dict( + type='DynamicLinearClsHead', + num_classes=1000, + in_channels=1280, + loss=dict( + type='mmcls.LabelSmoothLoss', + num_classes=1000, + label_smooth_val=0.1, + mode='original', + loss_weight=1.0), + topk=(1, 5)), + input_resizer_cfg=_base_.input_resizer_cfg, + connect_head=dict(connect_with_backbone='backbone.last_mutable_channels'), +) + +model = dict( + _scope_='mmrazor', + type='BigNAS', + drop_path_rate=0.2, + backbone_dropout_stages=[6, 7], + architecture=supernet, + distiller=dict( + type='ConfigurableDistiller', + teacher_recorders=dict( + fc=dict(type='ModuleOutputs', 
source='head.fc')), + student_recorders=dict( + fc=dict(type='ModuleOutputs', source='head.fc')), + distill_losses=dict( + loss_kl=dict(type='KLDivergence', tau=1, loss_weight=1)), + loss_forward_mappings=dict( + loss_kl=dict( + preds_S=dict(recorder='fc', from_student=True), + preds_T=dict(recorder='fc', from_student=False)))), + mutators=dict(type='mmrazor.NasMutator')) + +model_wrapper_cfg = dict( + type='mmrazor.BigNASDDP', + broadcast_buffers=False, + find_unused_parameters=True) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/README.md new file mode 100644 index 0000000000000000000000000000000000000000..19e55b795b2c904b7987de20abd86bccf2e5995e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/README.md @@ -0,0 +1,76 @@ +# SPOS + +> [Single Path One-Shot Neural Architecture Search with Uniform Sampling](https://arxiv.org/abs/1904.00420) + + + +## Abstract + +We revisit the one-shot Neural Architecture Search (NAS) paradigm and analyze its advantages over existing NAS approaches. Existing one-shot method, however, is hard to train and not yet effective on large scale datasets like ImageNet. This work propose a Single Path One-Shot model to address the challenge in the training. Our central idea is to construct a simplified supernet, where all architectures are single paths so that weight co-adaption problem is alleviated. Training is performed by uniform path sampling. All architectures (and their weights) are trained fully and equally. +Comprehensive experiments verify that our approach is flexible and effective. It is easy to train and fast to search. It effortlessly supports complex search spaces (e.g., building blocks, channel, mixed-precision quantization) and different search constraints (e.g., FLOPs, latency). It is thus convenient to use for various needs. It achieves start-of-the-art performance on the large dataset ImageNet. 
+ +![pipeline](https://user-images.githubusercontent.com/88702197/187424862-c2f3fde1-4a48-4eda-9ff7-c65971b683ba.jpg) + +## Get Started + +### Step 1: Supernet pre-training on ImageNet + +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh \ + configs/nas/spos/spos_supernet_shufflenetv2_8xb128_in1k.py 4 \ + --work-dir $WORK_DIR \ +``` + +### Step 2: Search for subnet on the trained supernet + +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh \ + configs/nas/spos/spos_evolution_search_shufflenetv2_8xb2048_in1k.py 4 \ + --work-dir $WORK_DIR --cfg-options load_from=$STEP1_CKPT +``` + +### Step 3: Subnet retraining on ImageNet + +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh \ + configs/nas/spos/spos_subnet_shufflenetv2_8xb128_in1k.py 4 \ + --work-dir $WORK_DIR \ + --cfg-options model.init_cfg.checkpoint=$STEP2_CKPT + +``` + +## Step 4: Subnet inference on ImageNet + +```bash +CUDA_VISIBLE_DEVICES=0 PORT=29500 ./tools/dist_test.sh \ + configs/nas/spos/spos_subnet_shufflenetv2_8xb128_in1k.py \ + none 1 --work-dir $WORK_DIR \ + --cfg-options model.init_cfg.checkpoint=$STEP3_CKPT +``` + +## Results and models + +| Dataset | Supernet | Subnet | Params(M) | Flops(G) | Top-1 (%) | Top-5 (%) | Config | Download | Remarks | +| :------: | :--------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------: | :------: | :-------: | :-------: | :-------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | 
:-----------------------------------------------------------: | +| ImageNet | ShuffleNetV2 | [mutable](https://download.openmmlab.com/mmrazor/v1/spos/spos_shufflenetv2_subnet_8xb128_in1k_flops_0.33M_acc_73.87_20220715-aa94d5ef_subnet_cfg_v3.yaml) | 3.35 | 0.33 | 73.87 | 91.6 | [config](https://github.com/open-mmlab/mmrazor/blob/dev-1.x/configs/nas/mmcls/spos/spos_subnet_shufflenetv2_8xb128_in1k.py) | [model](https://download.openmmlab.com/mmrazor/v1/spos/spos_shufflenetv2_subnet_8xb128_in1k_flops_0.33M_acc_73.87_20211222-1f0a0b4d_v3.pth) \| [log](https://download.openmmlab.com/mmrazor/v0.1/nas/spos/spos_shufflenetv2_subnet_8xb128_in1k/spos_shufflenetv2_subnet_8xb128_in1k_flops_0.33M_acc_73.87_20211222-1f0a0b4d.log.json) | MMRazor searched | +| ImageNet | MobileNet-ProxylessGPU | [mutable](https://download.openmmlab.com/mmrazor/v0.1/nas/spos/spos_mobilenet_subnet/spos_angelnas_flops_0.49G_acc_75.98_20220307-54f4698f_mutable_cfg.yaml) | 5.94 | 0.49\* | 75.98 | 92.77 | [config](https://github.com/open-mmlab/mmrazor/blob/dev-1.x/configs/nas/mmcls/spos/spos_mobilenet_subnet_8xb128_in1k.py) | | [AngleNAS](https://github.com/megvii-model/AngleNAS) searched | + +**Note**: + +1. There **might be(not all the case)** some small differences in our experiment in order to be consistent with other repos in OpenMMLab. For example, + normalize images in data preprocessing; resize by cv2 rather than PIL in training; dropout is not used in network. **Please refer to corresponding config for details.** +2. For *ShuffleNetV2*, we retrain the subnet reported in paper with their official code, Top-1 is 73.6 and Top-5 is 91.6. +3. For *AngleNAS searched MobileNet-ProxylessGPU*, we obtain params and FLOPs using [this script](/tools/misc/get_flops.py), which may be different from [AngleNAS](https://github.com/megvii-model/AngleNAS#searched-models-with-abs). 
+ +## Citation + +```latex +@inproceedings{guo2020single, + title={Single path one-shot neural architecture search with uniform sampling}, + author={Guo, Zichao and Zhang, Xiangyu and Mu, Haoyuan and Heng, Wen and Liu, Zechun and Wei, Yichen and Sun, Jian}, + booktitle={European Conference on Computer Vision}, + pages={544--560}, + year={2020}, + organization={Springer} +} +``` diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/SPOS_SUBNET.yaml b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/SPOS_SUBNET.yaml new file mode 100644 index 0000000000000000000000000000000000000000..ba809da1df94fdfff372ef4b1e6868d08781cf10 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/SPOS_SUBNET.yaml @@ -0,0 +1,40 @@ +backbone.layers.0.0: + chosen: shuffle_7x7 +backbone.layers.0.1: + chosen: shuffle_3x3 +backbone.layers.0.2: + chosen: shuffle_7x7 +backbone.layers.0.3: + chosen: shuffle_3x3 +backbone.layers.1.0: + chosen: shuffle_xception +backbone.layers.1.1: + chosen: shuffle_5x5 +backbone.layers.1.2: + chosen: shuffle_5x5 +backbone.layers.1.3: + chosen: shuffle_3x3 +backbone.layers.2.0: + chosen: shuffle_3x3 +backbone.layers.2.1: + chosen: shuffle_5x5 +backbone.layers.2.2: + chosen: shuffle_3x3 +backbone.layers.2.3: + chosen: shuffle_5x5 +backbone.layers.2.4: + chosen: shuffle_3x3 +backbone.layers.2.5: + chosen: shuffle_xception +backbone.layers.2.6: + chosen: shuffle_5x5 +backbone.layers.2.7: + chosen: shuffle_7x7 +backbone.layers.3.0: + chosen: shuffle_7x7 +backbone.layers.3.1: + chosen: shuffle_3x3 +backbone.layers.3.2: + chosen: shuffle_5x5 +backbone.layers.3.3: + chosen: shuffle_xception diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/faster-rcnn_nas_backbone_fpn_1x_coco.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/faster-rcnn_nas_backbone_fpn_1x_coco.py new file mode 100644 index 0000000000000000000000000000000000000000..d1ed4a93723159fa23ad7422177cb38115828a64 --- /dev/null +++ 
b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/faster-rcnn_nas_backbone_fpn_1x_coco.py @@ -0,0 +1,21 @@ +# Suppose you are in mmdet and want to use the searched subnet +# as backbone for faster-rcnn, then you can just use this config. + +_base_ = [ + '../_base_/models/faster-rcnn_r50_fpn.py', + '../_base_/datasets/coco_detection.py', + '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py', + 'mmrazor::_base_/nas_backbones/spos_shufflenet_supernet.py' +] + +_base_.nas_backbone.out_indices = (0, 1, 2, 3) +_base_.nas_backbone.with_last_layer = False +nas_backbone = dict( + # use mmrazor's build_func + type='mmrazor.sub_model', + cfg=_base_.nas_backbone, + fix_subnet='/path/to/your/mmrazor/configs/nas/mmcls/spos/SPOS_SUBNET.yaml', + extra_prefix='backbone.') + +_base_.model.backbone = nas_backbone +_base_.model.neck.in_channels = [64, 160, 320, 640] diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/metafile.yml b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..0a8dc59609999ad0a369bd540ebe17e2a45900ba --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/metafile.yml @@ -0,0 +1,28 @@ +Collections: + - Name: SPOS + Metadata: + Training Data: + - ImageNet-1k + Paper: + URL: https://arxiv.org/abs/1904.00420 + Title: Single Path One-Shot Neural Architecture Search with Uniform Sampling + README: configs/nas/mmcls/spos/README.md + Code: + URL: https://github.com/open-mmlab/mmrazor/blob/v0.1.0/mmrazor/models/algorithms/spos.py + Version: v0.1.0 + Converted From: + Code: https://github.com/megvii-model/SinglePathOneShot +Models: + - Name: spos_shufflenet_subnet_8xb128_in1k + In Collection: SPOS + Metadata: + FLOPs: 330 MB + Subnet: https://download.openmmlab.com/mmrazor/v1/spos/spos_shufflenetv2_subnet_8xb128_in1k_flops_0.33M_acc_73.87_20211222-1f0a0b4d_subnet_cfg_v3.yaml + Results: + - Task: Image Classification + 
Dataset: ImageNet-1k + Metrics: + Top 1 Accuracy: 73.87 + Top 5 Accuracy: 91.60 + Config: configs/nas/mmcls/spos/spos_shufflenet_subnet_8xb128_in1k.py + Weights: https://download.openmmlab.com/mmrazor/v1/spos/spos_shufflenetv2_subnet_8xb128_in1k_flops_0.33M_acc_73.87_20211222-1f0a0b4d_v3.pth diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/spos_mobilenet_search_8xb128_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/spos_mobilenet_search_8xb128_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..87553ec399e384055dccf44cacd3bd380d7b063c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/spos_mobilenet_search_8xb128_in1k.py @@ -0,0 +1,17 @@ +_base_ = ['./spos_mobilenet_supernet_8xb128_in1k.py'] + +model = dict(norm_training=True) + +train_cfg = dict( + _delete_=True, + type='mmrazor.EvolutionSearchLoop', + dataloader=_base_.val_dataloader, + evaluator=_base_.val_evaluator, + max_epochs=20, + num_candidates=50, + top_k=10, + num_mutation=25, + num_crossover=25, + mutate_prob=0.1, + constraints_range=dict(flops=(0., 465.)), + score_key='accuracy/top1') diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/spos_mobilenet_subnet_8xb128_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/spos_mobilenet_subnet_8xb128_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..a45d17c1bdcd74012944d9a9c66ac1ec618d146e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/spos_mobilenet_subnet_8xb128_in1k.py @@ -0,0 +1,17 @@ +_base_ = ['./spos_mobilenet_supernet_8xb128_in1k.py'] + +model = dict( + _scope_='mmrazor', + type='sub_model', + cfg=_base_.supernet, + # NOTE: You can replace the yaml with the mutable_cfg searched by yourself + fix_subnet='configs/nas/spos/AngleNAS_SHUFFLENETV2_IN1k_2.0.yaml', + init_cfg=dict( + type='Pretrained', + checkpoint= # noqa: E251 + 
'https://download.openmmlab.com/mmrazor/v1/spos/spos_mobilenetv2_subnet_8xb128_in1k_flops_0.33M_acc_73.87_20211222-1f0a0b4d_v3.pth', # noqa: E501 + prefix='architecture.')) + +model_wrapper_cfg = None + +find_unused_parameters = False diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/spos_mobilenet_supernet_8xb128_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/spos_mobilenet_supernet_8xb128_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..eb38013af36bc4356a5cd82222517b7521dea685 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/spos_mobilenet_supernet_8xb128_in1k.py @@ -0,0 +1,31 @@ +_base_ = [ + 'mmrazor::_base_/settings/imagenet_bs1024_spos.py', + 'mmrazor::_base_/nas_backbones/spos_mobilenet_supernet.py', + 'mmcls::_base_/default_runtime.py', +] + +# model +supernet = dict( + _scope_='mmcls', + type='ImageClassifier', + # data_preprocessor=_base_.preprocess_cfg, + backbone=_base_.nas_backbone, + neck=dict(type='GlobalAveragePooling'), + head=dict( + type='LinearClsHead', + num_classes=1000, + in_channels=1728, + loss=dict( + type='LabelSmoothLoss', + num_classes=1000, + label_smooth_val=0.1, + mode='original', + loss_weight=1.0), + topk=(1, 5))) + +model = dict( + type='mmrazor.SPOS', + architecture=supernet, + mutator=dict(type='mmrazor.NasMutator')) + +find_unused_parameters = True diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/spos_shufflenet_search_8xb128_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/spos_shufflenet_search_8xb128_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..f5a5e88f4c145e8853e2581466dd3bbbbffdb770 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/spos_shufflenet_search_8xb128_in1k.py @@ -0,0 +1,17 @@ +_base_ = ['./spos_shufflenet_supernet_8xb128_in1k.py'] + +model = dict(norm_training=True) + +train_cfg = dict( + _delete_=True, + 
type='mmrazor.EvolutionSearchLoop', + dataloader=_base_.val_dataloader, + evaluator=_base_.val_evaluator, + max_epochs=20, + num_candidates=50, + top_k=10, + num_mutation=25, + num_crossover=25, + mutate_prob=0.1, + constraints_range=dict(flops=(0, 330)), + score_key='accuracy/top1') diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/spos_shufflenet_search_predictor_8xb128_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/spos_shufflenet_search_predictor_8xb128_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..fb08d34945db4747da183b2153d607d2dfa43378 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/spos_shufflenet_search_predictor_8xb128_in1k.py @@ -0,0 +1,21 @@ +_base_ = ['./spos_shufflenet_supernet_8xb128_in1k.py'] + +model = dict(norm_training=True) + +train_cfg = dict( + _delete_=True, + type='mmrazor.EvolutionSearchLoop', + dataloader=_base_.val_dataloader, + evaluator=_base_.val_evaluator, + max_epochs=20, + num_candidates=50, + top_k=10, + num_mutation=25, + num_crossover=25, + mutate_prob=0.1, + constraints_range=dict(flops=(0., 360.)), + predictor_cfg=dict( + type='mmrazor.MetricPredictor', + train_samples=20, + handler_cfg=dict(type='mmrazor.GaussProcessHandler')), +) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/spos_shufflenet_subnet_8xb128_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/spos_shufflenet_subnet_8xb128_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..7f671def4528372b1b752ada50ad047e51525bf4 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/spos_shufflenet_subnet_8xb128_in1k.py @@ -0,0 +1,17 @@ +_base_ = ['./spos_shufflenet_supernet_8xb128_in1k.py'] + +_base_.model = dict( + _scope_='mmrazor', + type='sub_model', + cfg=_base_.supernet, + # NOTE: You can replace the yaml with the mutable_cfg searched by yourself + 
fix_subnet='configs/nas/mmcls/spos/SPOS_SUBNET.yaml', + init_cfg=dict( + type='Pretrained', + checkpoint= # noqa: E251 + 'https://download.openmmlab.com/mmrazor/v1/spos/spos_shufflenetv2_subnet_8xb128_in1k_flops_0.33M_acc_73.87_20211222-1f0a0b4d_v3.pth', # noqa: E501 + prefix='architecture.')) + +model_wrapper_cfg = None + +find_unused_parameters = False diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/spos_shufflenet_supernet_8xb128_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/spos_shufflenet_supernet_8xb128_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..869bcac4cf80c245ebc17ae29cfe79fb611841c5 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmcls/spos/spos_shufflenet_supernet_8xb128_in1k.py @@ -0,0 +1,31 @@ +_base_ = [ + 'mmrazor::_base_/settings/imagenet_bs1024_spos.py', + 'mmrazor::_base_/nas_backbones/spos_shufflenet_supernet.py', + 'mmcls::_base_/default_runtime.py', +] + +# model +supernet = dict( + _scope_='mmcls', + type='ImageClassifier', + data_preprocessor=_base_.preprocess_cfg, + backbone=_base_.nas_backbone, + neck=dict(type='GlobalAveragePooling'), + head=dict( + type='LinearClsHead', + num_classes=1000, + in_channels=1024, + loss=dict( + type='LabelSmoothLoss', + num_classes=1000, + label_smooth_val=0.1, + mode='original', + loss_weight=1.0), + topk=(1, 5))) + +model = dict( + type='mmrazor.SPOS', + architecture=supernet, + mutator=dict(type='mmrazor.NasMutator')) + +find_unused_parameters = True diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmdet/detnas/DETNAS_SUBNET.yaml b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmdet/detnas/DETNAS_SUBNET.yaml new file mode 100644 index 0000000000000000000000000000000000000000..c7bcab916f02d29d379cb7853d8ad26e9c78131e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmdet/detnas/DETNAS_SUBNET.yaml @@ -0,0 +1,40 @@ +backbone.layers.0.0: + chosen: shuffle_5x5 +backbone.layers.0.1: + chosen: 
shuffle_3x3 +backbone.layers.0.2: + chosen: shuffle_3x3 +backbone.layers.0.3: + chosen: shuffle_3x3 +backbone.layers.1.0: + chosen: shuffle_xception +backbone.layers.1.1: + chosen: shuffle_3x3 +backbone.layers.1.2: + chosen: shuffle_xception +backbone.layers.1.3: + chosen: shuffle_7x7 +backbone.layers.2.0: + chosen: shuffle_7x7 +backbone.layers.2.1: + chosen: shuffle_7x7 +backbone.layers.2.2: + chosen: shuffle_xception +backbone.layers.2.3: + chosen: shuffle_xception +backbone.layers.2.4: + chosen: shuffle_3x3 +backbone.layers.2.5: + chosen: shuffle_7x7 +backbone.layers.2.6: + chosen: shuffle_5x5 +backbone.layers.2.7: + chosen: shuffle_xception +backbone.layers.3.0: + chosen: shuffle_7x7 +backbone.layers.3.1: + chosen: shuffle_7x7 +backbone.layers.3.2: + chosen: shuffle_7x7 +backbone.layers.3.3: + chosen: shuffle_5x5 diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmdet/detnas/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmdet/detnas/README.md new file mode 100644 index 0000000000000000000000000000000000000000..2b4096a63bae333098c6febaae124ab1f8c5511e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmdet/detnas/README.md @@ -0,0 +1,87 @@ +# DetNAS + +> [DetNAS: Backbone Search for Object Detection](https://arxiv.org/abs/1903.10979) + + + +## Abstract + +Object detectors are usually equipped with backbone networks designed for image classification. It might be sub-optimal because of the gap between the tasks of image classification and object detection. In this work, we present DetNAS to use Neural Architecture Search (NAS) for the design of better backbones for object detection. It is non-trivial because detection training typically needs ImageNet pre-training while NAS systems require accuracies on the target detection task as supervisory signals. Based on the technique of one-shot supernet, which contains all possible networks in the search space, we propose a framework for backbone search on object detection. 
We train the supernet under the typical detector training schedule: ImageNet pre-training and detection fine-tuning. Then, the architecture search is performed on the trained supernet, using the detection task as the guidance. This framework makes NAS on backbones very efficient. In experiments, we show the effectiveness of DetNAS on various detectors, for instance, one-stage RetinaNet and the two-stage FPN. We empirically find that networks searched on object detection shows consistent superiority compared to those searched on ImageNet classification. The resulting architecture achieves superior performance than hand-crafted networks on COCO with much less FLOPs complexity. + +![pipeline](https://user-images.githubusercontent.com/88702197/187425296-64baa22a-9422-46cd-bd95-47e3e5707f75.jpg) + +## Get Started + +### Step 1: Supernet pre-training on ImageNet + +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh \ + configs/nas/detnas/detnas_supernet_shufflenetv2_8xb128_in1k.py 4 \ + --work-dir $WORK_DIR +``` + +### Step 2: Supernet fine-tuning on COCO + +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh \ + configs/nas/detnas/detnas_supernet_frcnn_shufflenetv2_fpn_1x_coco.py 4 \ + --work-dir $WORK_DIR --cfg-options load_from=$STEP1_CKPT +``` + +### Step 3: Search for subnet on the trained supernet + +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh \ + configs/nas/detnas/detnas_evolution_search_frcnn_shufflenetv2_fpn_coco.py 4 \ + --work-dir $WORK_DIR --cfg-options load_from=$STEP2_CKPT +``` + +### Step 4: Subnet retraining on ImageNet + +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh \ + configs/nas/detnas/detnas_subnet_shufflenetv2_8xb128_in1k.py 4 \ + --work-dir $WORK_DIR --cfg-options algorithm.mutable_cfg=$STEP3_SUBNET_YAML # or modify the config directly +``` + +### Step 5: Subnet fine-tuning on COCO + +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh \ 
+ configs/nas/detnas/detnas_subnet_frcnn_shufflenetv2_fpn_1x_coco.py 4 \ + --work-dir $WORK_DIR \ + --cfg-options --cfg-options model.init_cfg.checkpoint=$STEP4_CKPT +``` + +### Step 6: Subnet inference on COCO + +```bash +CUDA_VISIBLE_DEVICES=0 PORT=29500 ./tools/dist_test.sh \ + configs/nas/detnas/detnas_subnet_frcnn_shufflenetv2_fpn_1x_coco.py \ + none 1 --work-dir $WORK_DIR \ + --cfg-options model.init_cfg.checkpoint=$STEP5_CKPT +``` + +## Results and models + +| Dataset | Supernet | Subnet | Params(M) | Flops(G) | mAP | Config | Download | Remarks | +| :-----: | :----------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------: | :------------: | :--: | :---------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------: | +| COCO | FRCNN-ShuffleNetV2 | [mutable](https://download.openmmlab.com/mmrazor/v0.1/nas/detnas/detnas_subnet_frcnn_shufflenetv2_fpn_1x_coco/detnas_subnet_frcnn_shufflenetv2_fpn_1x_coco_bbox_backbone_flops-0.34M_mAP-37.5_20211222-67fea61f_mutable_cfg.yaml) | 3.35(backbone) | 0.34(backbone) | 37.5 | [config](./detnas_subnet_frcnn_shufflenetv2_fpn_1x_coco.py) | 
[pretrain](https://download.openmmlab.com/mmrazor/v0.1/nas/detnas/detnas_subnet_frcnn_shufflenetv2_fpn_1x_coco/detnas_subnet_shufflenetv2_8xb128_in1k_acc-74.08_20211223-92e9b66a.pth) \|[model](https://download.openmmlab.com/mmrazor/v0.1/nas/detnas/detnas_subnet_frcnn_shufflenetv2_fpn_1x_coco/detnas_subnet_frcnn_shufflenetv2_fpn_1x_coco_bbox_backbone_flops-0.34M_mAP-37.5_20211222-67fea61f.pth) \| [log](https://download.openmmlab.com/mmrazor/v0.1/nas/detnas/detnas_subnet_frcnn_shufflenetv2_fpn_1x_coco/detnas_subnet_frcnn_shufflenetv2_fpn_1x_coco_bbox_backbone_flops-0.34M_mAP-37.5_20211222-67fea61f.log.json) | MMRazor searched | + +**Note**: + +1. The experiment settings of DetNAS are similar with SPOS's, and our training dataset is COCO2017 rather than COCO2014. +2. We also retrained official subnet with same experiment settings, the final result is 36.9 + +## Citation + +```latex +@article{chen2019detnas, + title={Detnas: Backbone search for object detection}, + author={Chen, Yukang and Yang, Tong and Zhang, Xiangyu and Meng, Gaofeng and Xiao, Xinyu and Sun, Jian}, + journal={Advances in Neural Information Processing Systems}, + volume={32}, + pages={6642--6652}, + year={2019} +} +``` diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmdet/detnas/detnas_frcnn_shufflenet_search_coco_1x.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmdet/detnas/detnas_frcnn_shufflenet_search_coco_1x.py new file mode 100644 index 0000000000000000000000000000000000000000..689618362fdb26e800ab7831b6a50364672a16a6 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmdet/detnas/detnas_frcnn_shufflenet_search_coco_1x.py @@ -0,0 +1,17 @@ +_base_ = ['./detnas_frcnn_shufflenet_supernet_coco_1x.py'] + +model = dict(norm_training=True) + +train_cfg = dict( + _delete_=True, + type='mmrazor.EvolutionSearchLoop', + dataloader=_base_.val_dataloader, + evaluator=_base_.val_evaluator, + max_epochs=20, + num_candidates=50, + top_k=10, + num_mutation=20, + num_crossover=20, + 
mutate_prob=0.1, + constraints_range=dict(flops=(0, 330)), + score_key='coco/bbox_mAP') diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmdet/detnas/detnas_frcnn_shufflenet_subnet_coco_1x.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmdet/detnas/detnas_frcnn_shufflenet_subnet_coco_1x.py new file mode 100644 index 0000000000000000000000000000000000000000..e10daec7da218897690736db1370ec62fa04348e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmdet/detnas/detnas_frcnn_shufflenet_subnet_coco_1x.py @@ -0,0 +1,15 @@ +_base_ = ['./detnas_frcnn_shufflenet_supernet_coco_1x.py'] + +model = dict( + _scope_='mmrazor', + type='sub_model', + cfg=_base_.supernet, + # NOTE: You can replace the yaml with the mutable_cfg searched by yourself + fix_subnet='configs/nas/mmdet/detnas/DETNAS_SUBNET.yaml', + init_cfg=dict( + type='Pretrained', + checkpoint= # noqa: E251 + 'https://download.openmmlab.com/mmrazor/v1/detnas/detnas_subnet_frcnn_shufflenetv2_fpn_1x_coco_bbox_backbone_flops-0.34M_mAP-37.5_20220715-61d2e900_v1.pth', # noqa: E501 + prefix='architecture.')) + +find_unused_parameters = False diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmdet/detnas/detnas_frcnn_shufflenet_supernet_coco_1x.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmdet/detnas/detnas_frcnn_shufflenet_supernet_coco_1x.py new file mode 100644 index 0000000000000000000000000000000000000000..b2b2711f63909ecd527b7031b97f7ea4f24640d4 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmdet/detnas/detnas_frcnn_shufflenet_supernet_coco_1x.py @@ -0,0 +1,30 @@ +_base_ = [ + 'mmdet::_base_/models/faster-rcnn_r50_fpn.py', + 'mmdet::_base_/datasets/coco_detection.py', + 'mmdet::_base_/schedules/schedule_1x.py', + 'mmdet::_base_/default_runtime.py', + 'mmrazor::_base_/nas_backbones/spos_shufflenet_supernet.py' +] + +norm_cfg = dict(type='SyncBN', requires_grad=True) + +supernet = _base_.model + +supernet.backbone = _base_.nas_backbone +supernet.backbone.norm_cfg 
= norm_cfg +supernet.backbone.out_indices = (0, 1, 2, 3) +supernet.backbone.with_last_layer = False + +supernet.neck.norm_cfg = norm_cfg +supernet.neck.in_channels = [64, 160, 320, 640] + +supernet.roi_head.bbox_head.norm_cfg = norm_cfg +supernet.roi_head.bbox_head.type = 'Shared4Conv1FCBBoxHead' + +model = dict( + _delete_=True, + type='mmrazor.SPOS', + architecture=supernet, + mutator=dict(type='mmrazor.NasMutator')) + +find_unused_parameters = True diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmdet/detnas/detnas_retina_shufflenet_supernet_coco_1x.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmdet/detnas/detnas_retina_shufflenet_supernet_coco_1x.py new file mode 100644 index 0000000000000000000000000000000000000000..21c37f51e8d3280f8e4e6b641a1a2349b7a1588c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmdet/detnas/detnas_retina_shufflenet_supernet_coco_1x.py @@ -0,0 +1,27 @@ +_base_ = [ + 'mmdet::_base_/models/retinanet_r50_fpn.py', + 'mmdet::_base_/datasets/coco_detection.py', + 'mmdet::_base_/schedules/schedule_1x.py', + 'mmdet::_base_/default_runtime.py', + 'mmrazor::_base_/nas_backbones/spos_shufflenet_supernet.py' +] + +norm_cfg = dict(type='SyncBN', requires_grad=True) + +supernet = _base_.model + +supernet.backbone = _base_.nas_backbone +supernet.backbone.norm_cfg = norm_cfg +supernet.backbone.out_indices = (0, 1, 2, 3) +supernet.backbone.with_last_layer = False + +supernet.neck.norm_cfg = norm_cfg +supernet.neck.in_channels = [64, 160, 320, 640] + +model = dict( + _delete_=True, + type='mmrazor.SPOS', + architecture=supernet, + mutator=dict(type='mmrazor.NasMutator')) + +find_unused_parameters = True diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmdet/detnas/detnas_shufflenet_subnet_8xb128_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmdet/detnas/detnas_shufflenet_subnet_8xb128_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..4129f1863c3dd4b2b9e6211c48ae6231054db0ce --- 
/dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmdet/detnas/detnas_shufflenet_subnet_8xb128_in1k.py @@ -0,0 +1,12 @@ +_base_ = './detnas_shufflenet_supernet_8xb128_in1k.py' + +model = dict( + _scope_='mmrazor', + type='sub_model', + cfg=_base_.supernet, + # NOTE: You can replace the yaml with the mutable_cfg searched by yourself + fix_subnet= # noqa: E251 + 'https://download.openmmlab.com/mmrazor/v1/detnas/detnas_subnet_frcnn_shufflenetv2_fpn_1x_coco_bbox_backbone_flops-0.34M_mAP-37.5_20220715-61d2e900_subnet_cfg_v1.yaml' # noqa: E501 +) + +find_unused_parameters = False diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmdet/detnas/detnas_shufflenet_supernet_8xb128_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmdet/detnas/detnas_shufflenet_supernet_8xb128_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..0e514e83fc20f329c77691c73425103314f84357 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmdet/detnas/detnas_shufflenet_supernet_8xb128_in1k.py @@ -0,0 +1 @@ +_base_ = 'mmrazor::nas/mmcls/spos/shufflenet/spos_shufflenet_supernet_8xb128_in1k.py' # noqa: E501 diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmdet/detnas/metafile.yml b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmdet/detnas/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..6a9a7d880ec5c320e6b73d2ddb639e653bf6b691 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/nas/mmdet/detnas/metafile.yml @@ -0,0 +1,30 @@ +Collections: + - Name: DetNAS + Metadata: + Training Data: + - ImageNet-1k + - COCO + Paper: + URL: https://arxiv.org/abs/1903.10979 + Title: DetNAS:Backbone Search for Object Detection + README: configs/nas/mmdet/detnas/README.md + Code: + URL: https://github.com/open-mmlab/mmrazor/blob/v0.1.0/mmrazor/models/algorithms/detnas.py + Version: v0.1.0 + Converted From: + Code: https://github.com/megvii-model/DetNAS +Models: + - Name: detnas_frcnn_shufflenet_subnet_coco_1x 
+ In Collection: DetNAS + Metadata: + FLOPs(Backbone): 340 MB + Params(Backbone): 3.35 MB + Supernet: FRCNN-ShuffleNetV2 + Mutable: https://download.openmmlab.com/mmrazor/v1/detnas/detnas_subnet_frcnn_shufflenetv2_fpn_1x_coco_bbox_backbone_flops-0.34M_mAP-37.5_20220715-61d2e900_subnet_cfg_v1.yaml + Results: + - Task: Object Detection + Dataset: COCO + Metrics: + box AP: 37.5 + Config: configs/nas/mmdet/detnas/detnas_frcnn_shufflenet_subnet_coco_1x.py + Weights: https://download.openmmlab.com/mmrazor/v1/detnas/detnas_subnet_frcnn_shufflenetv2_fpn_1x_coco_bbox_backbone_flops-0.34M_mAP-37.5_20220715-61d2e900_v1.pth diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/base/group_fisher/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/base/group_fisher/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e294bee8547d33727cfee539bea39046a0ef97df --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/base/group_fisher/README.md @@ -0,0 +1,246 @@ +# Group_fisher pruning + +> [Group Fisher Pruning for Practical Network Compression.](https://arxiv.org/pdf/2108.00708.pdf) + +## Abstract + +Network compression has been widely studied since it is able to reduce the memory and computation cost during inference. However, previous methods seldom deal with complicated structures like residual connections, group/depthwise convolution and feature pyramid network, where channels of multiple layers are coupled and need to be pruned simultaneously. In this paper, we present a general channel pruning approach that can be applied to various complicated structures. Particularly, we propose a layer grouping algorithm to find coupled channels automatically. Then we derive a unified metric based on Fisher information to evaluate the importance of a single channel and coupled channels. 
Moreover, we find that inference speedup on GPUs is more correlated with the reduction of memory rather than FLOPs, and thus we employ the memory reduction of each channel to normalize the importance. Our method can be used to prune any structures including those with coupled channels. We conduct extensive experiments on various backbones, including the classic ResNet and ResNeXt, mobilefriendly MobileNetV2, and the NAS-based RegNet, both on image classification and object detection which is under-explored. Experimental results validate that our method can effectively prune sophisticated networks, boosting inference speed without sacrificing accuracy. + +![pipeline](https://github.com/jshilong/FisherPruning/blob/main/resources/structures.png?raw=true) + +## Results and models + +### Classification on ImageNet + +| Model | Top-1 | Gap | Flop(G) | Remain(%) | Parameters(M) | Remain(%) | Config | Download | Onnx_cpu(FPS) | +| ----------------------------- | ----- | ----- | ------- | --------- | ------------- | --------- | ---------------------------------------- | ----------------------------------------------------------- | ------------- | +| ResNet50 | 76.55 | - | 4.11 | - | 25.6 | - | [mmcls][cls_r50_c] | [model][cls_r50_m] | 55.360 | +| ResNet50_pruned_act | 75.22 | -1.33 | 2.06 | 50.1% | 16.3 | 63.7% | [prune][r_a_pc] \| [finetune][r_a_fc] | [pruned][r_a_p] \| [finetuned][r_a_f] \| [log][r_a_l] | 80.671 | +| ResNet50_pruned_act + dist kd | 76.50 | -0.05 | 2.06 | 50.1% | 16.3 | 63.7% | [prune][r_a_pc] \| [finetune][r_a_fc_kd] | [pruned][r_a_p] \| [finetuned][r_a_f_kd] \| [log][r_a_l_kd] | 80.671 | +| ResNet50_pruned_flops | 75.61 | -0.94 | 2.06 | 50.1% | 16.3 | 63.7% | [prune][r_f_pc] \| [finetune][r_f_fc] | [pruned][r_f_p] \| [finetuned][r_f_f] \| [log][r_f_l] | 78.674 | +| MobileNetV2 | 71.86 | - | 0.313 | - | 3.51 | - | [mmcls][cls_m_c] | [model][cls_m_m] | 419.673 | +| MobileNetV2_pruned_act | 70.82 | -1.04 | 0.207 | 66.1% | 3.18 | 90.6% | [prune][m_a_pc] \| 
[finetune][m_a_fc] | [pruned][m_a_p] \| [finetuned][m_a_f] \| [log][m_a_l] | 576.118 | +| MobileNetV2_pruned_flops | 70.87 | -0.99 | 0.207 | 66.1% | 2.82 | 88.7% | [prune][m_f_pc] \| [finetune][m_f_fc] | [pruned][m_f_p] \| [finetuned][m_f_f] \| [log][m_f_l] | 540.105 | + +**Note** + +- Because the pruning papers use different pretraining and finetuning settings, It is hard to compare them fairly. As a result, we prefer to apply algorithms on the openmmlab settings. +- This may make the experiment results are different from that in the original papers. + +### Detection on COCO + +| Model(Detector-Backbone) | AP | Gap | Flop(G) | Remain(%) | Parameters(M) | Remain(%) | Config | Download | Onnx_cpu(FPS) | +| ------------------------------ | ---- | ---- | ------- | --------- | ------------- | --------- | --------------------------------------- | -------------------------------------------------------- | ------------- | +| RetinaNet-R50-FPN | 36.5 | - | 250 | - | 63.8 | - | [mmdet][det_rt_c] | [model][det_rt_m] | 1.095 | +| RetinaNet-R50-FPN_pruned_act | 36.5 | 0.0 | 126 | 50.4% | 34.6 | 54.2% | [prune][rt_a_pc] \| [finetune][rt_a_fc] | [pruned][rt_a_p] \| [finetuned][rt_a_f] \| [log][rt_a_l] | 1.608 | +| RetinaNet-R50-FPN_pruned_flops | 36.6 | +0.1 | 126 | 50.4% | 34.9 | 54.7% | [prune][rt_f_pc] \| [finetune][rt_f_fc] | [pruned][rt_f_p] \| [finetuned][rt_f_f] \| [log][rt_f_l] | 1.609 | + +### Pose on COCO + +| Model | AP | Gap | Flop(G) | Remain(%) | Parameters(M) | Remain(%) | Config | Download | Onnx_cpu(FPS) | +| -------------------- | ----- | ------ | ------- | --------- | ------------- | --------- | --------------------------------------- | ----------------------------------------------------------- | ------------- | +| rtmpose-s | 0.716 | - | 0.68 | - | 5.47 | - | [mmpose][pose_s_c] | [model][pose_s_m] | 196 | +| rtmpose-s_pruned_act | 0.691 | -0.025 | 0.34 | 50.0% | 3.42 | 62.5% | [prune][rp_a_pc] \| [finetune][rp_a_fc] | [pruned][rp_sc_p] \| 
[finetuned][rp_sc_f] \| [log][rp_sc_l] | 268 | +| rtmpose-t | 0.682 | - | 0.35 | - | 3.34 | - | [mmpose][pose_t_c] | [model][pose_t_m] | 279 | + +| Model | AP | Gap | Flop(G) | Remain(%) | Parameters(M) | Remain(%) | Config | Download | Onnx_cpu(FPS) | +| ----------------------------- | ----- | ------ | ------- | --------- | ------------- | --------- | --------------------------------------- | ----------------------------------------------------------- | ------------- | +| rtmpose-s-aic-coco | 0.722 | - | 0.68 | - | 5.47 | - | [mmpose][pose_s_c] | [model][pose_s_m] | 196 | +| rtmpose-s-aic-coco_pruned_act | 0.694 | -0.028 | 0.35 | 51.5% | 3.43 | 62.7% | [prune][rp_a_pc] \| [finetune][rp_a_fc] | [pruned][rp_sa_p] \| [finetuned][rp_sa_f] \| [log][rp_sa_l] | 272 | +| rtmpose-t-aic-coco | 0.685 | - | 0.35 | - | 3.34 | - | [mmpose][pose_t_c] | [model][pose_t_m] | 279 | + +- All FPS is test on the same machine with 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz. + +## Get Started + +We have three steps to apply GroupFisher to your model, including Prune, Finetune, Deploy. + +Note: please use torch>=1.12, as we need fxtracer to parse the models automatically. + +### Prune + +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 PORT=29500 ./tools/dist_train.sh \ + {config_folder}/group_fisher_{normalization_type}_prune_{model_name}.py 8 \ + --work-dir $WORK_DIR +``` + +In the pruning config file. You have to fill some args as below. + +```python +""" +_base_ (str): The path to your pretrained model checkpoint. +pretrained_path (str): The path to your pretrained model checkpoint. + +interval (int): Interval between pruning two channels. You should ensure you + can reach your target pruning ratio when the training ends. +normalization_type (str): GroupFisher uses two methods to normlized the channel + importance, including ['flops','act']. The former uses flops, while the + latter uses the memory occupation of activation feature maps. +lr_ratio (float): Ratio to decrease lr rate. 
As the pruning process is unstable, + you need to decrease the original lr rate until the pruning training works + steadily without getting nan. + +target_flop_ratio (float): The target flop ratio to prune your model. +input_shape (Tuple): input shape to measure the flops. +""" +``` + +After the pruning process, you will get a checkpoint of the pruned model named flops\_{target_flop_ratio}.pth in your workdir. + +### Finetune + +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 PORT=29500 ./tools/dist_train.sh \ + {config_folder}/group_fisher_{normalization_type}_finetune_{model_name}.py 8 \ + --work-dir $WORK_DIR +``` + +There are also some args for you to fill in the config file as below. + +```python +""" +_base_(str): The path to your pruning config file. +pruned_path (str): The path to the checkpoint of the pruned model. +finetune_lr (float): The lr rate to finetune. Usually, we directly use the lr + rate of the pretrain. +""" +``` + +After finetuning, besides a checkpoint of the best model, there is also a fix_subnet.json, which records the pruned model structure. It will be used when deploying. + +### Test + +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 PORT=29500 ./tools/dist_test.sh \ + {config_folder}/group_fisher_{normalization_type}_finetune_{model_name}.py {checkpoint_path} 8 +``` + +### Deploy + +First, we assume you are familiar with mmdeploy. For a pruned model, you only need to use the pruning deploy config instead of the pretrain config to deploy the pruned version of your model. + +```bash +python {mmdeploy}/tools/deploy.py \ + {mmdeploy}/{mmdeploy_config}.py \ + {config_folder}/group_fisher_{normalization_type}_deploy_{model_name}.py \ + {path_to_finetuned_checkpoint}.pth \ + {mmdeploy}/tests/data/tiger.jpeg +``` + +The deploy config has some args as below: + +```python +""" +_base_ (str): The path to your pretrain config file. +fix_subnet (Union[dict,str]): The dict that stores the pruning structure or the + json file including it. 
+divisor (int): The divisor to make the channel number divisible. +""" + +The divisor is important for the actual inference speed, and we suggest you test it in \[1,2,4,8,16,32\] to find the fastest divisor. + +## Implementation + +All the modules of GroupFisher are placed in mmrazor/implementations/pruning/group_fisher/. + +| File | Module | Feature | +| -------------------- | -------------------------------------------------------------------- | --------------------------------------------------------------------------------------- | +| algorithm.py | GroupFisherAlgorithm | Decide when to prune a channel according to the interval and the current iteration. | +| mutator.py | GroupFisherChannelMutator | Select the unit with the channel of the minimal importance and prune it. | +| unit.py | GroupFisherChannelUnit | Compute fisher info | +| ops.py
counters | GroupFisherConv2d
GroupFisherLinear
corresponding counters | Collect model info to compute fisher info, including activation, grad and tensor shape. | + +There are also some modules to support GroupFisher. These modules may be refactored and moved to other folders as common modules for all pruning algorithms. + +| File | Module | Feature | +| ------------------------- | ---------------------------------------- | ------------------------------------------------------------------- | +| hook.py | PruningStructureHook
ResourceInfoHook | Display pruning Structure iteratively. | +| prune_sub_model.py | GroupFisherSubModel | Convert a pruning algorithm(architecture) to a pruned static model. | +| prune_deploy_sub_model.py | GroupFisherDeploySubModel | Init a pruned static model for mmdeploy. | + +## Citation + +```latex +@InProceedings{Liu:2021, + TITLE = {Group Fisher Pruning for Practical Network Compression}, + AUTHOR = {Liu, Liyang + AND Zhang, Shilong + AND Kuang, Zhanghui + AND Zhou, Aojun + AND Xue, Jing-hao + AND Wang, Xinjiang + AND Chen, Yimin + AND Yang, Wenming + AND Liao, Qingmin + AND Zhang, Wayne}, + BOOKTITLE = {Proceedings of the 38th International Conference on Machine Learning}, + YEAR = {2021}, + SERIES = {Proceedings of Machine Learning Research}, + MONTH = {18--24 Jul}, + PUBLISHER = {PMLR}, +} +``` + + + +[cls_m_c]: https://github.com/open-mmlab/mmclassification/blob/dev-1.x/configs/mobilenet_v2/mobilenet-v2_8xb32_in1k.py +[cls_m_m]: https://download.openmmlab.com/mmclassification/v0/mobilenet_v2/mobilenet_v2_batch256_imagenet_20200708-3b2dc3af.pth +[cls_r50_c]: https://github.com/open-mmlab/mmclassification/blob/dev-1.x/configs/resnet/resnet50_8xb32_in1k.py +[cls_r50_m]: https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth +[det_rt_c]: https://github.com/open-mmlab/mmdetection/blob/dev-3.x/configs/retinanet/retinanet_r50_fpn_1x_coco.py +[det_rt_m]: https://download.openmmlab.com/mmdetection/v2.0/retinanet/retinanet_r50_fpn_1x_coco/retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth +[m_a_f]: https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/mobilenet/act/group_fisher_act_finetune_mobilenet-v2_8xb32_in1k.pth +[m_a_fc]: ../../mmcls/group_fisher/mobilenet/group_fisher_act_finetune_mobilenet-v2_8xb32_in1k.py +[m_a_l]: https://openmmlab-share.oss-cn-hangzhou.aliyuncs.com/mmrazor/v1/pruning/group_fisher/mobilenet/act/20230130_203443.json +[m_a_p]: 
https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/mobilenet/act/group_fisher_act_prune_mobilenet-v2_8xb32_in1k.pth +[m_a_pc]: ../../mmcls/group_fisher/mobilenet/group_fisher_act_prune_mobilenet-v2_8xb32_in1k.py +[m_f_f]: https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/mobilenet/flop/group_fisher_flops_finetune_mobilenet-v2_8xb32_in1k.pth +[m_f_fc]: ../../mmcls/group_fisher/mobilenet/group_fisher_flops_finetune_mobilenet-v2_8xb32_in1k.py +[m_f_l]: https://openmmlab-share.oss-cn-hangzhou.aliyuncs.com/mmrazor/v1/pruning/group_fisher/mobilenet/flop/20230201_211550.json +[m_f_p]: https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/mobilenet/flop/group_fisher_flops_prune_mobilenet-v2_8xb32_in1k.pth +[m_f_pc]: ../../mmcls/group_fisher/mobilenet/group_fisher_flops_prune_mobilenet-v2_8xb32_in1k.py +[pose_s_c]: https://download.openmmlab.com/mmpose/v1/projects/rtmpose/rtmpose-s_simcc-coco_pt-aic-coco_420e-256x192-8edcf0d7_20230127.pth +[pose_s_m]: https://download.openmmlab.com/mmpose/v1/projects/rtmpose/rtmpose-s_simcc-coco_pt-aic-coco_420e-256x192-8edcf0d7_20230127.pth +[pose_t_c]: https://download.openmmlab.com/mmpose/v1/projects/rtmpose/rtmpose-tiny_simcc-coco_pt-aic-coco_420e-256x192-e613ba3f_20230127.pth +[pose_t_m]: https://download.openmmlab.com/mmclassification/v0/mobilenet_v2/mobilenet_v2_batch256_imagenet_20200708-3b2dc3af.pth +[rp_a_fc]: ../../mmpose/group_fisher/group_fisher_finetune_rtmpose-s_8xb256-420e_coco-256x192.py +[rp_a_pc]: ../../mmpose/group_fisher/group_fisher_prune_rtmpose-s_8xb256-420e_coco-256x192.py +[rp_sa_f]: https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/rtmpose-s/group_fisher_finetune_rtmpose-s_8xb256-420e_aic-coco-256x192.pth +[rp_sa_l]: https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/rtmpose-s/group_fisher_finetune_rtmpose-s_8xb256-420e_aic-coco-256x192.json +[rp_sa_p]: 
https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/rtmpose-s/group_fisher_prune_rtmpose-s_8xb256-420e_aic-coco-256x192.pth +[rp_sc_f]: https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/rtmpose-s/group_fisher_finetune_rtmpose-s_8xb256-420e_coco-256x192.pth +[rp_sc_l]: https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/rtmpose-s/group_fisher_finetune_rtmpose-s_8xb256-420e_coco-256x192.json +[rp_sc_p]: https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/rtmpose-s/group_fisher_prune_rtmpose-s_8xb256-420e_coco-256x192.pth +[rt_a_f]: https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/retinanet/act/group_fisher_act_finetune_retinanet_r50_fpn_1x_coco.pth +[rt_a_fc]: ../../mmdet/group_fisher/retinanet/group_fisher_act_finetune_retinanet_r50_fpn_1x_coco.py +[rt_a_l]: https://openmmlab-share.oss-cn-hangzhou.aliyuncs.com/mmrazor/v1/pruning/group_fisher/retinanet/act/20230113_231904.json +[rt_a_p]: https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/retinanet/act/group_fisher_act_prune_retinanet_r50_fpn_1x_coco.pth +[rt_a_pc]: ../../mmdet/group_fisher/retinanet/group_fisher_act_prune_retinanet_r50_fpn_1x_coco.py +[rt_f_f]: https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/retinanet/flops/group_fisher_flops_finetune_retinanet_r50_fpn_1x_coco.pth +[rt_f_fc]: ../../mmdet/group_fisher/retinanet/group_fisher_flops_finetune_retinanet_r50_fpn_1x_coco.py +[rt_f_l]: https://openmmlab-share.oss-cn-hangzhou.aliyuncs.com/mmrazor/v1/pruning/group_fisher/retinanet/flops/20230129_101502.json +[rt_f_p]: https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/retinanet/flops/group_fisher_flops_prune_retinanet_r50_fpn_1x_coco.pth +[rt_f_pc]: ../../mmdet/group_fisher/retinanet/group_fisher_flops_prune_retinanet_r50_fpn_1x_coco.py +[r_a_f]: https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/resnet50/act/group_fisher_act_finetune_resnet50_8xb32_in1k.pth +[r_a_fc]: 
../../mmcls/group_fisher/resnet50/group_fisher_act_finetune_resnet50_8xb32_in1k.py +[r_a_fc_kd]: ../../mmcls/group_fisher/resnet50/group_fisher_act_finetune_resnet50_8xb32_in1k_dist.py +[r_a_f_kd]: https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/resnet50/group_fisher_act_finetune_resnet50_8xb32_in1k_dist.pth +[r_a_l]: https://openmmlab-share.oss-cn-hangzhou.aliyuncs.com/mmrazor/v1/pruning/group_fisher/resnet50/act/20230130_175426.json +[r_a_l_kd]: https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/resnet50/group_fisher_act_finetune_resnet50_8xb32_in1k_dist.json +[r_a_p]: https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/resnet50/act/group_fisher_act_prune_resnet50_8xb32_in1k.pth +[r_a_pc]: ../../mmcls/group_fisher/resnet50/group_fisher_act_prune_resnet50_8xb32_in1k.py +[r_f_f]: https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/resnet50/flops/group_fisher_flops_finetune_resnet50_8xb32_in1k.pth +[r_f_fc]: ../../mmcls/group_fisher/resnet50/group_fisher_flops_finetune_resnet50_8xb32_in1k.py +[r_f_l]: https://openmmlab-share.oss-cn-hangzhou.aliyuncs.com/mmrazor/v1/pruning/group_fisher/resnet50/flops/20230129_190931.json +[r_f_p]: https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/resnet50/flops/group_fisher_flops_prune_resnet50_8xb32_in1k.pth +[r_f_pc]: ../../mmcls/group_fisher/resnet50/group_fisher_flops_prune_resnet50_8xb32_in1k.py diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/base/group_fisher/group_fisher_deploy_template.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/base/group_fisher/group_fisher_deploy_template.py new file mode 100644 index 0000000000000000000000000000000000000000..9964448375a47bf53fa0474e88a3c3a7f8f51175 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/base/group_fisher/group_fisher_deploy_template.py @@ -0,0 +1,24 @@ +############################################################################# +"""You have to fill these args. 
+ +_base_(str): The path to your pretrain config file. +fix_subnet (Union[dict,str]): The dict store the pruning structure or the + json file including it. +divisor (int): The divisor the make the channel number divisible. +""" + +_base_ = '' +fix_subnet = {} +divisor = 8 +############################################################################## + +architecture = _base_.model + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='GroupFisherDeploySubModel', + architecture=architecture, + fix_subnet=fix_subnet, + divisor=divisor, +) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/base/group_fisher/group_fisher_finetune_template.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/base/group_fisher/group_fisher_finetune_template.py new file mode 100644 index 0000000000000000000000000000000000000000..bee977d9355a32923c8ed09cf4bd6d1f7d07e2dc --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/base/group_fisher/group_fisher_finetune_template.py @@ -0,0 +1,32 @@ +############################################################################# +"""# You have to fill these args. + +_base_(str): The path to your pruning config file. +pruned_path (str): The path to the checkpoint of the pruned model. +finetune_lr (float): The lr rate to finetune. Usually, we directly use the lr + rate of the pretrain. 
+""" + +_base_ = '' +pruned_path = '' +finetune_lr = 0.1 +############################################################################## + +algorithm = _base_.model +algorithm.init_cfg = dict(type='Pretrained', checkpoint=pruned_path) + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='GroupFisherSubModel', + algorithm=algorithm, +) + +# restore lr +optim_wrapper = dict(optimizer=dict(lr=finetune_lr)) + +# remove pruning related hooks +custom_hooks = _base_.custom_hooks[:-2] + +# delete ddp +model_wrapper_cfg = None diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/base/group_fisher/group_fisher_prune_template.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/base/group_fisher/group_fisher_prune_template.py new file mode 100644 index 0000000000000000000000000000000000000000..74d485e1f94e8f5859b44c0265cb3fce700a1bee --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/base/group_fisher/group_fisher_prune_template.py @@ -0,0 +1,75 @@ +############################################################################# +"""You have to fill these args. + +_base_ (str): The path to your pretrained model checkpoint. +pretrained_path (str): The path to your pretrained model checkpoint. + +interval (int): Interval between pruning two channels. You should ensure you + can reach your target pruning ratio when the training ends. +normalization_type (str): GroupFisher uses two methods to normlized the channel + importance, including ['flops','act']. The former uses flops, while the + latter uses the memory occupation of activation feature maps. +lr_ratio (float): Ratio to decrease lr rate. As pruning progress is unstable, + you need to decrease the original lr rate until the pruning training work + steadly without getting nan. + +target_flop_ratio (float): The target flop ratio to prune your model. +input_shape (Tuple): input shape to measure the flops. 
+""" + +_base_ = '' +pretrained_path = '' + +interval = 10 +normalization_type = 'act' +lr_ratio = 0.1 + +target_flop_ratio = 0.5 +input_shape = (1, 3, 224, 224) +############################################################################## + +architecture = _base_.model + +if hasattr(_base_, 'data_preprocessor'): + architecture.update({'data_preprocessor': _base_.data_preprocessor}) + data_preprocessor = {} + +architecture.init_cfg = dict(type='Pretrained', checkpoint=pretrained_path) +architecture['_scope_'] = _base_.default_scope + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='GroupFisherAlgorithm', + architecture=architecture, + interval=interval, + mutator=dict( + type='GroupFisherChannelMutator', + parse_cfg=dict(type='ChannelAnalyzer', tracer_type='FxTracer'), + channel_unit_cfg=dict( + type='GroupFisherChannelUnit', + default_args=dict(normalization_type=normalization_type, ), + ), + ), +) + +model_wrapper_cfg = dict( + type='mmrazor.GroupFisherDDP', + broadcast_buffers=False, +) + +optim_wrapper = dict( + optimizer=dict(lr=_base_.optim_wrapper.optimizer.lr * lr_ratio)) + +custom_hooks = getattr(_base_, 'custom_hooks', []) + [ + dict(type='mmrazor.PruningStructureHook'), + dict( + type='mmrazor.ResourceInfoHook', + interval=interval, + demo_input=dict( + type='mmrazor.DefaultDemoInput', + input_shape=input_shape, + ), + save_ckpt_thr=[target_flop_ratio], + ), +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dcff/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dcff/README.md new file mode 100644 index 0000000000000000000000000000000000000000..ba5692d61c3fed8c6338b2d1742dc16d34cd6986 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dcff/README.md @@ -0,0 +1,82 @@ +# Training Compact CNNs for Image Classification using Dynamic-coded Filter Fusion + +## Abstract + +The mainstream approach for filter pruning is usually either to force a hard-coded importance estimation upon a 
computation-heavy pretrained model to select “important” filters, or to impose a hyperparameter-sensitive sparse constraint on the loss objective to regularize the network training. In this paper, we present a novel filter pruning method, dubbed dynamic-coded filter fusion (DCFF), to derive compact CNNs in a computationeconomical and regularization-free manner for efficient image classification. Each filter in our DCFF is firstly given an intersimilarity distribution with a temperature parameter as a filter proxy, on top of which, a fresh Kullback-Leibler divergence based dynamic-coded criterion is proposed to evaluate the filter importance. In contrast to simply keeping high-score filters in other methods, we propose the concept of filter fusion, i.e., the weighted averages using the assigned proxies, as our preserved filters. We obtain a one-hot inter-similarity distribution as the temperature parameter approaches infinity. Thus, the relative importance of each filter can vary along with the training of the compact CNN, leading to dynamically changeable fused filters without both the dependency on the pretrained model and the introduction of sparse constraints. Extensive experiments on classification benchmarks demonstrate the superiority of our DCFF over the compared counterparts. For example, our DCFF derives a compact VGGNet-16 with only 72.77M FLOPs and 1.06M parameters while reaching top-1 accuracy of 93.47% on CIFAR-10. A compact ResNet-50 is obtained with 63.8% FLOPs and 58.6% parameter reductions, retaining 75.60% top1 accuracy on ILSVRC-2012. + +![pipeline](https://user-images.githubusercontent.com/31244134/189286581-722853ba-c6d7-4a39-b902-37995b444c71.jpg) + +## Results and models + +### 1. 
Classification + +| Dataset | Backbone | Params(M) | FLOPs(M) | lr_type | Top-1 (%) | Top-5 (%) | CPrate | Config | Download | +| :------: | :----------: | :-------: | :------: | :-----: | :-------: | :-------: | :---------------------------------------------: | :--------------------------------------------------: | :--------------------------: | +| ImageNet | DCFFResNet50 | 15.16 | 2260 | step | 73.96 | 91.66 | \[0.0\]+\[0.35,0.4,0.1\]\*10+\[0.3,0.3,0.1\]\*6 | [config](../../mmcls/dcff/dcff_resnet_8xb32_in1k.py) | [model](<>) \| \[log\] (\<>) | + +### 2. Detection + +| Dataset | Method | Backbone | Style | Lr schd | Params(M) | FLOPs(M) | bbox AP | CPrate | Config | Download | +| :-----: | :---------: | :----------: | :-----: | :-----: | :-------: | :------: | :-----: | :---------------------------------------------: | :---------------------------------------------------------------: | :--------------------------: | +| COCO | Faster_RCNN | DCFFResNet50 | pytorch | step | 33.31 | 168320 | 35.8 | \[0.0\]+\[0.35,0.4,0.1\]\*10+\[0.3,0.3,0.1\]\*6 | [config](../../mmdet/dcff/dcff_faster_rcnn_resnet50_8xb4_coco.py) | [model](<>) \| \[log\] (\<>) | + +### 3. Segmentation + +| Dataset | Method | Backbone | crop size | Lr schd | Params(M) | FLOPs(M) | mIoU | CPrate | Config | Download | +| :--------: | :-------: | :-------------: | :-------: | :-----: | :-------: | :------: | :---: | :-----------------------------------------------------------------: | :-------------------------------------------------------------------: | :--------------------------: | +| Cityscapes | PointRend | DCFFResNetV1c50 | 512x1024 | 160k | 18.43 | 74410 | 76.75 | \[0.0, 0.0, 0.0\] + \[0.35, 0.4, 0.1\] * 10 + \[0.3, 0.3, 0.1\] * 6 | [config](../../mmseg/dcff/dcff_pointrend_resnet50_8xb2_cityscapes.py) | [model](<>) \| \[log\] (\<>) | + +### 4. 
Pose + +| Dataset | Method | Backbone | crop size | total epochs | Params(M) | FLOPs(M) | AP | CPrate | Config | Download | +| :-----: | :-------------: | :----------: | :-------: | :----------: | :-------: | :------: | :--: | :--------------------------------------------------------: | :---------------------------------------------------------------: | :--------------------------: | +| COCO | TopDown HeatMap | DCFFResNet50 | 256x192 | 300 | 26.95 | 4290 | 68.3 | \[0.0\] + \[0.2, 0.2, 0.1\] * 10 + \[0.15, 0.15, 0.1\] * 6 | [config](../../mmpose/dcff/dcff_topdown_heatmap_resnet50_coco.py) | [model](<>) \| \[log\] (\<>) | + +## Citation + +```latex +@article{lin2021training, + title={Training Compact CNNs for Image Classification using Dynamic-coded Filter Fusion}, + author={Lin, Mingbao and Ji, Rongrong and Chen, Bohong and Chao, Fei and Liu, Jianzhuang and Zeng, Wei and Tian, Yonghong and Tian, Qi}, + journal={arXiv preprint arXiv:2107.06916}, + year={2021} +} +``` + +## Get Started + +### Generate channel_config file + +Generate `resnet_cls.json` with `tools/pruning/get_channel_units.py`. + +```bash +python tools/pruning/get_channel_units.py + configs/pruning/mmcls/dcff/dcff_resnet50_8xb32_in1k.py \ + -c -i --output-path=configs/pruning/mmcls/dcff/resnet_cls.json +``` + +Then set layers' pruning rates `target_pruning_ratio` by `resnet_cls.json`. 
+ +### Train DCFF + +#### Classification + +##### ImageNet + +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh \ + configs/pruning/mmcls/dcff/dcff_resnet50_8xb32_in1k.py 4 \ + --work-dir $WORK_DIR +``` + +### Test DCFF + +#### Classification + +##### ImageNet + +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_test.sh \ + configs/pruning/mmcls/dcff/dcff_compact_resnet50_8xb32_in1k.py \ + $CKPT 1 --work-dir $WORK_DIR +``` diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dcff/dcff_compact_resnet_8xb32_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dcff/dcff_compact_resnet_8xb32_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..4a98b25841dd55630aa016230f56f1ad9edf9784 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dcff/dcff_compact_resnet_8xb32_in1k.py @@ -0,0 +1,13 @@ +_base_ = ['dcff_resnet_8xb32_in1k.py'] + +# model settings +_base_.model = dict( + _scope_='mmrazor', + type='sub_model', + cfg=dict( + cfg_path='mmcls::resnet/resnet50_8xb32_in1k.py', pretrained=False), + fix_subnet='configs/pruning/mmcls/dcff/fix_subnet.json', + mode='mutator', + init_cfg=dict( + type='Pretrained', + checkpoint='configs/pruning/mmcls/dcff/fix_subnet_weight.pth')) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dcff/dcff_resnet_8xb32_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dcff/dcff_resnet_8xb32_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..f833cb5621cecd0e8668d6dec8330a68310373f4 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dcff/dcff_resnet_8xb32_in1k.py @@ -0,0 +1,84 @@ +_base_ = [ + 'mmcls::_base_/datasets/imagenet_bs32.py', + 'mmcls::_base_/schedules/imagenet_bs256.py', + 'mmcls::_base_/default_runtime.py' +] + +stage_ratio_1 = 0.65 +stage_ratio_2 = 0.6 +stage_ratio_3 = 0.9 +stage_ratio_4 = 0.7 + +# the config template of target_pruning_ratio can be 
got by +# python ./tools/pruning/get_channel_units.py {config_file} --choice +target_pruning_ratio = { + 'backbone.layer1.0.conv1_(0, 64)_64': stage_ratio_1, + 'backbone.layer1.0.conv2_(0, 64)_64': stage_ratio_2, + 'backbone.layer1.0.conv3_(0, 256)_256': stage_ratio_3, + 'backbone.layer1.1.conv1_(0, 64)_64': stage_ratio_1, + 'backbone.layer1.1.conv2_(0, 64)_64': stage_ratio_2, + 'backbone.layer1.2.conv1_(0, 64)_64': stage_ratio_1, + 'backbone.layer1.2.conv2_(0, 64)_64': stage_ratio_2, + # block 1 [0.65, 0.6] downsample=[0.9] + 'backbone.layer2.0.conv1_(0, 128)_128': stage_ratio_1, + 'backbone.layer2.0.conv2_(0, 128)_128': stage_ratio_2, + 'backbone.layer2.0.conv3_(0, 512)_512': stage_ratio_3, + 'backbone.layer2.1.conv1_(0, 128)_128': stage_ratio_1, + 'backbone.layer2.1.conv2_(0, 128)_128': stage_ratio_2, + 'backbone.layer2.2.conv1_(0, 128)_128': stage_ratio_1, + 'backbone.layer2.2.conv2_(0, 128)_128': stage_ratio_2, + 'backbone.layer2.3.conv1_(0, 128)_128': stage_ratio_1, + 'backbone.layer2.3.conv2_(0, 128)_128': stage_ratio_2, + # block 2 [0.65, 0.6] downsample=[0.9] + 'backbone.layer3.0.conv1_(0, 256)_256': stage_ratio_1, + 'backbone.layer3.0.conv2_(0, 256)_256': stage_ratio_2, + 'backbone.layer3.0.conv3_(0, 1024)_1024': stage_ratio_3, + 'backbone.layer3.1.conv1_(0, 256)_256': stage_ratio_1, + 'backbone.layer3.1.conv2_(0, 256)_256': stage_ratio_2, + 'backbone.layer3.2.conv1_(0, 256)_256': stage_ratio_1, + 'backbone.layer3.2.conv2_(0, 256)_256': stage_ratio_2, + 'backbone.layer3.3.conv1_(0, 256)_256': stage_ratio_4, + 'backbone.layer3.3.conv2_(0, 256)_256': stage_ratio_4, + 'backbone.layer3.4.conv1_(0, 256)_256': stage_ratio_4, + 'backbone.layer3.4.conv2_(0, 256)_256': stage_ratio_4, + 'backbone.layer3.5.conv1_(0, 256)_256': stage_ratio_4, + 'backbone.layer3.5.conv2_(0, 256)_256': stage_ratio_4, + # block 3 [0.65, 0.6]*2+[0.7, 0.7]*2 downsample=[0.9] + 'backbone.layer4.0.conv1_(0, 512)_512': stage_ratio_4, + 'backbone.layer4.0.conv2_(0, 512)_512': stage_ratio_4, + 
'backbone.layer4.0.conv3_(0, 2048)_2048': stage_ratio_3, + 'backbone.layer4.1.conv1_(0, 512)_512': stage_ratio_4, + 'backbone.layer4.1.conv2_(0, 512)_512': stage_ratio_4, + 'backbone.layer4.2.conv1_(0, 512)_512': stage_ratio_4, + 'backbone.layer4.2.conv2_(0, 512)_512': stage_ratio_4 + # block 4 [0.7, 0.7] downsample=[0.9] +} + +optim_wrapper = dict( + optimizer=dict(type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0001)) +param_scheduler = dict( + type='MultiStepLR', by_epoch=True, milestones=[30, 60, 90], gamma=0.1) +train_cfg = dict(by_epoch=True, max_epochs=120, val_interval=1) + +data_preprocessor = {'type': 'mmcls.ClsDataPreprocessor'} + +# model settings +model = dict( + _scope_='mmrazor', + type='DCFF', + architecture=dict( + cfg_path='mmcls::resnet/resnet50_8xb32_in1k.py', pretrained=False), + mutator_cfg=dict( + type='DCFFChannelMutator', + channel_unit_cfg=dict( + type='DCFFChannelUnit', default_args=dict(choice_mode='ratio')), + parse_cfg=dict( + type='ChannelAnalyzer', + demo_input=(1, 3, 224, 224), + tracer_type='BackwardTracer')), + data_preprocessor=None, + target_pruning_ratio=target_pruning_ratio, + step_freq=1, + linear_schedule=False) + +val_cfg = dict(_delete_=True, type='mmrazor.ItePruneValLoop') diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dcff/fix_subnet.json b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dcff/fix_subnet.json new file mode 100644 index 0000000000000000000000000000000000000000..dfdcea75873d2516af6db193343f0b4df7376bb8 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dcff/fix_subnet.json @@ -0,0 +1,141 @@ +{ + "type":"DCFFChannelMutator", + "channel_unit_cfg":{ + "type":"DCFFChannelUnit", + "default_args":{ + "choice_mode":"ratio" + }, + "units":{ + "backbone.conv1_(0, 64)_64":{ + "init_args":{ + "num_channels":64, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":1.0 + }, + "backbone.layer1.0.conv1_(0, 64)_64":{ + "init_args":{ 
+ "num_channels":64, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.640625 + }, + "backbone.layer1.1.conv1_(0, 64)_64":{ + "init_args":{ + "num_channels":64, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.640625 + }, + "backbone.layer2.0.conv1_(0, 128)_128":{ + "init_args":{ + "num_channels":128, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.6484375 + }, + "backbone.layer2.0.conv2_(0, 128)_128":{ + "init_args":{ + "num_channels":128, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.59375 + }, + "backbone.layer2.1.conv1_(0, 128)_128":{ + "init_args":{ + "num_channels":128, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.6484375 + }, + "backbone.layer3.0.conv1_(0, 256)_256":{ + "init_args":{ + "num_channels":256, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.6484375 + }, + "backbone.layer3.0.conv2_(0, 256)_256":{ + "init_args":{ + "num_channels":256, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.59765625 + }, + "backbone.layer3.1.conv1_(0, 256)_256":{ + "init_args":{ + "num_channels":256, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.6484375 + }, + "backbone.layer4.0.conv1_(0, 512)_512":{ + "init_args":{ + "num_channels":512, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.69921875 + }, + "backbone.layer4.0.conv2_(0, 512)_512":{ + "init_args":{ + "num_channels":512, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.69921875 + }, + "backbone.layer4.1.conv1_(0, 512)_512":{ + "init_args":{ + "num_channels":512, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.69921875 + } + } + 
}, + "parse_cfg":{ + "type":"ChannelAnalyzer", + "demo_input":[ + 1, + 3, + 224, + 224 + ], + "tracer_type":"BackwardTracer" + } +} diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dmcp/DMCP_MBV2_100M.json b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dmcp/DMCP_MBV2_100M.json new file mode 100644 index 0000000000000000000000000000000000000000..d4ee2409fd401809050cede5112d0790e2534512 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dmcp/DMCP_MBV2_100M.json @@ -0,0 +1,271 @@ +{ + "type":"DMCPChannelMutator", + "channel_unit_cfg":{ + "type":"DMCPChannelUnit", + "default_args":{ + "choice_mode":"number" + }, + "units":{ + "backbone.conv1.conv_(0, 32)_32":{ + "init_args":{ + "num_channels":32, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":9 + }, + "backbone.layer1.0.conv.1.conv_(0, 16)_16":{ + "init_args":{ + "num_channels":16, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":10 + }, + "backbone.layer2.0.conv.0.conv_(0, 96)_96":{ + "init_args":{ + "num_channels":96, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":36 + }, + "backbone.layer2.0.conv.2.conv_(0, 24)_24":{ + "init_args":{ + "num_channels":24, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":16 + }, + "backbone.layer2.1.conv.0.conv_(0, 144)_144":{ + "init_args":{ + "num_channels":144, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":16 + }, + "backbone.layer3.0.conv.0.conv_(0, 144)_144":{ + "init_args":{ + "num_channels":144, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":48 + }, + "backbone.layer3.0.conv.2.conv_(0, 32)_32":{ + "init_args":{ + "num_channels":32, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":21 + }, + 
"backbone.layer3.1.conv.0.conv_(0, 192)_192":{ + "init_args":{ + "num_channels":192, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":41 + }, + "backbone.layer3.2.conv.0.conv_(0, 192)_192":{ + "init_args":{ + "num_channels":192, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":22 + }, + "backbone.layer4.0.conv.0.conv_(0, 192)_192":{ + "init_args":{ + "num_channels":192, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":60 + }, + "backbone.layer4.0.conv.2.conv_(0, 64)_64":{ + "init_args":{ + "num_channels":64, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":24 + }, + "backbone.layer4.1.conv.0.conv_(0, 384)_384":{ + "init_args":{ + "num_channels":384, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":44 + }, + "backbone.layer4.2.conv.0.conv_(0, 384)_384":{ + "init_args":{ + "num_channels":384, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":272 + }, + "backbone.layer4.3.conv.0.conv_(0, 384)_384":{ + "init_args":{ + "num_channels":384, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":272 + }, + "backbone.layer5.0.conv.0.conv_(0, 384)_384":{ + "init_args":{ + "num_channels":384, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":310 + }, + "backbone.layer5.0.conv.2.conv_(0, 96)_96":{ + "init_args":{ + "num_channels":96, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":36 + }, + "backbone.layer5.1.conv.0.conv_(0, 576)_576":{ + "init_args":{ + "num_channels":576, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":294 + }, + "backbone.layer5.2.conv.0.conv_(0, 576)_576":{ + "init_args":{ + "num_channels":576, + "choice_mode":"number", + "divisor":1, + 
"min_value":1, + "min_ratio":0.5 + }, + "choice":351 + }, + "backbone.layer6.0.conv.0.conv_(0, 576)_576":{ + "init_args":{ + "num_channels":576, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":693 + }, + "backbone.layer6.0.conv.2.conv_(0, 160)_160":{ + "init_args":{ + "num_channels":160, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":80 + }, + "backbone.layer6.1.conv.0.conv_(0, 960)_960":{ + "init_args":{ + "num_channels":960, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":96 + }, + "backbone.layer6.2.conv.0.conv_(0, 960)_960":{ + "init_args":{ + "num_channels":960, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":864 + }, + "backbone.layer7.0.conv.0.conv_(0, 960)_960":{ + "init_args":{ + "num_channels":960, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":960 + }, + "backbone.layer7.0.conv.2.conv_(0, 320)_320":{ + "init_args":{ + "num_channels":320, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":192 + }, + "backbone.conv2.conv_(0, 1280)_1280":{ + "init_args":{ + "num_channels":1280, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":1280 + } + } + }, + "parse_cfg":{ + "type":"ChannelAnalyzer", + "demo_input":[ + 1, + 3, + 224, + 224 + ], + "tracer_type":"BackwardTracer" + } +} diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dmcp/DMCP_R50_2G.json b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dmcp/DMCP_R50_2G.json new file mode 100644 index 0000000000000000000000000000000000000000..833707cde0742a1cc1349c46f56aa4e747120fbe --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dmcp/DMCP_R50_2G.json @@ -0,0 +1,391 @@ +{ + "type":"DMCPChannelMutator", + "channel_unit_cfg":{ + "type":"DMCPChannelUnit", + 
"default_args":{ + "choice_mode":"number" + }, + "units":{ + "backbone.conv1_(0, 64)_64":{ + "init_args":{ + "num_channels":64, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":52 + }, + "backbone.layer1.0.conv1_(0, 64)_64":{ + "init_args":{ + "num_channels":64, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":22 + }, + "backbone.layer1.0.conv2_(0, 64)_64":{ + "init_args":{ + "num_channels":64, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":22 + }, + "backbone.layer1.0.conv3_(0, 256)_256":{ + "init_args":{ + "num_channels":256, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":106 + }, + "backbone.layer1.1.conv1_(0, 64)_64":{ + "init_args":{ + "num_channels":64, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":16 + }, + "backbone.layer1.1.conv2_(0, 64)_64":{ + "init_args":{ + "num_channels":64, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":16 + }, + "backbone.layer1.2.conv1_(0, 64)_64":{ + "init_args":{ + "num_channels":64, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":40 + }, + "backbone.layer1.2.conv2_(0, 64)_64":{ + "init_args":{ + "num_channels":64, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":16 + }, + "backbone.layer2.0.conv1_(0, 128)_128":{ + "init_args":{ + "num_channels":128, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":68 + }, + "backbone.layer2.0.conv2_(0, 128)_128":{ + "init_args":{ + "num_channels":128, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":56 + }, + "backbone.layer2.0.conv3_(0, 512)_512":{ + "init_args":{ + "num_channels":512, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + 
}, + "choice":155 + }, + "backbone.layer2.1.conv1_(0, 128)_128":{ + "init_args":{ + "num_channels":128, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":32 + }, + "backbone.layer2.1.conv2_(0, 128)_128":{ + "init_args":{ + "num_channels":128, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":68 + }, + "backbone.layer2.2.conv1_(0, 128)_128":{ + "init_args":{ + "num_channels":128, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":56 + }, + "backbone.layer2.2.conv2_(0, 128)_128":{ + "init_args":{ + "num_channels":128, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":56 + }, + "backbone.layer2.3.conv1_(0, 128)_128":{ + "init_args":{ + "num_channels":128, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":80 + }, + "backbone.layer2.3.conv2_(0, 128)_128":{ + "init_args":{ + "num_channels":128, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":92 + }, + "backbone.layer3.0.conv1_(0, 256)_256":{ + "init_args":{ + "num_channels":256, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":256 + }, + "backbone.layer3.0.conv2_(0, 256)_256":{ + "init_args":{ + "num_channels":256, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":256 + }, + "backbone.layer3.0.conv3_(0, 1024)_1024":{ + "init_args":{ + "num_channels":1024, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":1024 + }, + "backbone.layer3.1.conv1_(0, 256)_256":{ + "init_args":{ + "num_channels":256, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":106 + }, + "backbone.layer3.1.conv2_(0, 256)_256":{ + "init_args":{ + "num_channels":256, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + 
}, + "choice":106 + }, + "backbone.layer3.2.conv1_(0, 256)_256":{ + "init_args":{ + "num_channels":256, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":131 + }, + "backbone.layer3.2.conv2_(0, 256)_256":{ + "init_args":{ + "num_channels":256, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":256 + }, + "backbone.layer3.3.conv1_(0, 256)_256":{ + "init_args":{ + "num_channels":256, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":131 + }, + "backbone.layer3.3.conv2_(0, 256)_256":{ + "init_args":{ + "num_channels":256, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":256 + }, + "backbone.layer3.4.conv1_(0, 256)_256":{ + "init_args":{ + "num_channels":256, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":256 + }, + "backbone.layer3.4.conv2_(0, 256)_256":{ + "init_args":{ + "num_channels":256, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":256 + }, + "backbone.layer3.5.conv1_(0, 256)_256":{ + "init_args":{ + "num_channels":256, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":256 + }, + "backbone.layer3.5.conv2_(0, 256)_256":{ + "init_args":{ + "num_channels":256, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":256 + }, + "backbone.layer4.0.conv1_(0, 512)_512":{ + "init_args":{ + "num_channels":512, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":512 + }, + "backbone.layer4.0.conv2_(0, 512)_512":{ + "init_args":{ + "num_channels":512, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":512 + }, + "backbone.layer4.0.conv3_(0, 2048)_2048":{ + "init_args":{ + "num_channels":2048, + "choice_mode":"number", + "divisor":1, + "min_value":1, + 
"min_ratio":0.5 + }, + "choice":2048 + }, + "backbone.layer4.1.conv1_(0, 512)_512":{ + "init_args":{ + "num_channels":512, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":512 + }, + "backbone.layer4.1.conv2_(0, 512)_512":{ + "init_args":{ + "num_channels":512, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":461 + }, + "backbone.layer4.2.conv1_(0, 512)_512":{ + "init_args":{ + "num_channels":512, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":512 + }, + "backbone.layer4.2.conv2_(0, 512)_512":{ + "init_args":{ + "num_channels":512, + "choice_mode":"number", + "divisor":1, + "min_value":1, + "min_ratio":0.5 + }, + "choice":512 + } + } + }, + "parse_cfg":{ + "type":"ChannelAnalyzer", + "demo_input":[ + 1, + 3, + 224, + 224 + ], + "tracer_type":"BackwardTracer" + } +} diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dmcp/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dmcp/README.md new file mode 100644 index 0000000000000000000000000000000000000000..3a96b61c32a17c710e98284b3892e69b7005b65c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dmcp/README.md @@ -0,0 +1,62 @@ +# DMCP: Differentiable Markov Channel Pruning for Neural Networks + +## Abstract + +Recent works imply that the channel pruning can be regarded as searching optimal sub-structure from unpruned networks. However, existing works based on this observation require training and evaluating a large number of structures, which limits their application. In this paper, we propose a novel differentiable method for channel pruning, named Differentiable Markov Channel Pruning (DMCP), to efficiently search the optimal sub-structure. Our method is differentiable and can be directly optimized by gradient descent with respect to standard task loss and budget regularization (e.g. FLOPs constraint). 
In DMCP, we model the channel pruning as a Markov process, in which each state represents for retaining the corresponding channel during pruning, and transitions between states denote the pruning process. In the end, our method is able to implicitly select the proper number of channels in each layer by the Markov process with optimized transitions. To validate the effectiveness of our method, we perform extensive experiments on Imagenet with ResNet and MobilenetV2. Results show our method can achieve consistent improvement than stateof-the-art pruning methods in various FLOPs settings. + +## Getting Started + +#### Train DMCP from scratch + +```bash +GPUS=32 sh tools/slurm_train.sh $PARTITION $JOB_NAME \ + configs/pruning/mmcls/dmcp/dmcp_resnet50_supernet_32xb64.py \ + --work-dir $WORK_DIR +``` + +#### After the previous steps, retrain the selected pruned sub-network + +#### with 2GFLOPs based on the output structure + +#### 'DMCP_R50_2G.json'(SOURCECODE) + +```bash +GPUS=32 sh tools/slurm_train.sh $PARTITION $JOB_NAME \ + configs/pruning/mmcls/dmcp/dmcp_resnet50_subnet_32xb64.py \ + --work-dir $WORK_DIR +``` + + + + + + + + + +## Citation + +```latex +@inproceedings{guo2020dmcp, + title={Dmcp: Differentiable markov channel pruning for neural networks}, + author={Guo, Shaopeng and Wang, Yujie and Li, Quanquan and Yan, Junjie}, + booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, + pages={1539--1547}, + year={2020} +} +``` diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dmcp/dmcp_mbv2_subnet_32xb64.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dmcp/dmcp_mbv2_subnet_32xb64.py new file mode 100644 index 0000000000000000000000000000000000000000..81880f4eb0c0d17489f146408242048aaa5c3398 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dmcp/dmcp_mbv2_subnet_32xb64.py @@ -0,0 +1,49 @@ +_base_ = ['dmcp_mbv2_supernet_32xb64.py'] + +paramwise_cfg = 
dict(norm_decay_mult=0.0, bias_decay_mult=0.0) + +_base_.optim_wrapper = dict( + optimizer=dict( + type='SGD', lr=0.8, momentum=0.9, weight_decay=0.00004, nesterov=True), + paramwise_cfg=paramwise_cfg) + +max_epochs = 100 + +_base_.param_scheduler = [ + # warm up learning rate scheduler + dict( + type='LinearLR', + start_factor=0.25, + by_epoch=True, + begin=0, + end=3, + convert_to_iter_based=True), + # main learning rate scheduler + dict( + type='CosineAnnealingLR', + T_max=max_epochs, + eta_min=1e-5, + by_epoch=True, + begin=3, + end=max_epochs, + convert_to_iter_based=True), +] + +_base_.train_cfg = dict(by_epoch=True, max_epochs=max_epochs, val_interval=1) + +custom_hooks = None + +# model settings +model = dict( + _scope_='mmrazor', + type='sub_model', + cfg=_base_.supernet, + fix_subnet='configs/pruning/mmcls/dmcp/DMCP_MBV2_100M.json', + mode='mutator') + +default_hooks = _base_.default_hooks +default_hooks['checkpoint'] = dict(type='CheckpointHook', interval=5) + +_base_.model_wrapper_cfg = None + +randomness = dict(seed=4872, diff_rank_seed=True) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dmcp/dmcp_mbv2_supernet_32xb64.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dmcp/dmcp_mbv2_supernet_32xb64.py new file mode 100644 index 0000000000000000000000000000000000000000..4109964c4e562d953dbcf9ae9e520b93d1ba1df9 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dmcp/dmcp_mbv2_supernet_32xb64.py @@ -0,0 +1,61 @@ +_base_ = [ + 'mmcls::_base_/default_runtime.py', + '../../../_base_/settings/imagenet_bs2048_dmcp.py', +] + +# model settings +supernet = dict( + _scope_='mmcls', + type='ImageClassifier', + backbone=dict(type='MobileNetV2', widen_factor=1.0), + neck=dict(type='GlobalAveragePooling'), + head=dict( + type='LinearClsHead', + num_classes=1000, + in_channels=1280, + loss=dict( + type='mmcls.LabelSmoothLoss', + mode='original', + num_classes=1000, + label_smooth_val=0.1, + loss_weight=1.0), + 
topk=(1, 5), + )) + +model = dict( + _scope_='mmrazor', + type='DMCP', + architecture=supernet, + distiller=dict( + type='ConfigurableDistiller', + teacher_recorders=dict( + fc=dict(type='ModuleOutputs', source='head.fc')), + student_recorders=dict( + fc=dict(type='ModuleOutputs', source='head.fc')), + distill_losses=dict( + loss_kl=dict(type='KLDivergence', tau=1, loss_weight=1)), + loss_forward_mappings=dict( + loss_kl=dict( + preds_S=dict(recorder='fc', from_student=True), + preds_T=dict(recorder='fc', from_student=False)))), + mutator_cfg=dict( + type='DMCPChannelMutator', + channel_unit_cfg=dict( + type='DMCPChannelUnit', default_args=dict(choice_mode='number')), + parse_cfg=dict( + type='ChannelAnalyzer', + demo_input=(1, 3, 224, 224), + tracer_type='BackwardTracer')), + strategy=['max', 'min', 'scheduled_random', 'arch_random'], + arch_start_train=5000, + arch_train_freq=500, + flop_loss_weight=0.1, + distillation_times=10000, + target_flops=100) + +model_wrapper_cfg = dict( + type='mmrazor.DMCPDDP', + broadcast_buffers=False, + find_unused_parameters=True) + +randomness = dict(seed=0, diff_rank_seed=True) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dmcp/dmcp_resnet50_subnet_32xb64.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dmcp/dmcp_resnet50_subnet_32xb64.py new file mode 100644 index 0000000000000000000000000000000000000000..c612e3aa50c7b05a8539e9fd3ab8289a323d6217 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dmcp/dmcp_resnet50_subnet_32xb64.py @@ -0,0 +1,48 @@ +_base_ = ['dmcp_resnet50_supernet_32xb64.py'] + +paramwise_cfg = dict(norm_decay_mult=0.0, bias_decay_mult=0.0) + +_base_.optim_wrapper = dict( + optimizer=dict( + type='SGD', lr=0.8, momentum=0.9, weight_decay=0.0001, nesterov=True), + paramwise_cfg=paramwise_cfg) + +max_epochs = 100 + +_base_.param_scheduler = [ + # warm up learning rate scheduler + dict( + type='LinearLR', + start_factor=0.25, + by_epoch=True, + begin=0, + 
end=2, + convert_to_iter_based=True), + # main learning rate scheduler + dict( + type='CosineAnnealingLR', + T_max=max_epochs, + eta_min=1e-5, + by_epoch=True, + begin=2, + end=max_epochs, + convert_to_iter_based=True), +] + +_base_.train_cfg = dict(by_epoch=True, max_epochs=max_epochs, val_interval=1) + +custom_hooks = None + +model = dict( + _scope_='mmrazor', + type='sub_model', + cfg=_base_.supernet, + fix_subnet='configs/pruning/mmcls/dmcp/DMCP_R50_2G.json', + mode='mutator') + +default_hooks = _base_.default_hooks +default_hooks['checkpoint'] = dict(type='CheckpointHook', interval=5) + +_base_.model_wrapper_cfg = None + +randomness = dict(seed=2016, diff_rank_seed=True) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dmcp/dmcp_resnet50_supernet_32xb64.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dmcp/dmcp_resnet50_supernet_32xb64.py new file mode 100644 index 0000000000000000000000000000000000000000..9aeaeb838cc38cde3184b55862d76d61dd8c34e0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dmcp/dmcp_resnet50_supernet_32xb64.py @@ -0,0 +1,67 @@ +_base_ = [ + 'mmcls::_base_/default_runtime.py', + '../../../_base_/settings/imagenet_bs2048_dmcp.py', +] + +# model settings +supernet = dict( + _scope_='mmcls', + type='ImageClassifier', + backbone=dict( + type='ResNet', + depth=50, + num_stages=4, + out_indices=(3, ), + style='pytorch'), + neck=dict(type='GlobalAveragePooling'), + head=dict( + type='LinearClsHead', + num_classes=1000, + in_channels=2048, + loss=dict( + type='mmcls.LabelSmoothLoss', + mode='original', + num_classes=1000, + label_smooth_val=0.1, + loss_weight=1.0), + topk=(1, 5), + )) + +# model settings +model = dict( + _scope_='mmrazor', + type='DMCP', + architecture=supernet, + distiller=dict( + type='ConfigurableDistiller', + teacher_recorders=dict( + fc=dict(type='ModuleOutputs', source='head.fc')), + student_recorders=dict( + fc=dict(type='ModuleOutputs', source='head.fc')), + 
distill_losses=dict( + loss_kl=dict(type='KLDivergence', tau=1, loss_weight=1)), + loss_forward_mappings=dict( + loss_kl=dict( + preds_S=dict(recorder='fc', from_student=True), + preds_T=dict(recorder='fc', from_student=False)))), + mutator_cfg=dict( + type='DMCPChannelMutator', + channel_unit_cfg=dict( + type='DMCPChannelUnit', default_args=dict(choice_mode='number')), + parse_cfg=dict( + type='ChannelAnalyzer', + demo_input=(1, 3, 224, 224), + tracer_type='BackwardTracer')), + strategy=['max', 'min', 'scheduled_random', 'arch_random'], + arch_start_train=5000, + arch_train_freq=500, + flop_loss_weight=0.1, + distillation_times=10000, + target_flops=2000) + +model_wrapper_cfg = dict( + type='mmrazor.DMCPDDP', + broadcast_buffers=False, + find_unused_parameters=True) + +randomness = dict(seed=2020, diff_rank_seed=True) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dmcp/metafile.yml b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dmcp/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..131f5c2892bce867601f0c3ac6b8085e2efe3908 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/dmcp/metafile.yml @@ -0,0 +1,19 @@ +# Models: + # - Name: dmcp_resnet50_subnet_32xb64 + # In Collection: DMCP + # Config: configs/pruning/mmcls/dmcp/dmcp_resnet50_subnet_32xb64.py + # Weights: https://download.openmmlab.com/mmrazor/v1/pruning/dmcp/resnet50/2G/DMCP_R50_2G.pth + # Results: + # - Task: Image Classification + # Dataset: ImageNet-1k + # Metrics: + # Top 1 Accuracy: 76.11 + # - Name: dmcp_mbv2_subnet_32xb64 + # In Collection: DMCP + # Config: configs/pruning/mmcls/dmcp/dmcp_mbv2_subnet_32xb64.py + # Weights: https://download.openmmlab.com/mmrazor/v1/pruning/dmcp/mobilenetv2/100M/DMCP_MBV2_100M.pth + # Results: + # - Task: Image Classification + # Dataset: ImageNet-1k + # Metrics: + # Top 1 Accuracy: 67.22 diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/README.md 
b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/README.md new file mode 100644 index 0000000000000000000000000000000000000000..9b3b09936e3fa962e1cfdd8f74d6d9576900f9dc --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/README.md @@ -0,0 +1,11 @@ +# Group_fisher pruning + +> [Group Fisher Pruning for Practical Network Compression.](https://arxiv.org/pdf/2108.00708.pdf) + +## Abstract + +Network compression has been widely studied since it is able to reduce the memory and computation cost during inference. However, previous methods seldom deal with complicated structures like residual connections, group/depthwise convolution and feature pyramid network, where channels of multiple layers are coupled and need to be pruned simultaneously. In this paper, we present a general channel pruning approach that can be applied to various complicated structures. Particularly, we propose a layer grouping algorithm to find coupled channels automatically. Then we derive a unified metric based on Fisher information to evaluate the importance of a single channel and coupled channels. Moreover, we find that inference speedup on GPUs is more correlated with the reduction of memory rather than FLOPs, and thus we employ the memory reduction of each channel to normalize the importance. Our method can be used to prune any structures including those with coupled channels. We conduct extensive experiments on various backbones, including the classic ResNet and ResNeXt, mobilefriendly MobileNetV2, and the NAS-based RegNet, both on image classification and object detection which is under-explored. Experimental results validate that our method can effectively prune sophisticated networks, boosting inference speed without sacrificing accuracy. 
+ +![pipeline](https://github.com/jshilong/FisherPruning/blob/main/resources/structures.png?raw=true) + +**Please refer to the [full README](../../base/group_fisher/README.md) for more details.** diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/mobilenet/group_fisher_act_deploy_mobilenet-v2_8xb32_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/mobilenet/group_fisher_act_deploy_mobilenet-v2_8xb32_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..372f6a464cf174a333fa5b91ae7660b78d9eeb44 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/mobilenet/group_fisher_act_deploy_mobilenet-v2_8xb32_in1k.py @@ -0,0 +1,50 @@ +############################################################################# +"""You have to fill these args. + +_base_(str): The path to your pretrain config file. +fix_subnet (Union[dict,str]): The dict store the pruning structure or the + json file including it. +divisor (int): The divisor the make the channel number divisible. 
+""" + +_base_ = 'mmcls::mobilenet_v2/mobilenet-v2_8xb32_in1k.py' +fix_subnet = { + 'backbone.conv1.conv_(0, 32)_32': 21, + 'backbone.layer1.0.conv.1.conv_(0, 16)_16': 10, + 'backbone.layer2.0.conv.0.conv_(0, 96)_96': 45, + 'backbone.layer2.0.conv.2.conv_(0, 24)_24': 24, + 'backbone.layer2.1.conv.0.conv_(0, 144)_144': 73, + 'backbone.layer3.0.conv.0.conv_(0, 144)_144': 85, + 'backbone.layer3.0.conv.2.conv_(0, 32)_32': 32, + 'backbone.layer3.1.conv.0.conv_(0, 192)_192': 95, + 'backbone.layer3.2.conv.0.conv_(0, 192)_192': 76, + 'backbone.layer4.0.conv.0.conv_(0, 192)_192': 160, + 'backbone.layer4.0.conv.2.conv_(0, 64)_64': 64, + 'backbone.layer4.1.conv.0.conv_(0, 384)_384': 204, + 'backbone.layer4.2.conv.0.conv_(0, 384)_384': 200, + 'backbone.layer4.3.conv.0.conv_(0, 384)_384': 217, + 'backbone.layer5.0.conv.0.conv_(0, 384)_384': 344, + 'backbone.layer5.0.conv.2.conv_(0, 96)_96': 96, + 'backbone.layer5.1.conv.0.conv_(0, 576)_576': 348, + 'backbone.layer5.2.conv.0.conv_(0, 576)_576': 338, + 'backbone.layer6.0.conv.0.conv_(0, 576)_576': 543, + 'backbone.layer6.0.conv.2.conv_(0, 160)_160': 160, + 'backbone.layer6.1.conv.0.conv_(0, 960)_960': 810, + 'backbone.layer6.2.conv.0.conv_(0, 960)_960': 803, + 'backbone.layer7.0.conv.0.conv_(0, 960)_960': 944, + 'backbone.layer7.0.conv.2.conv_(0, 320)_320': 320 +} +divisor = 16 + +############################################################################## + +architecture = _base_.model + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='GroupFisherDeploySubModel', + architecture=architecture, + fix_subnet=fix_subnet, + divisor=divisor, +) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/mobilenet/group_fisher_act_finetune_mobilenet-v2_8xb32_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/mobilenet/group_fisher_act_finetune_mobilenet-v2_8xb32_in1k.py new file mode 100644 index 
0000000000000000000000000000000000000000..151e06103934f455d3471c908be6288fd388e745 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/mobilenet/group_fisher_act_finetune_mobilenet-v2_8xb32_in1k.py @@ -0,0 +1,31 @@ +############################################################################# +"""# You have to fill these args. + +_base_(str): The path to your pruning config file. +pruned_path (str): The path to the checkpoint of the pruned model. +finetune_lr (float): The lr rate to finetune. Usually, we directly use the lr + rate of the pretrain. +""" + +_base_ = './group_fisher_act_prune_mobilenet-v2_8xb32_in1k.py' +pruned_path = 'https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/mobilenet/act/group_fisher_act_prune_mobilenet-v2_8xb32_in1k.pth' # noqa +finetune_lr = 0.045 +############################################################################## +algorithm = _base_.model +algorithm.init_cfg = dict(type='Pretrained', checkpoint=pruned_path) + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='GroupFisherSubModel', + algorithm=algorithm, +) + +# restore lr +optim_wrapper = dict(optimizer=dict(lr=finetune_lr)) + +# remove pruning related hooks +custom_hooks = _base_.custom_hooks[:-2] + +# delete ddp +model_wrapper_cfg = None diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/mobilenet/group_fisher_act_prune_mobilenet-v2_8xb32_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/mobilenet/group_fisher_act_prune_mobilenet-v2_8xb32_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..f8ff9e0062c00564a45985c8793fade434ecc7d3 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/mobilenet/group_fisher_act_prune_mobilenet-v2_8xb32_in1k.py @@ -0,0 +1,75 @@ +############################################################################# +"""You have to fill these args. 
+ +_base_ (str): The path to your pretrained model checkpoint. +pretrained_path (str): The path to your pretrained model checkpoint. + +interval (int): Interval between pruning two channels. You should ensure you + can reach your target pruning ratio when the training ends. +normalization_type (str): GroupFisher uses two methods to normlized the channel + importance, including ['flops','act']. The former uses flops, while the + latter uses the memory occupation of activation feature maps. +lr_ratio (float): Ratio to decrease lr rate. As pruning progress is unstable, + you need to decrease the original lr rate until the pruning training work + steadly without getting nan. + +target_flop_ratio (float): The target flop ratio to prune your model. +input_shape (Tuple): input shape to measure the flops. +""" + +_base_ = 'mmcls::mobilenet_v2/mobilenet-v2_8xb32_in1k.py' +pretrained_path = 'https://download.openmmlab.com/mmclassification/v0/mobilenet_v2/mobilenet_v2_batch256_imagenet_20200708-3b2dc3af.pth' # noqa + +interval = 25 +normalization_type = 'act' +lr_ratio = 0.1125 + +target_flop_ratio = 0.65 +input_shape = (1, 3, 224, 224) +############################################################################## + +architecture = _base_.model + +if hasattr(_base_, 'data_preprocessor'): + architecture.update({'data_preprocessor': _base_.data_preprocessor}) + data_preprocessor = {} + +architecture.init_cfg = dict(type='Pretrained', checkpoint=pretrained_path) +architecture['_scope_'] = _base_.default_scope + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='GroupFisherAlgorithm', + architecture=architecture, + interval=interval, + mutator=dict( + type='GroupFisherChannelMutator', + parse_cfg=dict(type='ChannelAnalyzer', tracer_type='FxTracer'), + channel_unit_cfg=dict( + type='GroupFisherChannelUnit', + default_args=dict(normalization_type=normalization_type, ), + ), + ), +) + +model_wrapper_cfg = dict( + type='mmrazor.GroupFisherDDP', + broadcast_buffers=False, 
+) + +optim_wrapper = dict( + optimizer=dict(lr=_base_.optim_wrapper.optimizer.lr * lr_ratio)) + +custom_hooks = getattr(_base_, 'custom_hooks', []) + [ + dict(type='mmrazor.PruningStructureHook'), + dict( + type='mmrazor.ResourceInfoHook', + interval=interval, + demo_input=dict( + type='mmrazor.DefaultDemoInput', + input_shape=input_shape, + ), + save_ckpt_thr=[target_flop_ratio], + ), +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/mobilenet/group_fisher_flops_deploy_mobilenet-v2_8xb32_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/mobilenet/group_fisher_flops_deploy_mobilenet-v2_8xb32_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..f8ff9e0062c00564a45985c8793fade434ecc7d3 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/mobilenet/group_fisher_flops_deploy_mobilenet-v2_8xb32_in1k.py @@ -0,0 +1,49 @@ +############################################################################# +"""You have to fill these args. + +_base_(str): The path to your pretrain config file. +fix_subnet (Union[dict,str]): The dict that stores the pruning structure or the + json file including it. +divisor (int): The divisor to make the channel number divisible. 
+""" + +_base_ = 'mmcls::mobilenet_v2/mobilenet-v2_8xb32_in1k.py' +fix_subnet = { + 'backbone.conv1.conv_(0, 32)_32': 27, + 'backbone.layer1.0.conv.1.conv_(0, 16)_16': 16, + 'backbone.layer2.0.conv.0.conv_(0, 96)_96': 77, + 'backbone.layer2.0.conv.2.conv_(0, 24)_24': 24, + 'backbone.layer2.1.conv.0.conv_(0, 144)_144': 85, + 'backbone.layer3.0.conv.0.conv_(0, 144)_144': 115, + 'backbone.layer3.0.conv.2.conv_(0, 32)_32': 32, + 'backbone.layer3.1.conv.0.conv_(0, 192)_192': 102, + 'backbone.layer3.2.conv.0.conv_(0, 192)_192': 95, + 'backbone.layer4.0.conv.0.conv_(0, 192)_192': 181, + 'backbone.layer4.0.conv.2.conv_(0, 64)_64': 64, + 'backbone.layer4.1.conv.0.conv_(0, 384)_384': 169, + 'backbone.layer4.2.conv.0.conv_(0, 384)_384': 176, + 'backbone.layer4.3.conv.0.conv_(0, 384)_384': 180, + 'backbone.layer5.0.conv.0.conv_(0, 384)_384': 308, + 'backbone.layer5.0.conv.2.conv_(0, 96)_96': 96, + 'backbone.layer5.1.conv.0.conv_(0, 576)_576': 223, + 'backbone.layer5.2.conv.0.conv_(0, 576)_576': 241, + 'backbone.layer6.0.conv.0.conv_(0, 576)_576': 511, + 'backbone.layer6.0.conv.2.conv_(0, 160)_160': 160, + 'backbone.layer6.1.conv.0.conv_(0, 960)_960': 467, + 'backbone.layer6.2.conv.0.conv_(0, 960)_960': 510, + 'backbone.layer7.0.conv.0.conv_(0, 960)_960': 771, + 'backbone.layer7.0.conv.2.conv_(0, 320)_320': 320 +} +divisor = 16 + +############################################################################## +architecture = _base_.model + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='GroupFisherDeploySubModel', + architecture=architecture, + fix_subnet=fix_subnet, + divisor=divisor, +) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/mobilenet/group_fisher_flops_finetune_mobilenet-v2_8xb32_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/mobilenet/group_fisher_flops_finetune_mobilenet-v2_8xb32_in1k.py new file mode 100644 index 
0000000000000000000000000000000000000000..18c9a99f1292f155c08bdf64a62a4881c64541c8 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/mobilenet/group_fisher_flops_finetune_mobilenet-v2_8xb32_in1k.py @@ -0,0 +1,32 @@ +############################################################################# +"""# You have to fill these args. + +_base_(str): The path to your pruning config file. +pruned_path (str): The path to the checkpoint of the pruned model. +finetune_lr (float): The lr rate to finetune. Usually, we directly use the lr + rate of the pretrain. +""" + +_base_ = './group_fisher_flops_prune_mobilenet-v2_8xb32_in1k.py' +pruned_path = 'https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/mobilenet/flop/group_fisher_flops_prune_mobilenet-v2_8xb32_in1k.pth' # noqa +finetune_lr = 0.045 +############################################################################## + +algorithm = _base_.model +algorithm.init_cfg = dict(type='Pretrained', checkpoint=pruned_path) + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='GroupFisherSubModel', + algorithm=algorithm, +) + +# restore lr +optim_wrapper = dict(optimizer=dict(lr=finetune_lr)) + +# remove pruning related hooks +custom_hooks = _base_.custom_hooks[:-2] + +# delete ddp +model_wrapper_cfg = None diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/mobilenet/group_fisher_flops_prune_mobilenet-v2_8xb32_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/mobilenet/group_fisher_flops_prune_mobilenet-v2_8xb32_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..65a1fdd202af77707dde7ecaca378e054e453442 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/mobilenet/group_fisher_flops_prune_mobilenet-v2_8xb32_in1k.py @@ -0,0 +1,5 @@ +_base_ = './group_fisher_act_prune_mobilenet-v2_8xb32_in1k.py' +model = dict( + mutator=dict( + channel_unit_cfg=dict( + 
default_args=dict(normalization_type='flops', ), ), ), ) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/mobilenet/metafile.yml b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/mobilenet/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..24f41eaae811c2082dbfdc935b2d77906f5c35ac --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/mobilenet/metafile.yml @@ -0,0 +1,19 @@ +Models: + - Name: group_fisher_act_finetune_mobilenet-v2_8xb32_in1k + In Collection: GroupFisher + Config: configs/pruning/mmcls/group_fisher/mobilenet/group_fisher_act_finetune_mobilenet-v2_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/mobilenet/act/group_fisher_act_finetune_mobilenet-v2_8xb32_in1k.pth + Results: + - Task: Image Classification + Dataset: ImageNet-1k + Metrics: + Top 1 Accuracy: 70.82 + - Name: group_fisher_flops_finetune_mobilenet-v2_8xb32_in1k + In Collection: GroupFisher + Config: configs/pruning/mmcls/group_fisher/mobilenet/group_fisher_flops_finetune_mobilenet-v2_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/mobilenet/flop/group_fisher_flops_finetune_mobilenet-v2_8xb32_in1k.pth + Results: + - Task: Image Classification + Dataset: ImageNet-1k + Metrics: + Top 1 Accuracy: 70.87 diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/mobilenet/script.sh b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/mobilenet/script.sh new file mode 100644 index 0000000000000000000000000000000000000000..5281e5ac81c5b7a68c7cb232ce56c66b1bd3e54a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/mobilenet/script.sh @@ -0,0 +1,47 @@ +# act mode +bash ./tools/dist_train.sh configs/pruning/mmcls/group_fisher/mobilenet/group_fisher_act_prune_mobilenet-v2_8xb32_in1k.py 8 +bash ./tools/dist_train.sh 
configs/pruning/mmcls/group_fisher/mobilenet/group_fisher_act_finetune_mobilenet-v2_8xb32_in1k.py 8 + +# flops mode +bash ./tools/dist_train.sh configs/pruning/mmcls/group_fisher/mobilenet/group_fisher_flops_prune_mobilenet-v2_8xb32_in1k.py 8 +bash ./tools/dist_train.sh configs/pruning/mmcls/group_fisher/mobilenet/group_fisher_flops_finetune_mobilenet-v2_8xb32_in1k.py 8 + + +# deploy act mode +razor_config=configs/pruning/mmcls/group_fisher/mobilenet/group_fisher_act_deploy_mobilenet-v2_8xb32_in1k.py +deploy_config=mmdeploy/configs/mmcls/classification_onnxruntime_dynamic.py + +python mmdeploy/tools/deploy.py $deploy_config \ + $razor_config \ + https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/mobilenet/act/group_fisher_act_finetune_mobilenet-v2_8xb32_in1k.pth \ + mmdeploy/tests/data/tiger.jpeg \ + --work-dir ./work_dirs/mmdeploy + +python mmdeploy/tools/profiler.py $deploy_config \ + $razor_config \ + mmdeploy/demo/resources \ + --model ./work_dirs/mmdeploy/end2end.onnx \ + --shape 224x224 \ + --device cpu \ + --num-iter 1000 \ + --warmup 100 + +# deploy flop mode + +razor_config=configs/pruning/mmcls/group_fisher/mobilenet/group_fisher_flops_deploy_mobilenet-v2_8xb32_in1k.py +deploy_config=mmdeploy/configs/mmcls/classification_onnxruntime_dynamic.py + +python mmdeploy/tools/deploy.py $deploy_config \ + $razor_config \ + https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/mobilenet/flop/group_fisher_flops_finetune_mobilenet-v2_8xb32_in1k.pth \ + mmdeploy/tests/data/tiger.jpeg \ + --work-dir ./work_dirs/mmdeploy + +python mmdeploy/tools/profiler.py $deploy_config \ + $razor_config \ + mmdeploy/demo/resources \ + --model ./work_dirs/mmdeploy/end2end.onnx \ + --shape 224x224 \ + --device cpu \ + --num-iter 1000 \ + --warmup 100 diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/resnet50/group_fisher_act_deploy_resnet50_8xb32_in1k.py 
b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/resnet50/group_fisher_act_deploy_resnet50_8xb32_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..8fcb4082a1460d50ea1ea83c31ee6c6d47f13b7d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/resnet50/group_fisher_act_deploy_resnet50_8xb32_in1k.py @@ -0,0 +1,61 @@ +############################################################################# +"""You have to fill these args. + +_base_(str): The path to your pretrain config file. +fix_subnet (Union[dict,str]): The dict store the pruning structure or the + json file including it. +divisor (int): The divisor the make the channel number divisible. +""" + +_base_ = 'mmcls::resnet/resnet50_8xb32_in1k.py' +fix_subnet = { + 'backbone.conv1_(0, 64)_64': 61, + 'backbone.layer1.0.conv1_(0, 64)_64': 27, + 'backbone.layer1.0.conv2_(0, 64)_64': 35, + 'backbone.layer1.0.conv3_(0, 256)_256': 241, + 'backbone.layer1.1.conv1_(0, 64)_64': 32, + 'backbone.layer1.1.conv2_(0, 64)_64': 29, + 'backbone.layer1.2.conv1_(0, 64)_64': 27, + 'backbone.layer1.2.conv2_(0, 64)_64': 42, + 'backbone.layer2.0.conv1_(0, 128)_128': 87, + 'backbone.layer2.0.conv2_(0, 128)_128': 107, + 'backbone.layer2.0.conv3_(0, 512)_512': 512, + 'backbone.layer2.1.conv1_(0, 128)_128': 44, + 'backbone.layer2.1.conv2_(0, 128)_128': 50, + 'backbone.layer2.2.conv1_(0, 128)_128': 52, + 'backbone.layer2.2.conv2_(0, 128)_128': 81, + 'backbone.layer2.3.conv1_(0, 128)_128': 47, + 'backbone.layer2.3.conv2_(0, 128)_128': 50, + 'backbone.layer3.0.conv1_(0, 256)_256': 210, + 'backbone.layer3.0.conv2_(0, 256)_256': 206, + 'backbone.layer3.0.conv3_(0, 1024)_1024': 1024, + 'backbone.layer3.1.conv1_(0, 256)_256': 107, + 'backbone.layer3.1.conv2_(0, 256)_256': 108, + 'backbone.layer3.2.conv1_(0, 256)_256': 86, + 'backbone.layer3.2.conv2_(0, 256)_256': 126, + 'backbone.layer3.3.conv1_(0, 256)_256': 91, + 'backbone.layer3.3.conv2_(0, 256)_256': 112, + 
'backbone.layer3.4.conv1_(0, 256)_256': 98, + 'backbone.layer3.4.conv2_(0, 256)_256': 110, + 'backbone.layer3.5.conv1_(0, 256)_256': 112, + 'backbone.layer3.5.conv2_(0, 256)_256': 115, + 'backbone.layer4.0.conv1_(0, 512)_512': 397, + 'backbone.layer4.0.conv2_(0, 512)_512': 427, + 'backbone.layer4.1.conv1_(0, 512)_512': 373, + 'backbone.layer4.1.conv2_(0, 512)_512': 348, + 'backbone.layer4.2.conv1_(0, 512)_512': 433, + 'backbone.layer4.2.conv2_(0, 512)_512': 384 +} +divisor = 8 +############################################################################## + +architecture = _base_.model + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='GroupFisherDeploySubModel', + architecture=architecture, + fix_subnet=fix_subnet, + divisor=divisor, +) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/resnet50/group_fisher_act_finetune_resnet50_8xb32_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/resnet50/group_fisher_act_finetune_resnet50_8xb32_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..5d1f9380c5e328cbf71595f331c9f4b540916430 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/resnet50/group_fisher_act_finetune_resnet50_8xb32_in1k.py @@ -0,0 +1,31 @@ +############################################################################# +"""# You have to fill these args. + +_base_(str): The path to your pruning config file. +pruned_path (str): The path to the checkpoint of the pruned model. +finetune_lr (float): The lr rate to finetune. Usually, we directly use the lr + rate of the pretrain. 
+""" + +_base_ = './group_fisher_act_prune_resnet50_8xb32_in1k.py' +pruned_path = 'https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/resnet50/act/group_fisher_act_prune_resnet50_8xb32_in1k.pth' # noqa +finetune_lr = 0.1 +############################################################################## +algorithm = _base_.model +algorithm.init_cfg = dict(type='Pretrained', checkpoint=pruned_path) + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='GroupFisherSubModel', + algorithm=algorithm, +) + +# restore lr +optim_wrapper = dict(optimizer=dict(lr=finetune_lr)) + +# remove pruning related hooks +custom_hooks = _base_.custom_hooks[:-2] + +# delete ddp +model_wrapper_cfg = None diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/resnet50/group_fisher_act_finetune_resnet50_8xb32_in1k_dist.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/resnet50/group_fisher_act_finetune_resnet50_8xb32_in1k_dist.py new file mode 100644 index 0000000000000000000000000000000000000000..356c8993796da7b50cfbf3bb7b6dd14bb53c4ac0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/resnet50/group_fisher_act_finetune_resnet50_8xb32_in1k_dist.py @@ -0,0 +1,61 @@ +############################################################################# +"""# You have to fill these args. + +_base_(str): The path to your pruning config file. +pruned_path (str): The path to the checkpoint of the pruned model. +finetune_lr (float): The lr rate to finetune. Usually, we directly use the lr + rate of the pretrain. 
+""" + +_base_ = './group_fisher_act_prune_resnet50_8xb32_in1k.py' +pruned_path = 'https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/resnet50/act/group_fisher_act_prune_resnet50_8xb32_in1k.pth' # noqa +finetune_lr = 0.1 +############################################################################## + +algorithm = _base_.model +algorithm.init_cfg = dict(type='Pretrained', checkpoint=pruned_path) + +teacher = algorithm.architecture + +pruned = dict( + _delete_=True, + _scope_='mmrazor', + type='GroupFisherSubModel', + algorithm=algorithm, +) + +model = dict( + _scope_='mmrazor', + _delete_=True, + type='SingleTeacherDistill', + data_preprocessor=None, + architecture=pruned, + teacher=teacher, + distiller=dict( + type='ConfigurableDistiller', + student_recorders=dict( + fc=dict(type='ModuleOutputs', source='head.fc')), + teacher_recorders=dict( + fc=dict(type='ModuleOutputs', source='head.fc')), + distill_losses=dict( + loss_kl=dict(type='DISTLoss', tau=1, loss_weight=1)), + loss_forward_mappings=dict( + loss_kl=dict( + preds_S=dict(from_student=True, recorder='fc'), + preds_T=dict(from_student=False, recorder='fc'))))) + +find_unused_parameters = True +val_cfg = dict(_delete_=True, type='mmrazor.SingleTeacherDistillValLoop') + +# restore lr +optim_wrapper = dict(optimizer=dict(lr=finetune_lr)) + +# remove pruning related hooks +custom_hooks = _base_.custom_hooks[:-2] + +# delete ddp +model_wrapper_cfg = None + +# NOTE(review): stray duplicate _base_ would override the config base above; disabled. +# _base_ = './resnet_group_fisher_prune.py' + +# 76.3520 diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/resnet50/group_fisher_act_prune_resnet50_8xb32_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/resnet50/group_fisher_act_prune_resnet50_8xb32_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..454c0cc8e6da3d7f46feccdbcf92ea7d081e7266 --- /dev/null +++ 
b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/resnet50/group_fisher_act_prune_resnet50_8xb32_in1k.py @@ -0,0 +1,75 @@ +############################################################################# +"""You have to fill these args. + +_base_ (str): The path to your pretrained model checkpoint. +pretrained_path (str): The path to your pretrained model checkpoint. + +interval (int): Interval between pruning two channels. You should ensure you + can reach your target pruning ratio when the training ends. +normalization_type (str): GroupFisher uses two methods to normalize the channel + importance, including ['flops','act']. The former uses flops, while the + latter uses the memory occupation of activation feature maps. +lr_ratio (float): Ratio to decrease lr rate. As pruning progress is unstable, + you need to decrease the original lr rate until the pruning training works + steadily without getting nan. + +target_flop_ratio (float): The target flop ratio to prune your model. +input_shape (Tuple): input shape to measure the flops. 
+""" + +_base_ = 'mmcls::resnet/resnet50_8xb32_in1k.py' +pretrained_path = 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth' # noqa + +interval = 25 +normalization_type = 'act' +lr_ratio = 0.04 + +target_flop_ratio = 0.5 +input_shape = [1, 3, 224, 224] +############################################################################## + +architecture = _base_.model + +if hasattr(_base_, 'data_preprocessor'): + architecture.update({'data_preprocessor': _base_.data_preprocessor}) + data_preprocessor = {} + +architecture.init_cfg = dict(type='Pretrained', checkpoint=pretrained_path) +architecture['_scope_'] = _base_.default_scope + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='GroupFisherAlgorithm', + architecture=architecture, + interval=interval, + mutator=dict( + type='GroupFisherChannelMutator', + parse_cfg=dict(type='ChannelAnalyzer', tracer_type='FxTracer'), + channel_unit_cfg=dict( + type='GroupFisherChannelUnit', + default_args=dict(normalization_type=normalization_type, ), + ), + ), +) + +model_wrapper_cfg = dict( + type='mmrazor.GroupFisherDDP', + broadcast_buffers=False, +) + +optim_wrapper = dict( + optimizer=dict(lr=_base_.optim_wrapper.optimizer.lr * lr_ratio)) + +custom_hooks = getattr(_base_, 'custom_hooks', []) + [ + dict(type='mmrazor.PruningStructureHook'), + dict( + type='mmrazor.ResourceInfoHook', + interval=interval, + demo_input=dict( + type='mmrazor.DefaultDemoInput', + input_shape=input_shape, + ), + save_ckpt_thr=[target_flop_ratio], + ), +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/resnet50/group_fisher_flops_deploy_resnet50_8xb32_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/resnet50/group_fisher_flops_deploy_resnet50_8xb32_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..7d28a0ab510ab8d9b4beaea558ac3f67e0e05f42 --- /dev/null +++ 
b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/resnet50/group_fisher_flops_deploy_resnet50_8xb32_in1k.py @@ -0,0 +1,61 @@ +############################################################################# +"""You have to fill these args. + +_base_(str): The path to your pretrain config file. +fix_subnet (Union[dict,str]): The dict store the pruning structure or the + json file including it. +divisor (int): The divisor the make the channel number divisible. +""" + +_base_ = 'mmcls::resnet/resnet50_8xb32_in1k.py' +fix_subnet = { + 'backbone.conv1_(0, 64)_64': 61, + 'backbone.layer1.0.conv1_(0, 64)_64': 28, + 'backbone.layer1.0.conv2_(0, 64)_64': 35, + 'backbone.layer1.0.conv3_(0, 256)_256': 242, + 'backbone.layer1.1.conv1_(0, 64)_64': 31, + 'backbone.layer1.1.conv2_(0, 64)_64': 28, + 'backbone.layer1.2.conv1_(0, 64)_64': 26, + 'backbone.layer1.2.conv2_(0, 64)_64': 41, + 'backbone.layer2.0.conv1_(0, 128)_128': 90, + 'backbone.layer2.0.conv2_(0, 128)_128': 107, + 'backbone.layer2.0.conv3_(0, 512)_512': 509, + 'backbone.layer2.1.conv1_(0, 128)_128': 42, + 'backbone.layer2.1.conv2_(0, 128)_128': 50, + 'backbone.layer2.2.conv1_(0, 128)_128': 51, + 'backbone.layer2.2.conv2_(0, 128)_128': 84, + 'backbone.layer2.3.conv1_(0, 128)_128': 49, + 'backbone.layer2.3.conv2_(0, 128)_128': 51, + 'backbone.layer3.0.conv1_(0, 256)_256': 210, + 'backbone.layer3.0.conv2_(0, 256)_256': 207, + 'backbone.layer3.0.conv3_(0, 1024)_1024': 1024, + 'backbone.layer3.1.conv1_(0, 256)_256': 103, + 'backbone.layer3.1.conv2_(0, 256)_256': 108, + 'backbone.layer3.2.conv1_(0, 256)_256': 90, + 'backbone.layer3.2.conv2_(0, 256)_256': 124, + 'backbone.layer3.3.conv1_(0, 256)_256': 94, + 'backbone.layer3.3.conv2_(0, 256)_256': 114, + 'backbone.layer3.4.conv1_(0, 256)_256': 99, + 'backbone.layer3.4.conv2_(0, 256)_256': 111, + 'backbone.layer3.5.conv1_(0, 256)_256': 108, + 'backbone.layer3.5.conv2_(0, 256)_256': 111, + 'backbone.layer4.0.conv1_(0, 512)_512': 400, + 
'backbone.layer4.0.conv2_(0, 512)_512': 421, + 'backbone.layer4.1.conv1_(0, 512)_512': 377, + 'backbone.layer4.1.conv2_(0, 512)_512': 347, + 'backbone.layer4.2.conv1_(0, 512)_512': 443, + 'backbone.layer4.2.conv2_(0, 512)_512': 376 +} +divisor = 16 +############################################################################## + +architecture = _base_.model + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='GroupFisherDeploySubModel', + architecture=architecture, + fix_subnet=fix_subnet, + divisor=divisor, +) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/resnet50/group_fisher_flops_finetune_resnet50_8xb32_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/resnet50/group_fisher_flops_finetune_resnet50_8xb32_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..b05be267607cc453ac155534cff4d4227c57d74d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/resnet50/group_fisher_flops_finetune_resnet50_8xb32_in1k.py @@ -0,0 +1,31 @@ +############################################################################# +"""# You have to fill these args. + +_base_(str): The path to your pruning config file. +pruned_path (str): The path to the checkpoint of the pruned model. +finetune_lr (float): The lr rate to finetune. Usually, we directly use the lr + rate of the pretrain. 
+""" + +_base_ = './group_fisher_flops_prune_resnet50_8xb32_in1k.py' +pruned_path = 'https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/resnet50/flops/group_fisher_flops_prune_resnet50_8xb32_in1k.pth' # noqa +finetune_lr = 0.1 +############################################################################## +algorithm = _base_.model +algorithm.init_cfg = dict(type='Pretrained', checkpoint=pruned_path) + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='GroupFisherSubModel', + algorithm=algorithm, +) + +# restore lr +optim_wrapper = dict(optimizer=dict(lr=finetune_lr)) + +# remove pruning related hooks +custom_hooks = _base_.custom_hooks[:-2] + +# delete ddp +model_wrapper_cfg = None diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/resnet50/group_fisher_flops_prune_resnet50_8xb32_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/resnet50/group_fisher_flops_prune_resnet50_8xb32_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..06b90bda07a121c73b701e1a08c8107da29da0ef --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/resnet50/group_fisher_flops_prune_resnet50_8xb32_in1k.py @@ -0,0 +1,5 @@ +_base_ = './group_fisher_act_prune_resnet50_8xb32_in1k.py' +model = dict( + mutator=dict( + channel_unit_cfg=dict( + default_args=dict(normalization_type='flops', ), ), ), ) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/resnet50/metafile.yml b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/resnet50/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..fd670a3c26eae09d398f602fe787309ec8210cc0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/resnet50/metafile.yml @@ -0,0 +1,19 @@ +Models: + - Name: group_fisher_act_finetune_resnet50_8xb32_in1k + In Collection: GroupFisher + Config: 
configs/pruning/mmcls/group_fisher/resnet50/group_fisher_act_finetune_resnet50_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/resnet50/act/group_fisher_act_finetune_resnet50_8xb32_in1k.pth + Results: + - Task: Image Classification + Dataset: ImageNet-1k + Metrics: + Top 1 Accuracy: 75.22 + - Name: group_fisher_flops_finetune_resnet50_8xb32_in1k + In Collection: GroupFisher + Config: configs/pruning/mmcls/group_fisher/resnet50/group_fisher_flops_finetune_resnet50_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/resnet50/flops/group_fisher_flops_finetune_resnet50_8xb32_in1k.pth + Results: + - Task: Image Classification + Dataset: ImageNet-1k + Metrics: + Top 1 Accuracy: 75.61 diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/resnet50/script.sh b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/resnet50/script.sh new file mode 100644 index 0000000000000000000000000000000000000000..59ec7c5677fb5b10bc6138dfd0383ea85954daca --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/group_fisher/resnet50/script.sh @@ -0,0 +1,49 @@ +# act mode +bash ./tools/dist_train.sh configs/pruning/mmcls/group_fisher/resnet50/group_fisher_act_prune_resnet50_8xb32_in1k.py 8 +bash ./tools/dist_train.sh configs/pruning/mmcls/group_fisher/resnet50/group_fisher_act_finetune_resnet50_8xb32_in1k.py 8 + +# flops mode +bash ./tools/dist_train.sh configs/pruning/mmcls/group_fisher/resnet50/group_fisher_flops_prune_resnet50_8xb32_in1k.py 8 +bash ./tools/dist_train.sh configs/pruning/mmcls/group_fisher/resnet50/group_fisher_flops_finetune_resnet50_8xb32_in1k.py 8 + + +# deploy act mode + +razor_config=configs/pruning/mmcls/group_fisher/resnet50/group_fisher_act_deploy_resnet50_8xb32_in1k.py +deploy_config=mmdeploy/configs/mmcls/classification_onnxruntime_dynamic.py + +python mmdeploy/tools/deploy.py $deploy_config \ + $razor_config \ + 
https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/resnet50/act/group_fisher_act_finetune_resnet50_8xb32_in1k.pth \ + mmdeploy/tests/data/tiger.jpeg \ + --work-dir ./work_dirs/mmdeploy + +python mmdeploy/tools/profiler.py $deploy_config \ + $razor_config \ + mmdeploy/demo/resources \ + --model ./work_dirs/mmdeploy/end2end.onnx \ + --shape 224x224 \ + --device cpu \ + --num-iter 1000 \ + --warmup 100 + + +# deploy flops mode + +razor_config=configs/pruning/mmcls/group_fisher/resnet50/group_fisher_flops_deploy_resnet50_8xb32_in1k.py +deploy_config=mmdeploy/configs/mmcls/classification_onnxruntime_dynamic.py + +python mmdeploy/tools/deploy.py $deploy_config \ + $razor_config \ + https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/resnet50/flops/group_fisher_flops_finetune_resnet50_8xb32_in1k.pth \ + mmdeploy/tests/data/tiger.jpeg \ + --work-dir ./work_dirs/mmdeploy + +python mmdeploy/tools/profiler.py $deploy_config \ + $razor_config \ + mmdeploy/demo/resources \ + --model ./work_dirs/mmdeploy/end2end.onnx \ + --shape 224x224 \ + --device cpu \ + --num-iter 1000 \ + --warmup 100 diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/l1-norm/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/l1-norm/README.md new file mode 100644 index 0000000000000000000000000000000000000000..2b6509298610b823bbbc4df8311d1fc62fdc8d42 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/l1-norm/README.md @@ -0,0 +1,61 @@ +# L1-norm pruning + +> [Pruning Filters for Efficient ConvNets.](https://arxiv.org/pdf/1608.08710.pdf) + + + +## Implementation + +L1-norm pruning is a classical filter pruning algorithm. It prunes filers(channels) according to the l1-norm of the weight of a conv layer. + +We use ItePruneAlgorithm and L1MutableChannelUnit to implement l1-norm pruning. Please refer to [Pruning User Guide](../../../../docs/en/user_guides/pruning_user_guide.md) for more configuration detail. 
+ +| Model | Top-1 | Gap | Flop(G) | Pruned | Parameters | Pruned | Config | Download | +| ----------------- | ----- | ----- | ------- | ------ | ---------- | ------ | ------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| ResNet34 | 73.62 | - | 3.68 | - | 2.18 | - | [mmcls](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/resnet/resnet34_8xb32_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/resnet/resnet34_8xb32_in1k_20210831-f257d4e6.pth) \| [log](https://download.openmmlab.com/mmclassification/v0/resnet/resnet34_8xb32_in1k_20210831-f257d4e6.log.json) | +| ResNet34_Pruned_A | 73.61 | -0.01 | 3.10 | 15.8% | 2.01 | 7.8% | [config](./l1-norm_resnet34_8xb32_in1k_a.py) | [model](https://openmmlab-share.oss-cn-hangzhou.aliyuncs.com/mmrazor/v1/pruning/l1-norm/l1-norm_resnet34_8xb32_in1k_a.pth) \| [log](https://openmmlab-share.oss-cn-hangzhou.aliyuncs.com/mmrazor/v1/pruning/l1-norm/l1-norm_resnet34_8xb32_in1k_a.json) | +| ResNet34_Pruned_B | 73.20 | -0.42 | 2.79 | 24.2% | 1.95 | 10.6% | [config](./l1-norm_resnet34_8xb32_in1k_a.py) | [model](https://openmmlab-share.oss-cn-hangzhou.aliyuncs.com/mmrazor/v1/pruning/l1-norm/l1-norm_resnet34_8xb32_in1k_b.pth) \| [log](https://openmmlab-share.oss-cn-hangzhou.aliyuncs.com/mmrazor/v1/pruning/l1-norm/l1-norm_resnet34_8xb32_in1k_b.json) | +| ResNet34_Pruned_C | 73.89 | +0.27 | 3.40 | 7.6% | 2.02 | 7.3% | [config](./l1-norm_resnet34_8xb32_in1k_a.py) | [model](https://openmmlab-share.oss-cn-hangzhou.aliyuncs.com/mmrazor/v1/pruning/l1-norm/l1-norm_resnet34_8xb32_in1k_c.pth) \| [log](https://openmmlab-share.oss-cn-hangzhou.aliyuncs.com/mmrazor/v1/pruning/l1-norm/l1-norm_resnet34_8xb32_in1k_c.json) | + 
+**Note:** There is a different implementation from the original paper. We pruned the layers related to the shortcut with a shared pruning decision, while the original paper pruned them separately in *Pruned C*. This may be why our *Pruned C* outperforms *Prune A* and *Prune B*, while *Pruned C* is worst in the original paper. + +## Getting Started + +### Prune + +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 PORT=29500 ./tools/dist_train.sh \ + {prune_config_path}.py 8 --work-dir $WORK_DIR +``` + +After the pruning process, you can get a checkpoint file in the work_dir. This checkpoint file includes all parameters of the original model. In the next step, we will use the checkpoint to export a pruned checkpoint. + +### Get the pruned model + +```bash +python ./tools/pruning/get_static_model_from_algorithm.py \ + {prune_config_path}.py \ + {checkpoint_file}.pth \ + --o {output_folder} +``` + +This step will export a pruned checkpoint and a json file which records the pruning structure. These two files will be used to deploy the pruned model. + +### Deploy + +For a pruned model, you only need to use the pruning deploy config instead of the pretrain config to deploy the pruned version of your model. If you are not familiar with MMDeploy, please refer to [mmdeploy](https://github.com/open-mmlab/mmdeploy/tree/1.x). 
+ +```bash +python {mmdeploy}/tools/deploy.py \ + {mmdeploy}/{mmdeploy_config}.py \ + {pruning_deploy_config}.py \ + {pruned_checkpoint}.pth \ + {mmdeploy}/tests/data/tiger.jpeg +``` + +### Get the Flops and Parameters of a Pruned Model + +```bash +python ./tools/pruning/get_flops.py \ + {pruning_deploy_config}.py +``` diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/l1-norm/l1-norm_resnet34_8xb32_in1k_a.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/l1-norm/l1-norm_resnet34_8xb32_in1k_a.py new file mode 100644 index 0000000000000000000000000000000000000000..25a92fd36163061908373bffbaa8af7851d1eaf9 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/l1-norm/l1-norm_resnet34_8xb32_in1k_a.py @@ -0,0 +1,60 @@ +_base_ = ['mmcls::resnet/resnet34_8xb32_in1k.py'] + +un_prune = 1.0 +stage_ratio_1 = 0.7 +stage_ratio_2 = 0.7 +stage_ratio_3 = 0.7 +stage_ratio_4 = un_prune + +# the config template of target_pruning_ratio can be got by +# python ./tools/get_channel_units.py {config_file} --choice +target_pruning_ratio = { + # stage 1 + 'backbone.conv1_(0, 64)_64': un_prune, # short cut layers + 'backbone.layer1.0.conv1_(0, 64)_64': stage_ratio_1, + 'backbone.layer1.1.conv1_(0, 64)_64': stage_ratio_1, + 'backbone.layer1.2.conv1_(0, 64)_64': un_prune, + # stage 2 + 'backbone.layer2.0.conv1_(0, 128)_128': un_prune, + 'backbone.layer2.0.conv2_(0, 128)_128': un_prune, # short cut layers + 'backbone.layer2.1.conv1_(0, 128)_128': stage_ratio_2, + 'backbone.layer2.2.conv1_(0, 128)_128': stage_ratio_2, + 'backbone.layer2.3.conv1_(0, 128)_128': un_prune, + # stage 3 + 'backbone.layer3.0.conv1_(0, 256)_256': un_prune, + 'backbone.layer3.0.conv2_(0, 256)_256': un_prune, # short cut layers + 'backbone.layer3.1.conv1_(0, 256)_256': stage_ratio_3, + 'backbone.layer3.2.conv1_(0, 256)_256': stage_ratio_3, + 'backbone.layer3.3.conv1_(0, 256)_256': stage_ratio_3, + 'backbone.layer3.4.conv1_(0, 256)_256': stage_ratio_3, + 
'backbone.layer3.5.conv1_(0, 256)_256': un_prune, + # stage 4 + 'backbone.layer4.0.conv1_(0, 512)_512': stage_ratio_4, + 'backbone.layer4.0.conv2_(0, 512)_512': un_prune, # short cut layers + 'backbone.layer4.1.conv1_(0, 512)_512': stage_ratio_4, + 'backbone.layer4.2.conv1_(0, 512)_512': stage_ratio_4 +} +data_preprocessor = {'type': 'mmcls.ClsDataPreprocessor'} +architecture = _base_.model +architecture.update({ + 'init_cfg': { + 'type': + 'Pretrained', + 'checkpoint': + 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet34_8xb32_in1k_20210831-f257d4e6.pth' # noqa + } +}) + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='ItePruneAlgorithm', + architecture=architecture, + mutator_cfg=dict( + type='ChannelMutator', + channel_unit_cfg=dict( + type='L1MutableChannelUnit', + default_args=dict(choice_mode='ratio'))), + target_pruning_ratio=target_pruning_ratio, + step_freq=1, +) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/l1-norm/l1-norm_resnet34_8xb32_in1k_a_deploy.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/l1-norm/l1-norm_resnet34_8xb32_in1k_a_deploy.py new file mode 100644 index 0000000000000000000000000000000000000000..c754d11fce558d922263e634f2a5d173fef64991 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/l1-norm/l1-norm_resnet34_8xb32_in1k_a_deploy.py @@ -0,0 +1,57 @@ +############################################################################# +"""You have to fill these args. + +_base_(str): The path to your pretrain config file. +fix_subnet (Union[dict,str]): The dict store the pruning structure or the + json file including it. +divisor (int): The divisor the make the channel number divisible. 
+""" + +_base_ = ['mmcls::resnet/resnet34_8xb32_in1k.py'] +un_prune = 1.0 +stage_ratio_1 = 0.7 +stage_ratio_2 = 0.7 +stage_ratio_3 = 0.7 +stage_ratio_4 = un_prune + +# the config template of target_pruning_ratio can be got by +# python ./tools/get_channel_units.py {config_file} --choice +fix_subnet = { + # stage 1 + 'backbone.conv1_(0, 64)_64': un_prune, # short cut layers + 'backbone.layer1.0.conv1_(0, 64)_64': stage_ratio_1, + 'backbone.layer1.1.conv1_(0, 64)_64': stage_ratio_1, + 'backbone.layer1.2.conv1_(0, 64)_64': un_prune, + # stage 2 + 'backbone.layer2.0.conv1_(0, 128)_128': un_prune, + 'backbone.layer2.0.conv2_(0, 128)_128': un_prune, # short cut layers + 'backbone.layer2.1.conv1_(0, 128)_128': stage_ratio_2, + 'backbone.layer2.2.conv1_(0, 128)_128': stage_ratio_2, + 'backbone.layer2.3.conv1_(0, 128)_128': un_prune, + # stage 3 + 'backbone.layer3.0.conv1_(0, 256)_256': un_prune, + 'backbone.layer3.0.conv2_(0, 256)_256': un_prune, # short cut layers + 'backbone.layer3.1.conv1_(0, 256)_256': stage_ratio_3, + 'backbone.layer3.2.conv1_(0, 256)_256': stage_ratio_3, + 'backbone.layer3.3.conv1_(0, 256)_256': stage_ratio_3, + 'backbone.layer3.4.conv1_(0, 256)_256': stage_ratio_3, + 'backbone.layer3.5.conv1_(0, 256)_256': un_prune, + # stage 4 + 'backbone.layer4.0.conv1_(0, 512)_512': stage_ratio_4, + 'backbone.layer4.0.conv2_(0, 512)_512': un_prune, # short cut layers + 'backbone.layer4.1.conv1_(0, 512)_512': stage_ratio_4, + 'backbone.layer4.2.conv1_(0, 512)_512': stage_ratio_4 +} +divisor = 8 +############################################################################## + +architecture = _base_.model + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='GroupFisherDeploySubModel', + architecture=architecture, + fix_subnet=fix_subnet, + divisor=divisor, +) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/l1-norm/l1-norm_resnet34_8xb32_in1k_b.py 
b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/l1-norm/l1-norm_resnet34_8xb32_in1k_b.py new file mode 100644 index 0000000000000000000000000000000000000000..6f829e7dfc5cf82215b68f9ac84d9f0896dcde92 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/l1-norm/l1-norm_resnet34_8xb32_in1k_b.py @@ -0,0 +1,38 @@ +_base_ = ['./l1-norm_resnet34_8xb32_in1k_a.py'] + +un_prune = 1.0 +stage_ratio_1 = 0.5 +stage_ratio_2 = 0.4 +stage_ratio_3 = 0.6 +stage_ratio_4 = un_prune + +# the config template of target_pruning_ratio can be got by +# python ./tools/get_channel_units.py {config_file} --choice +target_pruning_ratio = { + # stage 1 + 'backbone.conv1_(0, 64)_64': un_prune, # short cut layers + 'backbone.layer1.0.conv1_(0, 64)_64': stage_ratio_1, + 'backbone.layer1.1.conv1_(0, 64)_64': stage_ratio_1, + 'backbone.layer1.2.conv1_(0, 64)_64': un_prune, + # stage 2 + 'backbone.layer2.0.conv1_(0, 128)_128': un_prune, + 'backbone.layer2.0.conv2_(0, 128)_128': un_prune, # short cut layers + 'backbone.layer2.1.conv1_(0, 128)_128': stage_ratio_2, + 'backbone.layer2.2.conv1_(0, 128)_128': stage_ratio_2, + 'backbone.layer2.3.conv1_(0, 128)_128': un_prune, + # stage 3 + 'backbone.layer3.0.conv1_(0, 256)_256': un_prune, + 'backbone.layer3.0.conv2_(0, 256)_256': un_prune, # short cut layers + 'backbone.layer3.1.conv1_(0, 256)_256': stage_ratio_3, + 'backbone.layer3.2.conv1_(0, 256)_256': stage_ratio_3, + 'backbone.layer3.3.conv1_(0, 256)_256': stage_ratio_3, + 'backbone.layer3.4.conv1_(0, 256)_256': stage_ratio_3, + 'backbone.layer3.5.conv1_(0, 256)_256': un_prune, + # stage 4 + 'backbone.layer4.0.conv1_(0, 512)_512': stage_ratio_4, + 'backbone.layer4.0.conv2_(0, 512)_512': un_prune, # short cut layers + 'backbone.layer4.1.conv1_(0, 512)_512': stage_ratio_4, + 'backbone.layer4.2.conv1_(0, 512)_512': stage_ratio_4 +} + +model = dict(target_pruning_ratio=target_pruning_ratio, ) diff --git 
a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/l1-norm/l1-norm_resnet34_8xb32_in1k_b_deploy.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/l1-norm/l1-norm_resnet34_8xb32_in1k_b_deploy.py new file mode 100644 index 0000000000000000000000000000000000000000..636ff0766fdf7e7e9836dff20a10bdbc5f0e21e7 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/l1-norm/l1-norm_resnet34_8xb32_in1k_b_deploy.py @@ -0,0 +1,57 @@ +############################################################################# +"""You have to fill these args. + +_base_(str): The path to your pretrain config file. +fix_subnet (Union[dict,str]): The dict store the pruning structure or the + json file including it. +divisor (int): The divisor the make the channel number divisible. +""" + +_base_ = ['mmcls::resnet/resnet34_8xb32_in1k.py'] + +un_prune = 1.0 +stage_ratio_1 = 0.5 +stage_ratio_2 = 0.4 +stage_ratio_3 = 0.6 +stage_ratio_4 = un_prune + +fix_subnet = { + # stage 1 + 'backbone.conv1_(0, 64)_64': un_prune, # short cut layers + 'backbone.layer1.0.conv1_(0, 64)_64': stage_ratio_1, + 'backbone.layer1.1.conv1_(0, 64)_64': stage_ratio_1, + 'backbone.layer1.2.conv1_(0, 64)_64': un_prune, + # stage 2 + 'backbone.layer2.0.conv1_(0, 128)_128': un_prune, + 'backbone.layer2.0.conv2_(0, 128)_128': un_prune, # short cut layers + 'backbone.layer2.1.conv1_(0, 128)_128': stage_ratio_2, + 'backbone.layer2.2.conv1_(0, 128)_128': stage_ratio_2, + 'backbone.layer2.3.conv1_(0, 128)_128': un_prune, + # stage 3 + 'backbone.layer3.0.conv1_(0, 256)_256': un_prune, + 'backbone.layer3.0.conv2_(0, 256)_256': un_prune, # short cut layers + 'backbone.layer3.1.conv1_(0, 256)_256': stage_ratio_3, + 'backbone.layer3.2.conv1_(0, 256)_256': stage_ratio_3, + 'backbone.layer3.3.conv1_(0, 256)_256': stage_ratio_3, + 'backbone.layer3.4.conv1_(0, 256)_256': stage_ratio_3, + 'backbone.layer3.5.conv1_(0, 256)_256': un_prune, + # stage 4 + 'backbone.layer4.0.conv1_(0, 512)_512': stage_ratio_4, + 
'backbone.layer4.0.conv2_(0, 512)_512': un_prune, # short cut layers + 'backbone.layer4.1.conv1_(0, 512)_512': stage_ratio_4, + 'backbone.layer4.2.conv1_(0, 512)_512': stage_ratio_4 +} + +divisor = 8 +############################################################################## + +architecture = _base_.model + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='GroupFisherDeploySubModel', + architecture=architecture, + fix_subnet=fix_subnet, + divisor=divisor, +) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/l1-norm/l1-norm_resnet34_8xb32_in1k_c.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/l1-norm/l1-norm_resnet34_8xb32_in1k_c.py new file mode 100644 index 0000000000000000000000000000000000000000..d597471a3136b3f172b8b6511f1c3e7ed5e9c786 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/l1-norm/l1-norm_resnet34_8xb32_in1k_c.py @@ -0,0 +1,34 @@ +_base_ = ['./l1-norm_resnet34_8xb32_in1k_a.py'] + +un_prune = 1.0 + +# the config template of target_pruning_ratio can be got by +# python ./tools/get_channel_units.py {config_file} --choice +target_pruning_ratio = { + # stage 1 + 'backbone.conv1_(0, 64)_64': un_prune, # short cut layers + 'backbone.layer1.0.conv1_(0, 64)_64': un_prune, + 'backbone.layer1.1.conv1_(0, 64)_64': un_prune, + 'backbone.layer1.2.conv1_(0, 64)_64': un_prune, + # stage 2 + 'backbone.layer2.0.conv1_(0, 128)_128': un_prune, + 'backbone.layer2.0.conv2_(0, 128)_128': un_prune, # short cut layers + 'backbone.layer2.1.conv1_(0, 128)_128': un_prune, + 'backbone.layer2.2.conv1_(0, 128)_128': un_prune, + 'backbone.layer2.3.conv1_(0, 128)_128': un_prune, + # stage 3 + 'backbone.layer3.0.conv1_(0, 256)_256': un_prune, + 'backbone.layer3.0.conv2_(0, 256)_256': 0.8, # short cut layers + 'backbone.layer3.1.conv1_(0, 256)_256': un_prune, + 'backbone.layer3.2.conv1_(0, 256)_256': un_prune, + 'backbone.layer3.3.conv1_(0, 256)_256': un_prune, + 'backbone.layer3.4.conv1_(0, 256)_256': un_prune, + 
'backbone.layer3.5.conv1_(0, 256)_256': un_prune, + # stage 4 + 'backbone.layer4.0.conv1_(0, 512)_512': un_prune, + 'backbone.layer4.0.conv2_(0, 512)_512': un_prune, # short cut layers + 'backbone.layer4.1.conv1_(0, 512)_512': un_prune, + 'backbone.layer4.2.conv1_(0, 512)_512': un_prune +} + +model = dict(target_pruning_ratio=target_pruning_ratio, ) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/l1-norm/l1-norm_resnet34_8xb32_in1k_c_deploy.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/l1-norm/l1-norm_resnet34_8xb32_in1k_c_deploy.py new file mode 100644 index 0000000000000000000000000000000000000000..2c7a42e12ee8f43dc71dea4978dfee01ec86026f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/l1-norm/l1-norm_resnet34_8xb32_in1k_c_deploy.py @@ -0,0 +1,54 @@ +############################################################################# +"""You have to fill these args. + +_base_(str): The path to your pretrain config file. +fix_subnet (Union[dict,str]): The dict store the pruning structure or the + json file including it. +divisor (int): The divisor the make the channel number divisible. 
+""" + +_base_ = ['mmcls::resnet/resnet34_8xb32_in1k.py'] +un_prune = 1.0 + +# the config template of target_pruning_ratio can be got by +# python ./tools/get_channel_units.py {config_file} --choice +fix_subnet = { + # stage 1 + 'backbone.conv1_(0, 64)_64': un_prune, # short cut layers + 'backbone.layer1.0.conv1_(0, 64)_64': un_prune, + 'backbone.layer1.1.conv1_(0, 64)_64': un_prune, + 'backbone.layer1.2.conv1_(0, 64)_64': un_prune, + # stage 2 + 'backbone.layer2.0.conv1_(0, 128)_128': un_prune, + 'backbone.layer2.0.conv2_(0, 128)_128': un_prune, # short cut layers + 'backbone.layer2.1.conv1_(0, 128)_128': un_prune, + 'backbone.layer2.2.conv1_(0, 128)_128': un_prune, + 'backbone.layer2.3.conv1_(0, 128)_128': un_prune, + # stage 3 + 'backbone.layer3.0.conv1_(0, 256)_256': un_prune, + 'backbone.layer3.0.conv2_(0, 256)_256': 0.8, # short cut layers + 'backbone.layer3.1.conv1_(0, 256)_256': un_prune, + 'backbone.layer3.2.conv1_(0, 256)_256': un_prune, + 'backbone.layer3.3.conv1_(0, 256)_256': un_prune, + 'backbone.layer3.4.conv1_(0, 256)_256': un_prune, + 'backbone.layer3.5.conv1_(0, 256)_256': un_prune, + # stage 4 + 'backbone.layer4.0.conv1_(0, 512)_512': un_prune, + 'backbone.layer4.0.conv2_(0, 512)_512': un_prune, # short cut layers + 'backbone.layer4.1.conv1_(0, 512)_512': un_prune, + 'backbone.layer4.2.conv1_(0, 512)_512': un_prune +} + +divisor = 8 +############################################################################## + +architecture = _base_.model + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='GroupFisherDeploySubModel', + architecture=architecture, + fix_subnet=fix_subnet, + divisor=divisor, +) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/l1-norm/metafile.yml b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/l1-norm/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..3009fdc25b47de5eecc9c2c088eaca93d1d19e6d --- /dev/null +++ 
b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/l1-norm/metafile.yml @@ -0,0 +1,28 @@ +Models: + - Name: l1-norm_resnet34_8xb32_in1k_a + In Collection: L1-norm + Config: configs/pruning/mmcls/l1-norm/l1-norm_resnet34_8xb32_in1k_a.py + Weights: https://openmmlab-share.oss-cn-hangzhou.aliyuncs.com/mmrazor/v1/pruning/l1-norm/l1-norm_resnet34_8xb32_in1k_a.pth + Results: + - Task: Image Classification + Dataset: ImageNet-1k + Metrics: + Top 1 Accuracy: 73.61 + - Name: l1-norm_resnet34_8xb32_in1k_b + In Collection: L1-norm + Config: configs/pruning/mmcls/l1-norm/l1-norm_resnet34_8xb32_in1k_b.py + Weights: https://openmmlab-share.oss-cn-hangzhou.aliyuncs.com/mmrazor/v1/pruning/l1-norm/l1-norm_resnet34_8xb32_in1k_b.pth + Results: + - Task: Image Classification + Dataset: ImageNet-1k + Metrics: + Top 1 Accuracy: 73.20 + - Name: l1-norm_resnet34_8xb32_in1k_c + In Collection: L1-norm + Config: configs/pruning/mmcls/l1-norm/l1-norm_resnet34_8xb32_in1k_c.py + Weights: https://openmmlab-share.oss-cn-hangzhou.aliyuncs.com/mmrazor/v1/pruning/l1-norm/l1-norm_resnet34_8xb32_in1k_c.pth + Results: + - Task: Image Classification + Dataset: ImageNet-1k + Metrics: + Top 1 Accuracy: 73.89 diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/l1-norm/script.sh b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/l1-norm/script.sh new file mode 100644 index 0000000000000000000000000000000000000000..2bc1e92744752204464415d3447c4c97361f4fb8 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmcls/l1-norm/script.sh @@ -0,0 +1,25 @@ + +# export pruned checkpoint example + +python ./tools/pruning/get_static_model_from_algorithm.py configs/pruning/mmcls/l1-norm/l1-norm_resnet34_8xb32_in1k_a.py https://openmmlab-share.oss-cn-hangzhou.aliyuncs.com/mmrazor/v1/pruning/l1-norm/l1-norm_resnet34_8xb32_in1k_a.pth -o ./work_dirs/norm_resnet34_8xb32_in1k_a + +# deploy example + +razor_config=configs/pruning/mmcls/l1-norm/l1-norm_resnet34_8xb32_in1k_a_deploy.py 
+deploy_config=mmdeploy/configs/mmcls/classification_onnxruntime_dynamic.py +static_model_checkpoint_path=path/to/pruned/checkpoint + +python mmdeploy/tools/deploy.py $deploy_config \ + $razor_config \ + $static_model_checkpoint_path \ + mmdeploy/tests/data/tiger.jpeg \ + --work-dir ./work_dirs/mmdeploy + +python mmdeploy/tools/profiler.py $deploy_config \ + $razor_config \ + mmdeploy/demo/resources \ + --model ./work_dirs/mmdeploy/end2end.onnx \ + --shape 224x224 \ + --device cpu \ + --num-iter 1000 \ + --warmup 100 diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/dcff/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/dcff/README.md new file mode 100644 index 0000000000000000000000000000000000000000..c156e19f5b4c7024c94d463161656d1e473ccc8e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/dcff/README.md @@ -0,0 +1,82 @@ +# Training Compact CNNs for Image Classification using Dynamic-coded Filter Fusion + +## Abstract + +The mainstream approach for filter pruning is usually either to force a hard-coded importance estimation upon a computation-heavy pretrained model to select “important” filters, or to impose a hyperparameter-sensitive sparse constraint on the loss objective to regularize the network training. In this paper, we present a novel filter pruning method, dubbed dynamic-coded filter fusion (DCFF), to derive compact CNNs in a computation-economical and regularization-free manner for efficient image classification. Each filter in our DCFF is firstly given an inter-similarity distribution with a temperature parameter as a filter proxy, on top of which, a fresh Kullback-Leibler divergence based dynamic-coded criterion is proposed to evaluate the filter importance. In contrast to simply keeping high-score filters in other methods, we propose the concept of filter fusion, i.e., the weighted averages using the assigned proxies, as our preserved filters. 
We obtain a one-hot inter-similarity distribution as the temperature parameter approaches infinity. Thus, the relative importance of each filter can vary along with the training of the compact CNN, leading to dynamically changeable fused filters without both the dependency on the pretrained model and the introduction of sparse constraints. Extensive experiments on classification benchmarks demonstrate the superiority of our DCFF over the compared counterparts. For example, our DCFF derives a compact VGGNet-16 with only 72.77M FLOPs and 1.06M parameters while reaching top-1 accuracy of 93.47% on CIFAR-10. A compact ResNet-50 is obtained with 63.8% FLOPs and 58.6% parameter reductions, retaining 75.60% top1 accuracy on ILSVRC-2012. + +![pipeline](https://user-images.githubusercontent.com/31244134/189286581-722853ba-c6d7-4a39-b902-37995b444c71.jpg) + +## Results and models + +### 1. Classification + +| Dataset | Backbone | Params(M) | FLOPs(M) | lr_type | Top-1 (%) | Top-5 (%) | CPrate | Config | Download | +| :------: | :----------: | :-------: | :------: | :-----: | :-------: | :-------: | :---------------------------------------------: | :--------------------------------------------------: | :--------------------------: | +| ImageNet | DCFFResNet50 | 15.16 | 2260 | step | 73.96 | 91.66 | \[0.0\]+\[0.35,0.4,0.1\]\*10+\[0.3,0.3,0.1\]\*6 | [config](../../mmcls/dcff/dcff_resnet_8xb32_in1k.py) | [model](<>) \| \[log\] (\<>) | + +### 2. 
Detection + +| Dataset | Method | Backbone | Style | Lr schd | Params(M) | FLOPs(M) | bbox AP | CPrate | Config | Download | +| :-----: | :---------: | :----------: | :-----: | :-----: | :-------: | :------: | :-----: | :---------------------------------------------: | :---------------------------------------------------------------: | :--------------------------: | +| COCO | Faster_RCNN | DCFFResNet50 | pytorch | step | 33.31 | 168320 | 35.8 | \[0.0\]+\[0.35,0.4,0.1\]\*10+\[0.3,0.3,0.1\]\*6 | [config](../../mmdet/dcff/dcff_faster_rcnn_resnet50_8xb4_coco.py) | [model](<>) \| \[log\] (\<>) | + +### 3. Segmentation + +| Dataset | Method | Backbone | crop size | Lr schd | Params(M) | FLOPs(M) | mIoU | CPrate | Config | Download | +| :--------: | :-------: | :-------------: | :-------: | :-----: | :-------: | :------: | :---: | :-----------------------------------------------------------------: | :-------------------------------------------------------------------: | :--------------------------: | +| Cityscapes | PointRend | DCFFResNetV1c50 | 512x1024 | 160k | 18.43 | 74410 | 76.75 | \[0.0, 0.0, 0.0\] + \[0.35, 0.4, 0.1\] * 10 + \[0.3, 0.3, 0.1\] * 6 | [config](../../mmseg/dcff/dcff_pointrend_resnet50_8xb2_cityscapes.py) | [model](<>) \| \[log\] (\<>) | + +### 4. 
Pose + +| Dataset | Method | Backbone | crop size | total epochs | Params(M) | FLOPs(M) | AP | CPrate | Config | Download | +| :-----: | :-------------: | :----------: | :-------: | :----------: | :-------: | :------: | :--: | :--------------------------------------------------------: | :---------------------------------------------------------------: | :--------------------------: | +| COCO | TopDown HeatMap | DCFFResNet50 | 256x192 | 300 | 26.95 | 4290 | 68.3 | \[0.0\] + \[0.2, 0.2, 0.1\] * 10 + \[0.15, 0.15, 0.1\] * 6 | [config](../../mmpose/dcff/dcff_topdown_heatmap_resnet50_coco.py) | [model](<>) \| \[log\] (\<>) | + +## Citation + +```latex +@article{lin2021training, + title={Training Compact CNNs for Image Classification using Dynamic-coded Filter Fusion}, + author={Lin, Mingbao and Ji, Rongrong and Chen, Bohong and Chao, Fei and Liu, Jianzhuang and Zeng, Wei and Tian, Yonghong and Tian, Qi}, + journal={arXiv preprint arXiv:2107.06916}, + year={2021} +} +``` + +## Get Started + +### Generate channel_config file + +Generate `resnet_det.json` with `tools/pruning/get_channel_units.py`. + +```bash +python tools/pruning/get_channel_units.py + configs/pruning/mmdet/dcff/dcff_faster_rcnn_resnet50_8xb4_coco.py \ + -c -i --output-path=configs/pruning/mmcls/dcff/resnet_det.json +``` + +Then set layers' pruning rates `target_pruning_ratio` by `resnet_det.json`. 
+ +### Train DCFF + +#### Detection + +##### COCO + +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh \ + configs/pruning/mmdet/dcff/dcff_faster_rcnn_resnet50_8xb4_coco.py 4 \ + --work-dir $WORK_DIR +``` + +### Test DCFF + +#### Detection + +##### COCO + +```bash +CUDA_VISIBLE_DEVICES=0 PORT=29500 ./tools/dist_test.sh \ + configs/pruning/mmdet/dcff/dcff_compact_faster_rcnn_resnet50_8xb4_coco.py \ + $CKPT 1 --work-dir $WORK_DIR +``` diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/dcff/dcff_compact_faster_rcnn_resnet50_8xb4_coco.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/dcff/dcff_compact_faster_rcnn_resnet50_8xb4_coco.py new file mode 100644 index 0000000000000000000000000000000000000000..5a2db5c11847c6636858e3ccb59e259dc52250a2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/dcff/dcff_compact_faster_rcnn_resnet50_8xb4_coco.py @@ -0,0 +1,12 @@ +_base_ = ['dcff_faster_rcnn_resnet50_8xb4_coco.py'] + +# model settings +_base_.model = dict( + _scope_='mmrazor', + type='sub_model', + cfg=_base_.architecture, + fix_subnet='configs/pruning/mmdet/dcff/fix_subnet.json', + mode='mutator', + init_cfg=dict( + type='Pretrained', + checkpoint='configs/pruning/mmdet/dcff/fix_subnet_weight.pth')) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/dcff/dcff_faster_rcnn_resnet50_8xb4_coco.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/dcff/dcff_faster_rcnn_resnet50_8xb4_coco.py new file mode 100644 index 0000000000000000000000000000000000000000..5d51677b8392d44a041300cce9500d2e4b387dd3 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/dcff/dcff_faster_rcnn_resnet50_8xb4_coco.py @@ -0,0 +1,87 @@ +_base_ = [ + './dcff_faster_rcnn_resnet50_fpn.py', + 'mmdet::_base_/datasets/coco_detection.py', + 'mmdet::_base_/schedules/schedule_2x.py', + 'mmdet::_base_/default_runtime.py' +] + +stage_ratio_1 = 0.65 +stage_ratio_2 = 0.6 +stage_ratio_3 = 0.9 
+stage_ratio_4 = 0.7 + +# the config template of target_pruning_ratio can be got by +# python ./tools/pruning/get_channel_units.py {config_file} --choice +target_pruning_ratio = { + 'backbone.layer1.0.conv1_(0, 64)_64': stage_ratio_1, + 'backbone.layer1.0.conv2_(0, 64)_64': stage_ratio_2, + 'backbone.layer1.0.conv3_(0, 256)_256': stage_ratio_3, + 'backbone.layer1.1.conv1_(0, 64)_64': stage_ratio_1, + 'backbone.layer1.1.conv2_(0, 64)_64': stage_ratio_2, + 'backbone.layer1.2.conv1_(0, 64)_64': stage_ratio_1, + 'backbone.layer1.2.conv2_(0, 64)_64': stage_ratio_2, + # block 1 [0.65, 0.6] downsample=[0.9] + 'backbone.layer2.0.conv1_(0, 128)_128': stage_ratio_1, + 'backbone.layer2.0.conv2_(0, 128)_128': stage_ratio_2, + 'backbone.layer2.0.conv3_(0, 512)_512': stage_ratio_3, + 'backbone.layer2.1.conv1_(0, 128)_128': stage_ratio_1, + 'backbone.layer2.1.conv2_(0, 128)_128': stage_ratio_2, + 'backbone.layer2.2.conv1_(0, 128)_128': stage_ratio_1, + 'backbone.layer2.2.conv2_(0, 128)_128': stage_ratio_2, + 'backbone.layer2.3.conv1_(0, 128)_128': stage_ratio_1, + 'backbone.layer2.3.conv2_(0, 128)_128': stage_ratio_2, + # block 2 [0.65, 0.6] downsample=[0.9] + 'backbone.layer3.0.conv1_(0, 256)_256': stage_ratio_1, + 'backbone.layer3.0.conv2_(0, 256)_256': stage_ratio_2, + 'backbone.layer3.0.conv3_(0, 1024)_1024': stage_ratio_3, + 'backbone.layer3.1.conv1_(0, 256)_256': stage_ratio_1, + 'backbone.layer3.1.conv2_(0, 256)_256': stage_ratio_2, + 'backbone.layer3.2.conv1_(0, 256)_256': stage_ratio_1, + 'backbone.layer3.2.conv2_(0, 256)_256': stage_ratio_2, + 'backbone.layer3.3.conv1_(0, 256)_256': stage_ratio_4, + 'backbone.layer3.3.conv2_(0, 256)_256': stage_ratio_4, + 'backbone.layer3.4.conv1_(0, 256)_256': stage_ratio_4, + 'backbone.layer3.4.conv2_(0, 256)_256': stage_ratio_4, + 'backbone.layer3.5.conv1_(0, 256)_256': stage_ratio_4, + 'backbone.layer3.5.conv2_(0, 256)_256': stage_ratio_4, + # block 3 [0.65, 0.6]*2+[0.7, 0.7]*2 downsample=[0.9] + 'backbone.layer4.0.conv1_(0, 
512)_512': stage_ratio_4, + 'backbone.layer4.0.conv2_(0, 512)_512': stage_ratio_4, + 'backbone.layer4.0.conv3_(0, 2048)_2048': stage_ratio_3, + 'backbone.layer4.1.conv1_(0, 512)_512': stage_ratio_4, + 'backbone.layer4.1.conv2_(0, 512)_512': stage_ratio_4, + 'backbone.layer4.2.conv1_(0, 512)_512': stage_ratio_4, + 'backbone.layer4.2.conv2_(0, 512)_512': stage_ratio_4 + # block 4 [0.7, 0.7] downsample=[0.9] +} + +optim_wrapper = dict( + optimizer=dict(type='SGD', lr=0.04, momentum=0.9, weight_decay=0.0001)) +param_scheduler = dict( + type='MultiStepLR', + by_epoch=True, + milestones=[60, 80, 95], + gamma=0.1, + _delete_=True) +train_cfg = dict(max_epochs=120, val_interval=1) + +model = dict( + _scope_='mmrazor', + type='DCFF', + architecture=_base_.architecture, + mutator_cfg=dict( + type='DCFFChannelMutator', + channel_unit_cfg=dict( + type='DCFFChannelUnit', default_args=dict(choice_mode='ratio')), + parse_cfg=dict( + type='ChannelAnalyzer', + demo_input=(1, 3, 224, 224), + tracer_type='FxTracer')), + target_pruning_ratio=target_pruning_ratio, + step_freq=1, + linear_schedule=False) + +model_wrapper = dict( + type='mmcv.MMDistributedDataParallel', find_unused_parameters=True) + +val_cfg = dict(_delete_=True, type='mmrazor.ItePruneValLoop') diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/dcff/dcff_faster_rcnn_resnet50_fpn.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/dcff/dcff_faster_rcnn_resnet50_fpn.py new file mode 100644 index 0000000000000000000000000000000000000000..0ce54033803778a1ddbe5bbc8bbf91f50c98827c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/dcff/dcff_faster_rcnn_resnet50_fpn.py @@ -0,0 +1,114 @@ +# architecture settings +architecture = dict( + _scope_='mmdet', + type='FasterRCNN', + data_preprocessor=dict( + type='DetDataPreprocessor', + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + bgr_to_rgb=True, + pad_size_divisor=32), + backbone=dict( + type='ResNet', + depth=50, 
+ num_stages=4, + out_indices=(0, 1, 2, 3), + frozen_stages=1, + norm_cfg=dict(type='BN', requires_grad=True), + norm_eval=True, + style='pytorch'), + neck=dict( + type='FPN', + in_channels=[256, 512, 1024, 2048], + out_channels=256, + num_outs=5), + rpn_head=dict( + type='RPNHead', + in_channels=256, + feat_channels=256, + anchor_generator=dict( + type='AnchorGenerator', + scales=[8], + ratios=[0.5, 1.0, 2.0], + strides=[4, 8, 16, 32, 64]), + bbox_coder=dict( + type='DeltaXYWHBBoxCoder', + target_means=[.0, .0, .0, .0], + target_stds=[1.0, 1.0, 1.0, 1.0]), + loss_cls=dict( + type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), + loss_bbox=dict(type='L1Loss', loss_weight=1.0)), + roi_head=dict( + type='StandardRoIHead', + bbox_roi_extractor=dict( + type='SingleRoIExtractor', + roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), + out_channels=256, + featmap_strides=[4, 8, 16, 32]), + bbox_head=dict( + type='Shared2FCBBoxHead', + in_channels=256, + fc_out_channels=1024, + roi_feat_size=7, + num_classes=80, + bbox_coder=dict( + type='DeltaXYWHBBoxCoder', + target_means=[0., 0., 0., 0.], + target_stds=[0.1, 0.1, 0.2, 0.2]), + reg_class_agnostic=False, + loss_cls=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type='L1Loss', loss_weight=1.0))), + # model training and testing settings + train_cfg=dict( + rpn=dict( + assigner=dict( + type='MaxIoUAssigner', + pos_iou_thr=0.7, + neg_iou_thr=0.3, + min_pos_iou=0.3, + match_low_quality=True, + ignore_iof_thr=-1), + sampler=dict( + type='RandomSampler', + num=256, + pos_fraction=0.5, + neg_pos_ub=-1, + add_gt_as_proposals=False), + allowed_border=-1, + pos_weight=-1, + debug=False), + rpn_proposal=dict( + nms_pre=2000, + max_per_img=1000, + nms=dict(type='nms', iou_threshold=0.7), + min_bbox_size=0), + rcnn=dict( + assigner=dict( + type='MaxIoUAssigner', + pos_iou_thr=0.5, + neg_iou_thr=0.5, + min_pos_iou=0.5, + match_low_quality=False, + ignore_iof_thr=-1), + 
sampler=dict( + type='RandomSampler', + num=512, + pos_fraction=0.25, + neg_pos_ub=-1, + add_gt_as_proposals=True), + pos_weight=-1, + debug=False)), + test_cfg=dict( + rpn=dict( + nms_pre=1000, + max_per_img=1000, + nms=dict(type='nms', iou_threshold=0.7), + min_bbox_size=0), + rcnn=dict( + score_thr=0.05, + nms=dict(type='nms', iou_threshold=0.5), + max_per_img=100) + # soft-nms is also supported for rcnn testing + # e.g., nms=dict(type='soft_nms', iou_threshold=0.5, min_score=0.05) + )) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/dcff/fix_subnet.json b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/dcff/fix_subnet.json new file mode 100644 index 0000000000000000000000000000000000000000..dfdcea75873d2516af6db193343f0b4df7376bb8 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/dcff/fix_subnet.json @@ -0,0 +1,141 @@ +{ + "type":"DCFFChannelMutator", + "channel_unit_cfg":{ + "type":"DCFFChannelUnit", + "default_args":{ + "choice_mode":"ratio" + }, + "units":{ + "backbone.conv1_(0, 64)_64":{ + "init_args":{ + "num_channels":64, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":1.0 + }, + "backbone.layer1.0.conv1_(0, 64)_64":{ + "init_args":{ + "num_channels":64, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.640625 + }, + "backbone.layer1.1.conv1_(0, 64)_64":{ + "init_args":{ + "num_channels":64, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.640625 + }, + "backbone.layer2.0.conv1_(0, 128)_128":{ + "init_args":{ + "num_channels":128, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.6484375 + }, + "backbone.layer2.0.conv2_(0, 128)_128":{ + "init_args":{ + "num_channels":128, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.59375 + }, + "backbone.layer2.1.conv1_(0, 128)_128":{ + "init_args":{ + 
"num_channels":128, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.6484375 + }, + "backbone.layer3.0.conv1_(0, 256)_256":{ + "init_args":{ + "num_channels":256, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.6484375 + }, + "backbone.layer3.0.conv2_(0, 256)_256":{ + "init_args":{ + "num_channels":256, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.59765625 + }, + "backbone.layer3.1.conv1_(0, 256)_256":{ + "init_args":{ + "num_channels":256, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.6484375 + }, + "backbone.layer4.0.conv1_(0, 512)_512":{ + "init_args":{ + "num_channels":512, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.69921875 + }, + "backbone.layer4.0.conv2_(0, 512)_512":{ + "init_args":{ + "num_channels":512, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.69921875 + }, + "backbone.layer4.1.conv1_(0, 512)_512":{ + "init_args":{ + "num_channels":512, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.69921875 + } + } + }, + "parse_cfg":{ + "type":"ChannelAnalyzer", + "demo_input":[ + 1, + 3, + 224, + 224 + ], + "tracer_type":"BackwardTracer" + } +} diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/group_fisher/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/group_fisher/README.md new file mode 100644 index 0000000000000000000000000000000000000000..9b3b09936e3fa962e1cfdd8f74d6d9576900f9dc --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/group_fisher/README.md @@ -0,0 +1,11 @@ +# Group_fisher pruning + +> [Group Fisher Pruning for Practical Network Compression.](https://arxiv.org/pdf/2108.00708.pdf) + +## Abstract + +Network compression has been widely studied since it is able to reduce 
the memory and computation cost during inference. However, previous methods seldom deal with complicated structures like residual connections, group/depthwise convolution and feature pyramid network, where channels of multiple layers are coupled and need to be pruned simultaneously. In this paper, we present a general channel pruning approach that can be applied to various complicated structures. Particularly, we propose a layer grouping algorithm to find coupled channels automatically. Then we derive a unified metric based on Fisher information to evaluate the importance of a single channel and coupled channels. Moreover, we find that inference speedup on GPUs is more correlated with the reduction of memory rather than FLOPs, and thus we employ the memory reduction of each channel to normalize the importance. Our method can be used to prune any structures including those with coupled channels. We conduct extensive experiments on various backbones, including the classic ResNet and ResNeXt, mobilefriendly MobileNetV2, and the NAS-based RegNet, both on image classification and object detection which is under-explored. Experimental results validate that our method can effectively prune sophisticated networks, boosting inference speed without sacrificing accuracy. 
+ +![pipeline](https://github.com/jshilong/FisherPruning/blob/main/resources/structures.png?raw=true) + +**Please refer to the [full README](../../base/group_fisher/README.md) for more details.** diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/group_fisher/retinanet/group_fisher_act_deploy_retinanet_r50_fpn_1x_coco.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/group_fisher/retinanet/group_fisher_act_deploy_retinanet_r50_fpn_1x_coco.py new file mode 100644 index 0000000000000000000000000000000000000000..1146fc78ab603b10ea74ad5425cd0e86bddb4dd2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/group_fisher/retinanet/group_fisher_act_deploy_retinanet_r50_fpn_1x_coco.py @@ -0,0 +1,73 @@ +############################################################################# +"""You have to fill these args. + +_base_(str): The path to your pretrain config file. +fix_subnet (Union[dict,str]): The dict that stores the pruning structure or the + json file including it. +divisor (int): The divisor to make the channel number divisible. 
+""" + +_base_ = 'mmdet::retinanet/retinanet_r50_fpn_1x_coco.py' +fix_subnet = { + 'backbone.conv1_(0, 64)_64': 60, + 'backbone.layer1.0.conv1_(0, 64)_64': 48, + 'backbone.layer1.0.conv2_(0, 64)_64': 44, + 'backbone.layer1.0.conv3_(0, 256)_256': 250, + 'backbone.layer1.1.conv1_(0, 64)_64': 40, + 'backbone.layer1.1.conv2_(0, 64)_64': 41, + 'backbone.layer1.2.conv1_(0, 64)_64': 48, + 'backbone.layer1.2.conv2_(0, 64)_64': 62, + 'backbone.layer2.0.conv1_(0, 128)_128': 115, + 'backbone.layer2.0.conv2_(0, 128)_128': 127, + 'backbone.layer2.0.conv3_(0, 512)_512': 511, + 'backbone.layer2.1.conv1_(0, 128)_128': 69, + 'backbone.layer2.1.conv2_(0, 128)_128': 83, + 'backbone.layer2.2.conv1_(0, 128)_128': 111, + 'backbone.layer2.2.conv2_(0, 128)_128': 121, + 'backbone.layer2.3.conv1_(0, 128)_128': 122, + 'backbone.layer2.3.conv2_(0, 128)_128': 128, + 'backbone.layer3.0.conv1_(0, 256)_256': 255, + 'backbone.layer3.0.conv2_(0, 256)_256': 256, + 'backbone.layer3.0.conv3_(0, 1024)_1024': 1024, + 'backbone.layer3.1.conv1_(0, 256)_256': 216, + 'backbone.layer3.1.conv2_(0, 256)_256': 223, + 'backbone.layer3.2.conv1_(0, 256)_256': 229, + 'backbone.layer3.2.conv2_(0, 256)_256': 247, + 'backbone.layer3.3.conv1_(0, 256)_256': 239, + 'backbone.layer3.3.conv2_(0, 256)_256': 246, + 'backbone.layer3.4.conv1_(0, 256)_256': 237, + 'backbone.layer3.4.conv2_(0, 256)_256': 239, + 'backbone.layer3.5.conv1_(0, 256)_256': 233, + 'backbone.layer3.5.conv2_(0, 256)_256': 221, + 'backbone.layer4.0.conv1_(0, 512)_512': 499, + 'backbone.layer4.0.conv2_(0, 512)_512': 494, + 'backbone.layer4.0.conv3_(0, 2048)_2048': 2031, + 'backbone.layer4.1.conv1_(0, 512)_512': 451, + 'backbone.layer4.1.conv2_(0, 512)_512': 401, + 'backbone.layer4.2.conv1_(0, 512)_512': 396, + 'backbone.layer4.2.conv2_(0, 512)_512': 237, + 'neck.lateral_convs.0.conv_(0, 256)_256': 237, + 'neck.fpn_convs.0.conv_(0, 256)_256': 241, + 'bbox_head.cls_convs.0.conv_(0, 256)_256': 133, + 'bbox_head.cls_convs.1.conv_(0, 256)_256': 134, + 
'bbox_head.cls_convs.2.conv_(0, 256)_256': 139, + 'bbox_head.cls_convs.3.conv_(0, 256)_256': 79, + 'bbox_head.reg_convs.0.conv_(0, 256)_256': 89, + 'bbox_head.reg_convs.1.conv_(0, 256)_256': 92, + 'bbox_head.reg_convs.2.conv_(0, 256)_256': 82, + 'bbox_head.reg_convs.3.conv_(0, 256)_256': 117 +} +divisor = 16 + +############################################################################## + +architecture = _base_.model + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='GroupFisherDeploySubModel', + architecture=architecture, + fix_subnet=fix_subnet, + divisor=divisor, +) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/group_fisher/retinanet/group_fisher_act_finetune_retinanet_r50_fpn_1x_coco.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/group_fisher/retinanet/group_fisher_act_finetune_retinanet_r50_fpn_1x_coco.py new file mode 100644 index 0000000000000000000000000000000000000000..b0f7d08de2696daf204a1090c6920ef13ca59120 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/group_fisher/retinanet/group_fisher_act_finetune_retinanet_r50_fpn_1x_coco.py @@ -0,0 +1,31 @@ +############################################################################# +"""# You have to fill these args. + +_base_(str): The path to your pruning config file. +pruned_path (str): The path to the checkpoint of the pruned model. +finetune_lr (float): The lr rate to finetune. Usually, we directly use the lr + rate of the pretrain. 
+""" + +_base_ = './group_fisher_act_prune_retinanet_r50_fpn_1x_coco.py' +pruned_path = 'https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/retinanet/act/group_fisher_act_prune_retinanet_r50_fpn_1x_coco.pth' # noqa +finetune_lr = 0.005 +############################################################################## +algorithm = _base_.model +algorithm.init_cfg = dict(type='Pretrained', checkpoint=pruned_path) + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='GroupFisherSubModel', + algorithm=algorithm, +) + +# restore lr +optim_wrapper = dict(optimizer=dict(lr=finetune_lr)) + +# remove pruning related hooks +custom_hooks = _base_.custom_hooks[:-2] + +# delete ddp +model_wrapper_cfg = None diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/group_fisher/retinanet/group_fisher_act_prune_retinanet_r50_fpn_1x_coco.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/group_fisher/retinanet/group_fisher_act_prune_retinanet_r50_fpn_1x_coco.py new file mode 100644 index 0000000000000000000000000000000000000000..1bfb0708105e67ae2733d72be4c2211b5e8bb0dd --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/group_fisher/retinanet/group_fisher_act_prune_retinanet_r50_fpn_1x_coco.py @@ -0,0 +1,76 @@ +############################################################################# +"""You have to fill these args. + +_base_ (str): The path to your pretrain config file. +pretrained_path (str): The path to your pretrained model checkpoint. + +interval (int): Interval between pruning two channels. You should ensure you + can reach your target pruning ratio when the training ends. +normalization_type (str): GroupFisher uses two methods to normalize the channel + importance, including ['flops','act']. The former uses flops, while the + latter uses the memory occupation of activation feature maps. +lr_ratio (float): Ratio to decrease lr rate. 
As pruning progress is unstable, + you need to decrease the original lr rate until the pruning training works + steadily without getting nan. + +target_flop_ratio (float): The target flop ratio to prune your model. +input_shape (Tuple): input shape to measure the flops. +""" + +_base_ = 'mmdet::retinanet/retinanet_r50_fpn_1x_coco.py' +pretrained_path = 'https://download.openmmlab.com/mmdetection/v2.0/retinanet/retinanet_r50_fpn_1x_coco/retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth' # noqa + +interval = 10 +normalization_type = 'act' +lr_ratio = 0.1 + +target_flop_ratio = 0.5 +input_shape = (1, 3, 1333, 800) +############################################################################## + +architecture = _base_.model + +if hasattr(_base_, 'data_preprocessor'): + architecture.update({'data_preprocessor': _base_.data_preprocessor}) + data_preprocessor = {} + +architecture.init_cfg = dict(type='Pretrained', checkpoint=pretrained_path) +architecture['_scope_'] = _base_.default_scope +architecture.backbone.frozen_stages = -1 + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='GroupFisherAlgorithm', + architecture=architecture, + interval=interval, + mutator=dict( + type='GroupFisherChannelMutator', + parse_cfg=dict(type='ChannelAnalyzer', tracer_type='FxTracer'), + channel_unit_cfg=dict( + type='GroupFisherChannelUnit', + default_args=dict(normalization_type=normalization_type, ), + ), + ), +) + +model_wrapper_cfg = dict( + type='mmrazor.GroupFisherDDP', + broadcast_buffers=False, +) + +optim_wrapper = dict( + optimizer=dict(lr=_base_.optim_wrapper.optimizer.lr * lr_ratio)) + +custom_hooks = getattr(_base_, 'custom_hooks', []) + [ + dict(type='mmrazor.PruningStructureHook'), + dict( + type='mmrazor.ResourceInfoHook', + interval=interval, + demo_input=dict( + type='mmrazor.DefaultDemoInput', + input_shape=input_shape, + ), + save_ckpt_thr=[target_flop_ratio], + ), +] diff --git 
a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/group_fisher/retinanet/group_fisher_flops_deploy_retinanet_r50_fpn_1x_coco.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/group_fisher/retinanet/group_fisher_flops_deploy_retinanet_r50_fpn_1x_coco.py new file mode 100644 index 0000000000000000000000000000000000000000..88446dbe80c0a221915a41ebbc914d36086fec96 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/group_fisher/retinanet/group_fisher_flops_deploy_retinanet_r50_fpn_1x_coco.py @@ -0,0 +1,73 @@ +############################################################################# +"""You have to fill these args. + +_base_(str): The path to your pretrain config file. +fix_subnet (Union[dict,str]): The dict store the pruning structure or the + json file including it. +divisor (int): The divisor the make the channel number divisible. +""" + +_base_ = 'mmdet::retinanet/retinanet_r50_fpn_1x_coco.py' +fix_subnet = { + 'backbone.conv1_(0, 64)_64': 60, + 'backbone.layer1.0.conv1_(0, 64)_64': 47, + 'backbone.layer1.0.conv2_(0, 64)_64': 44, + 'backbone.layer1.0.conv3_(0, 256)_256': 249, + 'backbone.layer1.1.conv1_(0, 64)_64': 37, + 'backbone.layer1.1.conv2_(0, 64)_64': 37, + 'backbone.layer1.2.conv1_(0, 64)_64': 44, + 'backbone.layer1.2.conv2_(0, 64)_64': 62, + 'backbone.layer2.0.conv1_(0, 128)_128': 114, + 'backbone.layer2.0.conv2_(0, 128)_128': 127, + 'backbone.layer2.0.conv3_(0, 512)_512': 511, + 'backbone.layer2.1.conv1_(0, 128)_128': 65, + 'backbone.layer2.1.conv2_(0, 128)_128': 83, + 'backbone.layer2.2.conv1_(0, 128)_128': 106, + 'backbone.layer2.2.conv2_(0, 128)_128': 118, + 'backbone.layer2.3.conv1_(0, 128)_128': 118, + 'backbone.layer2.3.conv2_(0, 128)_128': 127, + 'backbone.layer3.0.conv1_(0, 256)_256': 255, + 'backbone.layer3.0.conv2_(0, 256)_256': 256, + 'backbone.layer3.0.conv3_(0, 1024)_1024': 1024, + 'backbone.layer3.1.conv1_(0, 256)_256': 214, + 'backbone.layer3.1.conv2_(0, 256)_256': 232, + 
'backbone.layer3.2.conv1_(0, 256)_256': 224, + 'backbone.layer3.2.conv2_(0, 256)_256': 247, + 'backbone.layer3.3.conv1_(0, 256)_256': 240, + 'backbone.layer3.3.conv2_(0, 256)_256': 246, + 'backbone.layer3.4.conv1_(0, 256)_256': 240, + 'backbone.layer3.4.conv2_(0, 256)_256': 243, + 'backbone.layer3.5.conv1_(0, 256)_256': 238, + 'backbone.layer3.5.conv2_(0, 256)_256': 232, + 'backbone.layer4.0.conv1_(0, 512)_512': 503, + 'backbone.layer4.0.conv2_(0, 512)_512': 500, + 'backbone.layer4.0.conv3_(0, 2048)_2048': 2041, + 'backbone.layer4.1.conv1_(0, 512)_512': 466, + 'backbone.layer4.1.conv2_(0, 512)_512': 430, + 'backbone.layer4.2.conv1_(0, 512)_512': 406, + 'backbone.layer4.2.conv2_(0, 512)_512': 274, + 'neck.lateral_convs.0.conv_(0, 256)_256': 236, + 'neck.fpn_convs.0.conv_(0, 256)_256': 225, + 'bbox_head.cls_convs.0.conv_(0, 256)_256': 140, + 'bbox_head.cls_convs.1.conv_(0, 256)_256': 133, + 'bbox_head.cls_convs.2.conv_(0, 256)_256': 139, + 'bbox_head.cls_convs.3.conv_(0, 256)_256': 86, + 'bbox_head.reg_convs.0.conv_(0, 256)_256': 89, + 'bbox_head.reg_convs.1.conv_(0, 256)_256': 89, + 'bbox_head.reg_convs.2.conv_(0, 256)_256': 76, + 'bbox_head.reg_convs.3.conv_(0, 256)_256': 122, +} +divisor = 16 + +############################################################################## + +architecture = _base_.model + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='GroupFisherDeploySubModel', + architecture=architecture, + fix_subnet=fix_subnet, + divisor=divisor, +) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/group_fisher/retinanet/group_fisher_flops_finetune_retinanet_r50_fpn_1x_coco.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/group_fisher/retinanet/group_fisher_flops_finetune_retinanet_r50_fpn_1x_coco.py new file mode 100644 index 0000000000000000000000000000000000000000..9d2d3a001da8389de3d42c52e7af04a797ef158a --- /dev/null +++ 
b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/group_fisher/retinanet/group_fisher_flops_finetune_retinanet_r50_fpn_1x_coco.py @@ -0,0 +1,31 @@ +############################################################################# +"""# You have to fill these args. + +_base_(str): The path to your pruning config file. +pruned_path (str): The path to the checkpoint of the pruned model. +finetune_lr (float): The lr rate to finetune. Usually, we directly use the lr + rate of the pretrain. +""" + +_base_ = './group_fisher_flops_prune_retinanet_r50_fpn_1x_coco.py' +pruned_path = 'https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/retinanet/flops/group_fisher_flops_prune_retinanet_r50_fpn_1x_coco.pth' # noqa +finetune_lr = 0.005 +############################################################################## +algorithm = _base_.model +algorithm.init_cfg = dict(type='Pretrained', checkpoint=pruned_path) + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='GroupFisherSubModel', + algorithm=algorithm, +) + +# restore lr +optim_wrapper = dict(optimizer=dict(lr=finetune_lr)) + +# remove pruning related hooks +custom_hooks = _base_.custom_hooks[:-2] + +# delete ddp +model_wrapper_cfg = None diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/group_fisher/retinanet/group_fisher_flops_prune_retinanet_r50_fpn_1x_coco.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/group_fisher/retinanet/group_fisher_flops_prune_retinanet_r50_fpn_1x_coco.py new file mode 100644 index 0000000000000000000000000000000000000000..162db3bed8e7c77955091f34ea3be7fb90639d1b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/group_fisher/retinanet/group_fisher_flops_prune_retinanet_r50_fpn_1x_coco.py @@ -0,0 +1,5 @@ +_base_ = './group_fisher_act_prune_retinanet_r50_fpn_1x_coco.py' +model = dict( + mutator=dict( + channel_unit_cfg=dict( + default_args=dict(normalization_type='flops', ), ), ), ) diff --git 
a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/group_fisher/retinanet/metafile.yml b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/group_fisher/retinanet/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..232f9cb970887f92b7991435c1ecb0a5bf6dd2e2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/group_fisher/retinanet/metafile.yml @@ -0,0 +1,19 @@ +Models: + - Name: group_fisher_act_finetune_retinanet_r50_fpn_1x_coco + In Collection: GroupFisher + Config: configs/pruning/mmdet/group_fisher/retinanet/group_fisher_act_finetune_retinanet_r50_fpn_1x_coco.py + Weights: https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/retinanet/act/group_fisher_act_finetune_retinanet_r50_fpn_1x_coco.pth + Results: + - Task: Object Detection + Dataset: COCO + Metrics: + box AP: 36.5 + - Name: group_fisher_flops_finetune_retinanet_r50_fpn_1x_coco + In Collection: GroupFisher + Config: configs/pruning/mmdet/group_fisher/retinanet/group_fisher_flops_finetune_retinanet_r50_fpn_1x_coco.py + Weights: https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/retinanet/flops/group_fisher_flops_finetune_retinanet_r50_fpn_1x_coco.pth + Results: + - Task: Object Detection + Dataset: COCO + Metrics: + box AP: 36.6 diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/group_fisher/retinanet/script.sh b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/group_fisher/retinanet/script.sh new file mode 100644 index 0000000000000000000000000000000000000000..b14243a4df9be550a5da92195f06bbd91024c457 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmdet/group_fisher/retinanet/script.sh @@ -0,0 +1,49 @@ +# act mode +bash ./tools/dist_train.sh configs/pruning/mmdet/group_fisher/retinanet/group_fisher_act_prune_retinanet_r50_fpn_1x_coco.py 8 +bash ./tools/dist_train.sh configs/pruning/mmdet/group_fisher/retinanet/group_fisher_act_finetune_retinanet_r50_fpn_1x_coco.py 8 + +# flops 
mode +bash ./tools/dist_train.sh configs/pruning/mmdet/group_fisher/retinanet/group_fisher_flops_prune_retinanet_r50_fpn_1x_coco.py 8 +bash ./tools/dist_train.sh configs/pruning/mmdet/group_fisher/retinanet/group_fisher_flops_finetune_retinanet_r50_fpn_1x_coco.py 8 + + + +# deploy act mode + +razor_config=configs/pruning/mmdet/group_fisher/retinanet/group_fisher_act_deploy_retinanet_r50_fpn_1x_coco.py +deploy_config=mmdeploy/configs/mmdet/detection/detection_onnxruntime_static.py + +python mmdeploy/tools/deploy.py $deploy_config \ + $razor_config \ + https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/retinanet/act/group_fisher_act_finetune_retinanet_r50_fpn_1x_coco.pth \ + mmdeploy/tests/data/tiger.jpeg \ + --work-dir ./work_dirs/mmdeploy + +python mmdeploy/tools/profiler.py $deploy_config \ + $razor_config \ + mmdeploy/demo/resources \ + --model ./work_dirs/mmdeploy/end2end.onnx \ + --shape 800x1248 \ + --device cpu \ + --num-iter 1000 \ + --warmup 100 + +# deploy flop mode + +razor_config=configs/pruning/mmdet/group_fisher/retinanet/group_fisher_flops_deploy_retinanet_r50_fpn_1x_coco.py +deploy_config=mmdeploy/configs/mmdet/detection/detection_onnxruntime_static.py + +python mmdeploy/tools/deploy.py $deploy_config \ + $razor_config \ + https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/retinanet/flops/group_fisher_flops_finetune_retinanet_r50_fpn_1x_coco.pth \ + mmdeploy/tests/data/tiger.jpeg \ + --work-dir ./work_dirs/mmdeploy + +python mmdeploy/tools/profiler.py $deploy_config \ + $razor_config \ + mmdeploy/demo/resources \ + --model ./work_dirs/mmdeploy/end2end.onnx \ + --shape 800x1248 \ + --device cpu \ + --num-iter 1000 \ + --warmup 100 diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/dcff/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/dcff/README.md new file mode 100644 index 0000000000000000000000000000000000000000..f08efe4ffc63c21b1bfbcf4857b3d12526bd939b --- /dev/null +++ 
b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/dcff/README.md @@ -0,0 +1,82 @@ +# Training Compact CNNs for Image Classification using Dynamic-coded Filter Fusion + +## Abstract + +The mainstream approach for filter pruning is usually either to force a hard-coded importance estimation upon a computation-heavy pretrained model to select “important” filters, or to impose a hyperparameter-sensitive sparse constraint on the loss objective to regularize the network training. In this paper, we present a novel filter pruning method, dubbed dynamic-coded filter fusion (DCFF), to derive compact CNNs in a computationeconomical and regularization-free manner for efficient image classification. Each filter in our DCFF is firstly given an intersimilarity distribution with a temperature parameter as a filter proxy, on top of which, a fresh Kullback-Leibler divergence based dynamic-coded criterion is proposed to evaluate the filter importance. In contrast to simply keeping high-score filters in other methods, we propose the concept of filter fusion, i.e., the weighted averages using the assigned proxies, as our preserved filters. We obtain a one-hot inter-similarity distribution as the temperature parameter approaches infinity. Thus, the relative importance of each filter can vary along with the training of the compact CNN, leading to dynamically changeable fused filters without both the dependency on the pretrained model and the introduction of sparse constraints. Extensive experiments on classification benchmarks demonstrate the superiority of our DCFF over the compared counterparts. For example, our DCFF derives a compact VGGNet-16 with only 72.77M FLOPs and 1.06M parameters while reaching top-1 accuracy of 93.47% on CIFAR-10. A compact ResNet-50 is obtained with 63.8% FLOPs and 58.6% parameter reductions, retaining 75.60% top1 accuracy on ILSVRC-2012. 
+ +![pipeline](https://user-images.githubusercontent.com/31244134/189286581-722853ba-c6d7-4a39-b902-37995b444c71.jpg) + +## Results and models + +### 1. Classification + +| Dataset | Backbone | Params(M) | FLOPs(M) | lr_type | Top-1 (%) | Top-5 (%) | CPrate | Config | Download | +| :------: | :----------: | :-------: | :------: | :-----: | :-------: | :-------: | :---------------------------------------------: | :--------------------------------------------------: | :--------------------------: | +| ImageNet | DCFFResNet50 | 15.16 | 2260 | step | 73.96 | 91.66 | \[0.0\]+\[0.35,0.4,0.1\]\*10+\[0.3,0.3,0.1\]\*6 | [config](../../mmcls/dcff/dcff_resnet_8xb32_in1k.py) | [model](<>) \| \[log\] (\<>) | + +### 2. Detection + +| Dataset | Method | Backbone | Style | Lr schd | Params(M) | FLOPs(M) | bbox AP | CPrate | Config | Download | +| :-----: | :---------: | :----------: | :-----: | :-----: | :-------: | :------: | :-----: | :---------------------------------------------: | :---------------------------------------------------------------: | :--------------------------: | +| COCO | Faster_RCNN | DCFFResNet50 | pytorch | step | 33.31 | 168320 | 35.8 | \[0.0\]+\[0.35,0.4,0.1\]\*10+\[0.3,0.3,0.1\]\*6 | [config](../../mmdet/dcff/dcff_faster_rcnn_resnet50_8xb4_coco.py) | [model](<>) \| \[log\] (\<>) | + +### 3. Segmentation + +| Dataset | Method | Backbone | crop size | Lr schd | Params(M) | FLOPs(M) | mIoU | CPrate | Config | Download | +| :--------: | :-------: | :-------------: | :-------: | :-----: | :-------: | :------: | :---: | :-----------------------------------------------------------------: | :-------------------------------------------------------------------: | :--------------------------: | +| Cityscapes | PointRend | DCFFResNetV1c50 | 512x1024 | 160k | 18.43 | 74410 | 76.75 | \[0.0, 0.0, 0.0\] + \[0.35, 0.4, 0.1\] * 10 + \[0.3, 0.3, 0.1\] * 6 | [config](../../mmseg/dcff/dcff_pointrend_resnet50_8xb2_cityscapes.py) | [model](<>) \| \[log\] (\<>) | + +### 4. 
Pose + +| Dataset | Method | Backbone | crop size | total epochs | Params(M) | FLOPs(M) | AP | CPrate | Config | Download | +| :-----: | :-------------: | :----------: | :-------: | :----------: | :-------: | :------: | :--: | :--------------------------------------------------------: | :---------------------------------------------------------------: | :--------------------------: | +| COCO | TopDown HeatMap | DCFFResNet50 | 256x192 | 300 | 26.95 | 4290 | 68.3 | \[0.0\] + \[0.2, 0.2, 0.1\] * 10 + \[0.15, 0.15, 0.1\] * 6 | [config](../../mmpose/dcff/dcff_topdown_heatmap_resnet50_coco.py) | [model](<>) \| \[log\] (\<>) | + +## Citation + +```latex +@article{lin2021training, + title={Training Compact CNNs for Image Classification using Dynamic-coded Filter Fusion}, + author={Lin, Mingbao and Ji, Rongrong and Chen, Bohong and Chao, Fei and Liu, Jianzhuang and Zeng, Wei and Tian, Yonghong and Tian, Qi}, + journal={arXiv preprint arXiv:2107.06916}, + year={2021} +} +``` + +## Get Started + +### Generate channel_config file + +Generate `resnet_pose.json` with `tools/pruning/get_channel_units.py`. + +```bash +python tools/pruning/get_channel_units.py + configs/pruning/mmpose/dcff/dcff_topdown_heatmap_resnet50.py \ + -c -i --output-path=configs/pruning/mmpose/dcff/resnet_pose.json +``` + +Then set layers' pruning rates `target_pruning_ratio` by `resnet_pose.json`. 
+ +### Train DCFF + +#### Pose + +##### COCO + +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh \ + configs/pruning/mmpose/dcff/dcff_topdown_heatmap_resnet50.py 4 \ + --work-dir $WORK_DIR +``` + +### Test DCFF + +#### Pose + +##### COCO + +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_test.sh \ + configs/pruning/mmpose/dcff/dcff_compact_topdown_heatmap_resnet50.py \ + $CKPT 1 --work-dir $WORK_DIR +``` diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/dcff/dcff_compact_topdown_heatmap_resnet50_coco.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/dcff/dcff_compact_topdown_heatmap_resnet50_coco.py new file mode 100644 index 0000000000000000000000000000000000000000..ba503237903559ca8beb66a47bc8b13d8a1781d5 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/dcff/dcff_compact_topdown_heatmap_resnet50_coco.py @@ -0,0 +1,12 @@ +_base_ = ['dcff_topdown_heatmap_resnet50_coco.py'] + +# model settings +_base_.model = dict( + _scope_='mmrazor', + type='sub_model', + cfg=_base_.architecture, + fix_subnet='configs/pruning/mmpose/dcff/fix_subnet.json', + mode='mutator', + init_cfg=dict( + type='Pretrained', + checkpoint='configs/pruning/mmpose/dcff/fix_subnet_weight.pth')) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/dcff/dcff_topdown_heatmap_resnet50_coco.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/dcff/dcff_topdown_heatmap_resnet50_coco.py new file mode 100644 index 0000000000000000000000000000000000000000..d18bccc02767e297a19a2ed84052b46f25db2e45 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/dcff/dcff_topdown_heatmap_resnet50_coco.py @@ -0,0 +1,188 @@ +_base_ = [ + 'mmpose::_base_/default_runtime.py', +] +train_cfg = dict(max_epochs=300, val_interval=10) + +optim_wrapper = dict(optimizer=dict(type='Adam', lr=5e-4), clip_grad=None) + +# learning policy +param_scheduler = [ + dict( + type='LinearLR', begin=0, 
end=500, start_factor=0.001, + by_epoch=False), # warm-up + dict( + type='MultiStepLR', + begin=0, + end=300, + milestones=[170, 220, 280], + gamma=0.1, + by_epoch=True) +] + +# automatically scaling LR based on the actual training batch size +auto_scale_lr = dict(base_batch_size=512) + +# hooks +default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) + +# codec settings +codec = dict( + type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) + +# model settings +architecture = dict( + type='mmpose.TopdownPoseEstimator', + data_preprocessor=dict( + type='mmpose.PoseDataPreprocessor', + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + bgr_to_rgb=True), + backbone=dict( + type='mmpose.ResNet', + depth=50, + num_stages=4, + out_indices=(3, ), + ), + head=dict( + type='mmpose.HeatmapHead', + in_channels=1843, + out_channels=17, + loss=dict(type='mmpose.KeypointMSELoss', use_target_weight=True), + decoder=codec), + test_cfg=dict( + flip_test=True, + flip_mode='heatmap', + shift_heatmap=True, + )) + +stage_ratio_1 = 0.8 +stage_ratio_2 = 0.8 +stage_ratio_3 = 0.9 +stage_ratio_4 = 0.85 + +# the config template of target_pruning_ratio can be got by +# python ./tools/pruning/get_channel_units.py {config_file} --choice +target_pruning_ratio = { + 'backbone.layer1.0.conv1_(0, 64)_64': stage_ratio_1, + 'backbone.layer1.0.conv2_(0, 64)_64': stage_ratio_2, + 'backbone.layer1.0.conv3_(0, 256)_256': stage_ratio_3, + 'backbone.layer1.1.conv1_(0, 64)_64': stage_ratio_1, + 'backbone.layer1.1.conv2_(0, 64)_64': stage_ratio_2, + 'backbone.layer1.2.conv1_(0, 64)_64': stage_ratio_1, + 'backbone.layer1.2.conv2_(0, 64)_64': stage_ratio_2, + # block 1 [0.8, 0.8] downsample=[0.9] + 'backbone.layer2.0.conv1_(0, 128)_128': stage_ratio_1, + 'backbone.layer2.0.conv2_(0, 128)_128': stage_ratio_2, + 'backbone.layer2.0.conv3_(0, 512)_512': stage_ratio_3, + 'backbone.layer2.1.conv1_(0, 128)_128': stage_ratio_1, + 'backbone.layer2.1.conv2_(0, 
128)_128': stage_ratio_2, + 'backbone.layer2.2.conv1_(0, 128)_128': stage_ratio_1, + 'backbone.layer2.2.conv2_(0, 128)_128': stage_ratio_2, + 'backbone.layer2.3.conv1_(0, 128)_128': stage_ratio_1, + 'backbone.layer2.3.conv2_(0, 128)_128': stage_ratio_2, + # block 2 [0.8, 0.8] downsample=[0.9] + 'backbone.layer3.0.conv1_(0, 256)_256': stage_ratio_1, + 'backbone.layer3.0.conv2_(0, 256)_256': stage_ratio_2, + 'backbone.layer3.0.conv3_(0, 1024)_1024': stage_ratio_3, + 'backbone.layer3.1.conv1_(0, 256)_256': stage_ratio_1, + 'backbone.layer3.1.conv2_(0, 256)_256': stage_ratio_2, + 'backbone.layer3.2.conv1_(0, 256)_256': stage_ratio_1, + 'backbone.layer3.2.conv2_(0, 256)_256': stage_ratio_2, + 'backbone.layer3.3.conv1_(0, 256)_256': stage_ratio_4, + 'backbone.layer3.3.conv2_(0, 256)_256': stage_ratio_4, + 'backbone.layer3.4.conv1_(0, 256)_256': stage_ratio_4, + 'backbone.layer3.4.conv2_(0, 256)_256': stage_ratio_4, + 'backbone.layer3.5.conv1_(0, 256)_256': stage_ratio_4, + 'backbone.layer3.5.conv2_(0, 256)_256': stage_ratio_4, + # block 3 [0.8, 0.8]*2+[0.8, 0.85]*2 downsample=[0.9] + 'backbone.layer4.0.conv1_(0, 512)_512': stage_ratio_4, + 'backbone.layer4.0.conv2_(0, 512)_512': stage_ratio_4, + 'backbone.layer4.0.conv3_(0, 2048)_2048': stage_ratio_3, + 'backbone.layer4.1.conv1_(0, 512)_512': stage_ratio_4, + 'backbone.layer4.1.conv2_(0, 512)_512': stage_ratio_4, + 'backbone.layer4.2.conv1_(0, 512)_512': stage_ratio_4, + 'backbone.layer4.2.conv2_(0, 512)_512': stage_ratio_4 + # block 4 [0.85, 0.85] downsample=[0.9] +} + +model = dict( + _scope_='mmrazor', + type='DCFF', + architecture=architecture, + mutator_cfg=dict( + type='DCFFChannelMutator', + channel_unit_cfg=dict( + type='DCFFChannelUnit', default_args=dict(choice_mode='ratio')), + parse_cfg=dict( + type='ChannelAnalyzer', + demo_input=(1, 3, 224, 224), + tracer_type='BackwardTracer')), + target_pruning_ratio=target_pruning_ratio, + step_freq=1, + linear_schedule=False) + +dataset_type = 'CocoDataset' +data_mode = 
'topdown' +data_root = 'data/coco/' + +file_client_args = dict(backend='disk') + +train_pipeline = [ + dict(type='LoadImage', file_client_args=file_client_args), + dict(type='GetBBoxCenterScale'), + dict(type='RandomFlip', direction='horizontal'), + dict(type='RandomHalfBody'), + dict(type='RandomBBoxTransform'), + dict(type='TopdownAffine', input_size=codec['input_size']), + dict(type='GenerateTarget', target_type='heatmap', encoder=codec), + dict(type='PackPoseInputs') +] + +test_pipeline = [ + dict(type='LoadImage', file_client_args=file_client_args), + dict(type='GetBBoxCenterScale'), + dict(type='TopdownAffine', input_size=codec['input_size']), + dict(type='PackPoseInputs') +] + +train_dataloader = dict( + batch_size=32, + num_workers=2, + persistent_workers=True, + sampler=dict(type='DefaultSampler', shuffle=True), + dataset=dict( + type=dataset_type, + data_root=data_root, + data_mode=data_mode, + ann_file='annotations/person_keypoints_train2017.json', + data_prefix=dict(img='train2017/'), + pipeline=train_pipeline, + )) +val_dataloader = dict( + batch_size=32, + num_workers=2, + persistent_workers=True, + drop_last=False, + sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + dataset=dict( + type=dataset_type, + data_root=data_root, + data_mode=data_mode, + ann_file='annotations/person_keypoints_val2017.json', + data_prefix=dict(img='val2017/'), + test_mode=True, + bbox_file='data/coco/person_detection_results/' + 'COCO_val2017_detections_AP_H_56_person.json', + pipeline=test_pipeline, + )) +test_dataloader = val_dataloader + +model_wrapper = dict( + type='mmcv.MMDistributedDataParallel', find_unused_parameters=True) + +val_evaluator = dict( + type='mmpose.CocoMetric', + ann_file=data_root + 'annotations/person_keypoints_val2017.json') +test_evaluator = val_evaluator + +val_cfg = dict(_delete_=True, type='mmrazor.ItePruneValLoop') diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/dcff/fix_subnet.json 
b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/dcff/fix_subnet.json new file mode 100644 index 0000000000000000000000000000000000000000..dfdcea75873d2516af6db193343f0b4df7376bb8 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/dcff/fix_subnet.json @@ -0,0 +1,141 @@ +{ + "type":"DCFFChannelMutator", + "channel_unit_cfg":{ + "type":"DCFFChannelUnit", + "default_args":{ + "choice_mode":"ratio" + }, + "units":{ + "backbone.conv1_(0, 64)_64":{ + "init_args":{ + "num_channels":64, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":1.0 + }, + "backbone.layer1.0.conv1_(0, 64)_64":{ + "init_args":{ + "num_channels":64, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.640625 + }, + "backbone.layer1.1.conv1_(0, 64)_64":{ + "init_args":{ + "num_channels":64, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.640625 + }, + "backbone.layer2.0.conv1_(0, 128)_128":{ + "init_args":{ + "num_channels":128, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.6484375 + }, + "backbone.layer2.0.conv2_(0, 128)_128":{ + "init_args":{ + "num_channels":128, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.59375 + }, + "backbone.layer2.1.conv1_(0, 128)_128":{ + "init_args":{ + "num_channels":128, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.6484375 + }, + "backbone.layer3.0.conv1_(0, 256)_256":{ + "init_args":{ + "num_channels":256, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.6484375 + }, + "backbone.layer3.0.conv2_(0, 256)_256":{ + "init_args":{ + "num_channels":256, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.59765625 + }, + "backbone.layer3.1.conv1_(0, 256)_256":{ + "init_args":{ + "num_channels":256, 
+ "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.6484375 + }, + "backbone.layer4.0.conv1_(0, 512)_512":{ + "init_args":{ + "num_channels":512, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.69921875 + }, + "backbone.layer4.0.conv2_(0, 512)_512":{ + "init_args":{ + "num_channels":512, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.69921875 + }, + "backbone.layer4.1.conv1_(0, 512)_512":{ + "init_args":{ + "num_channels":512, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.69921875 + } + } + }, + "parse_cfg":{ + "type":"ChannelAnalyzer", + "demo_input":[ + 1, + 3, + 224, + 224 + ], + "tracer_type":"BackwardTracer" + } +} diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/group_fisher/group_fisher_deploy_rtmpose-s_8xb256-420e_aic-coco-256x192.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/group_fisher/group_fisher_deploy_rtmpose-s_8xb256-420e_aic-coco-256x192.py new file mode 100644 index 0000000000000000000000000000000000000000..3c720566f0a0ed081aa4736b01569d79c8d9b5d8 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/group_fisher/group_fisher_deploy_rtmpose-s_8xb256-420e_aic-coco-256x192.py @@ -0,0 +1,53 @@ +############################################################################# +"""You have to fill these args. + +_base_(str): The path to your pretrain config file. +fix_subnet (Union[dict,str]): The dict store the pruning structure or the + json file including it. +divisor (int): The divisor the make the channel number divisible. 
+""" + +_base_ = 'mmpose::body_2d_keypoint/rtmpose/coco/rtmpose-s_8xb256-420e_aic-coco-256x192.py' # noqa +fix_subnet = { + 'backbone.stem.0.conv_(0, 16)_16': 8, + 'backbone.stem.1.conv_(0, 16)_16': 9, + 'backbone.stem.2.conv_(0, 32)_32': 9, + 'backbone.stage1.0.conv_(0, 64)_64': 32, + 'backbone.stage1.1.short_conv.conv_(0, 32)_32': 30, + 'backbone.stage1.1.main_conv.conv_(0, 32)_32': 29, + 'backbone.stage1.1.blocks.0.conv1.conv_(0, 32)_32': 24, + 'backbone.stage1.1.final_conv.conv_(0, 64)_64': 27, + 'backbone.stage2.0.conv_(0, 128)_128': 62, + 'backbone.stage2.1.short_conv.conv_(0, 64)_64': 63, + 'backbone.stage2.1.main_conv.conv_(0, 64)_64': 64, + 'backbone.stage2.1.blocks.0.conv1.conv_(0, 64)_64': 56, + 'backbone.stage2.1.blocks.1.conv1.conv_(0, 64)_64': 62, + 'backbone.stage2.1.final_conv.conv_(0, 128)_128': 65, + 'backbone.stage3.0.conv_(0, 256)_256': 167, + 'backbone.stage3.1.short_conv.conv_(0, 128)_128': 127, + 'backbone.stage3.1.main_conv.conv_(0, 128)_128': 128, + 'backbone.stage3.1.blocks.0.conv1.conv_(0, 128)_128': 124, + 'backbone.stage3.1.blocks.1.conv1.conv_(0, 128)_128': 123, + 'backbone.stage3.1.final_conv.conv_(0, 256)_256': 172, + 'backbone.stage4.0.conv_(0, 512)_512': 337, + 'backbone.stage4.1.conv1.conv_(0, 256)_256': 256, + 'backbone.stage4.1.conv2.conv_(0, 512)_512': 379, + 'backbone.stage4.2.short_conv.conv_(0, 256)_256': 188, + 'backbone.stage4.2.main_conv.conv_(0, 256)_256': 227, + 'backbone.stage4.2.blocks.0.conv1.conv_(0, 256)_256': 238, + 'backbone.stage4.2.blocks.0.conv2.pointwise_conv.conv_(0, 256)_256': 195, + 'backbone.stage4.2.final_conv.conv_(0, 512)_512': 163 +} +divisor = 8 +############################################################################## + +architecture = _base_.model + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='GroupFisherDeploySubModel', + architecture=architecture, + fix_subnet=fix_subnet, + divisor=divisor, +) diff --git 
a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/group_fisher/group_fisher_deploy_rtmpose-s_8xb256-420e_coco-256x192.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/group_fisher/group_fisher_deploy_rtmpose-s_8xb256-420e_coco-256x192.py new file mode 100644 index 0000000000000000000000000000000000000000..64fa6c2b6b6c9b7629171a19f96f7084dd713568 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/group_fisher/group_fisher_deploy_rtmpose-s_8xb256-420e_coco-256x192.py @@ -0,0 +1,53 @@ +############################################################################# +"""You have to fill these args. + +_base_(str): The path to your pretrain config file. +fix_subnet (Union[dict,str]): The dict store the pruning structure or the + json file including it. +divisor (int): The divisor the make the channel number divisible. +""" + +_base_ = 'mmpose::body_2d_keypoint/rtmpose/coco/rtmpose-s_8xb256-420e_coco-256x192.py' # noqa +fix_subnet = { + 'backbone.stem.0.conv_(0, 16)_16': 8, + 'backbone.stem.1.conv_(0, 16)_16': 10, + 'backbone.stem.2.conv_(0, 32)_32': 11, + 'backbone.stage1.0.conv_(0, 64)_64': 32, + 'backbone.stage1.1.short_conv.conv_(0, 32)_32': 32, + 'backbone.stage1.1.main_conv.conv_(0, 32)_32': 23, + 'backbone.stage1.1.blocks.0.conv1.conv_(0, 32)_32': 25, + 'backbone.stage1.1.final_conv.conv_(0, 64)_64': 25, + 'backbone.stage2.0.conv_(0, 128)_128': 71, + 'backbone.stage2.1.short_conv.conv_(0, 64)_64': 61, + 'backbone.stage2.1.main_conv.conv_(0, 64)_64': 62, + 'backbone.stage2.1.blocks.0.conv1.conv_(0, 64)_64': 57, + 'backbone.stage2.1.blocks.1.conv1.conv_(0, 64)_64': 59, + 'backbone.stage2.1.final_conv.conv_(0, 128)_128': 69, + 'backbone.stage3.0.conv_(0, 256)_256': 177, + 'backbone.stage3.1.short_conv.conv_(0, 128)_128': 122, + 'backbone.stage3.1.main_conv.conv_(0, 128)_128': 123, + 'backbone.stage3.1.blocks.0.conv1.conv_(0, 128)_128': 125, + 'backbone.stage3.1.blocks.1.conv1.conv_(0, 128)_128': 123, + 
'backbone.stage3.1.final_conv.conv_(0, 256)_256': 171, + 'backbone.stage4.0.conv_(0, 512)_512': 351, + 'backbone.stage4.1.conv1.conv_(0, 256)_256': 256, + 'backbone.stage4.1.conv2.conv_(0, 512)_512': 367, + 'backbone.stage4.2.short_conv.conv_(0, 256)_256': 183, + 'backbone.stage4.2.main_conv.conv_(0, 256)_256': 216, + 'backbone.stage4.2.blocks.0.conv1.conv_(0, 256)_256': 238, + 'backbone.stage4.2.blocks.0.conv2.pointwise_conv.conv_(0, 256)_256': 195, + 'backbone.stage4.2.final_conv.conv_(0, 512)_512': 187 +} +divisor = 16 +############################################################################## + +architecture = _base_.model + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='GroupFisherDeploySubModel', + architecture=architecture, + fix_subnet=fix_subnet, + divisor=divisor, +) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/group_fisher/group_fisher_finetune_rtmpose-s_8xb256-420e_aic-coco-256x192.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/group_fisher/group_fisher_finetune_rtmpose-s_8xb256-420e_aic-coco-256x192.py new file mode 100644 index 0000000000000000000000000000000000000000..b4fb4f827cf73ac4c125ac984a69759f5419be1a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/group_fisher/group_fisher_finetune_rtmpose-s_8xb256-420e_aic-coco-256x192.py @@ -0,0 +1,32 @@ +############################################################################# +"""# You have to fill these args. + +_base_(str): The path to your pruning config file. +pruned_path (str): The path to the checkpoint of the pruned model. +finetune_lr (float): The lr rate to finetune. Usually, we directly use the lr + rate of the pretrain. 
+""" + +_base_ = './group_fisher_prune_rtmpose-s_8xb256-420e_aic-coco-256x192.py' # noqa +pruned_path = 'https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/rtmpose-s/group_fisher_prune_rtmpose-s_8xb256-420e_aic-coco-256x192.pth' # noqa +finetune_lr = 4e-3 +############################################################################## + +algorithm = _base_.model +algorithm.init_cfg = dict(type='Pretrained', checkpoint=pruned_path) + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='GroupFisherSubModel', + algorithm=algorithm, +) + +# restore lr +optim_wrapper = dict(optimizer=dict(lr=finetune_lr)) + +# remove pruning related hooks +custom_hooks = _base_.custom_hooks[:-2] + +# delete ddp +model_wrapper_cfg = None diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/group_fisher/group_fisher_finetune_rtmpose-s_8xb256-420e_coco-256x192.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/group_fisher/group_fisher_finetune_rtmpose-s_8xb256-420e_coco-256x192.py new file mode 100644 index 0000000000000000000000000000000000000000..5cc6db15e42fc1af75532a4ffc142624f687aec3 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/group_fisher/group_fisher_finetune_rtmpose-s_8xb256-420e_coco-256x192.py @@ -0,0 +1,33 @@ +############################################################################# +"""# You have to fill these args. + +_base_(str): The path to your pruning config file. +pruned_path (str): The path to the checkpoint of the pruned model. +finetune_lr (float): The lr rate to finetune. Usually, we directly use the lr + rate of the pretrain. 
+""" + +_base_ = './group_fisher_prune_rtmpose-s_8xb256-420e_coco-256x192.py' +pruned_path = 'https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/rtmpose-s/group_fisher_prune_rtmpose-s_8xb256-420e_coco-256x192.pth' # noqa +finetune_lr = 4e-3 +############################################################################## + +algorithm = _base_.model +algorithm.init_cfg = dict(type='Pretrained', checkpoint=pruned_path) +# algorithm.update(dict(architecture=dict(test_cfg=dict(flip_test=False), ))) # disable flip test # noqa + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='GroupFisherSubModel', + algorithm=algorithm, +) + +# restore lr +optim_wrapper = dict(optimizer=dict(lr=finetune_lr)) + +# remove pruning related hooks +custom_hooks = _base_.custom_hooks[:-2] + +# delete ddp +model_wrapper_cfg = None diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/group_fisher/group_fisher_prune_rtmpose-s_8xb256-420e_aic-coco-256x192.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/group_fisher/group_fisher_prune_rtmpose-s_8xb256-420e_aic-coco-256x192.py new file mode 100644 index 0000000000000000000000000000000000000000..14bdc96f5eb55d7e2f6113966c53320c32164df3 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/group_fisher/group_fisher_prune_rtmpose-s_8xb256-420e_aic-coco-256x192.py @@ -0,0 +1,75 @@ +############################################################################# +"""You have to fill these args. + +_base_ (str): The path to your pretrained model checkpoint. +pretrained_path (str): The path to your pretrained model checkpoint. + +interval (int): Interval between pruning two channels. You should ensure you + can reach your target pruning ratio when the training ends. +normalization_type (str): GroupFisher uses two methods to normlized the channel + importance, including ['flops','act']. The former uses flops, while the + latter uses the memory occupation of activation feature maps. 
+lr_ratio (float): Ratio to decrease lr rate. As pruning progress is unstable, + you need to decrease the original lr rate until the pruning training work + steadly without getting nan. + +target_flop_ratio (float): The target flop ratio to prune your model. +input_shape (Tuple): input shape to measure the flops. +""" + +_base_ = 'mmpose::body_2d_keypoint/rtmpose/coco/rtmpose-s_8xb256-420e_aic-coco-256x192.py' # noqa +pretrained_path = 'https://download.openmmlab.com/mmpose/v1/projects/rtmpose/rtmpose-s_simcc-aic-coco_pt-aic-coco_420e-256x192-fcb2599b_20230126.pth' # noqa + +interval = 10 +normalization_type = 'act' +lr_ratio = 0.1 + +target_flop_ratio = 0.51 +input_shape = (1, 3, 256, 192) +############################################################################## + +architecture = _base_.model + +if hasattr(_base_, 'data_preprocessor'): + architecture.update({'data_preprocessor': _base_.data_preprocessor}) + data_preprocessor = None + +architecture.init_cfg = dict(type='Pretrained', checkpoint=pretrained_path) +architecture['_scope_'] = _base_.default_scope + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='GroupFisherAlgorithm', + architecture=architecture, + interval=interval, + mutator=dict( + type='GroupFisherChannelMutator', + parse_cfg=dict(type='ChannelAnalyzer', tracer_type='FxTracer'), + channel_unit_cfg=dict( + type='GroupFisherChannelUnit', + default_args=dict(normalization_type=normalization_type, ), + ), + ), +) + +model_wrapper_cfg = dict( + type='mmrazor.GroupFisherDDP', + broadcast_buffers=False, +) + +optim_wrapper = dict( + optimizer=dict(lr=_base_.optim_wrapper.optimizer.lr * lr_ratio)) + +custom_hooks = getattr(_base_, 'custom_hooks', []) + [ + dict(type='mmrazor.PruningStructureHook'), + dict( + type='mmrazor.ResourceInfoHook', + interval=interval, + demo_input=dict( + type='mmrazor.DefaultDemoInput', + input_shape=input_shape, + ), + save_ckpt_thr=[target_flop_ratio], + ), +] diff --git 
a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/group_fisher/group_fisher_prune_rtmpose-s_8xb256-420e_coco-256x192.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/group_fisher/group_fisher_prune_rtmpose-s_8xb256-420e_coco-256x192.py new file mode 100644 index 0000000000000000000000000000000000000000..5a998e5934babfe67412a16ad5fb83ad72bd2651 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/group_fisher/group_fisher_prune_rtmpose-s_8xb256-420e_coco-256x192.py @@ -0,0 +1,75 @@ +############################################################################# +"""You have to fill these args. + +_base_ (str): The path to your pretrained model checkpoint. +pretrained_path (str): The path to your pretrained model checkpoint. + +interval (int): Interval between pruning two channels. You should ensure you + can reach your target pruning ratio when the training ends. +normalization_type (str): GroupFisher uses two methods to normlized the channel + importance, including ['flops','act']. The former uses flops, while the + latter uses the memory occupation of activation feature maps. +lr_ratio (float): Ratio to decrease lr rate. As pruning progress is unstable, + you need to decrease the original lr rate until the pruning training work + steadly without getting nan. + +target_flop_ratio (float): The target flop ratio to prune your model. +input_shape (Tuple): input shape to measure the flops. 
+""" + +_base_ = 'mmpose::body_2d_keypoint/rtmpose/coco/rtmpose-s_8xb256-420e_coco-256x192.py' # noqa +pretrained_path = 'https://download.openmmlab.com/mmpose/v1/projects/rtmpose/rtmpose-s_simcc-coco_pt-aic-coco_420e-256x192-8edcf0d7_20230127.pth' # noqa + +interval = 10 +normalization_type = 'act' +lr_ratio = 0.1 + +target_flop_ratio = 0.51 +input_shape = (1, 3, 256, 192) +############################################################################## + +architecture = _base_.model + +if hasattr(_base_, 'data_preprocessor'): + architecture.update({'data_preprocessor': _base_.data_preprocessor}) + data_preprocessor = None + +architecture.init_cfg = dict(type='Pretrained', checkpoint=pretrained_path) +architecture['_scope_'] = _base_.default_scope + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='GroupFisherAlgorithm', + architecture=architecture, + interval=interval, + mutator=dict( + type='GroupFisherChannelMutator', + parse_cfg=dict(type='ChannelAnalyzer', tracer_type='FxTracer'), + channel_unit_cfg=dict( + type='GroupFisherChannelUnit', + default_args=dict(normalization_type=normalization_type, ), + ), + ), +) + +model_wrapper_cfg = dict( + type='mmrazor.GroupFisherDDP', + broadcast_buffers=False, +) + +optim_wrapper = dict( + optimizer=dict(lr=_base_.optim_wrapper.optimizer.lr * lr_ratio)) + +custom_hooks = getattr(_base_, 'custom_hooks', []) + [ + dict(type='mmrazor.PruningStructureHook'), + dict( + type='mmrazor.ResourceInfoHook', + interval=interval, + demo_input=dict( + type='mmrazor.DefaultDemoInput', + input_shape=input_shape, + ), + save_ckpt_thr=[target_flop_ratio], + ), +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/group_fisher/script.sh b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/group_fisher/script.sh new file mode 100644 index 0000000000000000000000000000000000000000..897cd3ac1e884f27d2be4d5b8ed4bc1a9ff37ac8 --- /dev/null +++ 
b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmpose/group_fisher/script.sh @@ -0,0 +1,39 @@ +# deploy rtmpose-s_pruned_act + +razor_config=configs/pruning/mmpose/group_fisher/group_fisher_deploy_rtmpose-s_8xb256-420e_coco-256x192.py +deploy_config=mmdeploy/configs/mmpose/pose-detection_simcc_onnxruntime_dynamic.py + +python mmdeploy/tools/deploy.py $deploy_config \ + $razor_config \ + https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/rtmpose-s/group_fisher_finetune_rtmpose-s_8xb256-420e_coco-256x192.pth \ + mmdeploy/tests/data/tiger.jpeg \ + --work-dir ./work_dirs/mmdeploy + +python mmdeploy/tools/profiler.py $deploy_config \ + $razor_config \ + mmdeploy/demo/resources \ + --model ./work_dirs/mmdeploy/end2end.onnx \ + --shape 256x192 \ + --device cpu \ + --num-iter 1000 \ + --warmup 100 + +# deploy rtmpose-s-aic-coco_pruned_act + +razor_config=configs/pruning/mmpose/group_fisher/group_fisher_deploy_rtmpose-s_8xb256-420e_aic-coco-256x192.py +deploy_config=mmdeploy/configs/mmpose/pose-detection_simcc_onnxruntime_dynamic.py + +python mmdeploy/tools/deploy.py $deploy_config \ + $razor_config \ + https://download.openmmlab.com/mmrazor/v1/pruning/group_fisher/rtmpose-s/group_fisher_finetune_rtmpose-s_8xb256-420e_aic-coco-256x192.pth \ + mmdeploy/tests/data/tiger.jpeg \ + --work-dir ./work_dirs/mmdeploy + +python mmdeploy/tools/profiler.py $deploy_config \ + $razor_config \ + mmdeploy/demo/resources \ + --model ./work_dirs/mmdeploy/end2end.onnx \ + --shape 256x192 \ + --device cpu \ + --num-iter 1000 \ + --warmup 100 diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmseg/dcff/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmseg/dcff/README.md new file mode 100644 index 0000000000000000000000000000000000000000..fd00eb898b4f89f2b814dba514b3ea8529087154 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmseg/dcff/README.md @@ -0,0 +1,82 @@ +# Training Compact CNNs for Image Classification using Dynamic-coded 
Filter Fusion + +## Abstract + +The mainstream approach for filter pruning is usually either to force a hard-coded importance estimation upon a computation-heavy pretrained model to select “important” filters, or to impose a hyperparameter-sensitive sparse constraint on the loss objective to regularize the network training. In this paper, we present a novel filter pruning method, dubbed dynamic-coded filter fusion (DCFF), to derive compact CNNs in a computationeconomical and regularization-free manner for efficient image classification. Each filter in our DCFF is firstly given an intersimilarity distribution with a temperature parameter as a filter proxy, on top of which, a fresh Kullback-Leibler divergence based dynamic-coded criterion is proposed to evaluate the filter importance. In contrast to simply keeping high-score filters in other methods, we propose the concept of filter fusion, i.e., the weighted averages using the assigned proxies, as our preserved filters. We obtain a one-hot inter-similarity distribution as the temperature parameter approaches infinity. Thus, the relative importance of each filter can vary along with the training of the compact CNN, leading to dynamically changeable fused filters without both the dependency on the pretrained model and the introduction of sparse constraints. Extensive experiments on classification benchmarks demonstrate the superiority of our DCFF over the compared counterparts. For example, our DCFF derives a compact VGGNet-16 with only 72.77M FLOPs and 1.06M parameters while reaching top-1 accuracy of 93.47% on CIFAR-10. A compact ResNet-50 is obtained with 63.8% FLOPs and 58.6% parameter reductions, retaining 75.60% top1 accuracy on ILSVRC-2012. + +![pipeline](https://user-images.githubusercontent.com/31244134/189286581-722853ba-c6d7-4a39-b902-37995b444c71.jpg) + +## Results and models + +### 1. 
Classification + +| Dataset | Backbone | Params(M) | FLOPs(M) | lr_type | Top-1 (%) | Top-5 (%) | CPrate | Config | Download | +| :------: | :----------: | :-------: | :------: | :-----: | :-------: | :-------: | :---------------------------------------------: | :--------------------------------------------------: | :--------------------------: | +| ImageNet | DCFFResNet50 | 15.16 | 2260 | step | 73.96 | 91.66 | \[0.0\]+\[0.35,0.4,0.1\]\*10+\[0.3,0.3,0.1\]\*6 | [config](../../mmcls/dcff/dcff_resnet_8xb32_in1k.py) | [model](<>) \| \[log\] (\<>) | + +### 2. Detection + +| Dataset | Method | Backbone | Style | Lr schd | Params(M) | FLOPs(M) | bbox AP | CPrate | Config | Download | +| :-----: | :---------: | :----------: | :-----: | :-----: | :-------: | :------: | :-----: | :---------------------------------------------: | :---------------------------------------------------------------: | :--------------------------: | +| COCO | Faster_RCNN | DCFFResNet50 | pytorch | step | 33.31 | 168320 | 35.8 | \[0.0\]+\[0.35,0.4,0.1\]\*10+\[0.3,0.3,0.1\]\*6 | [config](../../mmdet/dcff/dcff_faster_rcnn_resnet50_8xb4_coco.py) | [model](<>) \| \[log\] (\<>) | + +### 3. Segmentation + +| Dataset | Method | Backbone | crop size | Lr schd | Params(M) | FLOPs(M) | mIoU | CPrate | Config | Download | +| :--------: | :-------: | :-------------: | :-------: | :-----: | :-------: | :------: | :---: | :-----------------------------------------------------------------: | :-------------------------------------------------------------------: | :--------------------------: | +| Cityscapes | PointRend | DCFFResNetV1c50 | 512x1024 | 160k | 18.43 | 74410 | 76.75 | \[0.0, 0.0, 0.0\] + \[0.35, 0.4, 0.1\] * 10 + \[0.3, 0.3, 0.1\] * 6 | [config](../../mmseg/dcff/dcff_pointrend_resnet50_8xb2_cityscapes.py) | [model](<>) \| \[log\] (\<>) | + +### 4. 
Pose + +| Dataset | Method | Backbone | crop size | total epochs | Params(M) | FLOPs(M) | AP | CPrate | Config | Download | +| :-----: | :-------------: | :----------: | :-------: | :----------: | :-------: | :------: | :--: | :--------------------------------------------------------: | :---------------------------------------------------------------: | :--------------------------: | +| COCO | TopDown HeatMap | DCFFResNet50 | 256x192 | 300 | 26.95 | 4290 | 68.3 | \[0.0\] + \[0.2, 0.2, 0.1\] * 10 + \[0.15, 0.15, 0.1\] * 6 | [config](../../mmpose/dcff/dcff_topdown_heatmap_resnet50_coco.py) | [model](<>) \| \[log\] (\<>) | + +## Citation + +```latex +@article{lin2021training, + title={Training Compact CNNs for Image Classification using Dynamic-coded Filter Fusion}, + author={Lin, Mingbao and Ji, Rongrong and Chen, Bohong and Chao, Fei and Liu, Jianzhuang and Zeng, Wei and Tian, Yonghong and Tian, Qi}, + journal={arXiv preprint arXiv:2107.06916}, + year={2021} +} +``` + +## Get Started + +### Generate channel_config file + +Generate `resnet_seg.json` with `tools/pruning/get_channel_units.py`. + +```bash +python tools/pruning/get_channel_units.py + configs/pruning/mmseg/dcff/dcff_pointrend_resnet50_8xb2_cityscapes.py \ + -c -i --output-path=configs/pruning/mmseg/dcff/resnet_seg.json +``` + +Then set layers' pruning rates `target_pruning_ratio` by `resnet_seg.json`. 
+ +### Train DCFF + +#### Segmentation + +##### Cityscapes + +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh \ + configs/pruning/mmseg/dcff/dcff_pointrend_resnet50_8xb2_cityscapes.py 4 \ + --work-dir $WORK_DIR +``` + +### Test DCFF + +#### Segmentation + +##### Cityscapes + +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_test.sh \ + configs/pruning/mmseg/dcff/dcff_compact_pointrend_resnet50_8xb2_cityscapes.py \ + $CKPT 1 --work-dir $WORK_DIR +``` diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmseg/dcff/dcff_compact_pointrend_resnet50_8xb2_cityscapes.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmseg/dcff/dcff_compact_pointrend_resnet50_8xb2_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..e6c1eb0318ec32b67907acc7b3bcbcba23e5b1a8 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmseg/dcff/dcff_compact_pointrend_resnet50_8xb2_cityscapes.py @@ -0,0 +1,12 @@ +_base_ = ['dcff_pointrend_resnet50_8xb2_cityscapes.py'] + +# model settings +_base_.model = dict( + _scope_='mmrazor', + type='sub_model', + cfg=_base_.architecture, + fix_subnet='configs/pruning/mmseg/dcff/fix_subnet.json', + mode='mutator', + init_cfg=dict( + type='Pretrained', + checkpoint='configs/pruning/mmseg/dcff/fix_subnet_weight.pth')) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmseg/dcff/dcff_pointrend_resnet50_8xb2_cityscapes.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmseg/dcff/dcff_pointrend_resnet50_8xb2_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..d552e23e9e0aecbcf00a152c235a5d448153ea2d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmseg/dcff/dcff_pointrend_resnet50_8xb2_cityscapes.py @@ -0,0 +1,99 @@ +_base_ = [ + # TODO: use autoaug pipeline.
+ 'mmseg::_base_/datasets/cityscapes.py', + 'mmseg::_base_/schedules/schedule_160k.py', + 'mmseg::_base_/default_runtime.py', + './pointrend_resnet50.py' +] + +optim_wrapper = dict( + type='OptimWrapper', + optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005), + clip_grad=dict(max_norm=25, norm_type=2), + _delete_=True) +train_cfg = dict(type='IterBasedTrainLoop', max_iters=160000, val_interval=800) + +param_scheduler = [ + # warm up + dict(type='LinearLR', by_epoch=False, start_factor=0.1, begin=0, end=200), + dict( + type='PolyLR', + eta_min=1e-4, + power=0.9, + begin=200, + end=80000, + by_epoch=False, + ) +] + +stage_ratio_1 = 0.65 +stage_ratio_2 = 0.6 +stage_ratio_3 = 0.9 +stage_ratio_4 = 0.7 + +# the config template of target_pruning_ratio can be got by +# python ./tools/pruning/get_channel_units.py {config_file} --choice +target_pruning_ratio = { + 'backbone.layer1.0.conv1_(0, 64)_64': stage_ratio_1, + 'backbone.layer1.0.conv2_(0, 64)_64': stage_ratio_2, + 'backbone.layer1.0.conv3_(0, 256)_256': stage_ratio_3, + 'backbone.layer1.1.conv1_(0, 64)_64': stage_ratio_1, + 'backbone.layer1.1.conv2_(0, 64)_64': stage_ratio_2, + 'backbone.layer1.2.conv1_(0, 64)_64': stage_ratio_1, + 'backbone.layer1.2.conv2_(0, 64)_64': stage_ratio_2, + # block 1 [0.8, 0.8] downsample=[0.9] + 'backbone.layer2.0.conv1_(0, 128)_128': stage_ratio_1, + 'backbone.layer2.0.conv2_(0, 128)_128': stage_ratio_2, + 'backbone.layer2.0.conv3_(0, 512)_512': stage_ratio_3, + 'backbone.layer2.1.conv1_(0, 128)_128': stage_ratio_1, + 'backbone.layer2.1.conv2_(0, 128)_128': stage_ratio_2, + 'backbone.layer2.2.conv1_(0, 128)_128': stage_ratio_1, + 'backbone.layer2.2.conv2_(0, 128)_128': stage_ratio_2, + 'backbone.layer2.3.conv1_(0, 128)_128': stage_ratio_1, + 'backbone.layer2.3.conv2_(0, 128)_128': stage_ratio_2, + # block 2 [0.8, 0.8] downsample=[0.9] + 'backbone.layer3.0.conv1_(0, 256)_256': stage_ratio_1, + 'backbone.layer3.0.conv2_(0, 256)_256': stage_ratio_2, + 
'backbone.layer3.0.conv3_(0, 1024)_1024': stage_ratio_3, + 'backbone.layer3.1.conv1_(0, 256)_256': stage_ratio_1, + 'backbone.layer3.1.conv2_(0, 256)_256': stage_ratio_2, + 'backbone.layer3.2.conv1_(0, 256)_256': stage_ratio_1, + 'backbone.layer3.2.conv2_(0, 256)_256': stage_ratio_2, + 'backbone.layer3.3.conv1_(0, 256)_256': stage_ratio_4, + 'backbone.layer3.3.conv2_(0, 256)_256': stage_ratio_4, + 'backbone.layer3.4.conv1_(0, 256)_256': stage_ratio_4, + 'backbone.layer3.4.conv2_(0, 256)_256': stage_ratio_4, + 'backbone.layer3.5.conv1_(0, 256)_256': stage_ratio_4, + 'backbone.layer3.5.conv2_(0, 256)_256': stage_ratio_4, + # block 3 [0.8, 0.8]*2+[0.8, 0.85]*2 downsample=[0.9] + 'backbone.layer4.0.conv1_(0, 512)_512': stage_ratio_4, + 'backbone.layer4.0.conv2_(0, 512)_512': stage_ratio_4, + 'backbone.layer4.0.conv3_(0, 2048)_2048': stage_ratio_3, + 'backbone.layer4.1.conv1_(0, 512)_512': stage_ratio_4, + 'backbone.layer4.1.conv2_(0, 512)_512': stage_ratio_4, + 'backbone.layer4.2.conv1_(0, 512)_512': stage_ratio_4, + 'backbone.layer4.2.conv2_(0, 512)_512': stage_ratio_4 + # block 4 [0.85, 0.85] downsample=[0.9] +} + +# model settings +model = dict( + _scope_='mmrazor', + type='DCFF', + architecture=_base_.architecture, + mutator_cfg=dict( + type='DCFFChannelMutator', + channel_unit_cfg=dict( + type='DCFFChannelUnit', default_args=dict(choice_mode='ratio')), + parse_cfg=dict( + type='ChannelAnalyzer', + demo_input=(1, 3, 224, 224), + tracer_type='BackwardTracer')), + target_pruning_ratio=target_pruning_ratio, + step_freq=200, + linear_schedule=False) + +model_wrapper = dict( + type='mmcv.MMDistributedDataParallel', find_unused_parameters=True) + +val_cfg = dict(_delete_=True, type='mmrazor.ItePruneValLoop') diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmseg/dcff/fix_subnet.json b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmseg/dcff/fix_subnet.json new file mode 100644 index 
0000000000000000000000000000000000000000..dfdcea75873d2516af6db193343f0b4df7376bb8 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmseg/dcff/fix_subnet.json @@ -0,0 +1,141 @@ +{ + "type":"DCFFChannelMutator", + "channel_unit_cfg":{ + "type":"DCFFChannelUnit", + "default_args":{ + "choice_mode":"ratio" + }, + "units":{ + "backbone.conv1_(0, 64)_64":{ + "init_args":{ + "num_channels":64, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":1.0 + }, + "backbone.layer1.0.conv1_(0, 64)_64":{ + "init_args":{ + "num_channels":64, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.640625 + }, + "backbone.layer1.1.conv1_(0, 64)_64":{ + "init_args":{ + "num_channels":64, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.640625 + }, + "backbone.layer2.0.conv1_(0, 128)_128":{ + "init_args":{ + "num_channels":128, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.6484375 + }, + "backbone.layer2.0.conv2_(0, 128)_128":{ + "init_args":{ + "num_channels":128, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.59375 + }, + "backbone.layer2.1.conv1_(0, 128)_128":{ + "init_args":{ + "num_channels":128, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.6484375 + }, + "backbone.layer3.0.conv1_(0, 256)_256":{ + "init_args":{ + "num_channels":256, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.6484375 + }, + "backbone.layer3.0.conv2_(0, 256)_256":{ + "init_args":{ + "num_channels":256, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.59765625 + }, + "backbone.layer3.1.conv1_(0, 256)_256":{ + "init_args":{ + "num_channels":256, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.6484375 + }, + 
"backbone.layer4.0.conv1_(0, 512)_512":{ + "init_args":{ + "num_channels":512, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.69921875 + }, + "backbone.layer4.0.conv2_(0, 512)_512":{ + "init_args":{ + "num_channels":512, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.69921875 + }, + "backbone.layer4.1.conv1_(0, 512)_512":{ + "init_args":{ + "num_channels":512, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.69921875 + } + } + }, + "parse_cfg":{ + "type":"ChannelAnalyzer", + "demo_input":[ + 1, + 3, + 224, + 224 + ], + "tracer_type":"BackwardTracer" + } +} diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmseg/dcff/pointrend_resnet50.py b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmseg/dcff/pointrend_resnet50.py new file mode 100644 index 0000000000000000000000000000000000000000..816ec83868bbebb9d6160d587432cbcfe58720f5 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/pruning/mmseg/dcff/pointrend_resnet50.py @@ -0,0 +1,63 @@ +data_preprocessor = dict( + _scope_='mmseg', + type='SegDataPreProcessor', + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + bgr_to_rgb=True, + size=(512, 1024), + pad_val=0, + seg_pad_val=255) +architecture = dict( + _scope_='mmseg', + type='CascadeEncoderDecoder', + data_preprocessor=data_preprocessor, + num_stages=2, + pretrained=None, + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 1, 1), + strides=(1, 2, 2, 2), + norm_eval=False, + style='pytorch', + contract_dilation=True), + neck=dict( + type='FPN', + in_channels=[256, 512, 1024, 2048], + out_channels=256, + num_outs=4), + decode_head=[ + dict( + type='FPNHead', + in_channels=[256, 256, 256, 256], + in_index=[0, 1, 2, 3], + feature_strides=[4, 8, 16, 32], + channels=128, + dropout_ratio=-1, + num_classes=19, + align_corners=False, + 
loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + dict( + type='PointHead', + in_channels=[256], + in_index=[0], + channels=256, + num_fcs=3, + coarse_pred_each_layer=True, + dropout_ratio=-1, + num_classes=19, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)) + ], + # model training and testing settings + train_cfg=dict( + num_points=2048, oversample_ratio=3, importance_sample_ratio=0.75), + test_cfg=dict( + mode='whole', + subdivision_steps=2, + subdivision_num_points=8196, + scale_factor=2)) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/deploy_cfgs/mmcls/classification_openvino_dynamic-224x224.py b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/deploy_cfgs/mmcls/classification_openvino_dynamic-224x224.py new file mode 100644 index 0000000000000000000000000000000000000000..d1fc673c5f6e2c2baf0334d77961ffdf37d0ea78 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/deploy_cfgs/mmcls/classification_openvino_dynamic-224x224.py @@ -0,0 +1,30 @@ +deploy_cfg = dict( + onnx_config=dict( + type='onnx', + export_params=True, + keep_initializers_as_inputs=False, + opset_version=11, + save_file='end2end.onnx', + input_names=['input'], + output_names=['output'], + input_shape=None, + optimize=True, + dynamic_axes={ + 'input': { + 0: 'batch', + 2: 'height', + 3: 'width' + }, + 'output': { + 0: 'batch' + } + }), + backend_config=dict( + type='openvino', + model_inputs=[dict(opt_shapes=dict(input=[1, 3, 224, 224]))]), + codebase_config=dict(type='mmcls', task='Classification'), + function_record_to_pop=[ + 'mmcls.models.classifiers.ImageClassifier.forward', + 'mmcls.models.classifiers.BaseClassifier.forward' + ], +) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/deploy_cfgs/mmcls/classification_tensorrt-int8-explicit_dynamic-224x224.py 
b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/deploy_cfgs/mmcls/classification_tensorrt-int8-explicit_dynamic-224x224.py new file mode 100644 index 0000000000000000000000000000000000000000..a562c370b20238057fa061476fec3ceb3c83ff5c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/deploy_cfgs/mmcls/classification_tensorrt-int8-explicit_dynamic-224x224.py @@ -0,0 +1,39 @@ +deploy_cfg = dict( + onnx_config=dict( + type='onnx', + export_params=True, + keep_initializers_as_inputs=False, + opset_version=11, + save_file='end2end.onnx', + input_names=['input'], + output_names=['output'], + input_shape=[224, 224], + optimize=True, + dynamic_axes=dict( + input=dict({ + 0: 'batch', + 2: 'height', + 3: 'width' + }), + output=dict({0: 'batch'}))), + codebase_config=dict(type='mmcls', task='Classification'), + backend_config=dict( + type='tensorrt', + common_config=dict( + fp16_mode=False, + max_workspace_size=1073741824, + int8_mode=True, + explicit_quant_mode=True), + model_inputs=[ + dict( + input_shapes=dict( + input=dict( + min_shape=[1, 3, 224, 224], + opt_shape=[4, 3, 224, 224], + max_shape=[8, 3, 224, 224]))) + ]), + function_record_to_pop=[ + 'mmcls.models.classifiers.ImageClassifier.forward', + 'mmcls.models.classifiers.BaseClassifier.forward', 'torch.cat' + ], +) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/deploy_cfgs/mmdet/detection_openvino_dynamic-800x1344.py b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/deploy_cfgs/mmdet/detection_openvino_dynamic-800x1344.py new file mode 100644 index 0000000000000000000000000000000000000000..c76898d0be4d2548f78b3e116987614b1814509b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/deploy_cfgs/mmdet/detection_openvino_dynamic-800x1344.py @@ -0,0 +1,47 @@ +deploy_cfg = dict( + onnx_config=dict( + type='onnx', + export_params=True, + keep_initializers_as_inputs=False, + opset_version=11, + save_file='end2end.onnx', + input_shape=None, + 
input_names=['input'], + output_names=['dets', 'labels'], + optimize=True, + dynamic_axes={ + 'input': { + 0: 'batch', + 2: 'height', + 3: 'width' + }, + 'dets': { + 0: 'batch', + 1: 'num_dets', + }, + 'labels': { + 0: 'batch', + 1: 'num_dets', + }, + }), + backend_config=dict( + type='openvino', + model_inputs=[dict(opt_shapes=dict(input=[1, 3, 800, 1344]))]), + codebase_config=dict( + type='mmdet', + task='ObjectDetection', + model_type='end2end', + post_processing=dict( + score_threshold=0.05, + confidence_threshold=0.005, # for YOLOv3 + iou_threshold=0.5, + max_output_boxes_per_class=200, + pre_top_k=5000, + keep_top_k=100, + background_label_id=-1, + )), + function_record_to_pop=[ + 'mmdet.models.detectors.single_stage.SingleStageDetector.forward', + 'mmdet.models.detectors.two_stage.TwoStageDetector.forward', + 'mmdet.models.detectors.single_stage_instance_seg.SingleStageInstanceSegmentor.forward' # noqa: E501 + ]) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/deploy_cfgs/mmdet/detection_tensorrt-int8-explicit_dynamic-320x320-1344x1344.py b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/deploy_cfgs/mmdet/detection_tensorrt-int8-explicit_dynamic-320x320-1344x1344.py new file mode 100644 index 0000000000000000000000000000000000000000..1061d6bd6982440c074d8d4cc9edd53d99e33fea --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/deploy_cfgs/mmdet/detection_tensorrt-int8-explicit_dynamic-320x320-1344x1344.py @@ -0,0 +1,58 @@ +deploy_cfg = dict( + onnx_config=dict( + type='onnx', + export_params=True, + keep_initializers_as_inputs=False, + opset_version=11, + save_file='end2end.onnx', + input_names=['input'], + output_names=['dets', 'labels'], + input_shape=None, + optimize=True, + dynamic_axes=dict( + input=dict({ + 0: 'batch', + 2: 'height', + 3: 'width' + }), + dets=dict({ + 0: 'batch', + 1: 'num_dets' + }), + labels=dict({ + 0: 'batch', + 1: 'num_dets' + }))), + codebase_config=dict( + type='mmdet', + 
task='ObjectDetection', + model_type='end2end', + post_processing=dict( + score_threshold=0.05, + confidence_threshold=0.005, + iou_threshold=0.5, + max_output_boxes_per_class=200, + pre_top_k=5000, + keep_top_k=100, + background_label_id=-1)), + backend_config=dict( + type='tensorrt', + common_config=dict( + fp16_mode=False, + max_workspace_size=1073741824, + int8_mode=True, + explicit_quant_mode=True), + model_inputs=[ + dict( + input_shapes=dict( + input=dict( + min_shape=[1, 3, 320, 320], + opt_shape=[1, 3, 800, 1344], + max_shape=[1, 3, 1344, 1344]))) + ]), + function_record_to_pop=[ + 'mmdet.models.detectors.single_stage.SingleStageDetector.forward', + 'mmdet.models.detectors.two_stage.TwoStageDetector.forward', + 'mmdet.models.detectors.single_stage_instance_seg.SingleStageInstanceSegmentor.forward', # noqa: E501 + 'torch.cat' + ]) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/README.md new file mode 100644 index 0000000000000000000000000000000000000000..1a9f535194f9fbd6d4c6bc67346746c4203e345a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/README.md @@ -0,0 +1,59 @@ +# Post-Training Quantization (PTQ) + +> [A White Paper on Neural Network Quantization](https://arxiv.org/abs/2106.08295) + + + +## Abstract + +While neural networks have advanced the frontiers in many applications, they often come at a high computational cost. Reducing the power and latency of neural network inference is key if we want to integrate modern networks into edge devices with strict power and compute requirements. Neural network quantization is one of the most effective ways of achieving these savings but the additional noise it induces can lead to accuracy degradation. 
In this white paper, we introduce state-of-the-art algorithms for mitigating the impact of quantization noise on the network's performance while maintaining low-bit weights and activations. We start with a hardware motivated introduction to quantization and then consider two main classes of algorithms: Post-Training Quantization (PTQ) and Quantization-Aware-Training (QAT). PTQ requires no re-training or labelled data and is thus a lightweight push-button approach to quantization. In most cases, PTQ is sufficient for achieving 8-bit quantization with close to floating-point accuracy. QAT requires fine-tuning and access to labeled training data but enables lower bit quantization with competitive results. For both solutions, we provide tested pipelines based on existing literature and extensive experimentation that lead to state-of-the-art performance for common deep learning models and tasks. + +## Results and models + +### Classification + +| Model | Dataset | Backend | Top 1 Acc(fp32) | Top 1 Acc(int8) | Top 1 Acc(deployed) | Config | Download | +| ------------ | -------- | -------- | --------------- | --------------- | ------------------- | ----------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| resnet18 | ImageNet | openvino | 69.90 | 69.742 | 69.74 | [config](./ptq_openvino_resnet18_8xb32_in1k_calib32xb32.py) | [model](https://download.openmmlab.com/mmrazor/v1/quantization/ptq/openvino/ptq_openvino_resnet18_8xb32_in1k_calib32xb32_20230330_163655-2386d965.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/quantization/ptq/openvino/ptq_openvino_resnet18_8xb32_in1k_calib32xb32_20230330_163655-2386d965.log) | +| resnet50 | ImageNet | openvino | 76.55 | 
76.374 | 76.378 | [config](./ptq_openvino_resnet50_8xb32_in1k_calib32xb32.py) | [model](https://download.openmmlab.com/mmrazor/v1/quantization/ptq/openvino/ptq_openvino_resnet50_8xb32_in1k_calib32xb32_20230330_170115-2acd6014.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/quantization/ptq/openvino/ptq_openvino_resnet50_8xb32_in1k_calib32xb32_20230330_170115-2acd6014.log) | +| mobilenet_v2 | ImageNet | openvino | 71.86 | 70.224 | 70.292 | [config](./ptq_openvino_mbv2_8xb32_in1k_calib32xb32.py) | [model](https://download.openmmlab.com/mmrazor/v1/quantization/ptq/openvino/ptq_openvino_mbv2_8xb32_in1k_calib32xb32_20230330_170909-364822ad.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/quantization/ptq/openvino/ptq_openvino_mbv2_8xb32_in1k_calib32xb32_20230330_170909-364822ad.log) | +| resnet18 | ImageNet | tensorrt | 69.90 | 69.762 | 69.85 | [config](./ptq_tensorrt_resnet18_8xb32_in1k_calib32xb32.py) | [model](https://download.openmmlab.com/mmrazor/v1/quantization/ptq/tensorrt/ptq_tensorrt_resnet18_8xb32_in1k_calib32xb32_20230331_144323-640b272e.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/quantization/ptq/tensorrt/ptq_tensorrt_resnet18_8xb32_in1k_calib32xb32_20230331_144323-640b272e.log) | +| resnet50 | ImageNet | tensorrt | 76.55 | 76.372 | 76.374 | [config](./ptq_tensorrt_resnet50_8xb32_in1k_calib32xb32.py) | [model](https://download.openmmlab.com/mmrazor/v1/quantization/ptq/tensorrt/ptq_tensorrt_resnet50_8xb32_in1k_calib32xb32_20230331_145011-d2da300f.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/quantization/ptq/tensorrt/ptq_tensorrt_resnet50_8xb32_in1k_calib32xb32_20230331_145011-d2da300f.log) | +| mobilenet_v2 | ImageNet | tensorrt | 71.86 | 70.324 | 70.548 | [config](./ptq_tensorrt_mbv2_8xb32_in1k_calib32xb32.py) | [model](https://download.openmmlab.com/mmrazor/v1/quantization/ptq/tensorrt/ptq_tensorrt_mbv2_8xb32_in1k_calib32xb32_20230331_153131-335988e4.pth) \| 
[log](https://download.openmmlab.com/mmrazor/v1/quantization/ptq/tensorrt/ptq_tensorrt_mbv2_8xb32_in1k_calib32xb32_20230331_153131-335988e4.log) | + +### Detection + +| Model | Dataset | Backend | box AP(fp32) | box AP(int8) | box AP(deployed) | Config | Download | +| -------------- | ------- | -------- | ------------ | ------------ | ---------------- | -------------------------------------------------------------- | ------------------------ | +| retina_r50_fpn | COCO | openvino | 36.5 | 36.3 | 36.3 | [config](./ptq_openvino_retina_r50_1x_coco_calib32xb32.py) | [model](<>) \| [log](<>) | +| yolox_s | COCO | openvino | 40.5 | 38.5 | 38.5 | [config](./ptq_openvino_yolox_s_8xb8-300e_coco_calib32xb32.py) | [model](<>) \| [log](<>) | +| retina_r50_fpn | COCO | tensorrt | 36.5 | 36.2 | 36.3 | [config](./ptq_tensorrt_retina_r50_1x_coco_calib32xb32.py) | [model](<>) \| [log](<>) | +| yolox_s | COCO | tensorrt | 40.5 | 38.8 | 39.3 | [config](./ptq_tensorrt_yolox_s_8xb8-300e_coco_calib32xb32.py) | [model](<>) \| [log](<>) | + +## Citation + +```latex + @misc{Nagel_Fournarakis_Amjad_Bondarenko_Baalen_Blankevoort_2021, + title={A White Paper on Neural Network Quantization}, + journal={Cornell University - arXiv}, + author={Nagel, Markus and Fournarakis, Marios and Amjad, RanaAli and Bondarenko, Yelysei and Baalen, Martvan and Blankevoort, Tijmen}, + year={2021}, + month={Jun} + } +``` + +## Getting Started + +**PTQ for pretrain model** + +``` +python tools/ptq.py ${CONFIG} +``` + +**Test for quantized model** + +``` +python tools/test.py ${CONFIG} ${CKPT} +``` + +For more details, please refer to [Quantization User Guide](https://mmrazor.readthedocs.io/en/main/user_guides/quantization_user_guide.html) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/metafile.yml b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..1ebceab4b6c188a0840b8168249d305cdca2939b --- 
/dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/metafile.yml @@ -0,0 +1,164 @@ +Collections: + - Name: PTQ + README: configs/quantization/ptq/base/README.md +Models: + - Name: ptq_openvino_mbv2_8xb32_in1k_calib32xb32 + In Collection: PTQ + Metadata: + Backend: openvino + Float Model: + Config: mmcls::mobilenet_v2/mobilenet-v2_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmclassification/v0/mobilenet_v2/mobilenet_v2_batch256_imagenet_20200708-3b2dc3af.pth + Metrics: + Top 1 Accuracy: 71.86 + Results: + - Task: Image Classification + Dataset: ImageNet-1k + Metrics: + Top 1 Accuracy: 70.224 + Config: configs/quantization/ptq/base/ptq_openvino_mbv2_8xb32_in1k_calib32xb32.py + Weights: https://download.openmmlab.com/mmrazor/v1/quantization/ptq/openvino/ptq_openvino_mbv2_8xb32_in1k_calib32xb32_20230330_170909-364822ad.pth + - Name: ptq_openvino_resnet18_8xb32_in1k_calib32xb32 + In Collection: PTQ + Metadata: + Backend: openvino + Float Model: + Config: mmcls::resnet/resnet18_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_8xb32_in1k_20210831-fbbb1da6.pth + Metrics: + Top 1 Accuracy: 69.90 + Results: + - Task: Image Classification + Dataset: ImageNet-1k + Metrics: + Top 1 Accuracy: 69.742 + Config: configs/quantization/ptq/base/ptq_openvino_resnet18_8xb32_in1k_calib32xb32.py + Weights: https://download.openmmlab.com/mmrazor/v1/quantization/ptq/openvino/ptq_openvino_resnet18_8xb32_in1k_calib32xb32_20230330_163655-2386d965.pth + - Name: ptq_openvino_resnet50_8xb32_in1k_calib32xb32 + In Collection: PTQ + Metadata: + Backend: openvino + Float Model: + Config: mmcls::resnet/resnet50_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth + Metrics: + Top 1 Accuracy: 76.55 + Results: + - Task: Image Classification + Dataset: ImageNet-1k + Metrics: + Top 1 Accuracy: 76.374 + Config: 
configs/quantization/ptq/base/ptq_openvino_resnet50_8xb32_in1k_calib32xb32.py + Weights: https://download.openmmlab.com/mmrazor/v1/quantization/ptq/openvino/ptq_openvino_resnet50_8xb32_in1k_calib32xb32_20230330_170115-2acd6014.pth + - Name: ptq_openvino_retina_r50_1x_coco_calib32xb32 + In Collection: PTQ + Metadata: + Backend: openvino + Float Model: + Config: mmdet::retinanet/retinanet_r50_fpn_1x_coco.py + Weights: https://download.openmmlab.com/mmdetection/v2.0/retinanet/retinanet_r50_fpn_1x_coco/retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth + Metrics: + box AP: 36.5 + Results: + - Task: Object Detection + Dataset: COCO + Metrics: + box AP: 36.3 + Config: configs/quantization/ptq/base/ptq_openvino_retina_r50_1x_coco_calib32xb32.py + Weights: https://download.openmmlab.com/mmrazor/v1/quantization/ptq/openvino/ptq_openvino_retina_r50_1x_coco_calib32xb32_20230330_172645-80eea5b6.pth + - Name: ptq_openvino_yolox_s_8xb8-300e_coco_calib32xb32 + In Collection: PTQ + Metadata: + Backend: openvino + Float Model: + Config: mmdet::yolox/yolox_s_8xb8-300e_coco.py + Weights: https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_s_8x8_300e_coco/yolox_s_8x8_300e_coco_20211121_095711-4592a793.pth + Metrics: + box AP: 40.5 + Results: + - Task: Object Detection + Dataset: COCO + Metrics: + box AP: 38.5 + Config: configs/quantization/ptq/base/ptq_openvino_yolox_s_8xb8-300e_coco_calib32xb32.py + Weights: https://download.openmmlab.com/mmrazor/v1/quantization/ptq/openvino/ptq_openvino_yolox_s_8xb8-300e_coco_calib32xb32_20230330_175747-f1a0a2f4.pth + - Name: ptq_tensorrt_mbv2_8xb32_in1k_calib32xb32 + In Collection: PTQ + Metadata: + Backend: tensorrt + Float Model: + Config: mmcls::mobilenet_v2/mobilenet-v2_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmclassification/v0/mobilenet_v2/mobilenet_v2_batch256_imagenet_20200708-3b2dc3af.pth + Metrics: + Top 1 Accuracy: 71.86 + Results: + - Task: Image Classification + Dataset: ImageNet-1k + Metrics: + Top 1 Accuracy: 
70.324 + Config: configs/quantization/ptq/base/ptq_tensorrt_mbv2_8xb32_in1k_calib32xb32.py + Weights: https://download.openmmlab.com/mmrazor/v1/quantization/ptq/tensorrt/ptq_tensorrt_mbv2_8xb32_in1k_calib32xb32_20230331_153131-335988e4.pth + - Name: ptq_tensorrt_resnet18_8xb32_in1k_calib32xb32 + In Collection: PTQ + Metadata: + Backend: tensorrt + Float Model: + Config: mmcls::resnet/resnet18_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_8xb32_in1k_20210831-fbbb1da6.pth + Metrics: + Top 1 Accuracy: 69.90 + Results: + - Task: Image Classification + Dataset: ImageNet-1k + Metrics: + Top 1 Accuracy: 69.762 + Config: configs/quantization/ptq/base/ptq_tensorrt_resnet18_8xb32_in1k_calib32xb32.py + Weights: https://download.openmmlab.com/mmrazor/v1/quantization/ptq/tensorrt/ptq_tensorrt_resnet18_8xb32_in1k_calib32xb32_20230331_144323-640b272e.pth + - Name: ptq_tensorrt_resnet50_8xb32_in1k_calib32xb32 + In Collection: PTQ + Metadata: + Backend: tensorrt + Float Model: + Config: mmcls::resnet/resnet50_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth + Metrics: + Top 1 Accuracy: 76.55 + Results: + - Task: Image Classification + Dataset: ImageNet-1k + Metrics: + Top 1 Accuracy: 76.372 + Config: configs/quantization/ptq/base/ptq_tensorrt_resnet50_8xb32_in1k_calib32xb32.py + Weights: https://download.openmmlab.com/mmrazor/v1/quantization/ptq/tensorrt/ptq_tensorrt_resnet50_8xb32_in1k_calib32xb32_20230331_145011-d2da300f.pth + - Name: ptq_tensorrt_retina_r50_1x_coco_calib32xb32 + In Collection: PTQ + Metadata: + Backend: tensorrt + Float Model: + Config: mmdet::retinanet/retinanet_r50_fpn_1x_coco.py + Weights: https://download.openmmlab.com/mmdetection/v2.0/retinanet/retinanet_r50_fpn_1x_coco/retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth + Metrics: + box AP: 36.5 + Results: + - Task: Object Detection + Dataset: COCO + Metrics: + box AP: 36.2 + Config: 
configs/quantization/ptq/base/ptq_tensorrt_retina_r50_1x_coco_calib32xb32.py + Weights: https://download.openmmlab.com/mmrazor/v1/quantization/ptq/tensorrt/ptq_tensorrt_retina_r50_1x_coco_calib32xb32_20230330_205741-4c5c10c4.pth + - Name: ptq_tensorrt_yolox_s_8xb8-300e_coco_calib32xb32 + In Collection: PTQ + Metadata: + Backend: tensorrt + Float Model: + Config: mmdet::yolox/yolox_s_8xb8-300e_coco.py + Weights: https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_s_8x8_300e_coco/yolox_s_8x8_300e_coco_20211121_095711-4592a793.pth + Metrics: + box AP: 40.5 + Results: + - Task: Object Detection + Dataset: COCO + Metrics: + box AP: 38.8 + Config: configs/quantization/ptq/base/ptq_tensorrt_yolox_s_8xb8-300e_coco_calib32xb32.py + Weights: https://download.openmmlab.com/mmrazor/v1/quantization/ptq/tensorrt/ptq_tensorrt_yolox_s_8xb8-300e_coco_calib32xb32_20230331_155139-f2021e57.pth diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/ptq_openvino_mbv2_8xb32_in1k_calib32xb32.py b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/ptq_openvino_mbv2_8xb32_in1k_calib32xb32.py new file mode 100644 index 0000000000000000000000000000000000000000..efa2a75dd889bd24c1ad0ad4d771cfa70f897c06 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/ptq_openvino_mbv2_8xb32_in1k_calib32xb32.py @@ -0,0 +1,54 @@ +_base_ = [ + 'mmcls::mobilenet_v2/mobilenet-v2_8xb32_in1k.py', + '../../deploy_cfgs/mmcls/classification_openvino_dynamic-224x224.py' +] + +_base_.val_dataloader.batch_size = 32 + +test_cfg = dict( + type='mmrazor.PTQLoop', + calibrate_dataloader=_base_.val_dataloader, + calibrate_steps=32, +) + +global_qconfig = dict( + w_observer=dict(type='mmrazor.PerChannelMinMaxObserver'), + a_observer=dict(type='mmrazor.MovingAverageMinMaxObserver'), + w_fake_quant=dict(type='mmrazor.FakeQuantize'), + a_fake_quant=dict(type='mmrazor.FakeQuantize'), + w_qscheme=dict( + qdtype='qint8', bit=8, is_symmetry=True, 
is_symmetric_range=True), + a_qscheme=dict( + qdtype='quint8', bit=8, is_symmetry=True, averaging_constant=0.1), +) + +float_checkpoint = 'https://download.openmmlab.com/mmclassification/v0/mobilenet_v2/mobilenet_v2_batch256_imagenet_20200708-3b2dc3af.pth' # noqa: E501 + +model = dict( + _delete_=True, + type='mmrazor.MMArchitectureQuant', + data_preprocessor=dict( + type='mmcls.ClsDataPreprocessor', + num_classes=1000, + # RGB format normalization parameters + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + # convert image from BGR to RGB + to_rgb=True), + architecture=_base_.model, + deploy_cfg=_base_.deploy_cfg, + float_checkpoint=float_checkpoint, + quantizer=dict( + type='mmrazor.OpenVINOQuantizer', + global_qconfig=global_qconfig, + tracer=dict( + type='mmrazor.CustomTracer', + skipped_methods=[ + 'mmcls.models.heads.ClsHead._get_loss', + 'mmcls.models.heads.ClsHead._get_predictions' + ]))) + +model_wrapper_cfg = dict( + type='mmrazor.MMArchitectureQuantDDP', + broadcast_buffers=False, + find_unused_parameters=True) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/ptq_openvino_resnet18_8xb32_in1k_calib32xb32.py b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/ptq_openvino_resnet18_8xb32_in1k_calib32xb32.py new file mode 100644 index 0000000000000000000000000000000000000000..b548b15f5f68fd8c0dac0210251575090d720251 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/ptq_openvino_resnet18_8xb32_in1k_calib32xb32.py @@ -0,0 +1,51 @@ +_base_ = [ + 'mmcls::resnet/resnet18_8xb32_in1k.py', + '../../deploy_cfgs/mmcls/classification_openvino_dynamic-224x224.py' +] + +_base_.val_dataloader.batch_size = 32 + +test_cfg = dict( + type='mmrazor.PTQLoop', + calibrate_dataloader=_base_.val_dataloader, + calibrate_steps=32, +) + +global_qconfig = dict( + w_observer=dict(type='mmrazor.PerChannelMinMaxObserver'), + a_observer=dict(type='mmrazor.MovingAverageMinMaxObserver'), + 
w_fake_quant=dict(type='mmrazor.FakeQuantize'), + a_fake_quant=dict(type='mmrazor.FakeQuantize'), + w_qscheme=dict( + qdtype='qint8', bit=8, is_symmetry=True, is_symmetric_range=True), + a_qscheme=dict( + qdtype='quint8', bit=8, is_symmetry=True, averaging_constant=0.1), +) + +float_checkpoint = 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_8xb32_in1k_20210831-fbbb1da6.pth' # noqa: E501 + +model = dict( + _delete_=True, + type='mmrazor.MMArchitectureQuant', + data_preprocessor=dict( + type='mmcls.ClsDataPreprocessor', + num_classes=1000, + # RGB format normalization parameters + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + # convert image from BGR to RGB + to_rgb=True), + architecture=_base_.model, + deploy_cfg=_base_.deploy_cfg, + float_checkpoint=float_checkpoint, + quantizer=dict( + type='mmrazor.OpenVINOQuantizer', + global_qconfig=global_qconfig, + tracer=dict( + type='mmrazor.CustomTracer', + skipped_methods=[ + 'mmcls.models.heads.ClsHead._get_loss', + 'mmcls.models.heads.ClsHead._get_predictions' + ]))) + +model_wrapper_cfg = dict(type='mmrazor.MMArchitectureQuantDDP', ) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/ptq_openvino_resnet50_8xb32_in1k_calib32xb32.py b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/ptq_openvino_resnet50_8xb32_in1k_calib32xb32.py new file mode 100644 index 0000000000000000000000000000000000000000..14802a442486c3781eb9250cafd4194a95424bfa --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/ptq_openvino_resnet50_8xb32_in1k_calib32xb32.py @@ -0,0 +1,50 @@ +_base_ = [ + 'mmcls::resnet/resnet50_8xb32_in1k.py', + '../../deploy_cfgs/mmcls/classification_openvino_dynamic-224x224.py' +] + +_base_.val_dataloader.batch_size = 32 + +test_cfg = dict( + type='mmrazor.PTQLoop', + calibrate_dataloader=_base_.val_dataloader, + calibrate_steps=32, +) + +global_qconfig = dict( + 
w_observer=dict(type='mmrazor.PerChannelMinMaxObserver'), + a_observer=dict(type='mmrazor.MovingAverageMinMaxObserver'), + w_fake_quant=dict(type='mmrazor.FakeQuantize'), + a_fake_quant=dict(type='mmrazor.FakeQuantize'), + w_qscheme=dict( + qdtype='qint8', bit=8, is_symmetry=True, is_symmetric_range=True), + a_qscheme=dict( + qdtype='quint8', bit=8, is_symmetry=True, averaging_constant=0.1), +) + +float_checkpoint = 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth' # noqa: E501 + +model = dict( + _delete_=True, + type='mmrazor.MMArchitectureQuant', + data_preprocessor=dict( + type='mmcls.ClsDataPreprocessor', + num_classes=1000, + # RGB format normalization parameters + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + # convert image from BGR to RGB + to_rgb=True), + architecture=_base_.model, + deploy_cfg=_base_.deploy_cfg, + float_checkpoint=float_checkpoint, + quantizer=dict( + type='mmrazor.OpenVINOQuantizer', + global_qconfig=global_qconfig, + tracer=dict( + type='mmrazor.CustomTracer', + skipped_methods=[ + 'mmcls.models.heads.ClsHead._get_loss', + 'mmcls.models.heads.ClsHead._get_predictions' + ]))) +model_wrapper_cfg = dict(type='mmrazor.MMArchitectureQuantDDP', ) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/ptq_openvino_retina_r50_1x_coco_calib32xb32.py b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/ptq_openvino_retina_r50_1x_coco_calib32xb32.py new file mode 100644 index 0000000000000000000000000000000000000000..e35e6270ef1dc06632a7a69914cafbe48f67b69d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/ptq_openvino_retina_r50_1x_coco_calib32xb32.py @@ -0,0 +1,52 @@ +_base_ = [ + 'mmdet::retinanet/retinanet_r50_fpn_1x_coco.py', + '../../deploy_cfgs/mmdet/detection_openvino_dynamic-800x1344.py' +] + +_base_.val_dataloader.batch_size = 32 + +test_cfg = dict( + type='mmrazor.PTQLoop', + 
calibrate_dataloader=_base_.val_dataloader, + calibrate_steps=32, +) + +float_checkpoint = 'https://download.openmmlab.com/mmdetection/v2.0/retinanet/retinanet_r50_fpn_1x_coco/retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth' # noqa: E501 + +global_qconfig = dict( + w_observer=dict(type='mmrazor.PerChannelMinMaxObserver'), + a_observer=dict(type='mmrazor.MovingAverageMinMaxObserver'), + w_fake_quant=dict(type='mmrazor.FakeQuantize'), + a_fake_quant=dict(type='mmrazor.FakeQuantize'), + w_qscheme=dict( + qdtype='qint8', bit=8, is_symmetry=True, is_symmetric_range=True), + a_qscheme=dict(qdtype='quint8', bit=8, is_symmetry=True), +) + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='MMArchitectureQuant', + data_preprocessor=dict( + type='mmdet.DetDataPreprocessor', + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + bgr_to_rgb=True, + pad_size_divisor=32), + architecture=_base_.model, + deploy_cfg=_base_.deploy_cfg, + float_checkpoint=float_checkpoint, + quantizer=dict( + type='mmrazor.OpenVINOQuantizer', + global_qconfig=global_qconfig, + tracer=dict( + type='mmrazor.CustomTracer', + skipped_methods=[ + 'mmdet.models.dense_heads.base_dense_head.BaseDenseHead.predict_by_feat', # noqa: E501 + 'mmdet.models.dense_heads.anchor_head.AnchorHead.loss_by_feat', + ]))) + +model_wrapper_cfg = dict( + type='mmrazor.MMArchitectureQuantDDP', + broadcast_buffers=False, + find_unused_parameters=True) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/ptq_openvino_yolox_s_8xb8-300e_coco_calib32xb32.py b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/ptq_openvino_yolox_s_8xb8-300e_coco_calib32xb32.py new file mode 100644 index 0000000000000000000000000000000000000000..bab9ed021fe06aede3ffd927c289fd4f7f7e3989 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/ptq_openvino_yolox_s_8xb8-300e_coco_calib32xb32.py @@ -0,0 +1,57 @@ +_base_ = [ + 'mmdet::yolox/yolox_s_8xb8-300e_coco.py', + 
'../../deploy_cfgs/mmdet/detection_openvino_dynamic-800x1344.py' +] + +_base_.val_dataloader.batch_size = 32 + +test_cfg = dict( + type='mmrazor.PTQLoop', + calibrate_dataloader=_base_.val_dataloader, + calibrate_steps=32, +) + +float_checkpoint = 'https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_s_8x8_300e_coco/yolox_s_8x8_300e_coco_20211121_095711-4592a793.pth' # noqa: E501 + +global_qconfig = dict( + w_observer=dict(type='mmrazor.PerChannelMinMaxObserver'), + a_observer=dict(type='mmrazor.MovingAverageMinMaxObserver'), + w_fake_quant=dict(type='mmrazor.FakeQuantize'), + a_fake_quant=dict(type='mmrazor.FakeQuantize'), + w_qscheme=dict( + qdtype='qint8', bit=8, is_symmetry=True, is_symmetric_range=True), + a_qscheme=dict(qdtype='quint8', bit=8, is_symmetry=True), +) + +model = dict( + _delete_=True, + type='mmrazor.MMArchitectureQuant', + data_preprocessor=dict( + type='mmdet.DetDataPreprocessor', + pad_size_divisor=32, + batch_augments=[ + dict( + type='mmdet.BatchSyncRandomResize', + random_size_range=(480, 800), + size_divisor=32, + interval=10) + ]), + architecture=_base_.model, + deploy_cfg=_base_.deploy_cfg, + float_checkpoint=float_checkpoint, + quantizer=dict( + type='mmrazor.OpenVINOQuantizer', + global_qconfig=global_qconfig, + tracer=dict( + type='mmrazor.CustomTracer', + skipped_methods=[ + 'mmdet.models.dense_heads.yolox_head.YOLOXHead.predict_by_feat', # noqa: E501 + 'mmdet.models.dense_heads.yolox_head.YOLOXHead.loss_by_feat', + ]))) + +model_wrapper_cfg = dict( + type='mmrazor.MMArchitectureQuantDDP', + broadcast_buffers=False, + find_unused_parameters=True) + +custom_hooks = [] diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/ptq_tensorrt_mbv2_8xb32_in1k_calib32xb32.py b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/ptq_tensorrt_mbv2_8xb32_in1k_calib32xb32.py new file mode 100644 index 0000000000000000000000000000000000000000..68b6d4f976e35cc2988c62899b6ab877179de543 --- /dev/null +++ 
b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/ptq_tensorrt_mbv2_8xb32_in1k_calib32xb32.py @@ -0,0 +1,54 @@ +_base_ = [ + 'mmcls::mobilenet_v2/mobilenet-v2_8xb32_in1k.py', + '../../deploy_cfgs/mmcls/classification_tensorrt-int8-explicit_dynamic-224x224.py' # noqa: E501 +] + +_base_.val_dataloader.batch_size = 32 + +test_cfg = dict( + type='mmrazor.PTQLoop', + calibrate_dataloader=_base_.val_dataloader, + calibrate_steps=32, +) + +global_qconfig = dict( + w_observer=dict(type='mmrazor.PerChannelMinMaxObserver'), + a_observer=dict(type='mmrazor.MovingAverageMinMaxObserver'), + w_fake_quant=dict(type='mmrazor.FakeQuantize'), + a_fake_quant=dict(type='mmrazor.FakeQuantize'), + w_qscheme=dict( + qdtype='qint8', bit=8, is_symmetry=True, is_symmetric_range=True), + a_qscheme=dict( + qdtype='qint8', bit=8, is_symmetry=True, averaging_constant=0.1), +) + +float_checkpoint = 'https://download.openmmlab.com/mmclassification/v0/mobilenet_v2/mobilenet_v2_batch256_imagenet_20200708-3b2dc3af.pth' # noqa: E501 + +model = dict( + _delete_=True, + type='mmrazor.MMArchitectureQuant', + data_preprocessor=dict( + type='mmcls.ClsDataPreprocessor', + num_classes=1000, + # RGB format normalization parameters + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + # convert image from BGR to RGB + to_rgb=True), + architecture=_base_.model, + deploy_cfg=_base_.deploy_cfg, + float_checkpoint=float_checkpoint, + quantizer=dict( + type='mmrazor.TensorRTQuantizer', + global_qconfig=global_qconfig, + tracer=dict( + type='mmrazor.CustomTracer', + skipped_methods=[ + 'mmcls.models.heads.ClsHead._get_loss', + 'mmcls.models.heads.ClsHead._get_predictions' + ]))) + +model_wrapper_cfg = dict( + type='mmrazor.MMArchitectureQuantDDP', + broadcast_buffers=False, + find_unused_parameters=True) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/ptq_tensorrt_resnet18_8xb32_in1k_calib32xb32.py 
b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/ptq_tensorrt_resnet18_8xb32_in1k_calib32xb32.py new file mode 100644 index 0000000000000000000000000000000000000000..41d08812c3945d38116c0a11a52ea3682a61fdbe --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/ptq_tensorrt_resnet18_8xb32_in1k_calib32xb32.py @@ -0,0 +1,51 @@ +_base_ = [ + 'mmcls::resnet/resnet18_8xb32_in1k.py', + '../../deploy_cfgs/mmcls/classification_tensorrt-int8-explicit_dynamic-224x224.py' # noqa: E501 +] + +_base_.val_dataloader.batch_size = 32 + +test_cfg = dict( + type='mmrazor.PTQLoop', + calibrate_dataloader=_base_.val_dataloader, + calibrate_steps=32, +) + +global_qconfig = dict( + w_observer=dict(type='mmrazor.PerChannelMinMaxObserver'), + a_observer=dict(type='mmrazor.MovingAverageMinMaxObserver'), + w_fake_quant=dict(type='mmrazor.FakeQuantize'), + a_fake_quant=dict(type='mmrazor.FakeQuantize'), + w_qscheme=dict( + qdtype='qint8', bit=8, is_symmetry=True, is_symmetric_range=True), + a_qscheme=dict( + qdtype='qint8', bit=8, is_symmetry=True, averaging_constant=0.1), +) + +float_checkpoint = 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_8xb32_in1k_20210831-fbbb1da6.pth' # noqa: E501 + +model = dict( + _delete_=True, + type='mmrazor.MMArchitectureQuant', + data_preprocessor=dict( + type='mmcls.ClsDataPreprocessor', + num_classes=1000, + # RGB format normalization parameters + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + # convert image from BGR to RGB + to_rgb=True), + architecture=_base_.model, + deploy_cfg=_base_.deploy_cfg, + float_checkpoint=float_checkpoint, + quantizer=dict( + type='mmrazor.TensorRTQuantizer', + global_qconfig=global_qconfig, + tracer=dict( + type='mmrazor.CustomTracer', + skipped_methods=[ + 'mmcls.models.heads.ClsHead._get_loss', + 'mmcls.models.heads.ClsHead._get_predictions' + ]))) + +model_wrapper_cfg = dict(type='mmrazor.MMArchitectureQuantDDP', ) diff --git 
a/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/ptq_tensorrt_resnet50_8xb32_in1k_calib32xb32.py b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/ptq_tensorrt_resnet50_8xb32_in1k_calib32xb32.py new file mode 100644 index 0000000000000000000000000000000000000000..e4fa955dc35d1090f3a407cc81d3c7e626bf40a5 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/ptq_tensorrt_resnet50_8xb32_in1k_calib32xb32.py @@ -0,0 +1,51 @@ +_base_ = [ + 'mmcls::resnet/resnet50_8xb32_in1k.py', + '../../deploy_cfgs/mmcls/classification_tensorrt-int8-explicit_dynamic-224x224.py' # noqa: E501 +] + +_base_.val_dataloader.batch_size = 32 + +test_cfg = dict( + type='mmrazor.PTQLoop', + calibrate_dataloader=_base_.val_dataloader, + calibrate_steps=32, +) + +global_qconfig = dict( + w_observer=dict(type='mmrazor.PerChannelMinMaxObserver'), + a_observer=dict(type='mmrazor.MovingAverageMinMaxObserver'), + w_fake_quant=dict(type='mmrazor.FakeQuantize'), + a_fake_quant=dict(type='mmrazor.FakeQuantize'), + w_qscheme=dict( + qdtype='qint8', bit=8, is_symmetry=True, is_symmetric_range=True), + a_qscheme=dict( + qdtype='qint8', bit=8, is_symmetry=True, averaging_constant=0.1), +) + +float_checkpoint = 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth' # noqa: E501 + +model = dict( + _delete_=True, + type='mmrazor.MMArchitectureQuant', + data_preprocessor=dict( + type='mmcls.ClsDataPreprocessor', + num_classes=1000, + # RGB format normalization parameters + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + # convert image from BGR to RGB + to_rgb=True), + architecture=_base_.model, + deploy_cfg=_base_.deploy_cfg, + float_checkpoint=float_checkpoint, + quantizer=dict( + type='mmrazor.TensorRTQuantizer', + global_qconfig=global_qconfig, + tracer=dict( + type='mmrazor.CustomTracer', + skipped_methods=[ + 'mmcls.models.heads.ClsHead._get_loss', + 
'mmcls.models.heads.ClsHead._get_predictions' + ]))) + +model_wrapper_cfg = dict(type='mmrazor.MMArchitectureQuantDDP', ) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/ptq_tensorrt_retina_r50_1x_coco_calib32xb32.py b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/ptq_tensorrt_retina_r50_1x_coco_calib32xb32.py new file mode 100644 index 0000000000000000000000000000000000000000..4ca81a920d1ec845d00d51b5d734095a110f8d4b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/ptq_tensorrt_retina_r50_1x_coco_calib32xb32.py @@ -0,0 +1,53 @@ +_base_ = [ + 'mmdet::retinanet/retinanet_r50_fpn_1x_coco.py', + '../../deploy_cfgs/mmdet/detection_tensorrt-int8-explicit_dynamic-320x320-1344x1344.py' # noqa: E501 +] + +_base_.val_dataloader.batch_size = 32 + +test_cfg = dict( + type='mmrazor.PTQLoop', + calibrate_dataloader=_base_.val_dataloader, + calibrate_steps=32, +) + +float_checkpoint = 'https://download.openmmlab.com/mmdetection/v2.0/retinanet/retinanet_r50_fpn_1x_coco/retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth' # noqa: E501 + +global_qconfig = dict( + w_observer=dict(type='mmrazor.PerChannelMinMaxObserver'), + a_observer=dict(type='mmrazor.MovingAverageMinMaxObserver'), + w_fake_quant=dict(type='mmrazor.FakeQuantize'), + a_fake_quant=dict(type='mmrazor.FakeQuantize'), + w_qscheme=dict( + qdtype='qint8', bit=8, is_symmetry=True, is_symmetric_range=True), + a_qscheme=dict( + qdtype='qint8', bit=8, is_symmetry=True, averaging_constant=0.1), +) + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='MMArchitectureQuant', + data_preprocessor=dict( + type='mmdet.DetDataPreprocessor', + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + bgr_to_rgb=True, + pad_size_divisor=32), + architecture=_base_.model, + deploy_cfg=_base_.deploy_cfg, + float_checkpoint=float_checkpoint, + quantizer=dict( + type='mmrazor.TensorRTQuantizer', + global_qconfig=global_qconfig, + tracer=dict( + 
type='mmrazor.CustomTracer', + skipped_methods=[ + 'mmdet.models.dense_heads.base_dense_head.BaseDenseHead.predict_by_feat', # noqa: E501 + 'mmdet.models.dense_heads.anchor_head.AnchorHead.loss_by_feat', + ]))) + +model_wrapper_cfg = dict( + type='mmrazor.MMArchitectureQuantDDP', + broadcast_buffers=False, + find_unused_parameters=True) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/ptq_tensorrt_yolox_s_8xb8-300e_coco_calib32xb32.py b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/ptq_tensorrt_yolox_s_8xb8-300e_coco_calib32xb32.py new file mode 100644 index 0000000000000000000000000000000000000000..51e4f8f112e0cb4a1cdf959327e8397b676b28a7 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/ptq/base/ptq_tensorrt_yolox_s_8xb8-300e_coco_calib32xb32.py @@ -0,0 +1,58 @@ +_base_ = [ + 'mmdet::yolox/yolox_s_8xb8-300e_coco.py', + '../../deploy_cfgs/mmdet/detection_tensorrt-int8-explicit_dynamic-320x320-1344x1344.py' # noqa: E501 +] + +_base_.val_dataloader.batch_size = 32 + +test_cfg = dict( + type='mmrazor.PTQLoop', + calibrate_dataloader=_base_.val_dataloader, + calibrate_steps=32, +) + +float_checkpoint = 'https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_s_8x8_300e_coco/yolox_s_8x8_300e_coco_20211121_095711-4592a793.pth' # noqa: E501 + +global_qconfig = dict( + w_observer=dict(type='mmrazor.PerChannelMinMaxObserver'), + a_observer=dict(type='mmrazor.MovingAverageMinMaxObserver'), + w_fake_quant=dict(type='mmrazor.FakeQuantize'), + a_fake_quant=dict(type='mmrazor.FakeQuantize'), + w_qscheme=dict( + qdtype='qint8', bit=8, is_symmetry=True, is_symmetric_range=True), + a_qscheme=dict( + qdtype='qint8', bit=8, is_symmetry=True, averaging_constant=0.1), +) + +model = dict( + _delete_=True, + type='mmrazor.MMArchitectureQuant', + data_preprocessor=dict( + type='mmdet.DetDataPreprocessor', + pad_size_divisor=32, + batch_augments=[ + dict( + type='mmdet.BatchSyncRandomResize', + random_size_range=(480, 
800), + size_divisor=32, + interval=10) + ]), + architecture=_base_.model, + deploy_cfg=_base_.deploy_cfg, + float_checkpoint=float_checkpoint, + quantizer=dict( + type='mmrazor.TensorRTQuantizer', + global_qconfig=global_qconfig, + tracer=dict( + type='mmrazor.CustomTracer', + skipped_methods=[ + 'mmdet.models.dense_heads.yolox_head.YOLOXHead.predict_by_feat', # noqa: E501 + 'mmdet.models.dense_heads.yolox_head.YOLOXHead.loss_by_feat', + ]))) + +model_wrapper_cfg = dict( + type='mmrazor.MMArchitectureQuantDDP', + broadcast_buffers=False, + find_unused_parameters=True) + +custom_hooks = [] diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/qat/base/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/qat/base/README.md new file mode 100644 index 0000000000000000000000000000000000000000..ec4541eb454656170a96f36930a8066320c94d8a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/qat/base/README.md @@ -0,0 +1,45 @@ +# Quantization-Aware-Training (QAT) + +> [A White Paper on Neural Network Quantization](https://arxiv.org/abs/2106.08295) + + + +## Abstract + +While neural networks have advanced the frontiers in many applications, they often come at a high computational cost. Reducing the power and latency of neural network inference is key if we want to integrate modern networks into edge devices with strict power and compute requirements. Neural network quantization is one of the most effective ways of achieving these savings but the additional noise it induces can lead to accuracy degradation. In this white paper, we introduce state-of-the-art algorithms for mitigating the impact of quantization noise on the network's performance while maintaining low-bit weights and activations. We start with a hardware motivated introduction to quantization and then consider two main classes of algorithms: Post-Training Quantization (PTQ) and Quantization-Aware-Training (QAT). 
PTQ requires no re-training or labelled data and is thus a lightweight push-button approach to quantization. In most cases, PTQ is sufficient for achieving 8-bit quantization with close to floating-point accuracy. QAT requires fine-tuning and access to labeled training data but enables lower bit quantization with competitive results. For both solutions, we provide tested pipelines based on existing literature and extensive experimentation that lead to state-of-the-art performance for common deep learning models and tasks. + +## Results and models + +### Classification + +| Model | Dataset | Backend | Top 1 Acc(fp32) | Top 1 Acc(int8) | Config | Download | +| -------- | -------- | -------- | --------------- | --------------- | --------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| resnet18 | ImageNet | openvino | 69.90 | 69.98 | [config](./qat_openvino_resnet18_10e_8xb32_in1k.py) | [model](https://download.openmmlab.com/mmrazor/v1/quantization/qat/openvino/qat_openvino_resnet18_8xb32_10e_in1k_20230413_172732-5b9ff01d.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/quantization/qat/openvino/qat_openvino_resnet18_8xb32_10e_in1k_20230413_172732-5b9ff01d.log) | + +## Citation + +```latex + @misc{Nagel_Fournarakis_Amjad_Bondarenko_Baalen_Blankevoort_2021, + title={A White Paper on Neural Network Quantization}, + journal={Cornell University - arXiv}, + author={Nagel, Markus and Fournarakis, Marios and Amjad, RanaAli and Bondarenko, Yelysei and Baalen, Martvan and Blankevoort, Tijmen}, + year={2021}, + month={Jun} + } +``` + +## Getting Started + +**QAT for pretrain model** + +``` +python tools/train.py ${CONFIG} +``` + +**Test for quantized model** + +``` +python tools/test.py 
${CONFIG} ${CKPT} +``` + +For more details, please refer to [Quantization User Guide](https://mmrazor.readthedocs.io/en/main/user_guides/quantization_user_guide.html) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/qat/base/metafile.yml b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/qat/base/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..bd4015a50afcab1585285ac44af9b2d52ba7a84e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/qat/base/metafile.yml @@ -0,0 +1,20 @@ +Collections: + - Name: QAT + README: configs/quantization/qat/base/README.md +Models: + - Name: qat_openvino_resnet18_10e_8xb32_in1k.py + In Collection: QAT + Metadata: + Backend: openvino + Float Model: + Config: mmcls::resnet/resnet18_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_8xb32_in1k_20210831-fbbb1da6.pth + Metrics: + Top 1 Accuracy: 69.90 + Results: + - Task: Image Classification + Dataset: ImageNet-1k + Metrics: + Top 1 Accuracy: 69.98 + Config: configs/quantization/qat/base/qat_openvino_resnet18_10e_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmrazor/v1/quantization/qat/openvino/qat_openvino_resnet18_8xb32_10e_in1k_20230413_172732-5b9ff01d.pth diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/qat/base/qat_openvino_resnet18_10e_8xb32_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/qat/base/qat_openvino_resnet18_10e_8xb32_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..261af7abb413baecf95e0536fe86d2a3f5dea426 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/qat/base/qat_openvino_resnet18_10e_8xb32_in1k.py @@ -0,0 +1,62 @@ +_base_ = ['mmcls::resnet/resnet18_8xb32_in1k.py'] + +resnet = _base_.model +float_checkpoint = 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_8xb32_in1k_20210831-fbbb1da6.pth' # noqa: E501 + +global_qconfig = dict( + 
w_observer=dict(type='mmrazor.PerChannelMinMaxObserver'), + a_observer=dict(type='mmrazor.MovingAverageMinMaxObserver'), + w_fake_quant=dict(type='mmrazor.FakeQuantize'), + a_fake_quant=dict(type='mmrazor.FakeQuantize'), + w_qscheme=dict( + qdtype='qint8', bit=8, is_symmetry=True, is_symmetric_range=True), + a_qscheme=dict(qdtype='quint8', bit=8, is_symmetry=True), +) + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='MMArchitectureQuant', + data_preprocessor=dict( + type='mmcls.ClsDataPreprocessor', + num_classes=1000, + # RGB format normalization parameters + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + # convert image from BGR to RGB + to_rgb=True), + architecture=resnet, + float_checkpoint=float_checkpoint, + quantizer=dict( + type='mmrazor.OpenVINOQuantizer', + global_qconfig=global_qconfig, + tracer=dict( + type='mmrazor.CustomTracer', + skipped_methods=[ + 'mmcls.models.heads.ClsHead._get_loss', + 'mmcls.models.heads.ClsHead._get_predictions' + ]))) + +optim_wrapper = dict( + optimizer=dict(type='SGD', lr=0.0001, momentum=0.9, weight_decay=0.0001)) + +# learning policy +param_scheduler = dict( + _delete_=True, type='ConstantLR', factor=1.0, by_epoch=True) + +model_wrapper_cfg = dict( + type='mmrazor.MMArchitectureQuantDDP', + broadcast_buffers=False, + find_unused_parameters=False) + +# train, val, test setting +train_cfg = dict( + _delete_=True, + type='mmrazor.QATEpochBasedLoop', + max_epochs=10, + val_interval=1) +val_cfg = dict(_delete_=True, type='mmrazor.QATValLoop') + +# Make sure the buffer such as min_val/max_val in saved checkpoint is the same +# among different rank. 
+default_hooks = dict(sync=dict(type='SyncBuffersHook')) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/qat/lsq/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/qat/lsq/README.md new file mode 100644 index 0000000000000000000000000000000000000000..7babfa96ed184952a2d3d2df165d0191c7dadb8c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/qat/lsq/README.md @@ -0,0 +1,46 @@ +# Learned Step Size Quantization (LSQ) + +> [Learned Step Size Quantization](https://arxiv.org/abs/1902.08153) + + + +## Abstract + +Deep networks run with low precision operations at inference time offer power and space advantages over high precision alternatives, but need to overcome the challenge of maintaining high accuracy as precision decreases. Here, we present a method for training such networks, Learned Step Size Quantization, that achieves the highest accuracy to date on the ImageNet dataset when using models, from a variety of architectures, with weights and activations quantized to 2-, 3- or 4-bits of precision, and that can train 3-bit models that reach full precision baseline accuracy. Our approach builds upon existing methods for learning weights in quantized networks by improving how the quantizer itself is configured. Specifically, we introduce a novel means to estimate and scale the task loss gradient at each weight and activation layer's quantizer step size, such that it can be learned in conjunction with other network parameters. This approach works using different levels of precision as needed for a given system and requires only a simple modification of existing training code. 
+ +## Results and models + +### Classification + +| Model | Dataset | Backend | Top 1 Acc(fp32) | Top 1 Acc(int8) | Max Epochs | Config | Download | +| -------- | -------- | -------- | --------------- | --------------- | ---------- | ---------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| resnet18 | ImageNet | openvino | 69.90 | 69.418 | 10 | [config](./lsq_openvino_resnet18_8xb32_10e_in1k.py) | [model](https://download.openmmlab.com/mmrazor/v1/quantization/qat/openvino/lsq_openvino_resnet18_8xb32_10e_in1k_20230413_224237-36eac1f1.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/quantization/qat/openvino/lsq_openvino_resnet18_8xb32_10e_in1k_20230413_224237-36eac1f1.log) | +| resnet18 | ImageNet | openvino | 69.90 | 69.992 | 100 | [config](./lsq_openvino_resnet18_8xb32_100e_in1k.py) | [model](https://download.openmmlab.com/mmrazor/v1/quantization/qat/openvino/lsq_openvino_resnet18_8xb32_100e_in1k_20230402_173316-ca5993bf.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/quantization/qat/openvino/lsq_openvino_resnet18_8xb32_100e_in1k_20230402_173316-ca5993bf.log) | + +## Citation + +```latex + @misc{Esser_McKinstry_Bablani_Appuswamy_Modha_2019, + title={Learned Step Size Quantization}, + journal={arXiv: Learning}, + author={Esser, StevenK. and McKinstry, JeffreyL. 
and Bablani, Deepika and Appuswamy, Rathinakumar and Modha, DharmendraS.}, + year={2019}, + month={Feb} + } +``` + +## Getting Started + +**QAT for pretrain model** + +``` +python tools/train.py ${CONFIG} +``` + +**Test for quantized model** + +``` +python tools/test.py ${CONFIG} ${CKPT} +``` + +For more details, please refer to [Quantization User Guide](https://mmrazor.readthedocs.io/en/main/user_guides/quantization_user_guide.html) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/qat/lsq/lsq_openvino_resnet18_8xb32_100e_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/qat/lsq/lsq_openvino_resnet18_8xb32_100e_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..00e424141c72b77c3c21c3294e213233feb81254 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/qat/lsq/lsq_openvino_resnet18_8xb32_100e_in1k.py @@ -0,0 +1,68 @@ +_base_ = ['mmcls::resnet/resnet18_8xb32_in1k.py'] + +resnet = _base_.model +float_checkpoint = 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_8xb32_in1k_20210831-fbbb1da6.pth' # noqa: E501 + +global_qconfig = dict( + w_observer=dict(type='mmrazor.LSQPerChannelObserver'), + a_observer=dict(type='mmrazor.LSQObserver'), + w_fake_quant=dict(type='mmrazor.LearnableFakeQuantize'), + a_fake_quant=dict(type='mmrazor.LearnableFakeQuantize'), + w_qscheme=dict( + qdtype='qint8', bit=8, is_symmetry=True, is_symmetric_range=True), + a_qscheme=dict(qdtype='quint8', bit=8, is_symmetry=True), +) + +model = dict( + _delete_=True, + _scope_='mmrazor', + type='MMArchitectureQuant', + data_preprocessor=dict( + type='mmcls.ClsDataPreprocessor', + num_classes=1000, + # RGB format normalization parameters + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + # convert image from BGR to RGB + to_rgb=True), + architecture=resnet, + float_checkpoint=float_checkpoint, + quantizer=dict( + type='mmrazor.OpenVINOQuantizer', + global_qconfig=global_qconfig, + 
tracer=dict( + type='mmrazor.CustomTracer', + skipped_methods=[ + 'mmcls.models.heads.ClsHead._get_loss', + 'mmcls.models.heads.ClsHead._get_predictions' + ]))) + +optim_wrapper = dict( + optimizer=dict(type='SGD', lr=0.0001, momentum=0.9, weight_decay=0.0001)) + +# learning policy +param_scheduler = dict( + _delete_=True, + type='CosineAnnealingLR', + T_max=100, + by_epoch=True, + begin=0, + end=100) + +model_wrapper_cfg = dict( + type='mmrazor.MMArchitectureQuantDDP', + broadcast_buffers=False, + find_unused_parameters=True) + +# train, val, test setting +train_cfg = dict( + _delete_=True, + type='mmrazor.LSQEpochBasedLoop', + max_epochs=100, + val_interval=1, + freeze_bn_begin=1) +val_cfg = dict(_delete_=True, type='mmrazor.QATValLoop') + +# Make sure the buffer such as min_val/max_val in saved checkpoint is the same +# among different rank. +default_hooks = dict(sync=dict(type='SyncBuffersHook')) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/qat/lsq/lsq_openvino_resnet18_8xb32_10e_in1k.py b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/qat/lsq/lsq_openvino_resnet18_8xb32_10e_in1k.py new file mode 100644 index 0000000000000000000000000000000000000000..f931ddaf52a218f436ef670228f2612bdb7fa71d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/qat/lsq/lsq_openvino_resnet18_8xb32_10e_in1k.py @@ -0,0 +1,63 @@ +_base_ = ['mmcls::resnet/resnet18_8xb32_in1k.py'] + +resnet = _base_.model +float_checkpoint = 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_8xb32_in1k_20210831-fbbb1da6.pth' # noqa: E501 + +global_qconfig = dict( + w_observer=dict(type='mmrazor.LSQPerChannelObserver'), + a_observer=dict(type='mmrazor.LSQObserver'), + w_fake_quant=dict(type='mmrazor.LearnableFakeQuantize'), + a_fake_quant=dict(type='mmrazor.LearnableFakeQuantize'), + w_qscheme=dict( + qdtype='qint8', bit=8, is_symmetry=True, is_symmetric_range=True), + a_qscheme=dict(qdtype='quint8', bit=8, is_symmetry=True), +) + +model 
= dict( + _delete_=True, + _scope_='mmrazor', + type='MMArchitectureQuant', + data_preprocessor=dict( + type='mmcls.ClsDataPreprocessor', + num_classes=1000, + # RGB format normalization parameters + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + # convert image from BGR to RGB + to_rgb=True), + architecture=resnet, + float_checkpoint=float_checkpoint, + quantizer=dict( + type='mmrazor.OpenVINOQuantizer', + global_qconfig=global_qconfig, + tracer=dict( + type='mmrazor.CustomTracer', + skipped_methods=[ + 'mmcls.models.heads.ClsHead._get_loss', + 'mmcls.models.heads.ClsHead._get_predictions' + ]))) + +optim_wrapper = dict( + optimizer=dict(type='SGD', lr=0.0001, momentum=0.9, weight_decay=0.0001)) + +# learning policy +param_scheduler = dict( + _delete_=True, type='ConstantLR', factor=1.0, by_epoch=True) + +model_wrapper_cfg = dict( + type='mmrazor.MMArchitectureQuantDDP', + broadcast_buffers=False, + find_unused_parameters=True) + +# train, val, test setting +train_cfg = dict( + _delete_=True, + type='mmrazor.LSQEpochBasedLoop', + max_epochs=10, + val_interval=1, + freeze_bn_begin=1) +val_cfg = dict(_delete_=True, type='mmrazor.QATValLoop') + +# Make sure the buffer such as min_val/max_val in saved checkpoint is the same +# among different rank. 
+default_hooks = dict(sync=dict(type='SyncBuffersHook')) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/qat/lsq/metafile.yml b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/qat/lsq/metafile.yml new file mode 100644 index 0000000000000000000000000000000000000000..89308d333000f63e300ebcf81abef5b90ec20590 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/quantization/qat/lsq/metafile.yml @@ -0,0 +1,36 @@ +Collections: + - Name: LSQ + README: configs/quantization/qat/lsq/README.md +Models: + - Name: lsq_openvino_resnet18_8xb32_10e_in1k.py + In Collection: LSQ + Metadata: + Backend: openvino + Float Model: + Config: mmcls::resnet/resnet18_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_8xb32_in1k_20210831-fbbb1da6.pth + Metrics: + Top 1 Accuracy: 69.90 + Results: + - Task: Image Classification + Dataset: ImageNet-1k + Metrics: + Top 1 Accuracy: 69.418 + Config: configs/quantization/qat/lsq/lsq_openvino_resnet18_8xb32_10e_in1k.py + Weights: https://download.openmmlab.com/mmrazor/v1/quantization/qat/openvino/lsq_openvino_resnet18_8xb32_10e_in1k_20230413_224237-36eac1f1.pth + - Name: lsq_openvino_resnet18_8xb32_100e_in1k.py + In Collection: LSQ + Metadata: + Backend: openvino + Float Model: + Config: mmcls::resnet/resnet18_8xb32_in1k.py + Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_8xb32_in1k_20210831-fbbb1da6.pth + Metrics: + Top 1 Accuracy: 69.90 + Results: + - Task: Image Classification + Dataset: ImageNet-1k + Metrics: + Top 1 Accuracy: 69.992 + Config: configs/quantization/qat/lsq/lsq_openvino_resnet18_8xb32_100e_in1k.py + Weights: https://download.openmmlab.com/mmrazor/v1/quantization/qat/openvino/lsq_openvino_resnet18_8xb32_100e_in1k_20230402_173316-ca5993bf.pth diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/vanilla/mmcls/wide-resnet/README.md b/cv/distiller/CWD/pytorch/mmrazor/configs/vanilla/mmcls/wide-resnet/README.md new file mode 100644 
index 0000000000000000000000000000000000000000..b32008e7f02f1b8f20a3df59e5dacf773ea86aa5 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/vanilla/mmcls/wide-resnet/README.md @@ -0,0 +1,34 @@ +# Wide-ResNet + +> [Wide Residual Networks](https://arxiv.org/abs/1605.07146) + + + +## Abstract + +Deep residual networks were shown to be able to scale up to thousands of layers and still have improving performance. However, each fraction of a percent of improved accuracy costs nearly doubling the number of layers, and so training very deep residual networks has a problem of diminishing feature reuse, which makes these networks very slow to train. To tackle these problems, in this paper we conduct a detailed experimental study on the architecture of ResNet blocks, based on which we propose a novel architecture where we decrease depth and increase width of residual networks. We call the resulting network structures wide residual networks (WRNs) and show that these are far superior over their commonly used thin and very deep counterparts. For example, we demonstrate that even a simple 16-layer-deep wide residual network outperforms in accuracy and efficiency all previous deep residual networks, including thousand-layer-deep networks, achieving new state-of-the-art results on CIFAR, SVHN, COCO, and significant improvements on ImageNet. + +
+ +
+ +## Results and models + +### Cifar10 + +| Model | Top-1 (%) | Config | Download | +| :----: | :-------: | :-----------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| WRN-16 | 93.04 | [config](./wrn16-w2_b16x8_cifar10.py) | [model](https://openmmlab-share.oss-cn-hangzhou.aliyuncs.com/mmrazor/v1/wide_resnet/wrn16_2_b16x8_cifar10_20220831_204709-446b466e.pth) \| [log](https://openmmlab-share.oss-cn-hangzhou.aliyuncs.com/mmrazor/v1/wide_resnet/wrn16_2_b16x8_cifar10_20220831_204709-446b466e.json) | +| WRN-22 | 94.8700 | [config](./wrn22-w4_b16x8_cifar10.py) | [model](https://download.openmmlab.com/mmrazor/v1/wide_resnet/wrn22-w4_b16x8_cifar10/wrn22-w4_b16x8_cifar10_20221201_170638-1d044c6f.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/wide_resnet/wrn22-w4_b16x8_cifar10/wrn22-w4_b16x8_cifar10_20221201_170638-1d044c6f.json) | +| WRN-28 | 95.41 | [config](./wrn28-w4_b16x8_cifar10.py) | [model](https://openmmlab-share.oss-cn-hangzhou.aliyuncs.com/mmrazor/v1/wide_resnet/wrn28_4_b16x8_cifar10_20220831_173536-d6f8725c.pth) \| [log](https://openmmlab-share.oss-cn-hangzhou.aliyuncs.com/mmrazor/v1/wide_resnet/wrn28_4_b16x8_cifar10_20220831_173536-d6f8725c.json) | +| WRN-40 | 94.6700 | [config](./wrn40-w2_b16x8_cifar10.py) | [model](https://download.openmmlab.com/mmrazor/v1/wide_resnet/wrn40-w2_b16x8_cifar10/wrn40-w2_b16x8_cifar10_20221201_170318-761c8c55.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/wide_resnet/wrn40-w2_b16x8_cifar10/wrn40-w2_b16x8_cifar10_20221201_170318-761c8c55.json) | + +## Citation + +```bibtex +@INPROCEEDINGS{Zagoruyko2016WRN, + author = {Sergey Zagoruyko and Nikos Komodakis}, + title = {Wide Residual Networks}, + booktitle = {BMVC}, + year = {2016}} +``` diff --git 
a/cv/distiller/CWD/pytorch/mmrazor/configs/vanilla/mmcls/wide-resnet/wrn16-w2_b16x8_cifar10.py b/cv/distiller/CWD/pytorch/mmrazor/configs/vanilla/mmcls/wide-resnet/wrn16-w2_b16x8_cifar10.py new file mode 100644 index 0000000000000000000000000000000000000000..8a518df4fa628e67331d73c699346beca6a3c96f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/vanilla/mmcls/wide-resnet/wrn16-w2_b16x8_cifar10.py @@ -0,0 +1,7 @@ +_base_ = [ + 'mmcls::_base_/datasets/cifar10_bs16.py', + '../../../_base_/vanilla_models/wrn16_2_cifar10.py', + 'mmcls::_base_/schedules/cifar10_bs128.py', + 'mmcls::_base_/default_runtime.py', +] +test_evaluator = dict(topk=(1, 5)) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/vanilla/mmcls/wide-resnet/wrn22-w4_b16x8_cifar10.py b/cv/distiller/CWD/pytorch/mmrazor/configs/vanilla/mmcls/wide-resnet/wrn22-w4_b16x8_cifar10.py new file mode 100644 index 0000000000000000000000000000000000000000..7c48887966c90c6c57c1aa373f5185fcab76d8ef --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/vanilla/mmcls/wide-resnet/wrn22-w4_b16x8_cifar10.py @@ -0,0 +1,3 @@ +_base_ = ['wrn16-w2_b16x8_cifar10.py'] +model = dict( + backbone=dict(depth=22, widen_factor=4), head=dict(in_channels=256, )) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/vanilla/mmcls/wide-resnet/wrn28-w4_b16x8_cifar10.py b/cv/distiller/CWD/pytorch/mmrazor/configs/vanilla/mmcls/wide-resnet/wrn28-w4_b16x8_cifar10.py new file mode 100644 index 0000000000000000000000000000000000000000..83f0f8153f4884adf5ec660c81e594c280f11b93 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/vanilla/mmcls/wide-resnet/wrn28-w4_b16x8_cifar10.py @@ -0,0 +1,3 @@ +_base_ = ['wrn16-w2_b16x8_cifar10.py'] +model = dict( + backbone=dict(depth=28, widen_factor=4), head=dict(in_channels=256, )) diff --git a/cv/distiller/CWD/pytorch/mmrazor/configs/vanilla/mmcls/wide-resnet/wrn40-w2_b16x8_cifar10.py b/cv/distiller/CWD/pytorch/mmrazor/configs/vanilla/mmcls/wide-resnet/wrn40-w2_b16x8_cifar10.py 
new file mode 100644 index 0000000000000000000000000000000000000000..de44e9191d268c0223beff403b9e2c27c4adfd6b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/configs/vanilla/mmcls/wide-resnet/wrn40-w2_b16x8_cifar10.py @@ -0,0 +1,3 @@ +_base_ = ['wrn16-w2_b16x8_cifar10.py'] +model = dict( + backbone=dict(depth=40, widen_factor=2), head=dict(in_channels=128, )) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..fc5acaaebb67dc0bee300c163c4024c836d7db8a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/__init__.py @@ -0,0 +1,27 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import mmcv +import mmengine +from mmengine.utils import digit_version + +from .version import __version__ + +mmcv_minimum_version = '2.0.0rc1' +mmcv_maximum_version = '2.1.0' +mmcv_version = digit_version(mmcv.__version__) + +mmengine_minimum_version = '0.1.0' +mmengine_maximum_version = '1.0.0' +mmengine_version = digit_version(mmengine.__version__) + +assert (mmcv_version >= digit_version(mmcv_minimum_version) + and mmcv_version <= digit_version(mmcv_maximum_version)), \ + f'MMCV=={mmcv.__version__} is used but incompatible. ' \ + f'Please install mmcv>={mmcv_minimum_version}, <={mmcv_maximum_version}.' + +assert (mmengine_version >= digit_version(mmengine_minimum_version) + and mmengine_version < digit_version(mmengine_maximum_version)), \ + f'MMEngine=={mmengine.__version__} is used but incompatible. ' \ + f'Please install mmengine>={mmengine_minimum_version}, ' \ + f'<{mmengine_maximum_version}.' 
# Copyright (c) OpenMMLab. All rights reserved.
import copy
import warnings
from typing import Any, Dict, List, Optional, Union

import numpy as np
from mmengine.dataset.base_dataset import BaseDataset, force_full_init

from mmrazor.registry import DATASETS


@DATASETS.register_module()
class CRDDataset:
    """A wrapper of `CRD` dataset.

    Suitable for image classification datasets like CIFAR. Following the
    sampling strategy in the `CRD paper <https://arxiv.org/abs/1910.10699>`_,
    in each epoch each data sample carries contrast information. Contrast
    information for an image is the indices of its negative data samples.

    Note:
        ``CRDDataset`` should not inherit from ``BaseDataset`` since
        ``get_subset`` and ``get_subset_`` could produce an ambiguous
        sub-dataset which conflicts with the original dataset. If you want
        to use a sub-dataset of ``CRDDataset``, you should set the
        ``indices`` argument on the wrapped dataset, which inherits from
        ``BaseDataset``.

    Args:
        dataset (BaseDataset or dict): The dataset to be wrapped.
        neg_num (int): Number of negative data samples per image.
        percent (float): Sampling percentage of the negative pool.
        lazy_init (bool, optional): Whether to load annotations during
            instantiation. Defaults to False.
        num_classes (int, optional): Number of classes. Defaults to None,
            in which case it is inferred from the ground-truth labels.
        sample_mode (str, optional): Data sampling mode. Defaults to 'exact'.
    """

    def __init__(self,
                 dataset: Union[BaseDataset, dict],
                 neg_num: int,
                 percent: float,
                 lazy_init: bool = False,
                 num_classes: Optional[int] = None,
                 sample_mode: str = 'exact') -> None:
        # Accept either a ready dataset object or a config to build one.
        if isinstance(dataset, dict):
            self.dataset = DATASETS.build(dataset)
        elif isinstance(dataset, BaseDataset):
            self.dataset = dataset
        else:
            raise TypeError(
                'elements in datasets sequence should be config or '
                f'`BaseDataset` instance, but got {type(dataset)}')
        self._metainfo = self.dataset.metainfo

        self._fully_initialized = False

        # CRD unique attributes.
        self.num_classes = num_classes
        self.neg_num = neg_num
        self.sample_mode = sample_mode
        self.percent = percent

        if not lazy_init:
            self.full_init()

    def _parse_fullset_contrast_info(self) -> None:
        """Parse contrast information of the whole dataset."""
        assert self.sample_mode in [
            'exact', 'random'
        ], ('`sample_mode` must be in [`exact`, `random`], '
            f'but got `{self.sample_mode}`')

        # Handle special occasion:
        # if dataset's ``CLASSES`` is not list of consecutive integers,
        # e.g. [2, 3, 5].
        num_classes: int = self.num_classes  # type: ignore
        if num_classes is None:
            num_classes = max(self.dataset.get_gt_labels()) + 1

        if not self.dataset.test_mode:  # type: ignore
            # Parse info.
            self.gt_labels = self.dataset.get_gt_labels()
            self.num_samples: int = self.dataset.__len__()

            # ``cls_positive[c]`` collects the indices of all samples whose
            # label is ``c``.
            self.cls_positive: List[List[int]] = [
                [] for _ in range(num_classes)
            ]  # type: ignore
            for i in range(self.num_samples):
                self.cls_positive[self.gt_labels[i]].append(i)

            # ``cls_negative[c]`` collects the indices of all samples whose
            # label is anything other than ``c``.
            self.cls_negative: List[List[int]] = [
                [] for i in range(num_classes)
            ]  # type: ignore
            for i in range(num_classes):  # type: ignore
                for j in range(num_classes):  # type: ignore
                    if j == i:
                        continue
                    self.cls_negative[i].extend(self.cls_positive[j])

            self.cls_positive = [
                np.asarray(self.cls_positive[i])
                for i in range(num_classes)  # type: ignore
            ]
            self.cls_negative = [
                np.asarray(self.cls_negative[i])
                for i in range(num_classes)  # type: ignore
            ]

            # Optionally subsample the negative pool to ``percent`` of its
            # size (shuffled first so the kept subset is random).
            if 0 < self.percent < 1:
                n = int(len(self.cls_negative[0]) * self.percent)
                self.cls_negative = [
                    np.random.permutation(self.cls_negative[i])[0:n]
                    for i in range(num_classes)  # type: ignore
                ]

            self.cls_positive = np.asarray(self.cls_positive)
            self.cls_negative = np.asarray(self.cls_negative)

    @property
    def metainfo(self) -> dict:
        """Get the meta information of the wrapped dataset.

        Returns:
            dict: A deep copy, so callers cannot mutate the cached meta.
        """
        return copy.deepcopy(self._metainfo)

    def _get_contrast_info(self, data: Dict, idx: int) -> Dict:
        """Attach contrast indices (1 positive + ``neg_num`` negatives) to
        ``data``."""
        if self.sample_mode == 'exact':
            # The positive is the sample itself.
            pos_idx = idx
        elif self.sample_mode == 'random':
            # The positive is a random sample of the same class.
            pos_idx = np.random.choice(self.cls_positive[self.gt_labels[idx]],
                                       1)
            pos_idx = pos_idx[0]  # type: ignore
        else:
            raise NotImplementedError(self.sample_mode)
        # Sample with replacement only when the pool is too small.
        replace = True if self.neg_num > \
            len(self.cls_negative[self.gt_labels[idx]]) else False
        neg_idx = np.random.choice(
            self.cls_negative[self.gt_labels[idx]],
            self.neg_num,
            replace=replace)
        contrast_sample_idxs = np.hstack((np.asarray([pos_idx]), neg_idx))
        data['contrast_sample_idxs'] = contrast_sample_idxs
        return data

    def full_init(self):
        """Fully initialize the wrapped dataset and the contrast info."""
        if self._fully_initialized:
            return

        self.dataset.full_init()
        self._parse_fullset_contrast_info()

        self._fully_initialized = True

    @force_full_init
    def get_data_info(self, idx: int) -> Dict:
        """Get annotation by index.

        Args:
            idx (int): Global index of the dataset.

        Returns:
            dict: The idx-th annotation, with contrast indices attached
            during training.
        """
        data_info = self.dataset.get_data_info(idx)  # type: ignore
        if not self.dataset.test_mode:  # type: ignore
            data_info = self._get_contrast_info(data_info, idx)
        return data_info

    def prepare_data(self, idx) -> Any:
        """Get data processed by ``self.pipeline``.

        Args:
            idx (int): The index of ``data_info``.

        Returns:
            Any: Depends on ``self.pipeline``.
        """
        data_info = self.get_data_info(idx)
        return self.dataset.pipeline(data_info)

    def __getitem__(self, idx: int) -> dict:
        """Get the idx-th image and data information of dataset after
        ``self.pipeline``, and ``full_init`` will be called if the dataset
        has not been fully initialized.

        During training phase, if ``self.pipeline`` gets ``None``,
        ``self._rand_another`` will be called until a valid image is fetched
        or the maximum limit of refetch is reached.

        Args:
            idx (int): The index of self.data_list.

        Returns:
            dict: The idx-th image and data information of dataset after
            ``self.pipeline``.
        """
        # Performing full initialization by calling `__getitem__` will consume
        # extra memory. If a dataset is not fully initialized by setting
        # `lazy_init=True` and then fed into the dataloader. Different workers
        # will simultaneously read and parse the annotation. It will cost more
        # time and memory, although this may work. Therefore, it is
        # recommended to manually call `full_init` before dataset fed into
        # dataloader to ensure all workers use shared RAM from master process.
        if not self._fully_initialized:
            warnings.warn(
                'Please call `full_init()` method manually to accelerate '
                'the speed.')
            self.full_init()

        if self.dataset.test_mode:
            data = self.prepare_data(idx)
            if data is None:
                raise Exception('Test time pipeline should not get `None` '
                                'data_sample')
            return data

        for _ in range(self.dataset.max_refetch + 1):
            data = self.prepare_data(idx)
            # Broken images or random augmentations may cause the returned
            # data to be None.
            if data is None:
                idx = self.dataset._rand_another()
                continue
            return data

        raise Exception(
            f'Cannot find valid image after {self.dataset.max_refetch}! '
            'Please check your image path and pipeline')

    @force_full_init
    def __len__(self):
        return len(self.dataset)

    def get_subset_(self, indices: Union[List[int], int]) -> None:
        """Not supported in ``CRDDataset`` for the ambiguous meaning of
        sub-dataset."""
        raise NotImplementedError(
            '`CRDDataset` does not support `get_subset` and '
            '`get_subset_` interfaces because this will lead to ambiguous '
            'implementation of some methods. If you want to use `get_subset` '
            'or `get_subset_` interfaces, please use them in the wrapped '
            'dataset first and then use `CRDDataset`.')

    def get_subset(self, indices: Union[List[int], int]) -> 'BaseDataset':
        """Not supported in ``CRDDataset`` for the ambiguous meaning of
        sub-dataset."""
        raise NotImplementedError(
            '`CRDDataset` does not support `get_subset` and '
            '`get_subset_` interfaces because this will lead to ambiguous '
            'implementation of some methods. If you want to use `get_subset` '
            'or `get_subset_` interfaces, please use them in the wrapped '
            'dataset first and then use `CRDDataset`.')
# Copyright (c) OpenMMLab. All rights reserved.
import copy
import math
import random
from typing import Optional

import numpy as np
import PIL
from mmcv.transforms import BaseTransform
from PIL import Image, ImageEnhance, ImageOps

from mmrazor.registry import TRANSFORMS

# (major, minor) version of Pillow; several ops below pick API variants on it.
_PIL_VER = tuple([int(x) for x in PIL.__version__.split('.')[:2]])

_FILL = (128, 128, 128)

# This signifies the max integer that the controller RNN could predict for the
# augmentation scheme.
_MAX_LEVEL = 10.

_HPARAMS_DEFAULT = dict(
    translate_const=250,
    img_mean=_FILL,
)

_RANDOM_INTERPOLATION = (Image.NEAREST, Image.BILINEAR, Image.BICUBIC)


def _interpolation(kwargs):
    """Pop ``resample`` from ``kwargs``; pick randomly if it is a sequence."""
    interpolation = kwargs.pop('resample', Image.NEAREST)
    if isinstance(interpolation, (list, tuple)):
        return random.choice(interpolation)
    else:
        return interpolation


def _check_args_tf(kwargs):
    """Normalize transform kwargs for the installed Pillow version."""
    if 'fillcolor' in kwargs and _PIL_VER < (5, 0):
        # ``fillcolor`` is only supported from Pillow 5.0.
        kwargs.pop('fillcolor')
    kwargs['resample'] = _interpolation(kwargs)


def shear_x(img, factor, **kwargs):
    """ShearX images."""

    _check_args_tf(kwargs)
    return img.transform(img.size, Image.AFFINE, (1, factor, 0, 0, 1, 0),
                         **kwargs)


def shear_y(img, factor, **kwargs):
    """ShearY images."""

    _check_args_tf(kwargs)
    return img.transform(img.size, Image.AFFINE, (1, 0, 0, factor, 1, 0),
                         **kwargs)


def translate_x_rel(img, pct, **kwargs):
    """TranslateXRel images (offset is a fraction of the width)."""

    pixels = pct * img.size[0]
    _check_args_tf(kwargs)
    return img.transform(img.size, Image.AFFINE, (1, 0, pixels, 0, 1, 0),
                         **kwargs)


def translate_y_rel(img, pct, **kwargs):
    """TranslateYRel images (offset is a fraction of the height)."""

    pixels = pct * img.size[1]
    _check_args_tf(kwargs)
    return img.transform(img.size, Image.AFFINE, (1, 0, 0, 0, 1, pixels),
                         **kwargs)


def translate_x_abs(img, pixels, **kwargs):
    """TranslateX images (offset in absolute pixels)."""

    _check_args_tf(kwargs)
    return img.transform(img.size, Image.AFFINE, (1, 0, pixels, 0, 1, 0),
                         **kwargs)


def translate_y_abs(img, pixels, **kwargs):
    """TranslateY images (offset in absolute pixels)."""

    _check_args_tf(kwargs)
    return img.transform(img.size, Image.AFFINE, (1, 0, 0, 0, 1, pixels),
                         **kwargs)


def rotate(img, degrees, **kwargs):
    """Rotate images, choosing the best available Pillow API."""

    _check_args_tf(kwargs)
    if _PIL_VER >= (5, 2):
        return img.rotate(degrees, **kwargs)
    elif _PIL_VER >= (5, 0):
        # Emulate rotation about the center with an affine transform.
        w, h = img.size
        post_trans = (0, 0)
        rotn_center = (w / 2.0, h / 2.0)
        angle = -math.radians(degrees)
        matrix = [
            round(math.cos(angle), 15),
            round(math.sin(angle), 15),
            0.0,
            round(-math.sin(angle), 15),
            round(math.cos(angle), 15),
            0.0,
        ]

        def transform(x, y, matrix):
            (a, b, c, d, e, f) = matrix
            return a * x + b * y + c, d * x + e * y + f

        matrix[2], matrix[5] = transform(-rotn_center[0] - post_trans[0],
                                         -rotn_center[1] - post_trans[1],
                                         matrix)
        matrix[2] += rotn_center[0]
        matrix[5] += rotn_center[1]
        return img.transform(img.size, Image.AFFINE, matrix, **kwargs)
    else:
        return img.rotate(degrees, resample=kwargs['resample'])


def auto_contrast(img, **__):
    """AutoContrast images."""

    return ImageOps.autocontrast(img)


def invert(img, **__):
    """Invert images."""

    return ImageOps.invert(img)


def equalize(img, **__):
    """Equalize images."""

    return ImageOps.equalize(img)


def solarize(img, thresh, **__):
    """Solarize images."""

    return ImageOps.solarize(img, thresh)


def solarize_add(img, add, thresh=128, **__):
    """SolarizeAdd images: add ``add`` to every pixel below ``thresh``."""

    lut = []
    for i in range(256):
        if i < thresh:
            lut.append(min(255, i + add))
        else:
            lut.append(i)
    if img.mode in ('L', 'RGB'):
        if img.mode == 'RGB' and len(lut) == 256:
            # Replicate the LUT for each of the three channels.
            lut = lut + lut + lut
        return img.point(lut)
    else:
        return img


def posterize(img, bits_to_keep, **__):
    """Posterize images."""

    if bits_to_keep >= 8:
        return img
    bits_to_keep = max(1, bits_to_keep)  # prevent all 0 images
    return ImageOps.posterize(img, bits_to_keep)


def contrast(img, factor, **__):
    """Contrast images."""

    return ImageEnhance.Contrast(img).enhance(factor)


def color(img, factor, **__):
    """Color images."""

    return ImageEnhance.Color(img).enhance(factor)


def brightness(img, factor, **__):
    """Brightness images."""

    return ImageEnhance.Brightness(img).enhance(factor)


def sharpness(img, factor, **__):
    """Sharpness images."""

    return ImageEnhance.Sharpness(img).enhance(factor)


def _randomly_negate(v):
    """With 50% prob, negate the value."""
    return -v if random.random() > 0.5 else v


class AutoAugmentOp(BaseTransform):
    """One named augmentation op with a probability and a magnitude.

    Args:
        name (str): Op name; must be a key of the internal op table.
        prob (float): Probability of applying the op.
        magnitude (int | float): Raw magnitude in ``[0, _MAX_LEVEL]``.
        hparams (dict, optional): Extra parameters (``img_mean``,
            ``interpolation``). Defaults to None, i.e. module defaults.
    """

    def __init__(self, name, prob, magnitude, hparams=None):
        # Fix: the original used a mutable default argument ``hparams={}``.
        hparams = {} if hparams is None else hparams
        NAME_TO_OP = {
            'AutoContrast': auto_contrast,
            'Equalize': equalize,
            'Invert': invert,
            'Rotate': rotate,
            'Posterize': posterize,
            'Posterize2': posterize,
            'Solarize': solarize,
            'SolarizeAdd': solarize_add,
            'Color': color,
            'Contrast': contrast,
            'Brightness': brightness,
            'Sharpness': sharpness,
            'ShearX': shear_x,
            'ShearY': shear_y,
            'TranslateX': translate_x_abs,
            'TranslateY': translate_y_abs,
            'TranslateXRel': translate_x_rel,
            'TranslateYRel': translate_y_rel,
        }
        self.aug_fn = NAME_TO_OP[name]
        self.prob = prob
        self.magnitude = magnitude
        # If std deviation of magnitude is > 0, we introduce some randomness
        # in the usually fixed policy and sample magnitude from normal dist
        # with mean magnitude and std-dev of magnitude_std.
        # NOTE This is being tested as it's not in paper or reference impl.
        self.magnitude_std = 0.5  # FIXME add arg/hparam
        self.kwargs = {
            'fillcolor':
            hparams['img_mean'] if 'img_mean' in hparams else _FILL,
            'resample':
            hparams['interpolation']
            if 'interpolation' in hparams else _RANDOM_INTERPOLATION
        }

        self._get_magnitude(name)

    def _get_magnitude(self, name):
        """Pick the level function for ``name`` and pre-compute its args."""
        if name in ('AutoContrast', 'Equalize', 'Invert'):
            self.level_fn = self.pass_fn
        elif name == 'Rotate':
            self.level_fn = self._rotate_level_to_arg
        elif name == 'Posterize':
            self.level_fn = self._conversion0
        elif name == 'Posterize2':
            self.level_fn = self._conversion1
        elif name == 'Solarize':
            self.level_fn = self._conversion2
        elif name == 'SolarizeAdd':
            self.level_fn = self._conversion3
        elif name in ('Color', 'Contrast', 'Brightness', 'Sharpness'):
            self.level_fn = self._enhance_level_to_arg
        elif name in ('ShearX', 'ShearY'):
            self.level_fn = self._shear_level_to_arg
        elif name in ('TranslateX', 'TranslateY'):
            self.level_fn = self._translate_abs_level_to_arg2
        elif name in ('TranslateXRel', 'TranslateYRel'):
            self.level_fn = self._translate_rel_level_to_arg
        else:
            # Fix: the original printed ``'{} not recognized'.format({})``
            # (formatting an empty dict instead of the name) and then fell
            # through to an opaque AttributeError on ``self.level_fn``.
            raise ValueError(f'{name} not recognized')

        magnitude = self.magnitude
        if self.magnitude_std and self.magnitude_std > 0:
            # Jitter the magnitude, clamped back into the valid range.
            magnitude = random.gauss(magnitude, self.magnitude_std)
            magnitude = min(_MAX_LEVEL, max(0, magnitude))
        self.level_args = self.level_fn(magnitude)

    def _rotate_level_to_arg(self, level):
        # range [-30, 30]
        level = (level / _MAX_LEVEL) * 30.
        level = _randomly_negate(level)
        return (level, )

    def _enhance_level_to_arg(self, level):
        # range [0.1, 1.9]
        return ((level / _MAX_LEVEL) * 1.8 + 0.1, )

    def _shear_level_to_arg(self, level):
        # range [-0.3, 0.3]
        level = (level / _MAX_LEVEL) * 0.3
        level = _randomly_negate(level)
        return (level, )

    def _translate_abs_level_to_arg2(self, level):
        level = (level / _MAX_LEVEL) * float(
            _HPARAMS_DEFAULT['translate_const'])
        level = _randomly_negate(level)
        return (level, )

    def _translate_rel_level_to_arg(self, level):
        # range [-0.45, 0.45]
        level = (level / _MAX_LEVEL) * 0.45
        level = _randomly_negate(level)
        return (level, )

    def pass_fn(self, input):
        # Ops like Equalize take no magnitude argument.
        return ()

    def _conversion0(self, input):
        return (int((input / _MAX_LEVEL) * 4) + 4, )

    def _conversion1(self, input):
        return (4 - int((input / _MAX_LEVEL) * 4), )

    def _conversion2(self, input):
        return (int((input / _MAX_LEVEL) * 256), )

    def _conversion3(self, input):
        return (int((input / _MAX_LEVEL) * 110), )

    def transform(self, results):
        """Apply the op to every image field with probability ``prob``."""
        if self.prob < random.random():
            return results

        for key in results.get('img_fields', ['img']):
            img = Image.fromarray(results[key])
            img = self.aug_fn(img, *self.level_args, **self.kwargs)
            results[key] = np.array(img)
        return results


@TRANSFORMS.register_module()
class AutoAugment(BaseTransform):
    """Auto Augment Implementation adapted from timm: ImageNet
    auto_augment_policy is 'original': From TPU EfficientNet impl
    https://github.com/rwightman/pytorch-image-models.

    ImageNet auto_augment_policy is 'v0':
    A PyTorch implementation of : `AutoAugment: Learning Augmentation
    Policies from Data <https://arxiv.org/abs/1805.09501>`_
    """
    auto_augment_policy = {
        'original': [
            [('Equalize', 0.8, 1), ('ShearY', 0.8, 4)],
            [('Color', 0.4, 9), ('Equalize', 0.6, 3)],
            [('Color', 0.4, 1), ('Rotate', 0.6, 8)],
            [('Solarize', 0.8, 3), ('Equalize', 0.4, 7)],
            [('Solarize', 0.4, 2), ('Solarize', 0.6, 2)],
            [('Color', 0.2, 0), ('Equalize', 0.8, 8)],
            [('Equalize', 0.4, 8), ('SolarizeAdd', 0.8, 3)],
            [('ShearX', 0.2, 9), ('Rotate', 0.6, 8)],
            [('Color', 0.6, 1), ('Equalize', 1.0, 2)],
            [('Invert', 0.4, 9), ('Rotate', 0.6, 0)],
            [('Equalize', 1.0, 9), ('ShearY', 0.6, 3)],
            [('Color', 0.4, 7), ('Equalize', 0.6, 0)],
            [('Posterize', 0.4, 6), ('AutoContrast', 0.4, 7)],
            [('Solarize', 0.6, 8), ('Color', 0.6, 9)],
            [('Solarize', 0.2, 4), ('Rotate', 0.8, 9)],
            [('Rotate', 1.0, 7), ('TranslateYRel', 0.8, 9)],
            [('ShearX', 0.0, 0), ('Solarize', 0.8, 4)],
            [('ShearY', 0.8, 0), ('Color', 0.6, 4)],
            [('Color', 1.0, 0), ('Rotate', 0.6, 2)],
            [('Equalize', 0.8, 4), ('Equalize', 0.0, 8)],
            [('Equalize', 1.0, 4), ('AutoContrast', 0.6, 2)],
            [('ShearY', 0.4, 7), ('SolarizeAdd', 0.6, 7)],
            [('Posterize', 0.8, 2), ('Solarize', 0.6, 10)],
            [('Solarize', 0.6, 8), ('Equalize', 0.6, 1)],
            [('Color', 0.8, 6), ('Rotate', 0.4, 5)],
        ],
        'v0': [
            [('Posterize', 0.4, 8), ('Rotate', 0.6, 9)],
            [('Solarize', 0.6, 5), ('AutoContrast', 0.6, 5)],
            [('Equalize', 0.8, 8), ('Equalize', 0.6, 3)],
            [('Posterize', 0.6, 7), ('Posterize', 0.6, 6)],
            [('Equalize', 0.4, 7), ('Solarize', 0.2, 4)],
            [('Equalize', 0.4, 4), ('Rotate', 0.8, 8)],
            [('Solarize', 0.6, 3), ('Equalize', 0.6, 7)],
            [('Posterize', 0.8, 5), ('Equalize', 1.0, 2)],
            [('Rotate', 0.2, 3), ('Solarize', 0.6, 8)],
            [('Equalize', 0.6, 8), ('Posterize', 0.4, 6)],
            [('Rotate', 0.8, 8), ('Color', 0.4, 0)],
            [('Rotate', 0.4, 9), ('Equalize', 0.6, 2)],
            [('Equalize', 0.0, 7), ('Equalize', 0.8, 8)],
            [('Invert', 0.6, 4), ('Equalize', 1.0, 8)],
            [('Color', 0.6, 4), ('Contrast', 1.0, 8)],
            [('Rotate', 0.8, 8), ('Color', 1.0, 2)],
            [('Color', 0.8, 8), ('Solarize', 0.8, 7)],
            [('Sharpness', 0.4, 7), ('Invert', 0.6, 8)],
            [('ShearX', 0.6, 5), ('Equalize', 1.0, 9)],
            [('Color', 0.4, 0), ('Equalize', 0.6, 3)],
            [('Equalize', 0.4, 7), ('Solarize', 0.2, 4)],
            [('Solarize', 0.6, 5), ('AutoContrast', 0.6, 5)],
            [('Invert', 0.6, 4), ('Equalize', 1.0, 8)],
            [('Color', 0.6, 4), ('Contrast', 1.0, 8)],
            [('Equalize', 0.8, 8), ('Equalize', 0.6, 3)],
        ]
    }

    def __init__(self,
                 policies: str = 'original',
                 extra_params: Optional[dict] = None):
        """
        Args:
            policies (str): Which policy table to use, ``'original'`` or
                ``'v0'``. Defaults to 'original'.
            extra_params (dict, optional): Forwarded to every
                :class:`AutoAugmentOp` (``img_mean``, ``interpolation``).
        """
        self.policies = copy.deepcopy(self.auto_augment_policy[policies])
        extra_params = extra_params if extra_params else dict(
            translate_const=250, img_mean=_FILL)
        self.sub_policy = [[AutoAugmentOp(*a, extra_params) for a in sp]
                           for sp in self.policies]

    def transform(self, results: dict) -> Optional[dict]:
        """Apply one randomly chosen sub-policy (a pair of ops) in order."""
        sub_policy = random.choice(self.sub_policy)
        for op in sub_policy:
            results = op(results)
        return results

    def __repr__(self) -> str:
        repr_str = self.__class__.__name__
        repr_str += f'(policies={self.policies})'
        return repr_str
+import copy +import math +import random + +import numpy as np +import PIL +from mmcv.transforms import Compose +from PIL import Image, ImageEnhance, ImageOps + +from mmrazor.registry import TRANSFORMS + +_PIL_VER = tuple([int(x) for x in PIL.__version__.split('.')[:2]]) + +_FILL = (128, 128, 128) + +# This signifies the max integer that the controller RNN could predict for the +# augmentation scheme. +_MAX_LEVEL = 10. + +_HPARAMS_DEFAULT = dict( + translate_const=250, + img_mean=_FILL, +) + +_interpolation_name_to_pil = { + 'bilinear': Image.BILINEAR, + 'bicubic': Image.BICUBIC, + 'nearest': Image.NEAREST, +} + + +@TRANSFORMS.register_module() +class AutoAugmentOp(object): + """Base class for ops of autoaugment.""" + + def __init__(self, prob, magnitude, extra_params: dict): + self.prob = prob + self.magnitude = magnitude + self.magnitude_std = 0.5 + + self.kwargs = { + 'fillcolor': + extra_params['img_mean'] + if 'img_mean' in extra_params else _FILL, # noqa: E501 + 'resample': + extra_params['interpolation'] if 'interpolation' in extra_params + else _interpolation_name_to_pil.values() # noqa: E501,E131 + } + self._get_magnitude() + + def __call__(self, results): + return results + + def _interpolation(self, kwargs): + interpolation = kwargs.pop('resample', Image.NEAREST) + if isinstance(interpolation, (list, tuple)): + return random.choice(interpolation) + else: + return interpolation + + def _check_args_tf(self, kwargs): + if 'fillcolor' in kwargs and _PIL_VER < (5, 0): + kwargs.pop('fillcolor') + kwargs['resample'] = self._interpolation(kwargs) + + def _get_magnitude(self): + magnitude = self.magnitude + if self.magnitude_std and self.magnitude_std > 0: + magnitude = random.gauss(magnitude, self.magnitude_std) + magnitude = min(_MAX_LEVEL, max(0, magnitude)) + self.magnitude = magnitude + + def _randomly_negate(self, v): + """With 50% prob, negate the value.""" + return -v if random.random() > 0.5 else v + + def _rotate_level_to_arg(self, level): + # range 
[-30, 30] + level = (level / _MAX_LEVEL) * 30. + level = self._randomly_negate(level) + return (level, ) + + def _enhance_level_to_arg(self, level): + # range [0.1, 1.9] + return ((level / _MAX_LEVEL) * 1.8 + 0.1, ) + + def _shear_level_to_arg(self, level): + # range [-0.3, 0.3] + level = (level / _MAX_LEVEL) * 0.3 + level = self._randomly_negate(level) + return (level, ) + + def _translate_abs_level_to_arg(self, level): + level = (level / _MAX_LEVEL) * \ + float(_HPARAMS_DEFAULT['translate_const']) + level = self._randomly_negate(level) + return (level, ) + + def _translate_rel_level_to_arg(self, level): + # range [-0.45, 0.45] + level = (level / _MAX_LEVEL) * 0.45 + level = self._randomly_negate(level) + return (level, ) + + +@TRANSFORMS.register_module() +class ShearX(AutoAugmentOp): + """ShearX images.""" + + def __init__(self, prob, magnitude, extra_params: dict): + super().__init__(prob, magnitude, extra_params) + self.level_fn = self._shear_level_to_arg + self.level_args = self.level_fn(self.magnitude) + + def __call__(self, results, **kwargs): + if self.prob < random.random(): + return results + factor = self.level_args[0] + self._check_args_tf(kwargs) + for key in results.get('img_fields', ['img']): + img = results[key] + img = Image.fromarray(img) + img = img.transform(img.size, Image.AFFINE, + (1, factor, 0, 0, 1, 0), **kwargs) + img = np.array(img) + results[key] = img + return results + + +@TRANSFORMS.register_module() +class ShearY(AutoAugmentOp): + """ShearY images.""" + + def __init__(self, prob, magnitude, extra_params: dict): + super().__init__(prob, magnitude, extra_params) + self.level_fn = self._shear_level_to_arg + self.level_args = self.level_fn(self.magnitude) + + def __call__(self, results, **kwargs): + if self.prob < random.random(): + return results + factor = self.level_args[0] + self._check_args_tf(kwargs) + for key in results.get('img_fields', ['img']): + img = results[key] + img = Image.fromarray(img) + img = img.transform(img.size, 
def _map_img_fields(results, fn):
    """Apply ``fn`` (PIL.Image -> PIL.Image) to every field listed in
    ``results['img_fields']`` (default ``['img']``), converting each ndarray
    to a PIL image and back in place."""
    for field in results.get('img_fields', ['img']):
        pil_img = Image.fromarray(results[field])
        results[field] = np.array(fn(pil_img))
    return results


@TRANSFORMS.register_module()
class TranslateXRel(AutoAugmentOp):
    """Translate images horizontally by a fraction of their width."""

    def __init__(self, prob, magnitude, extra_params: dict):
        super().__init__(prob, magnitude, extra_params)
        self.level_fn = self._translate_rel_level_to_arg
        self.level_args = self.level_fn(self.magnitude)

    def __call__(self, results, **kwargs):
        # Apply with probability ``self.prob``.
        if self.prob < random.random():
            return results
        self._check_args_tf(kwargs)
        frac = self.level_args[0]
        return _map_img_fields(
            results, lambda im: im.transform(
                im.size, Image.AFFINE,
                (1, 0, frac * im.size[0], 0, 1, 0), **kwargs))


@TRANSFORMS.register_module()
class TranslateYRel(AutoAugmentOp):
    """Translate images vertically by a fraction of their height."""

    def __init__(self, prob, magnitude, extra_params: dict):
        super().__init__(prob, magnitude, extra_params)
        self.level_fn = self._translate_rel_level_to_arg
        self.level_args = self.level_fn(self.magnitude)

    def __call__(self, results, **kwargs):
        if self.prob < random.random():
            return results
        self._check_args_tf(kwargs)
        frac = self.level_args[0]
        return _map_img_fields(
            results, lambda im: im.transform(
                im.size, Image.AFFINE,
                (1, 0, 0, 0, 1, frac * im.size[1]), **kwargs))


@TRANSFORMS.register_module()
class TranslateX(AutoAugmentOp):
    """Translate images horizontally by an absolute pixel offset."""

    def __init__(self, prob, magnitude, extra_params: dict):
        super().__init__(prob, magnitude, extra_params)
        self.level_fn = self._translate_abs_level_to_arg
        self.level_args = self.level_fn(self.magnitude)

    def __call__(self, results, **kwargs):
        if self.prob < random.random():
            return results
        self._check_args_tf(kwargs)
        pixels = self.level_args[0]
        return _map_img_fields(
            results, lambda im: im.transform(
                im.size, Image.AFFINE, (1, 0, pixels, 0, 1, 0), **kwargs))


@TRANSFORMS.register_module()
class TranslateY(AutoAugmentOp):
    """Translate images vertically by an absolute pixel offset."""

    def __init__(self, prob, magnitude, extra_params: dict):
        super().__init__(prob, magnitude, extra_params)
        self.level_fn = self._translate_abs_level_to_arg
        self.level_args = self.level_fn(self.magnitude)

    def __call__(self, results, **kwargs):
        if self.prob < random.random():
            return results
        self._check_args_tf(kwargs)
        pixels = self.level_args[0]
        return _map_img_fields(
            results, lambda im: im.transform(
                im.size, Image.AFFINE, (1, 0, 0, 0, 1, pixels), **kwargs))


@TRANSFORMS.register_module()
class RotateV2(AutoAugmentOp):
    """Rotate images by a magnitude-dependent angle."""

    def __init__(self, prob, magnitude, extra_params: dict):
        super().__init__(prob, magnitude, extra_params)
        self.level_fn = self._rotate_level_to_arg
        self.level_args = self.level_fn(self.magnitude)

    def transform(self, x, y, matrix):
        """Apply the 2x3 affine ``matrix`` to the point ``(x, y)``."""
        (a, b, c, d, e, f) = matrix
        return a * x + b * y + c, d * x + e * y + f

    def _rotate(self, img, degrees, kwargs):
        """Rotate one PIL image, picking the best API for the PIL version."""
        if _PIL_VER >= (5, 2):
            return img.rotate(degrees, **kwargs)
        if _PIL_VER >= (5, 0):
            # Older PIL: emulate center rotation with an explicit affine
            # matrix (translate to origin, rotate, translate back).
            w, h = img.size
            post_trans = (0, 0)
            rotn_center = (w / 2.0, h / 2.0)
            angle = -math.radians(degrees)
            matrix = [
                round(math.cos(angle), 15),
                round(math.sin(angle), 15),
                0.0,
                round(-math.sin(angle), 15),
                round(math.cos(angle), 15),
                0.0,
            ]
            matrix[2], matrix[5] = self.transform(
                -rotn_center[0] - post_trans[0],
                -rotn_center[1] - post_trans[1], matrix)
            matrix[2] += rotn_center[0]
            matrix[5] += rotn_center[1]
            return img.transform(img.size, Image.AFFINE, matrix, **kwargs)
        return img.rotate(degrees, resample=kwargs['resample'])

    def __call__(self, results, **kwargs):
        if self.prob < random.random():
            return results
        degrees = self.level_args[0]
        self._check_args_tf(kwargs)
        return _map_img_fields(
            results, lambda im: self._rotate(im, degrees, kwargs))


@TRANSFORMS.register_module()
class AutoContrastV2(AutoAugmentOp):
    """Maximize image contrast via ``ImageOps.autocontrast``."""

    def __call__(self, results, **__):
        if self.prob < random.random():
            return results
        return _map_img_fields(results, ImageOps.autocontrast)


@TRANSFORMS.register_module()
class InvertV2(AutoAugmentOp):
    """Invert image pixel values."""

    def __call__(self, results, **__):
        if self.prob < random.random():
            return results
        return _map_img_fields(results, ImageOps.invert)


@TRANSFORMS.register_module()
class EqualizeV2(AutoAugmentOp):
    """Equalize the image histogram."""

    def __call__(self, results, **__):
        if self.prob < random.random():
            return results
        return _map_img_fields(results, ImageOps.equalize)


@TRANSFORMS.register_module()
class SolarizeV2(AutoAugmentOp):
    """Invert all pixels above a magnitude-derived threshold."""

    def __init__(self, prob, magnitude, extra_params: dict):
        super().__init__(prob, magnitude, extra_params)
        self.level_args = self.level_fn(self.magnitude)

    def level_fn(self, level):
        # Map level in [0, _MAX_LEVEL] onto a threshold in [0, 256].
        return (int((level / _MAX_LEVEL) * 256), )

    def __call__(self, results, **__):
        if self.prob < random.random():
            return results
        thresh = self.level_args[0]
        return _map_img_fields(
            results, lambda im: ImageOps.solarize(im, thresh))


@TRANSFORMS.register_module()
class SolarizeAddV2(AutoAugmentOp):
    """Add a magnitude-derived constant to pixels below a fixed threshold."""

    def __init__(self, prob, magnitude, extra_params: dict):
        super().__init__(prob, magnitude, extra_params)
        self.level_args = self.level_fn(self.magnitude)

    def level_fn(self, level):
        return (int((level / _MAX_LEVEL) * 110), )

    def __call__(self, results, **__):
        if self.prob < random.random():
            return results
        thresh = 128
        add = self.level_args[0]
        # Same LUT for every image field; values above ``thresh`` unchanged.
        lut = [min(255, i + add) if i < thresh else i for i in range(256)]

        def solarize_add(im):
            # Only 'L' and 'RGB' modes are supported; other modes are
            # returned unchanged (matching the reference implementation).
            if im.mode in ('L', 'RGB'):
                channel_lut = lut * 3 if im.mode == 'RGB' else lut
                return im.point(channel_lut)
            return im

        return _map_img_fields(results, solarize_add)


@TRANSFORMS.register_module()
class PosterizeV2(AutoAugmentOp):
    """Reduce the number of bits kept per color channel."""

    def __init__(self, prob, magnitude, extra_params: dict):
        super().__init__(prob, magnitude, extra_params)
        self.level_args = self.level_fn(self.magnitude)

    def level_fn(self, level):
        # Keep 4..8 bits; higher magnitude keeps more bits here.
        return (int((level / _MAX_LEVEL) * 4) + 4, )

    def __call__(self, results, **__):
        if self.prob < random.random():
            return results
        bits_to_keep = self.level_args[0]
        if bits_to_keep >= 8:
            # Keeping all 8 bits is a no-op.
            return results
        bits_to_keep = max(1, bits_to_keep)  # prevent all 0 images
        return _map_img_fields(
            results, lambda im: ImageOps.posterize(im, bits_to_keep))


@TRANSFORMS.register_module()
class ContrastV2(AutoAugmentOp):
    """Adjust image contrast by an enhancement factor."""

    def __init__(self, prob, magnitude, extra_params: dict):
        super().__init__(prob, magnitude, extra_params)
        self.level_fn = self._enhance_level_to_arg
        self.level_args = self.level_fn(self.magnitude)

    def __call__(self, results, **__):
        if self.prob < random.random():
            return results
        factor = self.level_args[0]
        return _map_img_fields(
            results, lambda im: ImageEnhance.Contrast(im).enhance(factor))


@TRANSFORMS.register_module()
class Color(AutoAugmentOp):
    """Adjust image color balance by an enhancement factor."""

    def __init__(self, prob, magnitude, extra_params: dict):
        super().__init__(prob, magnitude, extra_params)
        self.level_fn = self._enhance_level_to_arg
        self.level_args = self.level_fn(self.magnitude)

    def __call__(self, results, **__):
        if self.prob < random.random():
            return results
        factor = self.level_args[0]
        return _map_img_fields(
            results, lambda im: ImageEnhance.Color(im).enhance(factor))


@TRANSFORMS.register_module()
class BrightnessV2(AutoAugmentOp):
    """Adjust image brightness by an enhancement factor."""

    def __init__(self, prob, magnitude, extra_params: dict):
        super().__init__(prob, magnitude, extra_params)
        self.level_fn = self._enhance_level_to_arg
        self.level_args = self.level_fn(self.magnitude)

    def __call__(self, results, **__):
        if self.prob < random.random():
            return results
        factor = self.level_args[0]
        return _map_img_fields(
            results, lambda im: ImageEnhance.Brightness(im).enhance(factor))


@TRANSFORMS.register_module()
class SharpnessV2(AutoAugmentOp):
    """Adjust image sharpness by an enhancement factor."""

    def __init__(self, prob, magnitude, extra_params: dict):
        super().__init__(prob, magnitude, extra_params)
        self.level_fn = self._enhance_level_to_arg
        self.level_args = self.level_fn(self.magnitude)

    def __call__(self, results, **__):
        if self.prob < random.random():
            return results
        factor = self.level_args[0]
        return _map_img_fields(
            results, lambda im: ImageEnhance.Sharpness(im).enhance(factor))


@TRANSFORMS.register_module()
class AutoAugmentV2(object):
    """Auto Augment Implementation adapted from timm:

    https://github.com/rwightman/pytorch-image-models
    """

    def __init__(self, policies):
        self.policies = copy.deepcopy(policies)
        self.sub_policy = [Compose(policy) for policy in self.policies]

    def __call__(self, results):
        # Pick one sub-policy uniformly at random and apply it.
        return random.choice(self.sub_policy)(results)

    def __repr__(self) -> str:
        return f'{self.__class__.__name__}(policies={self.policies})'
+ + Returns: + dict: + - 'inputs' (obj:`torch.Tensor`): The forward data of models. + - 'data_sample' (obj:`ClsDataSample`): The annotation info of the + sample. + """ + packed_results = dict() + if 'img' in results: + img = results['img'] + if len(img.shape) < 3: + img = np.expand_dims(img, -1) + img = np.ascontiguousarray(img.transpose(2, 0, 1)) + packed_results['inputs'] = to_tensor(img) + else: + warnings.warn( + 'Cannot get "img" in the input dict of `PackClsInputs`,' + 'please make sure `LoadImageFromFile` has been added ' + 'in the data pipeline or images have been loaded in ' + 'the dataset.') + + data_sample = ClsDataSample() + if 'gt_label' in results: + gt_label = results['gt_label'] + data_sample.set_gt_label(gt_label) + + if 'sample_idx' in results: + # transfer `sample_idx` to Tensor + self.meta_keys: Generator[Any, None, None] = ( + key for key in self.meta_keys if key != 'sample_idx') + value = results['sample_idx'] + if isinstance(value, int): + value = torch.tensor(value).to(torch.long) + data_sample.set_data(dict(sample_idx=value)) + + if 'contrast_sample_idxs' in results: + value = results['contrast_sample_idxs'] + if isinstance(value, np.ndarray): + value = torch.from_numpy(value).to(torch.long) + data_sample.set_data(dict(contrast_sample_idxs=value)) + + img_meta = {k: results[k] for k in self.meta_keys if k in results} + data_sample.set_metainfo(img_meta) + packed_results['data_samples'] = data_sample + + return packed_results diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..8b0d4a692574758d47ba58ac2d7d663736b98a6f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/__init__.py @@ -0,0 +1,19 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from .hooks import (DMCPSubnetHook, DumpSubnetHook, EstimateResourcesHook, + StopDistillHook) +from .optimizers import SeparateOptimWrapperConstructor +from .runner import (AutoSlimGreedySearchLoop, DartsEpochBasedTrainLoop, + DartsIterBasedTrainLoop, EvolutionSearchLoop, + GreedySamplerTrainLoop, LSQEpochBasedLoop, PTQLoop, + QATEpochBasedLoop, QATValLoop, SelfDistillValLoop, + SingleTeacherDistillValLoop, SlimmableValLoop, + SubnetValLoop) + +__all__ = [ + 'DMCPSubnetHook', 'StopDistillHook', 'SeparateOptimWrapperConstructor', + 'DumpSubnetHook', 'SingleTeacherDistillValLoop', + 'DartsEpochBasedTrainLoop', 'DartsIterBasedTrainLoop', 'SlimmableValLoop', + 'EvolutionSearchLoop', 'GreedySamplerTrainLoop', 'EstimateResourcesHook', + 'SelfDistillValLoop', 'AutoSlimGreedySearchLoop', 'SubnetValLoop', + 'PTQLoop', 'QATEpochBasedLoop', 'LSQEpochBasedLoop', 'QATValLoop' +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/hooks/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/hooks/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ce8e8348a84241374bde40e552ec4ae2903f7d6f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/hooks/__init__.py @@ -0,0 +1,11 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from .dmcp_subnet_hook import DMCPSubnetHook +from .dump_subnet_hook import DumpSubnetHook +from .estimate_resources_hook import EstimateResourcesHook +from .stop_distillation_hook import StopDistillHook +from .visualization_hook import RazorVisualizationHook + +__all__ = [ + 'DumpSubnetHook', 'EstimateResourcesHook', 'RazorVisualizationHook', + 'DMCPSubnetHook', 'StopDistillHook' +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/hooks/dmcp_subnet_hook.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/hooks/dmcp_subnet_hook.py new file mode 100644 index 0000000000000000000000000000000000000000..5c3186fcab7289bf9c03b312ec67f35ef37eba75 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/hooks/dmcp_subnet_hook.py @@ -0,0 +1,74 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import json +import os + +from mmengine.hooks import Hook +from mmengine.registry import HOOKS + +from mmrazor.structures import export_fix_subnet + + +@HOOKS.register_module() +class DMCPSubnetHook(Hook): + """Dump subnet periodically. + + Args: + subnet_sample_num (int):The number of networks sampled, + the last of which is the sub-network sampled in ``expected`` + mode and the others are sampled in ``direct`` mode. + Defaults to 10. + """ + + priority = 'VERY_LOW' + + def __init__(self, subnet_sample_num: int = 10, **kwargs) -> None: + self.subnet_sample_num = subnet_sample_num + + def _save_subnet(self, model, runner, save_path): + """Save the sampled sub-network config.""" + fix_subnet, _ = export_fix_subnet( + model, + export_subnet_mode='mutator', + slice_weight=True, + ) + fix_subnet = json.dumps(fix_subnet, indent=4, separators=(',', ':')) + with open(save_path, 'w') as file: + file.write(fix_subnet) + + runner.logger.info('export finished and ' + f'{save_path} saved in {runner.work_dir}.') + + def after_run(self, runner): + """Save the sampled subnet under target FLOPs. + + Args: + runner (Runner): The runner of the training process. 
+ """ + model = getattr(runner.model, 'module', runner.model) + runner.logger.info('Sampling...') + + num_sample = self.subnet_sample_num + root_dir = os.path.join(runner.work_dir, 'model_sample') + target_flops = model.target_flops * 1e6 + + if not os.path.exists(root_dir): + os.makedirs(root_dir) + + for i in range(num_sample + 1): + cur_flops = target_flops * 10 + while cur_flops > target_flops * 1.05 or \ + cur_flops < target_flops * 0.95: + model.set_subnet(mode='direct', arch_train=False) + cur_flops = model.calc_current_flops() + + if i == num_sample: + model.set_subnet(mode='expected', arch_train=False) + save_path = os.path.join(root_dir, 'excepted_ch.json') + runner.logger.info( + f'Excepted sample(ES) arch with FlOP(MB):{cur_flops}') + else: + save_path = os.path.join(root_dir, + 'subnet_{}.json'.format(i + 1)) + runner.logger.info( + f'Driect sample(DS) arch with FlOP(MB): {cur_flops/1e6}') + self._save_subnet(model, runner, save_path) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/hooks/dump_subnet_hook.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/hooks/dump_subnet_hook.py new file mode 100644 index 0000000000000000000000000000000000000000..1aaeaba0f99440a634cda2f87ddb399ce01675f6 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/hooks/dump_subnet_hook.py @@ -0,0 +1,169 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy +import os.path as osp +from pathlib import Path +from typing import Optional, Sequence, Union + +from mmengine.dist import master_only +from mmengine.fileio import FileClient, dump +from mmengine.hooks import Hook +from mmengine.registry import HOOKS + +from mmrazor.models.mutables.base_mutable import BaseMutable +from mmrazor.structures import convert_fix_subnet, export_fix_subnet + +DATA_BATCH = Optional[Sequence[dict]] + + +@HOOKS.register_module() +class DumpSubnetHook(Hook): + """Dump subnet periodically. + + Args: + interval (int): The saving period. 
@HOOKS.register_module()
class DumpSubnetHook(Hook):
    """Dump subnet periodically.

    Args:
        interval (int): The saving period. If ``by_epoch=True``, interval
            indicates epochs, otherwise it indicates iterations.
            Defaults to -1, which means "never".
        by_epoch (bool): Saving checkpoints by epoch or by iteration.
            Default: True.
        out_dir (str, optional | Path): The root directory to save
            checkpoints. If not specified, ``runner.work_dir`` will be used
            by default. If specified, the ``out_dir`` will be the
            concatenation of ``out_dir`` and the last level directory of
            ``runner.work_dir``. For example, if the input ``out_dir`` is
            ``./tmp`` and ``runner.work_dir`` is ``./work_dir/cur_exp``, then
            the subnet will be saved in ``./tmp/cur_exp``. Defaults to None.
        max_keep_subnets (int): The maximum subnets to keep.
            In some cases we want only the latest few subnets and would
            like to delete old ones to save the disk space.
            Defaults to -1, which means unlimited.
            NOTE(review): stored but never acted upon in this hook -
            confirm whether cleanup of old subnets was intended.
        save_last (bool): Whether to force the last checkpoint to be
            saved regardless of interval. Defaults to True.
        file_client_args (dict, optional): Arguments to instantiate a
            FileClient. See :class:`mmcv.fileio.FileClient` for details.
            Defaults to None.
    """
    out_dir: str

    priority = 'VERY_LOW'

    def __init__(self,
                 interval: int = -1,
                 by_epoch: bool = True,
                 out_dir: Optional[Union[str, Path]] = None,
                 max_keep_subnets: int = -1,
                 save_last: bool = True,
                 file_client_args: Optional[dict] = None,
                 **kwargs) -> None:
        self.interval = interval
        self.by_epoch = by_epoch
        self.out_dir = out_dir  # type: ignore
        self.max_keep_subnets = max_keep_subnets
        self.save_last = save_last
        self.args = kwargs
        self.file_client_args = file_client_args

    def before_train(self, runner) -> None:
        """Resolve the output directory and file client before training.

        Args:
            runner (Runner): The runner of the training process.
        """
        if self.out_dir is None:
            self.out_dir = runner.work_dir

        self.file_client = FileClient.infer_client(self.file_client_args,
                                                   self.out_dir)
        # if `self.out_dir` is not equal to `runner.work_dir`, it means that
        # `self.out_dir` is set so the final `self.out_dir` is the
        # concatenation of `self.out_dir` and the last level directory of
        # `runner.work_dir`
        if self.out_dir != runner.work_dir:
            basename = osp.basename(runner.work_dir.rstrip(osp.sep))
            self.out_dir = self.file_client.join_path(
                self.out_dir, basename)  # type: ignore # noqa: E501

        runner.logger.info(f'Subnets will be saved to {self.out_dir} by '
                           f'{self.file_client.name}.')

    def after_train_epoch(self, runner) -> None:
        """Save the subnet after each epoch when ``by_epoch`` is True.

        Args:
            runner (Runner): The runner of the training process.
        """
        if not self.by_epoch:
            return

        # save subnet for following cases:
        # 1. every ``self.interval`` epochs
        # 2. reach the last epoch of training
        if self.every_n_epochs(runner, self.interval) or (
                self.save_last and self.is_last_train_epoch(runner)):
            runner.logger.info(f'Saving subnet at {runner.epoch + 1} epochs')
            self._save_subnet(runner)

    @master_only
    def _save_subnet(self, runner) -> None:
        """Export the currently sampled subnet to ``self.out_dir``.

        Args:
            runner (Runner): The runner of the training process.
        """
        model = runner.model.module if runner.distributed else runner.model

        # delete non-leaf tensor to get deepcopy(model).
        # TODO solve the hard case.
        for module in model.architecture.modules():
            if isinstance(module, BaseMutable):
                if hasattr(module, 'arch_weights'):
                    delattr(module, 'arch_weights')

        copied_model = copy.deepcopy(model)
        copied_model.mutator.set_choices(copied_model.mutator.sample_choices())

        subnet_dict = export_fix_subnet(copied_model)[0]
        subnet_dict = convert_fix_subnet(subnet_dict)

        if self.by_epoch:
            subnet_filename = self.args.get(
                'filename_tmpl',
                'subnet_epoch_{}.yaml').format(runner.epoch + 1)
        else:
            subnet_filename = self.args.get(
                'filename_tmpl', 'subnet_iter_{}.yaml').format(runner.iter + 1)

        file_client = FileClient.infer_client(self.file_client_args,
                                              self.out_dir)
        filepath = file_client.join_path(self.out_dir, subnet_filename)

        dump(subnet_dict, filepath, file_format='yaml')

    def after_train_iter(self,
                         runner,
                         batch_idx: int,
                         data_batch: DATA_BATCH = None,
                         outputs: Optional[dict] = None) -> None:
        """Save the subnet after each iteration when ``by_epoch`` is False.

        Fix: the original signature read ``outputs=Optional[dict]``, which
        made the ``typing`` object itself the *default value*; it is now a
        proper annotation with a ``None`` default.

        Args:
            runner (Runner): The runner of the training process.
            batch_idx (int): The index of the current batch in the train loop.
            data_batch (Sequence[dict], optional): Data from dataloader.
                Defaults to None.
            outputs (dict, optional): Outputs from model. Defaults to None.
        """
        if self.by_epoch:
            return

        # save checkpoint for following cases:
        # 1. every ``self.interval`` iterations
        # 2. reach the last iteration of training
        if self.every_n_train_iters(runner, self.interval) or \
                (self.save_last and self.is_last_train_iter(runner)):
            runner.logger.info(
                f'Saving subnet at {runner.iter + 1} iterations')
            self._save_subnet(runner)
@HOOKS.register_module()
class EstimateResourcesHook(Hook):
    """Estimate model resources periodically.

    Args:
        interval (int): The saving period. If ``by_epoch=True``, interval
            indicates epochs, otherwise it indicates iterations.
            Defaults to -1, which means "never".
        by_epoch (bool): Saving checkpoints by epoch or by iteration.
            Default to True.
        estimator_cfg (Dict[str, Any], optional): Used for building a
            resource estimator. Default to None.

    Example:
    >>> add the `EstimatorResourcesHook` in custom_hooks as follows:
        custom_hooks = [
            dict(type='mmrazor.EstimateResourcesHook',
                 interval=1,
                 by_epoch=True,
                 estimator_cfg=dict(input_shape=(1, 3, 64, 64)))
        ]
    """
    out_dir: str

    priority = 'VERY_LOW'

    def __init__(self,
                 interval: int = -1,
                 by_epoch: bool = True,
                 estimator_cfg: Optional[Dict[str, Any]] = None,
                 **kwargs) -> None:
        # Fix: ``estimator_cfg`` defaults to None, so the annotation must be
        # Optional rather than a bare Dict.
        self.interval = interval
        self.by_epoch = by_epoch
        estimator_cfg = dict() if estimator_cfg is None else estimator_cfg
        if 'type' not in estimator_cfg:
            estimator_cfg['type'] = 'mmrazor.ResourceEstimator'
        self.estimator = TASK_UTILS.build(estimator_cfg)

    def after_val_epoch(self,
                        runner,
                        metrics: Optional[Dict[str, float]] = None) -> None:
        """Estimate model resources after every n val epochs.

        Args:
            runner (Runner): The runner of the training process.
        """
        if not self.by_epoch:
            return

        if self.every_n_epochs(runner, self.interval):
            self.estimate_resources(runner)

    def after_val_iter(self,
                       runner,
                       batch_idx: int,
                       data_batch: DATA_BATCH = None,
                       outputs: Optional[Sequence[BaseDataElement]] = None) \
            -> None:
        """Estimate model resources after every n val iters.

        Args:
            runner (Runner): The runner of the training process.
        """
        if self.by_epoch:
            return

        # NOTE(review): this checks the *train* iteration counter inside a
        # val hook - confirm this is intended rather than a val-iter check.
        if self.every_n_train_iters(runner, self.interval):
            self.estimate_resources(runner)

    def estimate_resources(self, runner) -> None:
        """Estimate model resources: latency/flops/params."""
        model = runner.model.module if runner.distributed else runner.model

        # TODO confirm the state judgement.
        if hasattr(model, 'is_supernet') and model.is_supernet:
            model = self.export_subnet(model)

        resource_metrics = self.estimator.estimate(model)
        runner.logger.info(f'Estimate model resources: {resource_metrics}')

    def export_subnet(self, model) -> torch.nn.Module:
        """Export current best subnet.

        NOTE: This method is called when it comes to those NAS algorithms that
        require building a supernet for training.

        For those algorithms, measuring subnet resources is more meaningful
        than supernet during validation, therefore this method is required to
        get the current searched subnet from the supernet.
        """
        # Avoid circular import
        from mmrazor.models.mutables.base_mutable import BaseMutable
        from mmrazor.structures import export_fix_subnet, load_fix_subnet

        # delete non-leaf tensor to get deepcopy(model).
        # TODO solve the hard case.
        for module in model.architecture.modules():
            if isinstance(module, BaseMutable):
                if hasattr(module, 'arch_weights'):
                    delattr(module, 'arch_weights')

        copied_model = copy.deepcopy(model)
        copied_model.mutator.set_choices(copied_model.mutator.sample_choices())

        subnet_dict = export_fix_subnet(copied_model)[0]
        load_fix_subnet(copied_model, subnet_dict)

        return copied_model
+""" +from mmrazor.implementations.pruning.group_fisher import \ + PruningStructureHook # noqa +from mmrazor.implementations.pruning.group_fisher import \ + ResourceInfoHook # noqa diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/hooks/stop_distillation_hook.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/hooks/stop_distillation_hook.py new file mode 100644 index 0000000000000000000000000000000000000000..3b907dc61712ad0672e65574c824fffaf049084a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/hooks/stop_distillation_hook.py @@ -0,0 +1,31 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from mmengine.hooks import Hook +from mmengine.model import is_model_wrapper + +from mmrazor.registry import HOOKS + + +@HOOKS.register_module() +class StopDistillHook(Hook): + """Stop distilling at a certain time. + + Args: + stop_epoch (int): Stop distillation at this epoch. + """ + + priority = 'LOW' + + def __init__(self, stop_epoch: int) -> None: + self.stop_epoch = stop_epoch + + def before_train_epoch(self, runner) -> None: + """Stop distillation.""" + if runner.epoch >= self.stop_epoch: + model = runner.model + # TODO: refactor after mmengine using model wrapper + if is_model_wrapper(model): + model = model.module + assert hasattr(model, 'distillation_stopped') + + runner.logger.info('Distillation has been stopped!') + model.distillation_stopped = True diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/hooks/visualization_hook.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/hooks/visualization_hook.py new file mode 100644 index 0000000000000000000000000000000000000000..a52145b101d5725defe04ac71039956f122c22af --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/hooks/visualization_hook.py @@ -0,0 +1,205 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import os.path as osp +import warnings +from typing import List, Optional, Union + +import mmcv +import torch +from mmcv.transforms import Compose +from mmengine.dist import master_only +from mmengine.fileio import FileClient +from mmengine.hooks import Hook +from mmengine.model import is_model_wrapper +from mmengine.utils import mkdir_or_exist +from mmengine.visualization import Visualizer + +from mmrazor.models.task_modules import RecorderManager +from mmrazor.registry import HOOKS +from mmrazor.visualization.local_visualizer import modify + + +def norm(feat): + assert len(feat.shape) == 4 + N, C, H, W = feat.shape + feat = feat.permute(1, 0, 2, 3).reshape(C, -1) + mean = feat.mean(dim=-1, keepdim=True) + std = feat.std(dim=-1, keepdim=True) + centered = (feat - mean) / (std + 1e-6) + centered = centered.reshape(C, N, H, W).permute(1, 0, 2, 3) + return centered + + +@HOOKS.register_module() +class RazorVisualizationHook(Hook): + """Razor Visualization Hook. Used to visualize training process immediate + feature maps. + + 1. If ``show`` is True, it means that only the immediate feature maps are + visualized without storing data, so ``vis_backends`` needs to + be excluded. + 2. If ``out_dir`` is specified, it means that the immediate feature maps + need to be saved to ``out_dir``. In order to avoid vis_backends + also storing data, so ``vis_backends`` needs to be excluded. + 3. ``vis_backends`` takes effect if the user does not specify ``show`` + and `out_dir``. You can set ``vis_backends`` to WandbVisBackend or + TensorboardVisBackend to store the immediate feature maps in Wandb or + Tensorboard. + + Args: + recorders (dict): All recorders' config. + mappings: (Dict[str, Dict]): The mapping between feature names and + records. + enabled (bool): Whether to draw immediate feature maps. If it is False, + it means that no drawing will be done. Defaults to False. + interval (int): The interval of visualization. Defaults to 1. 
@HOOKS.register_module()
class RazorVisualizationHook(Hook):
    """Razor Visualization Hook. Used to visualize training process immediate
    feature maps.

    1. If ``show`` is True, it means that only the immediate feature maps are
    visualized without storing data, so ``vis_backends`` needs to
    be excluded.
    2. If ``out_dir`` is specified, it means that the immediate feature maps
    need to be saved to ``out_dir``. In order to avoid vis_backends
    also storing data, so ``vis_backends`` needs to be excluded.
    3. ``vis_backends`` takes effect if the user does not specify ``show``
    and ``out_dir``. You can set ``vis_backends`` to WandbVisBackend or
    TensorboardVisBackend to store the immediate feature maps in Wandb or
    Tensorboard.

    Args:
        recorders (dict): All recorders' config.
        mappings: (Dict[str, Dict]): The mapping between feature names and
            records.
        enabled (bool): Whether to draw immediate feature maps. If it is
            False, it means that no drawing will be done. Defaults to False.
        data_idx (int | list): Index (or indices) of the val-dataset samples
            to visualize. Defaults to 0.
        interval (int): The interval of visualization. Defaults to 1.
        show (bool): Whether to display the drawn image. Default to False.
        wait_time (float): The interval of show (s). Defaults to 0.1.
        out_dir (str, optional): directory where painted images
            will be saved in testing process.
        file_client_args (dict): Arguments to instantiate a FileClient.
            See :class:`mmengine.fileio.FileClient` for details.
            Defaults to ``dict(backend='disk')``.
        is_overlaid (bool): If `is_overlaid` is True, the final output image
            will be the weighted sum of img and featmap. Defaults to True.
        visualization_cfg (dict): Configs for visualization.
        use_norm (bool): Whether to apply Batch Normalization over the
            feature map. Defaults to False.
    """

    def __init__(self,
                 recorders: dict,
                 mappings: dict,
                 enabled: bool = False,
                 data_idx: Union[int, List] = 0,
                 interval: int = 1,
                 show: bool = False,
                 wait_time: float = 0.1,
                 out_dir: Optional[str] = None,
                 file_client_args: dict = dict(backend='disk'),
                 is_overlaid: bool = True,
                 visualization_cfg=dict(
                     channel_reduction='pixel_wise_max',
                     topk=20,
                     arrangement=(4, 5),
                     resize_shape=None,
                     alpha=0.5),
                 use_norm: bool = False):
        self.enabled = enabled
        self._visualizer: Visualizer = Visualizer.get_current_instance()
        self._visualizer.draw_featmap = modify
        if isinstance(data_idx, int):
            data_idx = [data_idx]
        self.data_idx = data_idx
        self.show = show
        if self.show:
            # No need to think about vis backends.
            self._visualizer._vis_backends = {}
            warnings.warn('The show is True, it means that only '
                          'the prediction results are visualized '
                          'without storing data, so vis_backends '
                          'needs to be excluded.')

        self.wait_time = wait_time
        self.file_client_args = file_client_args.copy()
        self.file_client = None
        self.out_dir = out_dir
        self.interval = interval

        self.is_overlaid = is_overlaid
        self.visualization_cfg = visualization_cfg
        self.use_norm = use_norm

        self.recorder_manager = RecorderManager(recorders)
        self.mappings = mappings

        self._step = 0  # Global step value to record

    @master_only
    def before_run(self, runner) -> None:
        """Attach the recorders to the (possibly wrapped) model."""
        model = runner.model
        if is_model_wrapper(model):
            self.recorder_manager.initialize(model.module)
        else:
            self.recorder_manager.initialize(model)

    @master_only
    def before_train(self, runner):
        """Visualize once before training starts (when enabled)."""
        if not self.enabled or runner.epoch % self.interval != 0:
            return
        self._visualize(runner, 'before_run')

    @master_only
    def after_train_epoch(self, runner) -> None:
        """Visualize every ``interval`` epochs (when enabled)."""
        if not self.enabled or runner.epoch % self.interval != 0:
            return
        self._visualize(runner, f'epoch_{runner.epoch}')

    def _visualize(self, runner, stage):
        """Run inference on the configured samples and draw feature maps."""
        # Fix: compute the output directory into a *local* variable. The
        # original rebound ``self.out_dir`` on every call, so the second
        # visualization joined work_dir/timestamp onto an already-joined
        # path, compounding the directory each epoch.
        out_dir = None
        if self.out_dir is not None:
            out_dir = osp.join(runner.work_dir, runner.timestamp,
                               self.out_dir)
            mkdir_or_exist(out_dir)

        if self.file_client is None:
            self.file_client = FileClient(**self.file_client_args)

        cfg = runner.cfg.copy()
        test_pipeline = cfg.test_dataloader.dataset.pipeline
        # Drop annotation-loading steps: only the image is needed.
        new_test_pipeline = [
            pipeline for pipeline in test_pipeline
            if pipeline['type'] not in ('LoadAnnotations',
                                        'LoadPanopticAnnotations')
        ]

        test_pipeline = Compose(new_test_pipeline)
        dataset = runner.val_loop.dataloader.dataset

        for idx in self.data_idx:
            data_info = dataset.get_data_info(idx)
            img_path = data_info['img_path']
            data_ = dict(img_path=img_path, img_id=0)
            data_ = test_pipeline(data_)

            data_['inputs'] = [data_['inputs']]
            data_['data_samples'] = [data_['data_samples']]

            # Record intermediate features during a single inference pass.
            with torch.no_grad(), self.recorder_manager:
                runner.model.test_step(data_)

            if self.is_overlaid:
                img_bytes = self.file_client.get(img_path)
                overlaid_image = mmcv.imfrombytes(
                    img_bytes, channel_order='rgb')
            else:
                overlaid_image = None

            for name, record in self.mappings.items():
                recorder = self.recorder_manager.get_recorder(record.recorder)
                record_idx = getattr(record, 'record_idx', 0)
                data_idx = getattr(record, 'data_idx', None)
                feats = recorder.get_record_data(record_idx, data_idx)
                if isinstance(feats, torch.Tensor):
                    feats = (feats, )

                for i, feat in enumerate(feats):
                    if self.use_norm:
                        feat = norm(feat)
                    drawn_img = self._visualizer.draw_featmap(
                        feat[0], overlaid_image, **self.visualization_cfg)

                    out_file = None
                    if out_dir is not None:
                        out_file = osp.join(
                            out_dir, f'{stage}_data_idx_{idx}_{name}_{i}.jpg')

                    self._visualizer.add_datasample(
                        f'{stage}_data_idx_{idx}_{name}_{i}',
                        drawn_img,
                        draw_gt=False,
                        draw_pred=False,
                        show=self.show,
                        # Fix: honor the configured ``wait_time`` instead of
                        # the hard-coded 0.1s the original passed.
                        wait_time=self.wait_time,
                        # TODO: Supported in mmengine's Viusalizer.
                        out_file=out_file,
                        step=self._step)
                    self._step += 1
+from .optimizer_constructor import SeparateOptimWrapperConstructor + +__all__ = ['SeparateOptimWrapperConstructor'] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/optimizers/optimizer_constructor.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/optimizers/optimizer_constructor.py new file mode 100644 index 0000000000000000000000000000000000000000..09889dcbf82f72d6f4649e8dc30f725f7b553773 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/optimizers/optimizer_constructor.py @@ -0,0 +1,71 @@ +# Copyright (c) OpenMMLab. All rights reserved. + +from typing import Optional + +import torch.nn as nn +from mmengine.optim import DefaultOptimWrapperConstructor, OptimWrapperDict + +from mmrazor.registry import OPTIM_WRAPPER_CONSTRUCTORS + + +@OPTIM_WRAPPER_CONSTRUCTORS.register_module() +class SeparateOptimWrapperConstructor: + """OptimizerConstructor for Darts. This class construct optimizer for + the submodules (generator and discriminator for GAN most models) of the + model separately, and return a :class:~`mmengine.optim.OptimWrapperDict`. + Example: + >>> # build GAN model + >>> model = dict( + >>> type='SAGAN', + >>> num_classes=10, + >>> generator=dict(type='SAGANGenerator'), + >>> discriminator=dict(type='ProjDiscriminator')) + >>> gan_model = MODELS.build(model) + >>> # build constructor + >>> optim_wrapper = dict( + >>> constructor='GANOptimWrapperConstructor', + >>> generator=dict( + >>> type='OptimWrapper', + >>> accumulative_counts=1, + >>> optimizer=dict(type='Adam', lr=0.0002, + >>> betas=(0.5, 0.999))), + >>> discriminator=dict( + >>> optimizer=dict( + >>> type='OptimWrapper', + >>> accumulative_counts=1, + >>> optimizer=dict(type='Adam', lr=0.0002, + >>> betas=(0.5, 0.999)), + >>> ))) + >>> optim_wrapper_dict_builder = GenOptimConstructor(optim_wrapper) + >>> # build optim wrapper dict + >>> optim_wrapper_dict = optim_wrapper_dict_builder(gan_model) + Args: + optim_wrapper_cfg (dict): Config of the optimizer wrapper. 
+ paramwise_cfg (Optional[dict]): Parameter-wise options. + """ + + def __init__(self, + optim_wrapper_cfg: dict, + paramwise_cfg: Optional[dict] = None): + if not isinstance(optim_wrapper_cfg, dict): + raise TypeError('optimizer_cfg should be a dict', + f'but got {type(optim_wrapper_cfg)}') + assert paramwise_cfg is None, ( + 'parawise_cfg should be set in each optimizer separately') + self.optim_cfg = optim_wrapper_cfg + self.constructors = {} + for key, cfg in self.optim_cfg.items(): + cfg_ = cfg.copy() + paramwise_cfg_ = cfg_.pop('paramwise_cfg', None) + + self.constructors[key] = DefaultOptimWrapperConstructor( + cfg_, paramwise_cfg_) + + def __call__(self, module: nn.Module) -> OptimWrapperDict: + """Build optimizer and return a optimizerwrapperdict.""" + optimizers = {} + if hasattr(module, 'module'): + module = module.module + for key, constructor in self.constructors.items(): + optimizers[key] = constructor(module._modules[key]) + return OptimWrapperDict(**optimizers) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/runner/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/runner/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..5fe2fd524f5b11f105807dcbc785bcfb2fa86096 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/runner/__init__.py @@ -0,0 +1,19 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from .autoslim_greedy_search_loop import AutoSlimGreedySearchLoop +from .darts_loop import DartsEpochBasedTrainLoop, DartsIterBasedTrainLoop +from .distill_val_loop import SelfDistillValLoop, SingleTeacherDistillValLoop +from .evolution_search_loop import EvolutionSearchLoop +from .iteprune_val_loop import ItePruneValLoop +from .quantization_loops import (LSQEpochBasedLoop, PTQLoop, QATEpochBasedLoop, + QATValLoop) +from .slimmable_val_loop import SlimmableValLoop +from .subnet_sampler_loop import GreedySamplerTrainLoop +from .subnet_val_loop import SubnetValLoop + +__all__ = [ + 'SingleTeacherDistillValLoop', 'DartsEpochBasedTrainLoop', + 'DartsIterBasedTrainLoop', 'SlimmableValLoop', 'EvolutionSearchLoop', + 'GreedySamplerTrainLoop', 'SubnetValLoop', 'SelfDistillValLoop', + 'ItePruneValLoop', 'AutoSlimGreedySearchLoop', 'QATEpochBasedLoop', + 'PTQLoop', 'LSQEpochBasedLoop', 'QATValLoop' +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/runner/autoslim_greedy_search_loop.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/runner/autoslim_greedy_search_loop.py new file mode 100644 index 0000000000000000000000000000000000000000..cf9752ce00b9eb6acdfb197962d46005bb417d18 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/runner/autoslim_greedy_search_loop.py @@ -0,0 +1,223 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy +import os.path as osp +import warnings +from typing import Any, Dict, List, Optional, Tuple, Union + +import torch +from mmengine import fileio +from mmengine.evaluator import Evaluator +from mmengine.runner import TestLoop +from torch.utils.data import DataLoader + +from mmrazor.registry import LOOPS, TASK_UTILS +from mmrazor.structures import convert_fix_subnet, export_fix_subnet +from .utils import check_subnet_resources + + +@LOOPS.register_module() +class AutoSlimGreedySearchLoop(TestLoop): + """Loop for Greedy searching in AutoSlim. 
Please refer to + https://arxiv.org/abs/1903.11728 for more details. + + Args: + runner (Runner): A reference of runner. + dataloader (Dataloader or dict): A dataloader object or a dict to + build a dataloader. + evaluator (Evaluator or dict or list): Used for computing metrics. + target_flops (Tuple[float]): The FLOPs limitation of target subnets. + estimator_cfg (dict, Optional): Used for building a resource estimator. + Defaults to None. + score_key (str): Specify one metric in evaluation results to score + candidates. Defaults to 'accuracy_top-1'. + resume_from (str, optional): Specify the path of saved .pkl file for + resuming searching. + """ + + def __init__(self, + runner, + dataloader: Union[DataLoader, Dict], + evaluator: Union[Evaluator, Dict, List], + target_flops: Tuple[float], + estimator_cfg: Dict[str, Any] = dict(), + score_key: str = 'accuracy/top1', + resume_from: Optional[str] = None): + super().__init__(runner, dataloader, evaluator) + + if hasattr(self.dataloader.dataset, 'metainfo'): + self.evaluator.dataset_meta = self.dataloader.dataset.metainfo + else: + warnings.warn( + f'Dataset {self.dataloader.dataset.__class__.__name__} has no ' + 'metainfo. 
``dataset_meta`` in evaluator, metric and ' + 'visualizer will be None.') + + self.target_flops = sorted(target_flops, reverse=True) + self.score_key = score_key + self.resume_from = resume_from + + # initialize estimator + estimator_cfg = dict() if estimator_cfg is None else estimator_cfg + if 'type' not in estimator_cfg: + estimator_cfg['type'] = 'mmrazor.ResourceEstimator' + self.estimator = TASK_UTILS.build(estimator_cfg) + + if self.runner.distributed: + self.model = runner.model.module + else: + self.model = runner.model + + assert hasattr(self.model, 'mutator') + units = self.model.mutator.mutable_units + + self.candidate_choices = {} + for unit in units: + self.candidate_choices[unit.alias] = unit.candidate_choices + + self.max_subnet = {} + for name, candidate_choices in self.candidate_choices.items(): + self.max_subnet[name] = len(candidate_choices) + self.current_subnet = self.max_subnet + + current_subnet_choices = self._channel_bins2choices( + self.current_subnet) + _, results = check_subnet_resources(self.model, current_subnet_choices, + self.estimator) + self.current_flops = results['flops'] + + self.searched_subnet: List[Dict[str, int]] = [] + self.searched_subnet_flops: List[float] = [] + + def run(self) -> None: + """Launch searching.""" + self.runner.call_hook('before_test') + + if self.resume_from: + self._resume() + + for target in self.target_flops: + if self.resume_from and self.current_flops <= target: + continue + + if self.current_flops <= target: + self.searched_subnet.append(self.current_subnet) + self.searched_subnet_flops.append(self.current_flops) + self.runner.logger.info( + f'Find model flops {self.current_flops} <= {target}') + continue + + while self.current_flops > target: + best_score, best_subnet = None, None + + for unit_name in sorted(self.current_subnet.keys()): + if self.current_subnet[unit_name] == 1: + # The number of channel_bin has reached the minimum + # value + continue + pruned_subnet = 
copy.deepcopy(self.current_subnet) + pruned_subnet[unit_name] -= 1 + pruned_subnet_choices = self._channel_bins2choices( + pruned_subnet) + self.model.mutator.set_choices(pruned_subnet_choices) + metrics = self._val_subnet() + score = metrics[self.score_key] \ + if len(metrics) != 0 else 0. + self.runner.logger.info( + f'Slimming unit {unit_name}, {self.score_key}: {score}' + ) + if best_score is None or score > best_score: + best_score = score + best_subnet = pruned_subnet + + if best_subnet is None: + raise RuntimeError( + 'Cannot find any valid model, check your ' + 'configurations.') + + self.current_subnet = best_subnet + current_subnet_choices = self._channel_bins2choices( + self.current_subnet) + _, results = check_subnet_resources(self.model, + current_subnet_choices, + self.estimator) + self.current_flops = results['flops'] + self.runner.logger.info( + f'Greedily find model, score: {best_score}, ' + f'{self.current_subnet}, FLOPS: {self.current_flops}') + self._save_searcher_ckpt() + + self.searched_subnet.append(self.current_subnet) + self.searched_subnet_flops.append(self.current_flops) + self.runner.logger.info( + f'Find model flops {self.current_flops} <= {target}') + + self._save_searched_subnet() + self.runner.call_hook('after_test') + + def _channel_bins2choices(self, subnet_channel_bins): + """Convert the channel bin number of a channel unit to the choice + (ratio when choice_mode='ratio' and channel number when + choice_mode='number').""" + choices = {} + for unit_name, bins in subnet_channel_bins.items(): + # `bins` is in range [1, max_bins] + choices[unit_name] = self.candidate_choices[unit_name][bins - 1] + return choices + + @torch.no_grad() + def _val_subnet(self) -> Dict: + """Run validation.""" + self.runner.model.eval() + for data_batch in self.dataloader: + outputs = self.runner.model.val_step(data_batch) + self.evaluator.process(data_samples=outputs, data_batch=data_batch) + metrics = self.evaluator.evaluate(len(self.dataloader.dataset)) 
+ return metrics + + def _save_searcher_ckpt(self) -> None: + """Save searcher ckpt, which is different from common ckpt. + + It mainly contains the candicate pool, the top-k candicates with scores + and the current epoch. + """ + if self.runner.rank != 0: + return + save_for_resume = dict() + for k in [ + 'current_subnet', 'current_flops', 'searched_subnet', + 'searched_subnet_flops' + ]: + save_for_resume[k] = getattr(self, k) + fileio.dump(save_for_resume, + osp.join(self.runner.work_dir, 'latest.pkl')) + self.runner.logger.info( + f'{len(self.searched_subnet)} subnets have been searched, ' + f'FLOPs are {self.searched_subnet_flops}') + + def _save_searched_subnet(self): + """Save the final searched subnet dict.""" + if self.runner.rank != 0: + return + self.runner.logger.info('Search finished:') + for subnet, flops in zip(self.searched_subnet, + self.searched_subnet_flops): + subnet_choice = self._channel_bins2choices(subnet) + self.model.mutator.set_choices(subnet_choice) + fixed_subnet, _ = export_fix_subnet(self.model) + save_name = 'FLOPS_{:.2f}M.yaml'.format(flops) + fixed_subnet = convert_fix_subnet(fixed_subnet) + fileio.dump(fixed_subnet, osp.join(self.runner.work_dir, + save_name)) + self.runner.logger.info( + f'{save_name} is saved in {self.runner.work_dir}.') + + def _resume(self): + """Resume searching.""" + searcher_resume = fileio.load(self.resume_from) + for key, val in searcher_resume.items(): + setattr(self, key, val) + self.runner.logger.info('#' * 100) + self.runner.logger.info(f'Current channel_bins dict: ' + f'{self.current_subnet}, \n' + f'Current flops: {self.current_flops}') + self.runner.logger.info('#' * 100) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/runner/darts_loop.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/runner/darts_loop.py new file mode 100644 index 0000000000000000000000000000000000000000..65e4611dfadfb1a5e9345a2750f1c5b2201f770a --- /dev/null +++ 
@LOOPS.register_module()
class DartsEpochBasedTrainLoop(EpochBasedTrainLoop):
    """EpochBasedTrainLoop for `Darts <https://arxiv.org/abs/1806.09055>`_.

    Darts consumes two dataloaders per training step: ``dataloader``
    updates the supernet weights while ``mutator_dataloader`` updates the
    architecture parameters. The two are iterated in lockstep, so every
    iteration receives one batch from each of them.

    Args:
        runner (Runner): A reference of runner.
        dataloader (Dataloader or Dict):
            A dataloader object or a dict to build a dataloader for
            training the model.
        mutator_dataloader (Dataloader or Dict):
            A dataloader object or a dict to build a dataloader for
            training the parameters of model architecture.
        max_epochs (int): Total training epochs.
        val_begin (int): The epoch that begins validating.
            Defaults to 1.
        val_interval (int): Validation interval. Defaults to 1.
    """

    def __init__(self,
                 runner,
                 dataloader: Union[Dict, DataLoader],
                 mutator_dataloader: Union[Dict, DataLoader],
                 max_epochs: int,
                 val_begin: int = 1,
                 val_interval: int = 1) -> None:
        super().__init__(runner, dataloader, max_epochs, val_begin,
                         val_interval)
        # Build the architecture dataloader from config if it is not
        # already an instantiated loader.
        if isinstance(mutator_dataloader, dict):
            mutator_dataloader = runner.build_dataloader(
                mutator_dataloader, seed=runner.seed)
        self.mutator_dataloader = mutator_dataloader
        self.multi_loaders = [self.dataloader, self.mutator_dataloader]

    def run_epoch(self) -> None:
        """Iterate one epoch, drawing paired batches from both loaders."""
        self.runner.call_hook('before_train_epoch')
        self.runner.model.train()

        paired_loader = EpochMultiLoader(self.multi_loaders)
        for batch_idx, joint_batch in enumerate(paired_loader):
            self.run_iter(batch_idx, joint_batch)

        self.runner.call_hook('after_train_epoch')
        self._epoch += 1
+ """ + + def __init__(self, + runner, + dataloader: Union[Dict, DataLoader], + mutator_dataloader: Union[Dict, DataLoader], + max_iters: int, + val_begin: int = 1, + val_interval: int = 1000) -> None: + super().__init__(runner, dataloader, max_iters, val_begin, + val_interval) + if isinstance(mutator_dataloader, dict): + self.mutator_dataloader = runner.build_dataloader( + mutator_dataloader, seed=runner.seed) + else: + self.mutator_dataloader = mutator_dataloader + multi_loaders = [self.dataloader, self.mutator_dataloader] + self.multi_loaders = IterMultiLoader(multi_loaders) + + def run(self) -> None: + """Launch training.""" + self.runner.call_hook('before_train') + # In iteration-based training loop, we treat the whole training process + # as a big epoch and execute the corresponding hook. + self.runner.call_hook('before_train_epoch') + while self._iter < self._max_iters: + self.runner.model.train() + + data_batch = next(self.multi_loaders) # type: ignore + self.run_iter(data_batch) + + if (self.runner.val_loop is not None + and self._iter >= self.val_begin + and self._iter % self.val_interval == 0): + self.runner.val_loop.run() + + self.runner.call_hook('after_train_epoch') + self.runner.call_hook('after_train') + + +class EpochMultiLoader: + """Multi loaders based on epoch.""" + + def __init__(self, dataloaders: List[DataLoader]): + self._dataloaders = dataloaders + self.iter_loaders = [iter(loader) for loader in self._dataloaders] + + @property + def num_loaders(self): + """The number of dataloaders.""" + return len(self._dataloaders) + + def __iter__(self): + """Return self when executing __iter__.""" + return self + + def __next__(self): + """Get the next iter's data of multiple loaders.""" + data = tuple([next(loader) for loader in self.iter_loaders]) + + return data + + def __len__(self): + """Get the length of loader.""" + return min([len(loader) for loader in self._dataloaders]) + + +class IterMultiLoader: + """Multi loaders based on iter.""" + + def 
__init__(self, dataloaders: Union[List[DataLoader], DataLoader]): + self._dataloaders = dataloaders if isinstance(dataloaders, + list) else [dataloaders] + self.iter_loaders = [iter(loader) for loader in self._dataloaders] + self._epoch = 0 + + @property + def epoch(self): + """The property of the class.""" + return self._epoch + + @property + def num_loaders(self): + """The number of dataloaders.""" + return len(self._dataloaders) + + def __next__(self): + """Get the next iter's data of multiple loaders.""" + try: + data = tuple([next(loader) for loader in self.iter_loaders]) + except StopIteration: + self._epoch += 1 + for loader in self._dataloaders: + if hasattr(loader.sampler, 'set_epoch'): + loader.sampler.set_epoch(self._epoch) + self.iter_loader = [iter(loader) for loader in self._dataloaders] + data = tuple([next(loader) for loader in self.iter_loaders]) + + return data + + def __len__(self): + """Get the length of loader.""" + return min([len(loader) for loader in self._dataloaders]) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/runner/distill_val_loop.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/runner/distill_val_loop.py new file mode 100644 index 0000000000000000000000000000000000000000..0a86bbf4eae19d431c77692f303ef0d4b9cce4aa --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/runner/distill_val_loop.py @@ -0,0 +1,127 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import Dict, List, Sequence, Union + +import torch +from mmengine.evaluator import Evaluator +from mmengine.runner import ValLoop, autocast +from torch.utils.data import DataLoader + +from mmrazor.registry import LOOPS + + +@LOOPS.register_module() +class SingleTeacherDistillValLoop(ValLoop): + """Knowledge Distill loop for validation. It is not only validate student, + but also validate teacher with the same dataloader. + + Args: + runner (Runner): A reference of runner. 
        dataloader (Dataloader or dict): A dataloader object or a dict to
            build a dataloader.
        evaluator (Evaluator or dict or list): Used for computing metrics.
        fp16 (bool): Whether to enable fp16 validation. Defaults to
            False.
    """

    def __init__(self,
                 runner,
                 dataloader: Union[DataLoader, Dict],
                 evaluator: Union[Evaluator, Dict, List],
                 fp16: bool = False) -> None:
        super().__init__(runner, dataloader, evaluator, fp16)
        if self.runner.distributed:
            assert hasattr(self.runner.model.module, 'teacher')
            # TODO: remove hard code after mmcls add data_preprocessor
            data_preprocessor = self.runner.model.module.data_preprocessor
            self.teacher = self.runner.model.module.teacher
            # Teacher shares the student's preprocessor so both see
            # identically normalized inputs during validation.
            self.teacher.data_preprocessor = data_preprocessor

        else:
            assert hasattr(self.runner.model, 'teacher')
            # TODO: remove hard code after mmcls add data_preprocessor
            data_preprocessor = self.runner.model.data_preprocessor
            self.teacher = self.runner.model.teacher
            self.teacher.data_preprocessor = data_preprocessor

    def run(self):
        """Launch validation: first the student, then the teacher on the
        same dataloader; teacher metrics are added under ``teacher.`` keys.
        """
        self.runner.call_hook('before_val')
        self.runner.call_hook('before_val_epoch')
        self.runner.model.eval()
        for idx, data_batch in enumerate(self.dataloader):
            self.run_iter(idx, data_batch)
        # compute student metrics
        metrics = self.evaluator.evaluate(len(self.dataloader.dataset))

        # NOTE(review): 'before_val_epoch' fires a second time here without
        # a matching 'after_val_epoch' in between — presumably to reset
        # per-epoch metric state before the teacher pass; confirm all
        # registered hooks tolerate the duplicate call.
        self.runner.call_hook('before_val_epoch')
        for idx, data_batch in enumerate(self.dataloader):
            self.run_iter_teacher(idx, data_batch)
        # compute teacher metrics
        teacher_metrics = self.evaluator.evaluate(len(self.dataloader.dataset))
        for key, value in teacher_metrics.items():
            teacher_key = 'teacher.' + key
            metrics[teacher_key] = value

        self.runner.call_hook('after_val_epoch', metrics=metrics)
        self.runner.call_hook('after_val')

    @torch.no_grad()
    def run_iter_teacher(self, idx, data_batch: Sequence[dict]):
        """Validate the teacher on one mini-batch.

        Args:
            idx (int): Index of the current batch in the dataloader.
            data_batch (Sequence[dict]): Batch of data
                from dataloader.
        """
        self.runner.call_hook(
            'before_val_iter', batch_idx=idx, data_batch=data_batch)

        with autocast(enabled=self.fp16):
            # outputs should be sequence of BaseDataElement
            outputs = self.teacher.val_step(data_batch)

        self.evaluator.process(data_samples=outputs, data_batch=data_batch)
        self.runner.call_hook(
            'after_val_iter',
            batch_idx=idx,
            data_batch=data_batch,
            outputs=outputs)


@LOOPS.register_module()
class SelfDistillValLoop(ValLoop):
    """Knowledge Distill loop for validation. Only validate student;
    the student's metrics are re-keyed with a ``student.`` prefix.

    Args:
        runner (Runner): A reference of runner.
        dataloader (Dataloader or dict): A dataloader object or a dict to
            build a dataloader.
        evaluator (Evaluator or dict or list): Used for computing metrics.
        fp16 (bool): Whether to enable fp16 validation. Defaults to
            False.
    """

    def __init__(self,
                 runner,
                 dataloader: Union[DataLoader, Dict],
                 evaluator: Union[Evaluator, Dict, List],
                 fp16: bool = False) -> None:
        super().__init__(runner, dataloader, evaluator, fp16)

    def run(self):
        """Launch validation and report metrics under ``student.`` keys."""
        self.runner.call_hook('before_val')
        self.runner.call_hook('before_val_epoch')
        self.runner.model.eval()

        for idx, data_batch in enumerate(self.dataloader):
            self.run_iter(idx, data_batch)
        # compute student metrics
        metrics = self.evaluator.evaluate(len(self.dataloader.dataset))
        student_metrics = dict()
        for key, value in metrics.items():
            student_key = 'student.' + key
            student_metrics[student_key] = value

        self.runner.call_hook('after_val_epoch', metrics=student_metrics)
        self.runner.call_hook('after_val')
Defaults to 10. + num_mutation (int): The number of candidates got by mutation. + Defaults to 25. + num_crossover (int): The number of candidates got by crossover. + Defaults to 25. + mutate_prob (float): The probability of mutation. Defaults to 0.1. + crossover_prob (float): The probability of crossover. Defaults to 0.5. + calibrate_sample_num (int): The number of images to compute the true + average of per-batch mean/variance instead of the running average. + Defaults to -1. + constraints_range (Dict[str, Any]): Constraints to be used for + screening candidates. Defaults to dict(flops=(0, 330)). + estimator_cfg (dict, Optional): Used for building a resource estimator. + Defaults to None. + predictor_cfg (dict, Optional): Used for building a metric predictor. + Defaults to None. + score_key (str): Specify one metric in evaluation results to score + candidates. Defaults to 'accuracy_top-1'. + init_candidates (str, optional): The candidates file path, which is + used to init `self.candidates`. Its format is usually in .yaml + format. Defaults to None. 
+ """ + + def __init__(self, + runner, + dataloader: Union[DataLoader, Dict], + evaluator: Union[Evaluator, Dict, List], + max_epochs: int = 20, + max_keep_ckpts: int = 3, + resume_from: Optional[str] = None, + num_candidates: int = 50, + top_k: int = 10, + num_mutation: int = 25, + num_crossover: int = 25, + mutate_prob: float = 0.1, + crossover_prob: float = 0.5, + calibrate_sample_num: int = -1, + constraints_range: Dict[str, Any] = dict(flops=(0., 330.)), + estimator_cfg: Optional[Dict] = None, + predictor_cfg: Optional[Dict] = None, + score_key: str = 'accuracy/top1', + init_candidates: Optional[str] = None) -> None: + super().__init__(runner, dataloader, max_epochs) + if isinstance(evaluator, dict) or is_list_of(evaluator, dict): + self.evaluator = runner.build_evaluator(evaluator) # type: ignore + else: + self.evaluator = evaluator # type: ignore + if hasattr(self.dataloader.dataset, 'metainfo'): + self.evaluator.dataset_meta = self.dataloader.dataset.metainfo + else: + warnings.warn( + f'Dataset {self.dataloader.dataset.__class__.__name__} has no ' + 'metainfo. 
``dataset_meta`` in evaluator, metric and ' + 'visualizer will be None.') + + self.num_candidates = num_candidates + self.top_k = top_k + self.constraints_range = constraints_range + self.calibrate_sample_num = calibrate_sample_num + self.score_key = score_key + self.num_mutation = num_mutation + self.num_crossover = num_crossover + self.mutate_prob = mutate_prob + self.crossover_prob = crossover_prob + self.max_keep_ckpts = max_keep_ckpts + self.resume_from = resume_from + self.fp16 = False + + if init_candidates is None: + self.candidates = Candidates() + else: + self.candidates = fileio.load(init_candidates) + assert isinstance(self.candidates, Candidates), 'please use the \ + correct init candidates file' + + self.top_k_candidates = Candidates() + + if self.runner.distributed: + self.model = runner.model.module + else: + self.model = runner.model + + # initialize estimator + estimator_cfg = dict() if estimator_cfg is None else estimator_cfg + if 'type' not in estimator_cfg: + estimator_cfg['type'] = 'mmrazor.ResourceEstimator' + self.estimator = TASK_UTILS.build(estimator_cfg) + + # initialize predictor + self.use_predictor = False + self.predictor_cfg = predictor_cfg + if self.predictor_cfg is not None: + self.predictor_cfg['score_key'] = self.score_key + self.predictor_cfg['search_groups'] = \ + self.model.mutator.search_groups + self.predictor = TASK_UTILS.build(self.predictor_cfg) + + def run(self) -> None: + """Launch searching.""" + self.runner.call_hook('before_train') + + if self.predictor_cfg is not None: + self._init_predictor() + + if self.resume_from: + self._resume() + + while self._epoch < self._max_epochs: + self.run_epoch() + self._save_searcher_ckpt() + + self._save_best_fix_subnet() + + self.runner.call_hook('after_train') + + def run_epoch(self) -> None: + """Iterate one epoch. + + Steps: + 1. Sample some new candidates from the supernet. Then Append them + to the candidates, Thus make its number equal to the specified + number. + 2. 
Validate these candidates(step 1) and update their scores. + 3. Pick the top k candidates based on the scores(step 2), which + will be used in mutation and crossover. + 4. Implement Mutation and crossover, generate better candidates. + """ + self.sample_candidates() + self.update_candidates_scores() + + scores_before = self.top_k_candidates.scores + self.runner.logger.info(f'top k scores before update: ' + f'{scores_before}') + + self.candidates.extend(self.top_k_candidates) + self.candidates.sort_by(key_indicator='score', reverse=True) + self.top_k_candidates = Candidates(self.candidates.data[:self.top_k]) + + scores_after = self.top_k_candidates.scores + self.runner.logger.info(f'top k scores after update: ' + f'{scores_after}') + + mutation_candidates = self.gen_mutation_candidates() + self.candidates_mutator_crossover = Candidates(mutation_candidates) + crossover_candidates = self.gen_crossover_candidates() + self.candidates_mutator_crossover.extend(crossover_candidates) + + assert len(self.candidates_mutator_crossover + ) <= self.num_candidates, 'Total of mutation and \ + crossover should be less than the number of candidates.' 
        # Replace the candidate pool with the freshly generated
        # mutation/crossover candidates for the next search epoch.
        self.candidates = self.candidates_mutator_crossover
        self._epoch += 1

    def sample_candidates(self) -> None:
        """Update candidate pool contains specified number of candidates.

        Only rank 0 actually samples; other ranks receive the pool via
        ``broadcast_object_list`` so that validation with multiple GPUs sees
        the same candidates.
        """
        candidates_resources = []
        init_candidates = len(self.candidates)
        if self.runner.rank == 0:
            # Keep sampling until the pool holds `num_candidates` subnets
            # that satisfy the resource constraints.
            while len(self.candidates) < self.num_candidates:
                candidate = self.model.mutator.sample_choices()
                is_pass, result = self._check_constraints(
                    random_subnet=candidate)
                if is_pass:
                    self.candidates.append(candidate)
                    candidates_resources.append(result)
            self.candidates = Candidates(self.candidates.data)
        else:
            # Placeholder pool on non-zero ranks; overwritten by broadcast.
            self.candidates = Candidates([dict(a=0)] * self.num_candidates)

        if len(candidates_resources) > 0:
            # Attach the measured resources (flops/params/latency) to the
            # newly appended candidates only.
            self.candidates.update_resources(
                candidates_resources,
                start=len(self.candidates.data) - len(candidates_resources))
            assert init_candidates + len(
                candidates_resources) == self.num_candidates

        # broadcast candidates to val with multi-GPUs.
        broadcast_object_list(self.candidates.data)

    def update_candidates_scores(self) -> None:
        """Validate candidates one by one from the candidate pool, and update
        top-k candidates."""
        for i, candidate in enumerate(self.candidates.subnets):
            self.model.mutator.set_choices(candidate)
            metrics = self._val_candidate(use_predictor=self.use_predictor)
            # An empty metrics dict (e.g. non-main rank) scores 0.
            score = round(metrics[self.score_key], 2) \
                if len(metrics) != 0 else 0.
            self.candidates.set_resource(i, score, 'score')
            self.runner.logger.info(
                f'Epoch:[{self._epoch}/{self._max_epochs}] '
                f'Candidate:[{i + 1}/{self.num_candidates}] '
                f'Flops: {self.candidates.resources("flops")[i]} '
                f'Params: {self.candidates.resources("params")[i]} '
                f'Latency: {self.candidates.resources("latency")[i]} '
                f'Score: {self.candidates.scores[i]} ')

    def gen_mutation_candidates(self) -> Candidates:
        """Generate specified number of mutation candidates."""
        mutation_resources = []
        mutation_candidates: List = []
        # Give up after 10x the requested number of attempts so constraint
        # rejection cannot loop forever.
        max_mutate_iters = self.num_mutation * 10
        mutate_iter = 0
        while len(mutation_candidates) < self.num_mutation:
            mutate_iter += 1
            if mutate_iter > max_mutate_iters:
                break

            mutation_candidate = self._mutation()

            is_pass, result = self._check_constraints(
                random_subnet=mutation_candidate)
            if is_pass:
                mutation_candidates.append(mutation_candidate)
                mutation_resources.append(result)

        mutation_candidates = Candidates(mutation_candidates)
        mutation_candidates.update_resources(mutation_resources)

        return mutation_candidates

    def gen_crossover_candidates(self) -> Candidates:
        """Generate specified number of crossover candidates."""
        crossover_resources = []
        crossover_candidates: List = []
        crossover_iter = 0
        # Same retry cap as mutation: at most 10x attempts.
        max_crossover_iters = self.num_crossover * 10
        while len(crossover_candidates) < self.num_crossover:
            crossover_iter += 1
            if crossover_iter > max_crossover_iters:
                break

            crossover_candidate = self._crossover()

            is_pass, result = self._check_constraints(
                random_subnet=crossover_candidate)
            if is_pass:
                crossover_candidates.append(crossover_candidate)
                crossover_resources.append(result)

        crossover_candidates = Candidates(crossover_candidates)
        crossover_candidates.update_resources(crossover_resources)

        return crossover_candidates

    def _mutation(self) -> SupportRandomSubnet:
        """Mutate with the specified mutate_prob.

        Implemented as a crossover between a top-k candidate and a freshly
        sampled random subnet, taking genes from the random subnet with
        probability ``mutate_prob``.
        """
        candidate1 = random.choice(self.top_k_candidates.subnets)
        candidate2 = self.model.mutator.sample_choices()
        candidate = crossover(candidate1, candidate2, prob=self.mutate_prob)
        return candidate

    def _crossover(self) -> SupportRandomSubnet:
        """Crossover between two randomly chosen top-k candidates."""
        candidate1 = random.choice(self.top_k_candidates.subnets)
        candidate2 = random.choice(self.top_k_candidates.subnets)
        candidate = crossover(candidate1, candidate2, prob=self.crossover_prob)
        return candidate

    def _resume(self):
        """Resume searching from a dumped searcher state file.

        NOTE(review): only rank 0 restores the state here; other ranks keep
        the initial state — confirm the pool is re-broadcast before use.
        """
        if self.runner.rank == 0:
            searcher_resume = fileio.load(self.resume_from)
            for k in searcher_resume.keys():
                setattr(self, k, searcher_resume[k])
            epoch_start = int(searcher_resume['_epoch'])
            # Remaining epochs = configured total minus already-run epochs.
            self._max_epochs = self._max_epochs - epoch_start
            self.runner.logger.info('#' * 100)
            self.runner.logger.info(f'Resume from epoch: {epoch_start}')
            self.runner.logger.info('#' * 100)

    def _save_best_fix_subnet(self):
        """Save best subnet in searched top-k candidates.

        Exports both the sliced weights (``.pth``) and the subnet config
        (``.yaml``) into the runner's work dir.
        """
        if self.runner.rank == 0:
            best_random_subnet = self.top_k_candidates.subnets[0]
            self.model.mutator.set_choices(best_random_subnet)

            best_fix_subnet, sliced_model = \
                export_fix_subnet(self.model, slice_weight=True)

            timestamp_subnet = time.strftime('%Y%m%d_%H%M', time.localtime())
            model_name = f'subnet_{timestamp_subnet}.pth'
            save_path = osp.join(self.runner.work_dir, model_name)
            torch.save({
                'state_dict': sliced_model.state_dict(),
                'meta': {}
            }, save_path)
            self.runner.logger.info(f'Subnet checkpoint {model_name} saved in '
                                    f'{self.runner.work_dir}')

            save_name = 'best_fix_subnet.yaml'
            best_fix_subnet = convert_fix_subnet(best_fix_subnet)
            fileio.dump(best_fix_subnet,
                        osp.join(self.runner.work_dir, save_name))
            self.runner.logger.info(
                f'Subnet config {save_name} saved in {self.runner.work_dir}.')

            self.runner.logger.info('Search finished.')

    @torch.no_grad()
    def _val_candidate(self, use_predictor: bool = False) -> Dict:
        """Run validation.

        Args:
            use_predictor (bool): Whether to use predictor to get metrics.
                Defaults to False.

        Returns:
            Dict: evaluation metrics for the currently selected subnet.
        """
        if use_predictor:
            assert self.predictor is not None
            metrics = self.predictor.predict(self.model)
        else:
            # Recalibrate BN statistics for the sampled subnet before eval.
            if self.calibrate_sample_num > 0:
                self.calibrate_bn_statistics(self.runner.train_dataloader,
                                             self.calibrate_sample_num)
            self.runner.model.eval()
            for data_batch in self.dataloader:
                outputs = self.runner.model.val_step(data_batch)
                self.evaluator.process(
                    data_samples=outputs, data_batch=data_batch)
            metrics = self.evaluator.evaluate(len(self.dataloader.dataset))
        return metrics

    def _save_searcher_ckpt(self) -> None:
        """Save searcher ckpt, which is different from common ckpt.

        It mainly contains the candidate pool, the top-k candidates with
        scores and the current epoch.
        """
        if self.runner.rank == 0:
            save_for_resume = dict()
            save_for_resume['_epoch'] = self._epoch
            for k in ['candidates', 'top_k_candidates']:
                save_for_resume[k] = getattr(self, k)
            fileio.dump(
                save_for_resume,
                osp.join(self.runner.work_dir,
                         f'search_epoch_{self._epoch}.pkl'))
            self.runner.logger.info(
                f'Epoch:[{self._epoch}/{self._max_epochs}], top1_score: '
                f'{self.top_k_candidates.scores[0]}')

            # Prune old search checkpoints beyond `max_keep_ckpts`.
            if self.max_keep_ckpts > 0:
                cur_ckpt = self._epoch + 1
                redundant_ckpts = range(1, cur_ckpt - self.max_keep_ckpts)
                for _step in redundant_ckpts:
                    ckpt_path = osp.join(self.runner.work_dir,
                                         f'search_epoch_{_step}.pkl')
                    if osp.isfile(ckpt_path):
                        os.remove(ckpt_path)

    def _check_constraints(
            self, random_subnet: SupportRandomSubnet) -> Tuple[bool, Dict]:
        """Check whether is beyond constraints.

        Returns:
            bool, result: The result of checking.
        """
        is_pass, results = check_subnet_resources(
            model=self.model,
            subnet=random_subnet,
            estimator=self.estimator,
            constraints_range=self.constraints_range)

        return is_pass, results

    def _init_predictor(self):
        """Initialize predictor, training is required.

        If no predictor checkpoint exists, a training set of
        (subnet-vector, score) pairs is collected — either loaded from
        ``train_samples`` or sampled and really evaluated — and the
        predictor is fitted on it.
        """
        if self.predictor.handler_ckpt:
            self.predictor.load_checkpoint()
            self.runner.logger.info(
                f'Loaded Checkpoints from {self.predictor.handler_ckpt}')
        else:
            self.runner.logger.info('No predictor checkpoints found. '
                                    'Start pre-training the predictor.')
            if isinstance(self.predictor.train_samples, str):
                self.runner.logger.info('Find specified samples in '
                                        f'{self.predictor.train_samples}')
                train_samples = fileio.load(self.predictor.train_samples)
                self.candidates = train_samples['subnets']
            else:
                self.runner.logger.info(
                    'Without specified samples. Start random sampling.')
                # Temporarily treat `train_samples` as the pool size for
                # the sampling round, then restore it.
                temp_num_candidates = self.num_candidates
                self.num_candidates = self.predictor.train_samples

                assert self.use_predictor is False, (
                    'Real evaluation is required when initializing predictor.')
                self.sample_candidates()
                self.update_candidates_scores()
                self.num_candidates = temp_num_candidates

            inputs = []
            for candidate in self.candidates.subnets:
                inputs.append(self.predictor.model2vector(candidate))
            inputs = np.array(inputs)
            labels = np.array(self.candidates.scores)
            self.predictor.fit(inputs, labels)
            if self.runner.rank == 0:
                predictor_dir = self.predictor.save_checkpoint(
                    osp.join(self.runner.work_dir, 'predictor'))
                self.runner.logger.info(
                    f'Predictor pre-trained, saved in {predictor_dir}.')
            self.use_predictor = True
            # Reset the pool: search proper starts from scratch.
            self.candidates = Candidates()
# Copyright (c) OpenMMLab. All rights reserved.
import json
import os.path as osp

import torch
from mmengine.runner import ValLoop

from mmrazor.registry import LOOPS
from mmrazor.structures import export_fix_subnet


@LOOPS.register_module()
class ItePruneValLoop(ValLoop):
    """Pruning loop for validation. Export fixed subnet configs.

    Args:
        runner (Runner): A reference of runner.
        dataloader (Dataloader or dict): A dataloader object or a dict to
            build a dataloader.
        evaluator (Evaluator or dict or list): Used for computing metrics.
        fp16 (bool): Whether to enable fp16 validation. Defaults to
            False.
    """

    def run(self):
        """Launch validation and export the pruned subnet afterwards."""
        self.runner.call_hook('before_val')
        self.runner.call_hook('before_val_epoch')
        self.runner.model.eval()
        for idx, data_batch in enumerate(self.dataloader):
            self.run_iter(idx, data_batch)

        # compute metrics
        metrics = self.evaluator.evaluate(len(self.dataloader.dataset))
        # Export the current fixed subnet (config + sliced weights) every
        # time validation runs.
        self._save_fix_subnet()
        self.runner.call_hook('after_val_epoch', metrics=metrics)
        self.runner.call_hook('after_val')
        return metrics

    def _save_fix_subnet(self):
        """Save model subnet config and sliced weights to the work dir."""
        # Unwrap the DDP wrapper when running distributed.
        model = self.runner.model.module \
            if self.runner.distributed else self.runner.model

        fix_subnet, static_model = export_fix_subnet(
            model, export_subnet_mode='mutator', slice_weight=True)
        fix_subnet = json.dumps(fix_subnet, indent=4, separators=(',', ':'))

        subnet_name = 'fix_subnet.json'
        weight_name = 'fix_subnet_weight.pth'
        with open(osp.join(self.runner.work_dir, subnet_name), 'w') as file:
            file.write(fix_subnet)
        torch.save({'state_dict': static_model.state_dict()},
                   osp.join(self.runner.work_dir, weight_name))
        self.runner.logger.info(
            'export finished and '
            f'{subnet_name}, '
            f'{weight_name} saved in {self.runner.work_dir}.')
# Copyright (c) OpenMMLab. All rights reserved.
import os
from typing import Dict, List, Optional, Sequence, Tuple, Union

import torch
from mmengine.evaluator import Evaluator
from mmengine.logging import print_log
from mmengine.runner import EpochBasedTrainLoop, TestLoop, ValLoop

# torch.ao quantization toggles only exist in torch>=1.13; on older torch
# they are replaced by placeholders that raise an informative error if used.
try:
    from torch.ao.quantization import (disable_observer, enable_fake_quant,
                                       enable_observer)
    from torch.nn.intrinsic.qat import freeze_bn_stats
except ImportError:
    from mmrazor.utils import get_placeholder

    disable_observer = get_placeholder('torch>=1.13')
    enable_fake_quant = get_placeholder('torch>=1.13')
    enable_observer = get_placeholder('torch>=1.13')
    freeze_bn_stats = get_placeholder('torch>=1.13')

from mmengine.dist import all_reduce_params, is_distributed
from torch.utils.data import DataLoader

from mmrazor.models import register_torch_fake_quants, register_torch_observers
from mmrazor.models.fake_quants import (enable_param_learning,
                                        enable_static_estimate, enable_val)
from mmrazor.registry import LOOPS

# Register torch's built-in observers/fake-quants into mmrazor's registry
# at import time (module-level side effect).
TORCH_observers = register_torch_observers()
TORCH_fake_quants = register_torch_fake_quants()


@LOOPS.register_module()
class QATEpochBasedLoop(EpochBasedTrainLoop):
    """`EpochBasedLoop` for `QuantizationAwareTraining`

    Args:
        runner (Runner): A reference of runner
        dataloader (Dataloader or dict): An iterator to generate one batch of
            dataset each iteration.
        max_epochs (int): Total training epochs.
        val_begin (int): The epoch that begins validating. Defaults to 1.
        val_interval (int): Validation interval. Defaults to 1.
        disable_observer_begin (int): The number of total epochs to update
            observers. Defaults to -1, which means observers are enabled
            all the time.
        freeze_bn_begin (int): The number of total epochs to update batch norm
            stats. Defaults to -1, which means no need to freeze bn.
        dynamic_intervals (List[Tuple[int, int]], optional): The
            first element in the tuple is a milestone and the second
            element is a interval. The interval is used after the
            corresponding milestone. Defaults to None.
    """

    def __init__(
            self,
            runner,
            dataloader: Union[DataLoader, Dict],
            max_epochs: int,
            val_begin: int = 1,
            val_interval: int = 1,
            disable_observer_begin: int = -1,
            freeze_bn_begin: int = -1,
            dynamic_intervals: Optional[List[Tuple[int, int]]] = None) -> None:
        super().__init__(runner, dataloader, max_epochs, val_begin,
                         val_interval, dynamic_intervals)

        self.disable_observer_begin = disable_observer_begin
        self.freeze_bn_begin = freeze_bn_begin

    def prepare_for_run_epoch(self):
        """Toggle the state of the observers and fake quantizers before qat
        training."""
        self.runner.model.apply(enable_fake_quant)

        # The initialized _epoch equals to 0 so _epoch + 1
        # equal to the current epoch
        if (self.disable_observer_begin > 0
                and self._epoch + 1 >= self.disable_observer_begin):
            self.runner.model.apply(disable_observer)
        else:
            self.runner.model.apply(enable_observer)

        if (self.freeze_bn_begin > 0
                and self._epoch + 1 >= self.freeze_bn_begin):
            self.runner.model.apply(freeze_bn_stats)

    def prepare_for_val(self):
        """Toggle the state of the observers and fake quantizers before
        validation."""
        # Validation always runs with fake-quant on and observers frozen.
        self.runner.model.apply(enable_fake_quant)
        self.runner.model.apply(disable_observer)

    def run(self):
        """Launch training."""
        self.runner.call_hook('before_train')

        while self._epoch < self._max_epochs:
            # Re-evaluate observer/BN toggles at each epoch boundary.
            self.prepare_for_run_epoch()
            self.run_epoch()

            self._decide_current_val_interval()
            if (self.runner.val_loop is not None
                    and self._epoch >= self.val_begin
                    and self._epoch % self.val_interval == 0):
                self.runner.val_loop.run()

        self.runner.call_hook('after_train')

    def run_epoch(self) -> None:
        """Iterate one epoch."""
        self.runner.call_hook('before_train_epoch')
        self.runner.model.train()

        for idx, data_batch in enumerate(self.dataloader):
            self.run_iter(idx, data_batch)

        self.runner.model.sync_qparams(src_mode='loss')
        # Make sure the registered buffer such as `observer_enabled` is
        # correct in the saved checkpoint.
        self.prepare_for_val()
        self.runner.call_hook('after_train_epoch')
        self._epoch += 1


@LOOPS.register_module()
class LSQEpochBasedLoop(QATEpochBasedLoop):
    """`EpochBasedLoop` for `LEARNED STEP SIZE QUANTIZATION`

    Paper: Learned Step Size Quantization. <https://arxiv.org/abs/1902.08153>

    Args:
        runner (Runner): A reference of runner
        dataloader (Dataloader or dict): An iterator to generate one batch of
            dataset each iteration.
        max_epochs (int): Total training epochs.
        val_begin (int): The epoch that begins validating. Defaults to 1.
        val_interval (int): Validation interval. Defaults to 1.
        freeze_bn_begin (int): The number of total epochs to update batch norm
            stats. Defaults to -1, which means no need to freeze bn.
        dynamic_intervals (List[Tuple[int, int]], optional): The
            first element in the tuple is a milestone and the second
            element is a interval. The interval is used after the
            corresponding milestone. Defaults to None.
    """

    def __init__(
            self,
            runner,
            dataloader: Union[DataLoader, Dict],
            max_epochs: int,
            val_begin: int = 1,
            val_interval: int = 1,
            freeze_bn_begin: int = -1,
            dynamic_intervals: Optional[List[Tuple[int, int]]] = None) -> None:
        super().__init__(
            runner,
            dataloader,
            max_epochs,
            val_begin,
            val_interval,
            freeze_bn_begin=freeze_bn_begin,
            dynamic_intervals=dynamic_intervals)

        # The very first batch bootstraps LSQ scales via the observers.
        self.is_first_batch = True
        self.distributed = is_distributed()

    def prepare_for_run_epoch(self):
        """Toggle the state of the observers and fake quantizers before qat
        training."""
        if (self.freeze_bn_begin > 0
                and self._epoch + 1 >= self.freeze_bn_begin):
            self.runner.model.apply(freeze_bn_stats)

        self.runner.model.apply(enable_param_learning)

    def prepare_for_val(self):
        """Toggle the state of the observers and fake quantizers before
        validation."""
        self.runner.model.apply(enable_val)

    def run_epoch(self) -> None:
        """Iterate one epoch."""
        self.runner.call_hook('before_train_epoch')
        self.runner.model.train()

        for idx, data_batch in enumerate(self.dataloader):
            if self.is_first_batch:
                # lsq observer init
                self.runner.model.apply(enable_static_estimate)

            self.run_iter(idx, data_batch)

            if self.is_first_batch:
                # In the first batch, scale in LearnableFakeQuantize is
                # calculated through lsq observer. As the values of `scale` of
                # different observers in different rank are usually different,
                # we have to sync the `scale` here.
                if self.distributed:
                    all_reduce_params(
                        self.runner.model.parameters(), op='mean')

                # Change back to param learning mode
                self.is_first_batch = False
                self.runner.model.apply(enable_param_learning)

        self.runner.model.sync_qparams(src_mode='loss')
        # Make sure the registered buffer such as `observer_enabled` is
        # correct in the saved checkpoint.
        self.prepare_for_val()
        self.runner.call_hook('after_train_epoch')
        self._epoch += 1


@LOOPS.register_module()
class QATValLoop(ValLoop):
    """`ValLoop` for `QuantizationAwareTraining`

    Runs validation twice: once on the quantized model (metrics prefixed
    ``qat.``) and once on the float architecture (prefixed ``original.``).

    Args:
        runner (Runner): A reference of runner
        dataloader (Dataloader or dict): An iterator to generate one batch of
            dataset each iteration.
        evaluator (Evaluator or dict or list): Used for computing metrics.
        fp16 (bool): Whether to enable fp16 validation. Defaults to
            False.
    """

    def __init__(self,
                 runner,
                 dataloader: Union[DataLoader, Dict],
                 evaluator: Union[Evaluator, Dict, List],
                 fp16: bool = False) -> None:
        super().__init__(runner, dataloader, evaluator, fp16)
        if self.runner.distributed:
            assert hasattr(self.runner.model.module, 'architecture')
            # TODO: remove hard code after mmcls add data_preprocessor
            data_preprocessor = self.runner.model.module.data_preprocessor
            self.architecture = self.runner.model.module.architecture
            self.architecture.data_preprocessor = data_preprocessor

        else:
            assert hasattr(self.runner.model, 'architecture')
            # TODO: remove hard code after mmcls add data_preprocessor
            data_preprocessor = self.runner.model.data_preprocessor
            self.architecture = self.runner.model.architecture
            self.architecture.data_preprocessor = data_preprocessor

    def run(self) -> dict:
        """Launch validation."""
        self.runner.call_hook('before_val')
        self.runner.call_hook('before_val_epoch')
        self.runner.model.eval()
        # Pass 1: validate the quantized model.
        for idx, data_batch in enumerate(self.dataloader):
            self.run_iter(idx, data_batch, self.runner.model)

        # compute metrics
        metrics = self.evaluator.evaluate(len(self.dataloader.dataset))
        qat_metrics = dict()
        for key, value in metrics.items():
            qat_key = 'qat.' + key
            ori_key = 'original.' + key
            qat_metrics[qat_key] = value
            # Drop the stale counterpart scalar so logs stay consistent.
            self.runner.message_hub.log_scalars.pop(f'val/{ori_key}', None)

        self.runner.call_hook('after_val_epoch', metrics=qat_metrics)

        self.runner.call_hook('before_val_epoch')
        self.runner.model.eval()
        # Pass 2: validate the float architecture for comparison.
        for idx, data_batch in enumerate(self.dataloader):
            self.run_iter(idx, data_batch, self.architecture)

        # compute metrics
        metrics = self.evaluator.evaluate(len(self.dataloader.dataset))
        qat_metrics = dict()
        for key, value in metrics.items():
            qat_key = 'qat.' + key
            ori_key = 'original.' + key
            qat_metrics[ori_key] = value
            self.runner.message_hub.log_scalars.pop(f'val/{qat_key}', None)

        self.runner.call_hook('after_val_epoch', metrics=qat_metrics)

        self.runner.call_hook('after_val')
        return qat_metrics

    @torch.no_grad()
    def run_iter(self, idx, data_batch: Sequence[dict], model):
        """Iterate one mini-batch.

        Args:
            data_batch (Sequence[dict]): Batch of data
                from dataloader.
            model: the module to run `val_step` on (quantized model or the
                float architecture).
        """
        self.runner.call_hook(
            'before_val_iter', batch_idx=idx, data_batch=data_batch)
        # outputs should be sequence of BaseDataElement

        outputs = model.val_step(data_batch)
        self.evaluator.process(data_samples=outputs, data_batch=data_batch)
        self.runner.call_hook(
            'after_val_iter',
            batch_idx=idx,
            data_batch=data_batch,
            outputs=outputs)


@LOOPS.register_module()
class PTQLoop(TestLoop):
    """`TestLoop` for Post Training Quantization.

    Args:
        runner (Runner): A reference of runner
        dataloader (Dataloader or dict): An iterator to generate one batch of
            dataset each iteration.
        evaluator (Evaluator or dict or list): Used for computing metrics.
        fp16 (bool, optional): Enable FP16 training mode. Defaults to False.
    """
+ """ + + def __init__(self, + runner, + dataloader: Union[DataLoader, Dict], + evaluator: Union[Evaluator, Dict, List], + calibrate_dataloader: Union[DataLoader, Dict], + calibrate_steps=32, + fp16: bool = False, + only_val=False): + super().__init__(runner, dataloader, evaluator, fp16) + if isinstance(calibrate_dataloader, dict): + # Determine whether or not different ranks use different seed. + diff_rank_seed = runner._randomness_cfg.get( + 'diff_rank_seed', False) + self.calibrate_dataloader = runner.build_dataloader( + calibrate_dataloader, + seed=runner.seed, + diff_rank_seed=diff_rank_seed) + else: + self.calibrate_dataloader = calibrate_dataloader + + self.calibrate_steps = calibrate_steps + self.only_val = only_val + + def run(self) -> dict: + """Launch test.""" + self.runner.call_hook('before_test') + self.runner.call_hook('before_test_epoch') + + self.runner.model.eval() + + if not self.only_val: + self.runner.model.apply(enable_fake_quant) + self.runner.model.apply(enable_observer) + + print_log('Star calibratiion...') + for idx, data_batch in enumerate(self.calibrate_dataloader): + if idx == self.calibrate_steps: + break + self.run_iter(idx, data_batch) + print_log('Finish calibratiion!') + + self.runner.model.apply(enable_fake_quant) + self.runner.model.apply(disable_observer) + + save_dir = os.path.join(self.runner.work_dir, + self.runner.timestamp) + self.runner.save_checkpoint( + save_dir, + 'model_ptq.pth', + file_client_args=None, + save_optimizer=False, + save_param_scheduler=False) + print_log(f'Quantized model is saved in {save_dir}') + + print_log('Start Evaluating quantized model...') + self.runner.model.apply(enable_fake_quant) + self.runner.model.apply(disable_observer) + metricts = self.runner.val_loop.run() + self.runner.call_hook('after_test_epoch', metrics=metricts) + self.runner.call_hook('after_test') + + return metricts + + @torch.no_grad() + def run_iter(self, idx, data_batch: Sequence[dict]) -> None: + """Iterate one mini-batch. 
+ + Args: + data_batch (Sequence[dict]): Batch of data from dataloader. + """ + self.runner.call_hook( + 'before_test_iter', batch_idx=idx, data_batch=data_batch) + + _ = self.runner.model.calibrate_step(data_batch) + + self.runner.call_hook( + 'after_test_iter', + batch_idx=idx, + data_batch=data_batch, + outputs=None) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/runner/slimmable_val_loop.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/runner/slimmable_val_loop.py new file mode 100644 index 0000000000000000000000000000000000000000..d3f5e2a4ee5fd217045f9f2a2672a63fc348b882 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/runner/slimmable_val_loop.py @@ -0,0 +1,58 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import Dict, List, Union + +from mmengine.evaluator import Evaluator +from mmengine.runner import ValLoop +from torch.utils.data import DataLoader + +from mmrazor.models.utils import add_prefix +from mmrazor.registry import LOOPS + + +@LOOPS.register_module() +class SlimmableValLoop(ValLoop): + """Knowledge Distill loop for validation. It is not only validate student, + but also validate teacher with the same dataloader. + + Args: + runner (Runner): A reference of runner. + dataloader (Dataloader or dict): A dataloader object or a dict to + build a dataloader. + evaluator (Evaluator or dict or list): Used for computing metrics. + fp16 (bool): Whether to enable fp16 validation. Defaults to + False. 
+ """ + + def __init__(self, + runner, + dataloader: Union[DataLoader, Dict], + evaluator: Union[Evaluator, Dict, List], + fp16: bool = False) -> None: + super().__init__(runner, dataloader, evaluator, fp16) + + if self.runner.distributed: + model = self.runner.model.module + else: + model = self.runner.model + + # just for convenience + self._model = model + + def run(self): + """Launch validation.""" + self.runner.call_hook('before_val') + + all_metrics = dict() + for subnet_idx, subnet in enumerate(self._model.mutator.subnets): + self.runner.call_hook('before_val_epoch') + self.runner.model.eval() + self._model.mutator.set_choices(subnet) + for idx, data_batch in enumerate(self.dataloader): + self.run_iter(idx, data_batch) + # compute student metrics + metrics = self.evaluator.evaluate(len(self.dataloader.dataset)) + all_metrics.update(add_prefix(metrics, f'subnet_{subnet_idx}')) + + self.runner.call_hook('after_val_epoch', metrics=all_metrics) + + self.runner.call_hook('after_val') diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/runner/subnet_sampler_loop.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/runner/subnet_sampler_loop.py new file mode 100644 index 0000000000000000000000000000000000000000..4f26ee7a23af7c2fbc76300d856621cfb6d69fc7 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/runner/subnet_sampler_loop.py @@ -0,0 +1,338 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
import math
import os
import random
from abc import abstractmethod
from typing import Any, Dict, List, Optional, Sequence, Tuple, Union

import torch
from mmengine import fileio
from mmengine.evaluator import Evaluator
from mmengine.runner import IterBasedTrainLoop
from mmengine.utils import is_list_of
from torch.utils.data import DataLoader

from mmrazor.registry import LOOPS, TASK_UTILS
from mmrazor.structures import Candidates
from mmrazor.utils import SupportRandomSubnet
from .utils import check_subnet_resources


class BaseSamplerTrainLoop(IterBasedTrainLoop):
    """IterBasedTrainLoop for base sampler.

    Subclasses implement :meth:`sample_subnet`; each training iteration first
    samples a subnet, activates it on the supernet, then runs a train step.

    Args:
        runner (Runner): A reference of runner.
        dataloader (Dataloader or dict): A dataloader object or a dict to
            build a dataloader for training the model.
        max_iters (int): Total training iters.
        val_begin (int): The iteration that begins validating.
            Defaults to 1.
        val_interval (int): Validation interval. Defaults to 1000.
    """

    def __init__(self,
                 runner,
                 dataloader: Union[Dict, DataLoader],
                 max_iters: int,
                 val_begin: int = 1,
                 val_interval: int = 1000):
        super().__init__(runner, dataloader, max_iters, val_begin,
                         val_interval)
        # Unwrap the DDP wrapper when running distributed.
        if self.runner.distributed:
            self.model = runner.model.module
        else:
            self.model = runner.model

    @abstractmethod
    def sample_subnet(self) -> SupportRandomSubnet:
        """Sample a subnet to train the supernet."""

    def run_iter(self, data_batch: Sequence[dict]) -> None:
        """Iterate one mini-batch.

        Args:
            data_batch (Sequence[dict]): Batch of data from dataloader.
        """
        self.runner.call_hook(
            'before_train_iter', batch_idx=self._iter, data_batch=data_batch)
        # Enable gradient accumulation mode and avoid unnecessary gradient
        # synchronization during gradient accumulation process.
        # outputs should be a dict of loss.
        subnet = self.sample_subnet()
        self.model.mutator.set_choices(subnet)
        outputs = self.runner.model.train_step(
            data_batch, optim_wrapper=self.runner.optim_wrapper)
        self.runner.message_hub.update_info('train_logs', outputs)

        self.runner.call_hook(
            'after_train_iter',
            batch_idx=self._iter,
            data_batch=data_batch,
            outputs=outputs)
        self._iter += 1


@LOOPS.register_module()
class GreedySamplerTrainLoop(BaseSamplerTrainLoop):
    """IterBasedTrainLoop for greedy sampler.

    In GreedySamplerTrainLoop, `Greedy` means that only use some top
    sampled candidates to train the supernet. So GreedySamplerTrainLoop mainly
    picks the top candidates based on their val scores, then use them to train
    the supernet one by one.

    Steps:
        1. Sample from the supernet and the candidates.
        2. Validate these sampled candidates to get each candidate's score.
        3. Get top-k candidates based on their scores, then use them to train
           the supernet one by one.

    Args:
        runner (Runner): A reference of runner.
        dataloader (Dataloader or dict): A dataloader object or a dict to
            build a dataloader for training the model.
        dataloader_val (Dataloader or dict): A dataloader object or a dict to
            build a dataloader for evaluating the candidates.
        evaluator (Evaluator or dict or list): Used for computing metrics.
        max_iters (int): Total training iters.
        val_begin (int): The iteration that begins validating.
            Defaults to 1.
        val_interval (int): Validation interval. Defaults to 1000.
        score_key (str): Specify one metric in evaluation results to score
            candidates. Defaults to 'accuracy_top-1'.
        constraints_range (Dict[str, Any]): Constraints to be used for
            screening candidates. Defaults to dict(flops=(0, 330)).
        estimator_cfg (dict, Optional): Used for building a resource estimator.
            Defaults to None.
        num_candidates (int): The number of the candidates consist of samples
            from supernet and itself. Defaults to 1000.
        num_samples (int): The number of sample in each sampling subnet.
            Defaults to 10.
        top_k (int): Choose top_k subnet from the candidates used to train
            the supernet. Defaults to 5.
        prob_schedule (str): The schedule to generate the probability of
            sampling from the candidates. The probability will increase from
            [init_prob, max_prob] during [schedule_start_iter,
            schedule_end_iter]. Both of 'linear' schedule and 'consine'
            schedule are supported. Defaults to 'linear'.
        schedule_start_iter (int): The start iter of the prob_schedule.
            Defaults to 10000. 10000 is corresponding to batch_size: 1024.
            You should adapt it based on your batch_size.
        schedule_end_iter (int): The end iter in of the prob_schedule.
            Defaults to 144360. 144360 = 120(epoch) * 1203 (iters/epoch),
            batch_size is 1024. You should adapt it based on the batch_size
            and the total training epochs.
        init_prob (float): The init probability of the prob_schedule.
            Defaults to 0.0.
        max_prob (float): The max probability of the prob_schedule.
            Defaults to 0.8.
    """
+ """ + + def __init__(self, + runner, + dataloader: Union[Dict, DataLoader], + dataloader_val: Union[Dict, DataLoader], + evaluator: Union[Evaluator, Dict, List], + max_iters: int, + val_begin: int = 1, + val_interval: int = 1000, + score_key: str = 'accuracy/top1', + constraints_range: Dict[str, Any] = dict(flops=(0, 330)), + estimator_cfg: Optional[Dict] = None, + num_candidates: int = 1000, + num_samples: int = 10, + top_k: int = 5, + prob_schedule: str = 'linear', + schedule_start_iter: int = 10000, + schedule_end_iter: int = 144360, + init_prob: float = 0., + max_prob: float = 0.8) -> None: + super().__init__(runner, dataloader, max_iters, val_begin, + val_interval) + if isinstance(dataloader_val, dict): + self.dataloader_val = runner.build_dataloader( + dataloader_val, seed=runner.seed) + else: + self.dataloader_val = dataloader_val + + if isinstance(evaluator, dict) or is_list_of(evaluator, dict): + self.evaluator = runner.build_evaluator(evaluator) + else: + self.evaluator = evaluator + + self.score_key = score_key + self.constraints_range = constraints_range + self.num_candidates = num_candidates + self.num_samples = num_samples + self.top_k = top_k + assert prob_schedule in ['linear', 'consine'] + self.prob_schedule = prob_schedule + self.schedule_start_iter = schedule_start_iter + self.schedule_end_iter = schedule_end_iter + self.init_prob = init_prob + self.max_prob = max_prob + self.cur_prob: float = 0. + + self.candidates = Candidates() + self.top_k_candidates = Candidates() + + # initialize estimator + estimator_cfg = dict() if estimator_cfg is None else estimator_cfg + if 'type' not in estimator_cfg: + estimator_cfg['type'] = 'mmrazor.ResourceEstimator' + self.estimator = TASK_UTILS.build(estimator_cfg) + + def run(self) -> None: + """Launch training.""" + self.runner.call_hook('before_train') + # In iteration-based training loop, we treat the whole training process + # as a big epoch and execute the corresponding hook. 
+ self.runner.call_hook('before_train_epoch') + while self._iter < self._max_iters: + self.runner.model.train() + + data_batch = next(self.dataloader_iterator) + self.run_iter(data_batch) + + if (self.runner.val_loop is not None + and self._iter >= self.runner.val_begin + and self._iter % self.runner.val_interval == 0): + self.runner.val_loop.run() + self._save_candidates() + + self.runner.call_hook('after_train_epoch') + self.runner.call_hook('after_train') + + def sample_subnet(self) -> SupportRandomSubnet: + """Sample a subnet from top_k candidates one by one, then to train the + surpernet with the subnet. + + Steps: + 1. Update and get the `top_k_candidates`. + 1.1. Update the prob of sampling from the `candidates` based on + the `prob_schedule` and the current iter. + 1.2. Sample `num_samples` candidates from the supernet and the + `candidates` based on the updated prob(step 1.1). + 1.3. Val all candidates to get their scores, including the + sampled candidates(step 1.2). + 1.4. Update the `top_k_candidates` based on + their scores(step 1.3). + 2. Pop from the `top_k_candidates` one by one to train + the supernet. 
+ """ + if len(self.top_k_candidates) == 0: + self.update_cur_prob(cur_iter=self._iter) + + sampled_candidates, num_sample_from_supernet = \ + self.get_candidates_with_sample(num_samples=self.num_samples) + + self.candidates.extend(sampled_candidates) + + self.update_candidates_scores() + + self.candidates.sort_by(key_indicator='score', reverse=True) + self.candidates = Candidates( + self.candidates.data[:self.num_candidates]) + self.top_k_candidates = Candidates( + self.candidates.data[:self.top_k]) + + top1_score = self.top_k_candidates.scores[0] + if (self._iter % self.val_interval) < self.top_k: + self.runner.logger.info( + f'GreedySampler: [{self._iter:>6d}] ' + f'prob {self.cur_prob:.3f} ' + f'num_sample_from_supernet ' + f'{num_sample_from_supernet}/{self.num_samples} ' + f'top1_score {top1_score:.3f} ' + f'cur_num_candidates: {len(self.candidates)}') + return self.top_k_candidates.subnets[0] + + def update_cur_prob(self, cur_iter: int) -> None: + """update current probablity of sampling from the candidates, which is + generated based on the probablity strategy and current iter.""" + if cur_iter > self.schedule_end_iter: + self.cur_prob = self.max_prob + elif cur_iter < self.schedule_start_iter: + self.cur_prob = self.init_prob + else: + schedule_all_steps = self.schedule_end_iter - \ + self.schedule_start_iter + schedule_cur_steps = cur_iter - self.schedule_start_iter + if self.prob_schedule == 'linear': + tmp = self.max_prob - self.init_prob + self.cur_prob = tmp / schedule_all_steps * schedule_cur_steps + elif self.prob_schedule == 'consine': + tmp_1 = (1 - self.init_prob) * 0.5 + tmp_2 = math.pi * schedule_cur_steps + tmp_3 = schedule_all_steps + self.cur_prob = tmp_1 * (1 + math.cos(tmp_2 / tmp_3)) \ + + self.init_prob + else: + raise ValueError('`prob_schedule` is eroor, it should be \ + one of `linear` and `consine`.') + + def get_candidates_with_sample(self, + num_samples: int) -> Tuple[Candidates, int]: + """Get candidates with sampling from 
supernet and the candidates based + on the current probablity.""" + num_sample_from_supernet = 0 + sampled_candidates = Candidates() + for _ in range(num_samples): + if random.random() >= self.cur_prob or len(self.candidates) == 0: + subnet = self._sample_from_supernet() + is_pass, _ = self._check_constraints(subnet) + if is_pass: + sampled_candidates.append(subnet) + num_sample_from_supernet += 1 + else: + sampled_candidates.append(self._sample_from_candidates()) + return sampled_candidates, num_sample_from_supernet + + def update_candidates_scores(self) -> None: + """Update candidates' scores, which are validated with the + `dataloader_val`.""" + for i, candidate in enumerate(self.candidates.subnets): + self.model.mutator.set_choices(candidate) + metrics = self._val_candidate() + score = metrics[self.score_key] if len(metrics) != 0 else 0. + self.candidates.set_resource(i, score, 'score') + + @torch.no_grad() + def _val_candidate(self) -> Dict: + """Run validation.""" + self.runner.model.eval() + for data_batch in self.dataloader_val: + outputs = self.runner.model.val_step(data_batch) + self.evaluator.process(data_samples=outputs, data_batch=data_batch) + metrics = self.evaluator.evaluate(len(self.dataloader_val.dataset)) + return metrics + + def _sample_from_supernet(self) -> SupportRandomSubnet: + """Sample from the supernet.""" + subnet = self.model.sample_subnet() + return subnet + + def _sample_from_candidates(self) -> SupportRandomSubnet: + """Sample from the candidates.""" + assert len(self.candidates) > 0 + subnet = random.choice(self.candidates.data) + return subnet + + def _check_constraints(self, random_subnet: SupportRandomSubnet): + """Check whether is beyond constraints. + + Returns: + bool, result: The result of checking. 
+ """ + is_pass, results = check_subnet_resources( + model=self.model, + subnet=random_subnet, + estimator=self.estimator, + constraints_range=self.constraints_range) + + return is_pass, results + + def _save_candidates(self) -> None: + """Save the candidates to init the next searching.""" + save_path = os.path.join(self.runner.work_dir, 'candidates.pkl') + fileio.dump(self.candidates, save_path) + self.runner.logger.info(f'candidates.pkl saved in ' + f'{self.runner.work_dir}') diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/runner/subnet_val_loop.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/runner/subnet_val_loop.py new file mode 100644 index 0000000000000000000000000000000000000000..d61e2747fbff138f47002d56dac01346dfabea7f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/runner/subnet_val_loop.py @@ -0,0 +1,96 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import Dict, List, Optional, Union + +from mmengine.evaluator import Evaluator +from mmengine.runner import ValLoop +from torch.utils.data import DataLoader + +from mmrazor.models.utils import add_prefix +from mmrazor.registry import LOOPS, TASK_UTILS +from .utils import CalibrateBNMixin + + +@LOOPS.register_module() +class SubnetValLoop(ValLoop, CalibrateBNMixin): + """Loop for subnet validation in NAS with BN re-calibration. + + Args: + runner (Runner): A reference of runner. + dataloader (Dataloader or dict): A dataloader object or a dict to + build a dataloader. + evaluator (Evaluator or dict or list): Used for computing metrics. + fp16 (bool): Whether to enable fp16 validation. Defaults to + False. + evaluate_fixed_subnet (bool): Whether to evaluate a fixed subnet only + or not. Defaults to False. + calibrate_sample_num (int): The number of images to compute the true + average of per-batch mean/variance instead of the running average. + Defaults to 4096. + estimator_cfg (dict, Optional): Used for building a resource estimator. 
    def __init__(
        self,
        runner,
        dataloader: Union[DataLoader, Dict],
        evaluator: Union[Evaluator, Dict, List],
        fp16: bool = False,
        evaluate_fixed_subnet: bool = False,
        calibrate_sample_num: int = 4096,
        estimator_cfg: Optional[Dict] = dict(type='mmrazor.ResourceEstimator')
    ) -> None:
        super().__init__(runner, dataloader, evaluator, fp16)

        # Unwrap the DDP wrapper so subnet mutation below operates on the
        # raw model rather than the distributed wrapper.
        if self.runner.distributed:
            model = self.runner.model.module
        else:
            model = self.runner.model

        self.model = model
        self.evaluate_fixed_subnet = evaluate_fixed_subnet
        self.calibrate_sample_num = calibrate_sample_num
        # Resource estimator used to report flops/params next to accuracy.
        self.estimator = TASK_UTILS.build(estimator_cfg)

    def run(self):
        """Launch validation.

        Either evaluates one fixed subnet, or — when the model exposes
        `sample_kinds` — evaluates the max/min/random subnets and reports
        each metric set under a `<kind>_subnet` prefix.
        """
        self.runner.call_hook('before_val')
        self.runner.call_hook('before_val_epoch')

        all_metrics = dict()

        if self.evaluate_fixed_subnet:
            metrics = self._evaluate_once()
            all_metrics.update(add_prefix(metrics, 'fix_subnet'))
        elif hasattr(self.model, 'sample_kinds'):
            for kind in self.model.sample_kinds:
                if kind == 'max':
                    self.model.mutator.set_max_choices()
                    metrics = self._evaluate_once()
                    all_metrics.update(add_prefix(metrics, 'max_subnet'))
                elif kind == 'min':
                    self.model.mutator.set_min_choices()
                    metrics = self._evaluate_once()
                    all_metrics.update(add_prefix(metrics, 'min_subnet'))
                elif 'random' in kind:
                    # e.g. 'random0', 'random1': a fresh random subnet each.
                    self.model.mutator.set_choices(
                        self.model.mutator.sample_choices())
                    metrics = self._evaluate_once()
                    all_metrics.update(add_prefix(metrics, f'{kind}_subnet'))

        self.runner.call_hook('after_val_epoch', metrics=all_metrics)
        self.runner.call_hook('after_val')

    def _evaluate_once(self) -> Dict:
        """Evaluate a subnet once with BN re-calibration.

        BN running statistics are re-estimated on the training data before
        evaluation, because stats accumulated by the supernet do not match
        any one sampled subnet.
        """
        self.calibrate_bn_statistics(self.runner.train_dataloader,
                                     self.calibrate_sample_num)
        self.runner.model.eval()
        for idx, data_batch in enumerate(self.dataloader):
            self.run_iter(idx, data_batch)

        metrics = self.evaluator.evaluate(len(self.dataloader.dataset))
        # Attach resource measurements (e.g. flops/params) to the metrics.
        resource_metrics = self.estimator.estimate(self.model)
        metrics.update(resource_metrics)

        return metrics
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Any

import torch
import torch.distributed as dist
from mmengine.runner import Runner, autocast
from torch import Tensor
from torch.nn.modules.batchnorm import _BatchNorm
from torch.utils.data import DataLoader


class AverageMeter:
    """Computes and stores the average and current value.

    When torch.distributed is initialized, `update` aggregates the value
    and batch size across all ranks before accumulating.
    """

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        # `avg` and `sum` start as int 0 and become Tensors after the
        # first `update` call.
        self.avg: Tensor = 0
        self.sum: Tensor = 0
        self.count: int = 0

    def update(self, val: Any, batch_size: int = 1) -> None:
        """Accumulate `val` weighted by `batch_size` and refresh `avg`."""
        # Fix: check availability *before* initialization state;
        # `is_initialized` is only meaningful when distributed support
        # is available.
        if dist.is_available() and dist.is_initialized():
            dist.all_reduce(val, dist.ReduceOp.SUM, async_op=False)
            batch_size_tensor = torch.tensor([batch_size], device=val.device)
            dist.all_reduce(
                batch_size_tensor, dist.ReduceOp.SUM, async_op=False)
            total_batch_size = batch_size_tensor.item()

            # Convert the summed value back to a per-rank-weighted mean.
            val /= (total_batch_size / batch_size)
            batch_size = total_batch_size

        self.sum += val * batch_size
        self.count += batch_size
        self.avg = self.sum / self.count


class CalibrateBNMixin:
    """Mixin that re-estimates BatchNorm running statistics from data."""

    runner: Runner
    fp16: bool = False

    @torch.no_grad()
    def calibrate_bn_statistics(self,
                                dataloader: DataLoader,
                                calibrate_sample_num: int = 2000) -> None:
        """Replace BN running mean/var with averages measured over roughly
        `calibrate_sample_num` samples drawn from `dataloader`."""

        def record_input_statistics_hook(bn_module: _BatchNorm, input: Tensor,
                                         output: Tensor) -> None:
            # Record the per-channel mean/var of each BN input batch.
            mean_average_meter: AverageMeter = bn_module.__mean_average_meter__
            var_average_meter: AverageMeter = bn_module.__var_average_meter__

            real_input = input[0]
            mean = real_input.mean((0, 2, 3))
            var = real_input.var((0, 2, 3), unbiased=True)

            mean_average_meter.update(mean, real_input.size(0))
            var_average_meter.update(var, real_input.size(0))

        hook_handles = []

        for name, module in self.runner.model.named_modules():
            if isinstance(module, _BatchNorm):
                self.runner.logger.debug(
                    'register `record_input_statistics_hook` to module: '
                    f'{name}')
                module.__mean_average_meter__ = AverageMeter()
                module.__var_average_meter__ = AverageMeter()
                handle = module.register_forward_hook(
                    record_input_statistics_hook)
                hook_handles.append(handle)

        try:
            self.runner.model.train()
            self.runner.logger.info('Start calibrating batch norm statistics')
            self.runner.logger.info(
                f'Total sample number for calibration: {calibrate_sample_num}')
            remaining = calibrate_sample_num
            for data_batch in dataloader:
                # Fix: dict batches (the usual mmengine format, see below)
                # cannot be sliced; only truncate sequence-like batches.
                if not isinstance(data_batch, dict) and \
                        len(data_batch) >= remaining:
                    data_batch = data_batch[:remaining]
                if isinstance(data_batch, torch.Tensor):
                    data_batch_nums = len(data_batch)
                else:
                    data_batch_nums = len(data_batch['inputs'])
                if dist.is_available() and dist.is_initialized():
                    # Count samples consumed across all ranks.
                    data_batch_tensor = torch.tensor(
                        [data_batch_nums], device=self.runner.model.device)
                    dist.all_reduce(
                        data_batch_tensor, dist.ReduceOp.SUM, async_op=False)
                    data_batch_nums = data_batch_tensor.item()
                remaining -= data_batch_nums

                self.runner.logger.debug(
                    f'Remaining samples for calibration: {remaining}')
                with autocast(enabled=self.fp16):
                    self.runner.model.test_step(data_batch)

                if remaining <= 0:
                    break

            for name, module in self.runner.model.named_modules():
                if isinstance(module, _BatchNorm):
                    mean_average_meter = module.__mean_average_meter__
                    var_average_meter = module.__var_average_meter__
                    if mean_average_meter.count == 0 or \
                            var_average_meter.count == 0:
                        # Both meters fill together, so a single empty one
                        # indicates a bug rather than an unused layer.
                        assert mean_average_meter.count == 0 and \
                            var_average_meter.count == 0
                        self.runner.logger.debug(
                            f'layer {name} is not chosen, ignored')
                        continue

                    calibrated_bn_mean = mean_average_meter.avg
                    calibrated_bn_var = var_average_meter.avg

                    feature_dim = calibrated_bn_mean.size(0)

                    self.runner.logger.debug(
                        f'layer: {name}, '
                        f'current feature dimension: {feature_dim}, '
                        'number of samples for calibration: '
                        f'{mean_average_meter.count}, '
                        'l2 norm of calibrated running mean: '
                        f'{calibrated_bn_mean.norm()}, '
                        'l2 norm of calibrated running var: '
                        f'{calibrated_bn_var.norm()}, '
                        'l2 norm of original running mean: '
                        f'{module.running_mean[:feature_dim].norm()}, '
                        'l2 norm of original running var: '
                        f'{module.running_var[:feature_dim].norm()}, ')

                    # Only the first `feature_dim` channels are active for
                    # the current subnet; leave the rest untouched.
                    module.running_mean[:feature_dim].copy_(calibrated_bn_mean)
                    module.running_var[:feature_dim].copy_(calibrated_bn_var)

                    del module.__mean_average_meter__
                    del module.__var_average_meter__
        finally:
            # Fix: always detach the temporary hooks, even when a forward
            # pass raises, so the model is left clean for later use.
            self.runner.logger.debug('Remove all hooks for calibration')
            for handle in hook_handles:
                handle.remove()

        self.runner.logger.info('Calibrate batch norm statistics done')
+ """ + if constraints_range is None: + return True, dict() + + assert hasattr(model, 'mutator') and hasattr(model, 'architecture') + model.mutator.set_choices(subnet) + _, sliced_model = export_fix_subnet(model, slice_weight=True) + + model_to_check = sliced_model.architecture # type: ignore + if isinstance(model_to_check, BaseDetector): + results = estimator.estimate(model=model_to_check.backbone) + else: + results = estimator.estimate(model=model_to_check) + + for k, v in constraints_range.items(): + if not isinstance(v, (list, tuple)): + v = (0, v) + if results[k] < v[0] or results[k] > v[1]: + return False, results + + return True, results diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/runner/utils/genetic.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/runner/utils/genetic.py new file mode 100644 index 0000000000000000000000000000000000000000..12a4e7863eb785507c3db2430aef590c98d71716 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/engine/runner/utils/genetic.py @@ -0,0 +1,31 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy + +import numpy as np + +from mmrazor.utils import SingleMutatorRandomSubnet + + +def crossover(random_subnet1: SingleMutatorRandomSubnet, + random_subnet2: SingleMutatorRandomSubnet, + prob: float = 0.5) -> SingleMutatorRandomSubnet: + """Crossover in genetic algorithm. + + Args: + random_subnet1 (SINGLE_MUTATOR_RANDOM_SUBNET): One of the subnets to + crossover. + random_subnet2 (SINGLE_MUTATOR_RANDOM_SUBNET): One of the subnets to + crossover. + prob (float): The probablity of getting choice from `random_subnet2`. + Defaults to 0.5. + + Returns: + SINGLE_MUTATOR_RANDOM_SUBNET: The result of crossover. + """ + assert prob >= 0. 
and prob <= 1., \ + 'The probability of crossover has to be between 0 and 1' + crossover_subnet = copy.deepcopy(random_subnet1) + for group_id, choice in random_subnet2.items(): + if np.random.random_sample() < prob: + crossover_subnet[group_id] = choice + return crossover_subnet diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/implementations/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/implementations/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..a03158f5343d5e24666adaf4394b3fb5a8461302 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/implementations/__init__.py @@ -0,0 +1,13 @@ +# Copyright (c) OpenMMLab. All rights reserved. +"""impl folder is an experimental file structure to store algorithm +implementations. + +Previous file structure splits the files of an algorithm into different folders +according to the types of these files. It may make it hard to understand an +algorithm. So we add the impl folder, where all files of an algorithm are +stored in one folder. As this structure is experimental, it may change rapidly. +""" + +from . import pruning # noqa + +__all__ = ['pruning'] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/implementations/pruning/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/implementations/pruning/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e28ae7dc2db74e246845c95a8cf823f3336d41a4 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/implementations/pruning/__init__.py @@ -0,0 +1,4 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from . 
import group_fisher + +__all__ = ['group_fisher'] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/implementations/pruning/group_fisher/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/implementations/pruning/group_fisher/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..5dd85ce3c13942b365f27126a32d77d2a70847e1 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/implementations/pruning/group_fisher/__init__.py @@ -0,0 +1,24 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .algorithm import GroupFisherAlgorithm +from .counters import GroupFisherConv2dCounter, GroupFisherLinearCounter +from .hook import PruningStructureHook, ResourceInfoHook +from .mutator import GroupFisherChannelMutator +from .ops import GroupFisherConv2d, GroupFisherLinear, GroupFisherMixin +from .prune_deploy_sub_model import GroupFisherDeploySubModel +from .prune_sub_model import GroupFisherSubModel +from .unit import GroupFisherChannelUnit + +__all__ = [ + 'GroupFisherDeploySubModel', + 'GroupFisherSubModel', + 'GroupFisherAlgorithm', + 'GroupFisherConv2dCounter', + 'GroupFisherLinearCounter', + 'PruningStructureHook', + 'ResourceInfoHook', + 'GroupFisherChannelMutator', + 'GroupFisherChannelUnit', + 'GroupFisherConv2d', + 'GroupFisherLinear', + 'GroupFisherMixin', +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/implementations/pruning/group_fisher/algorithm.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/implementations/pruning/group_fisher/algorithm.py new file mode 100644 index 0000000000000000000000000000000000000000..a90b406db8405aac0462bb2c6efb57ed385ce9aa --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/implementations/pruning/group_fisher/algorithm.py @@ -0,0 +1,86 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
from typing import Dict, Optional, Union

import torch
import torch.distributed as dist
import torch.nn as nn
from mmengine.logging import print_log
from mmengine.model import BaseModel, MMDistributedDataParallel

from mmrazor.models.algorithms.base import BaseAlgorithm
from mmrazor.registry import MODEL_WRAPPERS, MODELS
from mmrazor.utils import RuntimeInfo
from .mutator import GroupFisherChannelMutator


@MODELS.register_module()
class GroupFisherAlgorithm(BaseAlgorithm):
    """`Group Fisher Pruning for Practical Network Compression`.
    https://arxiv.org/pdf/2108.00708.pdf.

    Args:
        architecture (Union[BaseModel, Dict]): The model to be pruned.
        mutator (Union[Dict, ChannelMutator], optional): The config
            of a mutator. Defaults to dict( type='GroupFisherChannelMutator',
            channel_unit_cfg=dict( type='GroupFisherChannelUnit')).
        interval (int): The interval of pruning two channels. Defaults to 10.
        data_preprocessor (Optional[Union[Dict, nn.Module]], optional):
            Defaults to None.
        init_cfg (Optional[Dict], optional): init config for the model.
            Defaults to None.
    """

    def __init__(self,
                 architecture: Union[BaseModel, Dict],
                 mutator: Union[Dict, GroupFisherChannelMutator] = dict(
                     type='GroupFisherChannelMutator',
                     channel_unit_cfg=dict(type='GroupFisherChannelUnit')),
                 interval: int = 10,
                 data_preprocessor: Optional[Union[Dict, nn.Module]] = None,
                 init_cfg: Optional[Dict] = None) -> None:

        super().__init__(architecture, data_preprocessor, init_cfg)

        self.interval = interval

        # using sync bn or normal bn
        # Fisher statistics are accumulated across ranks, so BN must also
        # be synchronized in distributed runs; otherwise any stray SyncBN
        # layers are reverted to plain BN for single-process training.
        if dist.is_initialized():
            print_log('Convert Bn to SyncBn.')
            self.architecture = nn.SyncBatchNorm.convert_sync_batchnorm(
                self.architecture)
        else:
            from mmengine.model import revert_sync_batchnorm
            self.architecture = revert_sync_batchnorm(self.architecture)

        # mutator
        self.mutator: GroupFisherChannelMutator = MODELS.build(mutator)
        self.mutator.prepare_from_supernet(self.architecture)

    def train_step(self, data: Union[dict, tuple, list],
                   optim_wrapper) -> Dict[str, torch.Tensor]:
        # Delegates to `_train_step` so the DDP wrapper below can reuse the
        # exact same logic on its wrapped module.
        return self._train_step(data, optim_wrapper)

    def _train_step(self, data: Union[dict, tuple, list], optim_wrapper):
        """Train step function for GroupFisherAlgorithm and GroupFisherDDP."""
        # Record layer inputs/gradients only for the duration of this step,
        # then fold them into the per-unit Fisher importance.
        self.mutator.start_record_info()
        res = super().train_step(data, optim_wrapper)
        self.mutator.end_record_info()

        self.mutator.update_imp()
        self.mutator.reset_recorded_info()

        # Prune one channel every `interval` iterations, then restart the
        # Fisher accumulation from scratch.
        if RuntimeInfo.iter() % self.interval == 0:
            self.mutator.try_prune()
            self.mutator.reset_imp()

        return res


@MODEL_WRAPPERS.register_module()
class GroupFisherDDP(MMDistributedDataParallel):
    """Train step for group fisher."""

    def train_step(self, data: Union[dict, tuple, list],
                   optim_wrapper) -> Dict[str, torch.Tensor]:
        # Call the algorithm's `_train_step` as an unbound function on the
        # wrapped module so pruning bookkeeping runs inside DDP training.
        algorithm = self.module
        return GroupFisherAlgorithm._train_step(algorithm, data, optim_wrapper)
new file mode 100644 index 0000000000000000000000000000000000000000..a8888e1dd245ae6a02c62a4530b95404f80d8dbb --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/implementations/pruning/group_fisher/counters.py @@ -0,0 +1,16 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from mmrazor.models.task_modules.estimators.counters import ( + DynamicConv2dCounter, DynamicLinearCounter) +from mmrazor.registry import TASK_UTILS + + +@TASK_UTILS.register_module() +class GroupFisherConv2dCounter(DynamicConv2dCounter): + """Counter of GroupFisherConv2d.""" + pass + + +@TASK_UTILS.register_module() +class GroupFisherLinearCounter(DynamicLinearCounter): + """Counter of GroupFisherLinear.""" + pass diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/implementations/pruning/group_fisher/hook.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/implementations/pruning/group_fisher/hook.py new file mode 100644 index 0000000000000000000000000000000000000000..524503dc1df6dc76a8eed1229a4e083b4e6b7513 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/implementations/pruning/group_fisher/hook.py @@ -0,0 +1,198 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
import torch
import torch.nn as nn
from mmengine.dist import master_only
from mmengine.hooks import Hook
from mmengine.runner import Runner, save_checkpoint
from torch import distributed as torch_dist

from mmrazor.models.algorithms import BaseAlgorithm
from mmrazor.models.mutators.channel_mutator.channel_mutator import \
    ChannelMutator
from mmrazor.models.task_modules.demo_inputs import DefaultDemoInput
from mmrazor.models.task_modules.estimators import ResourceEstimator
from mmrazor.registry import HOOKS, TASK_UTILS
from mmrazor.utils import RuntimeInfo, print_log


def get_model_from_runner(runner):
    """Get the (unwrapped) model from a runner."""
    if torch_dist.is_initialized():
        return runner.model.module
    else:
        return runner.model


def is_pruning_algorithm(algorithm):
    """Check whether a model is a pruning algorithm."""
    return isinstance(algorithm, BaseAlgorithm) \
        and isinstance(getattr(algorithm, 'mutator', None), ChannelMutator)  # noqa


@HOOKS.register_module()
class PruningStructureHook(Hook):
    """This hook is used to display the structure information during pruning.

    Args:
        by_epoch (bool, optional): Whether to display structure information
            iteratively by epoch. Defaults to True.
        interval (int, optional): The interval between two structure
            information display.
    """

    def __init__(self, by_epoch=True, interval=1) -> None:

        super().__init__()
        self.by_epoch = by_epoch
        self.interval = interval

    def show_unit_info(self, algorithm):
        """Show unit information of an algorithm."""
        if is_pruning_algorithm(algorithm):
            # Fix: variable was misspelled `chices`.
            choices = algorithm.mutator.choice_template
            import json
            print_log(json.dumps(choices, indent=4))

            for unit in algorithm.mutator.mutable_units:
                if hasattr(unit, 'importance'):
                    imp = unit.importance()
                    print_log(
                        f'{unit.name}: \t{imp.min().item()}\t{imp.max().item()}'  # noqa
                    )

    @master_only
    def show(self, runner):
        """Show pruning algorithm information of a runner."""
        algorithm = get_model_from_runner(runner)
        if is_pruning_algorithm(algorithm):
            self.show_unit_info(algorithm)

    # hook points

    def after_train_epoch(self, runner) -> None:
        if self.by_epoch and RuntimeInfo.epoch() % self.interval == 0:
            self.show(runner)

    def after_train_iter(self, runner, batch_idx: int, data_batch,
                         outputs) -> None:
        if not self.by_epoch and RuntimeInfo.iter() % self.interval == 0:
            self.show(runner)


def input_generator_wrapper(model, demp_input: DefaultDemoInput):
    """Wrap a DemoInput as the `input_constructor` the estimator expects."""

    def input_generator(input_shape):
        # `input_shape` is ignored: the demo input decides the real shape.
        res = demp_input.get_data(model)
        return res

    return input_generator


@HOOKS.register_module()
class ResourceInfoHook(Hook):
    """This hook is used to display the resource related information and save
    the checkpoint according to a threshold during pruning.

    Args:
        demo_input (dict, optional): the demo input for ResourceEstimator.
            Defaults to DefaultDemoInput([1, 3, 224, 224]).
        interval (int, optional): the interval to check the resource. Defaults
            to 10.
        resource_type (str, optional): the type of resource to check.
            Defaults to 'flops'.
        save_ckpt_thr (list, optional): the threshold to save checkpoint.
            Defaults to [0.5].
        early_stop (bool, optional): whether to stop when all checkpoints have
            been saved according to save_ckpt_thr. Defaults to True.
    """

    def __init__(self,
                 demo_input=DefaultDemoInput([1, 3, 224, 224]),
                 interval=10,
                 resource_type='flops',
                 save_ckpt_thr=[0.5],
                 early_stop=True) -> None:

        super().__init__()
        if isinstance(demo_input, dict):
            demo_input = TASK_UTILS.build(demo_input)

        self.demo_input = demo_input
        self.save_ckpt_thr = sorted(
            save_ckpt_thr, reverse=True)  # big to small
        self.resource_type = resource_type
        self.early_stop = early_stop
        self.estimator: ResourceEstimator = TASK_UTILS.build(
            dict(
                _scope_='mmrazor',
                type='ResourceEstimator',
                flops_params_cfg=dict(
                    input_shape=tuple(demo_input.input_shape), )))
        self.interval = interval
        self.origin_delta = None

    def before_run(self, runner) -> None:
        """Init original_resource."""
        model = get_model_from_runner(runner)
        original_resource = self._evaluate(model)
        print_log(f'get original resource: {original_resource}')

        self.origin_delta = original_resource[self.resource_type]

    # save checkpoint

    def after_train_iter(self,
                         runner: Runner,
                         batch_idx: int,
                         data_batch=None,
                         outputs=None) -> None:
        """Check resource after train iteration."""
        if RuntimeInfo.iter() % self.interval == 0 and len(
                self.save_ckpt_thr) > 0:
            model = get_model_from_runner(runner)
            current_delta = self._evaluate(model)[self.resource_type]
            percent = current_delta / self.origin_delta
            # Thresholds are sorted big->small, so only the head can fire.
            if percent < self.save_ckpt_thr[0]:
                self._save_checkpoint(model, runner.work_dir,
                                      self.save_ckpt_thr.pop(0))
            # NOTE(review): `exit()` hard-stops the whole process once all
            # thresholds are met; a raised exception or runner-level stop
            # would be cleaner — confirm before changing behavior.
            if self.early_stop and len(self.save_ckpt_thr) == 0:
                exit()

    # show info

    @master_only
    def after_train_epoch(self, runner) -> None:
        """Check resource after train epoch."""
        model = get_model_from_runner(runner)
        current_delta = self._evaluate(model)[self.resource_type]
        print_log(
            f'current {self.resource_type}: {current_delta} / {self.origin_delta}'  # noqa
        )

    #

    def _evaluate(self, model: nn.Module):
        """Evaluate the resource required by a model."""
        with torch.no_grad():
            # Preserve train/eval mode across the measurement.
            training = model.training
            model.eval()
            res = self.estimator.estimate(
                model,
                flops_params_cfg=dict(
                    input_constructor=input_generator_wrapper(
                        model,
                        self.demo_input,
                    )))
            if training:
                model.train()
            return res

    @master_only
    def _save_checkpoint(self, model, path, delta_percent):
        """Save the checkpoint of a model."""
        ckpt = {'state_dict': model.state_dict()}
        save_path = f'{path}/{self.resource_type}_{delta_percent:.2f}.pth'
        save_checkpoint(ckpt, save_path)
        print_log(
            f'Save checkpoint to {save_path} with {self._evaluate(model)}'  # noqa
        )
+ """ + + def __init__(self, + channel_unit_cfg: Union[dict, + Type[GroupFisherChannelUnit]] = dict( + type='GroupFisherChannelUnit'), + parse_cfg: Dict = dict( + type='ChannelAnalyzer', + demo_input=(1, 3, 224, 224), + tracer_type='FxTracer'), + **kwargs) -> None: + super().__init__(channel_unit_cfg, parse_cfg, **kwargs) + self.mutable_units: List[GroupFisherChannelUnit] + + def start_record_info(self) -> None: + """Start recording the related information.""" + for unit in self.mutable_units: + unit.start_record_fisher_info() + + def end_record_info(self) -> None: + """Stop recording the related information.""" + for unit in self.mutable_units: + unit.end_record_fisher_info() + + def reset_recorded_info(self) -> None: + """Reset the related information.""" + for unit in self.mutable_units: + unit.reset_recorded() + + def try_prune(self) -> None: + """Prune the channel with the minimum fisher unless it is the last + channel of the current layer.""" + min_imp = 1e5 + min_unit = self.mutable_units[0] + for unit in self.mutable_units: + if unit.mutable_channel.activated_channels > 1: + imp = unit.importance() + if imp.isnan().any(): + if dist.get_rank() == 0: + print_log( + f'{unit.name} detects nan in importance, this pruning skips.' 
# noqa + ) + return + if imp.min() < min_imp: + min_imp = imp.min().item() + min_unit = unit + if min_unit.try_to_prune_min_channel(): + if dist.get_rank() == 0: + print_log( + f'{min_unit.name} prunes a channel with min imp = {min_imp}' # noqa + ) + + def update_imp(self) -> None: + """Update the fisher information of each unit.""" + for unit in self.mutable_units: + unit.update_fisher_info() + + def reset_imp(self) -> None: + """Reset the fisher information of each unit.""" + for unit in self.mutable_units: + unit.reset_fisher_info() diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/implementations/pruning/group_fisher/ops.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/implementations/pruning/group_fisher/ops.py new file mode 100644 index 0000000000000000000000000000000000000000..35dbbd7496d7f831867fa66eb7c9be8c31bc2f83 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/implementations/pruning/group_fisher/ops.py @@ -0,0 +1,150 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
from typing import List

import torch

from mmrazor.models.architectures.dynamic_ops.bricks.dynamic_conv import \
    DynamicConv2d
from mmrazor.models.architectures.dynamic_ops.bricks.dynamic_linear import \
    DynamicLinear


class GroupFisherMixin:
    """The mixin class for GroupFisher ops.

    Adds forward/backward hooks that record the activations, gradients and
    output shapes needed to compute fisher information, plus per-channel
    flop/memory deltas used for importance normalization.
    """

    def _init(self) -> None:
        # Hook handles plus the per-batch recordings they populate.
        self.handlers: list = []
        self.recorded_input: List = []
        self.recorded_grad: List = []
        self.recorded_out_shape: List = []

    def forward_hook_wrapper(self):
        """Wrap the hook used in forward."""

        def forward_hook(module: GroupFisherMixin, input, output):
            module.recorded_out_shape.append(output.shape)
            # Only the first positional input tensor is recorded.
            module.recorded_input.append(input[0])

        return forward_hook

    def backward_hook_wrapper(self):
        """Wrap the hook used in backward."""

        def backward_hook(module: GroupFisherMixin, grad_in, grad_out):
            # Backward hooks fire in reverse order of forward; insert at the
            # front so recorded_grad[i] pairs with recorded_input[i].
            module.recorded_grad.insert(0, grad_in[0])

        return backward_hook

    def start_record(self: torch.nn.Module) -> None:
        """Start recording information during forward and backward."""
        self.end_record()  # ensure to run start_record only once
        self.handlers.append(
            self.register_forward_hook(self.forward_hook_wrapper()))
        # NOTE(review): register_backward_hook is deprecated in recent torch
        # in favor of register_full_backward_hook -- confirm target version.
        self.handlers.append(
            self.register_backward_hook(self.backward_hook_wrapper()))

    def end_record(self):
        """Stop recording information during forward and backward."""
        for handle in self.handlers:
            handle.remove()
        self.handlers = []

    def reset_recorded(self):
        """Reset the recorded information."""
        self.recorded_input = []
        self.recorded_grad = []
        self.recorded_out_shape = []

    @property
    def delta_flop_of_a_out_channel(self):
        raise NotImplementedError()

    @property
    def delta_flop_of_a_in_channel(self):
        raise NotImplementedError()

    @property
    def delta_memory_of_a_out_channel(self):
        raise NotImplementedError()


class GroupFisherConv2d(DynamicConv2d, GroupFisherMixin):
    """The Dynamic Conv2d operation used in GroupFisher Algorithm."""

    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self._init()

    @property
    def delta_flop_of_a_out_channel(self) -> torch.Tensor:
        """Calculate the summation of flops when prune an out_channel."""
        delta_flop_sum = 0
        for shape in self.recorded_out_shape:
            _, _, h, w = shape
            # Currently-active input channels from the mutable mask.
            in_c = int(self.mutable_attrs['in_channels'].current_mask.float().
                       sum().item())
            # normal conv
            if self.groups == 1:
                delta_flop = h * w * self.kernel_size[0] * self.kernel_size[
                    1] * in_c
            # dwconv
            elif self.groups == self.in_channels == self.out_channels:
                delta_flop = h * w * self.kernel_size[0] * self.kernel_size[1]
            # groupwise conv
            else:
                raise NotImplementedError()
            delta_flop_sum += delta_flop
        return delta_flop_sum

    @property
    def delta_flop_of_a_in_channel(self):
        """Calculate the summation of flops when prune an in_channel."""
        delta_flop_sum = 0
        for shape in self.recorded_out_shape:
            _, out_c, h, w = shape
            # normal conv
            if self.groups == 1:
                delta_flop = h * w * self.kernel_size[0] * self.kernel_size[
                    1] * out_c
            # dwconv
            elif self.groups == self.in_channels == self.out_channels:
                delta_flop = h * w * self.kernel_size[0] * self.kernel_size[1]
            # groupwise conv
            else:
                raise NotImplementedError()
            delta_flop_sum += delta_flop
        return delta_flop_sum

    @property
    def delta_memory_of_a_out_channel(self):
        """Calculate the summation of memory when prune a channel."""
        delta_flop_sum = 0
        for shape in self.recorded_out_shape:
            _, _, h, w = shape
            delta_flop_sum += h * w
        return delta_flop_sum


class GroupFisherLinear(DynamicLinear, GroupFisherMixin):
    """The Dynamic Linear operation used in GroupFisher Algorithm."""

    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self._init()

    @property
    def delta_flop_of_a_out_channel(self):
        """Calculate the summation of flops when prune an out_channel."""
        in_c = self.mutable_attrs['in_channels'].current_mask.float().sum()
        return in_c * len(self.recorded_out_shape)

    @property
    def delta_flop_of_a_in_channel(self):
        """Calculate the summation of flops when prune an in_channel."""
        out_c = self.mutable_attrs['out_channels'].current_mask.float().sum()
        return out_c * len(self.recorded_out_shape)

    @property
    def delta_memory_of_a_out_channel(self):
        """Calculate the summation of memory when prune a channel."""
        return 1 * len(self.recorded_out_shape)
diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/implementations/pruning/group_fisher/prune_deploy_sub_model.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/implementations/pruning/group_fisher/prune_deploy_sub_model.py
new file mode 100644
index 0000000000000000000000000000000000000000..00a569ef5a5e139ea0c2b5eab0fd53a0b7cd84d8
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/implementations/pruning/group_fisher/prune_deploy_sub_model.py
@@ -0,0 +1,78 @@
# Copyright (c) OpenMMLab. All rights reserved.
import json
import types
from typing import Union

import torch.nn as nn
from mmengine import fileio

from mmrazor.models.utils.expandable_utils import make_channel_divisible
from mmrazor.registry import MODELS
from mmrazor.structures.subnet.fix_subnet import (export_fix_subnet,
                                                 load_fix_subnet)
from mmrazor.utils import print_log


def post_process_for_mmdeploy_wrapper(divisor):
    """Return a post-process callable (bound later as a method) that makes
    the model's channel numbers divisible by ``divisor``."""

    def post_process_for_mmdeploy(model: nn.Module):
        s = make_channel_divisible(model, divisor=divisor)
        print_log(f'structure after make divisible: {json.dumps(s,indent=4)}')

    return post_process_for_mmdeploy


@MODELS.register_module()
def GroupFisherDeploySubModel(architecture,
                              fix_subnet: Union[dict, str] = {},
                              divisor=1,
                              parse_cfg=dict(
                                  _scope_='mmrazor',
                                  type='ChannelAnalyzer',
                                  demo_input=(1, 3, 224, 224),
                                  tracer_type='FxTracer'),
                              **kwargs):
    """Convert an architecture to a pruned static architecture for mmdeploy.

    Args:
        architecture (Union[nn.Module, dict]): the model to be pruned.
        fix_subnet (Union[dict, str]): the channel remaining ratio for each
            unit, or the path of a file including this info. Defaults to {}.
        divisor (int, optional): The divisor to make the channel number
            divisible. Defaults to 1.
        parse_cfg (dict, optional): The args for channel mutator.
    Returns:
        BaseModel: a BaseModel of mmengine.
    """
    # import avoid circular import
    from mmrazor.models.mutables import SequentialMutableChannelUnit
    from mmrazor.models.mutators import ChannelMutator
    from mmrazor.models.utils.expandable_utils.unit import ExpandableUnit

    # build architecture
    if isinstance(architecture, dict):
        architecture = MODELS.build(architecture)
    assert isinstance(architecture, nn.Module)

    # to dynamic model
    mutator = ChannelMutator[ExpandableUnit](
        channel_unit_cfg=SequentialMutableChannelUnit, parse_cfg=parse_cfg)

    mutator.prepare_from_supernet(architecture)
    if isinstance(fix_subnet, str):
        # A string is treated as a path to a serialized choice dict.
        fix_subnet = fileio.load(fix_subnet)
    assert isinstance(fix_subnet, dict)
    mutator.set_choices(fix_subnet)
    print_log(json.dumps(mutator.current_choices, indent=4))

    fix_subnet = export_fix_subnet(architecture)[0]
    load_fix_subnet(architecture, fix_subnet)

    # cooperate with mmdeploy to make the channel divisible after load
    # the checkpoint.
    if divisor != 1:
        setattr(
            architecture, 'post_process_for_mmdeploy',
            types.MethodType(
                post_process_for_mmdeploy_wrapper(divisor), architecture))
    return architecture
diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/implementations/pruning/group_fisher/prune_sub_model.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/implementations/pruning/group_fisher/prune_sub_model.py
new file mode 100644
index 0000000000000000000000000000000000000000..87a77346dd939233ab7c0b5e5afc06aba1b6ec60
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/implementations/pruning/group_fisher/prune_sub_model.py
@@ -0,0 +1,105 @@
# Copyright (c) OpenMMLab. All rights reserved.
import json
import types

import torch.nn as nn
from mmengine import dist, fileio
from mmengine.model import BaseModel, BaseModule

from mmrazor.models.algorithms import BaseAlgorithm
from mmrazor.models.utils.expandable_utils import make_channel_divisible
from mmrazor.registry import MODELS
from mmrazor.structures.subnet.fix_subnet import (export_fix_subnet,
                                                 load_fix_subnet)
from mmrazor.utils import RuntimeInfo, print_log


def clean_params_init_info(model: nn.Module):
    """Clean param init info.

    Removes the bookkeeping attribute left by mmengine weight init so the
    model can be safely deep-copied / serialized afterwards.
    """
    if hasattr(model, '_params_init_info'):
        delattr(model, '_params_init_info')
    for module in model.modules():
        if hasattr(module, '_params_init_info'):
            delattr(module, '_params_init_info')


def clean_init_cfg(model: BaseModule):
    """Clean init cfg of every submodule (the root model's cfg is kept)."""
    for module in model.modules():
        if module is model:
            continue
        if isinstance(module, BaseModule):
            module.init_cfg = {}


def hacky_init_weights_wrapper(fix_subnet):
    """This init weight method is used to prevent the model init again after
    build.

    Besides, It also save fix_subnet.json after RuntimeInfo is ready.
    """

    def hacky_init_weights(model):
        # Only rank 0 writes the file; best-effort because RuntimeInfo may
        # not be initialized yet (e.g. outside a runner).
        if dist.get_rank() == 0:
            try:
                work_dir = RuntimeInfo.work_dir()
                fileio.dump(
                    fix_subnet, work_dir + '/fix_subnet.json', indent=4)
                print_log(
                    f'save pruning structure in {work_dir}/fix_subnet.json')
            except Exception:
                pass

    return hacky_init_weights


@MODELS.register_module()
def GroupFisherSubModel(
    algorithm,
    divisor=1,
    **kargs,
):
    """Convert an algorithm (with an architecture) to a static pruned
    architecture.

    Args:
        algorithm (Union[BaseAlgorithm, dict]): The pruning algorithm to
            finetune.
        divisor (int): The divisor to make the channel number
            divisible. Defaults to 1.

    Returns:
        nn.Module: a static model.
    """
    # init algorithm
    if isinstance(algorithm, dict):
        algorithm = MODELS.build(algorithm)  # type: ignore
    assert isinstance(algorithm, BaseAlgorithm)
    algorithm.init_weights()
    clean_params_init_info(algorithm)

    pruning_structure = algorithm.mutator.choice_template
    print_log('PruneSubModel get pruning structure:')
    print_log(json.dumps(pruning_structure, indent=4))

    # to static model
    fix_mutable = export_fix_subnet(algorithm.architecture)[0]
    load_fix_subnet(algorithm.architecture, fix_mutable)
    model = algorithm.architecture

    # make channel divisible
    if divisor != 1:
        divisible_structure = make_channel_divisible(
            model, divisor=divisor, zero_weight=False)

        print_log('PruneSubModel get divisible pruning structure:')
        print_log(json.dumps(divisible_structure, indent=4))
        pruning_structure = divisible_structure

    # refine model
    model.data_preprocessor = algorithm.data_preprocessor
    if isinstance(model, BaseModel):
        model.init_cfg = None
        # Replace init_weights so re-building does not overwrite the pruned
        # weights; the wrapper also persists the pruning structure.
        model.init_weights = types.MethodType(
            hacky_init_weights_wrapper(pruning_structure), model)
    return model
diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/implementations/pruning/group_fisher/unit.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/implementations/pruning/group_fisher/unit.py
new file mode 100644
index 0000000000000000000000000000000000000000..1c9128b78a3c1050acc2c6d9b462afa199f86c31
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/implementations/pruning/group_fisher/unit.py
@@ -0,0 +1,230 @@
# Copyright (c) OpenMMLab. All rights reserved.
from typing import List

import torch
import torch.nn as nn
from mmengine.model.utils import _BatchNormXd
from mmengine.utils.dl_utils.parrots_wrapper import \
    SyncBatchNorm as EngineSyncBatchNorm
from torch import distributed as dist

import mmrazor.models.architectures.dynamic_ops as dynamic_ops
from mmrazor.models.mutables.mutable_channel.mutable_channel_container import \
    MutableChannelContainer
from mmrazor.models.mutables.mutable_channel.units.l1_mutable_channel_unit import \
    L1MutableChannelUnit  # noqa
from mmrazor.registry import MODELS
from .ops import GroupFisherConv2d, GroupFisherLinear, GroupFisherMixin


@MODELS.register_module()
class GroupFisherChannelUnit(L1MutableChannelUnit):
    """ChannelUnit for GroupFisher Pruning Algorithm.

    Args:
        num_channels (int): Number of channels.
        normalization_type (str): Type of normalization. It can be one of
            ['flops','act','none',]. Defaults to 'flops'.
        mutate_linear (bool): Whether to prune linear layers.
    """

    def __init__(self,
                 num_channels: int,
                 normalization_type: str = 'flops',
                 mutate_linear=False,
                 *args) -> None:
        super().__init__(num_channels, *args)
        # Running (normalized) fisher info, one scalar per channel, stored
        # as a buffer so it follows the module's device/state_dict.
        normalized_fisher_info = torch.zeros([self.num_channels])
        self.register_buffer('normalized_fisher_info', normalized_fisher_info)
        self.normalized_fisher_info: torch.Tensor

        self.hook_handles: List = []
        assert normalization_type in ['flops', 'act', 'none']
        self.delta_type = normalization_type

        self.mutate_linear = mutate_linear

    def prepare_for_pruning(self, model: nn.Module) -> None:
        """Prepare for pruning, including register mutable channels.

        Args:
            model (nn.Module): The model need to be pruned.
        """
        # register MutableMask
        self._replace_with_dynamic_ops(
            model, {
                nn.Conv2d: GroupFisherConv2d,
                nn.BatchNorm2d: dynamic_ops.DynamicBatchNorm2d,
                nn.Linear: GroupFisherLinear,
                nn.SyncBatchNorm: dynamic_ops.DynamicSyncBatchNorm,
                EngineSyncBatchNorm: dynamic_ops.DynamicSyncBatchNorm,
                _BatchNormXd: dynamic_ops.DynamicBatchNormXd,
            })
        self._register_channel_container(model, MutableChannelContainer)
        self._register_mutable_channel(self.mutable_channel)

    # prune
    def try_to_prune_min_channel(self) -> bool:
        """Prune the channel with the minimum value of fisher information.

        Returns False (and prunes nothing) when only one channel is left.
        """
        if self.mutable_channel.activated_channels > 1:
            imp = self.importance()
            index = imp.argmin()
            self.mutable_channel.mask.scatter_(0, index, 0.0)
            return True
        else:
            return False

    @property
    def is_mutable(self) -> bool:
        """Whether the unit is mutable.

        Units feeding a Linear layer are frozen unless ``mutate_linear``.
        """
        mutable = super().is_mutable
        if self.mutate_linear:
            return mutable
        else:
            has_linear = False
            for layer in self.input_related:
                if isinstance(layer.module, nn.Linear):
                    has_linear = True
            return mutable and (not has_linear)

    @property
    def input_related_dynamic_ops(self):
        # Yield only the GroupFisher-aware ops among input-related modules.
        for channel in self.input_related:
            if isinstance(channel.module, GroupFisherMixin):
                yield channel.module

    @property
    def output_related_dynamic_ops(self):
        # Yield only the GroupFisher-aware ops among output-related modules.
        for channel in self.output_related:
            if isinstance(channel.module, GroupFisherMixin):
                yield channel.module

    @property
    def dynamic_ops(self):
        for module in self.input_related_dynamic_ops:
            yield module
        for module in self.output_related_dynamic_ops:
            yield module

    # fisher information recorded

    def start_record_fisher_info(self) -> None:
        """Start recording the related fisher info of each channel."""
        for module in self.dynamic_ops:
            module.start_record()

    def end_record_fisher_info(self) -> None:
        """Stop recording the related fisher info of each channel."""
        for module in self.dynamic_ops:
            module.end_record()

    def reset_recorded(self) -> None:
        """Reset the recorded info of each channel."""
        for module in self.dynamic_ops:
            module.reset_recorded()

    # fisher related computation

    def importance(self):
        """The importance of each channel.

        Already-pruned channels are filled with max+1 so they are never
        selected as the minimum again.
        """
        fisher = self.normalized_fisher_info.clone()
        mask = self.mutable_channel.current_mask
        n_mask = (1 - mask.float()).bool()
        fisher.masked_fill_(n_mask, fisher.max() + 1)
        return fisher

    def reset_fisher_info(self) -> None:
        """Reset the related fisher info."""
        self.normalized_fisher_info.zero_()

    @torch.no_grad()
    def update_fisher_info(self) -> None:
        """Update the fisher info of each channel."""

        batch_fisher_sum = self.current_batch_fisher
        assert isinstance(batch_fisher_sum, torch.Tensor)
        if dist.is_initialized():
            # Sum the per-rank fisher so all ranks accumulate the same value.
            dist.all_reduce(batch_fisher_sum)
        batch_fisher_sum = self._get_normalized_fisher_info(
            batch_fisher_sum, self.delta_type)
        self.normalized_fisher_info = self.normalized_fisher_info + batch_fisher_sum  # noqa

    @property
    def current_batch_fisher(self) -> torch.Tensor:
        """Accumulate the unit's fisher info of this batch."""
        with torch.no_grad():
            # Starts as int 0; promoted to a Tensor by the first addition.
            fisher: torch.Tensor = 0
            for module in self.input_related_dynamic_ops:
                fisher = fisher + self._fisher_of_a_module(module)
            return (fisher**2).sum(0)  # shape: [C]

    @torch.no_grad()
    def _fisher_of_a_module(self, module: GroupFisherMixin) -> torch.Tensor:
        """Calculate the fisher info of one module.

        Args:
            module (GroupFisherConv2d): A `GroupFisherConv2d` module.

        Return:
            torch.Tensor: Whose shape is [B C]
        """
        assert len(module.recorded_input) > 0 and \
            len(module.recorded_input) == len(module.recorded_grad)
        fisher_sum: torch.Tensor = 0
        for input, grad_input in zip(module.recorded_input,
                                     module.recorded_grad):
            fisher: torch.Tensor = input * grad_input
            if len(fisher.shape) == 4:
                # Conv activations: reduce the spatial dims.
                fisher = fisher.sum(dim=[2, 3])
            assert len(fisher.shape) == 2  # B C
            fisher_sum = fisher_sum + fisher
        assert isinstance(fisher_sum, torch.Tensor)
        # expand to full num_channel
        batch_size = fisher_sum.shape[0]
        mask = self.mutable_channel.current_mask.unsqueeze(0).expand(
            [batch_size, self.num_channels])
        zeros = fisher_sum.new_zeros([batch_size, self.num_channels])
        fisher_sum = zeros.masked_scatter_(mask, fisher_sum)
        return fisher_sum

    @torch.no_grad()
    def _get_normalized_fisher_info(self,
                                    fisher_info,
                                    delta_type='flop') -> torch.Tensor:
        """Get the normalized fisher info.

        Args:
            delta_type (str): Type of delta. Defaults to 'flop'.

        NOTE(review): the default 'flop' is not among the handled branches
        ('flops'/'act'/'none') and would raise NotImplementedError; callers
        always pass self.delta_type explicitly -- confirm and align.
        """
        fisher = fisher_info.double()
        if delta_type == 'flops':
            delta_flop = self._delta_flop_of_a_channel
            assert delta_flop > 0
            fisher = fisher / (float(delta_flop) / 1e9)
        elif delta_type == 'act':
            delta_memory = self._delta_memory_of_a_channel
            assert delta_memory > 0
            fisher = fisher / (float(delta_memory) / 1e6)
        elif delta_type == 'none':
            pass
        else:
            raise NotImplementedError(delta_type)
        return fisher

    @property
    def _delta_flop_of_a_channel(self) -> torch.Tensor:
        """Calculate the flops of a channel."""
        delta_flop = 0
        for module in self.output_related_dynamic_ops:
            delta_flop += module.delta_flop_of_a_out_channel
        for module in self.input_related_dynamic_ops:
            delta_flop += module.delta_flop_of_a_in_channel
        return delta_flop

    @property
    def _delta_memory_of_a_channel(self) -> torch.Tensor:
        """Calculate the memory of a channel."""
        delta_memory = 0
        for module in self.output_related_dynamic_ops:
            delta_memory += module.delta_memory_of_a_out_channel
        return delta_memory
diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e5b9ec45156d958a8dda24b9487c9f4277d0bee2
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/__init__.py
@@ -0,0 +1,12 @@
# Copyright (c) OpenMMLab. All rights reserved.
+from .algorithms import * # noqa: F401,F403 +from .architectures import * # noqa: F401,F403 +from .distillers import * # noqa: F401,F403 +from .fake_quants import * # noqa: F401,F403 +from .losses import * # noqa: F401,F403 +from .mutables import * # noqa: F401,F403 +from .mutators import * # noqa: F401,F403 +from .observers import * # noqa: F401,F403 +from .quantizers import * # noqa: F401,F403 +from .task_modules import * # noqa: F401,F403 +from .utils import * # noqa: F401,F403 diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..178cc653547f69bc58a59807d45c8c91f3494c59 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/__init__.py @@ -0,0 +1,20 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .base import BaseAlgorithm +from .distill import (DAFLDataFreeDistillation, DataFreeDistillation, + FpnTeacherDistill, OverhaulFeatureDistillation, + SelfDistill, SingleTeacherDistill) +from .nas import (DSNAS, DSNASDDP, SPOS, Autoformer, AutoSlim, AutoSlimDDP, + BigNAS, BigNASDDP, Darts, DartsDDP) +from .pruning import DCFF, DMCP, DMCPDDP, SlimmableNetwork, SlimmableNetworkDDP +from .pruning.ite_prune_algorithm import ItePruneAlgorithm +from .quantization import MMArchitectureQuant, MMArchitectureQuantDDP + +__all__ = [ + 'SingleTeacherDistill', 'BaseAlgorithm', 'FpnTeacherDistill', 'SPOS', + 'SlimmableNetwork', 'SlimmableNetworkDDP', 'AutoSlim', 'AutoSlimDDP', + 'Darts', 'DartsDDP', 'DCFF', 'SelfDistill', 'DataFreeDistillation', + 'DAFLDataFreeDistillation', 'OverhaulFeatureDistillation', + 'ItePruneAlgorithm', 'DSNAS', 'DSNASDDP', 'Autoformer', 'BigNAS', + 'BigNASDDP', 'DMCP', 'DMCPDDP', 'MMArchitectureQuant', + 'MMArchitectureQuantDDP' +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/base.py 
b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/base.py
new file mode 100644
index 0000000000000000000000000000000000000000..5bb98391bde4513eb3055185bb4a66a4430d4043
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/base.py
@@ -0,0 +1,209 @@
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Dict, List, Optional, OrderedDict, Tuple, Union

import torch
import torch.nn as nn
from mmengine.model import BaseModel
from mmengine.structures import BaseDataElement

from mmrazor.registry import MODELS

# Result-type aliases for the three forward modes.
LossResults = Dict[str, torch.Tensor]
TensorResults = Union[Tuple[torch.Tensor], torch.Tensor]
PredictResults = List[BaseDataElement]
ForwardResults = Union[LossResults, TensorResults, PredictResults]


@MODELS.register_module()
class BaseAlgorithm(BaseModel):
    """Base class for algorithms.

    BaseAlgorithm inherit from BaseModel. BaseModel implements the basic
    functions of the algorithmic model, such as weights initialize,
    batch inputs preprocess(see more information in
    :class:`BaseDataPreprocessor`), parse losses, and update model parameters.
    More details of BaseModel could see docs for :class:`BaseModel`.

    :obj:`BaseAlgorithm` forward just is a wrapper of :obj:`BaseModel` forward.
    Various compression algorithms can be implemented by inheriting
    BaseAlgorithm.

    Subclasses inherit from BaseAlgorithm only need to override the
    :meth:`loss`, which implements the logic to calculate loss, then
    can be trained in the runner.

    Args:
        architecture (dict | :obj:`BaseModel`): The config of
            :class:`BaseModel` or built model.
        data_preprocessor (dict | torch.nn.Module | None): The pre-process
            config of :class:`BaseDataPreprocessor`. Defaults to None.
        init_cfg (dict): The weight initialized config for
            :class:`BaseModule`.
        module_inplace(bool): Whether to allow module inplace attribute True.
            Defaults to False.

    Note:
        If `data_preprocessor` is None, :obj:`BaseAlgorithm` will set
        `data_preprocessor` to model's `data_preprocessor`.


    Attributes:
        architecture (:obj:`BaseModel`): Model that needs to be compressed.
        data_preprocessor (:obj:`BaseDataPreprocessor`): Used for
            pre-processing data sampled by dataloader to the format accepted by
            :meth:`forward`.
        init_cfg (dict, optional): Initialization config dict.
    """

    def __init__(self,
                 architecture: Union[BaseModel, Dict],
                 data_preprocessor: Optional[Union[Dict, nn.Module]] = None,
                 init_cfg: Optional[Dict] = None,
                 module_inplace: bool = False) -> None:

        # super().__init__() needs built data_preprocessor, so
        # build model first.
        if isinstance(architecture, Dict):
            architecture = MODELS.build(architecture)

        if not isinstance(architecture, BaseModel):
            raise TypeError('architecture should be a `dict` or '
                            f'`BaseModel` instance, but got '
                            f'{type(architecture)}')

        # If `data_preprocessor` is None, there will set
        # `data_preprocessor` to model's `data_preprocessor`.
        if data_preprocessor is None:
            # use model's data_preprocessor
            data_preprocessor = getattr(architecture, 'data_preprocessor',
                                        None)
        super().__init__(data_preprocessor, init_cfg)

        # Cannot assign module before Module.__init__()
        self.architecture = architecture

        # Find all nn.Modules in the model that contain the 'inplace' attribute
        # and set them to False
        self.module_inplace = module_inplace
        if not self.module_inplace:
            self.set_module_inplace_false(architecture, 'self.architecture')
        pass

    def forward(self,
                inputs: torch.Tensor,
                data_samples: Optional[List[BaseDataElement]] = None,
                mode: str = 'tensor') -> ForwardResults:
        """Returns losses or predictions of training, validation, testing, and
        simple inference process.

        ``forward`` method of BaseModel is an abstract method, its subclasses
        must implement this method.

        Accepts ``batch_inputs`` and ``data_samples`` processed by
        :attr:`data_preprocessor`, and returns results according to mode
        arguments.

        During non-distributed training, validation, and testing process,
        ``forward`` will be called by ``BaseModel.train_step``,
        ``BaseModel.val_step`` and ``BaseModel.test_step`` directly.

        During distributed data parallel training process,
        ``MMSeparateDistributedDataParallel.train_step`` will first call
        ``DistributedDataParallel.forward`` to enable automatic
        gradient synchronization, and then call ``forward`` to get training
        loss.

        Args:
            batch_inputs (torch.Tensor): batch input tensor collated by
                :attr:`data_preprocessor`.
            data_samples (List[BaseDataElement], optional):
                data samples collated by :attr:`data_preprocessor`.
            mode (str): mode should be one of ``loss``, ``predict`` and
                ``tensor``
                - ``loss``: Called by ``train_step`` and return loss ``dict``
                  used for logging
                - ``predict``: Called by ``val_step`` and ``test_step``
                  and return list of ``BaseDataElement`` results used for
                  computing metric.
                - ``tensor``: Called by custom use to get ``Tensor`` type
                  results.

        Returns:
            ForwardResults:
                - If ``mode == loss``, return a ``dict`` of loss tensor used
                  for backward and logging.
                - If ``mode == predict``, return a ``list`` of
                  :obj:`BaseDataElement` for computing metric
                  and getting inference result.
                - If ``mode == tensor``, return a tensor or ``tuple`` of tensor
                  or ``dict`` of tensor for custom use.
        """
        if mode == 'loss':
            return self.loss(inputs, data_samples)
        elif mode == 'tensor':
            return self._forward(inputs, data_samples)
        elif mode == 'predict':
            return self._predict(inputs, data_samples)
        else:
            raise RuntimeError(f'Invalid mode "{mode}". '
                               'Only supports loss, predict and tensor mode')

    def loss(
        self,
        inputs: torch.Tensor,
        data_samples: Optional[List[BaseDataElement]] = None,
    ) -> LossResults:
        """Calculate losses from a batch of inputs and data samples."""
        return self.architecture(inputs, data_samples, mode='loss')

    def _forward(
        self,
        inputs: torch.Tensor,
        data_samples: Optional[List[BaseDataElement]] = None,
    ) -> TensorResults:
        """Network forward process."""
        return self.architecture(inputs, data_samples, mode='tensor')

    def _predict(
        self,
        inputs: torch.Tensor,
        data_samples: Optional[List[BaseDataElement]] = None,
    ) -> PredictResults:
        """Predict results from a batch of inputs and data samples with post-
        processing."""
        return self.architecture(inputs, data_samples, mode='predict')

    def set_module_inplace_false(self, architecture: Union[OrderedDict,
                                                           nn.Module],
                                 varstr: str) -> None:
        """Find all nn.Modules in the model that contain the 'inplace'
        attribute and set them to False in order to prevent occur error in
        Recorders using recursion algorithm.

        This function will disassemble the Args architecture .If type
        'nn.Module' is detected, determine if it contains an 'inplace'
        attribute and set False if it does. If none, get the OrderedDict
        and then iterate through the dictionary to continue the recursive
        search.

        NOTE(review): this walks the module tree by building attribute-path
        strings and evaluating them with ``eval`` -- it works only because
        ``varstr`` is always constructed internally from 'self.architecture';
        never call it with externally supplied strings.

        Args:
            architecture (OrderedDict | nn.Module): The config OrderedDict
                for model or built model.
            varstr (str): Records the call-level string containing the
                'inplace' attribute.

        Returns:
            None
        """

        if isinstance(architecture, nn.Module):
            if hasattr(eval(varstr), 'inplace'):
                eval(varstr).inplace = False
            else:
                self.set_module_inplace_false(architecture._modules,
                                              varstr + '._modules')
        elif isinstance(architecture, OrderedDict):
            for key, value in architecture.items():
                self.set_module_inplace_false(value, varstr + f"['{key}']")
        else:
            return
diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/distill/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/distill/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..40bc62a46e3349a40cf85f3ca4ca0cf2b7ae0a5d
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/distill/__init__.py
@@ -0,0 +1,10 @@
# Copyright (c) OpenMMLab. All rights reserved.
from .configurable import (DAFLDataFreeDistillation, DataFreeDistillation,
                           FpnTeacherDistill, OverhaulFeatureDistillation,
                           SelfDistill, SingleTeacherDistill)

__all__ = [
    'SingleTeacherDistill', 'FpnTeacherDistill', 'SelfDistill',
    'DataFreeDistillation', 'DAFLDataFreeDistillation',
    'OverhaulFeatureDistillation'
]
diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/distill/configurable/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/distill/configurable/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..8902f737c89a7b345cdff2747d544c42b7f671cc
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/distill/configurable/__init__.py
@@ -0,0 +1,13 @@
# Copyright (c) OpenMMLab. All rights reserved.
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Dict, List, Tuple

import torch
import torch.nn as nn
from mmengine.optim import OPTIMIZERS, OptimWrapper
from mmengine.runner import load_checkpoint

from mmrazor.models.utils import add_prefix, set_requires_grad
from mmrazor.registry import MODELS
from ...base import BaseAlgorithm


@MODELS.register_module()
class DataFreeDistillation(BaseAlgorithm):
    """Algorithm for data-free teacher-student distillation.

    Typically, the teacher is a pretrained model and the student is a small
    model trained on the generator's output. The student is trained to mimic
    the behavior of the teacher. The generator is trained to generate images
    that are similar to the real images.

    Args:
        distiller (dict): The config dict for built distiller.
        generator_distiller (dict): The distiller collecting outputs & losses
            to update the generator.
        teachers (dict[str, dict]): The dict of config dict for teacher models
            and their ckpt_path (optional).
        generator (dict): The config dict for built distiller generator.
        student_iter (int): The number of student steps in train_step().
            Defaults to 1.
        student_train_first (bool): Whether to train student in first place.
            Defaults to False.
    """

    def __init__(self,
                 distiller: dict,
                 generator_distiller: dict,
                 teachers: Dict[str, Dict[str, dict]],
                 generator: dict,
                 student_iter: int = 1,
                 student_train_first: bool = False,
                 **kwargs) -> None:
        super().__init__(**kwargs)
        self.student_iter = student_iter
        self.student_train_first = student_train_first
        self.distiller = MODELS.build(distiller)
        self.generator_distiller = MODELS.build(generator_distiller)

        if not isinstance(teachers, Dict):
            raise TypeError('teacher should be a `dict` but got '
                            f'{type(teachers)}')

        self.teachers = nn.ModuleDict()
        for teacher_name, cfg in teachers.items():
            self.teachers[teacher_name] = MODELS.build(cfg['build_cfg'])
            if 'ckpt_path' in cfg:
                # avoid loaded parameters be overwritten
                self.teachers[teacher_name].init_weights()
                _ = load_checkpoint(self.teachers[teacher_name],
                                    cfg['ckpt_path'])
            # Teachers are frozen references: eval mode, no gradients.
            self.teachers[teacher_name].eval()
            set_requires_grad(self.teachers[teacher_name], False)

        if not isinstance(generator, Dict):
            raise TypeError('generator should be a `dict` instance, but got '
                            f'{type(generator)}')
        self.generator = MODELS.build(generator)

        # In ``DataFreeDistiller``, the recorder manager is just
        # constructed, but not really initialized yet.
        self.distiller.prepare_from_student(self.student)
        self.distiller.prepare_from_teacher(self.teachers)
        self.generator_distiller.prepare_from_student(self.student)
        self.generator_distiller.prepare_from_teacher(self.teachers)

    @property
    def student(self) -> nn.Module:
        """Alias for ``architecture``."""
        return self.architecture

    def train_step(self, data: Dict[str, List[dict]],
                   optim_wrapper: OptimWrapper) -> Dict[str, torch.Tensor]:
        """Train step for DataFreeDistillation.

        Runs one generator update and ``student_iter`` student updates; the
        order is controlled by ``student_train_first``.

        Args:
            data (Dict[str, List[dict]]): Data sampled by dataloader.
            optim_wrapper (OptimWrapper): A wrapper of optimizer to
                update parameters.
        """
        log_vars = dict()
        for _, teacher in self.teachers.items():
            teacher.eval()

        if self.student_train_first:
            _, dis_log_vars = self.train_student(data,
                                                 optim_wrapper['architecture'])
            _, generator_loss_vars = self.train_generator(
                data, optim_wrapper['generator'])
        else:
            _, generator_loss_vars = self.train_generator(
                data, optim_wrapper['generator'])
            _, dis_log_vars = self.train_student(data,
                                                 optim_wrapper['architecture'])

        log_vars.update(dis_log_vars)
        log_vars.update(generator_loss_vars)
        return log_vars

    def train_student(
        self, data: Dict[str, List[dict]], optimizer: OPTIMIZERS
    ) -> Tuple[torch.Tensor, Dict[str, torch.Tensor]]:
        """Train step for the student model.

        Args:
            data (Dict[str, List[dict]]): Data sampled by dataloader.
            optimizer (OPTIMIZERS): The optimizer to update student.
        """
        log_vars = dict()
        batch_size = len(data['inputs'])

        for _ in range(self.student_iter):
            # NOTE(review): assumes the built generator exposes
            # `.module.latent_dim` (wrapped generator) — confirm with the
            # generator implementation.
            fakeimg_init = torch.randn(
                (batch_size, self.generator.module.latent_dim))
            # detach(): student updates must not backprop into the generator.
            fakeimg = self.generator(fakeimg_init, batch_size).detach()

            with optimizer.optim_context(self.student):
                pseudo_data = self.data_preprocessor(data, True)
                pseudo_data_samples = pseudo_data['data_samples']
                # record the needed information
                with self.distiller.student_recorders:
                    _ = self.student(fakeimg, pseudo_data_samples, mode='loss')
                with self.distiller.teacher_recorders, torch.no_grad():
                    for _, teacher in self.teachers.items():
                        _ = teacher(fakeimg, pseudo_data_samples, mode='loss')
                loss_distill = self.distiller.compute_distill_losses()

            distill_loss, distill_log_vars = self.parse_losses(loss_distill)
            optimizer.update_params(distill_loss)
            # Only the last iteration's log vars are kept.
            log_vars = dict(add_prefix(distill_log_vars, 'distill'))

        return distill_loss, log_vars

    def train_generator(
        self, data: Dict[str, List[dict]], optimizer: OPTIMIZERS
    ) -> Tuple[torch.Tensor, Dict[str, torch.Tensor]]:
        """Train step for the generator.

        Args:
            data (Dict[str, List[dict]]): Data sampled by dataloader.
            optimizer (OPTIMIZERS): The optimizer to update generator.
        """
        batch_size = len(data['inputs'])
        fakeimg_init = torch.randn(
            (batch_size, self.generator.module.latent_dim))
        # No detach here: the generator loss must flow back through fakeimg.
        fakeimg = self.generator(fakeimg_init, batch_size)

        with optimizer.optim_context(self.generator):
            pseudo_data = self.data_preprocessor(data, True)
            pseudo_data_samples = pseudo_data['data_samples']
            # record the needed information
            with self.generator_distiller.student_recorders:
                _ = self.student(fakeimg, pseudo_data_samples, mode='loss')
            with self.generator_distiller.teacher_recorders:
                for _, teacher in self.teachers.items():
                    _ = teacher(fakeimg, pseudo_data_samples, mode='loss')
            loss_generator = self.generator_distiller.compute_distill_losses()

        generator_loss, generator_loss_vars = self.parse_losses(loss_generator)
        optimizer.update_params(generator_loss)
        log_vars = dict(add_prefix(generator_loss_vars, 'generator'))

        return generator_loss, log_vars


@MODELS.register_module()
class DAFLDataFreeDistillation(DataFreeDistillation):
    """DAFL variant: generator and student losses are computed on the same
    fake batch, and both optimizers step at the end of the train step."""

    def train_step(self, data: Dict[str, List[dict]],
                   optim_wrapper: OptimWrapper) -> Dict[str, torch.Tensor]:
        """DAFL train step.

        Args:
            data (Dict[str, List[dict]]): Data sampled by dataloader.
            optim_wrapper (OptimWrapper): A wrapper of optimizer to
                update parameters.
        """
        log_vars = dict()
        batch_size = len(data['inputs'])

        for _, teacher in self.teachers.items():
            teacher.eval()

        # fakeimg initialization and revised by generator.
        fakeimg_init = torch.randn(
            (batch_size, self.generator.module.latent_dim))
        fakeimg = self.generator(fakeimg_init, batch_size)
        pseudo_data = self.data_preprocessor(data, True)
        pseudo_data_samples = pseudo_data['data_samples']

        with optim_wrapper['generator'].optim_context(self.generator):
            # record the needed information
            with self.generator_distiller.student_recorders:
                _ = self.student(fakeimg, pseudo_data_samples, mode='loss')
            with self.generator_distiller.teacher_recorders:
                for _, teacher in self.teachers.items():
                    _ = teacher(fakeimg, pseudo_data_samples, mode='loss')
            loss_generator = self.generator_distiller.compute_distill_losses()

        generator_loss, generator_loss_vars = self.parse_losses(loss_generator)
        log_vars.update(add_prefix(generator_loss_vars, 'generator'))

        with optim_wrapper['architecture'].optim_context(self.student):
            # record the needed information; fakeimg is detached so the
            # student update cannot backprop into the generator.
            with self.distiller.student_recorders:
                _ = self.student(
                    fakeimg.detach(), pseudo_data_samples, mode='loss')
            with self.distiller.teacher_recorders, torch.no_grad():
                for _, teacher in self.teachers.items():
                    _ = teacher(
                        fakeimg.detach(), pseudo_data_samples, mode='loss')
            loss_distill = self.distiller.compute_distill_losses()

        distill_loss, distill_log_vars = self.parse_losses(loss_distill)
        log_vars.update(add_prefix(distill_log_vars, 'distill'))

        # Both losses are parsed before either optimizer steps.
        optim_wrapper['generator'].update_params(generator_loss)
        optim_wrapper['architecture'].update_params(distill_loss)

        return log_vars
# Copyright (c) OpenMMLab. All rights reserved.
# ---- fpn_teacher_distill.py ----
from typing import List, Optional

import torch
from mmengine.structures import BaseDataElement

from mmrazor.models.utils import add_prefix
from mmrazor.registry import MODELS
from ...base import LossResults
from .single_teacher_distill import SingleTeacherDistill


@MODELS.register_module()
class FpnTeacherDistill(SingleTeacherDistill):
    """``FpnTeacherDistill`` means teacher only execute backbone and neck.

    If the intermediate results required for distill algorithm are generated by
    the backbone and neck parts, using ``FpnTeacherDistill`` can speed up
    training.
    """

    def loss(
        self,
        batch_inputs: torch.Tensor,
        data_samples: Optional[List[BaseDataElement]] = None,
    ) -> LossResults:
        """Calculate losses from a batch of inputs and data samples."""

        losses = dict()
        # If the `override_data` of a delivery is False, the delivery will
        # record the origin data.
        self.distiller.set_deliveries_override(False)

        # Unlike ``SingleTeacherDistill``, teacher will only execute
        # backbone + neck, not head, so there will be no teacher loss.
        if self.teacher_trainable:
            with self.distiller.teacher_recorders, self.distiller.deliveries:
                _ = self.teacher.extract_feat(batch_inputs)
        else:
            with self.distiller.teacher_recorders, self.distiller.deliveries:
                with torch.no_grad():
                    _ = self.teacher.extract_feat(batch_inputs)

        # If the `override_data` of a delivery is True, the delivery will
        # override the origin data with the recorded data.
        self.distiller.set_deliveries_override(True)
        with self.distiller.student_recorders, self.distiller.deliveries:
            student_losses = self.student(
                batch_inputs, data_samples, mode='loss')
        losses.update(add_prefix(student_losses, 'student'))

        if not self.distillation_stopped:
            # Automatically compute distill losses based on
            # `loss_forward_mappings`.
            # The required data already exists in the recorders.
            distill_losses = self.distiller.compute_distill_losses()
            losses.update(add_prefix(distill_losses, 'distill'))

        return losses


# ---- overhaul_feature_distillation.py ----
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Dict, Optional, Union

from mmengine.model import BaseModel

from mmrazor.registry import MODELS
from ....distillers import OFDDistiller
from .single_teacher_distill import SingleTeacherDistill


@MODELS.register_module()
class OverhaulFeatureDistillation(SingleTeacherDistill):
    """`A Comprehensive Overhaul of Feature Distillation`
    https://sites.google.com/view/byeongho-heo/overhaul.

    Inherited from ``SingleTeacherDistill``.

    Args:
        distiller (dict): The config dict for built distiller. Must be a
            ``OFDDistiller``.
        teacher (dict | BaseModel): The config dict for teacher model or built
            teacher model.
        teacher_ckpt (str): The path of teacher's checkpoint. Defaults to None.
        teacher_trainable (bool): Whether the teacher is trainable. Defaults
            to False.
        teacher_norm_eval (bool): Whether to set teacher's norm layers to eval
            mode, namely, freeze running stats (mean and var). Note: Effect on
            Batch Norm and its variants only. Defaults to True.
        student_trainable (bool): Whether the student is trainable. Defaults
            to True.
        calculate_student_loss (bool): Whether to calculate student loss
            (original task loss) to update student model. Defaults to True.
    """

    def __init__(self,
                 distiller: dict,
                 teacher: Union[BaseModel, Dict],
                 teacher_ckpt: Optional[str] = None,
                 teacher_trainable: bool = False,
                 teacher_norm_eval: bool = True,
                 student_trainable: bool = True,
                 calculate_student_loss: bool = True,
                 **kwargs) -> None:
        super().__init__(distiller, teacher, teacher_ckpt, teacher_trainable,
                         teacher_norm_eval, student_trainable,
                         calculate_student_loss, **kwargs)

        # OFD needs distiller-specific connectors, so the distiller type is
        # checked after the base class has built it.
        assert isinstance(self.distiller, OFDDistiller), (
            'distiller of `OverhaulFeatureDistillation` expects `OFDDistiller`'
            f', but get {type(self.distiller)}')

        self.distiller.init_ofd_connectors(self.teacher)


# ---- self_distill.py ----
# Copyright (c) OpenMMLab. All rights reserved.
from typing import List, Optional

import torch
from mmengine.structures import BaseDataElement
from torch import nn

from mmrazor.models.utils import add_prefix
from mmrazor.registry import MODELS
from ...base import BaseAlgorithm, LossResults


@MODELS.register_module()
class SelfDistill(BaseAlgorithm):
    """``SelfDistill`` can be used to develop distill algorithms without
    teacher.

    Args:
        distiller (dict): The config dict for built distiller. Distiller may
            have teacher.
        student_trainable (bool): Whether the student is trainable. Defaults
            to True.
        calculate_student_loss (bool): Whether to calculate student loss
            (original task loss) to update student model. Defaults to True.
    """

    def __init__(self,
                 distiller: dict,
                 student_trainable: bool = True,
                 calculate_student_loss: bool = True,
                 **kwargs) -> None:
        super().__init__(**kwargs)

        self.distiller = MODELS.build(distiller)
        # The student model will not calculate gradients and update parameters
        # in some pretraining process.
        self.student_trainable = student_trainable

        # The student loss will not be updated into ``losses`` in some
        # pretraining process.
        self.calculate_student_loss = calculate_student_loss

        # In ``ConfigurableDistller``, the recorder manager is just
        # constructed, but not really initialized yet.
        self.distiller.prepare_from_student(self.student)
        # Still prepare from self-teacher. Teacher recorders of
        # ``SelfDistiller`` hook from self.student but require detach().
        self.distiller.prepare_from_teacher(self.student)

    @property
    def student(self) -> nn.Module:
        """Alias for ``architecture``."""
        return self.architecture

    def loss(
        self,
        batch_inputs: torch.Tensor,
        data_samples: Optional[List[BaseDataElement]] = None,
    ) -> LossResults:
        """Calculate losses from a batch of inputs and data samples."""

        losses = dict()

        # If the `override_data` of a delivery is True, the delivery will
        # override the origin data with the recorded data.
        self.distiller.set_deliveries_override(True)
        # Original task loss will not be used during some pretraining process.
        if self.calculate_student_loss:
            # teacher_recorders hook from student
            with self.distiller.student_recorders, \
                    self.distiller.teacher_recorders, \
                    self.distiller.deliveries:
                student_losses = self.student(
                    batch_inputs, data_samples, mode='loss')
            losses.update(add_prefix(student_losses, 'student'))
        else:
            with self.distiller.student_recorders, \
                    self.distiller.teacher_recorders, \
                    self.distiller.deliveries:
                if self.student_trainable:
                    _ = self.student(batch_inputs, data_samples, mode='loss')
                else:
                    with torch.no_grad():
                        _ = self.student(
                            batch_inputs, data_samples, mode='loss')

        # Automatically compute distill losses based on `loss_forward_mappings`
        # The required data already exists in the recorders.
        distill_losses = self.distiller.compute_distill_losses()
        losses.update(add_prefix(distill_losses, 'distill'))

        return losses


# ---- single_teacher_distill.py ----
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Dict, List, Optional, Union

import torch
from mmengine.model import BaseModel
from mmengine.runner import load_checkpoint
from mmengine.structures import BaseDataElement
from torch import nn
from torch.nn.modules.batchnorm import _BatchNorm

from mmrazor.models.utils import add_prefix
from mmrazor.registry import MODELS
from ...base import BaseAlgorithm, LossResults


@MODELS.register_module()
class SingleTeacherDistill(BaseAlgorithm):
    """``SingleTeacherDistill`` can be used to develop distill algorithms which
    only use one teacher.

    Args:
        distiller (dict): The config dict for built distiller.
        teacher (dict | BaseModel): The config dict for teacher model or built
            teacher model.
        teacher_ckpt (str): The path of teacher's checkpoint. Defaults to None.
        teacher_trainable (bool): Whether the teacher is trainable. Defaults
            to False.
        teacher_norm_eval (bool): Whether to set teacher's norm layers to eval
            mode, namely, freeze running stats (mean and var). Note: Effect on
            Batch Norm and its variants only. Defaults to True.
        student_trainable (bool): Whether the student is trainable. Defaults
            to True.
        calculate_student_loss (bool): Whether to calculate student loss
            (original task loss) to update student model. Defaults to True.
        teacher_module_inplace(bool): Whether to allow teacher module inplace
            attribute True. Defaults to False.
    """

    def __init__(self,
                 distiller: dict,
                 teacher: Union[BaseModel, Dict],
                 teacher_ckpt: Optional[str] = None,
                 teacher_trainable: bool = False,
                 teacher_norm_eval: bool = True,
                 student_trainable: bool = True,
                 calculate_student_loss: bool = True,
                 teacher_module_inplace: bool = False,
                 **kwargs) -> None:
        super().__init__(**kwargs)

        self.distiller = MODELS.build(distiller)

        if isinstance(teacher, Dict):
            teacher = MODELS.build(teacher)

        if not isinstance(teacher, BaseModel):
            raise TypeError('teacher should be a `dict` or '
                            f'`BaseModel` instance, but got '
                            f'{type(teacher)}')

        self.teacher = teacher

        # Find all nn.Modules in the model that contain the 'inplace' attribute
        # and set them to False, so Recorders can observe intermediate tensors.
        self.teacher_module_inplace = teacher_module_inplace
        if not self.teacher_module_inplace:
            self.set_module_inplace_false(teacher, 'self.teacher')

        if teacher_ckpt:
            _ = load_checkpoint(self.teacher, teacher_ckpt)
            # avoid loaded parameters be overwritten
            self.teacher._is_init = True
        self.teacher_trainable = teacher_trainable
        if not self.teacher_trainable:
            # Frozen teacher: no gradients flow into its parameters.
            for param in self.teacher.parameters():
                param.requires_grad = False
        self.teacher_norm_eval = teacher_norm_eval

        # The student model will not calculate gradients and update parameters
        # in some pretraining process.
        self.student_trainable = student_trainable

        # The student loss will not be updated into ``losses`` in some
        # pretraining process.
        self.calculate_student_loss = calculate_student_loss

        # In ``ConfigurableDistller``, the recorder manager is just
        # constructed, but not really initialized yet.
        self.distiller.prepare_from_student(self.student)
        self.distiller.prepare_from_teacher(self.teacher)

        # may be modified by stop distillation hook
        self.distillation_stopped = False

    @property
    def student(self) -> nn.Module:
        """Alias for ``architecture``."""
        return self.architecture

    def loss(
        self,
        batch_inputs: torch.Tensor,
        data_samples: Optional[List[BaseDataElement]] = None,
    ) -> LossResults:
        """Calculate losses from a batch of inputs and data samples."""

        losses = dict()

        # If the `override_data` of a delivery is False, the delivery will
        # record the origin data.
        self.distiller.set_deliveries_override(False)
        if self.teacher_trainable:
            with self.distiller.teacher_recorders, self.distiller.deliveries:
                teacher_losses = self.teacher(
                    batch_inputs, data_samples, mode='loss')

            losses.update(add_prefix(teacher_losses, 'teacher'))
        else:
            with self.distiller.teacher_recorders, self.distiller.deliveries:
                with torch.no_grad():
                    _ = self.teacher(batch_inputs, data_samples, mode='loss')

        # If the `override_data` of a delivery is True, the delivery will
        # override the origin data with the recorded data.
        self.distiller.set_deliveries_override(True)
        # Original task loss will not be used during some pretraining process.
        if self.calculate_student_loss:
            with self.distiller.student_recorders, self.distiller.deliveries:
                student_losses = self.student(
                    batch_inputs, data_samples, mode='loss')
            losses.update(add_prefix(student_losses, 'student'))
        else:
            with self.distiller.student_recorders, self.distiller.deliveries:
                if self.student_trainable:
                    _ = self.student(batch_inputs, data_samples, mode='loss')
                else:
                    with torch.no_grad():
                        _ = self.student(
                            batch_inputs, data_samples, mode='loss')

        if not self.distillation_stopped:
            # Automatically compute distill losses based on
            # `loss_forward_mappings`.
            # The required data already exists in the recorders.
            distill_losses = self.distiller.compute_distill_losses()
            losses.update(add_prefix(distill_losses, 'distill'))

        return losses

    def train(self, mode: bool = True) -> None:
        """Set distiller's forward mode.

        When ``teacher_norm_eval`` is True, the teacher's BatchNorm layers are
        kept in eval mode even while training, so their running statistics
        stay frozen.
        """
        super().train(mode)
        if mode and self.teacher_norm_eval:
            for m in self.teacher.modules():
                if isinstance(m, _BatchNorm):
                    m.eval()
from typing import Dict, List, Optional, Union

import torch
from mmengine.model import BaseModel
from mmengine.structures import BaseDataElement
from torch import nn

from mmrazor.models.mutators import NasMutator
from mmrazor.registry import MODELS
from ..base import BaseAlgorithm, LossResults

VALID_MUTATOR_TYPE = Union[NasMutator, Dict]


@MODELS.register_module()
class Autoformer(BaseAlgorithm):
    """Implementation of `Autoformer `_

    AutoFormer targets vision-transformer search: during supernet training
    the weights of different blocks in the same layer are entangled. The
    search phase itself is implemented in
    :class:`mmrazor.engine.EvolutionSearchLoop`.

    Args:
        architecture (dict|:obj:`BaseModel`): The config of :class:`BaseModel`
            or built model. Corresponding to supernet in NAS algorithm.
        mutator (VALID_MUTATOR_TYPE): The config of :class:`NasMutator` or
            built mutator.
        data_preprocessor (Optional[Union[dict, nn.Module]]): The pre-process
            config of :class:`BaseDataPreprocessor`. Defaults to None.
        init_cfg (Optional[dict]): Init config for ``BaseModule``.
            Defaults to None.
    """

    def __init__(self,
                 architecture: Union[BaseModel, Dict],
                 mutator: VALID_MUTATOR_TYPE = None,
                 data_preprocessor: Optional[Union[dict, nn.Module]] = None,
                 init_cfg: Optional[dict] = None):
        super().__init__(architecture, data_preprocessor, init_cfg)

        # Build the mutator (if given as config) and hook it to the supernet.
        self.mutator = self._build_mutator(mutator)
        self.mutator.prepare_from_supernet(self.architecture)

    def _build_mutator(self, mutator: VALID_MUTATOR_TYPE = None) -> NasMutator:
        """Return a built ``NasMutator``, constructing it from config if
        necessary."""
        built = MODELS.build(mutator) if isinstance(mutator, dict) else mutator
        if isinstance(built, NasMutator):
            return built
        raise TypeError('mutator should be a `dict` or `NasMutator` '
                        f'instance, but got {type(built)}.')

    def loss(
        self,
        batch_inputs: torch.Tensor,
        data_samples: Optional[List[BaseDataElement]] = None,
    ) -> LossResults:
        """Calculate losses from a batch of inputs and data samples.

        A fresh subnet is sampled and activated before every loss
        computation.
        """
        sampled_choices = self.mutator.sample_choices()
        self.mutator.set_choices(sampled_choices)
        return self.architecture(batch_inputs, data_samples, mode='loss')
import os
from pathlib import Path
from typing import Dict, List, Optional, Union

import torch
from mmengine.model import BaseModel, MMDistributedDataParallel
from mmengine.optim import OptimWrapper
from mmengine.structures import BaseDataElement
from torch import nn
from torch.nn.modules.batchnorm import _BatchNorm

from mmrazor.models.distillers import ConfigurableDistiller
from mmrazor.models.mutators import ChannelMutator
from mmrazor.models.utils import (add_prefix,
                                  reinitialize_optim_wrapper_count_status)
from mmrazor.registry import MODEL_WRAPPERS, MODELS
from ..base import BaseAlgorithm

VALID_MUTATOR_TYPE = Union[ChannelMutator, Dict]
VALID_DISTILLER_TYPE = Union[ConfigurableDistiller, Dict]
VALID_PATH_TYPE = Union[str, Path]
VALID_CHANNEL_CFG_PATH_TYPE = Union[VALID_PATH_TYPE, List[VALID_PATH_TYPE]]


@MODELS.register_module()
class AutoSlim(BaseAlgorithm):
    """Implementation of Autoslim algorithm. Please refer to
    https://arxiv.org/abs/1903.11728 for more details.

    Args:
        architecture (dict|:obj:`BaseModel`): The config of :class:`BaseModel`
            or built model. Corresponding to supernet in NAS algorithm.
        mutator (VALID_MUTATOR_TYPE): The config of :class:`ChannelMutator` or
            built mutator.
        distiller (VALID_DISTILLER_TYPE): Cfg of :class:`ConfigurableDistiller`
            or built distiller.
        norm_training (bool): Whether set bn to training mode when model is
            set to eval mode. Note that in slimmable networks, accumulating
            different numbers of channels results in different feature means
            and variances, which further leads to inaccurate statistics of
            shared BN. Set ``norm_training`` to True to use the feature
            means and variances in a batch.
        data_preprocessor (Optional[Union[dict, nn.Module]]): The pre-process
            config of :class:`BaseDataPreprocessor`. Defaults to None.
        num_random_samples (int): number of random sample subnets.
            Defaults to 2.
        init_cfg (Optional[dict]): Init config for ``BaseModule``.
            Defaults to None.
    """

    def __init__(self,
                 architecture: Union[BaseModel, Dict],
                 mutator: VALID_MUTATOR_TYPE = None,
                 distiller: VALID_DISTILLER_TYPE = None,
                 norm_training: bool = False,
                 num_random_samples: int = 2,
                 data_preprocessor: Optional[Union[Dict, nn.Module]] = None,
                 init_cfg: Optional[Dict] = None) -> None:
        super().__init__(architecture, data_preprocessor, init_cfg)

        self.mutator = self._build_mutator(mutator)
        # NOTE: `mutator.prepare_from_supernet` must be called
        # before distiller initialized.
        self.mutator.prepare_from_supernet(self.architecture)

        # The supernet acts as its own teacher (max subnet) and student
        # (smaller subnets) for in-place distillation.
        self.distiller = self._build_distiller(distiller)
        self.distiller.prepare_from_teacher(self.architecture)
        self.distiller.prepare_from_student(self.architecture)

        # "Sandwich rule": always train max, min and `num_random_samples`
        # random subnets in every train step.
        self.sample_kinds = ['max', 'min']
        for i in range(num_random_samples):
            self.sample_kinds.append('random' + str(i))

        self._optim_wrapper_count_status_reinitialized = False
        self.norm_training = norm_training

    def _build_mutator(self,
                       mutator: VALID_MUTATOR_TYPE = None) -> ChannelMutator:
        """Build mutator."""
        if isinstance(mutator, dict):
            mutator = MODELS.build(mutator)
        if not isinstance(mutator, ChannelMutator):
            raise TypeError('mutator should be a `dict` or `ChannelMutator` '
                            f'instance, but got {type(mutator)}.')
        return mutator

    def _build_distiller(
            self,
            distiller: VALID_DISTILLER_TYPE = None) -> ConfigurableDistiller:
        """Build distiller."""
        if isinstance(distiller, dict):
            distiller = MODELS.build(distiller)
        if not isinstance(distiller, ConfigurableDistiller):
            raise TypeError('distiller should be a `dict` or '
                            '`ConfigurableDistiller` instance, but got '
                            f'{type(distiller)}')
        return distiller

    def train_step(self, data: List[dict],
                   optim_wrapper: OptimWrapper) -> Dict[str, torch.Tensor]:
        """Train step: accumulate losses of max, min and random subnets before
        a single optimizer step (see ``accumulative_counts`` below)."""

        def distill_step(
            batch_inputs: torch.Tensor, data_samples: List[BaseDataElement]
        ) -> Dict[str, torch.Tensor]:
            # One sub-network update: task loss plus distillation loss
            # against the max subnet's recorded outputs.
            subnet_losses = dict()
            with optim_wrapper.optim_context(
                    self), self.distiller.student_recorders:  # type: ignore
                hard_loss = self(batch_inputs, data_samples, mode='loss')
                soft_loss = self.distiller.compute_distill_losses()

                subnet_losses.update(hard_loss)
                subnet_losses.update(soft_loss)

                parsed_subnet_losses, _ = self.parse_losses(subnet_losses)
                optim_wrapper.update_params(parsed_subnet_losses)

            return subnet_losses

        if not self._optim_wrapper_count_status_reinitialized:
            # One optimizer step per `len(sample_kinds)` backward passes.
            reinitialize_optim_wrapper_count_status(
                model=self,
                optim_wrapper=optim_wrapper,
                accumulative_counts=len(self.sample_kinds))
            self._optim_wrapper_count_status_reinitialized = True

        input_data = self.data_preprocessor(data, True)
        batch_inputs = input_data['inputs']
        data_samples = input_data['data_samples']

        total_losses = dict()
        for kind in self.sample_kinds:
            # update the max subnet loss.
            if kind == 'max':
                self.mutator.set_choices(self.mutator.max_choices)
                # The max subnet is the teacher: its outputs are recorded
                # for the distillation of the smaller subnets below.
                with optim_wrapper.optim_context(
                        self
                ), self.distiller.teacher_recorders:  # type: ignore
                    max_subnet_losses = self(
                        batch_inputs, data_samples, mode='loss')
                    parsed_max_subnet_losses, _ = self.parse_losses(
                        max_subnet_losses)
                    optim_wrapper.update_params(parsed_max_subnet_losses)
                total_losses.update(
                    add_prefix(max_subnet_losses, 'max_subnet'))
            # update the min subnet loss.
            elif kind == 'min':
                self.mutator.set_choices(self.mutator.min_choices)
                min_subnet_losses = distill_step(batch_inputs, data_samples)
                total_losses.update(
                    add_prefix(min_subnet_losses, 'min_subnet'))
            # update the random subnets loss.
+ elif 'random' in kind: + self.mutator.set_choices(self.mutator.sample_choices()) + random_subnet_losses = distill_step(batch_inputs, data_samples) + total_losses.update( + add_prefix(random_subnet_losses, f'{kind}_subnet')) + + return total_losses + + def train(self, mode=True): + """Overwrite the train method in ``nn.Module`` to set ``nn.BatchNorm`` + to training mode when the model is set to eval mode and + ``self.norm_training`` is ``True``. + + Args: + mode (bool): whether to set training mode (``True``) or evaluation + mode (``False``). Default: ``True``. + """ + super(AutoSlim, self).train(mode) + if not mode and self.norm_training: + for module in self.modules(): + if isinstance(module, _BatchNorm): + module.training = True + + +@MODEL_WRAPPERS.register_module() +class AutoSlimDDP(MMDistributedDataParallel): + """DDP wrapper for AutoSlim.""" + + def __init__(self, + *, + device_ids: Optional[Union[List, int, torch.device]] = None, + **kwargs) -> None: + if device_ids is None: + if os.environ.get('LOCAL_RANK') is not None: + device_ids = [int(os.environ['LOCAL_RANK'])] + super().__init__(device_ids=device_ids, **kwargs) + + def train_step(self, data: List[dict], + optim_wrapper: OptimWrapper) -> Dict[str, torch.Tensor]: + + def distill_step( + batch_inputs: torch.Tensor, data_samples: List[BaseDataElement] + ) -> Dict[str, torch.Tensor]: + subnet_losses = dict() + with optim_wrapper.optim_context( + self + ), self.module.distiller.student_recorders: # type: ignore + hard_loss = self(batch_inputs, data_samples, mode='loss') + soft_loss = self.module.distiller.compute_distill_losses() + + subnet_losses.update(hard_loss) + subnet_losses.update(soft_loss) + + parsed_subnet_losses, _ = self.module.parse_losses( + subnet_losses) + optim_wrapper.update_params(parsed_subnet_losses) + + return subnet_losses + + if not self._optim_wrapper_count_status_reinitialized: + reinitialize_optim_wrapper_count_status( + model=self, + optim_wrapper=optim_wrapper, + 
accumulative_counts=len(self.module.sample_kinds)) + self._optim_wrapper_count_status_reinitialized = True + + input_data = self.module.data_preprocessor(data, True) + batch_inputs = input_data['inputs'] + data_samples = input_data['data_samples'] + + total_losses = dict() + for kind in self.module.sample_kinds: + # update the max subnet loss. + if kind == 'max': + self.module.mutator.set_max_choices() + with optim_wrapper.optim_context( + self + ), self.module.distiller.teacher_recorders: # type: ignore + max_subnet_losses = self( + batch_inputs, data_samples, mode='loss') + parsed_max_subnet_losses, _ = self.module.parse_losses( + max_subnet_losses) + optim_wrapper.update_params(parsed_max_subnet_losses) + total_losses.update( + add_prefix(max_subnet_losses, 'max_subnet')) + # update the min subnet loss. + elif kind == 'min': + self.module.mutator.set_min_choices() + min_subnet_losses = distill_step(batch_inputs, data_samples) + total_losses.update( + add_prefix(min_subnet_losses, 'min_subnet')) + # update the random subnets loss. 
+ elif 'random' in kind: + self.module.mutator.set_choices( + self.module.mutator.sample_choices()) + random_subnet_losses = distill_step(batch_inputs, data_samples) + total_losses.update( + add_prefix(random_subnet_losses, f'{kind}_subnet')) + + return total_losses + + @property + def _optim_wrapper_count_status_reinitialized(self) -> bool: + return self.module._optim_wrapper_count_status_reinitialized + + @_optim_wrapper_count_status_reinitialized.setter + def _optim_wrapper_count_status_reinitialized(self, val: bool) -> None: + assert isinstance(val, bool) + + self.module._optim_wrapper_count_status_reinitialized = val diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/nas/bignas.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/nas/bignas.py new file mode 100644 index 0000000000000000000000000000000000000000..bd3ec4e203dd350a448231a20325f9bcbce972f2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/nas/bignas.py @@ -0,0 +1,280 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import os +from typing import Dict, List, Optional, Union + +import torch +from mmengine.model import BaseModel, MMDistributedDataParallel +from mmengine.optim import OptimWrapper +from mmengine.structures import BaseDataElement +from torch import nn + +from mmrazor.models.architectures.ops.mobilenet_series import MBBlock +from mmrazor.models.architectures.utils import set_dropout +from mmrazor.models.distillers import ConfigurableDistiller +from mmrazor.models.mutators import NasMutator +from mmrazor.models.utils import (add_prefix, + reinitialize_optim_wrapper_count_status) +from mmrazor.registry import MODEL_WRAPPERS, MODELS +from ..base import BaseAlgorithm + +VALID_MUTATOR_TYPE = Union[NasMutator, Dict] +VALID_DISTILLER_TYPE = Union[ConfigurableDistiller, Dict] + + +@MODELS.register_module() +class BigNAS(BaseAlgorithm): + """Implementation of `BigNas `_ + + BigNAS is a NAS algorithm which searches the following items in MobileNetV3 + with the one-shot paradigm: kernel_sizes, out_channels, expand_ratios, + block_depth and input sizes. + + BigNAS uses a `sandwich` strategy to sample subnets from the supernet, + which includes the max subnet, min subnet and N random subnets. It doesn't + require retraining, therefore we can directly get well-trained subnets + after supernet training. + + The logic of the search part is implemented in + :class:`mmrazor.engine.EvolutionSearchLoop` + + Args: + architecture (dict|:obj:`BaseModel`): The config of :class:`BaseModel` + or built model. Corresponding to supernet in NAS algorithm. + mutator (VALID_MUTATOR_TYPE): The config of :class:`NasMutator` or + built mutator. + distiller (VALID_DISTILLER_TYPE): Cfg of :class:`ConfigurableDistiller` + or built distiller. + data_preprocessor (Optional[Union[dict, nn.Module]]): The pre-process + config of :class:`BaseDataPreprocessor`. Defaults to None. + num_random_samples (int): number of random sample subnets. + Defaults to 2. + drop_path_rate (float): Stochastic depth rate. 
Defaults to 0.2. + backbone_dropout_stages (List): Stages to be set dropout. Defaults to + [6, 7]. + init_cfg (Optional[dict]): Init config for ``BaseModule``. + Defaults to None. + """ + + def __init__(self, + architecture: Union[BaseModel, Dict], + mutator: VALID_MUTATOR_TYPE = None, + distiller: VALID_DISTILLER_TYPE = None, + data_preprocessor: Optional[Union[Dict, nn.Module]] = None, + num_random_samples: int = 2, + drop_path_rate: float = 0.2, + backbone_dropout_stages: List = [6, 7], + init_cfg: Optional[Dict] = None) -> None: + super().__init__(architecture, data_preprocessor, init_cfg) + + self.mutator = self._build_mutator(mutator) + # NOTE: `mutator.prepare_from_supernet` must be called + # before distiller initialized. + self.mutator.prepare_from_supernet(self.architecture) + + self.distiller = self._build_distiller(distiller) + self.distiller.prepare_from_teacher(self.architecture) + self.distiller.prepare_from_student(self.architecture) + + self.sample_kinds = ['max', 'min'] + for i in range(num_random_samples): + self.sample_kinds.append('random' + str(i)) + + self.drop_path_rate = drop_path_rate + self.backbone_dropout_stages = backbone_dropout_stages + self._optim_wrapper_count_status_reinitialized = False + + def _build_mutator(self, mutator: VALID_MUTATOR_TYPE = None) -> NasMutator: + """Build mutator.""" + if isinstance(mutator, dict): + mutator = MODELS.build(mutator) + if not isinstance(mutator, NasMutator): + raise TypeError('mutator should be a `dict` or `NasMutator` ' + f'instance, but got {type(mutator)}.') + return mutator + + def _build_distiller( + self, + distiller: VALID_DISTILLER_TYPE = None) -> ConfigurableDistiller: + """Build distiller.""" + if isinstance(distiller, dict): + distiller = MODELS.build(distiller) + if not isinstance(distiller, ConfigurableDistiller): + raise TypeError('distiller should be a `dict` or ' + '`ConfigurableDistiller` instance, but got ' + f'{type(distiller)}') + return distiller + + def train_step(self, 
data: List[dict], + optim_wrapper: OptimWrapper) -> Dict[str, torch.Tensor]: + + def distill_step( + batch_inputs: torch.Tensor, data_samples: List[BaseDataElement] + ) -> Dict[str, torch.Tensor]: + subnet_losses = dict() + with optim_wrapper.optim_context( + self), self.distiller.student_recorders: # type: ignore + _ = self(batch_inputs, data_samples, mode='loss') + soft_loss = self.distiller.compute_distill_losses() + + subnet_losses.update(soft_loss) + + parsed_subnet_losses, _ = self.parse_losses(subnet_losses) + optim_wrapper.update_params(parsed_subnet_losses) + + return subnet_losses + + if not self._optim_wrapper_count_status_reinitialized: + reinitialize_optim_wrapper_count_status( + model=self, + optim_wrapper=optim_wrapper, + accumulative_counts=len(self.sample_kinds)) + self._optim_wrapper_count_status_reinitialized = True + + batch_inputs, data_samples = self.data_preprocessor(data, + True).values() + + total_losses = dict() + for kind in self.sample_kinds: + # update the max subnet loss. + if kind == 'max': + self.mutator.set_max_choices() + set_dropout( + layers=self.architecture.backbone.layers[:-1], + module=MBBlock, + dropout_stages=self.backbone_dropout_stages, + drop_path_rate=self.drop_path_rate) + with optim_wrapper.optim_context( + self + ), self.distiller.teacher_recorders: # type: ignore + max_subnet_losses = self( + batch_inputs, data_samples, mode='loss') + parsed_max_subnet_losses, _ = self.parse_losses( + max_subnet_losses) + optim_wrapper.update_params(parsed_max_subnet_losses) + total_losses.update( + add_prefix(max_subnet_losses, 'max_subnet')) + # update the min subnet loss. + elif kind == 'min': + self.mutator.set_min_choices() + set_dropout( + layers=self.architecture.backbone.layers[:-1], + module=MBBlock, + dropout_stages=self.backbone_dropout_stages, + drop_path_rate=0.) 
+ min_subnet_losses = distill_step(batch_inputs, data_samples) + total_losses.update( + add_prefix(min_subnet_losses, 'min_subnet')) + # update the random subnets loss. + elif 'random' in kind: + self.mutator.set_choices(self.mutator.sample_choices()) + set_dropout( + layers=self.architecture.backbone.layers[:-1], + module=MBBlock, + dropout_stages=self.backbone_dropout_stages, + drop_path_rate=0.) + random_subnet_losses = distill_step(batch_inputs, data_samples) + total_losses.update( + add_prefix(random_subnet_losses, f'{kind}_subnet')) + + return total_losses + + +@MODEL_WRAPPERS.register_module() +class BigNASDDP(MMDistributedDataParallel): + + def __init__(self, + *, + device_ids: Optional[Union[List, int, torch.device]] = None, + **kwargs) -> None: + if device_ids is None: + if os.environ.get('LOCAL_RANK') is not None: + device_ids = [int(os.environ['LOCAL_RANK'])] + super().__init__(device_ids=device_ids, **kwargs) + self.device = 'cuda' if torch.cuda.is_available() else 'cpu' + + def train_step(self, data: List[dict], + optim_wrapper: OptimWrapper) -> Dict[str, torch.Tensor]: + + def distill_step( + batch_inputs: torch.Tensor, data_samples: List[BaseDataElement] + ) -> Dict[str, torch.Tensor]: + subnet_losses = dict() + with optim_wrapper.optim_context( + self + ), self.module.distiller.student_recorders: # type: ignore + _ = self(batch_inputs, data_samples, mode='loss') + soft_loss = self.module.distiller.compute_distill_losses() + + subnet_losses.update(soft_loss) + + parsed_subnet_losses, _ = self.module.parse_losses( + subnet_losses) + optim_wrapper.update_params(parsed_subnet_losses) + + return subnet_losses + + if not self._optim_wrapper_count_status_reinitialized: + reinitialize_optim_wrapper_count_status( + model=self, + optim_wrapper=optim_wrapper, + accumulative_counts=len(self.module.sample_kinds)) + self._optim_wrapper_count_status_reinitialized = True + + batch_inputs, data_samples = self.module.data_preprocessor( + data, True).values() + + 
total_losses = dict() + for kind in self.module.sample_kinds: + # update the max subnet loss. + if kind == 'max': + self.module.mutator.set_max_choices() + set_dropout( + layers=self.module.architecture.backbone.layers[:-1], + module=MBBlock, + dropout_stages=self.module.backbone_dropout_stages, + drop_path_rate=self.module.drop_path_rate) + with optim_wrapper.optim_context( + self + ), self.module.distiller.teacher_recorders: # type: ignore + max_subnet_losses = self( + batch_inputs, data_samples, mode='loss') + parsed_max_subnet_losses, _ = self.module.parse_losses( + max_subnet_losses) + optim_wrapper.update_params(parsed_max_subnet_losses) + total_losses.update( + add_prefix(max_subnet_losses, 'max_subnet')) + # update the min subnet loss. + elif kind == 'min': + self.module.mutator.set_min_choices() + set_dropout( + layers=self.module.architecture.backbone.layers[:-1], + module=MBBlock, + dropout_stages=self.module.backbone_dropout_stages, + drop_path_rate=0.) + min_subnet_losses = distill_step(batch_inputs, data_samples) + total_losses.update( + add_prefix(min_subnet_losses, 'min_subnet')) + # update the random subnets loss. + elif 'random' in kind: + self.module.mutator.set_choices( + self.module.mutator.sample_choices()) + set_dropout( + layers=self.module.architecture.backbone.layers[:-1], + module=MBBlock, + dropout_stages=self.module.backbone_dropout_stages, + drop_path_rate=0.) 
+ random_subnet_losses = distill_step(batch_inputs, data_samples) + total_losses.update( + add_prefix(random_subnet_losses, f'{kind}_subnet')) + + return total_losses + + @property + def _optim_wrapper_count_status_reinitialized(self) -> bool: + return self.module._optim_wrapper_count_status_reinitialized + + @_optim_wrapper_count_status_reinitialized.setter + def _optim_wrapper_count_status_reinitialized(self, val: bool) -> None: + assert isinstance(val, bool) + + self.module._optim_wrapper_count_status_reinitialized = val diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/nas/darts.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/nas/darts.py new file mode 100644 index 0000000000000000000000000000000000000000..b110f47ce9db1027237e0ea5c961c8a5b8cfc153 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/nas/darts.py @@ -0,0 +1,523 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy +import os +from typing import Dict, List, Optional, Union + +import torch +from mmengine.model import BaseModel, MMDistributedDataParallel +from mmengine.optim import OptimWrapper, OptimWrapperDict +from torch import nn +from torch.nn.modules.batchnorm import _BatchNorm + +from mmrazor.models.mutators import NasMutator +from mmrazor.models.utils import add_prefix +from mmrazor.registry import MODEL_WRAPPERS, MODELS +from ..base import BaseAlgorithm + +VALID_MUTATOR_TYPE = Union[NasMutator, Dict] + + +@MODELS.register_module() +class Darts(BaseAlgorithm): + """Implementation of `DARTS `_ + + DARTS means Differentiable Architecture Search, a classic NAS algorithm. + :class:`Darts` implements the APIs required by the DARTS, as well as the + supernet training and subnet retraining logic for each iter. + + Args: + architecture (dict|:obj:`BaseModel`): The config of :class:`BaseModel` + or built model. Corresponding to supernet in NAS algorithm. 
+ mutator (VALID_MUTATOR_TYPE): The config of :class:`NasMutator` or + built mutator. + norm_training (bool): Whether to set norm layers to training mode, + namely, not freeze running stats (mean and var). Note: Effect on + Batch Norm and its variants only. Defaults to False. + data_preprocessor (Optional[Union[dict, nn.Module]]): The pre-process + config of :class:`BaseDataPreprocessor`. Defaults to None. + init_cfg (Optional[dict]): Init config for ``BaseModule``. + Defaults to None. + """ + + def __init__(self, + architecture: Union[BaseModel, Dict], + mutator: VALID_MUTATOR_TYPE = None, + unroll: bool = False, + norm_training: bool = False, + data_preprocessor: Optional[Union[dict, nn.Module]] = None, + init_cfg: Optional[dict] = None): + super().__init__(architecture, data_preprocessor, init_cfg) + + self.mutator = self._build_mutator(mutator) + # Mutator is an essential component of the NAS algorithm. It + # provides some APIs commonly used by NAS. + # Before using it, you must do some preparation according to + # the supernet. + self.mutator.prepare_from_supernet(self.architecture) + self.mutator.prepare_arch_params() + + self.norm_training = norm_training + self.unroll = unroll + + def _build_mutator(self, mutator: VALID_MUTATOR_TYPE = None) -> NasMutator: + """Build mutator.""" + if isinstance(mutator, dict): + mutator = MODELS.build(mutator) + if not isinstance(mutator, NasMutator): + raise TypeError('mutator should be a `dict` or `NasMutator` ' + f'instance, but got {type(mutator)}.') + return mutator + + def train(self, mode=True): + """Convert the model into eval mode while keeping normalization + layers unfrozen.""" + + super().train(mode) + if self.norm_training and not mode: + for module in self.architecture.modules(): + if isinstance(module, _BatchNorm): + module.training = True + + def train_step(self, data: List[dict], + optim_wrapper: OptimWrapper) -> Dict[str, torch.Tensor]: + """The iteration step during training. 
+ + This method defines an iteration step during training, except for the + back propagation and optimizer updating, which are done in an optimizer + hook. Note that in some complicated cases or models, the whole process + including back propagation and optimizer updating are also defined in + this method, such as GAN. + + Args: + data (dict): The output of dataloader. + optimizer (:obj:`torch.optim.Optimizer` | dict): The optimizer of + runner is passed to ``train_step()``. This argument is unused + and reserved. + Returns: + dict: It should contain at least 3 keys: ``loss``, ``log_vars``, + ``num_samples``. + ``loss`` is a tensor for back propagation, which can be a + weighted sum of multiple losses. + ``log_vars`` contains all the variables to be sent to the + logger. + ``num_samples`` indicates the batch size (when the model is + DDP, it means the batch size on each GPU), which is used for + averaging the logs. + """ + if isinstance(data, (tuple, list)) and isinstance( + optim_wrapper, OptimWrapperDict): + assert len(data) == len(optim_wrapper), \ + f'The length of data ({len(data)}) should be equal to that '\ + f'of optimizers ({len(optim_wrapper)}).' 
+ + supernet_data, mutator_data = data + + log_vars = dict() + + # Update the parameter of mutator + if self.unroll: + with optim_wrapper['mutator'].optim_context(self): + optim_wrapper['mutator'].zero_grad() + mutator_log_vars = self._unrolled_backward( + mutator_data, supernet_data, optim_wrapper) + optim_wrapper['mutator'].step() + log_vars.update(add_prefix(mutator_log_vars, 'mutator')) + else: + with optim_wrapper['mutator'].optim_context(self): + pseudo_data = self.data_preprocessor(mutator_data, True) + mutator_batch_inputs = pseudo_data['inputs'] + mutator_data_samples = pseudo_data['data_samples'] + mutator_loss = self( + mutator_batch_inputs, + mutator_data_samples, + mode='loss') + mutator_losses, mutator_log_vars = self.parse_losses( + mutator_loss) + optim_wrapper['mutator'].update_params(mutator_losses) + log_vars.update(add_prefix(mutator_log_vars, 'mutator')) + + # Update the parameter of supernet + with optim_wrapper['architecture'].optim_context(self): + pseudo_data = self.data_preprocessor(supernet_data, True) + supernet_batch_inputs = pseudo_data['inputs'] + supernet_data_samples = pseudo_data['data_samples'] + supernet_loss = self( + supernet_batch_inputs, supernet_data_samples, mode='loss') + supernet_losses, supernet_log_vars = self.parse_losses( + supernet_loss) + optim_wrapper['architecture'].update_params(supernet_losses) + log_vars.update(add_prefix(supernet_log_vars, 'supernet')) + + else: + # Enable automatic mixed precision training context. 
+ with optim_wrapper.optim_context(self): + pseudo_data = self.data_preprocessor(data, True) + batch_inputs = pseudo_data['inputs'] + data_samples = pseudo_data['data_samples'] + losses = self(batch_inputs, data_samples, mode='loss') + parsed_losses, log_vars = self.parse_losses(losses) + optim_wrapper.update_params(parsed_losses) + return log_vars + + def _unrolled_backward(self, mutator_data, supernet_data, optim_wrapper): + """Compute unrolled loss and backward its gradients.""" + backup_params = copy.deepcopy(tuple(self.architecture.parameters())) + + # Do virtual step on training data + lr = optim_wrapper['architecture'].param_groups[0]['lr'] + momentum = optim_wrapper['architecture'].param_groups[0]['momentum'] + weight_decay = optim_wrapper['architecture'].param_groups[0][ + 'weight_decay'] + self._compute_virtual_model(supernet_data, lr, momentum, weight_decay, + optim_wrapper['architecture']) + + # Calculate unrolled loss on validation data + # Keep gradients for model here for compute hessian + pseudo_data = self.data_preprocessor(mutator_data, True) + mutator_batch_inputs = pseudo_data['inputs'] + mutator_data_samples = pseudo_data['data_samples'] + mutator_loss = self( + mutator_batch_inputs, mutator_data_samples, mode='loss') + mutator_losses, mutator_log_vars = self.parse_losses(mutator_loss) + + # Here we use the backward function of optimWrapper to calculate + # the gradients of mutator loss. The gradients of model and arch + # can directly obtained. 
For more information, please refer to + # https://github.com/open-mmlab/mmengine/blob/main/mmengine/optim/optimizer/optimizer_wrapper.py + optim_wrapper['mutator'].backward(mutator_losses) + d_model = [param.grad for param in self.architecture.parameters()] + d_arch = [param.grad for param in self.mutator.parameters()] + + # compute hessian and final gradients + hessian = self._compute_hessian(backup_params, d_model, supernet_data, + optim_wrapper['architecture']) + + w_arch = tuple(self.mutator.parameters()) + + with torch.no_grad(): + for param, d, h in zip(w_arch, d_arch, hessian): + # gradient = dalpha - lr * hessian + param.grad = d - lr * h + + # restore weights + self._restore_weights(backup_params) + return mutator_log_vars + + def _compute_virtual_model(self, supernet_data, lr, momentum, weight_decay, + optim_wrapper): + """Compute unrolled weights w`""" + # don't need zero_grad, using autograd to calculate gradients + pseudo_data = self.data_preprocessor(supernet_data, True) + supernet_batch_inputs = pseudo_data['inputs'] + supernet_data_samples = pseudo_data['data_samples'] + supernet_loss = self( + supernet_batch_inputs, supernet_data_samples, mode='loss') + supernet_loss, _ = self.parse_losses(supernet_loss) + + optim_wrapper.backward(supernet_loss) + gradients = [param.grad for param in self.architecture.parameters()] + + with torch.no_grad(): + for w, g in zip(self.architecture.parameters(), gradients): + m = optim_wrapper.optimizer.state[w].get('momentum_buffer', 0.) 
+ # NOTE(review): `w = w - lr * (...)` rebinds the local name `w` only; + # the parameter tensor is NOT updated in-place — confirm against the + # upstream implementation whether an in-place update was intended. + w = w - lr * (momentum * m + g + weight_decay * w) + + def _restore_weights(self, backup_params): + """restore weight from backup params.""" + with torch.no_grad(): + for param, backup in zip(self.architecture.parameters(), + backup_params): + param.copy_(backup) + + def _compute_hessian(self, backup_params, dw, supernet_data, + optim_wrapper) -> List: + """compute hessian metric + dw = dw` { L_val(w`, alpha) } + w+ = w + eps * dw + w- = w - eps * dw + hessian = (dalpha { L_trn(w+, alpha) } \ + - dalpha { L_trn(w-, alpha) }) / (2*eps) + eps = 0.01 / ||dw|| + """ + self._restore_weights(backup_params) + norm = torch.cat([w.view(-1) for w in dw]).norm() + eps = 0.01 / norm + if norm < 1E-8: + # `print` does not interpolate %-style args; report the actual + # eps value via an f-string instead. + print('In computing hessian, norm is smaller than 1E-8, ' + f'cause eps to be {eps.item():.6f}.') + + dalphas = [] + for e in [eps, -2. * eps]: + # w+ = w + eps*dw`, w- = w - eps*dw` + with torch.no_grad(): + for p, d in zip(self.architecture.parameters(), dw): + p += e * d + + pseudo_data = self.data_preprocessor(supernet_data, True) + supernet_batch_inputs = pseudo_data['inputs'] + supernet_data_samples = pseudo_data['data_samples'] + supernet_loss = self( + supernet_batch_inputs, supernet_data_samples, mode='loss') + supernet_loss, _ = self.parse_losses(supernet_loss) + + optim_wrapper.backward(supernet_loss) + dalpha = [param.grad for param in self.mutator.parameters()] + dalphas.append(dalpha) + + # dalpha { L_trn(w+) }, # dalpha { L_trn(w-) } + dalpha_pos, dalpha_neg = dalphas + hessian = [(p - n) / (2. * eps) + for p, n in zip(dalpha_pos, dalpha_neg)] + return hessian + + +class BatchNormWrapper(nn.Module): + """Wrapper for BatchNorm. 
+ + For more information, Please refer to + https://github.com/NVIDIA/apex/issues/121 + """ + + def __init__(self, m): + super(BatchNormWrapper, self).__init__() + self.m = m + # Set the batch norm to eval mode + self.m.eval() + + def forward(self, x): + """Convert fp16 to fp32 when forward.""" + input_type = x.dtype + x = self.m(x.float()) + return x.to(input_type) + + +@MODEL_WRAPPERS.register_module() +class DartsDDP(MMDistributedDataParallel): + """DDP for Darts and rewrite train_step of MMDDP.""" + + def __init__(self, + *, + device_ids: Optional[Union[List, int, torch.device]] = None, + **kwargs) -> None: + if device_ids is None: + if os.environ.get('LOCAL_RANK') is not None: + device_ids = [int(os.environ['LOCAL_RANK'])] + super().__init__(device_ids=device_ids, **kwargs) + + fp16 = True + if fp16: + + def add_fp16_bn_wrapper(model): + for child_name, child in model.named_children(): + if isinstance(child, nn.BatchNorm2d): + setattr(model, child_name, BatchNormWrapper(child)) + else: + add_fp16_bn_wrapper(child) + + add_fp16_bn_wrapper(self.module) + + def train_step(self, data: List[dict], + optim_wrapper: OptimWrapper) -> Dict[str, torch.Tensor]: + """The iteration step during training. + + This method defines an iteration step during training, except for the + back propagation and optimizer updating, which are done in an optimizer + hook. Note that in some complicated cases or models, the whole process + including back propagation and optimizer updating are also defined in + this method, such as GAN. + + Args: + data (dict): The output of dataloader. + optimizer (:obj:`torch.optim.Optimizer` | dict): The optimizer of + runner is passed to ``train_step()``. This argument is unused + and reserved. + Returns: + dict: It should contain at least 3 keys: ``loss``, ``log_vars``, + ``num_samples``. + ``loss`` is a tensor for back propagation, which can be a + weighted sum of multiple losses. + ``log_vars`` contains all the variables to be sent to the + logger. 
+ ``num_samples`` indicates the batch size (when the model is + DDP, it means the batch size on each GPU), which is used for + averaging the logs. + """ + if isinstance(data, (tuple, list)) and isinstance( + optim_wrapper, OptimWrapperDict): + assert len(data) == len(optim_wrapper), \ + f'The length of data ({len(data)}) should be equal to that '\ + f'of optimizers ({len(optim_wrapper)}).' + + supernet_data, mutator_data = data + + log_vars = dict() + + # Update the parameter of mutator + if self.module.unroll: + with optim_wrapper['mutator'].optim_context(self): + optim_wrapper['mutator'].zero_grad() + mutator_log_vars = self._unrolled_backward( + mutator_data, supernet_data, optim_wrapper) + optim_wrapper['mutator'].step() + log_vars.update(add_prefix(mutator_log_vars, 'mutator')) + else: + with optim_wrapper['mutator'].optim_context(self): + pseudo_data = self.module.data_preprocessor( + mutator_data, True) + mutator_batch_inputs = pseudo_data['inputs'] + mutator_data_samples = pseudo_data['data_samples'] + mutator_loss = self( + mutator_batch_inputs, + mutator_data_samples, + mode='loss') + + mutator_losses, mutator_log_vars = self.module.parse_losses( # noqa: E501 + mutator_loss) + optim_wrapper['mutator'].update_params(mutator_losses) + log_vars.update(add_prefix(mutator_log_vars, 'mutator')) + + # Update the parameter of supernet + with optim_wrapper['architecture'].optim_context(self): + pseudo_data = self.module.data_preprocessor( + supernet_data, True) + supernet_batch_inputs = pseudo_data['inputs'] + supernet_data_samples = pseudo_data['data_samples'] + supernet_loss = self( + supernet_batch_inputs, supernet_data_samples, mode='loss') + + supernet_losses, supernet_log_vars = self.module.parse_losses( + supernet_loss) + + optim_wrapper['architecture'].update_params(supernet_losses) + log_vars.update(add_prefix(supernet_log_vars, 'supernet')) + + else: + # Enable automatic mixed precision training context. 
+ with optim_wrapper.optim_context(self): + pseudo_data = self.module.data_preprocessor(data, True) + batch_inputs = pseudo_data['inputs'] + data_samples = pseudo_data['data_samples'] + losses = self(batch_inputs, data_samples, mode='loss') + parsed_losses, log_vars = self.module.parse_losses(losses) + optim_wrapper.update_params(parsed_losses) + + return log_vars + + def _unrolled_backward(self, mutator_data, supernet_data, optim_wrapper): + """Compute unrolled loss and backward its gradients.""" + backup_params = copy.deepcopy( + tuple(self.module.architecture.parameters())) + + # do virtual step on training data + lr = optim_wrapper['architecture'].param_groups[0]['lr'] + momentum = optim_wrapper['architecture'].param_groups[0]['momentum'] + weight_decay = optim_wrapper['architecture'].param_groups[0][ + 'weight_decay'] + self._compute_virtual_model(supernet_data, lr, momentum, weight_decay, + optim_wrapper['architecture']) + + # calculate unrolled loss on validation data + # keep gradients for model here for compute hessian + pseudo_data = self.module.data_preprocessor(mutator_data, True) + mutator_batch_inputs = pseudo_data['inputs'] + mutator_data_samples = pseudo_data['data_samples'] + mutator_loss = self( + mutator_batch_inputs, mutator_data_samples, mode='loss') + mutator_losses, mutator_log_vars = self.module.parse_losses( + mutator_loss) + + # Here we use the backward function of optimWrapper to calculate + # the gradients of mutator loss. The gradients of model and arch + # can directly obtained. 
For more information, please refer to + # https://github.com/open-mmlab/mmengine/blob/main/mmengine/optim/optimizer/optimizer_wrapper.py + optim_wrapper['mutator'].backward(mutator_losses) + d_model = [ + param.grad for param in self.module.architecture.parameters() + ] + d_arch = [param.grad for param in self.module.mutator.parameters()] + + # compute hessian and final gradients + hessian = self._compute_hessian(backup_params, d_model, supernet_data, + optim_wrapper['architecture']) + + w_arch = tuple(self.module.mutator.parameters()) + + with torch.no_grad(): + for param, da, he in zip(w_arch, d_arch, hessian): + # gradient = dalpha - lr * hessian + param.grad = da - lr * he + + # restore weights + self._restore_weights(backup_params) + return mutator_log_vars + + def _compute_virtual_model(self, supernet_data, lr, momentum, weight_decay, + optim_wrapper): + """Compute unrolled weights w`""" + # don't need zero_grad, using autograd to calculate gradients + pseudo_data = self.module.data_preprocessor(supernet_data, True) + supernet_batch_inputs = pseudo_data['inputs'] + supernet_data_samples = pseudo_data['data_samples'] + supernet_loss = self( + supernet_batch_inputs, supernet_data_samples, mode='loss') + supernet_loss, _ = self.module.parse_losses(supernet_loss) + + optim_wrapper.backward(supernet_loss) + gradients = [ + param.grad for param in self.module.architecture.parameters() + ] + + with torch.no_grad(): + for w, g in zip(self.module.architecture.parameters(), gradients): + m = optim_wrapper.optimizer.state[w].get('momentum_buffer', 0.) 
+ w = w - lr * (momentum * m + g + weight_decay * w) + + def _restore_weights(self, backup_params): + """restore weight from backup params.""" + with torch.no_grad(): + for param, backup in zip(self.module.architecture.parameters(), + backup_params): + param.copy_(backup) + + def _compute_hessian(self, backup_params, dw, supernet_data, + optim_wrapper) -> List: + """compute hession metric + dw = dw` { L_val(w`, alpha) } + w+ = w + eps * dw + w- = w - eps * dw + hessian = (dalpha { L_trn(w+, alpha) } \ + - dalpha { L_trn(w-, alpha) }) / (2*eps) + eps = 0.01 / ||dw|| + """ + self._restore_weights(backup_params) + norm = torch.cat([w.view(-1) for w in dw]).norm() + eps = 0.01 / norm + if norm < 1E-8: + print( + 'In computing hessian, norm is smaller than 1E-8, \ + cause eps to be %.6f.', norm.item()) + + dalphas = [] + for e in [eps, -2. * eps]: + # w+ = w + eps*dw`, w- = w - eps*dw` + with torch.no_grad(): + for p, d in zip(self.module.architecture.parameters(), dw): + p += e * d + + pseudo_data = self.module.data_preprocessor(supernet_data, True) + supernet_batch_inputs = pseudo_data['inputs'] + supernet_data_samples = pseudo_data['data_samples'] + supernet_loss = self( + supernet_batch_inputs, supernet_data_samples, mode='loss') + supernet_loss, _ = self.module.parse_losses(supernet_loss) + + optim_wrapper.backward(supernet_loss) + dalpha = [param.grad for param in self.module.mutator.parameters()] + dalphas.append(dalpha) + + # dalpha { L_trn(w+) }, # dalpha { L_trn(w-) } + dalpha_pos, dalpha_neg = dalphas + hessian = [(p - n) / (2. 
* eps) + for p, n in zip(dalpha_pos, dalpha_neg)] + return hessian diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/nas/dsnas.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/nas/dsnas.py new file mode 100644 index 0000000000000000000000000000000000000000..e5937ba71108a9b5a5accb495a3a52117e1e801f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/nas/dsnas.py @@ -0,0 +1,332 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy +import os +from typing import Any, Dict, List, Optional, Union + +import torch +import torch.distributed as dist +import torch.nn.functional as F +from mmengine.dist import get_dist_info +from mmengine.logging import MessageHub +from mmengine.model import BaseModel, MMDistributedDataParallel +from mmengine.optim import OptimWrapper, OptimWrapperDict +from torch import nn +from torch.nn.modules.batchnorm import _BatchNorm + +from mmrazor.models.mutables import BaseMutable +from mmrazor.models.mutators import NasMutator +from mmrazor.models.utils import add_prefix +from mmrazor.registry import MODEL_WRAPPERS, MODELS, TASK_UTILS +from mmrazor.structures import export_fix_subnet, load_fix_subnet +from ..base import BaseAlgorithm + +VALID_MUTATOR_TYPE = Union[NasMutator, Dict] + + +@MODELS.register_module() +class DSNAS(BaseAlgorithm): + """Implementation of `DSNAS `_ + + Args: + architecture (dict|:obj:`BaseModel`): The config of :class:`BaseModel` + or built model. Corresponding to supernet in NAS algorithm. + mutator (VALID_MUTATOR_TYPE): The config of :class:`NasMutator` or + built mutator. + pretrain_epochs (int): Num of epochs for supernet pretraining. + finetune_epochs (int): Num of epochs for subnet finetuning. + flops_constraints (float): Flops constraints for judging whether to + backward flops loss or not. Default to 300.0(M). + estimator_cfg (Dict[str, Any]): Used for building a resource estimator. + Default to None. 
+ norm_training (bool): Whether to set norm layers to training mode, + namely, not freeze running stats (mean and var). Note: Effect on + Batch Norm and its variants only. Defaults to False. + data_preprocessor (dict, optional): The pre-process config of + :class:`BaseDataPreprocessor`. Defaults to None. + init_cfg (dict): Init config for ``BaseModule``. + + Note: + Dsnas doesn't require retraining. It has 3 stages in searching: + 1. `cur_epoch` < `pretrain_epochs` refers to supernet pretraining. + 2. `pretrain_epochs` <= `cur_epoch` < `finetune_epochs` refers to + normal supernet training while mutator is updated. + 3. `cur_epoch` >= `finetune_epochs` refers to subnet finetuning. + """ + + def __init__(self, + architecture: Union[BaseModel, Dict], + mutator: VALID_MUTATOR_TYPE = None, + pretrain_epochs: int = 0, + finetune_epochs: int = 80, + flops_constraints: float = 300.0, + estimator_cfg: Dict[str, Any] = None, + norm_training: bool = False, + data_preprocessor: Optional[Union[dict, nn.Module]] = None, + init_cfg: Optional[dict] = None): + super().__init__(architecture, data_preprocessor, init_cfg) + + # initialize estimator + estimator_cfg = dict() if estimator_cfg is None else estimator_cfg + if 'type' not in estimator_cfg: + estimator_cfg['type'] = 'mmrazor.ResourceEstimator' + self.estimator = TASK_UTILS.build(estimator_cfg) + + self.mutator = self._build_mutator(mutator) + # Mutator is an essential component of the NAS algorithm. It + # provides some APIs commonly used by NAS. + # Before using it, you must do some preparation according to + # the supernet. 
+ self.mutator.prepare_from_supernet(self.architecture) + self.mutator.prepare_arch_params() + + self.mutable_module_resources = self._get_module_resources() + self.search_space_name_list = list(self.mutator._name2mutable.keys()) + + self.is_supernet = True + + self.norm_training = norm_training + self.pretrain_epochs = pretrain_epochs + self.finetune_epochs = finetune_epochs + if pretrain_epochs >= finetune_epochs: + raise ValueError(f'Pretrain stage (optional) must be done before ' + f'finetuning stage. Got `{pretrain_epochs}` >= ' + f'`{finetune_epochs}`.') + + self.flops_loss_coef = 1e-2 + self.flops_constraints = flops_constraints + _, self.world_size = get_dist_info() + + def _build_mutator(self, mutator: VALID_MUTATOR_TYPE = None) -> NasMutator: + """Build mutator.""" + if isinstance(mutator, dict): + mutator = MODELS.build(mutator) + if not isinstance(mutator, NasMutator): + raise TypeError('mutator should be a `dict` or `NasMutator` ' + f'instance, but got {type(mutator)}.') + return mutator + + def train(self, mode=True): + """Convert the model into eval mode while keep normalization layer + unfreezed.""" + + super().train(mode) + if self.norm_training and not mode: + for module in self.architecture.modules(): + if isinstance(module, _BatchNorm): + module.training = True + + def train_step(self, data: List[dict], + optim_wrapper: OptimWrapper) -> Dict[str, torch.Tensor]: + """The iteration step during training. + + Args: + data (dict): The output of dataloader. + optimizer (:obj:`torch.optim.Optimizer` | dict): The optimizer of + runner is passed to ``train_step()``. + """ + if isinstance(optim_wrapper, OptimWrapperDict): + log_vars = dict() + self.message_hub = MessageHub.get_current_instance() + cur_epoch = self.message_hub.get_info('epoch') + need_update_mutator = self.need_update_mutator(cur_epoch) + + if cur_epoch == self.finetune_epochs and self.is_supernet: + # synchronize arch params to start the finetune stage. 
+ for k, v in self.mutator.arch_params.items(): + dist.broadcast(v, src=0) + self._fix_archtecture() + self.is_supernet = False + + # 1. update architecture + with optim_wrapper['architecture'].optim_context(self): + pseudo_data = self.data_preprocessor(data, True) + supernet_batch_inputs = pseudo_data['inputs'] + supernet_data_samples = pseudo_data['data_samples'] + supernet_loss = self( + supernet_batch_inputs, supernet_data_samples, mode='loss') + + supernet_losses, supernet_log_vars = self.parse_losses( + supernet_loss) + optim_wrapper['architecture'].backward( + supernet_losses, retain_graph=need_update_mutator) + optim_wrapper['architecture'].step() + optim_wrapper['architecture'].zero_grad() + log_vars.update(add_prefix(supernet_log_vars, 'supernet')) + + # 2. update mutator + if need_update_mutator: + with optim_wrapper['mutator'].optim_context(self): + mutator_loss = self.compute_mutator_loss() + mutator_losses, mutator_log_vars = \ + self.parse_losses(mutator_loss) + optim_wrapper['mutator'].update_params(mutator_losses) + log_vars.update(add_prefix(mutator_log_vars, 'mutator')) + # handle the grad of arch params & weights + self.handle_grads() + + else: + # Enable automatic mixed precision training context. 
+ with optim_wrapper.optim_context(self): + pseudo_data = self.data_preprocessor(data, True) + batch_inputs = pseudo_data['inputs'] + data_samples = pseudo_data['data_samples'] + losses = self(batch_inputs, data_samples, mode='loss') + parsed_losses, log_vars = self.parse_losses(losses) + optim_wrapper.update_params(parsed_losses) + + return log_vars + + def _fix_archtecture(self): + """Fix architecture based on current choice.""" + self.mutator.set_choices(self.mutator.sample_choices()) + for module in self.architecture.modules(): + if isinstance(module, BaseMutable): + if not module.is_fixed: + module.fix_chosen(module.current_choice) + + def _get_module_resources(self): + """Get resources of spec modules.""" + spec_modules = [] + for name, module in self.architecture.named_modules(): + if isinstance(module, BaseMutable): + for choice in module.choices: + spec_modules.append(name + '._candidates.' + choice) + + mutable_module_resources = self.estimator.estimate_separation_modules( + self.architecture, dict(spec_modules=spec_modules)) + + return mutable_module_resources + + def need_update_mutator(self, cur_epoch: int) -> bool: + """Whether to update mutator.""" + if cur_epoch >= self.pretrain_epochs and \ + cur_epoch < self.finetune_epochs: + return True + return False + + def compute_mutator_loss(self) -> Dict[str, torch.Tensor]: + """Compute mutator loss. + + In this method, arch_loss & flops_loss[optional] are computed + by traversing arch_weights & probs in search groups. + + Returns: + Dict: Loss of the mutator. + """ + arch_loss = 0.0 + flops_loss = 0.0 + for name, module in self.architecture.named_modules(): + if isinstance(module, BaseMutable): + k = module.mutable_prefix + '_' + \ + str(self.search_space_name_list.index(name)) + probs = F.softmax(self.mutator.arch_params[k], -1) + arch_loss += torch.log( + (module.arch_weights * probs).sum(-1)).sum() + + # get the index of op with max arch weights. 
+ index = (module.arch_weights == 1).nonzero().item() + _module_key = name + '._candidates.' + module.choices[index] + flops_loss += probs[index] * \ + self.mutable_module_resources[_module_key]['flops'] + + mutator_loss = dict(arch_loss=arch_loss / self.world_size) + + copied_model = copy.deepcopy(self) + copied_model.mutator.set_choices(copied_model.mutator.sample_choices()) + + subnet_dict = export_fix_subnet(copied_model)[0] + load_fix_subnet(copied_model, subnet_dict) + + subnet_flops = self.estimator.estimate(copied_model)['flops'] + if subnet_flops >= self.flops_constraints: + mutator_loss['flops_loss'] = \ + (flops_loss * self.flops_loss_coef) / self.world_size + + return mutator_loss + + def handle_grads(self): + """Handle grads of arch params & arch weights.""" + for name, module in self.architecture.named_modules(): + if isinstance(module, BaseMutable): + k = module.mutable_prefix + '_' + \ + str(self.search_space_name_list.index(name)) + self.mutator.arch_params[k].grad.data.mul_( + module.arch_weights.grad.data.sum()) + module.arch_weights.grad.zero_() + + +@MODEL_WRAPPERS.register_module() +class DSNASDDP(MMDistributedDataParallel): + + def __init__(self, + *, + device_ids: Optional[Union[List, int, torch.device]] = None, + **kwargs) -> None: + if device_ids is None: + if os.environ.get('LOCAL_RANK') is not None: + device_ids = [int(os.environ['LOCAL_RANK'])] + super().__init__(device_ids=device_ids, **kwargs) + + def train_step(self, data: List[dict], + optim_wrapper: OptimWrapper) -> Dict[str, torch.Tensor]: + """The iteration step during training. + + This method defines an iteration step during training, except for the + back propagation and optimizer updating, which are done in an optimizer + hook. Note that in some complicated cases or models, the whole process + including back propagation and optimizer updating are also defined in + this method, such as GAN. 
+ """ + if isinstance(optim_wrapper, OptimWrapperDict): + log_vars = dict() + self.message_hub = MessageHub.get_current_instance() + cur_epoch = self.message_hub.get_info('epoch') + need_update_mutator = self.module.need_update_mutator(cur_epoch) + + # TODO process the input + if cur_epoch == self.module.finetune_epochs and \ + self.module.is_supernet: + # synchronize arch params to start the finetune stage. + for k, v in self.module.mutator.arch_params.items(): + dist.broadcast(v, src=0) + self.module._fix_archtecture() + self.module.is_supernet = False + + # 1. update architecture + with optim_wrapper['architecture'].optim_context(self): + pseudo_data = self.module.data_preprocessor(data, True) + supernet_batch_inputs = pseudo_data['inputs'] + supernet_data_samples = pseudo_data['data_samples'] + supernet_loss = self( + supernet_batch_inputs, supernet_data_samples, mode='loss') + + supernet_losses, supernet_log_vars = self.module.parse_losses( + supernet_loss) + optim_wrapper['architecture'].backward( + supernet_losses, retain_graph=need_update_mutator) + optim_wrapper['architecture'].step() + optim_wrapper['architecture'].zero_grad() + log_vars.update(add_prefix(supernet_log_vars, 'supernet')) + + # 2. update mutator + if need_update_mutator: + with optim_wrapper['mutator'].optim_context(self): + mutator_loss = self.module.compute_mutator_loss() + mutator_losses, mutator_log_vars = \ + self.module.parse_losses(mutator_loss) + optim_wrapper['mutator'].update_params(mutator_losses) + log_vars.update(add_prefix(mutator_log_vars, 'mutator')) + # handle the grad of arch params & weights + self.module.handle_grads() + + else: + # Enable automatic mixed precision training context. 
+ with optim_wrapper.optim_context(self): + pseudo_data = self.module.data_preprocessor(data, True) + batch_inputs = pseudo_data['inputs'] + data_samples = pseudo_data['data_samples'] + losses = self(batch_inputs, data_samples, mode='loss') + parsed_losses, log_vars = self.module.parse_losses(losses) + optim_wrapper.update_params(parsed_losses) + + return log_vars diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/nas/spos.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/nas/spos.py new file mode 100644 index 0000000000000000000000000000000000000000..90a27aa4b96925ad2b4655c3539ba9d9915c1f01 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/nas/spos.py @@ -0,0 +1,97 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import Dict, List, Optional, Union + +import torch +from mmengine.model import BaseModel +from mmengine.structures import BaseDataElement +from torch import nn +from torch.nn.modules.batchnorm import _BatchNorm + +from mmrazor.models.mutators import NasMutator +from mmrazor.registry import MODELS +from ..base import BaseAlgorithm, LossResults + +VALID_MUTATOR_TYPE = Union[NasMutator, Dict] + + +@MODELS.register_module() +class SPOS(BaseAlgorithm): + """Implementation of `SPOS `_ + + SPOS means Single Path One-Shot, a classic NAS algorithm. + :class:`SPOS` implements the APIs required by the Single Path One-Shot + algorithm, as well as the supernet training and subnet retraining logic + for each iter. + + The logic of the search part is implemented in + :class:`mmrazor.core.EvolutionSearch` + + Args: + architecture (dict|:obj:`BaseModel`): The config of :class:`BaseModel` + or built model. Corresponding to supernet in NAS algorithm. + mutator (VALID_MUTATOR_TYPE): The config of :class:`NasMutator` or + built mutator. + norm_training (bool): Whether to set norm layers to training mode, + namely, not freeze running stats (mean and var). 
Note: Effect on + Batch Norm and its variants only. Defaults to False. + data_preprocessor (Optional[Union[dict, nn.Module]]): The pre-process + config of :class:`BaseDataPreprocessor`. Defaults to None. + init_cfg (Optional[dict]): Init config for ``BaseModule``. + Defaults to None. + + Note: + During supernet training, since each op is not fully trained, the + statistics of :obj:_BatchNorm are inaccurate. This problem affects the + evaluation of the performance of each subnet in the search phase. There + are usually two ways to solve this problem, both need to set + `norm_training` to True: + + 1) Using a large batch size, BNs use the mean and variance of the + current batch during forward. + 2) Recalibrate the statistics of BN before searching. + """ + + def __init__(self, + architecture: Union[BaseModel, Dict], + mutator: VALID_MUTATOR_TYPE = None, + norm_training: bool = False, + data_preprocessor: Optional[Union[dict, nn.Module]] = None, + init_cfg: Optional[dict] = None): + super().__init__(architecture, data_preprocessor, init_cfg) + + self.mutator = self._build_mutator(mutator) + # Mutator is an essential component of the NAS algorithm. It + # provides some APIs commonly used by NAS. + # Before using it, you must do some preparations according to + # the supernet. 
+ self.mutator.prepare_from_supernet(self.architecture) + + self.norm_training = norm_training + + def _build_mutator(self, mutator: VALID_MUTATOR_TYPE = None) -> NasMutator: + """Build mutator.""" + if isinstance(mutator, dict): + mutator = MODELS.build(mutator) + if not isinstance(mutator, NasMutator): + raise TypeError('mutator should be a `dict` or `NasMutator` ' + f'instance, but got {type(mutator)}.') + return mutator + + def loss( + self, + batch_inputs: torch.Tensor, + data_samples: Optional[List[BaseDataElement]] = None, + ) -> LossResults: + """Calculate losses from a batch of inputs and data samples.""" + self.mutator.set_choices(self.mutator.sample_choices()) + return self.architecture(batch_inputs, data_samples, mode='loss') + + def train(self, mode=True): + """Convert the model into eval mode while keep normalization layer + unfreezed.""" + + super().train(mode) + if self.norm_training and not mode: + for module in self.architecture.modules(): + if isinstance(module, _BatchNorm): + module.training = True diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/pruning/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/pruning/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..09a87b90dc51b81662f191544e47038fd506cad9 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/pruning/__init__.py @@ -0,0 +1,8 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from .dcff import DCFF +from .dmcp import DMCP, DMCPDDP +from .slimmable_network import SlimmableNetwork, SlimmableNetworkDDP + +__all__ = [ + 'SlimmableNetwork', 'SlimmableNetworkDDP', 'DCFF', 'DMCP', 'DMCPDDP' +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/pruning/dcff.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/pruning/dcff.py new file mode 100644 index 0000000000000000000000000000000000000000..e89da50b4e7512f4dfa62ec7d2d7961e0e5dc6b3 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/pruning/dcff.py @@ -0,0 +1,153 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import math +from typing import Dict, List, Optional, Tuple, Union + +import torch +import torch.nn as nn +from mmengine import MMLogger +from mmengine.model import BaseModel +from mmengine.structures import BaseDataElement + +from mmrazor.models.mutators import DCFFChannelMutator +from mmrazor.registry import MODELS +from .ite_prune_algorithm import ItePruneAlgorithm, ItePruneConfigManager + +LossResults = Dict[str, torch.Tensor] +TensorResults = Union[Tuple[torch.Tensor], torch.Tensor] +PredictResults = List[BaseDataElement] +ForwardResults = Union[LossResults, TensorResults, PredictResults] + + +@MODELS.register_module() +class DCFF(ItePruneAlgorithm): + """DCFF Networks. + + Please refer to paper + [Dynamic-coded Filter Fusion](https://arxiv.org/abs/2107.06916). + + Args: + architecture (Union[BaseModel, Dict]): The model to be pruned. + mutator_cfg (Union[Dict, ChannelMutator], optional): The config + of a mutator. Defaults to dict( type='DCFFChannelMutator', + channel_unit_cfg=dict( type='DCFFChannelUnit')). + data_preprocessor (Optional[Union[Dict, nn.Module]], optional): + Defaults to None. + target_pruning_ratio (dict, optional): The prune-target. The template + of the prune-target can be get by calling + mutator.choice_template(). Defaults to {}. 
+ step_freq (int, optional): The step between two pruning operations. + Defaults to 1. Legal input includes [1, self._max_iters] + One and only one of (step_freq, prune_times) is set to legal int. + prune_times (int, optional): The total times to prune a model. + Defaults to 0. Legal input includes [1, self._max_iters] + One and only one of (step_freq, prune_times) is set to legal int. + init_cfg (Optional[Dict], optional): init config for architecture. + Defaults to None. + linear_schedule (bool, optional): flag to set linear ratio schedule. + Defaults to False due to dcff fixed pruning rate. + """ + + def __init__(self, + architecture: Union[BaseModel, Dict], + mutator_cfg: Union[Dict, DCFFChannelMutator] = dict( + type='DCFFChannelMutator', + channel_unit_cfg=dict(type='DCFFChannelUnit')), + data_preprocessor: Optional[Union[Dict, nn.Module]] = None, + target_pruning_ratio: Optional[Dict[str, float]] = None, + step_freq=1, + prune_times=0, + init_cfg: Optional[Dict] = None, + linear_schedule=False) -> None: + # invalid param prune_times, reset after message_hub get [max_epoch] + super().__init__(architecture, mutator_cfg, data_preprocessor, + target_pruning_ratio, step_freq, prune_times, + init_cfg, linear_schedule) + + def _calc_temperature(self, cur_num: int, max_num: int): + """Calculate temperature param.""" + # Set the fixed parameters required to calculate the temperature t + t_s, t_e, k = 1, 10000, 1 + + A = 2 * (t_e - t_s) * (1 + math.exp(-k * max_num)) / ( + 1 - math.exp(-k * max_num)) + T = A / (1 + math.exp(-k * cur_num)) + t_s - A / 2 + t = 1 / T + return t + + def _legal_freq_time(self, freq_time): + """check whether step_freq or prune_times belongs to legal range: + + [1, self._max_iters] + + Args: + freq_time (Int): step_freq or prune_times. + """ + return (freq_time > 0) and (freq_time < self._max_iters) + + def _init_prune_config_manager(self): + """init prune_config_manager and check step_freq & prune_times. 
+ + In DCFF, prune_times is set by step_freq and self._max_iters. + """ + if self.target_pruning_ratio is None: + target_pruning_ratio = self.mutator.current_choices + else: + target_pruning_ratio = self.set_target_pruning_ratio( + self.target_pruning_ratio, self.mutator.mutable_units) + + if self.by_epoch: + # step_freq based on iterations + self.step_freq *= self._iters_per_epoch + + if self._legal_freq_time(self.step_freq) ^ self._legal_freq_time( + self.prune_times): + if self._legal_freq_time(self.step_freq): + self.prune_times = self._max_iters // self.step_freq + else: + self.step_freq = self._max_iters // self.prune_times + else: + raise RuntimeError('One and only one of (step_freq, prune_times)' + 'can be set to legal int.') + + # config_manager move to forward. + # message_hub['max_epoch'] unaccessible when init + prune_config_manager = ItePruneConfigManager( + target_pruning_ratio, + self.mutator.current_choices, + self.step_freq, + prune_times=self.prune_times, + linear_schedule=self.linear_schedule) + + return prune_config_manager + + def forward(self, + inputs: torch.Tensor, + data_samples: Optional[List[BaseDataElement]] = None, + mode: str = 'tensor') -> ForwardResults: + """Forward.""" + + if self.training: + # In DCFF prune_message is related to total_num + # Set self.prune_config_manager after message_hub + # has['max_epoch/iter'] + if not hasattr(self, 'prune_config_manager'): + # iter num per epoch only available after initiation + self.prune_config_manager = self._init_prune_config_manager() + if self.prune_config_manager.is_prune_time(self._iter): + config = self.prune_config_manager.prune_at(self._iter) + self.mutator.set_choices(config) + + # calc fusion channel + temperature = self._calc_temperature(self._iter, + self._max_iters) + self.mutator.calc_information(temperature) + + logger = MMLogger.get_current_instance() + if (self.by_epoch): + logger.info( + f'The model is pruned at {self._epoch}th epoch once.') + else: + logger.info( + 
f'The model is pruned at {self._iter}th iter once.') + + return super().forward(inputs, data_samples, mode) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/pruning/dmcp.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/pruning/dmcp.py new file mode 100644 index 0000000000000000000000000000000000000000..043cf9acffba14b1f1cae8d4af401887fa7d4a49 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/pruning/dmcp.py @@ -0,0 +1,425 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import os +import random +from typing import Any, Dict, List, Optional, Tuple, Union + +import torch +from mmengine import MessageHub +from mmengine.model import BaseModel, MMDistributedDataParallel +from mmengine.optim import OptimWrapper +from mmengine.structures import BaseDataElement +from torch import nn + +from mmrazor.models.distillers import ConfigurableDistiller +from mmrazor.models.mutators import ChannelMutator, DMCPChannelMutator +from mmrazor.models.utils import add_prefix +from mmrazor.registry import MODEL_WRAPPERS, MODELS +from ...task_modules.estimators import ResourceEstimator +from ..base import BaseAlgorithm + +VALID_DISTILLER_TYPE = Union[ConfigurableDistiller, Dict, Any] + +LossResults = Dict[str, torch.Tensor] +TensorResults = Union[Tuple[torch.Tensor], torch.Tensor] +PredictResults = List[BaseDataElement] +ForwardResults = Union[LossResults, TensorResults, PredictResults] + + +@MODELS.register_module() +class DMCP(BaseAlgorithm): + """Implementation of `DMCP `_ + + Args: + architecture (dict|:obj:`BaseModel`): The config of :class:`BaseModel` + or built model. Corresponding to supernet in NAS algorithm. + distiller (VALID_DISTILLER_TYPE): Configs to build a distiller. + data_preprocessor (Optional[Union[dict, nn.Module]]): The pre-process + config of :class:`BaseDataPreprocessor`. Defaults to None. + strategy (list): mode of sampled net. + Defaults to ['max', 'min', 'arch_random']. 
+ arch_start_train (int): Number of iter to start arch training. + Defaults to 10000. + arch_train_freq (int): Frequency of training. + Defaults to 500. + distillation_times (int): Number of iter to start distillation. + Defaults to 20000. + target_flops (int): Target FLOPs. Default unit: MFLOPs. + Defaults to 150. + flops_loss_type (str): The loss type used to calculate flops_loss. + Defaults to `log_l1`. + flop_loss_weight (float): Weight of flops_loss. + Defaults to 1.0. + init_cfg (Optional[dict]): Init config for ``BaseModule``. + Defaults to None. + """ + + def __init__(self, + distiller: VALID_DISTILLER_TYPE, + architecture: Union[BaseModel, Dict], + mutator_cfg: Union[Dict, DMCPChannelMutator] = dict( + type='DMCPChannelMutator', + channel_unit_cfg=dict(type='DMCPChannelUnit')), + data_preprocessor: Optional[Union[Dict, nn.Module]] = None, + strategy: List = ['max', 'min', 'arch_random'], + init_cfg: Optional[Dict] = None, + arch_start_train=10000, + arch_train_freq=500, + distillation_times=20000, + target_flops=150, + flops_loss_type: str = 'log_l1', + flop_loss_weight: float = 1.0) -> None: + super().__init__(architecture, data_preprocessor, init_cfg) + + self.arch_start_train = arch_start_train + self.arch_train_freq = arch_train_freq + self.strategy = strategy + self.distillation_times = distillation_times + self.target_flops = target_flops + + self.flops_loss_type = flops_loss_type + self.flop_loss_weight = flop_loss_weight + self.cur_sample_prob = 1.0 + self.arch_train = False + + self.mutator: ChannelMutator = MODELS.build(mutator_cfg) + self.mutator.prepare_from_supernet(self.architecture) + + self.distiller = self._build_distiller(distiller) + self.distiller.prepare_from_teacher(self.architecture) + self.distiller.prepare_from_student(self.architecture) + + def _build_distiller( + self, distiller: VALID_DISTILLER_TYPE) -> ConfigurableDistiller: + """Build distiller.""" + if isinstance(distiller, dict): + distiller =
MODELS.build(distiller) + if not isinstance(distiller, ConfigurableDistiller): + raise TypeError('distiller should be a `dict` or ' + '`ConfigurableDistiller` instance, but got ' + f'{type(distiller)}') + + return distiller + + def set_subnet(self, mode, arch_train=None) -> None: + """Set subnet by 'max' 'min' 'random' 'direct' or 'expected.""" + assert mode in ('max', 'min', 'random', 'direct', 'expected') + if arch_train is None: + arch_train = self.arch_train + self.mutator.sample_subnet(mode, arch_train) + + def train_step(self, data: List[dict], + optim_wrapper: OptimWrapper) -> Dict[str, torch.Tensor]: + """The iteration step during training.""" + if not self.arch_train and \ + self._iter > self.arch_start_train: + self.arch_train = True + + def distill_step( + batch_inputs: torch.Tensor, data_samples: List[BaseDataElement] + ) -> Dict[str, torch.Tensor]: + subnet_losses = dict() + with optim_wrapper['architecture'].optim_context( + self), self.distiller.student_recorders: # type: ignore + hard_loss = self(batch_inputs, data_samples, mode='loss') + subnet_losses.update(hard_loss) + + if self._iter > self.distillation_times: + soft_loss = self.distiller.compute_distill_losses() + subnet_losses.update(soft_loss) + + parsed_subnet_losses, _ = self.parse_losses(subnet_losses) + optim_wrapper['architecture'].update_params( + parsed_subnet_losses) + + return subnet_losses + + batch_inputs, data_samples = self.data_preprocessor(data, + True).values() + + total_losses = dict() + # update model parameters + max_net_num = min_net_num = random_net_num = direct_net_num = 1 + for kind in self.strategy: + if kind in ('max'): + self.set_subnet(mode='max') + with optim_wrapper['architecture'].optim_context( + self + ), self.distiller.teacher_recorders: # type: ignore + max_subnet_losses = self( + batch_inputs, data_samples, mode='loss') + parsed_max_subnet_losses, _ = self.parse_losses( + max_subnet_losses) + optim_wrapper['architecture'].update_params( + 
parsed_max_subnet_losses)
+                total_losses.update(
+                    add_prefix(max_subnet_losses, f'max_subnet{max_net_num}'))
+                max_net_num += 1
+            # NOTE(review): `kind in ('min')` tests substring membership in
+            # the *string* 'min' (no trailing comma -> not a tuple). It works
+            # for the expected kind names but would also match e.g. 'mi'.
+            elif kind in ('min'):
+                self.set_subnet(mode='min')
+                min_subnet_losses =\
+                    distill_step(batch_inputs, data_samples)
+                total_losses.update(
+                    add_prefix(min_subnet_losses, f'min_subnet{min_net_num}'))
+                min_net_num += 1
+            elif kind in ('arch_random'):
+                if self.arch_train:
+                    self.set_subnet(mode='direct')
+                    direct_subnet_losses = distill_step(
+                        batch_inputs, data_samples)
+                    total_losses.update(
+                        add_prefix(direct_subnet_losses,
+                                   f'direct_subnet{direct_net_num}'))
+                    direct_net_num += 1
+                else:
+                    self.set_subnet(mode='random')
+                    random_subnet_losses = distill_step(
+                        batch_inputs, data_samples)
+                    total_losses.update(
+                        add_prefix(random_subnet_losses,
+                                   f'random_subnet{random_net_num}'))
+                    random_net_num += 1
+            elif kind in ('scheduled_random'):
+                if random.uniform(0, 1) > self.cur_sample_prob\
+                        and self.arch_train:
+                    self.set_subnet(mode='direct')
+                    direct_subnet_losses = distill_step(
+                        batch_inputs, data_samples)
+                    total_losses.update(
+                        add_prefix(direct_subnet_losses,
+                                   f'direct_subnet{direct_net_num}'))
+                    direct_net_num += 1
+                else:
+                    self.set_subnet(mode='random')
+                    random_subnet_losses = distill_step(
+                        batch_inputs, data_samples)
+                    total_losses.update(
+                        add_prefix(random_subnet_losses,
+                                   f'random_subnet{random_net_num}'))
+                    random_net_num += 1
+                # decay the probability of sampling a random subnet so that
+                # direct (architecture-driven) sampling dominates over time
+                self.cur_sample_prob *= 0.9999
+
+        # update arch parameters
+        if self.arch_train \
+                and self._iter % self.arch_train_freq == 0:
+            with optim_wrapper['mutator'].optim_context(self):
+                optim_wrapper['mutator'].zero_grad()
+                mutator_loss = self._update_arch_params(
+                    batch_inputs, data_samples, optim_wrapper, mode='loss')
+            total_losses.update(mutator_loss)
+        return total_losses
+
+    def _update_arch_params(self,
+                            inputs: torch.Tensor,
+                            data_samples: Optional[List[BaseDataElement]],
+                            optim_wrapper: OptimWrapper,
+                            mode: str = 'loss') -> Dict:
+        """Update the arch parameters in mutator.
+
+        Returns:
+            dict: It should contain 2 keys: ``arch_loss``, ``flops_loss``.
+                ``arch_loss`` is a tensor for back propagation, which can be a
+                weighted sum of multiple losses.
+                ``flops_loss`` contains all the variables to be sent to the
+                logger.
+        """
+        arch_params_loss = dict()
+        # switch to eval so the arch-parameter update is not perturbed by
+        # train-mode layers (e.g. BN statistics); restored at the end
+        self.eval()
+        # update arch_loss
+        self.set_subnet(mode='max', arch_train=True)
+        with optim_wrapper['mutator'].optim_context(self):
+            arch_loss = self(inputs, data_samples, mode=mode)
+            parsed_arch_loss, _ = self.parse_losses(arch_loss)
+            optim_wrapper['mutator'].update_params(parsed_arch_loss)
+        arch_params_loss.update(add_prefix(arch_loss, 'arch'))
+
+        # update flops_loss
+        self.set_subnet(mode='expected', arch_train=False)
+        expected_flops = self.calc_current_flops()
+        flops_loss = self._compute_flops_loss(expected_flops).to(
+            arch_loss['loss'].device)
+        parsed_flops_loss, _ = self.parse_losses({'loss': flops_loss})
+        optim_wrapper['mutator'].update_params(parsed_flops_loss)
+        arch_params_loss.update(add_prefix({'loss': flops_loss}, 'flops'))
+        self.train()
+        return arch_params_loss
+
+    def _compute_flops_loss(self, expected_flops):
+        """Calculation of loss functions of arch parameters.
+
+        Calculate the difference between the calculated FLOPs and the target
+        FLOPs(MFLOPs).
+
+        Args:
+            expected_flops (tensor|float): FLOPs calculated from the current
+                number of sampling channels
+        Returns:
+            tensor|float: A loss calculated from the input expected FLOPs and
+                the target FLOPs. And the type of this loss should be the same
+                as the expected FLOPs.
+        """
+        # target_flops is configured in MFLOPs; bring it to raw FLOPs
+        flops_error = expected_flops - self.target_flops * 1e6
+
+        if self.flops_loss_type == 'l2':
+            floss = torch.pow(flops_error, 2)
+        elif self.flops_loss_type == 'inverted_log_l1':
+            floss = -torch.log(1 / (flops_error + 1e-5))
+        elif self.flops_loss_type == 'log_l1':
+            if abs(flops_error) > 200:
+                ratio = 0.1
+            else:
+                ratio = 1.0
+            # piecewise log function
+            lower_flops = self.target_flops * 0.95
+            if expected_flops < lower_flops:
+                floss = torch.log(ratio * abs(flops_error))
+            elif (lower_flops <= expected_flops < self.target_flops):
+                # inside the tolerance band: zero loss (kept as a tensor)
+                floss = expected_flops * 0
+            else:
+                floss = (
+                    torch.log(ratio * abs(expected_flops - (lower_flops))))
+        elif self.flops_loss_type == 'l1':
+            floss = abs(flops_error)
+        else:
+            raise NotImplementedError
+        return floss * self.flop_loss_weight
+
+    def calc_current_flops(self):
+        """Calculate the FLOPs under the current sampled network."""
+        estimator = ResourceEstimator()
+        # unwrap a DDP-wrapped model if present ('module' attribute)
+        model = getattr(self, 'module', self)
+        estimation = estimator.estimate(
+            model=model.architecture.backbone,
+            flops_params_cfg=dict(units=None))
+        return estimation['flops']
+
+    def forward(self,
+                inputs: torch.Tensor,
+                data_samples: Optional[List[BaseDataElement]] = None,
+                mode: str = 'loss') -> ForwardResults:
+        """Forward."""
+        return BaseAlgorithm.forward(self, inputs, data_samples, mode)
+
+    @property
+    def _iter(self):
+        """Get current sum iteration number."""
+        message_hub = MessageHub.get_current_instance()
+        if 'iter' in message_hub.runtime_info:
+            return message_hub.runtime_info['iter']
+        else:
+            # NOTE(review): adjacent string literals concatenate without a
+            # separating space in the resulting message
+            raise RuntimeError('Use MessageHub before initiation.'
+                               'iter is inited in before_run_iter().')
+
+
+@MODEL_WRAPPERS.register_module()
+class DMCPDDP(MMDistributedDataParallel):
+    """DDP for DMCP and rewrite train_step of MMDDP."""
+
+    def __init__(self,
+                 *,
+                 device_ids: Optional[Union[List, int, torch.device]] = None,
+                 **kwargs) -> None:
+        # default to the device of the current process when launched with
+        # torch.distributed (LOCAL_RANK is set by the launcher)
+        if device_ids is None:
+            if os.environ.get('LOCAL_RANK') is not None:
+                device_ids = [int(os.environ['LOCAL_RANK'])]
+        super().__init__(device_ids=device_ids, **kwargs)
+
+    def train_step(self, data: List[dict],
+                   optim_wrapper: OptimWrapper) -> Dict[str, torch.Tensor]:
+        """The iteration step during training."""
+        # enable architecture training once the warm-up iterations are done
+        if not self.module.arch_train and \
+                self.module._iter > self.module.arch_start_train:
+            self.module.arch_train = True
+
+        def distill_step(
+            batch_inputs: torch.Tensor, data_samples: List[BaseDataElement]
+        ) -> Dict[str, torch.Tensor]:
+            # one optimization step of a sampled subnet: hard loss always,
+            # distillation (soft) loss only after `distillation_times` iters
+            subnet_losses = dict()
+            with optim_wrapper['architecture'].optim_context(
+                    self), self.module.distiller.student_recorders:
+                hard_loss = self(batch_inputs, data_samples, mode='loss')
+                subnet_losses.update(hard_loss)
+                if self.module._iter > self.module.distillation_times:
+                    soft_loss = \
+                        self.module.distiller.compute_distill_losses()
+                    subnet_losses.update(soft_loss)
+
+                parsed_subnet_losses, _ = \
+                    self.module.parse_losses(subnet_losses)
+                optim_wrapper['architecture'].update_params(
+                    parsed_subnet_losses)
+
+            return subnet_losses
+
+        batch_inputs, data_samples = self.module.data_preprocessor(
+            data, True).values()
+
+        total_losses = dict()
+        # update model parameters
+        max_net_num = min_net_num = random_net_num = direct_net_num = 1
+        for kind in self.module.strategy:
+            # NOTE(review): see DMCP.train_step — `kind in ('max')` is a
+            # substring test on the string 'max', not tuple membership
+            if kind in ('max'):
+                # the max subnet acts as teacher: record its activations
+                self.module.set_subnet(mode='max')
+                with optim_wrapper['architecture'].optim_context(
+                        self
+                ), self.module.distiller.teacher_recorders:  # type: ignore
+                    max_subnet_losses = self(
+                        batch_inputs, data_samples, mode='loss')
+                    parsed_max_subnet_losses, _ = self.module.parse_losses(
+                        max_subnet_losses)
+                    optim_wrapper['architecture'].update_params(
+                        parsed_max_subnet_losses)
+                total_losses.update(
+                    add_prefix(max_subnet_losses, f'max_subnet{max_net_num}'))
+                max_net_num += 1
+            elif kind in ('min'):
+                self.module.set_subnet(mode='min')
+                min_subnet_losses = distill_step(batch_inputs, data_samples)
+                total_losses.update(
+                    add_prefix(min_subnet_losses, f'min_subnet{min_net_num}'))
+                min_net_num += 1
+            elif kind in ('arch_random'):
+                if self.module.arch_train:
+                    self.module.set_subnet(mode='direct')
+                    direct_subnet_losses = distill_step(
+                        batch_inputs, data_samples)
+                    total_losses.update(
+                        add_prefix(direct_subnet_losses,
+                                   f'direct_subnet{direct_net_num}'))
+                    direct_net_num += 1
+                else:
+                    self.module.set_subnet(mode='random')
+                    random_subnet_losses = distill_step(
+                        batch_inputs, data_samples)
+                    total_losses.update(
+                        add_prefix(random_subnet_losses,
+                                   f'random_subnet{random_net_num}'))
+                    random_net_num += 1
+            elif kind in ('scheduled_random'):
+                if random.uniform(0, 1) > self.module.cur_sample_prob\
+                        and self.module.arch_train:
+                    self.module.set_subnet(mode='direct')
+                    direct_subnet_losses = distill_step(
+                        batch_inputs, data_samples)
+                    total_losses.update(
+                        add_prefix(direct_subnet_losses,
+                                   f'direct_subnet{direct_net_num}'))
+                    direct_net_num += 1
+                else:
+                    self.module.set_subnet(mode='random')
+                    random_subnet_losses = distill_step(
+                        batch_inputs, data_samples)
+                    total_losses.update(
+                        add_prefix(random_subnet_losses,
+                                   f'random_subnet{random_net_num}'))
+                    random_net_num += 1
+                self.module.cur_sample_prob *= 0.9999
+
+        # update arch parameters
+        if self.module.arch_train \
+                and self.module._iter % self.module.arch_train_freq == 0:
+            with optim_wrapper['mutator'].optim_context(self):
+                optim_wrapper['mutator'].zero_grad()
+                mutator_loss = self.module._update_arch_params(
+                    batch_inputs, data_samples, optim_wrapper, mode='loss')
+            total_losses.update(mutator_loss)
+        return total_losses
diff --git
a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/pruning/group_fisher_algoritho.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/pruning/group_fisher_algoritho.py
new file mode 100644
index 0000000000000000000000000000000000000000..eccbe122889a35596db487dbfdf0a07a54f4922e
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/pruning/group_fisher_algoritho.py
@@ -0,0 +1,7 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+"""This file includes the modules in the impl folder.
+
+As it only records impl modules, it is not initialized automatically.
+"""
+from mmrazor.implementations.pruning.group_fisher import \
+    GroupFisherAlgorithm  # noqa
diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/pruning/ite_prune_algorithm.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/pruning/ite_prune_algorithm.py
new file mode 100644
index 0000000000000000000000000000000000000000..937aaa156fca89a81b95cc16dbe88906f7e003ad
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/pruning/ite_prune_algorithm.py
@@ -0,0 +1,273 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from typing import Dict, List, Optional, Tuple, Union
+
+import torch
+import torch.nn as nn
+from mmengine import MessageHub, MMLogger
+from mmengine.model import BaseModel
+from mmengine.structures import BaseDataElement
+
+from mmrazor.models.mutables import MutableChannelUnit
+from mmrazor.models.mutators import ChannelMutator
+from mmrazor.registry import MODELS
+from ..base import BaseAlgorithm
+
+LossResults = Dict[str, torch.Tensor]
+TensorResults = Union[Tuple[torch.Tensor], torch.Tensor]
+PredictResults = List[BaseDataElement]
+ForwardResults = Union[LossResults, TensorResults, PredictResults]
+
+
+class ItePruneConfigManager:
+    """ItePruneConfigManager manages the config of the structure of the model
+    during pruning.
+
+    Args:
+        target (Dict[str, Union[int, float]]): The target structure to prune.
+        supernet (Dict[str, Union[int, float]]): The structure of the
+            supernet.
+        step_freq (int, optional): The prune step of epoch/iter to prune.
+            Defaults to 1.
+        prune_times (int, optional): The times to prune. Defaults to 1.
+        linear_schedule (bool, optional): flag to set linear ratio schedule.
+            Defaults to True.
+    """
+
+    def __init__(self,
+                 target: Dict[str, Union[int, float]],
+                 supernet: Dict[str, Union[int, float]],
+                 step_freq=1,
+                 prune_times=1,
+                 linear_schedule=True) -> None:
+
+        self.supernet = supernet
+        self.target = target
+        self.step_freq = step_freq
+        self.prune_times = prune_times
+        self.linear_schedule = linear_schedule
+
+        # NOTE(review): ``self.delta`` is computed here but never read by
+        # ``prune_at``/``is_prune_time`` in this file — confirm whether any
+        # external caller depends on it before removing.
+        self.delta: Dict = self._get_delta_each_iter(self.target,
+                                                     self.supernet,
+                                                     self.prune_times)
+
+    def is_prune_time(self, iteration):
+        """Is the time to prune during training process."""
+        return iteration % self.step_freq == 0 \
+            and iteration // self.step_freq < self.prune_times
+
+    def prune_at(self, iteration):
+        """Get the pruning structure in a time(iteration)."""
+        times = iteration // self.step_freq + 1
+        assert times <= self.prune_times
+        prune_current = {}
+        ratio = times / self.prune_times
+
+        for key in self.target:
+            if self.linear_schedule:
+                # TO DO: add scheduler for more pruning rate schedule
+                # linear interpolation from the supernet structure towards
+                # the target structure as `times` grows
+                prune_current[key] = (self.target[key] - self.supernet[key]
+                                      ) * ratio + self.supernet[key]
+            else:
+                prune_current[key] = self.target[key]
+            # integer channel counts must stay integers after interpolation
+            if isinstance(self.supernet[key], int):
+                prune_current[key] = int(prune_current[key])
+        return prune_current
+
+    def _get_delta_each_iter(self, target: Dict, supernet: Dict, times: int):
+        """Get the structure change for pruning once."""
+        delta = {}
+        for key in target:
+            one_target = target[key]
+            if isinstance(one_target, float):
+                delta[key] = (1.0 - one_target) / times
+            elif isinstance(one_target, int):
+                delta[key] = int((supernet[key] - one_target) / times)
+            else:
+                raise NotImplementedError()
+        return delta
+
+
+@MODELS.register_module()
+class ItePruneAlgorithm(BaseAlgorithm):
+    """ItePruneAlgorithm prunes a model iteratively until reaching a prune-
+    target.
+
+    Args:
+        architecture (Union[BaseModel, Dict]): The model to be pruned.
+        mutator_cfg (Union[Dict, ChannelMutator], optional): The config
+            of a mutator. Defaults to dict( type='ChannelMutator',
+            channel_unit_cfg=dict( type='SequentialMutableChannelUnit')).
+        data_preprocessor (Optional[Union[Dict, nn.Module]], optional):
+            Defaults to None.
+        target_pruning_ratio (dict, optional): The prune-target. The template
+            of the prune-target can be get by calling
+            mutator.choice_template(). Defaults to {}.
+        step_freq (int, optional): The step between two pruning operations.
+            Defaults to 1.
+        prune_times (int, optional): The total times to prune a model.
+            Defaults to 1.
+        init_cfg (Optional[Dict], optional): init config for architecture.
+            Defaults to None.
+        linear_schedule (bool, optional): flag to set linear ratio schedule.
+            Defaults to True.
+    """
+
+    def __init__(self,
+                 architecture: Union[BaseModel, Dict],
+                 mutator_cfg: Union[Dict, ChannelMutator] = dict(
+                     type='ChannelMutator',
+                     channel_unit_cfg=dict(
+                         type='SequentialMutableChannelUnit')),
+                 data_preprocessor: Optional[Union[Dict, nn.Module]] = None,
+                 target_pruning_ratio: Optional[Dict[str, float]] = None,
+                 step_freq=1,
+                 prune_times=1,
+                 init_cfg: Optional[Dict] = None,
+                 linear_schedule=True) -> None:
+
+        super().__init__(architecture, data_preprocessor, init_cfg)
+
+        # decided by EpochBasedRunner or IterBasedRunner
+        self.target_pruning_ratio = target_pruning_ratio
+        self.step_freq = step_freq
+        self.prune_times = prune_times
+        self.linear_schedule = linear_schedule
+
+        self.mutator: ChannelMutator = MODELS.build(mutator_cfg)
+        self.mutator.prepare_from_supernet(self.architecture)
+
+    def set_target_pruning_ratio(
+            self, target: Dict[str, float],
+            units: List[MutableChannelUnit]) -> Dict[str, float]:
+        """According to the target pruning ratio of each unit, set the target
+        ratio of each unit in units."""
+        target_pruning_ratio: Dict[str, float] = dict()
+        for unit in units:
+            assert isinstance(unit, MutableChannelUnit), (
+                f'unit should be `MutableChannelUnit`, but got {type(unit)}.')
+            unit_name = unit.name
+            # The config of target pruning ratio does not
+            # contain all units.
+            if unit_name not in target:
+                continue
+            unit_target = target[unit_name]
+            assert isinstance(unit_target, (float, int))
+            target_pruning_ratio[unit_name] = unit_target
+        return target_pruning_ratio
+
+    def check_prune_target(self, config: Dict):
+        """Check if the prune-target is supported."""
+        for value in config.values():
+            assert isinstance(value, int) or isinstance(value, float)
+
+    def _init_prune_config_manager(self):
+        """init prune_config_manager and check step_freq & prune_times.
+
+        message_hub['max_epoch/iter'] unaccessible when initiation.
+        """
+        if self.target_pruning_ratio is None:
+            target_pruning_ratio = self.mutator.current_choices
+        else:
+            target_pruning_ratio = self.set_target_pruning_ratio(
+                self.target_pruning_ratio, self.mutator.mutable_units)
+
+        if self.by_epoch:
+            # step_freq based on iterations
+            # NOTE(review): `_iters_per_epoch` is a true division, so
+            # `step_freq` becomes a float here; `%` and `//` in the config
+            # manager still work but `iteration % step_freq == 0` relies on
+            # exact float arithmetic — confirm intended.
+            self.step_freq *= self._iters_per_epoch
+
+        # config_manager move to forward.
+        # message_hub['max_epoch'] unaccessible when init
+        prune_config_manager = ItePruneConfigManager(
+            target_pruning_ratio,
+            self.mutator.current_choices,
+            self.step_freq,
+            prune_times=self.prune_times,
+            linear_schedule=self.linear_schedule)
+
+        return prune_config_manager
+
+    def forward(self,
+                inputs: torch.Tensor,
+                data_samples: Optional[List[BaseDataElement]] = None,
+                mode: str = 'tensor') -> ForwardResults:
+        """Forward."""
+
+        if self.training:
+            if not hasattr(self, 'prune_config_manager'):
+                # self._iters_per_epoch() only available after initiation
+                self.prune_config_manager = self._init_prune_config_manager()
+            if self.prune_config_manager.is_prune_time(self._iter):
+
+                config = self.prune_config_manager.prune_at(self._iter)
+
+                self.mutator.set_choices(config)
+
+                logger = MMLogger.get_current_instance()
+                if (self.by_epoch):
+                    logger.info(
+                        f'The model is pruned at {self._epoch}th epoch once.')
+                else:
+                    logger.info(
+                        f'The model is pruned at {self._iter}th iter once.')
+
+        return super().forward(inputs, data_samples, mode)
+
+    def init_weights(self):
+        return self.architecture.init_weights()
+
+    # private methods
+
+    @property
+    def by_epoch(self):
+        """Get epoch/iter based train loop."""
+        # IterBasedTrainLoop max_epochs default to 1
+        # TO DO: Add by_epoch params or change default max_epochs?
+        return self._max_epochs != 1
+
+    @property
+    def _epoch(self):
+        """Get current epoch number."""
+        message_hub = MessageHub.get_current_instance()
+        if 'epoch' in message_hub.runtime_info:
+            return message_hub.runtime_info['epoch']
+        else:
+            raise RuntimeError('Use MessageHub before initiation.'
+                               'epoch is inited in before_run_epoch().')
+
+    @property
+    def _iter(self):
+        """Get current sum iteration number."""
+        message_hub = MessageHub.get_current_instance()
+        if 'iter' in message_hub.runtime_info:
+            return message_hub.runtime_info['iter']
+        else:
+            raise RuntimeError('Use MessageHub before initiation.'
+                               'iter is inited in before_run_iter().')
+
+    @property
+    def _max_epochs(self):
+        """Get max epoch number.
+
+        Default 1 for IterTrainLoop
+        """
+        message_hub = MessageHub.get_current_instance()
+        if 'max_epochs' in message_hub.runtime_info:
+            return message_hub.runtime_info['max_epochs']
+        else:
+            raise RuntimeError('Use MessageHub before initiation.'
+                               'max_epochs is inited in before_run_epoch().')
+
+    @property
+    def _max_iters(self):
+        """Get max iteration number."""
+        message_hub = MessageHub.get_current_instance()
+        if 'max_iters' in message_hub.runtime_info:
+            return message_hub.runtime_info['max_iters']
+        else:
+            raise RuntimeError('Use MessageHub before initiation.'
+                               'max_iters is inited in before_run_iter().')
+
+    @property
+    def _iters_per_epoch(self):
+        """Get iter num per epoch."""
+        return self._max_iters / self._max_epochs
diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/pruning/slimmable_network.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/pruning/slimmable_network.py
new file mode 100644
index 0000000000000000000000000000000000000000..f57c223ee99965ace88ae895aeed002c7050e69a
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/pruning/slimmable_network.py
@@ -0,0 +1,219 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+import os
+from pathlib import Path
+from typing import Dict, List, Optional, Union
+
+import torch
+from mmengine.model import BaseModel, MMDistributedDataParallel
+from mmengine.optim import OptimWrapper
+from mmengine.structures import BaseDataElement
+from torch import nn
+
+from mmrazor.models.mutables import BaseMutable
+from mmrazor.models.mutators import SlimmableChannelMutator
+from mmrazor.models.utils import (add_prefix,
+                                  reinitialize_optim_wrapper_count_status)
+from mmrazor.registry import MODEL_WRAPPERS, MODELS
+from mmrazor.structures.subnet.fix_subnet import _dynamic_to_static
+from ..base import BaseAlgorithm
+
+VALID_MUTATOR_TYPE = Union[SlimmableChannelMutator, Dict]
+VALID_PATH_TYPE = Union[str, Path]
+VALID_CHANNEL_CFG_PATH_TYPE = Union[VALID_PATH_TYPE, List[VALID_PATH_TYPE]]
+
+
+@MODELS.register_module()
+class SlimmableNetwork(BaseAlgorithm):
+    """Slimmable Neural Networks.
+
+    Please refer to paper
+    [Slimmable Neural Networks](https://arxiv.org/abs/1812.08928) for details.
+
+    Args:
+        mutator (dict | :obj:`SlimmableChannelMutator`): The config of
+            :class:`SlimmableChannelMutator` or built mutator.
+            About the config of mutator, please refer to
+            SlimmableChannelMutator
+        architecture (dict | :obj:`BaseModel`): The config of
+            :class:`BaseModel` or built model.
+        deploy_index (int): index of subnet to be deployed.
+        data_preprocessor (dict | :obj:`torch.nn.Module` | None): The
+            pre-process config of :class:`BaseDataPreprocessor`.
+            Defaults to None.
+        init_cfg (dict | None): The weight initialized config for
+            :class:`BaseModule`. Default to None.
+    """
+
+    def __init__(self,
+                 architecture: Union[BaseModel, Dict],
+                 mutator: VALID_MUTATOR_TYPE = None,
+                 deploy_index=-1,
+                 data_preprocessor: Optional[Union[Dict, nn.Module]] = None,
+                 init_cfg: Optional[Dict] = None) -> None:
+        super().__init__(architecture, data_preprocessor, init_cfg)
+
+        if isinstance(mutator, dict):
+            self.mutator = MODELS.build(mutator)
+        else:
+            self.mutator = mutator
+        self.mutator.prepare_from_supernet(self.architecture)
+        self.num_subnet = len(self.mutator.subnets)
+
+        # must after `prepare_from_supernet`
+        if deploy_index != -1:
+            self._deploy(deploy_index)
+        else:
+            self.is_deployed = False
+
+        # HACK
+        # reinitialize count status of `OptimWrapper` since
+        # `optim_wrapper.update_params` will be called multiple times
+        # in our slimmable train step.
+        self._optim_wrapper_count_status_reinitialized = False
+
+    def train_step(self, data: List[dict],
+                   optim_wrapper: OptimWrapper) -> Dict[str, torch.Tensor]:
+        """Train step."""
+        input_data = self.data_preprocessor(data, True)
+        batch_inputs = input_data['inputs']
+        data_samples = input_data['data_samples']
+        train_kwargs = dict(
+            batch_inputs=batch_inputs,
+            data_samples=data_samples,
+            optim_wrapper=optim_wrapper)
+        # a deployed (fixed) network trains a single subnet; otherwise all
+        # subnets are trained in turn within one step
+        if self.is_deployed:
+            return self._fixed_train_step(**train_kwargs)
+        else:
+            return self._slimmable_train_step(**train_kwargs)
+
+    def _slimmable_train_step(
+        self,
+        batch_inputs: torch.Tensor,
+        data_samples: List[BaseDataElement],
+        optim_wrapper: OptimWrapper,
+    ) -> Dict[str, torch.Tensor]:
+        """Train step of Slimmable Network."""
+        if not self._optim_wrapper_count_status_reinitialized:
+            reinitialize_optim_wrapper_count_status(
+                model=self,
+                optim_wrapper=optim_wrapper,
+                accumulative_counts=self.num_subnet)
+            self._optim_wrapper_count_status_reinitialized = True
+        total_losses = dict()
+
+        for subnet_idx, subnet in enumerate(self.mutator.subnets):
+            self.mutator.set_choices(subnet)
+            with optim_wrapper.optim_context(self):
+                losses = self(batch_inputs, data_samples, mode='loss')
+                parsed_losses, _ = self.parse_losses(losses)
+                optim_wrapper.update_params(parsed_losses)
+
+            total_losses.update(add_prefix(losses, f'subnet_{subnet_idx}'))
+
+        return total_losses
+
+    def _fixed_train_step(
+        self,
+        batch_inputs: torch.Tensor,
+        data_samples: List[BaseDataElement],
+        optim_wrapper: OptimWrapper,
+    ) -> Dict[str, torch.Tensor]:
+        """Train step of fixed network."""
+        with optim_wrapper.optim_context(self):
+            losses = self(batch_inputs, data_samples, mode='loss')
+            parsed_losses, _ = self.parse_losses(losses)
+            optim_wrapper.update_params(parsed_losses)
+
+        return losses
+
+    def _fix_archtecture(self):
+        # freeze every still-mutable module so the subnet choice is final
+        for module in self.architecture.modules():
+            if isinstance(module, BaseMutable):
+                if not module.is_fixed:
+                    module.fix_chosen(None)
+
+    def _deploy(self, index: int):
+        self.mutator.set_choices(self.mutator.subnets[index])
+        self.mutator.fix_channel_mutables()
+        self._fix_archtecture()
+        # replace dynamic ops with their static counterparts for deployment
+        _dynamic_to_static(self.architecture)
+        self.is_deployed = True
+
+
+@MODEL_WRAPPERS.register_module()
+class SlimmableNetworkDDP(MMDistributedDataParallel):
+    """DDP wrapper for Slimmable Neural Network."""
+
+    def __init__(self,
+                 *,
+                 device_ids: Optional[Union[List, int, torch.device]] = None,
+                 **kwargs) -> None:
+        # default to the device of the current process when launched with
+        # torch.distributed (LOCAL_RANK is set by the launcher)
+        if device_ids is None:
+            if os.environ.get('LOCAL_RANK') is not None:
+                device_ids = [int(os.environ['LOCAL_RANK'])]
+        super().__init__(device_ids=device_ids, **kwargs)
+
+    def train_step(self, data: List[dict],
+                   optim_wrapper: OptimWrapper) -> Dict[str, torch.Tensor]:
+        """Train step."""
+        input_data = self.module.data_preprocessor(data, True)
+        batch_inputs = input_data['inputs']
+        data_samples = input_data['data_samples']
+        train_kwargs = dict(
+            batch_inputs=batch_inputs,
+            data_samples=data_samples,
+            optim_wrapper=optim_wrapper)
+        if self.module.is_deployed:
+            return self._fixed_train_step(**train_kwargs)
+        else:
+            return self._slimmable_train_step(**train_kwargs)
+
+    def _slimmable_train_step(
+        self,
+        batch_inputs: torch.Tensor,
+        data_samples: List[BaseDataElement],
+        optim_wrapper: OptimWrapper,
+    ) -> Dict[str, torch.Tensor]:
+        """Train step of Slimmable Network."""
+        if not self._optim_wrapper_count_status_reinitialized:
+            reinitialize_optim_wrapper_count_status(
+                model=self,
+                optim_wrapper=optim_wrapper,
+                accumulative_counts=self.module.num_subnet)
+            self._optim_wrapper_count_status_reinitialized = True
+        total_losses = dict()
+
+        for subnet_idx, subnet in enumerate(self.module.mutator.subnets):
+            self.module.mutator.set_choices(subnet)
+            with optim_wrapper.optim_context(self):
+                losses = self(batch_inputs, data_samples, mode='loss')
+                parsed_losses, _ = self.module.parse_losses(losses)
+                optim_wrapper.update_params(parsed_losses)
+
+            total_losses.update(add_prefix(losses, f'subnet_{subnet_idx}'))
+
+        return total_losses
+
+    def _fixed_train_step(
+        self,
+        batch_inputs: torch.Tensor,
+        data_samples: List[BaseDataElement],
+        optim_wrapper: OptimWrapper,
+    ) -> Dict[str, torch.Tensor]:
+        """Train step of fixed network."""
+        with optim_wrapper.optim_context(self):
+            losses = self(batch_inputs, data_samples, mode='loss')
+            parsed_losses, _ = self.module.parse_losses(losses)
+            optim_wrapper.update_params(parsed_losses)
+
+        return losses
+
+    # delegate the reinit flag to the wrapped module so both the wrapper and
+    # the inner SlimmableNetwork observe the same state
+    @property
+    def _optim_wrapper_count_status_reinitialized(self) -> bool:
+        return self.module._optim_wrapper_count_status_reinitialized
+
+    @_optim_wrapper_count_status_reinitialized.setter
+    def _optim_wrapper_count_status_reinitialized(self, val: bool) -> None:
+        assert isinstance(val, bool)
+
+        self.module._optim_wrapper_count_status_reinitialized = val
diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/quantization/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/quantization/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..03a9538e230955e415acdcd29a07ff17b18776b4
--- /dev/null
+++ 
b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/quantization/__init__.py
@@ -0,0 +1,4 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from .mm_architecture import MMArchitectureQuant, MMArchitectureQuantDDP
+
+__all__ = ['MMArchitectureQuant', 'MMArchitectureQuantDDP']
diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/quantization/mm_architecture.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/quantization/mm_architecture.py
new file mode 100644
index 0000000000000000000000000000000000000000..ce6d926d0852f38c067850e9de3a7249d4637b59
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/algorithms/quantization/mm_architecture.py
@@ -0,0 +1,427 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+import copy
+import os
+from typing import Any, Dict, List, Optional, Tuple, Union
+
+import torch
+from mmengine.config import Config
+from mmengine.model import MMDistributedDataParallel
+from mmengine.runner import load_checkpoint
+from mmengine.structures import BaseDataElement
+from torch import nn
+
+from mmrazor.models.utils import pop_rewriter_function_record
+from mmrazor.registry import MODEL_WRAPPERS, MODELS
+from mmrazor.structures.quantization import QConfigHandler
+from ..base import BaseAlgorithm, BaseModel
+
+# torch.ao.quantization observers/fake-quantizers require torch>=1.13; on
+# older torch the names are bound to placeholders that raise when used.
+try:
+    from torch.ao.quantization import (FakeQuantizeBase, MinMaxObserver,
+                                       PerChannelMinMaxObserver,
+                                       disable_observer)
+except ImportError:
+    from mmrazor.utils import get_placeholder
+
+    FakeQuantizeBase = get_placeholder('torch>=1.13')
+    MinMaxObserver = get_placeholder('torch>=1.13')
+    PerChannelMinMaxObserver = get_placeholder('torch>=1.13')
+    disable_observer = get_placeholder('torch>=1.13')
+
+LossResults = Dict[str, torch.Tensor]
+TensorResults = Union[Tuple[torch.Tensor], torch.Tensor]
+PredictResults = List[BaseDataElement]
+ForwardResults = Union[LossResults, TensorResults, PredictResults]
+
+
+@MODELS.register_module()
+class MMArchitectureQuant(BaseAlgorithm):
+    
"""General quantization for OpenMMLab's models. + + Args: + architecture (Union[Dict, BaseModel]): The config of model to be + quantized. + quantizer (Union[Dict, BaseModel]): The quantizer to support different + backend type. + deploy_cfg (Union[str, Dict]): Deployment config file or Config object. + qmodel_modes (List): The available mode of runner. + data_preprocessor (Optional[Dict]): The pre-process + config of :class:`BaseDataPreprocessor`. Defaults to None. + forward_modes (Tuple): The modes in forward method in OpenMMLab + architecture could be tensor, predict, or loss. It can generate + different graph of quantized model. + float_checkpoint (Optional[str]): The path of pretrained FP checkpoint. + Quantization is different from or task, we recommend to use + `float_checkpoint` as pretrain model. Defaults to None. + init_cfg (Optional[Dict]): The weight initialized config for: + class:`BaseModule`. + + Note: + forward_modes (Tuple): In OpenMMLab architecture, differenet modes + will trace a different graph of quantized model. + """ + + def __init__(self, + architecture: Union[Dict, BaseModel], + quantizer: Union[Dict, BaseModel], + deploy_cfg: Optional[Union[str, Dict]] = None, + data_preprocessor: Optional[Dict] = None, + forward_modes: Tuple = ('tensor', 'predict', 'loss'), + float_checkpoint: Optional[str] = None, + input_shapes: Tuple = (1, 3, 224, 224), + init_cfg: Optional[Dict] = None): + + super().__init__(architecture, data_preprocessor, init_cfg) + + self.quantizer = MODELS.build(quantizer) + self.input_shapes = input_shapes + self.forward_modes = forward_modes + if isinstance(deploy_cfg, str): + deploy_cfg = Config.fromfile(deploy_cfg) + self.deploy_cfg = deploy_cfg + + # Replace syncbn and _BatchNormXd (in mmengine) with batchnorm2d + self.quantizer.convert_batchnorm2d(self.architecture) + + # If we have a float_checkpoint, we load it as pretrain. 
+ if float_checkpoint: + _ = load_checkpoint(self.architecture, float_checkpoint) + self.architecture._is_init = True + + self.qmodels = self._build_qmodels(self.architecture) + self.sync_qparams('tensor') + self.reset_observer_and_fakequant_statistics(self) + + def reset_observer_and_fakequant_statistics(self, model): + """Reset the statistics in observers and fake quantizers. + + The forward computation in `_build_qmodels` can modify the original + statistics in observers and fake quantizers. + """ + for module in model.modules(): + if isinstance(module, (MinMaxObserver, PerChannelMinMaxObserver)): + module.reset_min_max_vals() + elif isinstance(module, FakeQuantizeBase): + module.scale.data = torch.ones_like(module.scale) + module.zero_point.data = torch.zeros_like(module.zero_point) + + def sync_qparams(self, src_mode: str): + """Sync all quantize parameters in different `forward_modes`. We could + have more than one forward mode to generate graphs, each mode will + generate one graph. But in training, only one graph will be update, so + we need to sync qparams in the other graphs. + + Args: + src_mode (str): The modes of forward method. + + Note: + `traverse()` method recursively traverses all modules to sync + quantized graph generated from different `forward_modes`. + This is because We have different mode ('tensor', 'predict', + 'loss') in OpenMMLab architecture which have different graph + in some subtle ways, so we need to sync them here. 
+ """ + + def traverse(module, prefix): + for name, child in module._modules.items(): + if module is None: + continue + child_name = f'{prefix}{name}' + if isinstance(child, FakeQuantizeBase): + for name, param in child.named_parameters(): + param_name = f'{child_name}.{name}' + src_param = src_state_dict[param_name] + if src_param.shape == param.shape: + param.data.copy_(src_param) + else: + requirs_grad = param.requires_grad + param.requires_grad = False + param.resize_(src_param.shape) + param.requires_grad = requirs_grad + param.data.copy_(src_param) + for name, buffer in child.named_buffers(): + buffer_name = f'{child_name}.{name}' + src_buffer = src_state_dict[buffer_name] + if src_buffer.shape == buffer.shape: + buffer.data.copy_(src_buffer) + else: + buffer.resize_(src_buffer.shape) + buffer.data.copy_(src_buffer) + else: + traverse(child, f'{child_name}.') + + src_state_dict = self.qmodels[src_mode].state_dict() + for mode in self.forward_modes: + if mode == src_mode: + continue + traverse(self.qmodels[mode], '') + + def _get_rewriter_context_in_mmdeploy(self, deploy_cfg): + """Get rewriter context in mmdeploy according to the deploy related + config.""" + from mmdeploy.apis.onnx.passes import optimize_onnx + from mmdeploy.codebase import import_codebase + from mmdeploy.core import RewriterContext + from mmdeploy.utils import (IR, Backend, get_backend, get_codebase, + get_dynamic_axes, get_ir_config, + get_onnx_config) + from mmdeploy.utils.config_utils import get_codebase_external_module + + codebase = get_codebase(deploy_cfg) + custom_module_list = get_codebase_external_module(deploy_cfg) + import_codebase(codebase, custom_module_list) + + def _add_or_update(cfg: dict, key: str, val: Any): + if key in cfg and isinstance(cfg[key], dict) and isinstance( + val, dict): + cfg[key].update(val) + else: + cfg[key] = val + + context_info = dict() + deploy_cfg = copy.deepcopy(deploy_cfg) + + backend = get_backend(deploy_cfg).value + + onnx_cfg = 
get_onnx_config(deploy_cfg) + opset_version = onnx_cfg.get('opset_version', 11) + + input_names = onnx_cfg['input_names'] + output_names = onnx_cfg['output_names'] + axis_names = input_names + output_names + dynamic_axes = get_dynamic_axes(deploy_cfg, axis_names) + + verbose = not onnx_cfg.get('strip_doc_string', True) or onnx_cfg.get( + 'verbose', False) + keep_initializers_as_inputs = onnx_cfg.get( + 'keep_initializers_as_inputs', True) + optimize = onnx_cfg.get('optimize', False) + if backend == Backend.NCNN.value: + """NCNN backend needs a precise blob counts, while using onnx + optimizer will merge duplicate initilizers without reference + count.""" + optimize = False + + ir_config = dict( + type='onnx', + input_names=input_names, + output_names=output_names, + opset_version=opset_version, + dynamic_axes=dynamic_axes, + verbose=verbose, + keep_initializers_as_inputs=keep_initializers_as_inputs) + + _add_or_update(deploy_cfg, 'ir_config', ir_config) + ir = IR.get(get_ir_config(deploy_cfg)['type']) + if isinstance(backend, Backend): + backend = backend.value + backend_config = dict(type=backend) + _add_or_update(deploy_cfg, 'backend_config', backend_config) + + context_info['cfg'] = deploy_cfg + context_info['ir'] = ir + if 'backend' not in context_info: + context_info['backend'] = backend + if 'opset' not in context_info: + context_info['opset'] = opset_version + + if 'onnx_custom_passes' not in context_info: + onnx_custom_passes = optimize_onnx if optimize else None + context_info['onnx_custom_passes'] = onnx_custom_passes + + return RewriterContext(**context_info) + + def _pop_function_record_in_rewriter_context(self, rewriter_context): + """Delete user-specific rewriters from + `RewriterContext._rewriter_manager`. We use the model which is + rewritten by mmdeploy to build quantized models. However not all the + functions rewritten by mmdeploy need to be rewritten in mmrazor. 
For + example, mmdeploy rewrite + `mmcls.models.classifiers.ImageClassifier.forward` and + `mmcls.models.classifiers.BaseClassifier.forward` for deployment. But + they can't be rewritten by mmrazor as ptq and qat are done in mmrazor. + So to ensure ptq and qat proceed normally, we have to remove these + record from `RewriterContext._rewriter_manager`. + + Args: + rewriter_context (RewriterContext): The RewriterContext used in + mmdeploy. + """ + skipped_methods = getattr(self.quantizer.tracer, 'skipped_methods', []) + function_record_to_pop = self.deploy_cfg.get('function_record_to_pop', + []) + function_record_to_pop.extend(skipped_methods) + return pop_rewriter_function_record(rewriter_context, + function_record_to_pop) + + def _build_qmodels(self, model: BaseModel): + """Build quantized models from the given model. + + Args: + model (BaseModel): the given fp model. + + Example: + The main body of the graph is all the same, but the last one or two + op will have difference, as shown below. 
+ + self.qmodels['tensor'].graph.print_tabular() + opcode target args + call_module head.fc (activation_post_process_38,) + output output (head_fc,) + + self.qmodels['loss'].graph.print_tabular() + opcode target args + call_method _get_loss (head, head_fc, data_samples) + output output (_get_loss,) + + self.qmodels['predict'].graph.print_tabular() + opcode target args + call_method _get_predictions (head, head_fc, data_samples) + output output (_get_predictions,) + """ + + rewriter_context = self._get_rewriter_context_in_mmdeploy( + self.deploy_cfg) if self.deploy_cfg is not None else None + + if rewriter_context is not None: + # Pop function records in `quantizer.tracer.skipped_method` + # temporarily + function_record_backup = \ + self._pop_function_record_in_rewriter_context(rewriter_context) + + qmodels = nn.ModuleDict() + for mode in self.forward_modes: + concrete_args = {'mode': mode} + + if rewriter_context is not None: + with rewriter_context: + observed_module = self.quantizer.prepare( + model, concrete_args) + else: + observed_module = self.quantizer.prepare(model, concrete_args) + + qmodels[mode] = observed_module + + if rewriter_context is not None: + # Add these popped function records back. + rewriter_context._rewriter_manager.function_rewriter. \ + _registry._rewrite_records.update(function_record_backup) + + # data_samples can not be None in detectors during prediction. + # But we need to make the dummy prediction in _build_qmodels. + # It is more convenient to use `tensor` mode. + is_training = qmodels['tensor'].training + # Avoid random input changing bn's statistics + qmodels['tensor'].eval() + # Originally, the steps to train a qat model is as follows: + # 1. build qmodels 2. convert the model to ddpmodel 3. forward backward + # The shape of `scale` and `zero_point` can be modified during forward. + # We initialize these parameters with per-tensor mode by default for + # convenience. 
Their shape will be modified during forward if + # per-channel mode is used. It's hacky. Hence we need to input a + # dummy input to make sure the shape has been modified. + device = next(qmodels.parameters()).device + dummy_input = torch.randn(self.input_shapes).to(device) + qmodels['tensor'](dummy_input, None, 'tensor') + qmodels['tensor'].train(mode=is_training) + + return qmodels + + def forward(self, + inputs: torch.Tensor, + data_samples: Optional[List[BaseDataElement]] = None, + mode: str = 'tensor') -> ForwardResults: + """Forward with qmodels in quantization.""" + + if mode in self.qmodels: + qmodel = self.qmodels[mode] + return qmodel(inputs, data_samples, mode) + else: + return self.architecture(inputs, data_samples, mode) + + def calibrate_step(self, data: Union[Dict, Tuple, List]): + """PTQ method need calibrate by cali data.""" + + data = self.data_preprocessor(data, False) + return self._run_forward(data, mode='predict') + + def get_deploy_model(self): + """Prepare for deploy to the backend with mmdeploy, which will be used + in mmdeploy, and usually includes as follows: + + 1. prepare for the float model rewritten by mmdeploy. + 2. load checkpoint consists of float weight and quantized params in + mmrazor. + 3. post process weight fakequant for exporting .onnx that meet + the backend's requirement. + """ + device = next(self.parameters()).device + quantized_state_dict = self.qmodels['predict'].state_dict() + fp32_model = self.architecture + self.quantizer.convert_batchnorm2d(fp32_model) + observed_model = self.quantizer.prepare(fp32_model) + observed_model.load_state_dict(quantized_state_dict) + + self.quantizer.post_process_for_deploy( + observed_model, + device=device, + keep_w_fake_quant=True, + update_weight_with_fakequant=True) + + # replace various activation fakequant with base fakequant, which + # contributes to deploy our model to various backends. 
+ for node in observed_model.graph.nodes: + if 'activation_post_process_' in node.name: + module_name = node.target + module = getattr(observed_model, module_name) + fakequant_new = QConfigHandler.replace_fakequant( + module, + self.quantizer.qconfig.a_qscheme, + update_qparams=True) + setattr(observed_model, module_name, fakequant_new) + + observed_model.apply(disable_observer) + + return observed_model + + +@MODEL_WRAPPERS.register_module() +class MMArchitectureQuantDDP(MMDistributedDataParallel): + """DDPwapper for MMArchitectureQuant. + + Args: + device_ids (Optional[Union[List, int, torch.device]]): devices to run + ddp. + """ + + def __init__(self, + *, + device_ids: Optional[Union[List, int, torch.device]] = None, + **kwargs) -> None: + + if device_ids is None: + if os.environ.get('LOCAL_RANK') is not None: + device_ids = [int(os.environ['LOCAL_RANK'])] + super().__init__(device_ids=device_ids, **kwargs) + # After moving all model parameters and buffers to the GPU + # (`model.cuda()`), the buffers in model are different. + self.module.qmodels = self.module._build_qmodels( + self.module.architecture) + self.module.sync_qparams('tensor') + self.module.reset_observer_and_fakequant_statistics(self) + + def calibrate_step(self, data: Union[Dict, Tuple, List]): + """PTQ method need calibrate by cali data.""" + + return self.module.calibrate_step(data) + + def sync_qparams(self, src_mode: str): + """Same as in 'MMArchitectureQuant'. Sync all quantize parameters in + different `forward_modes`. We could have several modes to generate + graphs, but in training, only one graph will be update, so we need to + sync qparams on the other graphs. + + Args: + src_mode (str): The src modes of forward method. 
+ """ + + self.module.sync_qparams(src_mode) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..1db10e9063fc0257f5f0edad8b121a437182a627 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/__init__.py @@ -0,0 +1,10 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .backbones import * # noqa: F401,F403 +from .classifiers import * # noqa: F401,F403 +from .connectors import * # noqa: F401,F403 +from .dynamic_ops import * # noqa: F401,F403 +from .generators import * # noqa: F401,F403 +from .heads import * # noqa: F401,F403 +from .necks import * # noqa: F401,F403 +from .ops import * # noqa: F401,F403 +from .utils import * # noqa: F401,F403 diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/backbones/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/backbones/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..99313ee7e71a00b7ab744a3b89c225bb469befd3 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/backbones/__init__.py @@ -0,0 +1,12 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from .darts_backbone import DartsBackbone +from .searchable_autoformer import AutoformerBackbone +from .searchable_mobilenet_v2 import SearchableMobileNetV2 +from .searchable_mobilenet_v3 import AttentiveMobileNetV3 +from .searchable_shufflenet_v2 import SearchableShuffleNetV2 +from .wideresnet import WideResNet + +__all__ = [ + 'DartsBackbone', 'AutoformerBackbone', 'SearchableMobileNetV2', + 'AttentiveMobileNetV3', 'SearchableShuffleNetV2', 'WideResNet' +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/backbones/darts_backbone.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/backbones/darts_backbone.py new file mode 100644 index 0000000000000000000000000000000000000000..a2c4f068997018d02a0ba73f5a843cffe4fd77ce --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/backbones/darts_backbone.py @@ -0,0 +1,376 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy +from typing import Dict, List, Optional, Tuple, Union + +import torch +import torch.nn as nn +from mmcv.cnn import build_activation_layer, build_norm_layer +from torch import Tensor + +from mmrazor.registry import MODELS + + +class FactorizedReduce(nn.Module): + """Reduce feature map size by factorized pointwise (stride=2). + + Args: + in_channels (int): number of channels of input tensor. + out_channels (int): number of channels of output tensor. + act_cfg (Dict): config to build activation layer. + norm_cfg (Dict): config to build normalization layer. 
+ """ + + def __init__( + self, + in_channels: int, + out_channels: int, + act_cfg: Dict = dict(type='ReLU'), + norm_cfg: Dict = dict(type='BN') + ) -> None: + super().__init__() + self.in_channels = in_channels + self.out_channels = out_channels + self.act_cfg = act_cfg + self.norm_cfg = norm_cfg + self.relu = build_activation_layer(self.act_cfg) + self.conv1 = nn.Conv2d( + self.in_channels, + self.out_channels // 2, + 1, + stride=2, + padding=0, + bias=False) + self.conv2 = nn.Conv2d( + self.in_channels, + self.out_channels // 2, + 1, + stride=2, + padding=0, + bias=False) + self.bn = build_norm_layer(self.norm_cfg, self.out_channels)[1] + + def forward(self, x: Tensor) -> Tensor: + """Forward with factorized reduce.""" + x = self.relu(x) + out = torch.cat([self.conv1(x), self.conv2(x[:, :, 1:, 1:])], dim=1) + out = self.bn(out) + return out + + +class StandardConv(nn.Module): + """Standard Convolution in Darts. Basic structure is ReLU-Conv-BN. + + Args: + in_channels (int): number of channels of input tensor. + out_channels (int): number of channels of output tensor. + kernel_size (Union[int, Tuple]): size of the convolving kernel. + stride (Union[int, Tuple]): controls the stride for the + cross-correlation, a single number or a one-element tuple. + Default to 1. + padding (Union[str, int, Tuple]): Padding added to both sides + of the input. Default to 0. + act_cfg (Dict): config to build activation layer. + norm_cfg (Dict): config to build normalization layer. 
+ """ + + def __init__( + self, + in_channels: int, + out_channels: int, + kernel_size: Union[int, Tuple], + stride: Union[int, Tuple] = 1, + padding: Union[str, int, Tuple] = 0, + act_cfg: Dict = dict(type='ReLU'), + norm_cfg: Dict = dict(type='BN') + ) -> None: + super().__init__() + self.in_channels = in_channels + self.out_channels = out_channels + self.kernel_size = kernel_size + self.stride = stride + self.padding = padding + self.act_cfg = act_cfg + self.norm_cfg = norm_cfg + self.net = nn.Sequential( + build_activation_layer(self.act_cfg), + nn.Conv2d( + self.in_channels, + self.out_channels, + self.kernel_size, + self.stride, + self.padding, + bias=False), + build_norm_layer(self.norm_cfg, self.out_channels)[1]) + + def forward(self, x: Tensor) -> Tensor: + """Forward the standard convolution.""" + return self.net(x) + + +class Node(nn.Module): + """Node structure of DARTS. + + Args: + node_id (str): key of the node. + num_prev_nodes (int): number of previous nodes. + channels (int): number of channels of current node. + num_downsample_nodes (int): index of downsample node. + mutable_cfg (Dict): config of `DiffMutableModule`. + route_cfg (Dict): config of `DiffChoiceRoute`. 
+ """ + + def __init__(self, node_id: str, num_prev_nodes: int, channels: int, + num_downsample_nodes: int, mutable_cfg: Dict, + route_cfg: Dict) -> None: + super().__init__() + edges = nn.ModuleDict() + for i in range(num_prev_nodes): + stride = 2 if i < num_downsample_nodes else 1 + edge_id = f'{node_id}_p{i}' + + module_kwargs = dict( + in_channels=channels, + out_channels=channels, + stride=stride, + ) + + mutable_cfg.update(module_kwargs=module_kwargs) + mutable_cfg.update(alias=edge_id) + edges.add_module(edge_id, MODELS.build(mutable_cfg)) + + route_cfg.update(alias=node_id) + route_cfg.update(edges=edges) + self.route = MODELS.build(route_cfg) + + def forward(self, prev_nodes: Union[List[Tensor], + Tuple[Tensor]]) -> Tensor: + """Forward with the previous nodes list.""" + return self.route(prev_nodes) + + +class Cell(nn.Module): + """Darts cell structure. + + Args: + num_nodes (int): number of nodes. + channels (int): number of channels of current cell. + prev_channels (int): number of channel of previous input. + prev_prev_channels (int): number of channel of previous previous input. + reduction (bool): whether to reduce the feature map size. + prev_reduction (bool): whether to reduce the previous feature map size. + mutable_cfg (Optional[Dict]): config of `DiffMutableModule`. + route_cfg (Optional[Dict]): config of `DiffChoiceRoute`. + act_cfg (Dict): config to build activation layer. + Defaults to dict(type='ReLU'). + norm_cfg (Dict): config to build normalization layer. + Defaults to dict(type='BN'). 
+ """ + + def __init__( + self, + num_nodes: int, + channels: int, + prev_channels: int, + prev_prev_channels: int, + reduction: bool, + prev_reduction: bool, + mutable_cfg: Dict, + route_cfg: Dict, + act_cfg: Dict = dict(type='ReLU'), + norm_cfg: Dict = dict(type='BN'), + ) -> None: + + super().__init__() + self.act_cfg = act_cfg + self.norm_cfg = norm_cfg + self.reduction = reduction + self.num_nodes = num_nodes + + # If previous cell is reduction cell, current input size does not match + # with output size of cell[k-2]. So the output[k-2] should be reduced + # by preprocessing. + if prev_reduction: + self.preproc0 = FactorizedReduce(prev_prev_channels, channels, + self.act_cfg, self.norm_cfg) + else: + self.preproc0 = StandardConv(prev_prev_channels, channels, 1, 1, 0, + self.act_cfg, self.norm_cfg) + self.preproc1 = StandardConv(prev_channels, channels, 1, 1, 0, + self.act_cfg, self.norm_cfg) + + # generate dag + self.nodes = nn.ModuleList() + for depth in range(2, self.num_nodes + 2): + if reduction: + node_id = f'reduce_n{depth}' + num_downsample_nodes = 2 + else: + node_id = f'normal_n{depth}' + num_downsample_nodes = 0 + self.nodes.append( + Node(node_id, depth, channels, num_downsample_nodes, + mutable_cfg, route_cfg)) + + def forward(self, s0: Tensor, s1: Tensor) -> Tensor: + """Forward with the outputs of previous previous cell and previous + cell.""" + tensors = [self.preproc0(s0), self.preproc1(s1)] + for node in self.nodes: + cur_tensor = node(tensors) + tensors.append(cur_tensor) + + return torch.cat(tensors[2:], dim=1) + + +class AuxiliaryModule(nn.Module): + """Auxiliary head in 2/3 place of network to let the gradient flow well. + + Args: + in_channels (int): number of channels of inputs. + base_channels (int): number of middle channels of the auxiliary module. + out_channels (int): number of channels of outputs. + norm_cfg (Dict): config to build normalization layer. + Defaults to dict(type='BN'). 
+ """ + + def __init__(self, + in_channels: int, + base_channels: int, + out_channels: int, + norm_cfg: Dict = dict(type='BN')) -> None: + super().__init__() + self.norm_cfg = norm_cfg + self.net = nn.Sequential( + nn.ReLU(), + nn.AvgPool2d(5, stride=2, padding=0, + count_include_pad=False), # 2x2 out + nn.Conv2d(in_channels, base_channels, kernel_size=1, bias=False), + build_norm_layer(self.norm_cfg, base_channels)[1], + nn.ReLU(inplace=True), + nn.Conv2d(base_channels, out_channels, kernel_size=2, + bias=False), # 1x1 out + build_norm_layer(self.norm_cfg, out_channels)[1], + nn.ReLU(inplace=True)) + + def forward(self, x: Tensor) -> Tensor: + """Forward the auxiliary module.""" + return self.net(x) + + +@MODELS.register_module() +class DartsBackbone(nn.Module): + """Backbone of Differentiable Architecture Search (DARTS). + + Args: + in_channels (int): number of channels of input tensor. + base_channels (int): number of middle channels. + mutable_cfg (Optional[Dict]): config of `DiffMutableModule`. + route_cfg (Optional[Dict]): config of `DiffChoiceRoute`. + num_layers (Optional[int]): number of layers. + Defaults to 8. + num_nodes (Optional[int]): number of nodes. + Defaults to 4. + stem_multiplier (Optional[int]): multiplier for stem. + Defaults to 3. + out_indices (tuple, optional): output indices for auxliary module. + Defaults to (7, ). + auxliary (bool, optional): whether use auxliary module. + Defaults to False. + aux_channels (Optional[int]): number of middle channels of + auxliary module. Defaults to None. + aux_out_channels (Optional[int]): number of output channels of + auxliary module. Defaults to None. + act_cfg (Dict): config to build activation layer. + Defaults to dict(type='ReLU'). + norm_cfg (Dict): config to build normalization layer. + Defaults to dict(type='BN'). 
+ """ + + def __init__( + self, + in_channels: int, + base_channels: int, + mutable_cfg: Dict, + route_cfg: Dict, + num_layers: int = 8, + num_nodes: int = 4, + stem_multiplier: int = 3, + out_indices: Union[Tuple, List] = (7, ), + auxliary: bool = False, + aux_channels: Optional[int] = None, + aux_out_channels: Optional[int] = None, + act_cfg: Dict = dict(type='ReLU'), + norm_cfg: Dict = dict(type='BN'), + ) -> None: + super().__init__() + + self.in_channels = in_channels + self.base_channels = base_channels + self.num_layers = num_layers + self.num_nodes = num_nodes + self.stem_multiplier = stem_multiplier + self.out_indices = out_indices + assert self.out_indices[-1] == self.num_layers - 1 + if auxliary: + assert aux_channels is not None + assert aux_out_channels is not None + self.aux_channels = aux_channels + self.aux_out_channels = aux_out_channels + self.auxliary_indice = 2 * self.num_layers // 3 + + else: + self.auxliary_indice = -1 + self.norm_cfg = norm_cfg + self.act_cfg = act_cfg + self.out_channels = self.stem_multiplier * self.base_channels + stem_norm_cfg = copy.deepcopy(self.norm_cfg) + stem_norm_cfg.update(dict(affine=True)) + self.stem = nn.Sequential( + nn.Conv2d( + self.in_channels, self.out_channels, 3, 1, 1, bias=False), + build_norm_layer(self.norm_cfg, self.out_channels)[1]) + + # for the first cell, stem is used for both s0 and s1 + # prev_prev_channels and prev_channels is output channel size, + # but c_cur is input channel size. + prev_prev_channels = self.out_channels + prev_channels = self.out_channels + self.out_channels = self.base_channels + + self.cells = nn.ModuleList() + prev_reduction, reduction = False, False + for i in range(self.num_layers): + prev_reduction, reduction = reduction, False + # Reduce featuremap size and double channels in 1/3 + # and 2/3 layer. 
+ if i in [self.num_layers // 3, 2 * self.num_layers // 3]: + self.out_channels *= 2 + reduction = True + + cell = Cell(self.num_nodes, self.out_channels, prev_channels, + prev_prev_channels, reduction, prev_reduction, + mutable_cfg, route_cfg, self.act_cfg, self.norm_cfg) + self.cells.append(cell) + + prev_prev_channels = prev_channels + prev_channels = self.out_channels * self.num_nodes + + if i == self.auxliary_indice: + self.auxliary_module = AuxiliaryModule(prev_channels, + self.aux_channels, + self.aux_out_channels, + self.norm_cfg) + + def forward(self, x: Tensor) -> Tensor: + """Forward the darts backbone.""" + outs = [] + s0 = s1 = self.stem(x) + for i, cell in enumerate(self.cells): + s0, s1 = s1, cell(s0, s1) + if i in self.out_indices: + outs.append(s1) + if i == self.auxliary_indice and self.training: + aux_feature = self.auxliary_module(s1) + outs.insert(0, aux_feature) + + return tuple(outs) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/backbones/searchable_autoformer.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/backbones/searchable_autoformer.py new file mode 100644 index 0000000000000000000000000000000000000000..28bd85dc814241dc236d02e6535560208464c337 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/backbones/searchable_autoformer.py @@ -0,0 +1,375 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from typing import Dict, List + +import numpy as np +import torch +import torch.nn as nn +from mmcv.cnn import build_activation_layer, build_norm_layer + +from mmrazor.models.architectures.dynamic_ops.bricks import ( + DynamicLinear, DynamicMultiheadAttention, DynamicPatchEmbed, + DynamicSequential) +from mmrazor.models.mutables import (BaseMutable, BaseMutableChannel, + MutableChannelContainer, + OneShotMutableChannel, + OneShotMutableValue) +from mmrazor.models.mutables.mutable_channel import OneShotMutableChannelUnit +from mmrazor.registry import MODELS + +try: + from mmcls.models.backbones.base_backbone import BaseBackbone +except ImportError: + from mmrazor.utils import get_placeholder + BaseBackbone = get_placeholder('mmcls') + + +class TransformerEncoderLayer(BaseBackbone): + """Autoformer block. + + Args: + embed_dims (int): Number of input channels. + num_heads (int): Number of attention heads. + mlp_ratio (List): Ratio of ffn. + attn_drop_rate (float): Dropout rate of the dropout layer after the + attention calculation of query and key. Defaults to 0. + proj_drop_rate (float): Dropout rate of the dropout layer after the + output projection. Defaults to 0. + out_drop_rate (dict): Dropout rate of the dropout layer before adding + the shortcut. Defaults to 0. + qkv_bias (bool, optional): Whether to keep bias of qkv. + Defaults to True. + act_cfg (Dict, optional): The config for acitvation function. + Defaults to dict(type='GELU'). + norm_cfg (Dict, optional): The config for normalization. + Defaults to dict(type='mmrazor.DynamicLayerNorm'). + init_cfg (Dict, optional): The config for initialization. + Defaults to None. 
+ """ + + def __init__(self, + embed_dims: int, + num_heads: int, + mlp_ratio: float, + proj_drop_rate: float = 0., + attn_drop_rate: float = 0., + out_drop_rate: float = 0., + qkv_bias: bool = True, + act_cfg: Dict = dict(type='GELU'), + norm_cfg: Dict = dict(type='mmrazor.DynamicLayerNorm'), + init_cfg: Dict = None) -> None: + super().__init__(init_cfg) + + self.norm1_name, norm1 = build_norm_layer( + norm_cfg, embed_dims, postfix=1) + self.add_module(self.norm1_name, norm1) + + self.attn = DynamicMultiheadAttention( + embed_dims=embed_dims, + num_heads=num_heads, + attn_drop_rate=attn_drop_rate, + proj_drop_rate=proj_drop_rate, + out_drop_rate=out_drop_rate, + qkv_bias=qkv_bias) + + self.norm2_name, norm2 = build_norm_layer( + norm_cfg, embed_dims, postfix=2) + self.add_module(self.norm2_name, norm2) + + middle_channels = int(embed_dims * mlp_ratio) + self.fc1 = DynamicLinear(embed_dims, middle_channels) + self.fc2 = DynamicLinear(middle_channels, embed_dims) + self.act = build_activation_layer(act_cfg) + + @property + def norm1(self): + """The first normalization.""" + return getattr(self, self.norm1_name) + + @property + def norm2(self): + """The second normalization.""" + return getattr(self, self.norm2_name) + + def register_mutables(self, mutable_num_heads: BaseMutable, + mutable_mlp_ratios: BaseMutable, + mutable_q_embed_dims: BaseMutable, + mutable_head_dims: BaseMutable, + mutable_embed_dims: BaseMutable): + """Mutate the mutables of encoder layer.""" + # record the mutables + self.mutable_num_heads = mutable_num_heads + self.mutable_mlp_ratios = mutable_mlp_ratios + self.mutable_q_embed_dims = mutable_q_embed_dims + self.mutable_embed_dims = mutable_embed_dims + self.mutable_head_dims = mutable_head_dims + # handle the mutable of FFN + self.middle_channels = mutable_mlp_ratios * mutable_embed_dims + + self.attn.register_mutable_attr('num_heads', mutable_num_heads) + + # handle the mutable of the first dynamic LN + 
MutableChannelContainer.register_mutable_channel_to_module( + self.norm1, self.mutable_embed_dims, True) + # handle the mutable of the second dynamic LN + MutableChannelContainer.register_mutable_channel_to_module( + self.norm2, self.mutable_embed_dims, True) + + # handle the mutable of attn + MutableChannelContainer.register_mutable_channel_to_module( + self.attn, self.mutable_embed_dims, False) + MutableChannelContainer.register_mutable_channel_to_module( + self.attn, + self.mutable_q_embed_dims, + True, + end=self.mutable_q_embed_dims.current_choice) + MutableChannelContainer.register_mutable_channel_to_module( + self.attn.rel_pos_embed_k, self.mutable_head_dims, False) + MutableChannelContainer.register_mutable_channel_to_module( + self.attn.rel_pos_embed_v, self.mutable_head_dims, False) + + # handle the mutable of fc + MutableChannelContainer.register_mutable_channel_to_module( + self.fc1, mutable_embed_dims, False) + MutableChannelContainer.register_mutable_channel_to_module( + self.fc1, + self.middle_channels, + True, + start=0, + end=self.middle_channels.current_choice) + MutableChannelContainer.register_mutable_channel_to_module( + self.fc2, + self.middle_channels, + False, + start=0, + end=self.middle_channels.current_choice) + MutableChannelContainer.register_mutable_channel_to_module( + self.fc2, mutable_embed_dims, True) + + def forward(self, x: torch.Tensor) -> torch.Tensor: + """Forward of Transformer Encode Layer.""" + residual = x + x = self.norm1(x) + x = self.attn(x) + x = residual + x + residual = x + x = self.norm2(x) + x = self.fc1(x) + x = self.act(x) + x = self.fc2(x) + return residual + x + + +@MODELS.register_module() +class AutoformerBackbone(BaseBackbone): + """Autoformer backbone. + + A PyTorch implementation of Autoformer introduced by: + `AutoFormer: Searching Transformers for Visual Recognition + `_ + + Modified from the `official repo + `. + + Args: + arch_setting (Dict[str, List]): Architecture settings. 
+ img_size (int, optional): The image size of input. + Defaults to 224. + patch_size (int, optional): The patch size of autoformer. + Defaults to 16. + in_channels (int, optional): The input channel dimension. + Defaults to 3. + drop_rate (float): Probability of an element to be zeroed. + Defaults to 0. + drop_path_rate (float): stochastic depth rate. Defaults to 0. + qkv_bias (bool, optional): Whether to keep bias of qkv. + Defaults to True. + norm_cfg (Dict, optional): The config of normalization. + Defaults to dict(type='mmrazor.DynamicLayerNorm'). + act_cfg (Dict, optional): The config of activation functions. + Defaults to dict(type='GELU'). + use_final_norm (bool, optional): Whether use final normalization. + Defaults to True. + init_cfg (Dict, optional): The config for initialization. + Defaults to None. + + Excamples: + >>> arch_setting = dict( + ... mlp_ratios=[3.0, 3.5, 4.0], + ... num_heads=[8, 9, 10], + ... depth=[14, 15, 16], + ... embed_dims=[528, 576, 624] + ... ) + >>> model = AutoformerBackbone(arch_setting=arch_setting) + """ + + def __init__(self, + arch_setting: Dict[str, List], + img_size: int = 224, + patch_size: int = 16, + in_channels: int = 3, + drop_rate: float = 0., + drop_path_rate: float = 0., + qkv_bias: bool = True, + norm_cfg: Dict = dict(type='mmrazor.DynamicLayerNorm'), + act_cfg: Dict = dict(type='GELU'), + use_final_norm: bool = True, + init_cfg: Dict = None) -> None: + + super().__init__(init_cfg) + + self.arch_setting = arch_setting + self.img_size = img_size + self.patch_size = patch_size + self.qkv_bias = qkv_bias + self.in_channels = in_channels + self.drop_rate = drop_rate + self.use_final_norm = use_final_norm + self.act_cfg = act_cfg + + # adapt mutable settings + self.mlp_ratio_range: List = self.arch_setting['mlp_ratios'] + self.num_head_range: List = self.arch_setting['num_heads'] + self.depth_range: List = self.arch_setting['depth'] + self.embed_dim_range: List = self.arch_setting['embed_dims'] + + # mutable variables 
of autoformer + self.mutable_depth = OneShotMutableValue( + value_list=self.depth_range, default_value=self.depth_range[-1]) + + self.mutable_embed_dims = OneShotMutableChannel( + num_channels=self.embed_dim_range[-1], + candidate_choices=self.embed_dim_range) + + # handle the mutable in multihead attention + self.base_embed_dims = OneShotMutableChannel( + num_channels=64, candidate_choices=[64]) + + self.mutable_num_heads = [ + OneShotMutableValue( + value_list=self.num_head_range, + default_value=self.num_head_range[-1]) + for _ in range(self.depth_range[-1]) + ] + self.mutable_mlp_ratios = [ + OneShotMutableValue( + value_list=self.mlp_ratio_range, + default_value=self.mlp_ratio_range[-1]) + for _ in range(self.depth_range[-1]) + ] + + self.mutable_q_embed_dims = [ + i * self.base_embed_dims for i in self.mutable_num_heads + ] + + # patch embeddings + self.patch_embed = DynamicPatchEmbed( + img_size=self.img_size, + in_channels=self.in_channels, + embed_dims=self.mutable_embed_dims.num_channels) + + # num of patches + self.patch_resolution = [ + img_size // patch_size, img_size // patch_size + ] + num_patches = self.patch_resolution[0] * self.patch_resolution[1] + + # cls token and pos embed + self.pos_embed = nn.Parameter( + torch.zeros(1, num_patches + 1, + self.mutable_embed_dims.num_channels)) + + self.cls_token = nn.Parameter( + torch.zeros(1, 1, self.mutable_embed_dims.num_channels)) + + self.drop_after_pos = nn.Dropout(p=drop_rate) + + # stochastic depth decay rule + self.dpr = np.linspace(0, drop_path_rate, + self.mutable_depth.max_choice) + + # main body + self.blocks = self._make_layer( + embed_dims=self.mutable_embed_dims.num_channels, + depth=self.mutable_depth.max_choice) + + # final norm + if self.use_final_norm: + self.norm1_name, norm1 = build_norm_layer( + norm_cfg, self.mutable_embed_dims.num_channels) + self.add_module(self.norm1_name, norm1) + + self.last_mutable = self.mutable_embed_dims + + self.register_mutables() + + @property + def 
norm1(self): + """The first normalization.""" + return getattr(self, self.norm1_name) + + def _make_layer(self, embed_dims, depth): + """Build multiple TransformerEncoderLayers.""" + layers = [] + for i in range(depth): + layer = TransformerEncoderLayer( + embed_dims=embed_dims, + num_heads=self.mutable_num_heads[i].max_choice, + mlp_ratio=self.mutable_mlp_ratios[i].max_choice, + proj_drop_rate=self.drop_rate, + out_drop_rate=self.dpr[i], + qkv_bias=self.qkv_bias, + act_cfg=self.act_cfg) + layers.append(layer) + return DynamicSequential(*layers) + + def register_mutables(self): + """Mutate the autoformer.""" + OneShotMutableChannelUnit._register_channel_container( + self, MutableChannelContainer) + + # handle the mutation of depth + self.blocks.register_mutable_attr('depth', self.mutable_depth) + + # handle the mutation of patch embed + MutableChannelContainer.register_mutable_channel_to_module( + self.patch_embed, self.mutable_embed_dims, True) + + # handle the dependencies of TransformerEncoderLayers + for i in range(self.mutable_depth.max_choice): # max depth here + layer = self.blocks[i] + layer.register_mutables( + mutable_num_heads=self.mutable_num_heads[i], + mutable_mlp_ratios=self.mutable_mlp_ratios[i], + mutable_q_embed_dims=self.mutable_q_embed_dims[i], + mutable_head_dims=self.base_embed_dims, + mutable_embed_dims=self.last_mutable) + + # handle the mutable of final norm + if self.use_final_norm: + MutableChannelContainer.register_mutable_channel_to_module( + self.norm1, self.last_mutable, True) + + def forward(self, x: torch.Tensor): + """Forward of Autoformer.""" + B = x.shape[0] + x = self.patch_embed(x) + + embed_dims = int(self.mutable_embed_dims.current_choice) if isinstance( + self.mutable_embed_dims, + BaseMutableChannel) else self.embed_dim_range[-1] + + # cls token + cls_tokens = self.cls_token[..., :embed_dims].expand(B, -1, -1) + x = torch.cat((cls_tokens, x), dim=1) + + # pos embed + x = x + self.pos_embed[..., :embed_dims] + x = 
self.drop_after_pos(x) + + # dynamic depth + x = self.blocks(x) + + if self.use_final_norm: + x = self.norm1(x) + + return (torch.mean(x[:, 1:], dim=1), ) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/backbones/searchable_mobilenet_v2.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/backbones/searchable_mobilenet_v2.py new file mode 100644 index 0000000000000000000000000000000000000000..640dde29ad00fa41e7d8427e8398fc49e335323f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/backbones/searchable_mobilenet_v2.py @@ -0,0 +1,229 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy +from typing import Dict, List, Optional, Sequence, Tuple, Union + +from mmcv.cnn import ConvModule +from mmengine.model import Sequential +from torch import Tensor +from torch.nn.modules.batchnorm import _BatchNorm + +from mmrazor.registry import MODELS + +try: + from mmcls.models.backbones.base_backbone import BaseBackbone + from mmcls.models.utils import make_divisible +except ImportError: + from mmrazor.utils import get_placeholder + BaseBackbone = get_placeholder('mmcls') + make_divisible = get_placeholder('mmcls') + + +@MODELS.register_module() +class SearchableMobileNetV2(BaseBackbone): + """Searchable MobileNetV2 backbone. + + Args: + arch_setting (list[list]): Architecture settings. + first_channels (int): Channel width of first ConvModule. Default: 32. + last_channels (int): Channel width of last ConvModule. Default: 1200. + widen_factor (float): Width multiplier, multiply number of + channels in each layer by this amount. Default: 1.0. + out_indices (Sequence[int]): Output from which stages. + Default: (7, ). + frozen_stages (int): Stages to be frozen (all param fixed). + Default: -1, which means not freezing any parameters. + conv_cfg (dict, optional): Config dict for convolution layer. + Default: None, which means using conv2d. + norm_cfg (dict): Config dict for normalization layer. 
+ Default: dict(type='BN'). + act_cfg (dict): Config dict for activation layer. + Default: dict(type='ReLU6'). + norm_eval (bool): Whether to set norm layers to eval mode, namely, + freeze running stats (mean and var). Note: Effect on Batch Norm + and its variants only. Default: False. + with_cp (bool): Use checkpoint or not. Using checkpoint will save some + memory while slowing down the training speed. Default: False. + init_cfg (dict | list[dict], optional): initialization configuration + dict to define initializer. OpenMMLab has implemented + 6 initializers, including ``Constant``, ``Xavier``, ``Normal``, + ``Uniform``, ``Kaiming``, and ``Pretrained``. + + Excamples: + >>> mutable_cfg = dict( + ... type='OneShotMutableOP', + ... candidates=dict( + ... mb_k3e1=dict( + ... type='MBBlock', + ... kernel_size=3, + ... expand_ratio=1, + ... norm_cfg=dict(type='BN'), + ... act_cfg=dict(type='ReLU6')))) + >>> arch_setting = [ + ... # Parameters to build layers. 4 parameters are needed to + ... # construct a layer, from left to right: + ... # channel, num_blocks, stride, mutable cfg. + ... [16, 1, 1, mutable_cfg], + ... [24, 2, 2, mutable_cfg], + ... [32, 3, 2, mutable_cfg], + ... [64, 4, 2, mutable_cfg], + ... [96, 3, 1, mutable_cfg], + ... [160, 3, 2, mutable_cfg], + ... [320, 1, 1, mutable_cfg] + ... 
] + >>> model = SearchableMobileNetV2(arch_setting=arch_setting) + """ + + def __init__( + self, + arch_setting: List[List], + first_channels: int = 32, + last_channels: int = 1280, + widen_factor: float = 1., + out_indices: Sequence[int] = (7, ), + frozen_stages: int = -1, + conv_cfg: Optional[Dict] = None, + norm_cfg: Dict = dict(type='BN'), + act_cfg: Dict = dict(type='ReLU6'), + norm_eval: bool = False, + with_cp: bool = False, + init_cfg: Optional[Union[Dict, List[Dict]]] = [ + dict(type='Kaiming', layer=['Conv2d']), + dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm']) + ] + ) -> None: + for index in out_indices: + if index not in range(8): + raise ValueError('the item in out_indices must in ' + f'range(0, 8). But received {index}') + + if frozen_stages not in range(-1, 8): + raise ValueError('frozen_stages must be in range(-1, 8). ' + f'But received {frozen_stages}') + + super().__init__(init_cfg) + + self.arch_setting = arch_setting + self.widen_factor = widen_factor + self.out_indices = out_indices + self.frozen_stages = frozen_stages + self.conv_cfg = conv_cfg + self.norm_cfg = norm_cfg + self.act_cfg = act_cfg + self.norm_eval = norm_eval + self.with_cp = with_cp + + self.in_channels = make_divisible(first_channels * widen_factor, 8) + + self.conv1 = ConvModule( + in_channels=3, + out_channels=self.in_channels, + kernel_size=3, + stride=2, + padding=1, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + act_cfg=self.act_cfg) + + self.layers = [] + + for i, layer_cfg in enumerate(arch_setting): + channel, num_blocks, stride, mutable_cfg = layer_cfg + out_channels = make_divisible(channel * widen_factor, 8) + inverted_res_layer = self._make_layer( + out_channels=out_channels, + num_blocks=num_blocks, + stride=stride, + mutable_cfg=copy.deepcopy(mutable_cfg)) + layer_name = f'layer{i + 1}' + self.add_module(layer_name, inverted_res_layer) + self.layers.append(layer_name) + + if widen_factor > 1.0: + self.out_channel = int(last_channels * 
widen_factor) + else: + self.out_channel = last_channels + + layer = ConvModule( + in_channels=self.in_channels, + out_channels=self.out_channel, + kernel_size=1, + stride=1, + padding=0, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + act_cfg=self.act_cfg) + + self.add_module('conv2', layer) + self.layers.append('conv2') + + def _make_layer(self, out_channels: int, num_blocks: int, stride: int, + mutable_cfg: Dict) -> Sequential: + """Stack mutable blocks to build a layer for SearchableMobileNetV2. + + Note: + Here we use ``module_kwargs`` to pass dynamic parameters such as + ``in_channels``, ``out_channels`` and ``stride`` + to build the mutable. + + Args: + out_channels (int): out_channels of block. + num_blocks (int): number of blocks. + stride (int): stride of the first block. + mutable_cfg (dict): Config of mutable. + + Returns: + mmengine.model.Sequential: The layer made. + """ + layers = [] + for i in range(num_blocks): + if i >= 1: + stride = 1 + + mutable_cfg.update( + module_kwargs=dict( + in_channels=self.in_channels, + out_channels=out_channels, + stride=stride)) + layers.append(MODELS.build(mutable_cfg)) + + self.in_channels = out_channels + + return Sequential(*layers) + + def forward(self, x: Tensor) -> Tuple[Tensor, ...]: + """Forward computation. + + Args: + x (tensor): x contains input data for forward computation. 
+ """ + x = self.conv1(x) + + outs = [] + for i, layer_name in enumerate(self.layers): + layer = getattr(self, layer_name) + x = layer(x) + if i in self.out_indices: + outs.append(x) + + return tuple(outs) + + def _freeze_stages(self) -> None: + """Freeze params not to update in the specified stages.""" + if self.frozen_stages >= 0: + for param in self.conv1.parameters(): + param.requires_grad = False + for i in range(1, self.frozen_stages + 1): + layer = getattr(self, f'layer{i}') + layer.eval() + for param in layer.parameters(): + param.requires_grad = False + + def train(self, mode: bool = True) -> None: + """Set module status before forward computation.""" + super().train(mode) + + self._freeze_stages() + if mode and self.norm_eval: + for m in self.modules(): + if isinstance(m, _BatchNorm): + m.eval() diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/backbones/searchable_mobilenet_v3.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/backbones/searchable_mobilenet_v3.py new file mode 100644 index 0000000000000000000000000000000000000000..b5fe373d7598d183723603fba9994aba3d40a63d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/backbones/searchable_mobilenet_v3.py @@ -0,0 +1,366 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from collections import OrderedDict +from typing import Dict, List, Optional, Sequence, Union + +import torch.nn as nn +from mmcv.cnn import ConvModule +from mmengine.logging import MMLogger +from mmengine.model import Sequential, constant_init +from torch.nn.modules.batchnorm import _BatchNorm + +from mmrazor.models.architectures.dynamic_ops.bricks import DynamicSequential +from mmrazor.models.architectures.ops.mobilenet_series import MBBlock +from mmrazor.models.architectures.utils.mutable_register import ( + mutate_conv_module, mutate_mobilenet_layer) +from mmrazor.models.mutables import (MutableChannelContainer, + OneShotMutableChannel, + OneShotMutableChannelUnit, + OneShotMutableValue) +from mmrazor.models.utils.parse_values import parse_values +from mmrazor.registry import MODELS + +try: + from mmcls.models.backbones.base_backbone import BaseBackbone + from mmcls.models.utils import make_divisible +except ImportError: + from mmrazor.utils import get_placeholder + BaseBackbone = get_placeholder('mmcls') + make_divisible = get_placeholder('mmcls') + +logger = MMLogger.get_current_instance() + + +@MODELS.register_module() +class AttentiveMobileNetV3(BaseBackbone): + """Searchable MobileNetV3 backbone. + + Args: + arch_setting (Dict[str, List]): Architecture settings. + widen_factor (float): Width multiplier, multiply number of + channels in each layer by this amount. Defaults to 1.0. + out_indices (Sequence[int]): Output from which stages. + Defaults to (7, ). + frozen_stages (int): Stages to be frozen (all param fixed). + Defaults to -1, which means not freezing any parameters. + conv_cfg (dict, optional): Config dict for convolution layer. + Defaults to None, which means using conv2d. + norm_cfg (dict): Config dict for normalization layer. + Defaults to dict(type='BN'). + act_cfg_list (List): Config dict for activation layer. + Defaults to None. + stride_list (list): stride setting in each stage. + Defaults to None. 
+ with_se_list (list): Whether to use se-layer in each stage. + Defaults to None. + norm_eval (bool): Whether to set norm layers to eval mode, namely, + freeze running stats (mean and var). Note: Effect on Batch Norm + and its variants only. Defaults to False. + with_cp (bool): Use checkpoint or not. Using checkpoint will save some + memory while slowing down the training speed. Defaults to False. + zero_init_residual (bool): Zero norm param in linear conv of MBBlock + or not when there is a shortcut. Defaults to True. + fine_grained_mode (bool): Whether to use fine-grained mode (search + kernel size & expand ratio for each MB block in each layers). + Defaults to False. + with_attentive_shortcut (bool): Use shortcut in AttentiveNAS or not. + Defaults to True. + init_cfg (dict | list[dict], optional): initialization configuration + dict to define initializer. OpenMMLab has implemented + 6 initializers, including ``Constant``, ``Xavier``, ``Normal``, + ``Uniform``, ``Kaiming``, and ``Pretrained``. + """ + + def __init__(self, + arch_setting: Dict[str, List], + widen_factor: float = 1., + out_indices: Sequence[int] = (7, ), + frozen_stages: int = -1, + conv_cfg: Dict = dict(type='BigNasConv2d'), + norm_cfg: Dict = dict(type='DynamicBatchNorm2d'), + act_cfg_list: List = None, + stride_list: List = None, + with_se_list: List = None, + norm_eval: bool = False, + with_cp: bool = False, + zero_init_residual: bool = True, + fine_grained_mode: bool = False, + with_attentive_shortcut: bool = True, + init_cfg: Optional[Union[Dict, List[Dict]]] = None): + + super().__init__(init_cfg) + + self.arch_setting = arch_setting + self.widen_factor = widen_factor + self.out_indices = out_indices + for index in out_indices: + if index not in range(0, 8): + raise ValueError('the item in out_indices must in ' + f'range(0, 8). But received {index}') + if frozen_stages not in range(-1, 8): + raise ValueError('frozen_stages must in range(-1, 8). 
' + f'But received {frozen_stages}') + self.out_indices = out_indices + self.frozen_stages = frozen_stages + + self.conv_cfg = conv_cfg + self.norm_cfg = norm_cfg + self.norm_eval = norm_eval + self.zero_init_residual = zero_init_residual + self.with_cp = with_cp + self.fine_grained_mode = fine_grained_mode + self.with_attentive_shortcut = with_attentive_shortcut + + self.act_cfg_list = act_cfg_list if act_cfg_list \ + else ['Swish'] * 9 + self.stride_list = stride_list if stride_list \ + else [1, 2, 2, 2, 1, 2, 1] + self.with_se_list = with_se_list if with_se_list \ + else [False, False, True, False, True, True, True] + + # adapt mutable settings + self.kernel_size_list = parse_values(self.arch_setting['kernel_size']) + self.num_blocks_list = parse_values(self.arch_setting['num_blocks']) + self.expand_ratio_list = \ + parse_values(self.arch_setting['expand_ratio']) + self.num_channels_list = \ + parse_values(self.arch_setting['num_out_channels']) + + self.num_channels_list = [[ + make_divisible(c * widen_factor, 8) for c in channels + ] for channels in self.num_channels_list] + + self.first_act = self.act_cfg_list.pop(0) + self.last_act = self.act_cfg_list.pop(-1) + + self.first_out_channels_list = self.num_channels_list.pop(0) + self.last_out_channels_list = self.num_channels_list.pop(-1) + self.last_expand_ratio_list = self.expand_ratio_list.pop(-1) + assert len(self.kernel_size_list) == len(self.num_blocks_list) == \ + len(self.expand_ratio_list) == len(self.num_channels_list) + + self.layers = self._make_layer() + + self.register_mutables() + + def _make_layer(self): + """Build multiple mobilenet layers.""" + layers = [] + self.in_channels = max(self.first_out_channels_list) + + self.first_conv = ConvModule( + in_channels=3, + out_channels=self.in_channels, + kernel_size=3, + stride=2, + padding=1, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + act_cfg=dict(type=self.first_act)) + + for i, (num_blocks, kernel_sizes, expand_ratios, num_channels) in \ + 
enumerate(zip(self.num_blocks_list, self.kernel_size_list, + self.expand_ratio_list, self.num_channels_list)): + inverted_res_layer = self._make_single_layer( + out_channels=num_channels, + num_blocks=num_blocks, + kernel_sizes=kernel_sizes, + expand_ratios=expand_ratios, + stride=self.stride_list[i], + use_se=self.with_se_list[i], + act=self.act_cfg_list[i]) + layer_name = f'layer{i + 1}' + self.add_module(layer_name, inverted_res_layer) + layers.append(inverted_res_layer) + + last_expand_channels = \ + self.in_channels * max(self.last_expand_ratio_list) + self.out_channels = max(self.last_out_channels_list) + last_layers = Sequential( + OrderedDict([('final_expand_layer', + ConvModule( + in_channels=self.in_channels, + out_channels=last_expand_channels, + kernel_size=1, + padding=0, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + act_cfg=dict(type=self.last_act))), + ('pool', nn.AdaptiveAvgPool2d((1, 1))), + ('feature_mix_layer', + ConvModule( + in_channels=last_expand_channels, + out_channels=self.out_channels, + kernel_size=1, + padding=0, + bias=False, + conv_cfg=self.conv_cfg, + norm_cfg=None, + act_cfg=dict(type=self.last_act)))])) + self.add_module('last_conv', last_layers) + layers.append(last_layers) + return layers + + def _make_single_layer(self, out_channels: List, num_blocks: List, + kernel_sizes: List, expand_ratios: List, + stride: int, act: str, use_se: bool): + """Stack InvertedResidual blocks (MBBlocks) to build a layer for + MobileNetV3. + + Args: + out_channels (List): out_channels of block. + num_blocks (List): num of blocks. + kernel_sizes (List): num of kernel sizes. + expand_ratios (int): Expand the number of channels of the + hidden layer in InvertedResidual by this ratio. + stride (int): stride of the first block. + use_se (bool): Use SE layer in MBBlock or not. 
+ """ + _layers = [] + for i in range(max(num_blocks)): + if i >= 1: + stride = 1 + if use_se: + se_cfg = dict( + act_cfg=(dict(type='ReLU'), dict(type='HSigmoid')), + ratio=4, + conv_cfg=self.conv_cfg) + else: + se_cfg = None # type: ignore + + mb_layer = MBBlock( + in_channels=self.in_channels, + out_channels=max(out_channels), + kernel_size=max(kernel_sizes), + stride=stride, + expand_ratio=max(expand_ratios), + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + act_cfg=dict(type=act), + with_cp=self.with_cp, + se_cfg=se_cfg, + with_attentive_shortcut=self.with_attentive_shortcut) + + _layers.append(mb_layer) + self.in_channels = max(out_channels) + + dynamic_seq = DynamicSequential(*_layers) + return dynamic_seq + + def register_mutables(self): + """Mutate the BigNAS-style MobileNetV3.""" + OneShotMutableChannelUnit._register_channel_container( + self, MutableChannelContainer) + + self.first_mutable_channels = OneShotMutableChannel( + alias='backbone.first_channels', + num_channels=max(self.first_out_channels_list), + candidate_choices=self.first_out_channels_list) + + mutate_conv_module( + self.first_conv, mutable_out_channels=self.first_mutable_channels) + + mid_mutable = self.first_mutable_channels + # mutate the built mobilenet layers + for i, layer in enumerate(self.layers[:-1]): + num_blocks = self.num_blocks_list[i] + kernel_sizes = self.kernel_size_list[i] + expand_ratios = self.expand_ratio_list[i] + out_channels = self.num_channels_list[i] + + prefix = 'backbone.layers.' + str(i + 1) + '.' 
+ + mutable_out_channels = OneShotMutableChannel( + alias=prefix + 'out_channels', + candidate_choices=out_channels, + num_channels=max(out_channels)) + + if not self.fine_grained_mode: + mutable_kernel_size = OneShotMutableValue( + alias=prefix + 'kernel_size', value_list=kernel_sizes) + + mutable_expand_ratio = OneShotMutableValue( + alias=prefix + 'expand_ratio', value_list=expand_ratios) + + mutable_depth = OneShotMutableValue( + alias=prefix + 'depth', value_list=num_blocks) + layer.register_mutable_attr('depth', mutable_depth) + + for k in range(max(self.num_blocks_list[i])): + + if self.fine_grained_mode: + mutable_kernel_size = OneShotMutableValue( + alias=prefix + str(k) + '.kernel_size', + value_list=kernel_sizes) + + mutable_expand_ratio = OneShotMutableValue( + alias=prefix + str(k) + '.expand_ratio', + value_list=expand_ratios) + + mutate_mobilenet_layer(layer[k], mid_mutable, + mutable_out_channels, + mutable_expand_ratio, + mutable_kernel_size, + self.fine_grained_mode) + mid_mutable = mutable_out_channels + + self.last_mutable_channels = OneShotMutableChannel( + alias='backbone.last_channels', + num_channels=self.out_channels, + candidate_choices=self.last_out_channels_list) + + last_mutable_expand_value = OneShotMutableValue( + value_list=self.last_expand_ratio_list, + default_value=max(self.last_expand_ratio_list)) + + derived_expand_channels = mid_mutable * last_mutable_expand_value + mutate_conv_module( + self.layers[-1].final_expand_layer, + mutable_in_channels=mid_mutable, + mutable_out_channels=derived_expand_channels) + mutate_conv_module( + self.layers[-1].feature_mix_layer, + mutable_in_channels=derived_expand_channels, + mutable_out_channels=self.last_mutable_channels) + + def forward(self, x): + x = self.first_conv(x) + outs = [] + for i, layer in enumerate(self.layers): + x = layer(x) + if i in self.out_indices: + outs.append(x) + + return tuple(outs) + + def train(self, mode=True): + super().train(mode) + self._freeze_stages() + if 
mode and self.norm_eval: + for m in self.modules(): + if isinstance(m, _BatchNorm): + m.eval() + + def init_weights(self) -> None: + super().init_weights() + + if self.zero_init_residual: + for name, module in self.named_modules(): + if isinstance(module, MBBlock): + if module.with_res_shortcut or \ + module.with_attentive_shortcut: + norm_layer = module.linear_conv.norm + constant_init(norm_layer, val=0) + logger.debug( + f'init {type(norm_layer)} of linear_conv in ' + f'`{name}` to zero') + + def _freeze_stages(self): + if self.frozen_stages >= 0: + for param in self.first_conv.parameters(): + param.requires_grad = False + + for i in range(1, self.frozen_stages + 1): + layer = getattr(self, f'layer{i}') + layer.eval() + for param in layer.parameters(): + param.requires_grad = False diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/backbones/searchable_shufflenet_v2.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/backbones/searchable_shufflenet_v2.py new file mode 100644 index 0000000000000000000000000000000000000000..db9e300a407d67908ed818eb9e3b6dc5e85b4b9d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/backbones/searchable_shufflenet_v2.py @@ -0,0 +1,225 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy +from typing import Dict, List, Optional, Sequence, Tuple, Union + +import torch.nn as nn +from mmcv.cnn import ConvModule +from mmengine.model import ModuleList, Sequential +from mmengine.model.weight_init import constant_init, normal_init +from torch import Tensor +from torch.nn.modules.batchnorm import _BatchNorm + +from mmrazor.registry import MODELS + +try: + from mmcls.models.backbones.base_backbone import BaseBackbone +except ImportError: + from mmrazor.utils import get_placeholder + BaseBackbone = get_placeholder('mmcls') + + +@MODELS.register_module() +class SearchableShuffleNetV2(BaseBackbone): + """Based on ShuffleNetV2 backbone. 
+ + Args: + arch_setting (list[list]): Architecture settings. + stem_multiplier (int): Stem multiplier - adjusts the number of + channels in the first layer. Default: 1. + widen_factor (float): Width multiplier - adjusts the number of + channels in each layer by this amount. Default: 1.0. + out_indices (Sequence[int]): Output from which stages. + Default: (4, ). + frozen_stages (int): Stages to be frozen (all param fixed). + Default: -1, which means not freezing any parameters. + with_last_layer (bool): Whether is last layer. + Default: True, which means not need to add `Placeholder``. + conv_cfg (dict, optional): Config dict for convolution layer. + Default: None, which means using conv2d. + norm_cfg (dict): Config dict for normalization layer. + Default: dict(type='BN'). + act_cfg (dict): Config dict for activation layer. + Default: dict(type='ReLU'). + norm_eval (bool): Whether to set norm layers to eval mode, namely, + freeze running stats (mean and var). Note: Effect on Batch Norm + and its variants only. Default: False. + with_cp (bool): Use checkpoint or not. Using checkpoint will save some + memory while slowing down the training speed. Default: False. + init_cfg (dict | list[dict], optional): initialization configuration + dict to define initializer. OpenMMLab has implemented + 6 initializers, including ``Constant``, ``Xavier``, ``Normal``, + ``Uniform``, ``Kaiming``, and ``Pretrained``. + + Examples: + >>> mutable_cfg = dict( + ... type='OneShotMutableOP', + ... candidates=dict( + ... shuffle_3x3=dict( + ... type='ShuffleBlock', + ... kernel_size=3, + ... norm_cfg=dict(type='BN')))) + >>> arch_setting = [ + ... # Parameters to build layers. 3 parameters are needed to + ... # construct a layer, from left to right: + ... # channel, num_blocks, mutable cfg. + ... [64, 4, mutable_cfg], + ... [160, 4, mutable_cfg], + ... [320, 8, mutable_cfg], + ... [640, 4, mutable_cfg] + ... 
] + >>> model = SearchableShuffleNetV2(arch_setting=arch_setting) + """ + + def __init__(self, + arch_setting: List[List], + stem_multiplier: int = 1, + widen_factor: float = 1.0, + out_indices: Sequence[int] = (4, ), + frozen_stages: int = -1, + with_last_layer: bool = True, + conv_cfg: Optional[Dict] = None, + norm_cfg: Dict = dict(type='BN'), + act_cfg: Dict = dict(type='ReLU'), + norm_eval: bool = False, + with_cp: bool = False, + init_cfg: Optional[Union[Dict, List[Dict]]] = None) -> None: + layers_nums = 5 if with_last_layer else 4 + for index in out_indices: + if index not in range(0, layers_nums): + raise ValueError('the item in out_indices must in ' + f'range(0, 5). But received {index}') + + self.frozen_stages = frozen_stages + if frozen_stages not in range(-1, layers_nums): + raise ValueError('frozen_stages must be in range(-1, 5). ' + f'But received {frozen_stages}') + + super().__init__(init_cfg) + + self.arch_setting = arch_setting + self.widen_factor = widen_factor + self.out_indices = out_indices + self.conv_cfg = conv_cfg + self.norm_cfg = norm_cfg + self.act_cfg = act_cfg + self.norm_eval = norm_eval + self.with_cp = with_cp + + last_channels = 1024 + self.in_channels = 16 * stem_multiplier + self.conv1 = ConvModule( + in_channels=3, + out_channels=self.in_channels, + kernel_size=3, + stride=2, + padding=1, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + act_cfg=act_cfg) + + self.layers = ModuleList() + for channel, num_blocks, mutable_cfg in arch_setting: + out_channels = round(channel * widen_factor) + layer = self._make_layer(out_channels, num_blocks, + copy.deepcopy(mutable_cfg)) + self.layers.append(layer) + + if with_last_layer: + self.layers.append( + ConvModule( + in_channels=self.in_channels, + out_channels=last_channels, + kernel_size=1, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + act_cfg=act_cfg)) + + def _make_layer(self, out_channels: int, num_blocks: int, + mutable_cfg: Dict) -> Sequential: + """Stack mutable blocks to build a layer for 
ShuffleNet V2. + + Note: + Here we use ``module_kwargs`` to pass dynamic parameters such as + ``in_channels``, ``out_channels`` and ``stride`` + to build the mutable. + + Args: + out_channels (int): out_channels of the block. + num_blocks (int): number of blocks. + mutable_cfg (dict): Config of mutable. + + Returns: + mmengine.model.Sequential: The layer made. + """ + layers = [] + for i in range(num_blocks): + stride = 2 if i == 0 else 1 + + mutable_cfg.update( + module_kwargs=dict( + in_channels=self.in_channels, + out_channels=out_channels, + stride=stride)) + layers.append(MODELS.build(mutable_cfg)) + self.in_channels = out_channels + + return Sequential(*layers) + + def _freeze_stages(self) -> None: + """Freeze params not to update in the specified stages.""" + if self.frozen_stages >= 0: + for param in self.conv1.parameters(): + param.requires_grad = False + + for i in range(self.frozen_stages): + m = self.layers[i] + m.eval() + for param in m.parameters(): + param.requires_grad = False + + def init_weights(self) -> None: + """Init weights of ``SearchableShuffleNetV2``.""" + super().init_weights() + + if (isinstance(self.init_cfg, dict) + and self.init_cfg['type'] == 'Pretrained'): + # Suppress default init if use pretrained model. + return + + for name, m in self.named_modules(): + if isinstance(m, nn.Conv2d): + if 'conv1' in name: + normal_init(m, mean=0, std=0.01) + else: + normal_init(m, mean=0, std=1.0 / m.weight.shape[1]) + elif isinstance(m, (_BatchNorm, nn.GroupNorm)): + constant_init(m, val=1, bias=0.0001) + if isinstance(m, _BatchNorm): + if m.running_mean is not None: + nn.init.constant_(m.running_mean, 0) + + def forward(self, x: Tensor) -> Tuple[Tensor, ...]: + """Forward computation. + + Args: + x (tensor): x contains input data for forward computation. 
+ """ + x = self.conv1(x) + + outs = [] + for i, layer in enumerate(self.layers): + x = layer(x) + if i in self.out_indices: + outs.append(x) + + return tuple(outs) + + def train(self, mode: bool = True) -> None: + """Set module status before forward computation.""" + super().train(mode) + + self._freeze_stages() + if mode and self.norm_eval: + for m in self.modules(): + if isinstance(m, _BatchNorm): + m.eval() diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/backbones/wideresnet.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/backbones/wideresnet.py new file mode 100644 index 0000000000000000000000000000000000000000..5350d8f7b3e274f8d819ca95a7f89f313e011300 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/backbones/wideresnet.py @@ -0,0 +1,403 @@ +# Copyright (c) OpenMMLab. All rights reserved. +# This file is modified from `mmcls.models.backbones.resnet` + +import warnings +from typing import Dict, Tuple + +import torch +import torch.nn as nn +import torch.nn.functional as F +from mmcv.cnn import build_conv_layer, build_norm_layer +from mmengine.model import BaseModule +from mmengine.model.weight_init import constant_init + +from mmrazor.registry import MODELS + + +class BasicBlock(nn.Module): + """BasicBlock for WideResNet. The differences from ResNet are in: + 1. The forward path + 2. The position of residual path + 3. Different downsample + + Args: + in_channels (int): Input channels of this block. + out_channels (int): Output channels of this block. + expansion (int): The ratio of ``out_channels/mid_channels`` where + ``mid_channels`` is the output channels of conv1. This is a + reserved argument in BasicBlock and should always be 1. Default: 1. + stride (int): stride of the block. Default: 1 + stride (int): stride of the block. Default: 1 + dilation (int): dilation of convolution. Default: 1 + downsample (nn.Module, optional): downsample operation on identity + branch. 
Default: None. + droprate (float, optional): droprate of the block. Defaults to 0. + conv_cfg (dict, optional): dictionary to construct and config conv + layer. Default: None + norm_cfg (dict): dictionary to construct and config norm layer. + Default: dict(type='BN') + """ + + def __init__( + self, + in_channels: int, + out_channels: int, + expansion: int = 1, + stride: int = 1, + dilation: int = 1, + downsample: nn.Module = None, + droprate: float = 0, + conv_cfg: Dict = None, + norm_cfg: Dict = dict(type='BN') + ) -> None: # noqa: E125 + super(BasicBlock, self).__init__() + self.in_channels = in_channels + self.out_channels = out_channels + self.expansion = expansion + self.stride = stride + self.dilation = dilation + self.droprate = droprate + self.conv_cfg = conv_cfg + self.norm_cfg = norm_cfg + + self.norm1_name, norm1 = build_norm_layer( + norm_cfg, in_channels, postfix=1) + self.norm2_name, norm2 = build_norm_layer( + norm_cfg, out_channels, postfix=2) + + self.add_module(self.norm1_name, norm1) + self.relu1 = nn.ReLU(inplace=True) + self.conv1 = build_conv_layer( + conv_cfg, + in_channels, + out_channels, + 3, + stride=stride, + padding=dilation, + dilation=dilation, + bias=False) + + self.add_module(self.norm2_name, norm2) + self.relu2 = nn.ReLU(inplace=True) + self.conv2 = build_conv_layer( + conv_cfg, + out_channels, + out_channels, + kernel_size=3, + stride=1, + padding=1, + bias=False) + self.downsample = downsample + + @property + def norm1(self): + return getattr(self, self.norm1_name) + + @property + def norm2(self): + return getattr(self, self.norm2_name) + + def forward(self, x: torch.Tensor) -> torch.Tensor: + """forward func. + + Args: + x (torch.Tensor): input. + + Returns: + torch.Tensor: output. 
+ """ + + identity = self.relu1(self.bn1(x)) + out = self.conv1(identity) + out = self.bn2(out) + out = self.relu2(out) + if self.droprate > 0: + out = F.dropout(out, p=self.droprate, training=self.training) + out = self.conv2(out) + if self.downsample: + out += self.downsample(identity) + else: + out += x + return out + + +def get_expansion(block: nn.Module, + widen_factor: int, + expansion: int = None) -> int: + """Get the expansion of a residual block. + The block expansion will be obtained by the following order: + 1. If ``expansion`` is given, just return it. + 2. If ``block`` has the attribute ``expansion``, then return + ``block.expansion``. + 3. If ``block`` is ``BaseBlock``, then return ``widen_factor``. + 3. Return the default value according the the block type: + 4 for ``Bottleneck``. + + Args: + block (class): The block class. + widen_factor (int): The given widen factor. + expansion (int | None): The given expansion ratio. + Returns: + int: The expansion of the block. + """ + if isinstance(expansion, int): + assert expansion > 0 + elif expansion is None: + if hasattr(block, 'expansion'): + expansion = block.expansion + elif issubclass(block, BasicBlock): + expansion = widen_factor + else: + raise TypeError(f'expansion is not specified for {block.__name__}') + else: + raise TypeError('expansion must be an integer or None') + + return expansion + + +class ResLayer(nn.Sequential): + """ResLayer to build ResNet style backbone. + + Args: + block (nn.Module): Residual block used to build ResLayer. + num_blocks (int): Number of blocks. + in_channels (int): Input channels of this block. + out_channels (int): Output channels of this block. + expansion (int): The expansion for BasicBlock/Bottleneck. + If not specified, it will firstly be obtained via + ``block.expansion``. If the block has no attribute "expansion", + the following default values will be used: 1 for BasicBlock and + 4 for Bottleneck. + droprate (float, optional): droprate of the layer. 
Defaults to 0. + stride (int): stride of the first block. Default: 1. + conv_cfg (Dict, optional): dictionary to construct and config conv + layer. Default: None + norm_cfg (Dict): dictionary to construct and config norm layer. + Default: dict(type='BN') + """ + + def __init__(self, + block: nn.Module, + num_blocks: int, + in_channels: int, + out_channels: int, + expansion: int, + droprate: float = 0, + stride: int = 1, + conv_cfg: Dict = None, + norm_cfg: Dict = dict(type='BN'), + **kwargs): + self.block = block + self.droprate = droprate + self.expansion = expansion + + downsample = None + if stride != 1 or in_channels != out_channels: + downsample = build_conv_layer( + conv_cfg, + in_channels, + out_channels, + kernel_size=1, + stride=stride, + bias=False) + + layers = [] + layers.append( + block( + in_channels=in_channels, + out_channels=out_channels, + expansion=self.expansion, + stride=stride, + downsample=downsample, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + **kwargs)) + in_channels = out_channels + for _ in range(1, num_blocks): + layers.append( + block( + in_channels=in_channels, + out_channels=out_channels, + expansion=self.expansion, + stride=1, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + **kwargs)) + super(ResLayer, self).__init__(*layers) + + +@MODELS.register_module() +class WideResNet(BaseModule): + """WideResNet backbone. Only support 3-stage WideResNet, which is usually + for tiny images. E.g., CIFAR10 and CIFAR100. + + WRN50 and WRN101 are now officially supported in + MMClassification. See link below: + https://github.com/open-mmlab/mmclassification/pull/715 + + Please refer to the `paper `__ for + details. + + Args: + depth (int): Network depth, from {10, 16, 22, 28, 40, 50, 101, 152}. + widen_factor (int): Width multiplier of mid-channel in blocks. + in_channels (int): Number of input image channels. Default: 3. + stem_channels (int): Output channels of the stem layer. Default: 64. + base_channels (int): Middle channels of the first stage. 
    def __init__(self,
                 depth: int,
                 widen_factor: int = 4,
                 in_channels: int = 3,
                 stem_channels: int = 16,
                 base_channels: int = 16,
                 expansion: int = None,
                 num_stages: int = 3,
                 strides: Tuple[int, ...] = (1, 2, 2),
                 dilations: Tuple[int, ...] = (1, 1, 1),
                 frozen_stages: int = -1,
                 conv_cfg: Dict = None,
                 norm_cfg: Dict = dict(type='BN', requires_grad=True),
                 norm_eval: bool = False,
                 zero_init_residual: bool = False,
                 init_cfg=[
                     dict(type='Kaiming', layer=['Conv2d']),
                     dict(
                         type='Constant',
                         val=1,
                         layer=['_BatchNorm', 'GroupNorm'])
                 ]):
        """Init WideResNet: stem conv, three residual stages, trailing norm.

        See the class docstring for argument semantics.
        """
        super(WideResNet, self).__init__(init_cfg)
        if depth > 40:
            """MMClassication now supports WRN-50 and 101 officially.

            Refer to:
            https://github.com/open-mmlab/mmclassification/pull/715/files
            """
            warnings.warn('`WiderResNet` deep than 40 now is deprecated')
        if depth not in self.arch_setting:
            raise KeyError(f'invalid depth {depth} for WideResNet')
        self.depth = depth
        self.widen_factor = widen_factor
        self.stem_channels = stem_channels
        self.base_channels = base_channels
        self.num_stages = num_stages
        self.strides = strides
        self.dilations = dilations
        self.frozen_stages = frozen_stages
        self.conv_cfg = conv_cfg
        self.norm_cfg = norm_cfg
        self.norm_eval = norm_eval
        self.zero_init_residual = zero_init_residual
        # `arch_setting` maps depth -> (block class, blocks per stage).
        self.block, stage_blocks = self.arch_setting[depth]
        self.stage_blocks = stage_blocks[:num_stages]
        # For BasicBlock-style WRN the expansion equals the widen factor.
        self.expansion = get_expansion(self.block, widen_factor, expansion)

        self._make_stem_layer(in_channels, stem_channels)

        # Build the residual stages; channel count doubles per stage.
        self.res_layers = []
        _in_channels = stem_channels
        _out_channels = base_channels * self.expansion
        for i, num_blocks in enumerate(self.stage_blocks):
            stride = strides[i]
            dilation = dilations[i]
            res_layer = self.make_res_layer(
                block=self.block,
                num_blocks=num_blocks,
                in_channels=_in_channels,
                out_channels=_out_channels,
                expansion=self.expansion,
                stride=stride,
                dilation=dilation,
                conv_cfg=conv_cfg,
                norm_cfg=norm_cfg,
            )
            _in_channels = _out_channels
            _out_channels *= 2
            layer_name = f'layer{i + 1}'
            self.add_module(layer_name, res_layer)
            self.res_layers.append(layer_name)

        self._freeze_stages()

        # `res_layer` is the last stage built by the loop above.
        self.feat_dim = res_layer[-1].out_channels

        # Pre-activation WRN needs a final norm + ReLU after the last stage.
        # `_out_channels // 2` undoes the extra doubling done on the final
        # loop iteration, i.e. it equals the last stage's output channels.
        self.norm1_name, norm1 = build_norm_layer(
            self.norm_cfg, _out_channels // 2, postfix=1)
        self.add_module(self.norm1_name, norm1)
        self.relu = nn.ReLU(inplace=True)
kernel_size=3, + stride=1, + padding=1, + bias=False) + + def _freeze_stages(self): + if self.frozen_stages >= 0: + self.norm1.eval() + for m in [self.conv1, self.norm1]: + for param in m.parameters(): + param.requires_grad = False + + for i in range(1, self.frozen_stages + 1): + m = getattr(self, f'layer{i}') + m.eval() + for param in m.parameters(): + param.requires_grad = False + + def init_weights(self): + super(WideResNet, self).init_weights() + + if (isinstance(self.init_cfg, dict) + and self.init_cfg['type'] == 'Pretrained'): + # Suppress zero_init_residual if use pretrained model. + return + + if self.zero_init_residual: + for m in self.modules(): + if isinstance(m, BasicBlock): + constant_init(m.norm2, 0) + + def forward(self, x): + # TODO: return multi-stage features. + x = self.conv1(x) + for layer_name in self.res_layers: + res_layer = getattr(self, layer_name) + x = res_layer(x) + x = self.norm1(x) + x = self.relu(x) + return tuple([x]) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/classifiers/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/classifiers/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..6bbd245ffe95ec628c9df21238b0f7a13af603ce --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/classifiers/__init__.py @@ -0,0 +1,4 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .image import SearchableImageClassifier + +__all__ = ['SearchableImageClassifier'] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/classifiers/image.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/classifiers/image.py new file mode 100644 index 0000000000000000000000000000000000000000..016d40e7a07c9e871e6cc6846f70a1ed7d6d908b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/classifiers/image.py @@ -0,0 +1,104 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from typing import Dict, Optional + +try: + from mmcls.models import ImageClassifier +except ImportError: + from mmrazor.utils import get_placeholder + ImageClassifier = get_placeholder('mmcls') +from torch import Tensor + +from mmrazor.models.architectures.dynamic_ops import DynamicInputResizer +from mmrazor.registry import MODELS + + +@MODELS.register_module() +class SearchableImageClassifier(ImageClassifier): + """SearchableImageClassifier for sliceable networks. + + Args: + backbone (dict): The same as ImageClassifier. + neck (dict, optional): The same as ImageClassifier. Defaults to None. + head (dict, optional): The same as ImageClassifier. Defaults to None. + pretrained (dict, optional): The same as ImageClassifier. Defaults to + None. + train_cfg (dict, optional): The same as ImageClassifier. Defaults to + None. + data_preprocessor (dict, optional): The same as ImageClassifier. + Defaults to None. + init_cfg (dict, optional): The same as ImageClassifier. Defaults to + None. + input_resizer_cfg (dict, optional): Configs for a input resizer, which + is designed for dynamically changing the input size, making the + input size as a searchable part. Defaults to None. + connect_head (dict, optional): Dimensions are aligned in head will be + substitute to it's `str type` value, so that search_space of the + first components can be connets to the next. e.g: + {'connect_with_backbone': 'backbone.last_mutable'} means that + func:`connect_with_backbone` will be substitute to backbones + last_mutable. Defaults to None. 
+ """ + + def __init__(self, + backbone: dict, + neck: Optional[dict] = None, + head: Optional[dict] = None, + pretrained: Optional[str] = None, + train_cfg: Optional[dict] = None, + data_preprocessor: Optional[dict] = None, + init_cfg: Optional[dict] = None, + input_resizer_cfg: Optional[dict] = None, + connect_head: Optional[dict] = None): + super().__init__(backbone, neck, head, pretrained, train_cfg, + data_preprocessor, init_cfg) + + if self.with_head and connect_head is not None: + for kh, vh in connect_head.items(): + component, attr = vh.split('.') + value = getattr(getattr(self, component), attr) + getattr(self.head, kh)(value) + + if input_resizer_cfg is not None: + input_resizer: Optional[DynamicInputResizer] = \ + self._build_input_resizer(input_resizer_cfg) + else: + input_resizer = None + self.input_resizer = input_resizer + + def extract_feat(self, + batch_inputs: Tensor, + stage: str = 'neck', + input_resizer: bool = True) -> Tensor: + """Extract features with resizing inputs first.""" + if self.input_resizer is not None and input_resizer: + batch_inputs = self.input_resizer(batch_inputs) + + return super().extract_feat(batch_inputs, stage) + + def _build_input_resizer(self, + input_resizer_cfg: Dict) -> DynamicInputResizer: + """Build a input resizer.""" + mutable_shape_cfg = dict(type='OneShotMutableValue') + + mutable_shape_cfg['alias'] = \ + input_resizer_cfg.get('alias', 'input_shape') + + assert 'input_sizes' in input_resizer_cfg and \ + isinstance(input_resizer_cfg['input_sizes'][0], list), ( + 'input_resizer_cfg[`input_sizes`] should be List[list].') + mutable_shape_cfg['value_list'] = \ + input_resizer_cfg.get('input_sizes') # type: ignore + + mutable_shape = MODELS.build(mutable_shape_cfg) + + input_resizer = MODELS.build(dict(type='DynamicInputResizer')) + input_resizer.register_mutable_attr('shape', mutable_shape) + + return input_resizer + + def simple_test(self, img, img_metas=None, **kwargs): + """Test without augmentation.""" + x = 
self.extract_feat(img, input_resizer=False) + res = self.head.simple_test(x, **kwargs) + + return res diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/connectors/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/connectors/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..fd4c91e77def3c4d86c0503becc54910abc179b6 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/connectors/__init__.py @@ -0,0 +1,17 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .byot_connector import BYOTConnector +from .convmodule_connector import ConvModuleConnector +from .crd_connector import CRDConnector +from .factor_transfer_connectors import Paraphraser, Translator +from .fbkd_connector import FBKDStudentConnector, FBKDTeacherConnector +from .mgd_connector import MGDConnector +from .norm_connector import NormConnector +from .ofd_connector import OFDTeacherConnector +from .torch_connector import TorchFunctionalConnector, TorchNNConnector + +__all__ = [ + 'ConvModuleConnector', 'Translator', 'Paraphraser', 'BYOTConnector', + 'FBKDTeacherConnector', 'FBKDStudentConnector', 'TorchFunctionalConnector', + 'CRDConnector', 'TorchNNConnector', 'OFDTeacherConnector', 'MGDConnector', + 'NormConnector' +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/connectors/base_connector.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/connectors/base_connector.py new file mode 100644 index 0000000000000000000000000000000000000000..629af19402371aef931d18f1777d1392f2116bcd --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/connectors/base_connector.py @@ -0,0 +1,43 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from abc import ABCMeta, abstractmethod +from typing import Dict, Optional, Tuple, Union + +import torch +from mmengine.model import BaseModule + + +class BaseConnector(BaseModule, metaclass=ABCMeta): + """Base class of connectors. + + Connector is mainly used for distillation, it usually converts the channel + number of input feature to align features of student and teacher. + + All subclasses should implement the following APIs: + + - ``forward_train()`` + + Args: + init_cfg (dict, optional): The config to control the initialization. + """ + + def __init__(self, init_cfg: Optional[Dict] = None) -> None: + super().__init__(init_cfg=init_cfg) + + def forward(self, feature: torch.Tensor) -> torch.Tensor: + """Forward computation. + + Args: + feature (torch.Tensor): Input feature. + """ + return self.forward_train(feature) + + @abstractmethod + def forward_train( + self, feature: torch.Tensor + ) -> Union[Tuple[torch.Tensor, ...], torch.Tensor]: + """Abstract train computation. + + Args: + feature (torch.Tensor): Input feature. + """ + pass diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/connectors/byot_connector.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/connectors/byot_connector.py new file mode 100644 index 0000000000000000000000000000000000000000..af27dfe44a123993342a34dd093c4d927829b052 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/connectors/byot_connector.py @@ -0,0 +1,82 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from math import log +from typing import Dict, Optional, Tuple, Union + +import torch +import torch.nn as nn + +from mmrazor.registry import MODELS +from ..ops.darts_series import DartsSepConv +from .base_connector import BaseConnector + + +@MODELS.register_module() +class BYOTConnector(BaseConnector): + """BYOTConnector connector that adds a self-attention with DartsSepConv. + + Args: + in_channel (int): The input channel of the DartsSepConv. 
+ Use like input_tensor_channel = in_channel * expansion. + out_channel (int): The output channel of the DartsSepConv. + Use like output_tensor_channel = out_channel * expansion. + num_classes (int): The classification class num. + expansion (int): Expansion of DartsSepConv. Default to 4. + pool_size (int | tuple[int]): Average 2D pool size. Default to 4. + kernel_size (int | tuple[int]): Size of the convolving kernel in + DartsSepConv. Same as that in ``nn._ConvNd``. Default to 3. + stride (int | tuple[int]): Stride of the first layer in DartsSepConv. + Same as that in ``nn._ConvNd``. Default to 1. + init_cfg (dict, optional): The config to control the initialization. + """ + + def __init__( + self, + in_channel: int, + out_channel: int, + num_classes: int, + expansion: int = 4, + pool_size: Union[int, Tuple[int]] = 4, + kernel_size: Union[int, Tuple[int]] = 3, + stride: Union[int, Tuple[int]] = 1, + init_cfg: Optional[Dict] = None, + ) -> None: + super().__init__(init_cfg) + self.attention = nn.Sequential( + DartsSepConv( + in_channels=in_channel * expansion, + out_channels=in_channel * expansion, + kernel_size=kernel_size, + stride=stride), nn.BatchNorm2d(in_channel * expansion), + nn.ReLU(), nn.Upsample(scale_factor=2, mode='bilinear'), + nn.Sigmoid()) + scala_num = log(out_channel / in_channel, 2) + assert scala_num.is_integer() + scala = [] + + _in_channel = in_channel + + for _ in range(int(scala_num)): + scala.append( + DartsSepConv( + in_channels=_in_channel * expansion, + out_channels=_in_channel * 2 * expansion, + kernel_size=kernel_size, + stride=stride)) + _in_channel *= 2 + scala.append(nn.AvgPool2d(pool_size)) + self.scala = nn.Sequential(*scala) + self.fc = nn.Linear(out_channel * expansion, num_classes) + + def forward_train(self, feature: torch.Tensor) -> Tuple[torch.Tensor, ...]: + """Forward computation. + + Args: + feature (torch.Tensor): Input feature. 
+ """ + feat = self.attention(feature) + feat = feat * feature + + feat = self.scala(feat) + feat = feat.view(feature.size(0), -1) + logits = self.fc(feat) + return (feat, logits) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/connectors/convmodule_connector.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/connectors/convmodule_connector.py new file mode 100644 index 0000000000000000000000000000000000000000..44d596377bf23e1db2b1274ce99f70b326f863e6 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/connectors/convmodule_connector.py @@ -0,0 +1,92 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import Dict, Optional, Tuple, Union + +import torch +from mmcv.cnn import ConvModule + +from mmrazor.registry import MODELS +from .base_connector import BaseConnector + + +@MODELS.register_module() +class ConvModuleConnector(BaseConnector): + """Convolution connector that bundles conv/norm/activation layers. + + Args: + in_channel (int): The input channel of the connector. + out_channel (int): The output channel of the connector. + kernel_size (int | tuple[int, int]): Size of the convolving kernel. + Same as that in ``nn._ConvNd``. + stride (int | tuple[int, int]): Stride of the convolution. + Same as that in ``nn._ConvNd``. + padding (int | tuple[int, int]): Zero-padding added to both sides of + the input. Same as that in ``nn._ConvNd``. + dilation (int | tuple[int, int]): Spacing between kernel elements. + Same as that in ``nn._ConvNd``. + groups (int): Number of blocked connections from input channels to + output channels. Same as that in ``nn._ConvNd``. + bias (bool | str): If specified as `auto`, it will be decided by the + norm_cfg. Bias will be set as True if `norm_cfg` is None, otherwise + False. Default: "auto". + conv_cfg (dict): Config dict for convolution layer. Default: None, + which means using conv2d. + norm_cfg (dict): Config dict for normalization layer. 
Default: None. + act_cfg (dict): Config dict for activation layer. + Default: dict(type='ReLU'). + inplace (bool): Whether to use inplace mode for activation. + Default: True. + with_spectral_norm (bool): Whether use spectral norm in conv module. + Default: False. + padding_mode (str): If the `padding_mode` has not been supported by + current `Conv2d` in PyTorch, we will use our own padding layer + instead. Currently, we support ['zeros', 'circular'] with official + implementation and ['reflect'] with our own implementation. + Default: 'zeros'. + order (tuple[str]): The order of conv/norm/activation layers. It is a + sequence of "conv", "norm" and "act". Common examples are + ("conv", "norm", "act") and ("act", "conv", "norm"). + Default: ('conv', 'norm', 'act'). + init_cfg (dict, optional): The config to control the initialization. + """ + + def __init__( + self, + in_channel: int, + out_channel: int, + kernel_size: Union[int, Tuple[int, int]] = 1, + stride: Union[int, Tuple[int, int]] = 1, + padding: Union[int, Tuple[int, int]] = 0, + dilation: Union[int, Tuple[int, int]] = 1, + groups: int = 1, + bias: Union[str, bool] = 'auto', + conv_cfg: Optional[Dict] = None, + norm_cfg: Optional[Dict] = None, + act_cfg: Dict = dict(type='ReLU'), + inplace: bool = True, + with_spectral_norm: bool = False, + padding_mode: str = 'zeros', + order: tuple = ('conv', 'norm', 'act'), + init_cfg: Optional[Dict] = None, + ) -> None: + super().__init__(init_cfg) + self.conv_module = ConvModule(in_channel, out_channel, kernel_size, + stride, padding, dilation, groups, bias, + conv_cfg, norm_cfg, act_cfg, inplace, + with_spectral_norm, padding_mode, order) + + def forward_train(self, feature: torch.Tensor) -> torch.Tensor: + """Forward computation. + + Args: + feature (torch.Tensor): Input feature. 
+ """ + for layer in self.conv_module.order: + if layer == 'conv': + if self.conv_module.with_explicit_padding: + feature = self.conv_module.padding_layer(feature) + feature = self.conv_module.conv(feature) + elif layer == 'norm' and self.conv_module.with_norm: + feature = self.conv_module.norm(feature) + elif layer == 'act' and self.conv_module.with_activation: + feature = self.conv_module.activate(feature) + return feature diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/connectors/crd_connector.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/connectors/crd_connector.py new file mode 100644 index 0000000000000000000000000000000000000000..48648c75dc473fc29e7f8dfc6280d9225938b3d0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/connectors/crd_connector.py @@ -0,0 +1,47 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch +import torch.nn as nn + +from mmrazor.registry import MODELS +from .base_connector import BaseConnector + + +@MODELS.register_module() +class CRDConnector(BaseConnector): + """Connector with linear layer. + + Args: + dim_in (int, optional): input channels. Defaults to 1024. + dim_out (int, optional): output channels. Defaults to 128. + """ + + def __init__(self, + dim_in: int = 1024, + dim_out: int = 128, + **kwargs) -> None: + super(CRDConnector, self).__init__(**kwargs) + self.linear = nn.Linear(dim_in, dim_out) + self.l2norm = Normalize(2) + + def forward_train(self, x: torch.Tensor) -> torch.Tensor: + x = x.view(x.size(0), -1) + x = self.linear(x) + x = self.l2norm(x) + return x + + +class Normalize(nn.Module): + """normalization layer. + + Args: + power (int, optional): power. Defaults to 2. + """ + + def __init__(self, power: int = 2) -> None: + super(Normalize, self).__init__() + self.power = power + + def forward(self, x: torch.Tensor) -> torch.Tensor: + norm = x.pow(self.power).sum(1, keepdim=True).pow(1. 
/ self.power) + out = x.div(norm) + return out diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/connectors/factor_transfer_connectors.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/connectors/factor_transfer_connectors.py new file mode 100644 index 0000000000000000000000000000000000000000..536649003d1fe52dea6d0c5ec620e3bebdb74a4f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/connectors/factor_transfer_connectors.py @@ -0,0 +1,133 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import Dict, Optional + +import torch +import torch.nn as nn + +from mmrazor.registry import MODELS +from .base_connector import BaseConnector + + +@MODELS.register_module() +class Paraphraser(BaseConnector): + """Paraphrasing Complex Network: Network Compression via Factor Transfer, + NeurIPS 2018. https://arxiv.org/pdf/1802.04977.pdf. + + teacher connector of FT. + + Args: + in_channel ([int]): number of input channels. + out_channel ([int]): number of output channels. + use_bn (bool, optional): Defaults to False. + phase (str, optional): Training phase. Defaults to 'pretrain'. + use_bn (Optional[bool], optional): use BN or not. Defaults to False. + init_cfg (Optional[Dict], optional): The weight initialized config for + :class:`BaseModule`. Defaults to None. + """ + + def __init__(self, + in_channel: int, + out_channel: int, + phase='pretrain', + use_bn: Optional[bool] = False, + init_cfg: Optional[Dict] = None) -> None: + + super(Paraphraser, self).__init__(init_cfg) + self._build_modules(in_channel, out_channel, use_bn) + + assert phase in ['pretrain', 'train'], f'Unexpect `phase`: {phase}' + self.phase = phase + + def _build_modules(self, + in_channel: int, + out_channel: int, + use_bn: Optional[bool] = False) -> None: + """A helper func to build internal modules. 
@MODELS.register_module()
class Translator(BaseConnector):
    """Paraphrasing Complex Network: Network Compression via Factor Transfer,
    NeurIPS 2018. https://arxiv.org/pdf/1802.04977.pdf.

    student connector of FT.

    Args:
        in_channel (int): number of input channels.
        out_channel (int): number of output channels.
        use_bn (bool, optional): whether to interleave BatchNorm layers
            between the convs. Defaults to True.
        init_cfg (Optional[Dict], optional): The weight initialized config for
            :class:`BaseModule`. Defaults to None.
    """

    def __init__(self,
                 in_channel: int,
                 out_channel: int,
                 use_bn: Optional[bool] = True,
                 init_cfg: Optional[Dict] = None) -> None:
        super(Translator, self).__init__(init_cfg)
        # Three conv stages: keep channels, map in->out, keep channels.
        # An empty `nn.Sequential()` stands in for BN when it is disabled so
        # that submodule indices (and hence state_dict keys) stay stable.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channel, in_channel, 3, 1, 1),
            nn.BatchNorm2d(in_channel) if use_bn else nn.Sequential(),
            nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(in_channel, out_channel, 3, 1, 1),
            nn.BatchNorm2d(out_channel) if use_bn else nn.Sequential(),
            nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(out_channel, out_channel, 3, 1, 1),
            nn.BatchNorm2d(out_channel) if use_bn else nn.Sequential(),
            nn.LeakyReLU(0.1, inplace=True))

    def forward_train(self, x: torch.Tensor) -> torch.Tensor:
        """Encode student features into factors.

        Args:
            x (torch.Tensor): Input student feature map.

        Returns:
            torch.Tensor: Encoded student factor.
        """
        return self.encoder(x)
+ + This module is proposed in + "Non-local Neural Networks" + Paper reference: https://arxiv.org/abs/1711.07971 + Code reference: https://github.com/AlexHex7/Non-local_pytorch + + Args: + in_channels (int): Channels of the input feature map. + reduction (int): Channel reduction ratio. Defaults to 2. + conv_cfg (dict): The config dict for convolution layers. + Defaults to `nn.Conv2d`. + norm_cfg (dict): The config dict for normalization layers. + Defaults to `BN`. (This parameter is only applicable to conv_out.) + mode (str): Options are `gaussian`, `concatenation`, + `embedded_gaussian` and `dot_product`. Default: dot_product. + sub_sample (bool): Whether to apply max pooling after pairwise + function (Note that the `sub_sample` is applied on spatial only). + Default: False. + maxpool_stride (int): The stride of the maxpooling module. + Defaults to 2. + zeros_init (bool): Whether to use zero to initialize weights of + `conv_out`. Defaults to True. + """ + + def __init__(self, + in_channels: int, + reduction: int = 2, + conv_cfg: Dict = dict(type='Conv2d'), + norm_cfg: Dict = dict(type='BN'), + mode: str = 'embedded_gaussian', + sub_sample: bool = False, + maxpool_stride: int = 2, + zeros_init: bool = True, + **kwargs) -> None: + """Inits the NonLocal2dMaxpoolNstride module.""" + super().__init__( + in_channels=in_channels, + sub_sample=sub_sample, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + reduction=reduction, + mode=mode, + zeros_init=zeros_init, + **kwargs) + self.norm_cfg = norm_cfg + + if sub_sample: + max_pool_layer = nn.MaxPool2d( + kernel_size=(maxpool_stride, maxpool_stride)) + self.g: nn.Sequential = nn.Sequential(self.g, max_pool_layer) + if self.mode != 'gaussian': + self.phi: nn.Sequential = nn.Sequential( + self.phi, max_pool_layer) + else: + self.phi = max_pool_layer + + +@MODELS.register_module() +class FBKDStudentConnector(BaseConnector): + """Improve Object Detection with Feature-based Knowledge Distillation: + Towards Accurate and Efficient 
Detectors, ICLR2021. + https://openreview.net/pdf?id=uKhGRvM8QNH. + + Student connector for FBKD. + + Args: + in_channels (int): Channels of the input feature map. + reduction (int): Channel reduction ratio. Defaults to 2. + conv_cfg (dict): The config dict for convolution layers. + Defaults to `nn.Conv2d`. + norm_cfg (dict): The config dict for normalization layers. + Defaults to `BN`. (This parameter is only applicable to conv_out.) + mode (str): Options are `gaussian`, `concatenation`, + `embedded_gaussian` and `dot_product`. Default: dot_product. + sub_sample (bool): Whether to apply max pooling after pairwise + function (Note that the `sub_sample` is applied on spatial only). + Default: False. + maxpool_stride (int): The stride of the maxpooling module. + Defaults to 2. + zeros_init (bool): Whether to use zero to initialize weights of + `conv_out`. Defaults to True. + spatial_T (float): Temperature used in spatial-wise pooling. + Defaults to 0.5. + channel_T (float): Temperature used in channel-wise pooling. + Defaults to 0.5. + init_cfg (dict, optional): The config to control the initialization. 
    def forward_train(self, x: torch.Tensor) -> Tuple[torch.Tensor, ...]:
        """Forward function for training.

        Args:
            x (torch.Tensor): Input student features.

        Returns:
            s_spatial_mask (torch.Tensor): Student spatial-wise mask.
            s_channel_mask (torch.Tensor): Student channel-wise mask.
            s_channel_pool_adapt (torch.Tensor): Student feature which through
                channel-wise pooling and adaptation_layers.
            s_spatial_pool_adapt (torch.Tensor): Student feature which through
                spatial-wise pooling and adaptation_layers.
            s_relation_adapt (torch.Tensor): Adaptative student relations.
            s_feat_adapt (torch.Tensor): Adaptative student feature.
        """
        # Calculate spatial-wise mask.
        s_spatial_mask = torch.mean(torch.abs(x), [1], keepdim=True)
        size = s_spatial_mask.size()
        s_spatial_mask = s_spatial_mask.view(x.size(0), -1)

        # Soften or sharpen the spatial-wise mask by temperature.
        # Scaling by H*W makes the softmax mask sum to the number of
        # spatial positions instead of 1.
        s_spatial_mask = torch.softmax(
            s_spatial_mask / self.spatial_T, dim=1) * size[-1] * size[-2]
        s_spatial_mask = s_spatial_mask.view(size)

        # Calculate channel-wise mask.
        s_channel_mask = torch.mean(torch.abs(x), [2, 3], keepdim=True)
        channel_mask_size = s_channel_mask.size()
        s_channel_mask = s_channel_mask.view(x.size(0), -1)

        # Soften or sharpen the channel-wise mask by temperature.
        # Analogous scaling by the channel count.
        s_channel_mask = torch.softmax(
            s_channel_mask / self.channel_T, dim=1) * self.in_channels
        s_channel_mask = s_channel_mask.view(channel_mask_size)

        # Adaptative and pool student feature through channel-wise.
        s_feat_adapt = self.adaptation_layers(x)
        s_channel_pool_adapt = self.channel_wise_adaptation(
            torch.mean(x, [2, 3]))

        # Adaptative and pool student feature through spatial-wise.
        s_spatial_pool = torch.mean(x, [1]).view(
            x.size(0), 1, x.size(2), x.size(3))
        s_spatial_pool_adapt = self.spatial_wise_adaptation(s_spatial_pool)

        # Calculate non_local_adaptation.
        s_relation = self.student_non_local(x)
        s_relation_adapt = self.non_local_adaptation(s_relation)

        # NOTE(review): the tuple order below is the contract consumers rely
        # on (presumably the FBKD loss pairs it with the teacher connector's
        # outputs — verify against the loss implementation).
        return (s_spatial_mask, s_channel_mask, s_channel_pool_adapt,
                s_spatial_pool_adapt, s_relation_adapt, s_feat_adapt)
+ Defaults to `BN`. (This parameter is only applicable to conv_out.) + mode (str): Options are `gaussian`, `concatenation`, + `embedded_gaussian` and `dot_product`. Default: dot_product. + sub_sample (bool): Whether to apply max pooling after pairwise + function (Note that the `sub_sample` is applied on spatial only). + Default: False. + maxpool_stride (int): The stride of the maxpooling module. + Defaults to 2. + zeros_init (bool): Whether to use zero to initialize weights of + `conv_out`. Defaults to True. + spatial_T (float): Temperature used in spatial-wise pooling. + Defaults to 0.5. + channel_T (float): Temperature used in channel-wise pooling. + Defaults to 0.5. + init_cfg (dict, optional): The config to control the initialization. + """ + + def __init__(self, + in_channels, + reduction=2, + conv_cfg: Dict = dict(type='Conv2d'), + norm_cfg: Dict = dict(type='BN'), + mode: str = 'dot_product', + sub_sample: bool = False, + maxpool_stride: int = 2, + zeros_init: bool = True, + spatial_T: float = 0.5, + channel_T: float = 0.5, + init_cfg: Optional[Dict] = None, + **kwargs) -> None: + super().__init__(init_cfg) + self.teacher_non_local = NonLocal2dMaxpoolNstride( + in_channels=in_channels, + reduction=reduction, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + mode=mode, + sub_sample=sub_sample, + maxpool_stride=maxpool_stride, + zeros_init=zeros_init, + **kwargs) + + self.in_channels = in_channels + self.spatial_T = spatial_T + self.channel_T = channel_T + + def forward_train(self, x: torch.Tensor) -> Tuple[torch.Tensor, ...]: + """Frorward function for training. + + Args: + x (torch.Tensor): Input teacher features. + + Returns: + t_spatial_mask (torch.Tensor): Teacher spatial-wise mask. + t_channel_mask (torch.Tensor): Teacher channel-wise mask. + t_spatial_pool (torch.Tensor): Teacher features which through + spatial-wise pooling. + t_relation (torch.Tensor): Teacher relation matrix. + """ + # Calculate spatial-wise mask. 
+ t_spatial_mask = torch.mean(torch.abs(x), [1], keepdim=True) + size = t_spatial_mask.size() + t_spatial_mask = t_spatial_mask.view(x.size(0), -1) + + # Soften or sharpen the spatial-wise mask by temperature. + t_spatial_mask = torch.softmax( + t_spatial_mask / self.spatial_T, dim=1) * size[-1] * size[-2] + t_spatial_mask = t_spatial_mask.view(size) + + # Calculate channel-wise mask. + t_channel_mask = torch.mean(torch.abs(x), [2, 3], keepdim=True) + channel_mask_size = t_channel_mask.size() + t_channel_mask = t_channel_mask.view(x.size(0), -1) + + # Soften or sharpen the channel-wise mask by temperature. + t_channel_mask = torch.softmax( + t_channel_mask / self.channel_T, dim=1) * self.in_channels + t_channel_mask = t_channel_mask.view(channel_mask_size) + + # Adaptative and pool student feature through spatial-wise. + t_spatial_pool = torch.mean(x, [1]).view( + x.size(0), 1, x.size(2), x.size(3)) + + # Calculate non_local relation. + t_relation = self.teacher_non_local(x) + + return (t_spatial_mask, t_channel_mask, t_spatial_pool, t_relation, x) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/connectors/mgd_connector.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/connectors/mgd_connector.py new file mode 100644 index 0000000000000000000000000000000000000000..9b53fed1dce64432c91ab02c2856c8fbc4ddc731 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/connectors/mgd_connector.py @@ -0,0 +1,71 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import Dict, Optional + +import torch +import torch.nn as nn + +from mmrazor.registry import MODELS +from .base_connector import BaseConnector + + +@MODELS.register_module() +class MGDConnector(BaseConnector): + """PyTorch version of `Masked Generative Distillation. + + ` + + Args: + student_channels(int): Number of channels in the student's feature map. + teacher_channels(int): Number of channels in the teacher's feature map. 
+ lambda_mgd (float, optional): masked ratio. Defaults to 0.65 + init_cfg (Optional[Dict], optional): The weight initialized config for + :class:`BaseModule`. Defaults to None. + """ + + def __init__( + self, + student_channels: int, + teacher_channels: int, + lambda_mgd: float = 0.65, + mask_on_channel: bool = False, + init_cfg: Optional[Dict] = None, + ) -> None: + super().__init__(init_cfg) + self.lambda_mgd = lambda_mgd + self.mask_on_channel = mask_on_channel + if student_channels != teacher_channels: + self.align = nn.Conv2d( + student_channels, + teacher_channels, + kernel_size=1, + stride=1, + padding=0) + else: + self.align = None + + self.generation = nn.Sequential( + nn.Conv2d( + teacher_channels, teacher_channels, kernel_size=3, padding=1), + nn.ReLU(inplace=True), + nn.Conv2d( + teacher_channels, teacher_channels, kernel_size=3, padding=1)) + + def forward_train(self, feature: torch.Tensor) -> torch.Tensor: + if self.align is not None: + feature = self.align(feature) + + N, C, H, W = feature.shape + + device = feature.device + if not self.mask_on_channel: + mat = torch.rand((N, 1, H, W)).to(device) + else: + mat = torch.rand((N, C, 1, 1)).to(device) + + mat = torch.where(mat > 1 - self.lambda_mgd, + torch.zeros(1).to(device), + torch.ones(1).to(device)).to(device) + + masked_fea = torch.mul(feature, mat) + new_fea = self.generation(masked_fea) + return new_fea diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/connectors/norm_connector.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/connectors/norm_connector.py new file mode 100644 index 0000000000000000000000000000000000000000..5d65da7dc518cbe7f1c4f4213bcb35b803bead75 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/connectors/norm_connector.py @@ -0,0 +1,19 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
# --- mmrazor/models/architectures/connectors/norm_connector.py (continued) ---
from typing import Dict, Optional

import torch
from mmcv.cnn import build_norm_layer

from mmrazor.registry import MODELS
from .base_connector import BaseConnector


@MODELS.register_module()
class NormConnector(BaseConnector):
    """Connector that applies a single normalization layer to a feature map.

    Args:
        in_channels (int): Number of channels of the input feature map.
        norm_cfg (dict): Config dict for the normalization layer, passed to
            ``mmcv.cnn.build_norm_layer``.
        init_cfg (dict, optional): The config to control the initialization.
    """

    def __init__(self, in_channels, norm_cfg, init_cfg: Optional[Dict] = None):
        super(NormConnector, self).__init__(init_cfg)
        # build_norm_layer returns a (name, layer) tuple; only the layer
        # instance is needed here.
        _, self.norm = build_norm_layer(norm_cfg, in_channels)

    def forward_train(self, feature: torch.Tensor) -> torch.Tensor:
        """Forward function for training."""
        return self.norm(feature)


# --- mmrazor/models/architectures/connectors/ofd_connector.py ---
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Dict, Optional

import torch

from mmrazor.registry import MODELS
from .base_connector import BaseConnector


@MODELS.register_module()
class OFDTeacherConnector(BaseConnector):
    """Connector designed for ``OverhaulFeatureDistillation``

    Args:
        init_cfg (Optional[Dict], optional): Initialization config dict.
            Defaults to None.
    """

    def __init__(self, init_cfg: Optional[Dict] = None) -> None:
        super().__init__(init_cfg)
        # Margin is injected later by ``OverhaulFeatureDistillation`` via
        # ``init_margin``; calling forward_train before that is a usage error.
        self.margin: torch.Tensor = None

    def init_margin(self, margin: torch.Tensor) -> None:
        """Initializing margin, will be called by
        ``OverhaulFeatureDistillation``.

        Args:
            margin (torch.Tensor): margin
        """
        self.margin = margin

    def forward_train(self, feature: torch.Tensor) -> torch.Tensor:
        """Forward function for training.

        Clamps the detached teacher feature from below by the margin.
        """
        assert self.margin is not None, (
            'margin must be initialized before training.')
        self.margin = self.margin.to(feature.device)
        feature = torch.max(feature.detach(), self.margin)
        return feature


# --- mmrazor/models/architectures/connectors/torch_connector.py ---
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Dict, Optional

import torch
import torch.nn as nn
import torch.nn.functional as F

from mmrazor.registry import MODELS
from .base_connector import BaseConnector

# Whitelist of torch.nn.functional callables that may be wrapped.
FUNCTION_LIST = [
    'adaptive_avg_pool2d',
    'adaptive_max_pool2d',
    'avg_pool2d',
    'dropout',
    'dropout2d',
    'max_pool2d',
    'normalize',
    'relu',
    'softmax',
    'interpolate',
]


@MODELS.register_module()
class TorchFunctionalConnector(BaseConnector):
    """TorchFunctionalConnector: Call function in torch.nn.functional
    to process input data

    usage:
        tensor1 = torch.rand(3,3,16,16)
        pool_connector = TorchFunctionalConnector(
            function_name='avg_pool2d',
            func_args=dict(kernel_size=4),
        )
        tensor2 = pool_connector.forward_train(tensor1)
        tensor2.size()
        # torch.Size([3, 3, 4, 4])

    which is equal to torch.nn.functional.avg_pool2d(kernel_size=4)

    Args:
        function_name (str, optional): function. Defaults to None.
        func_args (dict, optional): args parsed to function. Defaults to None
            (treated as an empty dict).
        init_cfg (dict, optional): The config to control the initialization.
    """

    def __init__(self,
                 function_name: Optional[str] = None,
                 func_args: Optional[Dict] = None,
                 init_cfg: Optional[Dict] = None) -> None:
        super().__init__(init_cfg)
        assert function_name is not None, 'Arg `function_name` cannot be None'
        if function_name not in FUNCTION_LIST:
            raise ValueError(
                ' Arg `function_name` are not available, See this list',
                FUNCTION_LIST)
        self.func = getattr(F, function_name)
        # BUGFIX: the original used a mutable default (`func_args: Dict = {}`)
        # which is shared across all instances; use None as the default and
        # create a fresh dict per instance instead.
        self.func_args = dict(func_args) if func_args is not None else {}

    def forward_train(self, x: torch.Tensor) -> torch.Tensor:
        """Forward function for training.

        Args:
            x (torch.Tensor): Input features.
        """
        x = self.func(x, **self.func_args)
        return x


# Whitelist of torch.nn module classes that may be wrapped.
MODULE_LIST = [
    'AdaptiveAvgPool2d',
    'AdaptiveMaxPool2d',
    'AvgPool2d',
    'BatchNorm2d',
    'Conv2d',
    'Dropout',
    'Dropout2d',
    'Linear',
    'MaxPool2d',
    'ReLU',
    'Softmax',
]


@MODELS.register_module()
class TorchNNConnector(BaseConnector):
    """TorchNNConnector: create nn.module in torch.nn to process input data

    usage:
        tensor1 = torch.rand(3,3,16,16)
        pool_connector = TorchNNConnector(
            module_name='AvgPool2d',
            module_args=dict(kernel_size=4),
        )
        tensor2 = pool_connector.forward_train(tensor1)
        tensor2.size()
        # torch.Size([3, 3, 4, 4])

    which is equal to torch.nn.AvgPool2d(kernel_size=4)

    Args:
        module_name (str, optional):
            module name. Defaults to None.
            possible_values:['AvgPool2d',
                             'Dropout2d',
                             'AdaptiveAvgPool2d',
                             'AdaptiveMaxPool2d',
                             'ReLU',
                             'Softmax',
                             'BatchNorm2d',
                             'Linear',]
        module_args (dict, optional):
            args parsed to nn.Module().__init__(). Defaults to None
            (treated as an empty dict).
        init_cfg (dict, optional): The config to control the initialization.
    """

    def __init__(self,
                 module_name: Optional[str] = None,
                 module_args: Optional[Dict] = None,
                 init_cfg: Optional[Dict] = None) -> None:
        super().__init__(init_cfg)
        assert module_name is not None, 'Arg `module_name` cannot be None'
        if module_name not in MODULE_LIST:
            raise ValueError(
                ' Arg `module_name` are not available, See this list',
                MODULE_LIST)
        # BUGFIX: avoid the shared mutable default (`module_args: Dict = {}`).
        self.func = getattr(nn, module_name)(**(module_args or {}))

    def forward_train(self, x: torch.Tensor) -> torch.Tensor:
        """Forward function for training.

        Args:
            x (torch.Tensor): Input features.
        """
        x = self.func(x)
        return x


# --- mmrazor/models/architectures/dynamic_ops/__init__.py ---
# Copyright (c) OpenMMLab. All rights reserved.
from .bricks import *  # noqa: F401,F403
from .head import *  # noqa: F401,F403
from .mixins import *  # noqa: F401,F403


# --- mmrazor/models/architectures/dynamic_ops/bricks/__init__.py ---
# Copyright (c) OpenMMLab. All rights reserved.
+from .dynamic_container import DynamicSequential +from .dynamic_conv import (BigNasConv2d, DynamicConv2d, + DynamicConv2dAdaptivePadding, FuseConv2d, OFAConv2d) +from .dynamic_embed import DynamicPatchEmbed +from .dynamic_function import DynamicInputResizer +from .dynamic_linear import DynamicLinear +from .dynamic_multi_head_attention import DynamicMultiheadAttention +from .dynamic_norm import (DMCPBatchNorm2d, DynamicBatchNorm1d, + DynamicBatchNorm2d, DynamicBatchNorm3d, + DynamicBatchNormXd, DynamicLayerNorm, + DynamicSyncBatchNorm, SwitchableBatchNorm2d) +from .dynamic_relative_position import DynamicRelativePosition2D + +__all__ = [ + 'BigNasConv2d', + 'DynamicConv2d', + 'OFAConv2d', + 'DynamicLinear', + 'DynamicBatchNorm1d', + 'DynamicBatchNorm2d', + 'DynamicBatchNorm3d', + 'SwitchableBatchNorm2d', + 'DynamicSequential', + 'DynamicPatchEmbed', + 'DynamicRelativePosition2D', + 'FuseConv2d', + 'DynamicMultiheadAttention', + 'DynamicSyncBatchNorm', + 'DynamicConv2dAdaptivePadding', + 'DynamicBatchNormXd', + 'DynamicInputResizer', + 'DynamicLayerNorm', + 'DMCPBatchNorm2d', +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/dynamic_ops/bricks/dynamic_container.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/dynamic_ops/bricks/dynamic_container.py new file mode 100644 index 0000000000000000000000000000000000000000..3696fe38eb872e510d95bee689dbffeaeab28aeb --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/dynamic_ops/bricks/dynamic_container.py @@ -0,0 +1,109 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
# --- dynamic_ops/bricks/dynamic_container.py (continued) ---
from typing import Dict, Iterator, Optional, Set

import torch.nn as nn
from mmengine.model import Sequential
from torch import Tensor
from torch.nn import Module

from mmrazor.models.mutables import DerivedMutable, MutableValue
from mmrazor.models.mutables.base_mutable import BaseMutable
from ..mixins import DynamicMixin


class DynamicSequential(Sequential, DynamicMixin):
    """Dynamic Sequential Container.

    Supports a mutable ``depth``: only the first ``depth`` pure (i.e.
    non-bookkeeping) modules are executed in ``forward``.
    """
    mutable_attrs: nn.ModuleDict
    accepted_mutable_attrs: Set[str] = {'depth'}

    # Modules of these types are bookkeeping only and must never take part
    # in the forward computation.
    forward_ignored_module = (MutableValue, DerivedMutable, nn.ModuleDict)

    def __init__(self, *args, init_cfg: Optional[dict] = None):
        super().__init__(*args, init_cfg=init_cfg)

        self.mutable_attrs: Dict[str, BaseMutable] = nn.ModuleDict()

    @property
    def mutable_depth(self):
        """Mutable depth, or None if no depth mutable was registered.

        BUGFIX: the original unconditionally returned
        ``self.mutable_attrs['depth']``, raising ``KeyError`` when no depth
        mutable exists and making the ``is None`` checks in ``forward`` and
        ``to_static_op`` unreachable.
        """
        assert hasattr(self, 'mutable_attrs')
        if 'depth' not in self.mutable_attrs:
            return None
        return self.mutable_attrs['depth']

    def register_mutable_attr(self: Sequential, attr: str,
                              mutable: BaseMutable):
        """Register attribute of mutable."""
        if attr == 'depth':
            self._register_mutable_depth(mutable)
        else:
            raise NotImplementedError

    def _register_mutable_depth(self: Sequential, mutable_depth: MutableValue):
        """Register mutable depth."""
        assert hasattr(self, 'mutable_attrs')
        assert mutable_depth.current_choice is not None
        current_depth = mutable_depth.current_choice
        if current_depth > len(self._modules):
            raise ValueError(f'Expect depth of mutable to be smaller than '
                             f'{len(self._modules)} as `depth`, '
                             f'but got: {current_depth}.')
        self.mutable_attrs['depth'] = mutable_depth

    @property
    def static_op_factory(self):
        """Corresponding Pytorch OP."""
        return Sequential

    def to_static_op(self: Sequential) -> Sequential:
        """Convert dynamic Sequential to static one."""
        self.check_if_mutables_fixed()

        if self.mutable_depth is None:
            fixed_depth = len(self)
        else:
            fixed_depth = self.get_current_choice(self.mutable_depth)

        modules = []
        passed_module_nums = 0
        for module in self:
            if isinstance(module, self.forward_ignored_module):
                continue
            passed_module_nums += 1
            if passed_module_nums > fixed_depth:
                break
            modules.append(module)

        return Sequential(*modules)

    def forward(self, x: Tensor) -> Tensor:
        """Forward of Dynamic Sequential."""
        if self.mutable_depth is None:
            # BUGFIX: the original branch was `return self(x)`, which calls
            # this very method again and recurses infinitely. Without a
            # depth mutable simply run every pure module in order.
            for module in self.pure_modules():
                x = module(x)
            return x

        current_depth = self.get_current_choice(self.mutable_depth)
        passed_module_nums = 0
        for module in self.pure_modules():
            passed_module_nums += 1
            if passed_module_nums > current_depth:
                break
            x = module(x)
        return x

    @property
    def pure_module_nums(self) -> int:
        """Number of pure module."""
        return sum(1 for _ in self.pure_modules())

    def pure_modules(self) -> Iterator[Module]:
        """Yield only computational children; bookkeeping modules (mutables,
        ModuleDicts) would otherwise influence the forward of Sequential."""
        for module in self._modules.values():
            if isinstance(module, self.forward_ignored_module):
                continue
            yield module

    @classmethod
    def convert_from(cls, module: Sequential):
        """Convert the static Sequential to dynamic one."""
        dynamic_m = cls(module._modules)
        return dynamic_m


# --- dynamic_ops/bricks/dynamic_conv.py ---
# Copyright (c) OpenMMLab. All rights reserved.
# --- dynamic_ops/bricks/dynamic_conv.py (continued) ---
import math
from typing import Callable, Dict

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import Tensor

from mmrazor.models.mutables.base_mutable import BaseMutable
from mmrazor.registry import MODELS
from ..mixins.dynamic_conv_mixins import (BigNasConvMixin, DynamicConvMixin,
                                          FuseConvMixin, OFAConvMixin)

GroupWiseConvWarned = False


@MODELS.register_module()
class DynamicConv2d(nn.Conv2d, DynamicConvMixin):
    """Dynamic Conv2d OP.

    Note:
        Arguments for ``__init__`` of ``DynamicConv2d`` is totally same as
        :obj:`torch.nn.Conv2d`.

    Attributes:
        mutable_attrs (ModuleDict[str, BaseMutable]): Mutable attributes,
            such as `in_channels`. The key of the dict must in
            ``accepted_mutable_attrs``.
    """
    mutable_attrs: nn.ModuleDict
    accepted_mutable_attrs = {'in_channels', 'out_channels'}

    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        # TODO: support non-'zeros' padding modes, see
        # https://pytorch.org/docs/stable/_modules/torch/nn/modules/conv.html#Conv2d
        assert self.padding_mode == 'zeros'
        self.mutable_attrs: Dict[str, BaseMutable] = nn.ModuleDict()

    @classmethod
    def convert_from(cls, module: nn.Conv2d) -> 'DynamicConv2d':
        """Convert an instance of nn.Conv2d to a new instance of
        DynamicConv2d."""
        return cls(
            in_channels=module.in_channels,
            out_channels=module.out_channels,
            kernel_size=module.kernel_size,
            stride=module.stride,
            padding=module.padding,
            dilation=module.dilation,
            groups=module.groups,
            bias=module.bias is not None,
            padding_mode=module.padding_mode)

    @property
    def conv_func(self) -> Callable:
        """The function that will be used in ``forward_mixin``."""
        return F.conv2d

    @property
    def static_op_factory(self):
        """Corresponding Pytorch OP."""
        return nn.Conv2d

    def forward(self, x: Tensor) -> Tensor:
        """Forward of dynamic conv2d OP."""
        return self.forward_mixin(x)


@MODELS.register_module()
class BigNasConv2d(nn.Conv2d, BigNasConvMixin):
    """Conv2d used in BigNas.

    Note:
        Arguments for ``__init__`` of ``BigNasConv2d`` is totally same as
        :obj:`torch.nn.Conv2d`.

    Attributes:
        mutable_attrs (ModuleDict[str, BaseMutable]): Mutable attributes,
            such as `in_channels`. The key of the dict must in
            ``accepted_mutable_attrs``.
    """
    mutable_attrs: nn.ModuleDict
    accepted_mutable_attrs = {'in_channels', 'out_channels', 'kernel_size'}

    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        # TODO: support non-'zeros' padding modes, see
        # https://pytorch.org/docs/stable/_modules/torch/nn/modules/conv.html#Conv2d
        assert self.padding_mode == 'zeros'
        self.mutable_attrs: Dict[str, BaseMutable] = nn.ModuleDict()

    @classmethod
    def convert_from(cls, module: nn.Conv2d) -> 'BigNasConv2d':
        """Convert an instance of `nn.Conv2d` to a new instance of
        `BigNasConv2d`."""
        return cls(
            in_channels=module.in_channels,
            out_channels=module.out_channels,
            kernel_size=module.kernel_size,
            stride=module.stride,
            padding=module.padding,
            dilation=module.dilation,
            groups=module.groups,
            bias=module.bias is not None,
            padding_mode=module.padding_mode)

    @property
    def conv_func(self) -> Callable:
        """The function that will be used in ``forward_mixin``."""
        return F.conv2d

    @property
    def static_op_factory(self):
        """Corresponding Pytorch OP."""
        return nn.Conv2d

    def forward(self, x: Tensor) -> Tensor:
        """Forward of bignas' conv2d."""
        return self.forward_mixin(x)


@MODELS.register_module()
class OFAConv2d(nn.Conv2d, OFAConvMixin):
    """Dynamic Conv2d OP used in `Once-for-All`.

    Refers to `Once-for-All: Train One Network and Specialize it for Efficient
    Deployment <https://arxiv.org/abs/1908.09791>`_.

    Note:
        Arguments for ``__init__`` of ``OFAConv2d`` is totally same as
        :obj:`torch.nn.Conv2d`.

    Attributes:
        mutable_attrs (ModuleDict[str, BaseMutable]): Mutable attributes,
            such as `in_channels`. The key of the dict must in
            ``accepted_mutable_attrs``.
    """
    # NOTE: the original class carried two consecutive string literals; the
    # second was a dead expression statement, so they are merged into one
    # docstring here.
    mutable_attrs: nn.ModuleDict
    accepted_mutable_attrs = {'in_channels', 'out_channels'}

    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        # TODO: support non-'zeros' padding modes, see
        # https://pytorch.org/docs/stable/_modules/torch/nn/modules/conv.html#Conv2d
        assert self.padding_mode == 'zeros'
        self.mutable_attrs: Dict[str, BaseMutable] = nn.ModuleDict()

    @classmethod
    def convert_from(cls, module: nn.Conv2d) -> 'OFAConv2d':
        """Convert an instance of `nn.Conv2d` to a new instance of
        `OFAConv2d`."""
        return cls(
            in_channels=module.in_channels,
            out_channels=module.out_channels,
            kernel_size=module.kernel_size,
            stride=module.stride,
            padding=module.padding,
            dilation=module.dilation,
            groups=module.groups,
            bias=module.bias is not None,
            padding_mode=module.padding_mode)

    @property
    def conv_func(self) -> Callable:
        """The function that will be used in ``forward_mixin``."""
        return F.conv2d

    @property
    def static_op_factory(self):
        """Corresponding Pytorch OP."""
        return nn.Conv2d

    def forward(self, x: Tensor) -> Tensor:
        """Forward of OFA's conv2d."""
        return self.forward_mixin(x)


@MODELS.register_module()
class FuseConv2d(nn.Conv2d, FuseConvMixin):
    """FuseConv2d used in `DCFF`.

    Refers to `Training Compact CNNs for Image Classification
    using Dynamic-coded Filter Fusion <https://arxiv.org/abs/2107.06916>`_.

    Attributes:
        mutable_attrs (ModuleDict[str, BaseMutable]): Mutable attributes,
            such as `in_channels`. The key of the dict must in
            ``accepted_mutable_attrs``.
    """
    mutable_attrs: nn.ModuleDict
    accepted_mutable_attrs = {'in_channels', 'out_channels'}

    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self.mutable_attrs: Dict[str, BaseMutable] = nn.ModuleDict()

    @classmethod
    def convert_from(cls, module: nn.Conv2d) -> 'FuseConv2d':
        """Convert an instance of `nn.Conv2d` to a new instance of
        `FuseConv2d`."""
        return cls(
            in_channels=module.in_channels,
            out_channels=module.out_channels,
            kernel_size=module.kernel_size,
            stride=module.stride,
            padding=module.padding,
            dilation=module.dilation,
            groups=module.groups,
            bias=module.bias is not None,
            padding_mode=module.padding_mode)

    @property
    def conv_func(self) -> Callable:
        """The function that will be used in ``forward_mixin``."""
        return F.conv2d

    @property
    def static_op_factory(self):
        """Corresponding Pytorch OP."""
        return nn.Conv2d

    def forward(self, x: Tensor) -> Tensor:
        """Forward of fused conv2d."""
        return self.forward_mixin(x)


class DynamicConv2dAdaptivePadding(DynamicConv2d):
    """Dynamic version of mmcv.cnn.bricks.Conv2dAdaptivePadding."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Forward with TF-"SAME"-style adaptive zero padding."""
        img_h, img_w = x.size()[-2:]
        kernel_h, kernel_w = self.weight.size()[-2:]
        stride_h, stride_w = self.stride
        output_h = math.ceil(img_h / stride_h)
        output_w = math.ceil(img_w / stride_w)
        pad_h = (
            max((output_h - 1) * self.stride[0] +
                (kernel_h - 1) * self.dilation[0] + 1 - img_h, 0))
        pad_w = (
            max((output_w - 1) * self.stride[1] +
                (kernel_w - 1) * self.dilation[1] + 1 - img_w, 0))
        if pad_h > 0 or pad_w > 0:
            x = F.pad(x, [
                pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2
            ])
        return super().forward(x)

    @property
    def static_op_factory(self):
        """Corresponding Pytorch OP."""
        from mmcv.cnn.bricks import Conv2dAdaptivePadding
        return Conv2dAdaptivePadding

    def to_static_op(self) -> nn.Conv2d:
        """Convert to a static Conv2dAdaptivePadding with sliced params."""
        self.check_if_mutables_fixed()

        weight, bias, padding = self.get_dynamic_params()
        groups = self.groups
        # Depthwise case: groups must track the (possibly pruned) channels.
        if groups == self.in_channels == self.out_channels and \
                self.mutable_in_channels is not None:
            mutable_in_channels = self.mutable_attrs['in_channels']
            groups = mutable_in_channels.current_mask.sum().item()
        out_channels = weight.size(0)
        in_channels = weight.size(1) * groups

        kernel_size = tuple(weight.shape[2:])

        static_conv = self.static_op_factory(
            in_channels=in_channels,
            out_channels=out_channels,
            kernel_size=kernel_size,
            stride=self.stride,
            padding=padding,
            dilation=self.dilation,
            groups=groups,
            bias=bias is not None)

        static_conv.weight = nn.Parameter(weight)
        if bias is not None:
            static_conv.bias = nn.Parameter(bias)

        return static_conv


# --- dynamic_ops/bricks/dynamic_embed.py ---
# Copyright (c) OpenMMLab. All rights reserved.
import logging
from typing import Set, Tuple

try:
    from mmcls.models.utils import PatchEmbed
except ImportError:
    from mmrazor.utils import get_placeholder
    PatchEmbed = get_placeholder('mmcls')
from mmengine import print_log

from ..mixins import DynamicChannelMixin


@MODELS.register_module()
class DynamicPatchEmbed(PatchEmbed, DynamicChannelMixin):
    """Dynamic Patch Embedding.

    Note:
        Arguments for ``__init__`` of ``DynamicPatchEmbed`` is totally same as
        :obj:`mmcls.models.utils.PatchEmbed`.

    Attributes:
        mutable_attrs (ModuleDict[str, BaseMutable]): Mutable attributes,
            such as `embed_dims`. The key of the dict must in
            ``accepted_mutable_attrs``.
    """

    mutable_attrs: nn.ModuleDict
    accepted_mutable_attrs: Set[str] = {'embed_dims'}
    # Both channel directions of the projection map onto `embed_dims`.
    attr_mappings: Dict[str, str] = {
        'in_channels': 'embed_dims',
        'out_channels': 'embed_dims'
    }

    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)

        self.mutable_attrs: Dict[str, BaseMutable] = nn.ModuleDict()

    @property
    def mutable_embed_dims(self):
        """Mutable embedding dimension."""
        assert hasattr(self, 'mutable_attrs')
        return self.mutable_attrs['embed_dims']

    def register_mutable_attr(self: PatchEmbed, attr: str,
                              mutable: BaseMutable):
        """Register attribute of mutable."""
        self.check_mutable_attr_valid(attr)
        if attr in self.attr_mappings:
            attr_map = self.attr_mappings[attr]
            assert attr_map in self.accepted_mutable_attrs
            if attr_map in self.mutable_attrs:
                print_log(
                    f'{attr_map}({attr}) is already in `mutable_attrs`',
                    level=logging.WARNING)
            else:
                self._register_mutable_attr(attr_map, mutable)
        elif attr in self.accepted_mutable_attrs:
            self._register_mutable_attr(attr, mutable)
        else:
            raise NotImplementedError

    def _register_mutable_attr(self, attr, mutable):
        """Register `embed_dims`."""
        if attr == 'embed_dims':
            self._register_embed_dims(mutable)
        else:
            raise NotImplementedError

    def _register_embed_dims(self: PatchEmbed,
                             mutable_patch_embedding: BaseMutable) -> None:
        """Register mutable embedding dimension."""
        mask_size = mutable_patch_embedding.current_mask.size(0)

        if mask_size != self.embed_dims:
            raise ValueError(
                f'Expect mask size of mutable to be {self.embed_dims} as '
                f'`embed_dims`, but got: {mask_size}.')

        self.mutable_attrs['embed_dims'] = mutable_patch_embedding

    def _get_dynamic_params(self: PatchEmbed) -> Tuple[Tensor, Tensor]:
        """Slice projection weight/bias by the ``embed_dims`` mask."""
        if 'embed_dims' not in self.mutable_attrs:
            return self.projection.weight, self.projection.bias
        else:
            out_mask = self.mutable_embed_dims.current_mask.to(
                self.projection.weight.device)
            weight = self.projection.weight[out_mask]
            bias = self.projection.bias[
                out_mask] if self.projection.bias is not None else None  # noqa: E501
            return weight, bias

    def to_static_op(self: PatchEmbed) -> nn.Module:
        """Convert dynamic PatchEmbed to static PatchEmbed."""
        self.check_if_mutables_fixed()
        assert self.mutable_embed_dims is not None

        weight, bias = self._get_dynamic_params()
        static_patch_embed = self.static_op_factory(
            img_size=self.img_size,
            in_channels=3,
            embed_dims=self.mutable_embed_dims.activated_channels)

        static_patch_embed.projection.weight = nn.Parameter(weight.clone())
        # BUGFIX: the original cloned ``bias`` unconditionally, which raises
        # AttributeError when the projection conv has no bias.
        if bias is not None:
            static_patch_embed.projection.bias = nn.Parameter(bias.clone())

        return static_patch_embed

    @property
    def static_op_factory(self):
        """Corresponding Pytorch OP."""
        return PatchEmbed

    @classmethod
    def convert_from(cls, module) -> nn.Module:
        """Convert a PatchEmbed to a DynamicPatchEmbed."""

        dynamic_patch_embed = cls(
            img_size=module.img_size,
            in_channels=3,
            embed_dims=module.embed_dims,
            norm_cfg=None,
            conv_cfg=None,
            init_cfg=None)

        # TODO mutable_attr should be inherited from its `__base__` class
        dynamic_patch_embed.projection = module.projection
        dynamic_patch_embed.norm = module.norm

        return dynamic_patch_embed

    def forward(self, x: Tensor) -> Tensor:
        """Forward of dynamic patch embed."""
        weight, bias = self._get_dynamic_params()
        # NOTE(review): stride is hard-coded to 16 (standard ViT patch size)
        # rather than self.projection.stride — confirm this matches every
        # supported backbone before reusing this op elsewhere.
        x = F.conv2d(
            x,
            weight,
            bias,
            stride=16,
            padding=self.projection.padding,
            dilation=self.projection.dilation).flatten(2).transpose(1, 2)

        if self.norm is not None:
            x = self.norm(x)

        return x


# --- dynamic_ops/bricks/dynamic_function.py follows ---
def forward(self,
            x: torch.Tensor,
            size: Optional[Tuple[int, int]] = None) -> torch.Tensor:
    """Resize ``x`` to the currently sampled dynamic shape.

    Falls back to the explicitly passed ``size`` when no dynamic shape
    has been sampled from the registered mutables.

    Args:
        x (torch.Tensor): Input tensor to resize.
        size (Optional[Tuple[int, int]]): Fallback target size.
            Defaults to None.

    Returns:
        torch.Tensor: The resized tensor.
    """
    # Fix: the original signature read ``size=Optional[Tuple[int, int]]``,
    # which made the typing construct itself the default *value*; the
    # intended default is ``None``.
    self._size = self.get_dynamic_shape()

    if not self._size:
        self._size = size

    return super().forward(x, self._size)
+ """ + dynamic_seq = cls( + interpolation_type=module._interpolation_type, + align_corners=module._align_corners, + scale_factor=module._scale_factor) + + return dynamic_seq diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/dynamic_ops/bricks/dynamic_linear.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/dynamic_ops/bricks/dynamic_linear.py new file mode 100644 index 0000000000000000000000000000000000000000..4faa0c8b79d43c44c78e780ec97784741bb85955 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/dynamic_ops/bricks/dynamic_linear.py @@ -0,0 +1,53 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import Dict + +import torch.nn as nn +import torch.nn.functional as F +from torch import Tensor + +from mmrazor.models.mutables.base_mutable import BaseMutable +from ..mixins import DynamicLinearMixin + + +class DynamicLinear(nn.Linear, DynamicLinearMixin): + """Dynamic Linear OP. + + Note: + Arguments for ``__init__`` of ``DynamicLinear`` is totally same as + :obj:`torch.nn.Linear`. + + Attributes: + mutable_in_features (BaseMutable, optional): Mutable for controlling + ``in_features``. + mutable_out_features (BaseMutable, optional): Mutable for controlling + ``out_features``. + """ + accepted_mutable_attrs = {'in_features', 'out_features'} + + def __init__(self, *args, **kwargs) -> None: + super().__init__(*args, **kwargs) + + self.mutable_attrs: Dict[str, BaseMutable] = nn.ModuleDict() + + @property + def static_op_factory(self): + return nn.Linear + + @classmethod + def convert_from(cls, module): + """Convert a nn.Linear module to a DynamicLinear. + + Args: + module (:obj:`torch.nn.Linear`): The original Linear module. 
+ """ + dynamic_linear = cls( + in_features=module.in_features, + out_features=module.out_features, + bias=True if module.bias is not None else False) + return dynamic_linear + + def forward(self, input: Tensor) -> Tensor: + """Forward of dynamic linear OP.""" + weight, bias = self.get_dynamic_params() + + return F.linear(input, weight, bias) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/dynamic_ops/bricks/dynamic_multi_head_attention.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/dynamic_ops/bricks/dynamic_multi_head_attention.py new file mode 100644 index 0000000000000000000000000000000000000000..8dcd6de3ccb71405557002e8f5129d6068549d02 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/dynamic_ops/bricks/dynamic_multi_head_attention.py @@ -0,0 +1,279 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import logging +from typing import Dict, Set, Tuple + +import torch.nn as nn +import torch.nn.functional as F +from mmengine import print_log +from torch import Tensor + +from mmrazor.models.architectures.ops import MultiheadAttention +from mmrazor.models.mutables.base_mutable import BaseMutable +from ..mixins import DynamicChannelMixin +from .dynamic_relative_position import DynamicRelativePosition2D # noqa: E501 + + +class DynamicMultiheadAttention(MultiheadAttention, DynamicChannelMixin): + """Dynamic Multihead Attention with iRPE.. + + Note: + Arguments for ``__init__`` of ``DynamicMultiheadAttention`` is + totally same as + :obj:`mmrazor.models.architectures.MultiheadAttention`. + Attributes: + mutable_attrs (ModuleDict[str, BaseMutable]): Mutable attributes, + such as `num_heads`、 `embed_dims`、 `q_embed_dims`. + The key of the dict must in ``accepted_mutable_attrs``. 
+ """ + + mutable_attrs: nn.ModuleDict + relative_position: bool + max_relative_position: int + w_qs: nn.Linear + w_ks: nn.Linear + w_vs: nn.Linear + embed_dims: int + q_embed_dims: int + proj: nn.Linear + attn_drop_rate: float + accepted_mutable_attrs: Set[str] = { + 'num_heads', 'embed_dims', 'q_embed_dims' + } + attr_mappings: Dict[str, str] = { + 'in_channels': 'embed_dims', + 'out_channels': 'q_embed_dims', + } + + def __init__(self, *args, **kwargs) -> None: + super().__init__(*args, **kwargs) + + self.mutable_attrs: Dict[str, BaseMutable] = nn.ModuleDict() + + # dynamic image relative position encoding + if self.relative_position: + self.rel_pos_embed_k = DynamicRelativePosition2D( + self.head_dims, self.max_relative_position) + self.rel_pos_embed_v = DynamicRelativePosition2D( + self.head_dims, self.max_relative_position) + + @property + def mutable_num_heads(self): + """Mutable number of heads.""" + assert hasattr(self, 'mutable_attrs') + return self.mutable_attrs['num_heads'] + + @property + def mutable_embed_dims(self): + """Mutable embedding dimension.""" + assert hasattr(self, 'mutable_attrs') + return self.mutable_attrs['embed_dims'] + + @property + def mutable_q_embed_dims(self): + """Mutable intermediate embedding dimension.""" + assert hasattr(self, 'mutable_attrs') + return self.mutable_attrs['q_embed_dims'] + + def register_mutable_attr(self, attr: str, mutable: BaseMutable): + """Register attribute of mutable.""" + self.check_mutable_attr_valid(attr) + if attr in self.attr_mappings: + attr_map = self.attr_mappings[attr] + assert attr_map in self.accepted_mutable_attrs + # if hasattr(self, 'mutable_attrs'): + if attr_map in self.mutable_attrs: + print_log( + f'{attr_map}({attr}) is already in `mutable_attrs`', + level=logging.WARNING) + else: + self._register_mutable_attr(attr_map, mutable) + elif attr in self.accepted_mutable_attrs: + self._register_mutable_attr(attr, mutable) + else: + raise NotImplementedError + + def 
_register_mutable_attr(self, attr: str, mutable: BaseMutable): + """Register `embed_dims` `q_embed_dims` `num_heads`""" + if attr == 'num_heads': + self._register_mutable_num_heads(mutable) + elif attr == 'embed_dims': + self._register_mutable_embed_dims(mutable) + elif attr == 'q_embed_dims': + self._register_mutable_q_embed_dims(mutable) + else: + raise NotImplementedError + + def _register_mutable_num_heads(self, mutable_num_heads): + """Register the mutable number of heads.""" + assert hasattr(self, 'mutable_attrs') + current_choice = mutable_num_heads.current_choice + if current_choice > self.num_heads: + raise ValueError( + f'Expect value of mutable to be smaller or equal than ' + f'{self.num_heads} as `num_heads`, but got: {current_choice}.') + + self.mutable_attrs['num_heads'] = mutable_num_heads + + def _register_mutable_embed_dims(self, mutable_embed_dims): + """Register mutable embedding dimension.""" + assert hasattr(self, 'mutable_attrs') + mask_size = mutable_embed_dims.current_mask.size(0) + if mask_size != self.embed_dims: + raise ValueError( + f'Expect mask size of mutable to be {self.embed_dims} as ' + f'`embed_dims`, but got: {mask_size}.') + + self.mutable_attrs['embed_dims'] = mutable_embed_dims + + def _register_mutable_q_embed_dims(self, mutable_q_embed_dims): + """Register intermediate mutable embedding dimension.""" + assert hasattr(self, 'mutable_attrs') + self.mutable_attrs['q_embed_dims'] = mutable_q_embed_dims + + def _get_dynamic_proj_params(self, w: nn.Linear) -> Tuple[Tensor, Tensor]: + """Get parameters of dynamic projection. + + Note: + The input dimension is decided by `mutable_q_embed_dims`. + The output dimension is decided by `mutable_embed_dims`. 
+ """ + # TODO support mask + if self.mutable_embed_dims is None and \ + self.mutable_q_embed_dims is None: + return w.weight, w.bias + + if self.mutable_q_embed_dims is not None: + in_features = self.mutable_q_embed_dims.activated_channels + else: + in_features = self.embed_dims + + if self.mutable_embed_dims is not None: + out_features = self.mutable_embed_dims.activated_channels + else: + out_features = self.embed_dims + + weight = w.weight[:out_features, :in_features] + bias = w.bias[:out_features] if w.bias is not None else None + + return weight, bias + + def _get_dynamic_qkv_params(self, w: nn.Linear) -> Tuple[Tensor, Tensor]: + """Get parameters of dynamic QKV. + + Note: + The output dimension is decided by `mutable_q_embed_dims`. + The input dimension is decided by `mutable_embed_dims`. + """ + # TODO support mask later + if self.mutable_q_embed_dims is None and \ + self.mutable_embed_dims is None: + return w.weight, w.bias + + if self.mutable_embed_dims is not None: + in_features = self.mutable_embed_dims.activated_channels + else: + in_features = self.embed_dims + + if self.mutable_q_embed_dims is not None: + out_features = self.mutable_q_embed_dims.activated_channels + else: + out_features = self.mutable_q_embed_dims + + weight = w.weight[:out_features, :in_features] + bias = w.bias[:out_features] if w.bias is not None else None + + return weight, bias + + def to_static_op(self) -> MultiheadAttention: + """Convert dynamic MultiheadAttention to static one.""" + self.check_if_mutables_fixed() + + embed_dims = self.mutable_embed_dims.activated_channels + num_heads = self.mutable_num_heads.current_choice + + q_w, q_b = self._get_dynamic_qkv_params(self.w_qs) + k_w, k_b = self._get_dynamic_qkv_params(self.w_ks) + v_w, v_b = self._get_dynamic_qkv_params(self.w_vs) + + proj_w, proj_b = self._get_dynamic_proj_params(self.proj) + + static_mha = MultiheadAttention( + embed_dims=embed_dims, + num_heads=num_heads, + input_dims=None, + 
@property
def static_op_factory(self):
    """Corresponding Pytorch OP.

    Fix: declared as a property for consistency with every other
    dynamic op in this package (e.g. DynamicPatchEmbed, DynamicLinear,
    DynamicRelativePosition2D), whose `static_op_factory` is accessed
    as an attribute rather than called as a method.
    """
    return MultiheadAttention
self.scale + + attn = attn.softmax(dim=-1) + attn = self.attn_drop(attn) + x = (attn @ v).transpose(1, 2).reshape(B, N, -1) + + if self.relative_position: + r_p_v = self.rel_pos_embed_v(N, N) + attn_1 = attn.permute(2, 0, 1, 3).reshape(N, B * num_heads, -1) + x = x + (attn_1 @ r_p_v).transpose(1, 0).reshape( + B, num_heads, N, -1).transpose(2, 1).reshape(B, N, -1) + + # proj + weight, bias = self._get_dynamic_proj_params(self.proj) + x = F.linear(x, weight, bias) + x = self.proj_drop(x) + return x diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/dynamic_ops/bricks/dynamic_norm.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/dynamic_ops/bricks/dynamic_norm.py new file mode 100644 index 0000000000000000000000000000000000000000..eb5dd3b753091cdd7d61af4d40a577e47b289b9f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/dynamic_ops/bricks/dynamic_norm.py @@ -0,0 +1,482 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from functools import partial +from typing import Any, Callable, Dict, List, Optional, Tuple + +import torch +import torch.nn as nn +import torch.nn.functional as F +from mmengine.model.utils import _BatchNormXd +from torch import Tensor +from torch.nn import LayerNorm +from torch.nn.modules._functions import SyncBatchNorm as sync_batch_norm +from torch.nn.modules.batchnorm import _BatchNorm + +from mmrazor.models.mutables.base_mutable import BaseMutable +from mmrazor.registry import MODELS +from ..mixins import DynamicBatchNormMixin, DynamicLayerNormMixin + +PartialType = Callable[[Any, Optional[nn.Parameter]], Tuple] + + +class _DynamicBatchNorm(_BatchNorm, DynamicBatchNormMixin): + """Dynamic BatchNormxd OP. + + Note: + Arguments for ``__init__`` of ``DynamicBatchNormxd`` is totally same as + :obj:`torch.nn.BatchNormxd`. + + Attributes: + mutable_attrs (ModuleDict[str, BaseMutable]): Mutable attributes, + such as `num_features`. 
def forward(self, input: Tensor) -> Tensor:
    """Forward of dynamic BatchNormxd OP.

    Mirrors ``torch.nn.modules.batchnorm._BatchNorm.forward`` but runs
    ``F.batch_norm`` on parameters/buffers sliced by the registered
    mutables (``get_dynamic_params``) and writes the updated running
    statistics back into the activated slots of the full-size buffers.
    """
    self._check_input_dim(input)

    # Momentum of None means cumulative moving average; 0.0 is only a
    # placeholder that gets overwritten below when stats are tracked.
    if self.momentum is None:
        exponential_average_factor = 0.0
    else:
        exponential_average_factor = self.momentum

    if self.training and self.track_running_stats:
        if self.num_batches_tracked is not None:  # type: ignore
            # NOTE(review): rebinding (not in-place ``add_``) replaces
            # the registered buffer tensor each step — confirm this is
            # intentional rather than ``self.num_batches_tracked.add_(1)``.
            self.num_batches_tracked = \
                self.num_batches_tracked + 1  # type: ignore
            if self.momentum is None:  # use cumulative moving average
                exponential_average_factor = 1.0 / float(
                    self.num_batches_tracked)
            else:  # use exponential moving average
                exponential_average_factor = self.momentum

    # Batch stats are used in training mode, and in eval mode only when
    # no running buffers exist.
    if self.training:
        bn_training = True
    else:
        bn_training = (self.running_mean is None) and (self.running_var is
                                                       None)

    # Buffers and affine parameters sliced down to the activated channels.
    running_mean, running_var, weight, bias = self.get_dynamic_params()

    out = F.batch_norm(input, running_mean, running_var, weight, bias,
                       bn_training, exponential_average_factor, self.eps)

    # copy changed running statistics: scatter the sliced (updated) stats
    # back into the positions selected by the num_features mask.
    if self.training and self.track_running_stats:
        out_mask = self._get_num_features_mask()
        self.running_mean.masked_scatter_(out_mask, running_mean)
        self.running_var.masked_scatter_(out_mask, running_var)

    return out
BatchNorm1d OP.""" + + @property + def static_op_factory(self): + """Corresponding Pytorch OP.""" + return nn.BatchNorm1d + + def _check_input_dim(self, input: Tensor) -> None: + """Check if input dimension is valid.""" + if input.dim() != 2 and input.dim() != 3: + raise ValueError('expected 2D or 3D input (got {}D input)'.format( + input.dim())) + + +@MODELS.register_module() +class DynamicBatchNorm2d(_DynamicBatchNorm): + """Dynamic BatchNorm2d OP.""" + + @property + def static_op_factory(self): + """Corresponding Pytorch OP.""" + return nn.BatchNorm2d + + def _check_input_dim(self, input: Tensor) -> None: + """Check if input dimension is valid.""" + if input.dim() != 4: + raise ValueError('expected 4D input (got {}D input)'.format( + input.dim())) + + +@MODELS.register_module() +class DynamicBatchNorm3d(_DynamicBatchNorm): + """Dynamic BatchNorm3d OP.""" + + @property + def static_op_factory(self): + """Corresponding Pytorch OP.""" + return nn.BatchNorm3d + + def _check_input_dim(self, input: Tensor) -> None: + """Check if input dimension is valid.""" + if input.dim() != 5: + raise ValueError('expected 5D input (got {}D input)'.format( + input.dim())) + + +class SwitchableBatchNorm2d(DynamicBatchNorm2d): + """A switchable DynamicBatchNorm2d. It mmploys independent batch + normalization for different switches in a slimmable network. + + To train slimmable networks, ``SwitchableBatchNorm2d`` privatizes all batch + normalization layers for each switch in a slimmable network. Compared with + the naive training approach, it solves the problem of feature aggregation + inconsistency between different switches by independently normalizing the + feature mean and variance during testing. 
+ """ + + def __init__(self, *args, **kwargs) -> None: + super().__init__(*args, **kwargs) + self.candidate_bn = nn.ModuleDict() + + def init_candidates(self, candidates: List): + """Initialize candicates.""" + assert len(self.candidate_bn) == 0 + self._check_candidates(candidates) + for num in candidates: + self.candidate_bn[str(num)] = nn.BatchNorm2d( + num, self.eps, self.momentum, self.affine, + self.track_running_stats) + + def forward(self, input: Tensor) -> Tensor: + """Forward.""" + choice_num = self.activated_channel_num() + if choice_num == self.num_features: + return super().forward(input) + else: + assert str(choice_num) in self.candidate_bn + return self.candidate_bn[str(choice_num)](input) + + def to_static_op(self: _BatchNorm) -> nn.Module: + """Convert to a normal BatchNorm.""" + choice_num = self.activated_channel_num() + if choice_num == self.num_features: + return super().to_static_op() + else: + assert str(choice_num) in self.candidate_bn + return self.candidate_bn[str(choice_num)] + + # private methods + + def activated_channel_num(self): + """The number of activated channels.""" + mask = self._get_num_features_mask() + choice_num = (mask == 1).sum().item() + return choice_num + + def _check_candidates(self, candidates: List): + """Check if candidates aviliable.""" + for value in candidates: + assert isinstance(value, int) + assert 0 < value <= self.num_features + + @property + def static_op_factory(self): + """Return initializer of static op.""" + return nn.BatchNorm2d + + +@MODELS.register_module() +class DynamicLayerNorm(LayerNorm, DynamicLayerNormMixin): + """Applies Layer Normalization over a mini-batch of inputs according to the + `mutable_num_channels` dynamically. + + Note: + Arguments for ``__init__`` of ``DynamicLayerNorm`` is totally same as + :obj:`torch.nn.LayerNorm`. + Attributes: + mutable_attrs (ModuleDict[str, BaseMutable]): Mutable attributes, + such as `num_features`. The key of the dict must in + ``accepted_mutable_attrs``. 
+ """ + accepted_mutable_attrs = {'num_features'} + + def __init__(self, *args, **kwargs): + super(DynamicLayerNorm, self).__init__(*args, **kwargs) + + self.mutable_attrs: Dict[str, Optional[BaseMutable]] = nn.ModuleDict() + + @property + def static_op_factory(self): + """Corresponding Pytorch OP.""" + return LayerNorm + + @classmethod + def convert_from(cls, module: LayerNorm): + """Convert a _BatchNorm module to a DynamicBatchNorm. + + Args: + module (:obj:`torch.nn._BatchNorm`): The original BatchNorm module. + """ + dynamic_ln = cls( + normalized_shape=module.normalized_shape, + eps=module.eps, + elementwise_affine=module.elementwise_affine) + + return dynamic_ln + + def forward(self, input: Tensor) -> Tensor: + """Slice the parameters according to `mutable_num_channels`, and + forward.""" + self._check_input_dim(input) + + weight, bias = self.get_dynamic_params() + self.normalized_shape = ( + self.mutable_num_features.activated_channels, ) + + return F.layer_norm(input, self.normalized_shape, weight, bias, + self.eps) + + def _check_input_dim(self, input: Tensor) -> None: + """Check if input dimension is valid.""" + if input.dim() != 3: + raise ValueError('expected 3D input (got {}D input)'.format( + input.dim())) + + +class DynamicSyncBatchNorm(nn.SyncBatchNorm, DynamicBatchNormMixin): + """DynamicOp for sync bn.""" + + def __init__(self, + num_features: int, + eps: float = 0.00001, + momentum: float = 0.1, + affine: bool = True, + track_running_stats: bool = True, + process_group: Optional[Any] = None) -> None: + super().__init__(num_features, eps, momentum, affine, + track_running_stats, process_group) + self.mutable_attrs: Dict[str, Optional[BaseMutable]] = nn.ModuleDict() + + @classmethod + def convert_from(cls, module): + return cls(module.num_features, module.eps, module.momentum, + module.affine, module.track_running_stats, + module.process_group) + + @property + def static_op_factory(self): + return nn.SyncBatchNorm + + def forward(self, input: 
Tensor) -> Tensor: + # currently only GPU input is supported + if not input.is_cuda: + raise ValueError( + 'SyncBatchNorm expected input tensor to be on GPU') + + self._check_input_dim(input) + if hasattr(self, '_check_non_zero_input_channels'): + self._check_non_zero_input_channels(input) + + # exponential_average_factor is set to self.momentum + # (when it is available) only so that it gets updated + # in ONNX graph when this node is exported to ONNX. + if self.momentum is None: + exponential_average_factor = 0.0 + else: + exponential_average_factor = self.momentum + + if self.training and self.track_running_stats: + assert self.num_batches_tracked is not None + self.num_batches_tracked.add_(1) + if self.momentum is None: # use cumulative moving average + exponential_average_factor = (1.0 / + self.num_batches_tracked.item()) + else: # use exponential moving average + exponential_average_factor = self.momentum + r""" + Decide whether the mini-batch stats should be used for normalization + rather than the buffers. + Mini-batch stats are used in training mode, and in eval mode when + buffers are None. + """ + if self.training: + bn_training = True + else: + bn_training = (self.running_mean is None) and (self.running_var is + None) + r""" + Buffers are only updated if they are to be tracked and we are in + training mode. Thus they only need to be + passed when the update should occur (i.e. in training mode when + they are tracked), or when buffer stats are + used for normalization (i.e. in eval mode when buffers are not None). + """ + # If buffers are not to be tracked, ensure that they won't be updated + running_mean = ( + self.running_mean + if not self.training or self.track_running_stats else None) + running_var = ( + self.running_var + if not self.training or self.track_running_stats else None) + + # Don't sync batchnorm stats in inference mode (model.eval()). 
class DynamicBatchNormXd(_DynamicBatchNorm):
    """Dimension-agnostic dynamic BatchNorm.

    Same dynamic behavior as ``_DynamicBatchNorm`` but with the input
    dimension check disabled, so inputs of any rank are accepted.
    """

    @property
    def static_op_factory(self):
        """Corresponding Pytorch OP (mmengine's ``_BatchNormXd``)."""
        return _BatchNormXd

    def _check_input_dim(self, input: torch.Tensor):
        """Accept inputs of any dimensionality."""
        return
def forward_arch_param(self, input: Tensor, arch_param,
                       arch_attr) -> Tensor:
    """Forward of arch parameters.

    Scales the trailing channel groups of ``input`` by soft
    keep-probabilities derived from ``arch_param`` — presumably the
    DMCP-style differentiable channel gating (class is DMCPBatchNorm2d);
    confirm against the DMCP paper/trainer.

    Args:
        input (Tensor): Feature map of shape (N, C, ...).
        arch_param: Learnable architecture parameter, one entry per
            channel group.
        arch_attr: Tuple ``(group_size, num_groups, min_ch)`` — the
            first ``min_ch`` channels are always kept unscaled.

    Returns:
        Tensor: Same shape as ``input``, with channel groups beyond
        ``min_ch`` attenuated by the cumulative probabilities.
    """
    size_x = input.size()
    (group_size, num_groups, min_ch) = arch_attr

    # Nothing to gate: no groups, or the whole tensor is the
    # always-kept minimum channel set.
    if num_groups == 0 or size_x[1] == min_ch:
        return input

    # clamp(min=0) then exp(-x) maps arch_param into (0, 1]; cumprod
    # makes each successive group's keep-probability monotonically
    # non-increasing.
    arch = torch.clamp(arch_param, min=0)
    prob_distribute = torch.exp(-arch)

    prob = torch.cumprod(prob_distribute, dim=0).view(num_groups, 1)
    # Move channels to dim 0 so groups are contiguous rows.
    tp_x = input.transpose(0, 1).contiguous()
    tp_group_x = tp_x[min_ch:]

    size_tp_group = tp_group_x.size()
    # num_groups recomputed from the actual gated channel count; may
    # differ from the value in arch_attr.
    num_groups = size_tp_group[0] // group_size
    tp_group_x = tp_group_x.view(num_groups, -1) * prob[:num_groups]
    tp_group_x = tp_group_x.view(size_tp_group)

    # Reassemble: untouched first min_ch channels + scaled groups.
    out = torch.cat([tp_x[:min_ch], tp_group_x]).transpose(0,
                                                           1).contiguous()
    return out
\ + partial(self.forward, arch_param=arch_param, arch_attr=arch_attr) + setattr(self, 'forward', forward_with_default_args) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/dynamic_ops/bricks/dynamic_relative_position.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/dynamic_ops/bricks/dynamic_relative_position.py new file mode 100644 index 0000000000000000000000000000000000000000..572880a436ae5d568e174d88e3e7f562ac6b901d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/dynamic_ops/bricks/dynamic_relative_position.py @@ -0,0 +1,154 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import logging +from typing import Dict, Set + +import torch +from mmengine import print_log +from torch import Tensor, nn + +from mmrazor.models.architectures.ops import RelativePosition2D +from mmrazor.models.mutables.base_mutable import BaseMutable +from ..mixins import DynamicChannelMixin + + +class DynamicRelativePosition2D(RelativePosition2D, DynamicChannelMixin): + """Searchable RelativePosition module. + + Note: + Arguments for ``__init__`` of ``DynamicRelativePosition2D`` is totally + same as :obj:`mmrazor.models.architectures.RelativePosition2D`. + Attributes: + mutable_attrs (ModuleDict[str, BaseMutable]): Mutable attributes, + such as `head_dims`. The key of the dict must in + ``accepted_mutable_attrs``. 
+ """ + + mutable_attrs: nn.ModuleDict + head_dims: int + max_relative_position: int + embeddings_table_v: nn.Parameter + embeddings_table_h: nn.Parameter + accepted_mutable_attrs: Set[str] = {'head_dims'} + attr_mappings: Dict[str, str] = { + 'in_channels': 'head_dims', + 'out_channels': 'head_dims', + } + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + + self.mutable_attrs: Dict[str, BaseMutable] = nn.ModuleDict() + + @property + def mutable_head_dims(self): + """Mutable head dimension.""" + assert hasattr(self, 'mutable_attrs') + return self.mutable_attrs['head_dims'] + + def register_mutable_attr(self, attr: str, mutable: BaseMutable): + """Register attribute of mutable.""" + self.check_mutable_attr_valid(attr) + if attr in self.attr_mappings: + attr_map = self.attr_mappings[attr] + assert attr_map in self.accepted_mutable_attrs + if attr_map in self.mutable_attrs: + print_log( + f'{attr_map}({attr}) is already in `mutable_attrs`', + level=logging.WARNING) + else: + self._register_mutable_attr(attr_map, mutable) + elif attr in self.accepted_mutable_attrs: + self._register_mutable_attr(attr, mutable) + else: + raise NotImplementedError + + def _register_mutable_attr(self, attr, mutable): + """Register `head_dims`""" + if attr == 'head_dims': + self._registry_mutable_head_dims(mutable) + else: + raise NotImplementedError + + def _registry_mutable_head_dims(self, + mutable_head_dims: BaseMutable) -> None: + """Register head dimension.""" + assert hasattr(self, 'mutable_attrs') + self.mutable_attrs['head_dims'] = mutable_head_dims + + def to_static_op(self) -> nn.Module: + """Convert dynamic RelativePosition2D to static One.""" + self.check_if_mutables_fixed() + assert self.mutable_head_dims is not None + + self.current_head_dim = self.mutable_head_dims.activated_channels + static_relative_position = self.static_op_factory( + self.current_head_dim) + static_relative_position.embeddings_table_v = \ + nn.Parameter( + 
def forward(self, length_q, length_k) -> Tensor:
    """Forward of Dynamic Relative Position.

    Builds relative-position embeddings for a (cls token + grid) token
    layout: tokens 1..N-1 are assumed to form a square grid of side
    sqrt(length - 1) — TODO confirm for non-square inputs.

    Args:
        length_q: Number of query tokens (including the cls token).
        length_k: Number of key tokens (including the cls token).

    Returns:
        Tensor: Embeddings of shape (length_q, length_k, head_dim),
        where head_dim is the activated slice of the tables.
    """
    if self.mutable_head_dims is None:
        self.current_head_dim = self.head_dims
    else:
        self.current_head_dim = self.mutable_head_dims.activated_channels

    # Column-slice the embedding tables to the active head dimension.
    self.sample_eb_table_h = self.embeddings_table_h[:, :self.
                                                     current_head_dim]
    self.sample_eb_table_v = self.embeddings_table_v[:, :self.
                                                     current_head_dim]

    # remove the first cls token distance computation
    length_q = length_q - 1
    length_k = length_k - 1
    range_vec_q = torch.arange(length_q)
    range_vec_k = torch.arange(length_k)
    # compute the row and column distance on the assumed square grid
    distance_mat_v = (
        range_vec_k[None, :] // int(length_q**0.5) -
        range_vec_q[:, None] // int(length_q**0.5))
    distance_mat_h = (
        range_vec_k[None, :] % int(length_q**0.5) -
        range_vec_q[:, None] % int(length_q**0.5))
    distance_mat_clipped_v = torch.clamp(distance_mat_v,
                                         -self.max_relative_position,
                                         self.max_relative_position)
    distance_mat_clipped_h = torch.clamp(distance_mat_h,
                                         -self.max_relative_position,
                                         self.max_relative_position)

    # Shift clipped distances into non-negative table indices.
    final_mat_v = distance_mat_clipped_v + self.max_relative_position + 1
    final_mat_h = distance_mat_clipped_h + self.max_relative_position + 1
    # pad the 0 which represent the cls token
    final_mat_v = torch.nn.functional.pad(final_mat_v, (1, 0, 1, 0),
                                          'constant', 0)
    final_mat_h = torch.nn.functional.pad(final_mat_h, (1, 0, 1, 0),
                                          'constant', 0)

    # Fix: the matrices are already tensors, so use `.long()` instead of
    # the legacy `torch.LongTensor(tensor)` constructor (same values,
    # but the legacy form is deprecated and CPU-only).
    final_mat_v = final_mat_v.long()
    final_mat_h = final_mat_h.long()
    # get the embeddings with the corresponding distance

    embeddings = self.sample_eb_table_v[final_mat_v] + \
        self.sample_eb_table_h[final_mat_h]

    return embeddings
+from .dynamic_linear_head import DynamicLinearClsHead # noqa: F401 + +__all__ = ['DynamicLinearClsHead'] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/dynamic_ops/head/dynamic_linear_head.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/dynamic_ops/head/dynamic_linear_head.py new file mode 100644 index 0000000000000000000000000000000000000000..6a6a21d4c758d1dc23b8b6740d5637ee875d1691 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/dynamic_ops/head/dynamic_linear_head.py @@ -0,0 +1,85 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from abc import abstractmethod +from typing import Optional, Tuple + +import torch + +try: + from mmcls.models import ClsHead +except ImportError: + from mmrazor.utils import get_placeholder + ClsHead = get_placeholder('mmcls') + +from mmrazor.models.mutables.base_mutable import BaseMutable +from mmrazor.models.mutables.mutable_channel import MutableChannelContainer +from mmrazor.models.mutables.mutable_channel.units import \ + OneShotMutableChannelUnit +from mmrazor.registry import MODELS +from ..bricks.dynamic_linear import DynamicLinear + + +class DynamicHead: + + @abstractmethod + def connect_with_backbone(self, + backbone_output_mutable: BaseMutable) -> None: + """Connect with Dynamic Backbone.""" + ... + + +@MODELS.register_module() +class DynamicLinearClsHead(ClsHead, DynamicHead): + """Dynamic Linear classification head for Autoformer. + + Args: + num_classes (int): Number of classes. + in_channels (int): Number of input channels. + init_cfg (Optional[dict], optional): Init config. + Defaults to dict(type='Normal', + layer='DynamicLinear', std=0.01). 
+ """ + + def __init__(self, + num_classes: int = 1000, + in_channels: int = 624, + init_cfg: Optional[dict] = dict( + type='Normal', layer='DynamicLinear', std=0.01), + **kwargs): + super().__init__(init_cfg=init_cfg, **kwargs) + + self.in_channels = in_channels + self.num_classes = num_classes + + if self.num_classes <= 0: + raise ValueError( + f'num_classes={num_classes} must be a positive integer') + + self.fc = DynamicLinear(self.in_channels, self.num_classes) + + def pre_logits(self, feats: Tuple[torch.Tensor]) -> torch.Tensor: + """The process before the final classification head. + + The input ``feats`` is a tuple of tensor, and each tensor is the + feature of a backbone stage. In ``LinearClsHead``, we just obtain the + feature of the last stage. + """ + # The LinearClsHead doesn't have other module, just return after + # unpacking. + return feats[-1] + + def forward(self, feats: Tuple[torch.Tensor]) -> torch.Tensor: + """The forward process.""" + pre_logits = self.pre_logits(feats) + # The final classification head. + cls_score = self.fc(pre_logits) + return cls_score + + def connect_with_backbone(self, + backbone_output_mutable: BaseMutable) -> None: + """Connect dynamic backbone.""" + + OneShotMutableChannelUnit._register_channel_container( + self, MutableChannelContainer) + + MutableChannelContainer.register_mutable_channel_to_module( + self.fc, backbone_output_mutable, False) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/dynamic_ops/mixins/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/dynamic_ops/mixins/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e97f7ad7810e3ab26221dcb2de34992f345da76a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/dynamic_ops/mixins/__init__.py @@ -0,0 +1,14 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from .dynamic_conv_mixins import DynamicConvMixin +from .dynamic_layernorm_mixins import DynamicLayerNormMixin +from .dynamic_mixins import (DynamicBatchNormMixin, DynamicChannelMixin, + DynamicLinearMixin, DynamicMixin) + +__all__ = [ + 'DynamicChannelMixin', + 'DynamicBatchNormMixin', + 'DynamicLinearMixin', + 'DynamicMixin', + 'DynamicConvMixin', + 'DynamicLayerNormMixin', +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/dynamic_ops/mixins/dynamic_conv_mixins.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/dynamic_ops/mixins/dynamic_conv_mixins.py new file mode 100644 index 0000000000000000000000000000000000000000..cf63ec11c6632b2ba92745ba21b8da834854a4ed --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/dynamic_ops/mixins/dynamic_conv_mixins.py @@ -0,0 +1,572 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from abc import abstractmethod +from functools import partial +from itertools import repeat +from typing import Any, Callable, Iterable, Optional, Tuple, Union + +import torch +import torch.nn.functional as F +from torch import Tensor, nn +from torch.nn.modules.conv import _ConvNd + +from mmrazor.models.mutables.base_mutable import BaseMutable +from .dynamic_mixins import DynamicChannelMixin + +PartialType = Callable[[Any, Optional[nn.Parameter]], Any] + + +def _ntuple(n: int) -> Callable: # pragma: no cover + """Repeat a number n times.""" + + def parse(x): + if isinstance(x, Iterable): + return tuple(x) + return tuple(repeat(x, n)) + + return parse + + +def _get_current_kernel_pos(source_kernel_size: int, + target_kernel_size: int) -> Tuple[int, int]: + """Get position of current kernel size. 
+ + Returns: + Tuple[int, int]: (upper left position, bottom right position) + """ + assert source_kernel_size >= target_kernel_size, \ + '`source_kernel_size` must greater or equal than `target_kernel_size`' + + center = source_kernel_size >> 1 + current_offset = target_kernel_size >> 1 + + start_offset = center - current_offset + end_offset = center + current_offset + 1 + + return start_offset, end_offset + + +def _get_same_padding(kernel_size: int, n_dims: int) -> Tuple[int]: + """Get same padding according to kernel size.""" + assert kernel_size & 1 + _pair = _ntuple(n_dims) + + return _pair(kernel_size >> 1) + + +class DynamicConvMixin(DynamicChannelMixin): + """A mixin class for Pytorch conv, which can mutate ``in_channels`` and + ``out_channels``. + + Note: + All subclass should implement ``conv_func``API. + """ + + @property + @abstractmethod + def conv_func(self: _ConvNd): + """The function that will be used in ``forward_mixin``.""" + pass + + def register_mutable_attr(self, attr, mutable): + + if attr == 'in_channels': + self._register_mutable_in_channels(mutable) + elif attr == 'out_channels': + self._register_mutable_out_channels(mutable) + else: + raise NotImplementedError + + def _register_mutable_in_channels( + self: _ConvNd, mutable_in_channels: BaseMutable) -> None: + """Mutate ``in_channels`` with given mutable. + + Args: + mutable_in_channels (BaseMutable): Mutable for controlling + ``in_channels``. + + Raises: + ValueError: Error if size of mask if not same as ``in_channels``. 
+ """ + assert hasattr(self, 'mutable_attrs') + self.check_mutable_channels(mutable_in_channels) + mask_size = mutable_in_channels.current_mask.size(0) + if mask_size != self.in_channels: + raise ValueError( + f'Expect mask size of mutable to be {self.in_channels} as ' + f'`in_channels`, but got: {mask_size}.') + + self.mutable_attrs['in_channels'] = mutable_in_channels + + def _register_mutable_out_channels( + self: _ConvNd, mutable_out_channels: BaseMutable) -> None: + """Mutate ``out_channels`` with given mutable. + + Args: + mutable_out_channels (BaseMutable): Mutable for controlling + ``out_channels``. + + Raises: + ValueError: Error if size of mask if not same as ``out_channels``. + """ + assert hasattr(self, 'mutable_attrs') + self.check_mutable_channels(mutable_out_channels) + mask_size = mutable_out_channels.current_mask.size(0) + if mask_size != self.out_channels: + raise ValueError( + f'Expect mask size of mutable to be {self.out_channels} as ' + f'`out_channels`, but got: {mask_size}.') + + self.mutable_attrs['out_channels'] = mutable_out_channels + + @property + def mutable_in_channels(self: _ConvNd) -> Optional[BaseMutable]: + """Mutable related to input.""" + assert hasattr(self, 'mutable_attrs') + return getattr(self.mutable_attrs, 'in_channels', None) # type:ignore + + @property + def mutable_out_channels(self: _ConvNd) -> Optional[BaseMutable]: + """Mutable related to output.""" + assert hasattr(self, 'mutable_attrs') + return getattr(self.mutable_attrs, 'out_channels', None) # type:ignore + + def get_dynamic_params( + self: _ConvNd) -> Tuple[Tensor, Optional[Tensor], Tuple[int]]: + """Get dynamic parameters that will be used in forward process. + + Returns: + Tuple[Tensor, Optional[Tensor], Tuple[int]]: Sliced weight, bias + and padding. 
+ """ + # slice in/out channel of weight according to + # mutable in_channels/out_channels + weight, bias = self._get_dynamic_params_by_mutable_channels( + self.weight, self.bias) + return weight, bias, self.padding + + def _get_dynamic_params_by_mutable_channels( + self: _ConvNd, weight: Tensor, + bias: Optional[Tensor]) -> Tuple[Tensor, Optional[Tensor]]: + """Get sliced weight and bias according to ``mutable_in_channels`` and + ``mutable_out_channels``. + + Returns: + Tuple[Tensor, Optional[Tensor]]: Sliced weight and bias. + """ + if 'in_channels' not in self.mutable_attrs and \ + 'out_channels' not in self.mutable_attrs: + return weight, bias + + if 'in_channels' in self.mutable_attrs: + mutable_in_channels = self.mutable_attrs['in_channels'] + in_mask = mutable_in_channels.current_mask.to(weight.device) + else: + in_mask = torch.ones(weight.size(1)).bool().to(weight.device) + + if 'out_channels' in self.mutable_attrs: + mutable_out_channels = self.mutable_attrs['out_channels'] + out_mask = mutable_out_channels.current_mask.to(weight.device) + else: + out_mask = torch.ones(weight.size(0)).bool().to(weight.device) + + if self.groups == 1: + weight = weight[out_mask][:, in_mask] + elif self.groups == self.in_channels == self.out_channels: + # depth-wise conv + weight = weight[out_mask] + else: + # group-wise conv + in_mask_ = in_mask.reshape([self.groups, -1]) # G in/G + in_per_group = in_mask_.sum(dim=-1)[0].item() + assert (in_mask_.sum(dim=-1) == in_per_group).all() + out_mask_ = out_mask.reshape([self.groups, -1]) # G out/G + out_per_group = out_mask_.sum(dim=-1)[0].item() + assert (out_mask_.sum(dim=-1) == out_per_group).all() + + mask = out_mask_.unsqueeze(-1) * in_mask_.unsqueeze( + -2) # G out/G in/G + mask = mask.flatten() + weight = weight.flatten(0, 1) + weight = weight[mask] + weight = weight.reshape( + [self.groups * out_per_group, in_per_group, *self.kernel_size]) + + bias = self.bias[out_mask] if self.bias is not None else None + return weight, 
bias + + def forward_mixin(self: _ConvNd, x: Tensor) -> Tensor: + """Forward of dynamic conv2d OP.""" + groups = self.groups + if self.groups == self.in_channels == self.out_channels: + groups = x.size(1) + weight, bias, padding = self.get_dynamic_params() + + return self.conv_func(x, weight, bias, self.stride, padding, + self.dilation, groups) + + def to_static_op(self: _ConvNd) -> nn.Conv2d: + """Convert dynamic conv2d to :obj:`torch.nn.Conv2d`. + + Returns: + torch.nn.Conv2d: :obj:`torch.nn.Conv2d` with sliced parameters. + """ + self.check_if_mutables_fixed() + + weight, bias, padding = self.get_dynamic_params() + groups = self.groups + if groups == self.in_channels == self.out_channels and \ + self.mutable_in_channels is not None: + mutable_in_channels = self.mutable_attrs['in_channels'] + groups = mutable_in_channels.current_mask.sum().item() + out_channels = weight.size(0) + in_channels = weight.size(1) * groups + + kernel_size = tuple(weight.shape[2:]) + + static_conv = self.static_op_factory( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=self.stride, + padding=padding, + padding_mode=self.padding_mode, + dilation=self.dilation, + groups=groups, + bias=True if bias is not None else False) + + static_conv.weight = nn.Parameter(weight) + if bias is not None: + static_conv.bias = nn.Parameter(bias) + + return static_conv + + +class BigNasConvMixin(DynamicConvMixin): + """A mixin class for Pytorch conv, which can mutate ``in_channels``, + ``out_channels`` and ``kernel_size``.""" + + def register_mutable_attr(self, attr, mutable): + + if attr == 'in_channels': + self._register_mutable_in_channels(mutable) + elif attr == 'out_channels': + self._register_mutable_out_channels(mutable) + elif attr == 'kernel_size': + self._register_mutable_kernel_size(mutable) + else: + raise NotImplementedError + + def _register_mutable_kernel_size( + self: _ConvNd, mutable_kernel_size: BaseMutable) -> None: + """Mutate ``kernel_size`` 
with given mutable. + + Args: + mutable_kernel_size (BaseMutable): Mutable for controlling + ``kernel_size``. + + Note: + ``kernel_size_seq`` must be provided if ``mutable_kernel_size`` + does not have ``choices`` attribute. + + Raises: + ValueError: Error if max choice of ``kernel_size_list`` + not same as ``kernel_size``. + """ + + kernel_size_seq = getattr(mutable_kernel_size, 'choices', None) + if kernel_size_seq is None or len(kernel_size_seq) == 0: + raise ValueError('kernel size sequence must be provided') + kernel_size_list = list(sorted(kernel_size_seq)) + + _pair = _ntuple(len(self.weight.shape) - 2) + max_kernel_size = _pair(kernel_size_list[-1]) + if max_kernel_size != self.kernel_size: + raise ValueError( + f'Expect max kernel size to be: {self.kernel_size}, ' + f'but got: {max_kernel_size}') + + self.kernel_size_list = kernel_size_list + self.mutable_attrs['kernel_size'] = mutable_kernel_size + + def get_dynamic_params( + self: _ConvNd) -> Tuple[Tensor, Optional[Tensor], Tuple[int]]: + """Get dynamic parameters that will be used in forward process. + + Returns: + Tuple[Tensor, Optional[Tensor], Tuple[int]]: Sliced weight, bias + and padding. + """ + # 1. slice kernel size of weight according to kernel size mutable + weight, padding = self._get_dynamic_params_by_mutable_kernel_size( + self.weight) + + # 2. slice in/out channel of weight according to mutable in_channels + # and mutable out channels. 
+ weight, bias = self._get_dynamic_params_by_mutable_channels( + weight, self.bias) + return weight, bias, padding + + def _get_dynamic_params_by_mutable_kernel_size( + self: _ConvNd, weight: Tensor) -> Tuple[Tensor, Tuple]: + """Get sliced weight and bias according to ``mutable_in_channels`` and + ``mutable_out_channels``.""" + + if 'kernel_size' not in self.mutable_attrs \ + or self.kernel_size_list is None: + return weight, self.padding + + mutable_kernel_size = self.mutable_attrs['kernel_size'] + current_kernel_size = self.get_current_choice(mutable_kernel_size) + + n_dims = len(self.weight.shape) - 2 + current_padding: Union[Tuple[int], Tuple[int, int]] = \ + _get_same_padding(current_kernel_size, n_dims) + + _pair = _ntuple(len(self.weight.shape) - 2) + if _pair(current_kernel_size) == self.kernel_size: + return weight, current_padding + + start_offset, end_offset = _get_current_kernel_pos( + source_kernel_size=self.kernel_size[0], + target_kernel_size=current_kernel_size) + current_weight = \ + weight[:, :, start_offset:end_offset, start_offset:end_offset] + + return current_weight, current_padding + + +class OFAConvMixin(BigNasConvMixin): + """A mixin class for Pytorch conv, which can mutate ``in_channels``, + ``out_channels`` and ``kernel_size``.""" + + def _register_mutable_kernel_size( + self: _ConvNd, mutable_kernel_size: BaseMutable) -> None: + """Mutate ``kernel_size`` with given mutable and register + transformation matrix.""" + super()._register_mutable_kernel_size(mutable_kernel_size) + self._register_trans_matrix() + + def _register_trans_matrix(self: _ConvNd) -> None: + """Register transformation matrix that used in progressive + shrinking.""" + assert self.kernel_size_list is not None + + trans_matrix_names = [] + for i in range(len(self.kernel_size_list) - 1, 0, -1): + source_kernel_size = self.kernel_size_list[i] + target_kernel_size = self.kernel_size_list[i - 1] + trans_matrix_name = self._get_trans_matrix_name( + src=source_kernel_size, 
tar=target_kernel_size) + trans_matrix_names.append(trans_matrix_name) + # TODO support conv1d & conv3d + trans_matrix = nn.Parameter(torch.eye(target_kernel_size**2)) + self.register_parameter(name=trans_matrix_name, param=trans_matrix) + self._trans_matrix_names = trans_matrix_names + + @staticmethod + def _get_trans_matrix_name(src: int, tar: int) -> str: + """Get name of trans matrix.""" + return f'trans_matrix_{src}to{tar}' + + def _get_dynamic_params_by_mutable_kernel_size( + self: _ConvNd, weight: Tensor) -> Tuple[Tensor, Tuple]: + """Get sliced weight and bias according to ``mutable_in_channels`` and + ``mutable_out_channels``.""" + + if 'kernel_size' not in self.mutable_attrs: + return weight, self.padding + + mutable_kernel_size = self.mutable_attrs['kernel_size'] + current_kernel_size = self.get_current_choice(mutable_kernel_size) + + n_dims = len(self.weight.shape) - 2 + current_padding: Union[Tuple[int], Tuple[int, int]] = \ + _get_same_padding(current_kernel_size, n_dims) + + _pair = _ntuple(len(self.weight.shape) - 2) + if _pair(current_kernel_size) == self.kernel_size: + return weight, current_padding + + current_weight = weight[:, :, :, :] + for i in range(len(self.kernel_size_list) - 1, 0, -1): + source_kernel_size = self.kernel_size_list[i] + if source_kernel_size <= current_kernel_size: + break + target_kernel_size = self.kernel_size_list[i - 1] + trans_matrix = getattr( + self, + self._get_trans_matrix_name( + src=source_kernel_size, tar=target_kernel_size)) + + start_offset, end_offset = _get_current_kernel_pos( + source_kernel_size=source_kernel_size, + target_kernel_size=target_kernel_size) + target_weight = current_weight[:, :, start_offset:end_offset, + start_offset:end_offset] + target_weight = target_weight.reshape(-1, target_kernel_size**2) + target_weight = F.linear(target_weight, trans_matrix) + target_weight = target_weight.reshape( + weight.size(0), weight.size(1), target_kernel_size, + target_kernel_size) + + current_weight = 
target_weight + + return current_weight, current_padding + + +class FuseConvMixin(DynamicConvMixin): + """A mixin class for fuse conv, which can mutate ``in_channels``, + ``out_channels`` .""" + + def set_forward_args(self, choice: Tensor) -> None: + """Interface for modifying the arch_param using partial.""" + param_channel_with_default_args: PartialType = \ + partial( + self._get_dynamic_params_by_mutable_channels_choice, + choice=choice) + setattr(self, '_get_dynamic_params_by_mutable_channels', + param_channel_with_default_args) + + def get_dynamic_params( + self: _ConvNd) -> Tuple[Tensor, Optional[Tensor], Tuple[int]]: + """Get dynamic parameters that will be used in forward process. + + Returns: + Tuple[Tensor, Optional[Tensor], Tuple[int]]: Sliced weight, bias + and padding. + """ + # slice in/out channel of weight according to mutable in_channels + # and mutable out channels. + weight, bias = self._get_dynamic_params_by_mutable_channels( + self.weight, self.bias) + return weight, bias, self.padding + + def _get_dynamic_params_by_mutable_channels_choice( + self: _ConvNd, weight: Tensor, bias: Optional[Tensor], + choice: Tensor) -> Tuple[Tensor, Optional[Tensor]]: + """Get sliced weight and bias according to ``mutable_in_channels`` and + ``mutable_out_channels``. + + Returns: + Tuple[Tensor, Optional[Tensor]]: Sliced weight and bias. 
+ """ + + mutable_in_channels = 0 + mutable_out_channels = 0 + + if 'in_channels' in self.mutable_attrs: + mutable_in_channels = self.mutable_attrs[ + 'in_channels'].current_mask.sum().item() + + if 'out_channels' in self.mutable_attrs: + mutable_out_channels = self.mutable_attrs[ + 'out_channels'].current_mask.sum().item() + + if mutable_in_channels == 0: + mutable_in_channels = self.in_channels + if mutable_out_channels == 0: + mutable_out_channels = self.out_channels + + # if channel not in mutable_attrs or unchanged + if mutable_in_channels == self.in_channels and \ + mutable_out_channels == self.out_channels: + return weight, bias + + weight = self.weight[:, 0:mutable_in_channels, :, :] + if self.groups == 1: + cout, cin, k, _ = weight.shape + fused_weight = torch.mm(choice, + weight.reshape(cout, + -1)).reshape(-1, cin, k, k) + elif self.groups == self.in_channels == self.out_channels: + # depth-wise conv + cout, cin, k, _ = weight.shape + fused_weight = torch.mm(choice, + weight.reshape(cout, + -1)).reshape(-1, cin, k, k) + else: + raise NotImplementedError( + 'Current `ChannelMutator` only support pruning the depth-wise ' + '`nn.Conv2d` or `nn.Conv2d` module whose group number equals ' + f'to one, but got {self.groups}.') + if (self.bias is not None): + fused_bias = torch.mm(choice, self.bias.unsqueeze(1)).squeeze(1) + else: + fused_bias = self.bias + return fused_weight, fused_bias + + def to_static_op(self: _ConvNd) -> nn.Conv2d: + """Convert dynamic conv2d to :obj:`torch.nn.Conv2d`. + + Returns: + torch.nn.Conv2d: :obj:`torch.nn.Conv2d` with sliced parameters. 
+ """ + self.check_if_mutables_fixed() + + weight, bias, padding = self.get_dynamic_params() + groups = self.groups + if groups == self.in_channels == self.out_channels and \ + self.mutable_in_channels is not None: + mutable_in_channels = self.mutable_attrs['in_channels'] + groups = mutable_in_channels.current_mask.sum().item() + out_channels = weight.size(0) + in_channels = weight.size(1) * groups + + kernel_size = tuple(weight.shape[2:]) + + static_conv = self.static_op_factory( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=self.stride, + padding=padding, + padding_mode=self.padding_mode, + dilation=self.dilation, + groups=groups, + bias=True if bias is not None else False) + + static_conv.weight = nn.Parameter(weight) + if bias is not None: + static_conv.bias = nn.Parameter(bias) + + return static_conv + + def get_pooled_channel(self: _ConvNd, tau: float) -> Tensor: + """Calculate channel's kl and apply softmax pooling on channel. Return + `layeri_softmaxp` as pooling result. + + Args: + tau (float): Temperature by epoch/iter. + + Returns: + Tensor: softmax pooled channel. + """ + param = self.weight + + # Compute layeri_param. 
+ layeri_param = torch.reshape(param.detach(), (param.shape[0], -1)) + layeri_Eudist = torch.cdist(layeri_param, layeri_param, p=2) + layeri_negaEudist = -layeri_Eudist + softmax = nn.Softmax(dim=1) + layeri_softmaxp = softmax(layeri_negaEudist / tau) + + # KL = [c, 1, c] * ([c, 1 ,c] / [c, c, 1]).log() + # = [c, 1, c] * ([c, 1, c].log() - [c, c, 1].log()) + # only dim0 is required, dim1 and dim2 are pooled + # calc mean(dim=1) first + + # avoid frequent NaN + eps = 1e-7 + layeri_kl = layeri_softmaxp[:, None, :] + log_p = layeri_kl * (layeri_kl + eps).log() + log_q = layeri_kl * torch.mean((layeri_softmaxp + eps).log(), dim=1) + + layeri_kl = torch.mean((log_p - log_q), dim=2) + del log_p, log_q + real_out = self.mutable_attrs['out_channels'].activated_channels + + layeri_iscore_kl = torch.sum(layeri_kl, dim=1) + _, topm_ids_order = torch.topk( + layeri_iscore_kl, int(real_out), sorted=False) + del param, layeri_param, layeri_negaEudist, layeri_kl + return layeri_softmaxp[topm_ids_order, :] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/dynamic_ops/mixins/dynamic_layernorm_mixins.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/dynamic_ops/mixins/dynamic_layernorm_mixins.py new file mode 100644 index 0000000000000000000000000000000000000000..785be9935b03a25b24cf8f9d0eb1bc12f2242730 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/dynamic_ops/mixins/dynamic_layernorm_mixins.py @@ -0,0 +1,147 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import logging +from typing import Dict, Optional, Set, Tuple + +import torch +from mmengine import print_log +from torch import Tensor, nn +from torch.nn import LayerNorm + +from mmrazor.models.mutables.base_mutable import BaseMutable +from .dynamic_mixins import DynamicChannelMixin + + +class DynamicLayerNormMixin(DynamicChannelMixin): + """A mixin class for Pytorch LayerNorm, which can mutate + ``num_features``.""" + accepted_mutable_attrs: Set[str] = {'num_features'} + + attr_mappings: Dict[str, str] = { + 'in_channels': 'num_features', + 'out_channels': 'num_features', + } + + @property + def num_features(self): + return getattr(self, 'normalized_shape')[0] + + @property + def mutable_num_features(self): + """Mutable number of features.""" + assert hasattr(self, 'mutable_attrs') + return self.mutable_attrs['num_features'] + + def register_mutable_attr(self, attr, mutable): + """Register attribute of mutable.""" + self.check_mutable_attr_valid(attr) + if attr in self.attr_mappings: + attr_map = self.attr_mappings[attr] + assert attr_map in self.accepted_mutable_attrs + if attr_map in self.mutable_attrs: + print_log( + f'{attr_map}({attr}) is already in `mutable_attrs`', + level=logging.WARNING) + else: + self._register_mutable_attr(attr_map, mutable) + elif attr in self.accepted_mutable_attrs: + self._register_mutable_attr(attr, mutable) + else: + raise NotImplementedError + + def _register_mutable_attr(self, attr, mutable): + """Register `num_features`.""" + if attr == 'num_features': + self._register_mutable_num_features(mutable) + else: + raise NotImplementedError + + def _register_mutable_num_features( + self: LayerNorm, mutable_num_features: BaseMutable) -> None: + """Mutate ``num_features`` with given mutable. + + Args: + mutable_num_features (BaseMutable): Mutable for controlling + ``num_features``. + Raises: + RuntimeError: Error if both ``affine`` and + ``tracking_running_stats`` are False. 
+ ValueError: Error if size of mask if not same as ``num_features``. + """ + if not self.elementwise_affine: + raise RuntimeError( + 'num_features can not be mutated if both `affine` and ' + '`tracking_running_stats` are False') + + self.check_mutable_channels(mutable_num_features) + mask_size = mutable_num_features.current_mask.size(0) + + # normalized_shape is a tuple + if mask_size != self.normalized_shape[0]: + raise ValueError( + f'Expect mask size of mutable to be {self.normalized_shape}' + f' as `normalized_shape`, but got: {mask_size}.') + + self.mutable_attrs['num_features'] = mutable_num_features + + def _get_num_features_mask(self: LayerNorm) -> Optional[torch.Tensor]: + """Get mask of ``num_features``.""" + if self.elementwise_affine: + refer_tensor = self.weight + else: + return None + + if 'num_features' in self.mutable_attrs: + out_mask = self.mutable_num_features.current_mask.to( + refer_tensor.device) + else: + out_mask = torch.ones_like(refer_tensor).bool() + + return out_mask + + def get_dynamic_params( + self: LayerNorm) -> Tuple[Optional[Tensor], Optional[Tensor]]: + """Get dynamic parameters that will be used in forward process. + + Returns: + Tuple[Optional[Tensor], Optional[Tensor], Optional[Tensor], + Optional[Tensor]]: Sliced running_mean, running_var, weight and + bias. + """ + out_mask = self._get_num_features_mask() + + if self.elementwise_affine: + weight = self.weight[out_mask] + bias = self.bias[out_mask] + else: + weight, bias = self.weight, self.bias + + return weight, bias + + def to_static_op(self: LayerNorm) -> nn.Module: + """Convert dynamic LayerNormxd to :obj:`torch.nn.LayerNormxd`. + + Returns: + torch.nn.LayerNormxd: :obj:`torch.nn.LayerNormxd` with sliced + parameters. 
+ """ + self.check_if_mutables_fixed() + + weight, bias = self.get_dynamic_params() + + if 'num_features' in self.mutable_attrs: + num_features = self.mutable_attrs['num_features'].current_mask.sum( + ).item() + else: + num_features = self.num_features + + static_ln = self.static_op_factory( + normalized_shape=num_features, + eps=self.eps, + elementwise_affine=self.elementwise_affine) + + if weight is not None: + static_ln.weight = nn.Parameter(weight.clone()) + if bias is not None: + static_ln.bias = nn.Parameter(bias.clone()) + + return static_ln diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/dynamic_ops/mixins/dynamic_mixins.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/dynamic_ops/mixins/dynamic_mixins.py new file mode 100644 index 0000000000000000000000000000000000000000..2a610e5e8c53738e45c341de502b94d6d2083168 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/dynamic_ops/mixins/dynamic_mixins.py @@ -0,0 +1,455 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import logging +from abc import ABC, abstractmethod +from typing import Any, Dict, Optional, Set, Tuple + +import torch +from mmengine import print_log +from torch import Tensor, nn +from torch.nn.modules.batchnorm import _BatchNorm + +from mmrazor.models.mutables.base_mutable import BaseMutable + + +class DynamicMixin(ABC): + """Base class for dynamic OP. A dynamic OP usually consists of a normal + static OP and mutables, where mutables are used to control the searchable + (mutable) part of the dynamic OP. + + Note: + When the dynamic OP has just been initialized, its forward propagation + logic should be the same as the corresponding static OP. Only after + the searchable part accepts the specific mutable through the + corresponding interface does the part really become dynamic. + + Note: + All subclass should implement ``to_static_op`` and + ``static_op_factory`` APIs. 
+ + Args: + accepted_mutables (set): The string set of all accepted mutables. + """ + accepted_mutable_attrs: Set[str] = set() + attr_mappings: Dict[str, str] = dict() + + @abstractmethod + def register_mutable_attr(self, attr: str, mutable: BaseMutable): + pass + + def get_mutable_attr(self, attr: str) -> BaseMutable: + + self.check_mutable_attr_valid(attr) + if attr in self.attr_mappings: + attr_map = self.attr_mappings[attr] + return getattr(self.mutable_attrs, attr_map, None) # type:ignore + else: + return getattr(self.mutable_attrs, attr, None) # type:ignore + + @classmethod + @abstractmethod + def convert_from(cls, module): + """Convert an instance of Pytorch module to a new instance of Dynamic + module.""" + + @property + @abstractmethod + def static_op_factory(self): + """Corresponding Pytorch OP.""" + + @abstractmethod + def to_static_op(self) -> nn.Module: + """Convert dynamic OP to static OP. + + Note: + The forward result for the same input between dynamic OP and its + corresponding static OP must be same. + + Returns: + nn.Module: Corresponding static OP. + """ + + def check_if_mutables_fixed(self) -> None: + """Check if all mutables are fixed. + + Raises: + RuntimeError: Error if a existing mutable is not fixed. + """ + from mmrazor.models.mutables import (DerivedMutable, + MutableChannelContainer) + + def check_fixed(mutable: Optional[BaseMutable]) -> None: + if mutable is not None and not mutable.is_fixed: + raise RuntimeError(f'Mutable `{mutable.alias}` is not fixed.') + + for mutable in self.mutable_attrs.values(): # type: ignore + if isinstance(mutable, (MutableChannelContainer, DerivedMutable)): + continue + check_fixed(mutable) + + def check_mutable_attr_valid(self, attr): + assert attr in self.attr_mappings or \ + attr in self.accepted_mutable_attrs + + @staticmethod + def get_current_choice(mutable: BaseMutable) -> Any: + """Get current choice of given mutable. + + Args: + mutable (BaseMutable): Given mutable. 
+ + Raises: + RuntimeError: Error if `current_choice` is None. + + Returns: + Any: Current choice of given mutable. + """ + current_choice = mutable.current_choice + if current_choice is None: + raise RuntimeError(f'current choice of mutable {type(mutable)} ' + 'can not be None at runtime') + + return current_choice + + +class DynamicChannelMixin(DynamicMixin): + """Base class for dynamic OP with mutable channels. + + Note: + All subclass should implement ``mutable_in_channels`` and + ``mutable_out_channels`` APIs. + """ + + attr_mappings: Dict[str, str] = { + 'in_channels': 'in_channels', + 'out_channels': 'out_channels', + } + + @staticmethod + def check_mutable_channels(mutable_channels: BaseMutable) -> None: + """Check if mutable has `currnet_mask` attribute. + + Args: + mutable_channels (BaseMutable): Mutable to be checked. + + Raises: + ValueError: Error if mutable does not have `current_mask` + attribute. + """ + if not hasattr(mutable_channels, 'current_mask'): + raise ValueError( + 'channel mutable must have attribute `current_mask`') + + +class DynamicBatchNormMixin(DynamicChannelMixin): + """A mixin class for Pytorch BatchNorm, which can mutate + ``num_features``.""" + accepted_mutable_attrs: Set[str] = {'num_features'} + attr_mappings: Dict[str, str] = { + 'in_channels': 'num_features', + 'out_channels': 'num_features', + } + + def register_mutable_attr(self, attr, mutable): + self.check_mutable_attr_valid(attr) + if attr in self.attr_mappings: + attr_map = self.attr_mappings[attr] + assert attr_map in self.accepted_mutable_attrs + if attr_map in self.mutable_attrs: + print_log( + f'{attr_map}({attr}) is already in `mutable_attrs`', + level=logging.WARNING) + else: + self._register_mutable_attr(attr_map, mutable) + elif attr in self.accepted_mutable_attrs: + self._register_mutable_attr(attr, mutable) + else: + raise NotImplementedError + + def _register_mutable_attr(self, attr, mutable): + + if attr == 'num_features': + 
self._register_mutable_num_features(mutable) + else: + raise NotImplementedError + + def _register_mutable_num_features( + self: _BatchNorm, mutable_num_features: BaseMutable) -> None: + """Mutate ``num_features`` with given mutable. + + Args: + mutable_num_features (BaseMutable): Mutable for controlling + ``num_features``. + + Raises: + RuntimeError: Error if both ``affine`` and + ``tracking_running_stats`` are False. + ValueError: Error if size of mask if not same as ``num_features``. + """ + if not self.affine and not self.track_running_stats: + raise RuntimeError( + 'num_features can not be mutated if both `affine` and ' + '`tracking_running_stats` are False') + + self.check_mutable_channels(mutable_num_features) + mask_size = mutable_num_features.current_mask.size(0) + if mask_size != self.num_features: + raise ValueError( + f'Expect mask size of mutable to be {self.num_features} as ' + f'`num_features`, but got: {mask_size}.') + + self.mutable_attrs['num_features'] = mutable_num_features + + def _get_num_features_mask(self: _BatchNorm) -> Optional[torch.Tensor]: + """Get mask of ``num_features``""" + if self.affine: + refer_tensor = self.weight + elif self.track_running_stats: + refer_tensor = self.running_mean + else: + return None + + if 'num_features' in self.mutable_attrs: + out_mask = self.mutable_attrs['num_features'].current_mask.to( + refer_tensor.device) + else: + out_mask = torch.ones_like(refer_tensor).bool() + + return out_mask + + def get_dynamic_params( + self: _BatchNorm + ) -> Tuple[Optional[Tensor], Optional[Tensor], Optional[Tensor], + Optional[Tensor]]: + """Get dynamic parameters that will be used in forward process. + + Returns: + Tuple[Optional[Tensor], Optional[Tensor], Optional[Tensor], + Optional[Tensor]]: Sliced running_mean, running_var, weight and + bias. 
+ """ + out_mask = self._get_num_features_mask() + + if self.affine: + weight = self.weight[out_mask] + bias = self.bias[out_mask] + else: + weight, bias = self.weight, self.bias + + if self.track_running_stats: + running_mean = self.running_mean[out_mask] \ + if not self.training or self.track_running_stats else None + running_var = self.running_var[out_mask] \ + if not self.training or self.track_running_stats else None + else: + running_mean, running_var = self.running_mean, self.running_var + + return running_mean, running_var, weight, bias + + def to_static_op(self: _BatchNorm) -> nn.Module: + """Convert dynamic BatchNormxd to :obj:`torch.nn.BatchNormxd`. + + Returns: + torch.nn.BatchNormxd: :obj:`torch.nn.BatchNormxd` with sliced + parameters. + """ + self.check_if_mutables_fixed() + + running_mean, running_var, weight, bias = self.get_dynamic_params() + if 'num_features' in self.mutable_attrs: + num_features = self.mutable_attrs['num_features'].current_mask.sum( + ).item() + else: + num_features = self.num_features + + static_bn = self.static_op_factory( + num_features=num_features, + eps=self.eps, + momentum=self.momentum, + affine=self.affine, + track_running_stats=self.track_running_stats) + + if running_mean is not None: + static_bn.running_mean.copy_(running_mean) + static_bn.running_mean = static_bn.running_mean.to( + running_mean.device) + if running_var is not None: + static_bn.running_var.copy_(running_var) + static_bn.running_var = static_bn.running_var.to( + running_var.device) + if weight is not None: + static_bn.weight = nn.Parameter(weight) + if bias is not None: + static_bn.bias = nn.Parameter(bias) + + return static_bn + + +class DynamicLinearMixin(DynamicChannelMixin): + """A mixin class for Pytorch Linear, which can mutate ``in_features`` and + ``out_features``.""" + + accepted_mutable_attrs: Set[str] = {'in_features', 'out_features'} + attr_mappings: Dict[str, str] = { + 'in_channels': 'in_features', + 'out_channels': 'out_features', + } + 
+ def register_mutable_attr(self, attr, mutable): + self.check_mutable_attr_valid(attr) + if attr in self.attr_mappings: + attr_map = self.attr_mappings[attr] + assert attr_map in self.accepted_mutable_attrs + if attr_map in self.mutable_attrs: + print_log( + f'{attr_map}({attr}) is already in `mutable_attrs`', + level=logging.WARNING) + else: + self._register_mutable_attr(attr_map, mutable) + elif attr in self.accepted_mutable_attrs: + self._register_mutable_attr(attr, mutable) + else: + raise NotImplementedError + + def _register_mutable_attr(self, attr, mutable): + + if attr == 'in_features': + self._register_mutable_in_features(mutable) + elif attr == 'out_features': + self._register_mutable_out_features(mutable) + else: + raise NotImplementedError + + def _register_mutable_in_features( + self: nn.Linear, mutable_in_features: BaseMutable) -> None: + """Mutate ``in_features`` with given mutable. + + Args: + mutable_in_features (BaseMutable): Mutable for controlling + ``in_features``. + + Raises: + ValueError: Error if size of mask if not same as ``in_features``. + """ + self.check_mutable_channels(mutable_in_features) + mask_size = mutable_in_features.current_mask.size(0) + if mask_size != self.in_features: + raise ValueError( + f'Expect mask size of mutable to be {self.in_features} as ' + f'`in_features`, but got: {mask_size}.') + + self.mutable_attrs['in_features'] = mutable_in_features + + def _register_mutable_out_features( + self: nn.Linear, mutable_out_features: BaseMutable) -> None: + """Mutate ``out_features`` with given mutable. + + Args: + mutable_out_features (BaseMutable): Mutable for controlling + ``out_features``. + + Raises: + ValueError: Error if size of mask if not same as ``out_features``. 
+ """ + self.check_mutable_channels(mutable_out_features) + mask_size = mutable_out_features.current_mask.size(0) + if mask_size != self.out_features: + raise ValueError( + f'Expect mask size of mutable to be {self.out_features} as ' + f'`in_features`, but got: {mask_size}.') + + self.mutable_attrs['out_features'] = mutable_out_features + + def get_dynamic_params(self: nn.Linear) -> Tuple[Tensor, Optional[Tensor]]: + """Get dynamic parameters that will be used in forward process. + + Returns: + Tuple[Tensor, Optional[Tensor]]: Sliced weight and bias. + """ + if 'in_features' not in self.mutable_attrs and \ + 'out_features' not in self.mutable_attrs: + return self.weight, self.bias + + if 'in_features' in self.mutable_attrs: + in_mask = self.mutable_attrs['in_features'].current_mask.to( + self.weight.device) + else: + in_mask = torch.ones(self.weight.size(1)).bool().to( + self.weight.device) + if 'out_features' in self.mutable_attrs: + + out_mask = self.mutable_attrs['out_features'].current_mask.to( + self.weight.device) + else: + out_mask = torch.ones(self.weight.size(0)).bool().to( + self.weight.device) + + weight = self.weight[out_mask][:, in_mask] + bias = self.bias[out_mask] if self.bias is not None else None + + return weight, bias + + def to_static_op(self: nn.Linear) -> nn.Module: + """Convert to :obj:`torch.nn.Linear`. + + Returns: + nn.Linear: :obj:`torch.nn.Linear` with sliced parameters. 
+ """ + self.check_if_mutables_fixed() + + weight, bias = self.get_dynamic_params() + out_features = weight.size(0) + in_features = weight.size(1) + + static_linear = self.static_op_factory( + in_features=in_features, + out_features=out_features, + bias=True if bias is not None else False) + + static_linear.weight = nn.Parameter(weight) + if bias is not None: + static_linear.bias = nn.Parameter(bias) + + return static_linear + + +class DynamicResizeMixin(DynamicMixin): + """A mixin class for Pytorch InputResizer, which can mutate ``shape``.""" + + accepted_mutable_attrs: Set[str] = {'shape'} + + def register_mutable_attr(self, attr, mutable): + if attr == 'shape': + self._register_mutable_shape(mutable) + else: + raise NotImplementedError + + def _register_mutable_shape(self, mutable_shape): + assert hasattr(self, 'mutable_attrs') + current_shape = mutable_shape.current_choice + shape_dim = 1 if isinstance(current_shape, int) else len(current_shape) + if shape_dim not in [1, 2, 3]: + raise ValueError('Expect shape of mutable to be 1, 2 or 3' + f', but got: {shape_dim}.') + + self.mutable_attrs['shape'] = mutable_shape + + def get_dynamic_shape(self): + if 'shape' in self.mutable_attrs: + current_shape = self.mutable_attrs['shape'].current_choice + else: + current_shape = None + return current_shape + + def to_static_op(self) -> nn.Module: + self.check_if_mutables_fixed() + + input_resizer = self.static_op_factory( + interpolation_type=self._interpolation_type, # type:ignore + align_corners=self._align_corners, # type:ignore + scale_factor=self._scale_factor) # type:ignore + + size = self.get_dynamic_shape() + if size is not None: + input_resizer._size = size + + return input_resizer diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/generators/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/generators/__init__.py new file mode 100644 index 
0000000000000000000000000000000000000000..0d88960106daac317fced56d6e7363a788e1dd1e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/generators/__init__.py @@ -0,0 +1,5 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .dafl_generator import DAFLGenerator +from .zskt_generator import ZSKTGenerator + +__all__ = ['DAFLGenerator', 'ZSKTGenerator'] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/generators/base_generator.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/generators/base_generator.py new file mode 100644 index 0000000000000000000000000000000000000000..ae38df4153a034f039f0fb76ee9f1623d9bbea85 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/generators/base_generator.py @@ -0,0 +1,63 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import Dict, Optional + +import torch +from mmengine.model import BaseModule + +from mmrazor.models.utils import get_module_device + + +class BaseGenerator(BaseModule): + """The base class for generating images. + + Args: + img_size (int): The size of generated image. + latent_dim (int): The dimension of latent data. + hidden_channels (int): The dimension of hidden channels. + init_cfg (dict, optional): The config to control the initialization. + Defaults to None. + """ + + def __init__(self, + img_size: int, + latent_dim: int, + hidden_channels: int, + init_cfg: Optional[Dict] = None) -> None: + super().__init__(init_cfg=init_cfg) + self.img_size = img_size + self.latent_dim = latent_dim + self.hidden_channels = hidden_channels + + def process_latent(self, + latent_data: Optional[torch.Tensor] = None, + batch_size: int = 1) -> torch.Tensor: + """Generate the latent data if the input is None. Put the latent data + into the current gpu. + + Args: + latent_data (torch.Tensor, optional): The latent data. Defaults to + None. + batch_size (int): The batch size of the latent data. Defaults to 1. 
+ """ + if isinstance(latent_data, torch.Tensor): + assert latent_data.shape[1] == self.latent_dim, \ + 'Second dimension of the input must be equal to "latent_dim",'\ + f'but got {latent_data.shape[1]} != {self.latent_dim}.' + if latent_data.ndim == 2: + batch_data = latent_data + else: + raise ValueError('The noise should be in shape of (n, c)' + f'but got {latent_data.shape}') + elif latent_data is None: + assert batch_size > 0, \ + '"batch_size" should larger than zero when "latent_data" is '\ + f'None, but got {batch_size}.' + batch_data = torch.randn((batch_size, self.latent_dim)) + + # putting data on the right device + batch_data = batch_data.to(get_module_device(self)) + return batch_data + + def forward(self) -> None: + """Forward function.""" + raise NotImplementedError diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/generators/dafl_generator.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/generators/dafl_generator.py new file mode 100644 index 0000000000000000000000000000000000000000..6a17e4b6bcc0e8d98eddeea6e722605fb7a5df5d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/generators/dafl_generator.py @@ -0,0 +1,86 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import Dict, Optional + +import torch +import torch.nn as nn +import torch.nn.functional as F + +from mmrazor.registry import MODELS +from .base_generator import BaseGenerator + + +@MODELS.register_module() +class DAFLGenerator(BaseGenerator): + """Generator for DAFL. + + Args: + img_size (int): The size of generated image. + latent_dim (int): The dimension of latent data. + hidden_channels (int): The dimension of hidden channels. + scale_factor (int, optional): The scale factor for F.interpolate. + Defaults to 2. + bn_eps (float, optional): The eps param in bn. Defaults to 0.8. + leaky_slope (float, optional): The slope param in leaky relu. Defaults + to 0.2. 
+ init_cfg (dict, optional): The config to control the initialization. + """ + + def __init__( + self, + img_size: int, + latent_dim: int, + hidden_channels: int, + scale_factor: int = 2, + bn_eps: float = 0.8, + leaky_slope: float = 0.2, + init_cfg: Optional[Dict] = None, + ) -> None: + super().__init__( + img_size, latent_dim, hidden_channels, init_cfg=init_cfg) + self.init_size = self.img_size // (scale_factor**2) + self.scale_factor = scale_factor + self.linear = nn.Linear(self.latent_dim, + self.hidden_channels * self.init_size**2) + + self.bn1 = nn.BatchNorm2d(self.hidden_channels) + self.conv_blocks1 = nn.Sequential( + nn.Conv2d( + self.hidden_channels, + self.hidden_channels, + 3, + stride=1, + padding=1), + nn.BatchNorm2d(self.hidden_channels, eps=bn_eps), + nn.LeakyReLU(leaky_slope, inplace=True), + ) + self.conv_blocks2 = nn.Sequential( + nn.Conv2d( + self.hidden_channels, + self.hidden_channels // 2, + 3, + stride=1, + padding=1), + nn.BatchNorm2d(self.hidden_channels // 2, eps=bn_eps), + nn.LeakyReLU(leaky_slope, inplace=True), + nn.Conv2d(self.hidden_channels // 2, 3, 3, stride=1, padding=1), + nn.Tanh(), nn.BatchNorm2d(3, affine=False)) + + def forward(self, + data: Optional[torch.Tensor] = None, + batch_size: int = 1) -> torch.Tensor: + """Forward function for generator. + + Args: + data (torch.Tensor, optional): The input data. Defaults to None. + batch_size (int): Batch size. Defaults to 1. 
+ """ + batch_data = self.process_latent(data, batch_size) + img = self.linear(batch_data) + img = img.view(img.shape[0], self.hidden_channels, self.init_size, + self.init_size) + img = self.bn1(img) + img = F.interpolate(img, scale_factor=self.scale_factor) + img = self.conv_blocks1(img) + img = F.interpolate(img, scale_factor=self.scale_factor) + img = self.conv_blocks2(img) + return img diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/generators/zskt_generator.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/generators/zskt_generator.py new file mode 100644 index 0000000000000000000000000000000000000000..2216eb2ce1e8413f1f89b17e66e4ce22795fb83f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/generators/zskt_generator.py @@ -0,0 +1,91 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import Dict, Optional, Tuple + +import torch +import torch.nn as nn + +from mmrazor.registry import MODELS +from .base_generator import BaseGenerator + + +class View(nn.Module): + """Class for view tensors. + + Args: + size (Tuple[int, ...]): Size of the output tensor. + """ + + def __init__(self, size: Tuple[int, ...]) -> None: + super(View, self).__init__() + self.size = size + + def forward(self, tensor: torch.Tensor) -> torch.Tensor: + """"Forward function for view tensors.""" + return tensor.view(self.size) + + +@MODELS.register_module() +class ZSKTGenerator(BaseGenerator): + """Generator for ZSKT. code link: + https://github.com/polo5/ZeroShotKnowledgeTransfer/ + + Args: + img_size (int): The size of generated image. + latent_dim (int): The dimension of latent data. + hidden_channels (int): The dimension of hidden channels. + scale_factor (int, optional): The scale factor for F.interpolate. + Defaults to 2. + leaky_slope (float, optional): The slope param in leaky relu. Defaults + to 0.2. + init_cfg (dict, optional): The config to control the initialization. 
+ """ + + def __init__( + self, + img_size: int, + latent_dim: int, + hidden_channels: int, + scale_factor: int = 2, + leaky_slope: float = 0.2, + init_cfg: Optional[Dict] = None, + ) -> None: + super().__init__( + img_size, latent_dim, hidden_channels, init_cfg=init_cfg) + self.init_size = self.img_size // (scale_factor**2) + self.scale_factor = scale_factor + + self.layers = nn.Sequential( + nn.Linear(self.latent_dim, + self.hidden_channels * self.init_size**2), + View((-1, self.hidden_channels, self.init_size, self.init_size)), + nn.BatchNorm2d(self.hidden_channels), + nn.Upsample(scale_factor=scale_factor), + nn.Conv2d( + self.hidden_channels, + self.hidden_channels, + 3, + stride=1, + padding=1), nn.BatchNorm2d(self.hidden_channels), + nn.LeakyReLU(leaky_slope, inplace=True), + nn.Upsample(scale_factor=scale_factor), + nn.Conv2d( + self.hidden_channels, + self.hidden_channels // 2, + 3, + stride=1, + padding=1), nn.BatchNorm2d(self.hidden_channels // 2), + nn.LeakyReLU(leaky_slope, inplace=True), + nn.Conv2d(self.hidden_channels // 2, 3, 3, stride=1, padding=1), + nn.BatchNorm2d(3, affine=True)) + + def forward(self, + data: Optional[torch.Tensor] = None, + batch_size: int = 1) -> torch.Tensor: + """Forward function for generator. + + Args: + data (torch.Tensor, optional): The input data. Defaults to None. + batch_size (int): Batch size. Defaults to 1. + """ + batch_data = self.process_latent(data, batch_size) + return self.layers(batch_data) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/heads/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/heads/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..0d7da475d2f4539ac2789295d058f690953f02d8 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/heads/__init__.py @@ -0,0 +1,5 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from .darts_subnet_head import DartsSubnetClsHead +from .deit_head import DeiTClsHead + +__all__ = ['DartsSubnetClsHead', 'DeiTClsHead'] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/heads/darts_subnet_head.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/heads/darts_subnet_head.py new file mode 100644 index 0000000000000000000000000000000000000000..c3886ced3f8b96dd78c9a08a42c008f9c29e1ebc --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/heads/darts_subnet_head.py @@ -0,0 +1,83 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import List, Tuple + +import torch +from torch import nn + +from mmrazor.models.utils import add_prefix +from mmrazor.registry import MODELS + +try: + from mmcls.evaluation import Accuracy + from mmcls.models.heads import LinearClsHead + from mmcls.structures import ClsDataSample +except ImportError: + from mmrazor.utils import get_placeholder + Accuracy = get_placeholder('mmcls') + LinearClsHead = get_placeholder('mmcls') + ClsDataSample = get_placeholder('mmcls') + + +@MODELS.register_module() +class DartsSubnetClsHead(LinearClsHead): + + def __init__(self, aux_in_channels, aux_loss, **kwargs): + super(DartsSubnetClsHead, self).__init__(**kwargs) + self.aux_linear = nn.Linear(aux_in_channels, self.num_classes) + self.aux_loss_module = MODELS.build(aux_loss) + + def forward_aux(self, feats: Tuple[torch.Tensor]): + + aux_feat = feats[0] + aux_cls_score = self.aux_linear(aux_feat) + return aux_cls_score + + def _get_aux_loss(self, cls_score: torch.Tensor, + data_samples: List[ClsDataSample], **kwargs): + """Unpack data samples and compute loss.""" + # Unpack data samples and pack targets + if 'score' in data_samples[0].gt_label: + # Batch augmentation may convert labels to one-hot format scores. 
+ target = torch.stack([i.gt_label.score for i in data_samples]) + else: + target = torch.hstack([i.gt_label.label for i in data_samples]) + + # compute loss + losses = dict() + loss = self.aux_loss_module( + cls_score, target, avg_factor=cls_score.size(0), **kwargs) + losses['loss'] = loss + + # compute accuracy + if self.cal_acc: + assert target.ndim == 1, 'If you enable batch augmentation ' \ + 'like mixup during training, `cal_acc` is pointless.' + acc = Accuracy.calculate(cls_score, target, topk=self.topk) + losses.update( + {f'accuracy_top-{k}': a + for k, a in zip(self.topk, acc)}) + + return losses + + def loss(self, feats: Tuple[torch.Tensor], + data_samples: List[ClsDataSample], **kwargs) -> dict: + """Calculate losses from the classification score. + Args: + feats (tuple[Tensor]): The features extracted from the backbone. + Multiple stage inputs are acceptable but only the last stage + will be used to classify. The shape of every item should be + ``(num_samples, num_classes)``. + data_samples (List[ClsDataSample]): The annotation data of + every samples. + **kwargs: Other keyword arguments to forward the loss module. + Returns: + dict[str, Tensor]: a dictionary of loss components + """ + losses = super().loss(feats, data_samples, **kwargs) + + aux_cls_score = self.forward_aux(feats) + aux_losses = self._get_aux_loss(aux_cls_score, data_samples, **kwargs) + + losses.update(add_prefix(aux_losses, 'aux_head.')) + + return losses diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/heads/deit_head.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/heads/deit_head.py new file mode 100644 index 0000000000000000000000000000000000000000..61d587d9309f3e027f302d540971287095e42fc6 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/heads/deit_head.py @@ -0,0 +1,69 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from typing import List, Tuple + +import torch +import torch.nn as nn + +from mmrazor.registry import MODELS + +try: + from mmcls.models import VisionTransformerClsHead +except ImportError: + from mmrazor.utils import get_placeholder + VisionTransformerClsHead = get_placeholder('mmcls') + + +@MODELS.register_module() +class DeiTClsHead(VisionTransformerClsHead): + """Distilled Vision Transformer classifier head. + + Comparing with the :class:`DeiTClsHead` in mmcls, this head support to + train the distilled version DeiT. + + Args: + num_classes (int): Number of categories excluding the background + category. + in_channels (int): Number of channels in the input feature map. + hidden_dim (int, optional): Number of the dimensions for hidden layer. + Defaults to None, which means no extra hidden layer. + act_cfg (dict): The activation config. Only available during + pre-training. Defaults to ``dict(type='Tanh')``. + init_cfg (dict): The extra initialization configs. Defaults to + ``dict(type='Constant', layer='Linear', val=0)``. + """ + + def _init_layers(self): + """"Init extra hidden linear layer to handle dist token if exists.""" + super(DeiTClsHead, self)._init_layers() + if self.hidden_dim is None: + head_dist = nn.Linear(self.in_channels, self.num_classes) + else: + head_dist = nn.Linear(self.hidden_dim, self.num_classes) + self.layers.add_module('head_dist', head_dist) + + def pre_logits( + self, feats: Tuple[List[torch.Tensor]] + ) -> Tuple[torch.Tensor, torch.Tensor]: + """The process before the final classification head. + + The input ``feats`` is a tuple of list of tensor, and each tensor is + the feature of a backbone stage. In ``DeiTClsHead``, we obtain the + feature of the last stage and forward in hidden layer if exists. 
+ """ + _, cls_token, dist_token = feats[-1] + if self.hidden_dim is None: + return cls_token, dist_token + else: + cls_token = self.layers.act(self.layers.pre_logits(cls_token)) + dist_token = self.layers.act(self.layers.pre_logits(dist_token)) + return cls_token, dist_token + + def forward(self, feats: Tuple[List[torch.Tensor]]) -> torch.Tensor: + """The forward process.""" + cls_token, dist_token = self.pre_logits(feats) + # The final classification head. + cls_score = self.layers.head(cls_token) + # Forward so that the corresponding recorder can record the output + # of the distillation token + _ = self.layers.head_dist(dist_token) + return cls_score diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/necks/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/necks/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..60b3c7e44c2e6b012efc9454e0d0d13fed4fd066 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/necks/__init__.py @@ -0,0 +1,4 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .squeezemean_with_dropout import SqueezeMeanPoolingWithDropout + +__all__ = ['SqueezeMeanPoolingWithDropout'] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/necks/squeezemean_with_dropout.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/necks/squeezemean_with_dropout.py new file mode 100644 index 0000000000000000000000000000000000000000..eca3447294f59755651cf4ce1113168bd6995eb8 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/necks/squeezemean_with_dropout.py @@ -0,0 +1,57 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from typing import Tuple, Union + +import torch +import torch.nn.functional as F +from mmengine.model import BaseModule + +from mmrazor.registry import MODELS + + +@MODELS.register_module() +class SqueezeMeanPoolingWithDropout(BaseModule): + """Dimensionality Reduction Neck with Dropout. + + Dimensionality Reduction the feature map of backbone by SqueezeMean. + Some of the code is borrowed from + `https://github.com/facebookresearch/AttentiveNAS`. + + Args: + drop_ratio (float): Dropout rate. Defaults to 0.2. + """ + + def __init__(self, drop_ratio: float = 0.2): + super(SqueezeMeanPoolingWithDropout, self).__init__() + self.drop_ratio = drop_ratio + + def dimension_reduction(self, x: torch.Tensor): + assert x.ndim > 1, 'SqueezeMean only support (B, C, *) input.' + 'to B C*H*W output if dim = 2' + for i in range(x.ndim - 1, 1, -1): + x = x.mean(i, keepdim=True) + x = torch.squeeze(x, -1) + return x + + def forward( + self, inputs: Union[Tuple, + torch.Tensor]) -> Union[Tuple, torch.Tensor]: + """Forward function with dropout. + + Args: + x (Union[Tuple, torch.Tensor]): The feature map of backbone. + Returns: + Tuple[torch.Tensor]: The output features. 
+ """ + drop_ratio = self.drop_ratio if self.drop_ratio is not None else 0.0 + + if isinstance(inputs, tuple): + outs = tuple([self.dimension_reduction(x) for x in inputs]) + if drop_ratio > 0 and self.training: + outs = tuple([F.dropout(x, p=drop_ratio) for x in outs]) + elif isinstance(inputs, torch.Tensor): + inputs = self.dimension_reduction(inputs) + if drop_ratio > 0 and self.training: + outs = F.dropout(inputs, p=drop_ratio) # type:ignore + else: + raise TypeError('neck inputs should be tuple or torch.tensor') + return outs diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/ops/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/ops/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..4e6eec7d3a3db65d8ffedbc3afb4898e2a9a88a5 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/ops/__init__.py @@ -0,0 +1,17 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .common import Identity +from .darts_series import (DartsDilConv, DartsPoolBN, DartsSepConv, + DartsSkipConnect, DartsZero) +from .efficientnet_series import ConvBnAct, DepthwiseSeparableConv +from .function import InputResizer +from .gather_tensors import GatherTensors +from .mobilenet_series import MBBlock +from .shufflenet_series import ShuffleBlock, ShuffleXception +from .transformer_series import MultiheadAttention, RelativePosition2D + +__all__ = [ + 'ShuffleBlock', 'ShuffleXception', 'DartsPoolBN', 'DartsDilConv', + 'DartsSepConv', 'DartsSkipConnect', 'DartsZero', 'MBBlock', 'Identity', + 'ConvBnAct', 'DepthwiseSeparableConv', 'GatherTensors', 'InputResizer', + 'RelativePosition2D', 'MultiheadAttention' +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/ops/base.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/ops/base.py new file mode 100644 index 0000000000000000000000000000000000000000..f02756000336bfe0380e12350e568f80035c97de --- /dev/null 
# --- mmrazor/models/architectures/ops/base.py ---
# Copyright (c) OpenMMLab. All rights reserved.
from mmengine.model import BaseModule


class BaseOP(BaseModule):
    """Common base for searchable operations.

    Records the channel/stride configuration shared by every candidate op.

    Args:
        in_channels (int): The input channels of the operation.
        out_channels (int): The output channels of the operation.
        stride (int): Stride of the operation. Defaults to 1.
    """

    def __init__(self, in_channels, out_channels, stride=1, **kwargs):
        # Remaining keyword arguments (e.g. ``init_cfg``) are forwarded
        # to ``BaseModule``.
        super().__init__(**kwargs)

        self.in_channels = in_channels
        self.out_channels = out_channels
        self.stride = stride


# --- mmrazor/models/architectures/ops/common.py ---
# Copyright (c) OpenMMLab. All rights reserved.
from mmcv.cnn import ConvModule

from mmrazor.registry import MODELS
from .base import BaseOP


@MODELS.register_module()
class Identity(BaseOP):
    """Identity op that projects only when the shapes force it.

    A 1x1 ``ConvModule`` is inserted iff ``stride != 1`` or the input and
    output channel counts differ; otherwise the input passes through
    untouched.

    Args:
        conv_cfg (dict, optional): Config dict for convolution layer.
            Default: None, which means using conv2d.
        norm_cfg (dict): Config dict for normalization layer.
            Default: dict(type='BN').
        act_cfg (dict): Config dict for activation layer.
            Default: None.
    """

    def __init__(self,
                 conv_cfg=None,
                 norm_cfg=dict(type='BN'),
                 act_cfg=None,
                 **kwargs):
        super().__init__(**kwargs)

        self.conv_cfg = conv_cfg
        self.norm_cfg = norm_cfg
        self.act_cfg = act_cfg

        needs_projection = (self.stride != 1
                            or self.in_channels != self.out_channels)
        if needs_projection:
            self.downsample = ConvModule(
                self.in_channels,
                self.out_channels,
                kernel_size=1,
                stride=self.stride,
                padding=0,
                conv_cfg=self.conv_cfg,
                norm_cfg=self.norm_cfg,
                act_cfg=self.act_cfg)
        else:
            self.downsample = None

    def forward(self, x):
        """Apply the optional 1x1 projection and return the result."""
        if self.downsample is None:
            return x
        return self.downsample(x)
# --- mmrazor/models/architectures/ops/darts_series.py ---
# Copyright (c) OpenMMLab. All rights reserved.
import torch
import torch.nn as nn
from mmcv.cnn import build_norm_layer
from mmcv.cnn.bricks import DropPath

from mmrazor.registry import MODELS
from .base import BaseOP


@MODELS.register_module()
class DartsPoolBN(BaseOP):
    """Pooling + BatchNorm candidate op from the DARTS search space.

    Args:
        pool_type (str): Either ``'max'`` or ``'avg'``.
        kernel_size (int): Pooling kernel size. Defaults to 3.
        norm_cfg (dict): Config for the norm layer. Defaults to BN.
        use_drop_path (bool): Whether to append a ``DropPath``.
            Defaults to False.
    """

    def __init__(self,
                 pool_type,
                 kernel_size=3,
                 norm_cfg=dict(type='BN'),
                 use_drop_path=False,
                 **kwargs):
        super(DartsPoolBN, self).__init__(**kwargs)
        self.kernel_size = kernel_size
        self.norm_cfg = norm_cfg
        # NOTE(review): any other ``pool_type`` value leaves ``self.pool``
        # undefined until forward — confirm callers only pass 'max'/'avg'.
        if pool_type == 'max':
            self.pool = nn.MaxPool2d(self.kernel_size, self.stride, 1)
        elif pool_type == 'avg':
            # count_include_pad=False keeps border averages unbiased.
            self.pool = nn.AvgPool2d(
                self.kernel_size, self.stride, 1, count_include_pad=False)
        self.bn = build_norm_layer(self.norm_cfg, self.out_channels)[1]

        # DropPath is created with its default drop probability; the rate
        # is typically scheduled externally during training.
        self.drop_path = DropPath() if use_drop_path else None

    def forward(self, x):
        """Pool, normalize and (optionally) drop-path ``x``."""
        out = self.pool(x)
        out = self.bn(out)
        if self.drop_path is not None:
            out = self.drop_path(out)

        return out


@MODELS.register_module()
class DartsDilConv(BaseOP):
    """Dilated depthwise-separable conv candidate op (ReLU-conv-BN).

    Args:
        kernel_size (int): Depthwise kernel size; must be 3 or 5.
        use_drop_path (bool): Whether to append a ``DropPath``.
            Defaults to False.
        norm_cfg (dict): Config for the norm layer. Defaults to BN.
    """

    def __init__(self,
                 kernel_size,
                 use_drop_path=False,
                 norm_cfg=dict(type='BN'),
                 **kwargs):
        super(DartsDilConv, self).__init__(**kwargs)
        self.kernel_size = kernel_size
        self.norm_cfg = norm_cfg
        self.dilation = 2
        assert self.kernel_size in [3, 5]
        assert self.stride in [1, 2]
        # ReLU -> depthwise dilated conv (in->in) -> pointwise conv
        # (in->out) -> BN.
        # NOTE(review): the norm layer is built with ``self.in_channels``
        # although the preceding 1x1 conv outputs ``self.out_channels``;
        # benign when in == out (the usual DARTS case) — confirm intent.
        self.conv1 = nn.Sequential(
            nn.ReLU(),
            nn.Conv2d(
                self.in_channels,
                self.in_channels,
                self.kernel_size,
                self.stride, (self.kernel_size // 2) * self.dilation,
                dilation=self.dilation,
                groups=self.in_channels,
                bias=False),
            nn.Conv2d(
                self.in_channels, self.out_channels, 1, stride=1, bias=False),
            build_norm_layer(self.norm_cfg, self.in_channels)[1])

        self.drop_path = DropPath() if use_drop_path else None

    def forward(self, x):
        """Apply the dilated separable conv and optional drop-path."""
        out = self.conv1(x)
        if self.drop_path is not None:
            out = self.drop_path(out)
        return out


@MODELS.register_module()
class DartsSepConv(BaseOP):
    """Separable conv candidate op: two stacked ReLU-sepconv-BN units.

    Args:
        kernel_size (int): Depthwise kernel size; must be 3 or 5.
        use_drop_path (bool): Whether to append a ``DropPath``.
            Defaults to False.
        norm_cfg (dict): Config for the norm layer. Defaults to BN.
    """

    def __init__(self,
                 kernel_size,
                 use_drop_path=False,
                 norm_cfg=dict(type='BN'),
                 **kwargs):
        super(DartsSepConv, self).__init__(**kwargs)

        self.kernel_size = kernel_size
        self.norm_cfg = norm_cfg
        assert self.kernel_size in [3, 5]
        assert self.stride in [1, 2]
        # First unit carries the stride and keeps the channel count.
        self.conv1 = nn.Sequential(
            nn.ReLU(),
            nn.Conv2d(
                self.in_channels,
                self.in_channels,
                self.kernel_size,
                self.stride,
                self.kernel_size // 2,
                groups=self.in_channels,
                bias=False),
            nn.Conv2d(
                self.in_channels, self.in_channels, 1, stride=1, bias=False),
            build_norm_layer(self.norm_cfg, self.in_channels)[1])
        # Second unit is stride-1 and maps to ``out_channels``.
        self.conv2 = nn.Sequential(
            nn.ReLU(),
            nn.Conv2d(
                self.in_channels,
                self.out_channels,
                self.kernel_size,
                1,
                self.kernel_size // 2,
                groups=self.in_channels,
                bias=False),
            nn.Conv2d(
                self.out_channels, self.out_channels, 1, stride=1, bias=False),
            build_norm_layer(self.norm_cfg, self.out_channels)[1])

        self.drop_path = DropPath() if use_drop_path else None

    def forward(self, x):
        """Apply both separable-conv units and optional drop-path."""
        out = self.conv1(x)
        out = self.conv2(out)
        if self.drop_path is not None:
            out = self.drop_path(out)
        return out


@MODELS.register_module()
class DartsSkipConnect(BaseOP):
    """Reduce feature map size by factorized pointwise (stride=2)."""

    def __init__(self,
                 use_drop_path=False,
                 norm_cfg=dict(type='BN'),
                 **kwargs):
        super(DartsSkipConnect, self).__init__(**kwargs)
        self.norm_cfg = norm_cfg
        if self.stride > 1:
            # Factorized reduce: two stride-2 1x1 convs, each producing
            # half of ``out_channels``, concatenated after the second one
            # samples a one-pixel-shifted grid.
            self.relu = nn.ReLU()
            self.conv1 = nn.Conv2d(
                self.in_channels,
                self.out_channels // 2,
                1,
                stride=2,
                padding=0,
                bias=False)
            self.conv2 = nn.Conv2d(
                self.in_channels,
                self.out_channels // 2,
                1,
                stride=2,
                padding=0,
                bias=False)
            self.bn = build_norm_layer(self.norm_cfg, self.out_channels)[1]

        self.drop_path = DropPath() if use_drop_path else None

    def forward(self, x):
        """Pass through (stride 1) or factorized-reduce (stride > 1)."""
        if self.stride > 1:
            x = self.relu(x)
            # ``x[:, :, 1:, 1:]`` shifts the grid so the two stride-2
            # convs together cover all spatial positions.
            out = torch.cat(
                [self.conv1(x), self.conv2(x[:, :, 1:, 1:])], dim=1)
            out = self.bn(out)
            if self.drop_path is not None:
                out = self.drop_path(out)
        else:
            out = x
        return out


@MODELS.register_module()
class DartsZero(BaseOP):
    """Zero op: outputs zeros with the stride-reduced spatial size."""

    def __init__(self, **kwargs):
        super(DartsZero, self).__init__(**kwargs)

    def forward(self, x):
        """Return zeros shaped as a strided view of ``x``."""
        if self.stride == 1:
            return x.mul(0.)
        # Subsample first so the zero output matches the strided shape.
        return x[:, :, ::self.stride, ::self.stride].mul(0.)
@MODELS.register_module()
class ConvBnAct(BaseOP):
    """ConvBnAct block from timm.

    Args:
        in_channels (int): number of in channels.
        out_channels (int): number of out channels.
        kernel_size (int): kernel size of convolution.
        stride (int, optional): stride of convolution. Defaults to 1.
        dilation (int, optional): dilation rate of convolution. Defaults to 1.
        padding (int, optional): padding size of convolution. Defaults to 0.
        skip (bool, optional): whether using skip connect. Defaults to False.
        conv_cfg (Optional[dict], optional): Config dict for convolution layer.
            Default: None, which means using conv2d.
        se_cfg (Optional[Dict], optional): Config dict for the SE layer.
            Default: None, which means no SE layer.
        norm_cfg (Dict, optional): Config dict for normalization layer.
            Default: dict(type='BN').
        act_cfg (Dict, optional):Config dict for activation layer.
            Default: dict(type='ReLU').
    """

    def __init__(self,
                 in_channels: int,
                 out_channels: int,
                 kernel_size: int,
                 stride: int = 1,
                 dilation: int = 1,
                 padding: int = 0,
                 skip: bool = False,
                 conv_cfg: Optional[dict] = None,
                 se_cfg: Optional[Dict] = None,
                 norm_cfg: Dict = dict(type='BN'),
                 act_cfg: Dict = dict(type='ReLU')):
        super().__init__(
            in_channels=in_channels, out_channels=out_channels, stride=stride)
        # Residual add only when shapes are guaranteed to match.
        self.has_residual = skip and stride == 1 \
            and in_channels == out_channels
        self.with_se = se_cfg is not None

        if self.with_se:
            assert isinstance(se_cfg, dict)
            # NOTE(review): ``self.se`` is constructed here but never
            # applied in ``forward`` below — confirm against upstream.
            self.se = SELayer(self.out_channels, **se_cfg)

        self.convModule = ConvModule(
            in_channels=in_channels,
            out_channels=out_channels,
            kernel_size=kernel_size,
            stride=stride,
            dilation=dilation,
            padding=padding,
            conv_cfg=conv_cfg,
            norm_cfg=norm_cfg,
            act_cfg=act_cfg)

    def forward(self, x):
        """Forward function."""
        shortcut = x
        x = self.convModule(x)
        if self.has_residual:
            x += shortcut
        return x


@MODELS.register_module()
class DepthwiseSeparableConv(BaseOP):
    """DepthwiseSeparable block Used for DS convs in MobileNet-V1 and in the
    place of IR blocks that have no expansion (factor of 1.0). This is an
    alternative to having a IR with an optional first pw conv.

    Args:
        in_channels (int): number of in channels.
        out_channels (int): number of out channels.
        dw_kernel_size (int, optional): the kernel size of depth-wise
            convolution. Defaults to 3.
        stride (int, optional): stride of convolution.
            Defaults to 1.
        dilation (int, optional): dilation rate of convolution.
            Defaults to 1.
        noskip (bool, optional): whether use skip connection.
            Defaults to False.
        pw_kernel_size (int, optional): kernel size of point wise convolution.
            Defaults to 1.
        pw_act (bool, optional): whether using activation in point-wise
            convolution. Defaults to False.
        se_cfg (Optional[Dict], optional): Config dict for the SE layer.
            Defaults to None, which means no SE layer.
        conv_cfg (Optional[dict], optional): Config dict for convolution layer.
            Default: None, which means using conv2d.
        norm_cfg (Dict, optional): Config dict for normalization layer.
            Default: dict(type='BN').
        act_cfg (Dict, optional):Config dict for activation layer.
            Default: dict(type='ReLU').
    """

    def __init__(self,
                 in_channels: int,
                 out_channels: int,
                 dw_kernel_size: int = 3,
                 stride: int = 1,
                 dilation: int = 1,
                 noskip: bool = False,
                 pw_kernel_size: int = 1,
                 pw_act: bool = False,
                 conv_cfg: Optional[dict] = None,
                 se_cfg: Optional[Dict] = None,
                 norm_cfg: Dict = dict(type='BN'),
                 act_cfg: Dict = dict(type='ReLU')):

        super().__init__(
            in_channels=in_channels, out_channels=out_channels, stride=stride)
        self.has_residual = (stride == 1
                             and in_channels == out_channels) and not noskip
        self.has_pw_act = pw_act  # activation after point-wise conv

        self.se_cfg = se_cfg

        # Depthwise conv keeps the channel count (groups == in_channels).
        self.conv_dw = ConvModule(
            in_channels=in_channels,
            out_channels=in_channels,
            kernel_size=dw_kernel_size,
            stride=stride,
            dilation=dilation,
            padding=dw_kernel_size // 2,
            groups=in_channels,
            conv_cfg=conv_cfg,
            norm_cfg=norm_cfg,
            act_cfg=act_cfg,
        )

        # Squeeze-and-excitation
        # NOTE(review): the SE layer is built with ``out_channels`` but is
        # applied after ``conv_dw`` whose output has ``in_channels``; this
        # only lines up when in_channels == out_channels — confirm.
        self.se = SELayer(out_channels, **
                          se_cfg) if self.se_cfg else nn.Identity()

        self.conv_pw = ConvModule(
            in_channels=in_channels,
            out_channels=out_channels,
            kernel_size=pw_kernel_size,
            padding=pw_kernel_size // 2,
            conv_cfg=conv_cfg,
            norm_cfg=norm_cfg,
            act_cfg=act_cfg if self.has_pw_act else None,
        )

    def forward(self, x):
        """dw-conv -> SE -> pw-conv, with optional residual add."""
        shortcut = x
        x = self.conv_dw(x)
        x = self.se(x)
        x = self.conv_pw(x)
        if self.has_residual:
            x += shortcut
        return x
b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/ops/function.py @@ -0,0 +1,41 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import List, Optional, Tuple, Union + +import torch +from torch.nn import Module, functional + + +class InputResizer(Module): + valid_interpolation_type = { + 'nearest', 'linear', 'bilinear', 'bicubic', 'trilinear', 'area', + 'nearest-exact' + } + + def __init__( + self, + interpolation_type: str = 'bicubic', + align_corners: bool = False, + scale_factor: Optional[Union[float, List[float]]] = None) -> None: + super().__init__() + + if interpolation_type not in self.valid_interpolation_type: + raise ValueError( + 'Expect `interpolation_type` be ' + f'one of {self.valid_interpolation_type}, but got: ' + f'{interpolation_type}') + self._interpolation_type = interpolation_type + self._scale_factor = scale_factor + self._align_corners = align_corners + self._size = None + + def forward(self, + x: torch.Tensor, + size: Optional[Tuple[int, int]] = None) -> torch.Tensor: + size = size if size is not None else self._size + + return functional.interpolate( + input=x, + size=size, + mode=self._interpolation_type, + scale_factor=self._scale_factor, + align_corners=self._align_corners) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/ops/gather_tensors.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/ops/gather_tensors.py new file mode 100644 index 0000000000000000000000000000000000000000..7bc34fde161d0bbdfcdd426306cd3fe6010aad72 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/ops/gather_tensors.py @@ -0,0 +1,58 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import Any, Tuple + +import torch +import torch.distributed as dist + + +class GatherTensors(torch.autograd.Function): + """Gather tensors from all GPUS, supporting backward propagation. 
class GatherTensors(torch.autograd.Function):
    """Gather tensors from all GPUs, supporting backward propagation.

    See more details in torch.distributed.all_gather and
    torch.distributed.all_reduce.
    """

    @staticmethod
    def forward(ctx: Any, input: torch.Tensor) -> Tuple[Any, ...]:
        """Gather ``input`` from every rank.

        Args:
            ctx (Any): Context to be used for forward propagation.
            input (torch.Tensor): Tensor to be broadcast from current process.

        Returns:
            Tuple of one tensor per rank (this rank's entry is ``input``).
        """
        output = [
            torch.empty_like(input) for _ in range(dist.get_world_size())
        ]
        dist.all_gather(output, input)
        return tuple(output)

    @staticmethod
    def backward(ctx: Any, *grads: torch.Tensor) -> torch.Tensor:
        """Sum the per-output gradients across ranks and return the slice
        belonging to this rank.

        Args:
            ctx (Any): Context to be used for backward propagation.
            grads (torch.Tensor): Grads to be merged from current process.
        """
        rank = dist.get_rank()
        merged = torch.stack(grads)
        dist.all_reduce(merged)
        return merged[rank]


# --- mmrazor/models/architectures/ops/mobilenet_series.py ---
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Dict

import torch
import torch.nn.functional as F
import torch.utils.checkpoint as cp
from mmcv.cnn import ConvModule
from mmcv.cnn.bricks import build_conv_layer
from mmcv.cnn.bricks.drop import drop_path

from mmrazor.registry import MODELS
from .base import BaseOP

try:
    from mmcls.models.utils import SELayer
except ImportError:
    from mmrazor.utils import get_placeholder
    SELayer = get_placeholder('mmcls')


class ShortcutLayer(BaseOP):
    """1x1-conv shortcut used by ``MBBlock``'s attentive shortcut.

    Downsamples spatially by average pooling (``reduction``) and projects
    channels with a 1x1 conv only when the channel counts differ.

    Args:
        in_channels (int): Number of input channels.
        out_channels (int): Number of output channels.
        reduction (int): Spatial reduction factor, 1 or 2. Defaults to 1.
        conv_cfg (Dict): Config dict for the 1x1 convolution layer.
            Defaults to dict(type='Conv2d').
        init_cfg (dict, optional): Initialization config forwarded to
            ``BaseModule``. Defaults to None.
    """

    def __init__(self,
                 in_channels: int,
                 out_channels: int,
                 reduction: int = 1,
                 conv_cfg: Dict = dict(type='Conv2d'),
                 init_cfg=None):
        # BUG FIX: ``init_cfg`` was previously passed positionally and
        # bound to BaseOP's third parameter ``stride``, so it never
        # reached BaseModule (and ``self.stride`` became None). Pass it
        # by keyword instead.
        super().__init__(
            in_channels=in_channels,
            out_channels=out_channels,
            init_cfg=init_cfg)

        assert reduction in [1, 2]
        self.reduction = reduction

        # conv module can be removed if in_channels equal to out_channels
        self.conv = build_conv_layer(
            conv_cfg,
            in_channels=in_channels,
            out_channels=out_channels,
            kernel_size=1,
            stride=1,
            bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Downsample (if reduction > 1) and project channels (if needed)."""
        if self.reduction > 1:
            # Pad by 1 when the spatial size is odd so the pooled output
            # matches the strided main branch.
            padding = x.size(-1) & 1
            x = F.avg_pool2d(x, self.reduction, padding=padding)

        # HACK: with dynamic (mutable) channels, read the currently
        # active channel counts from the masks instead of the static
        # conv attributes.
        if hasattr(self.conv, 'mutable_in_channels'
                   ) and self.conv.mutable_in_channels is not None:
            in_channels = self.conv.mutable_in_channels.current_mask.sum(
            ).item()
        else:
            in_channels = self.conv.in_channels
        if hasattr(self.conv, 'mutable_out_channels'
                   ) and self.conv.mutable_out_channels is not None:
            out_channels = self.conv.mutable_out_channels.current_mask.sum(
            ).item()
        else:
            out_channels = self.conv.out_channels

        # Skip the projection entirely when it would be an identity.
        if in_channels != out_channels:
            x = self.conv(x)

        return x
@MODELS.register_module()
class MBBlock(BaseOP):
    """Mobilenet block for Searchable backbone.

    Structure: optional expand (1x1) -> depthwise conv -> optional SE ->
    linear (1x1) projection, with either a residual or an attentive
    (``ShortcutLayer``) shortcut.

    Args:
        kernel_size (int): Size of the convolving kernel.
        expand_ratio (int): The input channels' expand factor of the depthwise
            convolution.
        se_cfg (dict, optional): Config dict for se layer. Defaults to None,
            which means no se layer.
        conv_cfg (dict, optional): Config dict for convolution layer.
            Defaults to None, which means using conv2d.
        norm_cfg (dict): Config dict for normalization layer.
            Defaults to dict(type='BN').
        act_cfg (dict): Config dict for activation layer.
            Defaults to dict(type='ReLU').
        drop_path_rate (float): stochastic depth rate. Defaults to 0.
        with_cp (bool): Use checkpoint or not. Using checkpoint will save some
            memory while slowing down the training speed. Defaults to False.
        with_attentive_shortcut (bool): Use shortcut in AttentiveNAS or not.
            Defaults to False.

    Returns:
        Tensor: The output tensor.
    """

    def __init__(self,
                 kernel_size: int,
                 expand_ratio: int,
                 se_cfg: Dict = None,
                 conv_cfg: Dict = dict(type='Conv2d'),
                 norm_cfg: Dict = dict(type='BN'),
                 act_cfg: Dict = dict(type='ReLU'),
                 drop_path_rate: float = 0.,
                 with_cp: bool = False,
                 with_attentive_shortcut: bool = False,
                 **kwargs):

        super().__init__(**kwargs)

        if with_attentive_shortcut:
            # AttentiveNAS-style shortcut: pooled + 1x1-projected input.
            self.shortcut = ShortcutLayer(
                in_channels=self.in_channels,
                out_channels=self.out_channels,
                reduction=self.stride,
                conv_cfg=conv_cfg)
        self.with_attentive_shortcut = with_attentive_shortcut

        # Plain residual add is only valid when shapes are unchanged and
        # the attentive shortcut is not used.
        self.with_res_shortcut = (
            self.stride == 1 and self.in_channels == self.out_channels
            and not self.with_attentive_shortcut)
        assert self.stride in [1, 2]
        self._drop_path_rate = drop_path_rate
        self.kernel_size = kernel_size
        self.conv_cfg = conv_cfg
        self.norm_cfg = norm_cfg
        self.act_cfg = act_cfg
        self.with_cp = with_cp
        self.with_se = se_cfg is not None
        self.mid_channels = self.in_channels * expand_ratio
        # The 1x1 expand conv is skipped when expand_ratio == 1.
        self.with_expand_conv = (self.mid_channels != self.in_channels)

        if self.with_se:
            assert isinstance(se_cfg, dict)

        if self.with_expand_conv:
            self.expand_conv = ConvModule(
                in_channels=self.in_channels,
                out_channels=self.mid_channels,
                kernel_size=1,
                stride=1,
                padding=0,
                conv_cfg=conv_cfg,
                norm_cfg=norm_cfg,
                act_cfg=act_cfg)
        self.depthwise_conv = ConvModule(
            in_channels=self.mid_channels,
            out_channels=self.mid_channels,
            kernel_size=kernel_size,
            stride=self.stride,
            padding=kernel_size // 2,
            groups=self.mid_channels,
            conv_cfg=conv_cfg,
            norm_cfg=norm_cfg,
            act_cfg=act_cfg)
        if self.with_se:
            self.se = SELayer(self.mid_channels, **se_cfg)
        # Linear bottleneck: no activation after the projection.
        self.linear_conv = ConvModule(
            in_channels=self.mid_channels,
            out_channels=self.out_channels,
            kernel_size=1,
            stride=1,
            padding=0,
            conv_cfg=conv_cfg,
            norm_cfg=norm_cfg,
            act_cfg=None)

    @property
    def drop_path_rate(self):
        # Stochastic-depth rate; mutable so schedulers can update it.
        return self._drop_path_rate

    @drop_path_rate.setter
    def drop_path_rate(self, value):
        if not isinstance(value, float):
            raise TypeError('Expected float.')
        self._drop_path_rate = value

    def forward(self, x):
        """Forward function.

        Args:
            x (torch.Tensor): The input tensor.
        Returns:
            torch.Tensor: The output tensor.
        """

        def _inner_forward(x):
            out = x

            if self.with_expand_conv:
                out = self.expand_conv(out)

            out = self.depthwise_conv(out)
            if self.with_se:
                out = self.se(out)

            out = self.linear_conv(out)

            if self.with_res_shortcut:
                # Standard residual path with optional stochastic depth.
                if self.drop_path_rate > 0.:
                    out = drop_path(out, self.drop_path_rate, self.training)
                return x + out

            elif self.with_attentive_shortcut:
                sx = self.shortcut(x)
                # Drop-path only when the shortcut is a true identity
                # (same channels, no spatial reduction).
                if self.drop_path_rate > 0. and \
                        x.size(1) == sx.size(1) and \
                        self.shortcut.reduction == 1:
                    out = drop_path(out, self.drop_path_rate, self.training)
                return sx + out

            else:
                return out

        if self.with_cp and x.requires_grad:
            # Trade compute for memory via activation checkpointing.
            out = cp.checkpoint(_inner_forward, x)
        else:
            out = _inner_forward(x)

        return out
@MODELS.register_module()
class ShuffleBlock(BaseOP):
    """InvertedResidual block for Searchable ShuffleNetV2 backbone.

    Args:
        kernel_size (int): Size of the convolving kernel.
        stride (int): Stride of the convolution layer. Default: 1
        conv_cfg (dict, optional): Config dict for convolution layer.
            Default: None, which means using conv2d.
        norm_cfg (dict): Config dict for normalization layer.
            Default: dict(type='BN').
        act_cfg (dict): Config dict for activation layer.
            Default: dict(type='ReLU').
        with_cp (bool): Use checkpoint or not. Using checkpoint will save some
            memory while slowing down the training speed. Default: False.

    Returns:
        Tensor: The output tensor.
    """

    def __init__(self,
                 kernel_size,
                 conv_cfg=None,
                 norm_cfg=dict(type='BN'),
                 act_cfg=dict(type='ReLU'),
                 with_cp=False,
                 **kwargs):

        super(ShuffleBlock, self).__init__(**kwargs)

        assert kernel_size in [3, 5, 7]
        self.kernel_size = kernel_size
        self.conv_cfg = conv_cfg
        self.norm_cfg = norm_cfg
        self.act_cfg = act_cfg
        self.with_cp = with_cp

        # Each branch carries half of the output channels.
        branch_features = self.out_channels // 2
        if self.stride == 1:
            assert self.in_channels == branch_features * 2, (
                f'in_channels ({self.in_channels}) should equal to '
                f'branch_features * 2 ({branch_features * 2}) '
                'when stride is 1')

        if self.in_channels != branch_features * 2:
            assert self.stride != 1, (
                f'stride ({self.stride}) should not equal 1 when '
                f'in_channels != branch_features * 2')

        # branch1 (dw conv + pw conv) only exists on downsampling blocks;
        # at stride 1 the left half of the channel split passes through.
        if self.stride > 1:
            self.branch1 = nn.Sequential(
                ConvModule(
                    self.in_channels,
                    self.in_channels,
                    kernel_size=self.kernel_size,
                    stride=self.stride,
                    padding=self.kernel_size // 2,
                    groups=self.in_channels,
                    conv_cfg=self.conv_cfg,
                    norm_cfg=self.norm_cfg,
                    act_cfg=None),
                ConvModule(
                    self.in_channels,
                    branch_features,
                    kernel_size=1,
                    stride=1,
                    padding=0,
                    conv_cfg=self.conv_cfg,
                    norm_cfg=self.norm_cfg,
                    act_cfg=self.act_cfg),
            )

        # branch2: pw -> dw -> pw; at stride 1 it sees only the right
        # half of the split channels.
        self.branch2 = nn.Sequential(
            ConvModule(
                self.in_channels if (self.stride > 1) else branch_features,
                branch_features,
                kernel_size=1,
                stride=1,
                padding=0,
                conv_cfg=self.conv_cfg,
                norm_cfg=self.norm_cfg,
                act_cfg=self.act_cfg),
            ConvModule(
                branch_features,
                branch_features,
                kernel_size=self.kernel_size,
                stride=self.stride,
                padding=self.kernel_size // 2,
                groups=branch_features,
                conv_cfg=self.conv_cfg,
                norm_cfg=self.norm_cfg,
                act_cfg=None),
            ConvModule(
                branch_features,
                branch_features,
                kernel_size=1,
                stride=1,
                padding=0,
                conv_cfg=self.conv_cfg,
                norm_cfg=self.norm_cfg,
                act_cfg=self.act_cfg))

    def forward(self, x):
        """Split/downsample, run the branches, concat and channel-shuffle."""

        def _inner_forward(x):
            if self.stride > 1:
                out = torch.cat((self.branch1(x), self.branch2(x)), dim=1)
            else:
                # Channel split: left half is identity, right half goes
                # through branch2.
                x1, x2 = x.chunk(2, dim=1)
                out = torch.cat((x1, self.branch2(x2)), dim=1)

            out = channel_shuffle(out, 2)

            return out

        if self.with_cp and x.requires_grad:
            out = cp.checkpoint(_inner_forward, x)
        else:
            out = _inner_forward(x)

        return out


@MODELS.register_module()
class ShuffleXception(BaseOP):
    """Xception block for ShuffleNetV2 backbone.

    Args:
        conv_cfg (dict, optional): Config dict for convolution layer.
            Defaults to None, which means using conv2d.
        norm_cfg (dict): Config dict for normalization layer.
            Defaults to dict(type='BN').
        act_cfg (dict): Config dict for activation layer.
            Defaults to dict(type='ReLU').
        with_cp (bool): Use checkpoint or not. Using checkpoint will save some
            memory while slowing down the training speed. Defaults to False.

    Returns:
        Tensor: The output tensor.
    """

    def __init__(self,
                 conv_cfg=None,
                 norm_cfg=dict(type='BN'),
                 act_cfg=dict(type='ReLU'),
                 with_cp=False,
                 **kwargs):
        super(ShuffleXception, self).__init__(**kwargs)
        self.conv_cfg = conv_cfg
        self.norm_cfg = norm_cfg
        self.act_cfg = act_cfg
        self.with_cp = with_cp
        self.mid_channels = self.out_channels // 2

        branch_features = self.out_channels // 2
        if self.stride == 1:
            assert self.in_channels == branch_features * 2, (
                f'in_channels ({self.in_channels}) should equal to '
                f'branch_features * 2 ({branch_features * 2}) '
                'when stride is 1')

        if self.in_channels != branch_features * 2:
            assert self.stride != 1, (
                f'stride ({self.stride}) should not equal 1 when '
                f'in_channels != branch_features * 2')

        # Downsampling shortcut branch, as in ShuffleBlock.
        if self.stride > 1:
            self.branch1 = nn.Sequential(
                ConvModule(
                    self.in_channels,
                    self.in_channels,
                    kernel_size=3,
                    stride=self.stride,
                    padding=1,
                    groups=self.in_channels,
                    conv_cfg=self.conv_cfg,
                    norm_cfg=self.norm_cfg,
                    act_cfg=None),
                ConvModule(
                    self.in_channels,
                    branch_features,
                    kernel_size=1,
                    stride=1,
                    padding=0,
                    conv_cfg=self.conv_cfg,
                    norm_cfg=self.norm_cfg,
                    act_cfg=self.act_cfg),
            )

        # Main branch: three stacked depthwise-separable conv modules;
        # only the first one carries the stride.
        self.branch2 = []

        self.branch2.append(
            DepthwiseSeparableConvModule(
                self.in_channels if (self.stride > 1) else branch_features,
                self.mid_channels,
                kernel_size=3,
                stride=self.stride,
                padding=1,
                conv_cfg=self.conv_cfg,
                norm_cfg=self.norm_cfg,
                dw_act_cfg=None,
                act_cfg=self.act_cfg), )
        self.branch2.append(
            DepthwiseSeparableConvModule(
                self.mid_channels,
                self.mid_channels,
                kernel_size=3,
                stride=1,
                padding=1,
                conv_cfg=self.conv_cfg,
                norm_cfg=self.norm_cfg,
                dw_act_cfg=None,
                act_cfg=self.act_cfg))
        self.branch2.append(
            DepthwiseSeparableConvModule(
                self.mid_channels,
                branch_features,
                kernel_size=3,
                stride=1,
                padding=1,
                conv_cfg=self.conv_cfg,
                norm_cfg=self.norm_cfg,
                dw_act_cfg=None,
                act_cfg=self.act_cfg))
        self.branch2 = nn.Sequential(*self.branch2)

    def forward(self, x):
        """Split/downsample, run the branches, concat and channel-shuffle."""

        def _inner_forward(x):
            if self.stride > 1:
                out = torch.cat((self.branch1(x), self.branch2(x)), dim=1)
            else:
                x1, x2 = x.chunk(2, dim=1)
                out = torch.cat((x1, self.branch2(x2)), dim=1)

            out = channel_shuffle(out, 2)

            return out

        if self.with_cp and x.requires_grad:
            out = cp.checkpoint(_inner_forward, x)
        else:
            out = _inner_forward(x)

        return out
class RelativePosition2D(nn.Module):
    """Rethinking and Improving Relative Position Encoding for Vision
    Transformer.

    ICCV 2021. https://arxiv.org/pdf/2107.14222.pdf
    Image RPE (iRPE for short) methods are new relative position encoding
    methods dedicated to 2D images.

    Args:
        head_dims (int): embedding dims of relative position.
        max_relative_position (int): The max relative position distance.
            Defaults to 14.
    """

    def __init__(self, head_dims: int, max_relative_position: int = 14):
        super().__init__()

        self.head_dims = head_dims
        self.max_relative_position = max_relative_position
        # The first element in embeddings_table_v is the vertical embedding
        # for the class token; tables hold 2 * max + 1 distances plus it.
        self.embeddings_table_v = nn.Parameter(
            torch.randn(max_relative_position * 2 + 2, head_dims))
        self.embeddings_table_h = nn.Parameter(
            torch.randn(max_relative_position * 2 + 2, head_dims))

        trunc_normal_(self.embeddings_table_v, std=.02)
        trunc_normal_(self.embeddings_table_h, std=.02)

    def forward(self, length_q, length_k):
        """Return iRPE embeddings of shape (length_q, length_k, head_dims).

        Args:
            length_q (int): Query sequence length, including the cls token.
            length_k (int): Key sequence length, including the cls token.
        """
        # remove the first cls token distance computation
        length_q = length_q - 1
        length_k = length_k - 1
        range_vec_q = torch.arange(length_q)
        range_vec_k = torch.arange(length_k)
        # compute the row and column distance on the (sqrt(L) x sqrt(L))
        # patch grid
        distance_mat_v = (
            range_vec_k[None, :] // int(length_q**0.5) -
            range_vec_q[:, None] // int(length_q**0.5))
        distance_mat_h = (
            range_vec_k[None, :] % int(length_q**0.5) -
            range_vec_q[:, None] % int(length_q**0.5))
        # clip the distance to the range of
        # [-max_relative_position, max_relative_position]
        distance_mat_clipped_v = torch.clamp(distance_mat_v,
                                             -self.max_relative_position,
                                             self.max_relative_position)
        distance_mat_clipped_h = torch.clamp(distance_mat_h,
                                             -self.max_relative_position,
                                             self.max_relative_position)

        # translate the distance from [1, 2 * max_relative_position + 1],
        # 0 is for the cls token
        final_mat_v = distance_mat_clipped_v + self.max_relative_position + 1
        final_mat_h = distance_mat_clipped_h + self.max_relative_position + 1
        # pad the 0 which represent the cls token
        final_mat_v = torch.nn.functional.pad(final_mat_v, (1, 0, 1, 0),
                                              'constant', 0)
        final_mat_h = torch.nn.functional.pad(final_mat_h, (1, 0, 1, 0),
                                              'constant', 0)

        # FIX: use ``.long()`` instead of the legacy ``torch.LongTensor(t)``
        # constructor, which is deprecated and type-fragile; the matrices
        # are already integer tensors, so this is a pure dtype guarantee.
        final_mat_v = final_mat_v.long()
        final_mat_h = final_mat_h.long()
        # get the embeddings with the corresponding distance
        embeddings = self.embeddings_table_v[
            final_mat_v] + self.embeddings_table_h[final_mat_h]

        return embeddings
+ """ + + def __init__(self, + embed_dims: int, + num_heads: int, + input_dims: Optional[int] = None, + attn_drop_rate: float = 0., + proj_drop_rate: float = 0., + out_drop_rate: float = 0., + relative_position: Optional[bool] = True, + max_relative_position: int = 14, + qkv_bias: bool = True, + qk_scale: Optional[float] = None, + proj_bias: bool = True, + v_shortcut: bool = False, + init_cfg: Optional[dict] = None): + super().__init__() + + self.input_dims = input_dims or embed_dims + self.embed_dims = embed_dims + self.num_heads = num_heads + self.v_shortcut = v_shortcut + self.relative_position = relative_position + self.max_relative_position = max_relative_position + + self.head_dims = 64 # unit + self.scale = qk_scale or self.head_dims**-0.5 + + self.q_embed_dims = num_heads * self.head_dims + + self.w_qs = nn.Linear( + self.input_dims, num_heads * self.head_dims, bias=qkv_bias) + self.w_ks = nn.Linear( + self.input_dims, num_heads * self.head_dims, bias=qkv_bias) + self.w_vs = nn.Linear( + self.input_dims, num_heads * self.head_dims, bias=qkv_bias) + + self.attn_drop = nn.Dropout(attn_drop_rate) + self.proj_drop = nn.Dropout(proj_drop_rate) + self.out_drop = nn.Dropout(out_drop_rate) + + self.proj = nn.Linear( + num_heads * self.head_dims, embed_dims, bias=proj_bias) + + # image relative position encoding + if self.relative_position: + self.rel_pos_embed_k = RelativePosition2D( + self.head_dims, self.max_relative_position) + self.rel_pos_embed_v = RelativePosition2D( + self.head_dims, self.max_relative_position) + + def forward(self, x): + B, N, _ = x.shape + + q = self.w_qs(x).view(B, N, self.num_heads, self.head_dims) + k = self.w_ks(x).view(B, N, self.num_heads, self.head_dims) + v = self.w_vs(x).view(B, N, self.num_heads, self.head_dims) + + q, k, v = q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2) + + attn = (q @ k.transpose(-2, -1)) * self.scale + + if self.relative_position: + r_p_k = self.rel_pos_embed_k(N, N) + attn = attn + (q.permute(2, 0, 
1, 3).reshape(N, self.num_heads * B, -1) # noqa: E501 + @ r_p_k.transpose(2, 1)) \ + .transpose(1, 0).reshape(B, self.num_heads, N, N) * self.scale + + attn = attn.softmax(dim=-1) + attn = self.attn_drop(attn) + x = (attn @ v).transpose(1, 2).reshape(B, N, -1) + + if self.relative_position: + r_p_v = self.rel_pos_embed_v(N, N) + t_attn = attn.permute(2, 0, 1, 3).reshape(N, B * self.num_heads, + -1) + x = x + (t_attn @ r_p_v).transpose(1, 0).reshape( + B, self.num_heads, N, -1).transpose(2, 1).reshape(B, N, -1) + + x = self.proj(x) + x = self.out_drop(self.proj_drop(x)) + + if self.v_shortcut: + x = v.squeeze(1) + x + return x diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/utils/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/utils/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..7072a29d8d34af1802e2740b69173cd03cb909f4 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/utils/__init__.py @@ -0,0 +1,5 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .mutable_register import mutate_conv_module, mutate_mobilenet_layer +from .set_dropout import set_dropout + +__all__ = ['mutate_conv_module', 'mutate_mobilenet_layer', 'set_dropout'] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/utils/mutable_register.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/utils/mutable_register.py new file mode 100644 index 0000000000000000000000000000000000000000..f3a916748eba2f9de3e3921bd4a61d1528fd79ac --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/architectures/utils/mutable_register.py @@ -0,0 +1,86 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
import copy
from typing import Optional, Sequence, Tuple

from mmrazor.models.architectures.ops.mobilenet_series import MBBlock
from ...mutables.base_mutable import BaseMutable
from ...mutables.mutable_channel import MutableChannelContainer


def mutate_conv_module(
        conv_module,
        mutable_in_channels: Optional[BaseMutable] = None,
        mutable_out_channels: Optional[BaseMutable] = None,
        mutable_kernel_size: Optional[Tuple[BaseMutable,
                                            Sequence[int]]] = None):
    """Mutate a conv module.

    Registers the given channel/kernel mutables on ``conv_module.conv`` (and
    its ``bn``, if present) so the module's channels/kernel size become
    searchable. Arguments left as None are simply not registered.
    """
    if mutable_in_channels is not None:
        MutableChannelContainer.register_mutable_channel_to_module(
            conv_module.conv, mutable_in_channels, False)

    if mutable_out_channels is not None:
        MutableChannelContainer.register_mutable_channel_to_module(
            conv_module.conv, mutable_out_channels, True)

        # BN channels must track the conv's output channels.
        if hasattr(conv_module, 'bn'):
            MutableChannelContainer.register_mutable_channel_to_module(
                conv_module.bn, mutable_out_channels, False)

    if mutable_kernel_size is not None:
        conv_module.conv.register_mutable_attr('kernel_size',
                                               mutable_kernel_size)


def mutate_mobilenet_layer(mb_layer: MBBlock,
                           mutable_in_channels,
                           mutable_out_channels,
                           mutable_expand_ratio,
                           mutable_kernel_size,
                           fine_grained_mode: bool = False):
    """Mutate MobileNet layers.

    Wires channel/kernel mutables through every sub-module of one inverted
    residual block (expand conv -> depthwise conv -> optional SE -> linear
    conv, plus the optional attentive shortcut).
    """
    # Hidden width of the block = expand_ratio * in_channels (derived, so it
    # follows both parent mutables).
    mb_layer.derived_expand_channels = \
        mutable_expand_ratio * mutable_in_channels

    if mb_layer.with_expand_conv:
        mutate_conv_module(
            mb_layer.expand_conv,
            mutable_in_channels=mutable_in_channels,
            mutable_out_channels=mb_layer.derived_expand_channels)

    # Depthwise conv keeps in == out channels; only here is the kernel size
    # searchable.
    mutate_conv_module(
        mb_layer.depthwise_conv,
        mutable_in_channels=mb_layer.derived_expand_channels,
        mutable_out_channels=mb_layer.derived_expand_channels,
        mutable_kernel_size=mutable_kernel_size)

    if mb_layer.with_se:
        # SE squeeze width = expand_channels / 4, rounded to a multiple of 8
        # (derive_divide_mutable(4, 8)).
        if fine_grained_mode:
            # Fine-grained mode gives the SE branch its own copy of the
            # expand-ratio mutable (aliased '*_se') so it can be searched
            # independently of the main branch.
            mutable_expand_ratio2 = copy.deepcopy(mutable_expand_ratio)
            mutable_expand_ratio2.alias += '_se'
            derived_se_channels = mutable_expand_ratio2 * mutable_in_channels
            mb_layer.derived_se_channels = \
                derived_se_channels.derive_divide_mutable(4, 8)
        else:
            mb_layer.derived_se_channels = \
                mb_layer.derived_expand_channels.derive_divide_mutable(4, 8)

        mutate_conv_module(
            mb_layer.se.conv1,
            mutable_in_channels=mb_layer.derived_expand_channels,
            mutable_out_channels=mb_layer.derived_se_channels)
        mutate_conv_module(
            mb_layer.se.conv2,
            mutable_in_channels=mb_layer.derived_se_channels,
            mutable_out_channels=mb_layer.derived_expand_channels)

    if not mb_layer.with_res_shortcut:
        # When in/out shapes differ, the attentive shortcut conv must follow
        # the block's in/out channel mutables.
        if mb_layer.with_attentive_shortcut:
            MutableChannelContainer.register_mutable_channel_to_module(
                mb_layer.shortcut.conv, mutable_in_channels, False)
            MutableChannelContainer.register_mutable_channel_to_module(
                mb_layer.shortcut.conv, mutable_out_channels, True)

    # Final pointwise (linear) conv projects back to the block output width.
    mutate_conv_module(
        mb_layer.linear_conv,
        mutable_in_channels=mb_layer.derived_expand_channels,
        mutable_out_channels=mutable_out_channels)
+ """ + assert hasattr(module, 'drop_path_rate') + visited_block_nums = 0 + total_block_nums = len([ + block for layer in layers for block in layer + if isinstance(block, module) + ]) + for idx, layer in enumerate(layers, start=1): + assert isinstance(layer, DynamicSequential) + mblayer_nums = len( + [block for block in layer if isinstance(block, module)]) + visited_block_nums += mblayer_nums + if idx not in dropout_stages: + continue + + for block_idx, block in enumerate(layer): + if isinstance(block, module) and hasattr(block, 'drop_path_rate'): + ratio = (visited_block_nums - mblayer_nums + + block_idx) / total_block_nums + block.drop_path_rate = drop_path_rate * ratio diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/distillers/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/distillers/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..d2d70bd2605c9cfb021884f5d6929d79da57933d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/distillers/__init__.py @@ -0,0 +1,9 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .base_distiller import BaseDistiller +from .byot_distiller import BYOTDistiller +from .configurable_distiller import ConfigurableDistiller +from .ofd_distiller import OFDDistiller + +__all__ = [ + 'ConfigurableDistiller', 'BaseDistiller', 'BYOTDistiller', 'OFDDistiller' +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/distillers/base_distiller.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/distillers/base_distiller.py new file mode 100644 index 0000000000000000000000000000000000000000..4cf575e9058cb6fc64f5b7df44ca7efd39f08488 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/distillers/base_distiller.py @@ -0,0 +1,22 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
class BaseDistiller(BaseModule, ABC):
    """Abstract base class that every distiller inherits from.

    Concrete distillers must implement :meth:`compute_distill_losses`, which
    turns recorded intermediate results into a loss dict.

    Args:
        init_cfg (dict, optional): Config for distiller. Default to None.
    """

    def __init__(self, init_cfg: Optional[Dict] = None) -> None:
        super().__init__(init_cfg)

    @abstractmethod
    def compute_distill_losses(self) -> LossResults:
        """Compute all distillation losses automatically."""
+ """ + + if from_student: + recorder_ = self.student_recorders.get_recorder(recorder) + else: + recorder_ = self.teacher_recorders.get_recorder(recorder) + record_data = recorder_.get_record_data(record_idx, data_idx) + + if connector: + record_data = self.connectors[connector](record_data) + if connector_idx is not None: + record_data = record_data[connector_idx] + # Detach self-teacher output Tensor from model, assert hook tensor. + if not from_student: + record_data = record_data.detach() + + return record_data diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/distillers/configurable_distiller.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/distillers/configurable_distiller.py new file mode 100644 index 0000000000000000000000000000000000000000..e6c5c267e988b951cadb436ed32940ad5a74b6ab --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/distillers/configurable_distiller.py @@ -0,0 +1,294 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import warnings +from inspect import signature +from typing import Dict, List, Optional, Union + +from mmengine.model import BaseModel +from torch import nn + +from mmrazor.registry import MODELS +from ..algorithms.base import LossResults +from ..task_modules import DistillDeliveryManager, RecorderManager +from .base_distiller import BaseDistiller + + +@MODELS.register_module() +class ConfigurableDistiller(BaseDistiller): + """``ConfigurableDistiller`` is a powerful tool that can reproduce most + distillation algorithms without modifying the code of teacher or student + models. + + ``ConfigurableDistiller`` can get various intermediate results of the + model in a hacky way by ``Recorder``. More details see user-docs for + ``Recorder``. + + ``ConfigurableDistiller`` can use the teacher's intermediate results to + override the student's intermediate results in a hacky way by ``Delivery``. + More details see user-docs for ``Delivery``. 
+ + Args: + student_recorders (dict, optional): Config for multiple recorders. A + student model may have more than one recorder. These recorders + only record the student model's intermediate results. Defaults to + None. + teacher_recorders (dict, optional): Config for multiple recorders. A + teacher model may have more than one recorder. These recorders + only record the teacher model's intermediate results. Defaults to + None. + distill_deliveries (dict, optional): Config for multiple deliveries. A + distill algorithm may have more than one delivery. Defaults to + None. + connectors (dict, optional): Config for multiple connectors. A + distillation model may have more than one connector. Defaults to + None. + distill_losses: (Dict[str, Dict], optional): Config for multiple + distill losses. A distill algorithm may have more than one distill + loss. Defaults to None. + loss_forward_mappings: (Dict[str, Dict], optional): Mapping between + distill loss forward arguments and records. + + Note: + If a distill loss needs to backward, the name of the loss must contain + "loss". If it is only used as a statistical value, the name can not + contain "loss". More details see docs for + :func:`mmengine.model.BaseModel._parse_loss`. + + Note: + The keys of ``loss_forward_mappings`` should be consistent with the + keys of ``distill_losses``. + + Each item in ``loss_forward_mappings`` is a mapping between a distill + loss and its forward arguments. The keys of the mapping are the + signature of the loss's forward, and the values of the mapping are the + recorded data location. + + ``from_recorder``refers to the recorder where the data is stored, and + if ``from_student`` is True, it means the recorder is in ` + `student_recorders``; otherwise, it means the recorder is in + ``teacher_recorders``. + + A connector can be called according to its `connector_name`, so that a + input can use a different connector in different loss. + + Examples: + >>> distill_losses = dict( + ... 
    def __init__(self,
                 student_recorders: Optional[Dict[str, Dict]] = None,
                 teacher_recorders: Optional[Dict[str, Dict]] = None,
                 distill_deliveries: Optional[Dict[str, Dict]] = None,
                 connectors: Optional[Dict[str, Dict]] = None,
                 distill_losses: Optional[Dict[str, Dict]] = None,
                 loss_forward_mappings: Optional[Dict[str, Dict]] = None,
                 **kwargs):
        super().__init__(**kwargs)
        # The recorder manager is just constructed, but not really initialized
        # yet. Recorder manager initialization needs to input the corresponding
        # model (see ``prepare_from_student`` / ``prepare_from_teacher``).
        self.student_recorders = RecorderManager(student_recorders)
        self.teacher_recorders = RecorderManager(teacher_recorders)

        self.deliveries = DistillDeliveryManager(distill_deliveries)

        # Losses are built before connectors because
        # ``_check_loss_forward_mappings`` below needs the loss modules'
        # forward signatures.
        self.distill_losses = self.build_distill_losses(distill_losses)

        self.connectors = self.build_connectors(connectors)

        if loss_forward_mappings:
            # Check if loss_forward_mappings is in the correct format.
            self._check_loss_forward_mappings(self.distill_losses,
                                              loss_forward_mappings,
                                              self.student_recorders,
                                              self.teacher_recorders)
            self.loss_forward_mappings = loss_forward_mappings
        else:
            self.loss_forward_mappings = dict()

    def set_deliveries_override(self, override: bool) -> None:
        """Set the `override_data` of all deliveries."""
        self.deliveries.override_data = override

    def prepare_from_student(self, model: BaseModel) -> None:
        """Initialize student recorders by hooking them into ``model``."""
        self.student_recorders.initialize(model)

    def prepare_from_teacher(self, model: nn.Module) -> None:
        """Initialize teacher recorders by hooking them into ``model``."""
        self.teacher_recorders.initialize(model)

    def build_connectors(
        self,
        connectors: Optional[Union[Dict[str, List], Dict[str, Dict]]] = None,
    ) -> nn.ModuleDict:
        """Build connector modules from config.

        Each value may be a single connector cfg (dict) or a list of cfgs,
        which are chained into an ``nn.Sequential``.
        """

        # NOTE(review): 'connecotrs' is a typo for 'connectors'; it is a
        # local name only, so behavior is unaffected.
        distill_connecotrs = nn.ModuleDict()
        if connectors:
            for connector_name, connector_cfg in connectors.items():
                if isinstance(connector_cfg, dict):
                    connector = MODELS.build(connector_cfg)
                    distill_connecotrs[connector_name] = connector
                else:
                    assert isinstance(connector_cfg, list)
                    module_list = []
                    for cfg in connector_cfg:
                        connector = MODELS.build(cfg)
                        module_list.append(connector)
                    distill_connecotrs[connector_name] = nn.Sequential(
                        *module_list)

        return distill_connecotrs
More details ' + 'see docs for ' + ':func:`mmengine.model.BaseModel._parse_loss`', + UserWarning) + item_loss = MODELS.build(loss_cfg) + distill_losses[loss_name] = item_loss + + return distill_losses + + def get_record(self, + recorder: str, + from_student: bool, + record_idx: int = 0, + data_idx: Optional[int] = None, + connector: Optional[str] = None, + connector_idx: Optional[int] = None) -> List: + """According to each item in ``record_infos``, get the corresponding + record in ``recorder_manager``.""" + + if from_student: + recorder_ = self.student_recorders.get_recorder(recorder) + else: + recorder_ = self.teacher_recorders.get_recorder(recorder) + record_data = recorder_.get_record_data(record_idx, data_idx) + + if connector: + record_data = self.connectors[connector](record_data) + if connector_idx is not None: + record_data = record_data[connector_idx] + + return record_data + + def compute_distill_losses(self) -> LossResults: + """Compute distill losses automatically.""" + # Record all computed losses' results. + losses = dict() + for loss_name, forward_mappings in self.loss_forward_mappings.items(): + forward_kwargs = dict() + for forward_key, record in forward_mappings.items(): + forward_var = self.get_record(**record) + forward_kwargs[forward_key] = forward_var + + loss_module = self.distill_losses[loss_name] + loss = loss_module(**forward_kwargs) # type: ignore + # add computed loss result. 
    def _check_loss_forward_mappings(
            self, losses: nn.ModuleDict, loss_forward_mappings: Dict[str,
                                                                     Dict],
            student_recorders: RecorderManager,
            teacher_recorders: RecorderManager) -> None:
        """Check if ``loss_forward_mappings`` is in the correct format.

        Validates that every mapped loss exists, that every mapped forward
        argument is in the loss's forward signature, and that every
        non-default argument names a recorder present in the corresponding
        recorder manager (and, if given, a registered connector).

        Raises:
            TypeError: If ``loss_forward_mappings`` (or one of its items, or
                a ``from_student`` flag) has the wrong type.
            AssertionError: If a name does not match a loss, forward
                parameter, recorder, or connector.
        """

        if not isinstance(loss_forward_mappings, dict):
            raise TypeError(
                'loss_forward_mappings should be a dict instance, but got'
                f'{type(loss_forward_mappings)}')

        for loss_name, forward_mappings in loss_forward_mappings.items():
            assert loss_name in losses, \
                f'"{loss_name}" is not in distill losses. The keys of ' \
                'loss_forward_kwargs must match the keys of distill_losses.'

            if not isinstance(forward_mappings, dict):
                raise TypeError(
                    'Each item of loss_forward_mappings should be a dict '
                    f'instance, but got {type(forward_mappings)}')

            # Inspect the loss's forward signature so mapped argument names
            # can be validated against it.
            loss_module = losses[loss_name]
            loss_forward_params = signature(loss_module.forward).parameters
            loss_forward_keys = loss_forward_params.keys()
            # Allow default params.
            # Check non-default params, not len(params).

            for forward_key, record_info in forward_mappings.items():
                assert forward_key in loss_forward_keys, \
                    f'{forward_key} is not in the signature of \
                    {type(loss_module).__name__} forward, \
                    please check your config.'

                if (loss_forward_params[forward_key].default !=
                        loss_forward_params[forward_key].empty):
                    # default params without check
                    continue

                assert 'recorder' in record_info, \
                    'Each item of loss_forward_mappings should have ' \
                    '"recorder", pls check your config.'

                assert 'from_student' in record_info, \
                    'Each item of loss_forward_mappings should have ' \
                    '"from_student", pls check your config.'

                recorder: str = record_info['recorder']
                from_student: bool = record_info['from_student']

                if not isinstance(from_student, bool):
                    raise TypeError(f'from_student should be a bool instance, '
                                    f'but got {type(from_student)}')

                # The named recorder must live in the manager matching the
                # ``from_student`` flag.
                if from_student:
                    assert recorder in student_recorders.recorders, \
                        f'For {forward_key}, "{recorder}" must be in \
                        `student_recorders`.'

                else:
                    assert recorder in teacher_recorders.recorders, \
                        f'For {forward_key}, "{recorder}" must be in \
                        `teacher_recorders`.'

                if 'connector' in record_info:
                    connector: str = record_info['connector']
                    assert connector in self.connectors, \
                        f'{connector} must be in "connectors".'
+ """ + + def init_ofd_connectors(self, teacher: nn.Module) -> None: + """Initialize OFD connectors' `margin`.""" + for loss_key, loss_forward_mapping in self.loss_forward_mappings.items( + ): + if isinstance(self.distill_losses[loss_key], OFDLoss): + for _input_keys, _input_mapping in loss_forward_mapping.items( + ): + if 'connector' in _input_mapping and not _input_mapping[ + 'from_student']: + + recorder = self.teacher_recorders.get_recorder( + _input_mapping['recorder']) + module_key = recorder.source + bn_module = attrgetter(module_key)(teacher) + + assert isinstance( + bn_module, (nn.BatchNorm2d, nn.SyncBatchNorm) + ), ('Overhaul distillation only support connection on ' + 'layers: [`BatchNorm2d`, `SyncBatchNorm`]') + connector = self.connectors[ + _input_mapping['connector']] + assert isinstance(connector, OFDTeacherConnector), ( + 'OFD loss mapping for `t_feature` expect type ' + '`OFDTeacherConnector`, but get ' + f'`{type(connector)}`') + margin = self._get_margin_from_BN(bn_module) + connector.init_margin(margin) + + def _get_margin_from_BN(self, bn: nn.BatchNorm2d) -> torch.Tensor: + """Get margin from BN layer. + + Args: + bn (nn.BatchNorm2d): input module, must be a BN layer. 
+ + Returns: + torch.Tensor: margin + """ + margin = [] + std = bn.weight.data + mean = bn.bias.data + for (s, m) in zip(std, mean): + s = abs(s.item()) + m = m.item() + if norm.cdf(-m / s) > 0.001: + margin.append(-s * math.exp(-(m / s)**2 / 2) / + math.sqrt(2 * math.pi) / norm.cdf(-m / s) + m) + else: + margin.append(-3 * s) + return torch.FloatTensor(margin).unsqueeze(1).unsqueeze(2).unsqueeze( + 0).detach() diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/fake_quants/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/fake_quants/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..95082121039929131602f4a1d41db7b6b416cda8 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/fake_quants/__init__.py @@ -0,0 +1,10 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .base import BaseFakeQuantize +from .lsq import (LearnableFakeQuantize, enable_param_learning, + enable_static_estimate, enable_val) +from .torch_fake_quants import register_torch_fake_quants + +__all__ = [ + 'BaseFakeQuantize', 'register_torch_fake_quants', 'LearnableFakeQuantize', + 'enable_val', 'enable_param_learning', 'enable_static_estimate' +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/fake_quants/base.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/fake_quants/base.py new file mode 100644 index 0000000000000000000000000000000000000000..45aed7421269a4e6c79a5fa8c1af329743d3696d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/fake_quants/base.py @@ -0,0 +1,8 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
def enable_param_learning(mod):
    """Enables learning of quantization parameters, if applicable.

    Example usage::

        # model is any PyTorch model
        model.apply(enable_param_learning)
    """
    if not isinstance(mod, LearnableFakeQuantize):
        return
    mod.enable_param_learning()


def enable_static_estimate(mod):
    """Enables static observer estimates, if applicable.

    Example usage::

        # model is any PyTorch model
        model.apply(enable_static_estimate)
    """
    if not isinstance(mod, LearnableFakeQuantize):
        return
    mod.enable_static_estimate()


def enable_val(mod):
    """Enable validation, if applicable.

    Example usage::

        # model is any PyTorch model
        model.apply(enable_val)
    """
    if not isinstance(mod, LearnableFakeQuantize):
        return
    mod.enable_val()
    def __init__(self,
                 observer,
                 quant_min=0,
                 quant_max=255,
                 scale=1.,
                 zero_point=0.,
                 use_grad_scaling=True,
                 zero_point_trainable=False,
                 **observer_kwargs):
        super(LearnableFakeQuantize, self).__init__()
        assert quant_min < quant_max, \
            'quant_min must be strictly less than quant_max.'
        self.quant_min = quant_min
        self.quant_max = quant_max
        # also pass quant_min and quant_max to observer
        observer_kwargs['quant_min'] = quant_min
        observer_kwargs['quant_max'] = quant_max
        self.use_grad_scaling = use_grad_scaling

        # scale is always a learnable Parameter; zero_point is a Parameter
        # only when requested, otherwise a (non-learnable) buffer.
        self.scale = Parameter(torch.tensor([scale]))
        self.zero_point_trainable = zero_point_trainable
        if zero_point_trainable:
            self.zero_point = Parameter(torch.tensor([zero_point]))
        else:
            self.register_buffer('zero_point', torch.tensor([zero_point]))

        self.activation_post_process = observer(**observer_kwargs)
        # The requested [quant_min, quant_max] must fit inside the observer
        # dtype's representable integer range.
        assert \
            torch.iinfo(self.activation_post_process.dtype).min <= quant_min, \
            'quant_min out of bound'
        assert \
            quant_max <= torch.iinfo(self.activation_post_process.dtype).max, \
            'quant_max out of bound'
        self.dtype = self.activation_post_process.dtype
        self.qscheme = self.activation_post_process.qscheme
        self.ch_axis = self.activation_post_process.ch_axis \
            if hasattr(self.activation_post_process, 'ch_axis') else -1
        # uint8 flag buffers (not plain bools) so the mode survives
        # state_dict save/load and scripting.
        self.register_buffer('fake_quant_enabled',
                             torch.tensor([1], dtype=torch.uint8))
        self.register_buffer('static_enabled',
                             torch.tensor([1], dtype=torch.uint8))
        self.register_buffer('learning_enabled',
                             torch.tensor([0], dtype=torch.uint8))

        # Effective bit width, e.g. quant range 0..255 -> 8 bits.
        bitrange = torch.tensor(quant_max - quant_min + 1).double()
        self.bitwidth = int(torch.log2(bitrange).item())
        # Smallest positive float32, used as the lower clamp for scale.
        self.register_buffer('eps',
                             torch.tensor([torch.finfo(torch.float32).eps]))
+ """ + self.toggle_qparam_learning(enabled=False) \ + .toggle_fake_quant(enabled=True) \ + .toggle_observer_update(enabled=True) + + @torch.jit.export + def enable_val(self): + """Disables static observer accumulating data from input and doesn't + update the quantization parameters. + + Forward path returns fake quantized X. + """ + self.toggle_qparam_learning(enabled=False) \ + .toggle_fake_quant(enabled=True) \ + .toggle_observer_update(enabled=False) + + @torch.jit.export + def enable_static_observation(self): + """Enables static observer accumulating data from input but doesn't + update the quantization parameters. + + Forward path returns the original X. + """ + self.toggle_qparam_learning(enabled=False) \ + .toggle_fake_quant(enabled=False) \ + .toggle_observer_update(enabled=True) + + @torch.jit.export + def toggle_observer_update(self, enabled=True): + """Toggles whether static observer accumulates data from input.""" + self.static_enabled[0] = int(enabled) + return self + + @torch.jit.export + def enable_observer(self, enabled=True): + """Enables static observer accumulating data from input.""" + self.toggle_observer_update(enabled) + + @torch.jit.export + def toggle_qparam_learning(self, enabled=True): + """Toggles whether the quantization parameters are learnable.""" + self.learning_enabled[0] = int(enabled) + self.scale.requires_grad = enabled + if self.zero_point_trainable: + self.zero_point.requires_grad = enabled + return self + + @torch.jit.export + def toggle_fake_quant(self, enabled=True): + """Toggles whether the fake quantization is enabled.""" + self.fake_quant_enabled[0] = int(enabled) + return self + + @torch.jit.export + def observe_quant_params(self): + """Shows the quantization parameters.""" + print('LearnableFakeQuantize Scale: {}'.format(self.scale.detach())) + print('LearnableFakeQuantize Zero Point: {}'.format( + self.zero_point.detach())) + + @torch.jit.export + def calculate_qparams(self): + """Calculate the quantization 
parameters.""" + self.scale.data.clamp_(min=self.eps.item()) + scale = self.scale.detach() + zero_point = self.zero_point.detach().round().clamp( + self.quant_min, self.quant_max).long() + return scale, zero_point + + def forward(self, X): + """Forward computation. + + Forward path returns fake quantized X. + """ + if self.static_enabled[0] == 1: + self.activation_post_process(X.detach()) + _scale, _zero_point = \ + self.activation_post_process.calculate_qparams() + _scale = _scale.to(self.scale.device) + _zero_point = _zero_point.to(self.zero_point.device) + + if self.qscheme in (torch.per_channel_symmetric, + torch.per_channel_affine): + self.scale.data = torch.ones_like(_scale) + self.zero_point.data = torch.zeros_like(_zero_point.float()) + + self.scale.data.copy_(_scale) + self.zero_point.data.copy_(_zero_point) + else: + self.scale.data.clamp_(min=self.eps.item()) + + if self.fake_quant_enabled[0] == 1: + + if self.use_grad_scaling: + grad_factor = 1.0 / (X.numel() * self.quant_max)**0.5 + else: + grad_factor = 1.0 + if self.qscheme in (torch.per_channel_symmetric, + torch.per_channel_affine): + X = torch._fake_quantize_learnable_per_channel_affine( + X, self.scale, self.zero_point, self.ch_axis, + self.quant_min, self.quant_max, grad_factor) + else: + if not (self.quant_min <= self.zero_point <= self.quant_max): + print(self.quant_min, self.zero_point, self.quant_max) + X = torch._fake_quantize_learnable_per_tensor_affine( + X, self.scale, self.zero_point, self.quant_min, + self.quant_max, grad_factor) + + return X + + def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict, + missing_keys, unexpected_keys, error_msgs): + """Removing this function throws an error that the the size of the + loaded tensor does not match the original size i.e., These buffers + start out with numel 0 and become numel 1 once they have their first + forward pass. 
+ + Modified from https://github.com/pytorch/pytorch/blob/master/torch/ao/quantization/fake_quantize.py # noqa:E501 + """ + local_state = ['scale', 'zero_point'] + for name in local_state: + key = prefix + name + if key in state_dict: + val = state_dict[key] + # Custom handling to allow loading scale and zero_point + # of size N into uninitialized buffers of size 0. The + # buffers are resized here, and the values are copied in + # the default state_dict loading code of the parent. + if name == 'scale': + self.scale.data = self.scale.data.resize_(val.shape) + else: + assert name == 'zero_point' + self.zero_point.data = self.zero_point.data.resize_( + val.shape) + # For torchscript module we need to update the attributes here + # since we do not call the `_load_from_state_dict` function + # defined module.py + if torch.jit.is_scripting(): + if name == 'scale': + self.scale.copy_(val) + else: + assert name == 'zero_point' + self.zero_point.copy_(val) + elif strict: + missing_keys.append(key) + super(LearnableFakeQuantize, + self)._load_from_state_dict(state_dict, prefix, local_metadata, + strict, missing_keys, + unexpected_keys, error_msgs) + + @torch.jit.export + def extra_repr(self): + """The printable representational string.""" + repr_str = f'static_enabled={self.static_enabled}, ' + repr_str += f'fake_quant_enabled={self.fake_quant_enabled}, ' + repr_str += f'quant_min={self.activation_post_process.quant_min}, ' + repr_str += f'quant_max={self.activation_post_process.quant_max}, ' + repr_str += f'dtype={self.dtype}, ' + repr_str += f'qscheme={self.qscheme}, ' + repr_str += f'scale={self.scale}, ' + repr_str += f'zero_point={self.zero_point}, ' + repr_str += f'zero_point_trainable={self.zero_point_trainable}' + return repr_str diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/fake_quants/torch_fake_quants.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/fake_quants/torch_fake_quants.py new file mode 100644 index 
# --- mmrazor/models/fake_quants/torch_fake_quants.py ---
# Copyright (c) OpenMMLab. All rights reserved.
import inspect
from typing import List

from mmrazor.registry import MODELS

try:
    import torch.ao.quantization.fake_quantize as torch_fake_quant_src
except ImportError:
    from mmrazor.utils import get_package_placeholder
    torch_fake_quant_src = get_package_placeholder('torch>=1.13')


# TORCH_fake_quants = register_torch_fake_quants()
# TORCH_fake_quants including:
# FakeQuantize
# FakeQuantizeBase
# FixedQParamsFakeQuantize
# FusedMovingAvgObsFakeQuantize
def register_torch_fake_quants() -> List[str]:
    """Register fake_quants in ``torch.ao.quantization.fake_quantize`` to the
    ``MODELS`` registry.

    Only classes derived from ``FakeQuantizeBase`` are registered; names that
    are private or pre-built default instances (``default_*``) are skipped.

    Returns:
        List[str]: A list of registered fake_quants' names.
    """
    torch_fake_quants = []
    for module_name in dir(torch_fake_quant_src):
        # `startswith('_')` already covers dunder names ('__...'), so the
        # original extra `startswith('__')` test was redundant; `default*`
        # entries are module-level instances, not classes.
        if module_name.startswith(('_', 'default')):
            continue
        _fake_quant = getattr(torch_fake_quant_src, module_name)
        if inspect.isclass(_fake_quant) and issubclass(
                _fake_quant, torch_fake_quant_src.FakeQuantizeBase):
            # Register only once; a name already present in MODELS wins.
            if MODELS.get(module_name) is None:
                MODELS.register_module(module=_fake_quant)
                torch_fake_quants.append(module_name)
    return torch_fake_quants


# --- mmrazor/models/losses/__init__.py ---
# Copyright (c) OpenMMLab. All rights reserved.
from .ab_loss import ABLoss
from .at_loss import ATLoss
from .crd_loss import CRDLoss
from .cross_entropy_loss import CrossEntropyLoss
from .cwd import ChannelWiseDivergence
from .dafl_loss import ActivationLoss, InformationEntropyLoss, OnehotLikeLoss
from .decoupled_kd import DKDLoss
from .dist_loss import DISTLoss
from .factor_transfer_loss import FTLoss
from .fbkd_loss import FBKDLoss
from .kd_soft_ce_loss import KDSoftCELoss
from .kl_divergence import KLDivergence
from .l1_loss import L1Loss
from .l2_loss import L2Loss
from .mgd_loss import MGDLoss
from .ofd_loss import OFDLoss
from .pkd_loss import PKDLoss
from .relational_kd import AngleWiseRKD, DistanceWiseRKD
from .weighted_soft_label_distillation import WSLD

__all__ = [
    'ChannelWiseDivergence', 'KLDivergence', 'AngleWiseRKD', 'DistanceWiseRKD',
    'WSLD', 'L2Loss', 'ABLoss', 'DKDLoss', 'KDSoftCELoss', 'ActivationLoss',
    'OnehotLikeLoss', 'InformationEntropyLoss', 'FTLoss', 'ATLoss', 'OFDLoss',
    'L1Loss', 'FBKDLoss', 'CRDLoss', 'CrossEntropyLoss', 'PKDLoss', 'MGDLoss',
    'DISTLoss'
]

# --- mmrazor/models/losses/ab_loss.py ---
# Copyright (c) OpenMMLab. All rights reserved.
import torch
import torch.nn as nn

from mmrazor.registry import MODELS


@MODELS.register_module()
class ABLoss(nn.Module):
    """Activation Boundaries Loss.

    Paper: Knowledge Transfer via Distillation of Activation Boundaries
    Formed by Hidden Neurons, AAAI2019. https://arxiv.org/pdf/1811.03233.pdf

    Modified from: https://github.com/facebookresearch/AlphaNet

    Args:
        loss_weight (float): Weight of loss. Defaults to 1.0.
        margin (float): Relaxation for training stability. Defaults to 1.0.
    """

    def __init__(
        self,
        loss_weight: float = 1.0,
        margin: float = 1.0,
    ) -> None:
        super().__init__()
        self.loss_weight = loss_weight
        self.margin = margin

    def forward(
        self,
        s_feature: torch.Tensor,
        t_feature: torch.Tensor,
    ) -> torch.Tensor:
        """Compute the weighted, batch-normalized AB loss.

        Args:
            s_feature (torch.Tensor): Student featuremap.
            t_feature (torch.Tensor): Teacher featuremap.
        """
        num_samples = s_feature.shape[0]
        raw_loss = self.criterion_alternative_l2(s_feature, t_feature)
        # Empirical scaling inherited from the reference implementation.
        scaled_loss = raw_loss / num_samples / 1000 * 3
        return self.loss_weight * scaled_loss

    def criterion_alternative_l2(
        self,
        source: torch.Tensor,
        target: torch.Tensor,
    ) -> torch.Tensor:
        """Piecewise differentiable loss approximating the activation
        boundaries loss.

        Guides the student to learn the separating boundary between the
        activation and deactivation regions formed by each teacher neuron.

        Args:
            source (torch.Tensor): Student featuremap.
            target (torch.Tensor): Teacher featuremap.
        """
        # Penalize student activations on the wrong side of the teacher's
        # activation boundary, with a `margin` of slack on each side.
        pos_term = (source + self.margin).pow(2) * (
            (source > -self.margin) & (target <= 0)).float()
        neg_term = (source - self.margin).pow(2) * (
            (source <= self.margin) & (target > 0)).float()
        return (pos_term + neg_term).abs().sum()


# --- mmrazor/models/losses/at_loss.py ---
# Copyright (c) OpenMMLab. All rights reserved.
import torch
import torch.nn as nn
import torch.nn.functional as F

from mmrazor.registry import MODELS


@MODELS.register_module()
class ATLoss(nn.Module):
    """Attention-transfer distillation loss.

    "Paying More Attention to Attention: Improving the Performance of
    Convolutional Neural Networks via Attention Transfer", ICLR 2017.
    https://openreview.net/forum?id=Sks9_ajex

    Reference implementation:
    https://github.com/szagoruyko/attention-transfer/blob/master/utils.py

    Args:
        loss_weight (float): Weight of loss. Defaults to 1.0.
    """

    def __init__(
        self,
        loss_weight: float = 1.0,
    ) -> None:
        super().__init__()
        self.loss_weight = loss_weight

    def forward(self, s_feature: torch.Tensor,
                t_feature: torch.Tensor) -> torch.Tensor:
        """Weighted MSE between student and teacher attention matrices."""
        loss = (self.calc_attention_matrix(s_feature) -
                self.calc_attention_matrix(t_feature)).pow(2).mean()
        return self.loss_weight * loss

    def calc_attention_matrix(self, x: torch.Tensor) -> torch.Tensor:
        """Per-sample L2-normalized attention map.

        Channels are collapsed by the mean of squared activations, then the
        spatial plane is flattened to one vector per sample.

        Args:
            x (torch.Tensor): Input features.
        """
        return F.normalize(x.pow(2).mean(1).view(x.size(0), -1))


# --- mmrazor/models/losses/crd_loss.py ---
# Copyright (c) OpenMMLab. All rights reserved.
import math
from typing import Union

import torch
import torch.nn as nn

from mmrazor.registry import MODELS


@MODELS.register_module()
class CRDLoss(nn.Module):
    """Variate CRD Loss, ICLR 2020.

    https://arxiv.org/abs/1910.10699

    Args:
        loss_weight (float, optional): loss weight. Defaults to 1.0.
        temperature (float, optional): temperature. Defaults to 0.07.
        neg_num (int, optional): number of negative samples. Defaults to 16384.
        sample_n (int, optional): number of total samples. Defaults to 50000.
        dim_out (int, optional): output channels. Defaults to 128.
        momentum (float, optional): momentum. Defaults to 0.5.
        eps (float, optional): eps. Defaults to 1e-7.
    """

    def __init__(self,
                 loss_weight: float = 1.0,
                 temperature=0.07,
                 neg_num=16384,
                 sample_n=50000,
                 dim_out=128,
                 momentum=0.5,
                 eps=1e-7):
        super().__init__()
        self.loss_weight = loss_weight
        self.eps = eps

        self.contrast = ContrastMemory(dim_out, sample_n, neg_num, temperature,
                                       momentum)
        self.criterion_s_t = ContrastLoss(sample_n, eps=self.eps)

    def forward(self, s_feats, t_feats, data_samples):
        """Compute the symmetric (student+teacher) contrastive loss.

        ``data_samples`` must carry per-sample dataset indices in
        ``sample_idx`` (and optionally pre-drawn negatives in
        ``contrast_sample_idxs``).
        """
        input_data = data_samples[0]
        assert 'sample_idx' in input_data, \
            'you should pass a dict with key `sample_idx` in mimic function.'
        assert isinstance(
            input_data.sample_idx, torch.Tensor
        ), f'`sample_idx` must be a tensor, but get {type(input_data.sample_idx)}'  # noqa: E501

        sample_idxs = torch.stack(
            [sample.sample_idx for sample in data_samples])
        if 'contrast_sample_idxs' in input_data:
            assert isinstance(
                input_data.contrast_sample_idxs, torch.Tensor
            ), f'`contrast_sample_idxs` must be a tensor, but get {type(input_data.contrast_sample_idxs)}'  # noqa: E501
            contrast_sample_idxs = torch.stack(
                [sample.contrast_sample_idxs for sample in data_samples])
        else:
            contrast_sample_idxs = None
        out_s, out_t = self.contrast(s_feats, t_feats, sample_idxs,
                                     contrast_sample_idxs)
        s_loss = self.criterion_s_t(out_s)
        t_loss = self.criterion_s_t(out_t)
        loss = s_loss + t_loss
        # BUGFIX: `loss_weight` was stored but never applied, so the
        # documented argument had no effect. Default 1.0 keeps the previous
        # numeric behavior.
        return self.loss_weight * loss


class ContrastLoss(nn.Module):
    """Contrastive loss, corresponding to Eq (18) of the CRD paper.

    Args:
        n_data (int): number of data samples in the dataset.
        eps (float, optional): numerical-stability term. Defaults to 1e-7.
    """

    def __init__(self, n_data: int, eps: float = 1e-7):
        super(ContrastLoss, self).__init__()
        self.n_data = n_data
        self.eps = eps

    def forward(self, x):
        """NCE-style loss over scores ``x`` of shape (bsz, 1 + m).

        Column 0 holds the positive-pair score; the remaining m columns hold
        negative-pair scores.
        """
        bsz = x.shape[0]
        m = x.size(1) - 1

        # noise (uniform) distribution over the dataset
        Pn = 1 / float(self.n_data)

        # loss for the positive pair
        P_pos = x.select(1, 0)
        log_D1 = torch.div(P_pos, P_pos.add(m * Pn + self.eps)).log_()

        # loss for the m negative pairs
        P_neg = x.narrow(1, 1, m)
        log_D0 = torch.div(P_neg.clone().fill_(m * Pn),
                           P_neg.add(m * Pn + self.eps)).log_()

        loss = -(log_D1.sum(0) + log_D0.view(-1, 1).sum(0)) / bsz

        return loss


class ContrastMemory(nn.Module):
    """Memory buffer that supplies a large number of negative samples.

    https://github.com/HobbitLong/RepDistiller/blob/master/crd/memory.py

    Args:
        dim_out (int, optional): output channels. Defaults to 128.
        n_sample (int, optional): number of total samples.
            Defaults to 50000.
        neg_sample (int, optional): number of negative samples.
            Defaults to 16384.
        T (float, optional): temperature. Defaults to 0.07.
        momentum (float, optional): momentum. Defaults to 0.5.
    """

    def __init__(self,
                 dim_out: int,
                 n_sample: int,
                 neg_sample: int,
                 T: float = 0.07,
                 momentum: float = 0.5):
        super(ContrastMemory, self).__init__()
        self.n_sample = n_sample
        self.unigrams = torch.ones(self.n_sample)
        self.multinomial = AliasMethod(self.unigrams)
        # self.multinomial.cuda()
        self.neg_sample = neg_sample

        # params = [neg_sample, T, Z_s, Z_t, momentum]; the two -1 slots are
        # normalization constants filled in lazily on the first forward.
        self.register_buffer('params',
                             torch.tensor([neg_sample, T, -1, -1, momentum]))
        stdv = 1. / math.sqrt(dim_out / 3)
        self.register_buffer(
            'memory_v1',
            torch.rand(n_sample, dim_out).mul_(2 * stdv).add_(-stdv))
        self.register_buffer(
            'memory_v2',
            torch.rand(n_sample, dim_out).mul_(2 * stdv).add_(-stdv))

    def forward(self,
                feat_s: torch.Tensor,
                feat_t: torch.Tensor,
                idx: torch.Tensor,
                sample_idx: Union[None, torch.Tensor] = None) -> torch.Tensor:
        """Score features against the memory bank and update it.

        Returns a tuple ``(out_s, out_t)`` of (bsz, neg_sample + 1, 1)
        normalized scores.
        """
        neg_sample = int(self.params[0].item())
        T = self.params[1].item()
        Z_s = self.params[2].item()
        Z_t = self.params[3].item()

        momentum = self.params[4].item()
        bsz = feat_s.size(0)
        n_sample = self.memory_v1.size(0)
        dim_out = self.memory_v1.size(1)

        # original score computation; draw negatives unless pre-supplied
        if sample_idx is None:
            sample_idx = self.multinomial.draw(bsz * (self.neg_sample + 1))\
                .view(bsz, -1)
            # slot 0 of each row is the positive (the sample itself)
            sample_idx.select(1, 0).copy_(idx.data)
        # teacher scores come from the student-side memory (v1) and
        # vice versa
        weight_s = torch.index_select(self.memory_v1, 0,
                                      sample_idx.view(-1)).detach()
        weight_s = weight_s.view(bsz, neg_sample + 1, dim_out)
        out_t = torch.bmm(weight_s, feat_t.view(bsz, dim_out, 1))
        out_t = torch.exp(torch.div(out_t, T))
        weight_t = torch.index_select(self.memory_v2, 0,
                                      sample_idx.view(-1)).detach()
        weight_t = weight_t.view(bsz, neg_sample + 1, dim_out)
        out_s = torch.bmm(weight_t, feat_s.view(bsz, dim_out, 1))
        out_s = torch.exp(torch.div(out_s, T))

        # set normalization constants Z if they haven't been set yet
        if Z_s < 0:
            self.params[2] = out_s.mean() * n_sample
            Z_s = self.params[2].clone().detach().item()
            print('normalization constant Z_s is set to {:.1f}'.format(Z_s))
        if Z_t < 0:
            self.params[3] = out_t.mean() * n_sample
            Z_t = self.params[3].clone().detach().item()
            print('normalization constant Z_t is set to {:.1f}'.format(Z_t))

        # compute out_s, out_t
        out_s = torch.div(out_s, Z_s).contiguous()
        out_t = torch.div(out_t, Z_t).contiguous()

        # momentum update of the memory banks (no gradients)
        with torch.no_grad():
            l_pos = torch.index_select(self.memory_v1, 0, idx.view(-1))
            l_pos.mul_(momentum)
            l_pos.add_(torch.mul(feat_s, 1 - momentum))
            l_norm = l_pos.pow(2).sum(1, keepdim=True).pow(0.5)
            updated_v1 = l_pos.div(l_norm)
            self.memory_v1.index_copy_(0, idx, updated_v1)

            ab_pos = torch.index_select(self.memory_v2, 0, idx.view(-1))
            ab_pos.mul_(momentum)
            ab_pos.add_(torch.mul(feat_t, 1 - momentum))
            ab_norm = ab_pos.pow(2).sum(1, keepdim=True).pow(0.5)
            updated_v2 = ab_pos.div(ab_norm)
            self.memory_v2.index_copy_(0, idx, updated_v2)

        return out_s, out_t


class AliasMethod(object):
    """Alias method for efficient multinomial sampling.

    From: https://hips.seas.harvard.edu/blog/2013/03/03/
    the-alias-method-efficient-sampling-with-many-discrete-outcomes/

    Args:
        probs (torch.Tensor): probability vector.
    """

    def __init__(self, probs: torch.Tensor) -> None:

        if probs.sum() > 1:
            probs.div_(probs.sum())
        neg_sample = len(probs)
        self.prob = torch.zeros(neg_sample)
        self.alias = torch.LongTensor([0] * neg_sample)

        # Sort the data into the outcomes with probabilities
        # that are larger and smaller than 1/neg_sample.
        smaller = []
        larger = []
        for kk, prob in enumerate(probs):
            self.prob[kk] = neg_sample * prob
            if self.prob[kk] < 1.0:
                smaller.append(kk)
            else:
                larger.append(kk)

        # Loop through and create little binary mixtures that
        # appropriately allocate the larger outcomes over the
        # overall uniform mixture.
        while len(smaller) > 0 and len(larger) > 0:
            small = smaller.pop()
            large = larger.pop()

            self.alias[small] = large
            self.prob[large] = (self.prob[large] - 1.0) + self.prob[small]

            if self.prob[large] < 1.0:
                smaller.append(large)
            else:
                larger.append(large)

        for last_one in smaller + larger:
            self.prob[last_one] = 1

    def cuda(self):
        """To cuda device."""
        self.prob = self.prob.cuda()
        self.alias = self.alias.cuda()

    def draw(self, N: int) -> torch.Tensor:
        """Draw N samples from the multinomial distribution."""
        neg_sample = self.alias.size(0)

        kk = torch.zeros(
            N, dtype=torch.long,
            device=self.prob.device).random_(0, neg_sample)
        prob = self.prob.index_select(0, kk)
        alias = self.alias.index_select(0, kk)
        # b is whether a random number is greater than q
        b = torch.bernoulli(prob)
        oq = kk.mul(b.long())
        oj = alias.mul((1 - b).long())

        return oq + oj


# --- mmrazor/models/losses/cross_entropy_loss.py ---
# Copyright (c) OpenMMLab. All rights reserved.
import torch.nn as nn
import torch.nn.functional as F

from mmrazor.registry import MODELS


@MODELS.register_module()
class CrossEntropyLoss(nn.Module):
    """Cross entropy loss against the teacher's hard (argmax) labels.

    Args:
        loss_weight (float): Weight of the loss. Defaults to 1.0.
    """

    def __init__(self, loss_weight=1.0):
        super().__init__()
        self.loss_weight = loss_weight

    def forward(self, preds_S, preds_T):
        """Distill with CE toward the teacher's predicted classes."""
        # Teacher only provides targets; block gradients through it.
        preds_T = preds_T.detach()
        loss = F.cross_entropy(preds_S, preds_T.argmax(dim=1))
        return loss * self.loss_weight


# --- mmrazor/models/losses/cwd.py ---
# Copyright (c) OpenMMLab. All rights reserved.
import torch
import torch.nn as nn
import torch.nn.functional as F

from mmrazor.registry import MODELS


@MODELS.register_module()
class ChannelWiseDivergence(nn.Module):
    """PyTorch version of `Channel-wise Distillation for Semantic Segmentation.

    <https://arxiv.org/abs/2011.13256>`_.

    Args:
        tau (float): Temperature coefficient. Defaults to 1.0.
        loss_weight (float): Weight of loss. Defaults to 1.0.
    """

    def __init__(self, tau=1.0, loss_weight=1.0):
        super().__init__()
        self.tau = tau
        self.loss_weight = loss_weight

    def forward(self, preds_S, preds_T):
        """Forward computation.

        Each channel's (H*W) map is turned into a probability distribution
        via temperature softmax, then the KL divergence between the
        corresponding teacher/student channels is averaged over N*C.

        Args:
            preds_S (torch.Tensor): The student model prediction with
                shape (N, C, H, W).
            preds_T (torch.Tensor): The teacher model prediction with
                shape (N, C, H, W).

        Return:
            torch.Tensor: The calculated loss value.
        """
        assert preds_S.shape[-2:] == preds_T.shape[-2:]
        N, C, H, W = preds_S.shape

        softmax_pred_T = F.softmax(preds_T.view(-1, W * H) / self.tau, dim=1)

        # Use the functional form instead of instantiating an
        # nn.LogSoftmax module on every forward call (identical math,
        # no per-call module allocation).
        loss = torch.sum(
            softmax_pred_T *
            F.log_softmax(preds_T.view(-1, W * H) / self.tau, dim=1) -
            softmax_pred_T *
            F.log_softmax(preds_S.view(-1, W * H) / self.tau, dim=1)) * (
                self.tau**2)

        loss = self.loss_weight * loss / (C * N)

        return loss


# --- mmrazor/models/losses/dafl_loss.py ---
# Copyright (c) OpenMMLab. All rights reserved.
import torch
import torch.nn as nn
import torch.nn.functional as F
from mmengine.dist import get_dist_info

from mmrazor.registry import MODELS
from ..architectures.ops import GatherTensors


class DAFLLoss(nn.Module):
    """Base class for DAFL losses.

    paper link: https://arxiv.org/pdf/1904.01186.pdf

    Args:
        loss_weight (float): Weight of the loss. Defaults to 1.0.
    """

    def __init__(self, loss_weight=1.0) -> None:
        super().__init__()
        self.loss_weight = loss_weight

    def forward(self, preds_T: torch.Tensor) -> torch.Tensor:
        """Weighted forward; defers to the subclass's ``forward_train``.

        Args:
            preds_T (torch.Tensor): The predictions of teacher.
        """
        return self.loss_weight * self.forward_train(preds_T)

    def forward_train(self, preds_T: torch.Tensor) -> torch.Tensor:
        """Forward function during training; must be overridden.

        Args:
            preds_T (torch.Tensor): The predictions of teacher.
        """
        raise NotImplementedError


@MODELS.register_module()
class OnehotLikeLoss(DAFLLoss):
    """The loss function for measuring the one-hot-likeness of the target
    logits."""

    def __init__(self, **kwargs) -> None:
        super().__init__(**kwargs)

    def forward_train(self, preds_T: torch.Tensor) -> torch.Tensor:
        """CE of the teacher logits against their own argmax pseudo-labels.

        Args:
            preds_T (torch.Tensor): The predictions of teacher.
        """
        fake_label = preds_T.data.max(1)[1]
        return F.cross_entropy(preds_T, fake_label)


@MODELS.register_module()
class InformationEntropyLoss(DAFLLoss):
    """The loss function for measuring the class balance of the target logits.

    Args:
        gather (bool, optional): The switch controlling whether
            collecting tensors from multiple gpus. Defaults to True.
    """

    def __init__(self, gather=True, **kwargs) -> None:
        super().__init__(**kwargs)
        self.gather = gather
        _, self.world_size = get_dist_info()

    def forward_train(self, preds_T: torch.Tensor) -> torch.Tensor:
        """Negative entropy (base-10) of the batch-mean class distribution.

        Args:
            preds_T (torch.Tensor): The predictions of teacher.
        """
        # Gather predictions from all GPUS to calibrate the loss function.
        if self.gather and self.world_size > 1:
            preds_T = torch.cat(GatherTensors.apply(preds_T), dim=0)
        class_prob = F.softmax(preds_T, dim=1).mean(dim=0)
        # NOTE(review): log10 (not natural log) matches the original
        # implementation; only the scale differs — confirm before changing.
        info_entropy = class_prob * torch.log10(class_prob)
        return info_entropy.sum()


@MODELS.register_module()
class ActivationLoss(nn.Module):
    """The loss function for measuring the activation of the target featuremap.

    It is the negative of the norm of the target featuremap.

    Args:
        loss_weight (float): Weight of the loss. Defaults to 1.0.
        norm_type (str, optional): The type of the norm. Defaults to 'abs'.
    """

    def __init__(self, loss_weight=1.0, norm_type='abs') -> None:
        super().__init__()
        self.loss_weight = loss_weight
        assert norm_type in ['norm', 'abs'], \
            '"norm_type" must be "norm" or "abs"'
        self.norm_type = norm_type

        if self.norm_type == 'norm':
            self.norm_fn = lambda x: -x.norm()
        elif self.norm_type == 'abs':
            self.norm_fn = lambda x: -x.abs().mean()

    def forward(self, feat_T: torch.Tensor) -> torch.Tensor:
        """Weighted forward for the ActivationLoss.

        Args:
            feat_T (torch.Tensor): The featuremap of teacher.
        """
        return self.loss_weight * self.forward_train(feat_T)

    def forward_train(self, feat_T: torch.Tensor) -> torch.Tensor:
        """Negative norm of the flattened teacher featuremap.

        Args:
            feat_T (torch.Tensor): The featuremap of teacher.
        """
        feat_T = feat_T.view(feat_T.size(0), -1)
        return self.norm_fn(feat_T)


# --- mmrazor/models/losses/decoupled_kd.py ---
# Copyright (c) OpenMMLab. All rights reserved.
import torch
import torch.nn as nn
import torch.nn.functional as F

from mmrazor.registry import MODELS


@MODELS.register_module()
class DKDLoss(nn.Module):
    """Decoupled Knowledge Distillation, CVPR2022.

    link: https://arxiv.org/abs/2203.08679
    Reformulates the classical KD loss into two parts:
    1. target class knowledge distillation (TCKD)
    2. non-target class knowledge distillation (NCKD).

    Args:
        tau (float): Temperature coefficient. Defaults to 1.0.
        alpha (float): Weight of TCKD loss. Defaults to 1.0.
        beta (float): Weight of NCKD loss. Defaults to 1.0.
        reduction (str): Specifies the reduction to apply to the loss:
            ``'none'`` | ``'batchmean'`` | ``'sum'`` | ``'mean'``.
            ``'none'``: no reduction will be applied,
            ``'batchmean'``: the sum of the output will be divided by
                the batchsize,
            ``'sum'``: the output will be summed,
            ``'mean'``: the output will be divided by the number of
                elements in the output.
            Default: ``'batchmean'``
        loss_weight (float): Weight of loss. Defaults to 1.0.
    """

    def __init__(
        self,
        tau: float = 1.0,
        alpha: float = 1.0,
        beta: float = 1.0,
        reduction: str = 'batchmean',
        loss_weight: float = 1.0,
    ) -> None:
        super().__init__()
        self.tau = tau
        accept_reduction = {'none', 'batchmean', 'sum', 'mean'}
        assert reduction in accept_reduction, \
            f'KLDivergence supports reduction {accept_reduction}, ' \
            f'but gets {reduction}.'
        self.reduction = reduction
        self.alpha = alpha
        self.beta = beta
        self.loss_weight = loss_weight

    def forward(
        self,
        preds_S: torch.Tensor,
        preds_T: torch.Tensor,
        gt_labels: torch.Tensor,
    ) -> torch.Tensor:
        """DKDLoss forward function.

        Args:
            preds_S (torch.Tensor): The student model prediction, shape (N, C).
            preds_T (torch.Tensor): The teacher model prediction, shape (N, C).
            gt_labels (torch.Tensor): The gt label tensor, shape (N, C).

        Return:
            torch.Tensor: The calculated loss value.
        """
        gt_mask = self._get_gt_mask(preds_S, gt_labels)
        tckd_loss = self._get_tckd_loss(preds_S, preds_T, gt_labels, gt_mask)
        nckd_loss = self._get_nckd_loss(preds_S, preds_T, gt_mask)
        loss = self.alpha * tckd_loss + self.beta * nckd_loss
        return self.loss_weight * loss

    def _get_nckd_loss(
        self,
        preds_S: torch.Tensor,
        preds_T: torch.Tensor,
        gt_mask: torch.Tensor,
    ) -> torch.Tensor:
        """Calculate non-target class knowledge distillation."""
        # Subtracting 1000*gt_mask masks out the gt class before softmax;
        # faster than indexing.
        s_nckd = F.log_softmax(preds_S / self.tau - 1000.0 * gt_mask, dim=1)
        t_nckd = F.softmax(preds_T / self.tau - 1000.0 * gt_mask, dim=1)
        return self._kl_loss(s_nckd, t_nckd)

    def _get_tckd_loss(
        self,
        preds_S: torch.Tensor,
        preds_T: torch.Tensor,
        gt_labels: torch.Tensor,
        gt_mask: torch.Tensor,
    ) -> torch.Tensor:
        """Calculate target class knowledge distillation."""
        non_gt_mask = self._get_non_gt_mask(preds_S, gt_labels)
        s_tckd = F.softmax(preds_S / self.tau, dim=1)
        t_tckd = F.softmax(preds_T / self.tau, dim=1)
        mask_student = torch.log(self._cat_mask(s_tckd, gt_mask, non_gt_mask))
        mask_teacher = self._cat_mask(t_tckd, gt_mask, non_gt_mask)
        return self._kl_loss(mask_student, mask_teacher)

    def _kl_loss(
        self,
        preds_S: torch.Tensor,
        preds_T: torch.Tensor,
    ) -> torch.Tensor:
        """Calculate the KL Divergence, scaled by tau**2.

        BUGFIX: the legacy ``size_average=False`` argument was passed
        together with ``reduction``; inside ``F.kl_div`` the legacy argument
        overrides ``reduction`` (forcing a 'sum' reduction and emitting a
        deprecation warning), so the configured ``reduction`` was silently
        ignored. Pass only ``reduction`` so it is honored.
        """
        kl_loss = F.kl_div(
            preds_S, preds_T, reduction=self.reduction) * self.tau**2
        return kl_loss

    def _cat_mask(
        self,
        tckd: torch.Tensor,
        gt_mask: torch.Tensor,
        non_gt_mask: torch.Tensor,
    ) -> torch.Tensor:
        """Calculate preds of target (pt) & preds of non-target (pnt)."""
        t1 = (tckd * gt_mask).sum(dim=1, keepdim=True)
        t2 = (tckd * non_gt_mask).sum(dim=1, keepdim=True)
        return torch.cat([t1, t2], dim=1)

    def _get_gt_mask(
        self,
        logits: torch.Tensor,
        target: torch.Tensor,
    ) -> torch.Tensor:
        """Calculate groundtruth mask on logits with target class tensor.

        Args:
            logits (torch.Tensor): The prediction logits with shape (N, C).
            target (torch.Tensor): The gt_label target with shape (N, C).

        Return:
            torch.Tensor: The masked logits.
        """
        target = target.reshape(-1)
        return torch.zeros_like(logits).scatter_(1, target.unsqueeze(1),
                                                 1).bool()

    def _get_non_gt_mask(
        self,
        logits: torch.Tensor,
        target: torch.Tensor,
    ) -> torch.Tensor:
        """Calculate non-groundtruth mask on logits with target class tensor.

        Args:
            logits (torch.Tensor): The prediction logits with shape (N, C).
            target (torch.Tensor): The gt_label target with shape (N, C).

        Return:
            torch.Tensor: The masked logits.
        """
        target = target.reshape(-1)
        return torch.ones_like(logits).scatter_(1, target.unsqueeze(1),
                                                0).bool()


# --- mmrazor/models/losses/dist_loss.py ---
# Copyright (c) OpenMMLab. All rights reserved.
# Copyright (c) OpenMMLab. All rights reserved.
import torch
import torch.nn as nn

from mmrazor.registry import MODELS


def cosine_similarity(a, b, eps=1e-8):
    """Row-wise cosine similarity between two (N, C) tensors."""
    dot = (a * b).sum(1)
    return dot / (a.norm(dim=1) * b.norm(dim=1) + eps)


def pearson_correlation(a, b, eps=1e-8):
    """Row-wise Pearson correlation: cosine similarity of mean-centered rows."""
    a_centered = a - a.mean(1, keepdim=True)
    b_centered = b - b.mean(1, keepdim=True)
    return cosine_similarity(a_centered, b_centered, eps)


def inter_class_relation(y_s, y_t):
    """Mean of (1 - Pearson correlation) between matching rows."""
    return 1 - pearson_correlation(y_s, y_t).mean()


def intra_class_relation(y_s, y_t):
    """Inter-class relation computed over columns (transposed inputs)."""
    return inter_class_relation(y_s.transpose(0, 1), y_t.transpose(0, 1))


@MODELS.register_module()
class DISTLoss(nn.Module):
    """DIST distillation loss built from inter- and intra-class relations.

    Args:
        inter_loss_weight (float): Weight of the inter-class term.
            Defaults to 1.0.
        intra_loss_weight (float): Weight of the intra-class term.
            Defaults to 1.0.
        tau (float): Softmax temperature. Defaults to 1.0.
        loss_weight (float): Global weight of the loss. Defaults to 1.0.
        teacher_detach (bool): Whether to detach the teacher logits.
            Defaults to True.
    """

    def __init__(
        self,
        inter_loss_weight=1.0,
        intra_loss_weight=1.0,
        tau=1.0,
        loss_weight: float = 1.0,
        teacher_detach: bool = True,
    ):
        super(DISTLoss, self).__init__()
        self.inter_loss_weight = inter_loss_weight
        self.intra_loss_weight = intra_loss_weight
        self.tau = tau
        self.loss_weight = loss_weight
        self.teacher_detach = teacher_detach

    def forward(self, logits_S, logits_T: torch.Tensor):
        """Compute the weighted DIST loss from student/teacher logits."""
        if self.teacher_detach:
            logits_T = logits_T.detach()

        # Temperature-scaled class probabilities for both networks.
        y_s = (logits_S / self.tau).softmax(dim=1)
        y_t = (logits_T / self.tau).softmax(dim=1)

        # tau**2 keeps gradient magnitudes comparable across temperatures.
        inter = self.tau**2 * inter_class_relation(y_s, y_t)
        intra = self.tau**2 * intra_class_relation(y_s, y_t)

        combined = (self.inter_loss_weight * inter +
                    self.intra_loss_weight * intra)
        return combined * self.loss_weight
# Copyright (c) OpenMMLab. All rights reserved.
import torch
import torch.nn as nn
import torch.nn.functional as F

from mmrazor.registry import MODELS


@MODELS.register_module()
class FTLoss(nn.Module):
    """Factor-transfer distillation loss.

    Paraphrasing Complex Network: Network Compression via Factor Transfer,
    NeurIPS 2018.

    https://arxiv.org/pdf/1802.04977.pdf

    Args:
        loss_weight (float, optional): loss weight. Defaults to 1.0.
    """

    def __init__(self, loss_weight: float = 1.0) -> None:
        super(FTLoss, self).__init__()
        # L1 distance between the normalized "factors" of both networks.
        self.criterion = nn.L1Loss()
        self.loss_weight = loss_weight

    def factor(self, x: torch.Tensor) -> torch.Tensor:
        """Flatten ``x`` per sample and L2-normalize it (the 'factor')."""
        flat = x.view(x.size(0), -1)
        return F.normalize(flat)

    def forward_train(self, s_feature: torch.Tensor,
                      t_feature: torch.Tensor) -> torch.Tensor:
        """Unweighted L1 distance between student and teacher factors."""
        return self.criterion(self.factor(s_feature), self.factor(t_feature))

    def forward(self, s_feature: torch.Tensor,
                t_feature: torch.Tensor) -> torch.Tensor:
        """Weighted factor-transfer loss."""
        return self.loss_weight * self.forward_train(s_feature, t_feature)
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Optional, Tuple

import torch
import torch.nn as nn

from mmrazor.registry import MODELS


def mask_l2_loss(
    tensor_a: torch.Tensor,
    tensor_b: torch.Tensor,
    saptial_attention_mask: Optional[torch.Tensor] = None,
    channel_attention_mask: Optional[torch.Tensor] = None,
) -> torch.Tensor:
    """L2 loss weighted by optional attention masks, used by FBKD.

    Args:
        tensor_a (torch.Tensor): Student featuremap.
        tensor_b (torch.Tensor): Teacher featuremap.
        saptial_attention_mask (torch.Tensor, optional): Mask of spatial-wise
            attention. Defaults to None.
        channel_attention_mask (torch.Tensor, optional): Mask of channel-wise
            attention. Defaults to None.

    Returns:
        torch.Tensor: Square root of the (masked) summed squared difference.
    """
    squared_diff = (tensor_a - tensor_b)**2
    if saptial_attention_mask is not None:
        squared_diff = squared_diff * saptial_attention_mask
    if channel_attention_mask is not None:
        squared_diff = squared_diff * channel_attention_mask
    return torch.sum(squared_diff)**0.5


@MODELS.register_module()
class FBKDLoss(nn.Module):
    """Loss for FBKD: feature, channel, spatial and nonlocal terms combined.

    Source code:
    https://github.com/ArchipLab-LinfengZhang/Object-Detection-Knowledge-
    Distillation-ICLR2021

    Args:
        mask_l2_weight (float): Weight of the mask l2 loss.
            Defaults to 7e-5, which is the default value in source code.
        channel_weight (float): Weight of the channel loss.
            Defaults to 4e-3, which is the default value in source code.
        spatial_weight (float): Weight of the spatial loss.
            Defaults to 4e-3, which is the default value in source code.
        nonloacl_weight (float): Weight of the nonlocal loss.
            Defaults to 7e-5, which is the default value in source code.
        loss_weight (float): Weight of loss. Defaults to 1.0.
    """

    def __init__(self,
                 mask_l2_weight: float = 7e-5,
                 channel_weight: float = 4e-3,
                 spatial_weight: float = 4e-3,
                 nonloacl_weight: float = 7e-5,
                 loss_weight: float = 1.0) -> None:
        super().__init__()
        self.mask_l2_weight = mask_l2_weight
        self.channel_weight = channel_weight
        self.spatial_weight = spatial_weight
        self.nonloacl_weight = nonloacl_weight
        self.loss_weight = loss_weight

    def forward(self, s_input: Tuple[torch.Tensor, ...],
                t_input: Tuple[torch.Tensor, ...]) -> torch.Tensor:
        """Sum the feature, channel, spatial and nonlocal losses.

        Args:
            s_input (Tuple[torch.Tensor, ...]): Student input which is the
                output of ``'FBKDStudentConnector'``.
            t_input (Tuple[torch.Tensor, ...]): Teacher input which is the
                output of ``'FBKDTeacherConnector'``.
        """
        total = 0.0

        (stu_spatial_mask, stu_channel_mask, stu_channel_pool_adapt,
         stu_spatial_pool_adapt, stu_relation_adapt, stu_feat_adapt) = s_input

        (tea_spatial_mask, tea_channel_mask, tea_spatial_pool, tea_relation,
         tea_feat) = t_input

        # Averaged spatial-wise mask of both networks; detached so masks do
        # not receive gradients.
        avg_spatial_mask = ((tea_spatial_mask + stu_spatial_mask) / 2).detach()

        # Channel-wise mask, computed for parity with the original FBKD code
        # although it is not used below.
        avg_channel_mask = ((tea_channel_mask + stu_channel_mask) / 2).detach()

        # Feature imitation loss weighted by the spatial mask.
        total = total + mask_l2_loss(
            tea_feat,
            stu_feat_adapt,
            saptial_attention_mask=avg_spatial_mask,
            channel_attention_mask=None) * self.mask_l2_weight

        # Channel loss: distance between channel-pooled teacher features and
        # the adapted student channel pool.
        total = total + torch.dist(
            torch.mean(tea_feat, [2, 3]),
            stu_channel_pool_adapt) * self.channel_weight

        # Spatial loss.
        total = total + torch.dist(
            tea_spatial_pool, stu_spatial_pool_adapt) * self.spatial_weight

        # Nonlocal (relation) loss.
        total = total + torch.dist(
            tea_relation, stu_relation_adapt, p=2) * self.nonloacl_weight

        return self.loss_weight * total
# Copyright (c) OpenMMLab. All rights reserved.
import torch
import torch.nn as nn
import torch.nn.functional as F

from mmrazor.registry import MODELS

try:
    from mmcls.models.losses.cross_entropy_loss import soft_cross_entropy
except ImportError:
    from mmrazor.utils import get_placeholder
    soft_cross_entropy = get_placeholder('mmcls')


@MODELS.register_module()
class KDSoftCELoss(nn.Module):
    """Distilling the Knowledge in a Neural Network, NIPS2014. Based on Soft
    Cross Entropy criterion.

    https://arxiv.org/pdf/1503.02531.pdf

    Args:
        tau (float, optional): Temperature. Defaults to 1.0.
        reduction (str): Specifies the reduction to apply to the loss:
            ``'none'`` | ``'sum'`` | ``'mean'``.
            ``'none'``: no reduction will be applied,
            ``'sum'``: the output will be summed,
            ``'mean'``: the output will be divided by the number of
                elements in the output.
            Default: ``'mean'``
        mult_tem_square (bool, optional): Multiply square of temperature
            or not. Defaults to True.
        loss_weight (float): Weight of loss. Defaults to 1.0.
    """

    def __init__(
        self,
        tau: float = 1.0,
        reduction: str = 'mean',
        mult_tem_square: bool = True,
        loss_weight: float = 1.0,
    ) -> None:
        super().__init__()
        self.tau = tau
        self.mult_tem_square = mult_tem_square
        self.loss_weight = loss_weight
        self.cls_criterion = soft_cross_entropy

        accept_reduction = {None, 'none', 'mean', 'sum'}
        # Fix: the original assert message was copy-pasted from KLDivergence
        # and named the wrong class.
        assert reduction in accept_reduction, \
            f'KDSoftCELoss supports reduction {accept_reduction}, ' \
            f'but gets {reduction}.'
        self.reduction = reduction

    def forward(
        self,
        preds_S: torch.Tensor,
        preds_T: torch.Tensor,
        weight: torch.Tensor = None,
        avg_factor: int = None,
        reduction_override: str = None,
    ) -> torch.Tensor:
        """Forward computation.

        Args:
            preds_S (torch.Tensor): The student model prediction with
                shape (N, C).
            preds_T (torch.Tensor): The teacher model prediction with
                shape (N, C).
            weight (torch.Tensor, optional): Sample-wise loss weight with
                shape (N, C). Defaults to None.
            avg_factor (int, optional): Average factor that is used to average
                the loss. Defaults to None.
            reduction_override (str, optional): Override reduction in forward.
                Defaults to None.

        Return:
            torch.Tensor: The calculated loss value.
        """
        reduction = (
            reduction_override if reduction_override else self.reduction)

        # Temperature-scale the student logits; the teacher distribution is
        # the soft label target.
        preds_S = preds_S / self.tau
        soft_label = F.softmax((preds_T / self.tau), dim=-1)
        loss_cls = self.loss_weight * self.cls_criterion(
            preds_S,
            soft_label,
            weight,
            reduction=reduction,
            avg_factor=avg_factor)
        if self.mult_tem_square:
            # Compensate the 1/tau**2 gradient scaling (Hinton et al.).
            loss_cls *= (self.tau**2)
        return loss_cls
# Copyright (c) OpenMMLab. All rights reserved.
import torch.nn as nn
import torch.nn.functional as F

from mmrazor.registry import MODELS


@MODELS.register_module()
class KLDivergence(nn.Module):
    """A measure of how one probability distribution Q is different from a
    second, reference probability distribution P.

    Args:
        tau (float): Temperature coefficient. Defaults to 1.0.
        reduction (str): Specifies the reduction to apply to the loss:
            ``'none'`` | ``'batchmean'`` | ``'sum'`` | ``'mean'``.
            ``'none'``: no reduction will be applied,
            ``'batchmean'``: the sum of the output will be divided by
                the batchsize,
            ``'sum'``: the output will be summed,
            ``'mean'``: the output will be divided by the number of
                elements in the output.
            Default: ``'batchmean'``
        loss_weight (float): Weight of loss. Defaults to 1.0.
        teacher_detach (bool): Whether to detach the teacher model prediction.
            Will set to ``'False'`` in some data-free distillation algorithms.
            Defaults to True.
    """

    def __init__(
        self,
        tau: float = 1.0,
        reduction: str = 'batchmean',
        loss_weight: float = 1.0,
        teacher_detach: bool = True,
    ):
        super(KLDivergence, self).__init__()
        self.tau = tau
        self.loss_weight = loss_weight
        self.teacher_detach = teacher_detach

        accept_reduction = {'none', 'batchmean', 'sum', 'mean'}
        assert reduction in accept_reduction, \
            f'KLDivergence supports reduction {accept_reduction}, ' \
            f'but gets {reduction}.'
        self.reduction = reduction

    def forward(self, preds_S, preds_T):
        """Forward computation.

        Args:
            preds_S (torch.Tensor): The student model prediction with
                shape (N, C, H, W) or shape (N, C).
            preds_T (torch.Tensor): The teacher model prediction with
                shape (N, C, H, W) or shape (N, C).

        Return:
            torch.Tensor: The calculated loss value.
        """
        if self.teacher_detach:
            preds_T = preds_T.detach()

        # KL(teacher || student) on temperature-softened distributions;
        # kl_div expects log-probabilities for the first argument.
        teacher_probs = F.softmax(preds_T / self.tau, dim=1)
        student_log_probs = F.log_softmax(preds_S / self.tau, dim=1)
        kd_loss = F.kl_div(
            student_log_probs, teacher_probs, reduction=self.reduction)

        # tau**2 compensates the gradient scaling of the temperature.
        return self.loss_weight * (self.tau**2) * kd_loss
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F

from mmrazor.registry import MODELS


@MODELS.register_module()
class L1Loss(nn.Module):
    """Calculate the one-norm loss between the two inputs.

    Args:
        loss_weight (float): Weight of loss. Defaults to 1.0.
        size_average (bool, optional): Deprecated (see :attr:`reduction`). By
            default, the losses are averaged over each loss element in the
            batch. Note that for some losses, there multiple elements per
            sample. If the field :attr:`size_average` is set to ``False``, the
            losses are instead summed for each minibatch. Ignored when reduce
            is ``False``. Defaults to True.
        reduce (bool, optional): Deprecated (see :attr:`reduction`). By
            default, the losses are averaged or summed over observations for
            each minibatch depending on :attr:`size_average`. When
            :attr:`reduce` is ``False``, returns a loss per batch element
            instead and ignores :attr:`size_average`. Defaults to True.
        reduction (string, optional): Specifies the reduction to apply to the
            output: ``'none'`` | ``'mean'`` | ``'sum'``. ``'none'``: no
            reduction will be applied, ``'mean'``: the sum of the output will
            be divided by the number of elements in the output, ``'sum'``: the
            output will be summed. Note: :attr:`size_average` and
            :attr:`reduce` are in the process of being deprecated, and in the
            meantime, specifying either of those two args will override
            :attr:`reduction`. Defaults to mean.
    """

    def __init__(
        self,
        loss_weight: float = 1.0,
        size_average: Optional[bool] = None,
        reduce: Optional[bool] = None,
        reduction: str = 'mean',
    ) -> None:
        super().__init__()
        self.loss_weight = loss_weight
        self.size_average = size_average
        self.reduce = reduce

        # Fix: the original accepted 'batchmean' (copy-pasted from
        # KLDivergence), but ``F.l1_loss`` only supports 'none' | 'mean' |
        # 'sum' and would raise at forward time. Fail fast here instead, and
        # name the correct class in the message.
        accept_reduction = {'none', 'mean', 'sum'}
        assert reduction in accept_reduction, \
            f'L1Loss supports reduction {accept_reduction}, ' \
            f'but gets {reduction}.'
        self.reduction = reduction

    def forward(
        self,
        s_feature: torch.Tensor,
        t_feature: torch.Tensor,
    ) -> torch.Tensor:
        """Forward computation.

        Args:
            s_feature (torch.Tensor): The student model feature with
                shape (N, C, H, W) or shape (N, C).
            t_feature (torch.Tensor): The teacher model feature with
                shape (N, C, H, W) or shape (N, C).

        Return:
            torch.Tensor: The weighted L1 loss.
        """
        loss = F.l1_loss(s_feature, t_feature, self.size_average, self.reduce,
                         self.reduction)
        return self.loss_weight * loss
+ """ + if self.normalize: + s_feature = self.normalize_feature(s_feature) + t_feature = self.normalize_feature(t_feature) + + loss = torch.sum(torch.pow(torch.sub(s_feature, t_feature), 2)) + + # Calculate l2_loss as dist. + if self.dist: + loss = torch.sqrt(loss) + else: + if self.div_element: + loss = loss / s_feature.numel() + else: + loss = loss / s_feature.size(0) + + return self.loss_weight * loss + + def normalize_feature(self, feature: torch.Tensor) -> torch.Tensor: + """Normalize the input feature. + + Args: + feature (torch.Tensor): The student model feature with + shape (N, C, H, W) or shape (N, C). + """ + feature = feature.view(feature.size(0), -1) + return feature / feature.norm(2, dim=1, keepdim=True) * self.mult diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/losses/mgd_loss.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/losses/mgd_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..1321edc53546818ff3323b7073e14817b7884223 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/losses/mgd_loss.py @@ -0,0 +1,54 @@ +# Copyright (c) OpenMMLab. All rights reserved. + +import torch +import torch.nn as nn + +from mmrazor.registry import MODELS + + +@MODELS.register_module() +class MGDLoss(nn.Module): + """PyTorch version of `Masked Generative Distillation. + + ` + + Args: + alpha_mgd (float, optional): Weight of dis_loss. Defaults to 0.00002 + """ + + def __init__(self, alpha_mgd: float = 0.00002) -> None: + super(MGDLoss, self).__init__() + self.alpha_mgd = alpha_mgd + self.loss_mse = nn.MSELoss(reduction='sum') + + def forward(self, preds_S: torch.Tensor, + preds_T: torch.Tensor) -> torch.Tensor: + """Forward function. + + Args: + preds_S(torch.Tensor): Bs*C*H*W, student's feature map + preds_T(torch.Tensor): Bs*C*H*W, teacher's feature map + + Return: + torch.Tensor: The calculated loss value. 
+ """ + assert preds_S.shape == preds_T.shape + loss = self.get_dis_loss(preds_S, preds_T) * self.alpha_mgd + + return loss + + def get_dis_loss(self, preds_S: torch.Tensor, + preds_T: torch.Tensor) -> torch.Tensor: + """Get MSE distance of preds_S and preds_T. + + Args: + preds_S(torch.Tensor): Bs*C*H*W, student's feature map + preds_T(torch.Tensor): Bs*C*H*W, teacher's feature map + + Return: + torch.Tensor: The calculated mse distance value. + """ + N, C, H, W = preds_T.shape + dis_loss = self.loss_mse(preds_S, preds_T) / N + + return dis_loss diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/losses/ofd_loss.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/losses/ofd_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..2e0c0adc6da5c29eb0b9a77cae3040888a7c8c16 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/losses/ofd_loss.py @@ -0,0 +1,56 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch +import torch.nn as nn + +from mmrazor.registry import MODELS + + +@MODELS.register_module() +class OFDLoss(nn.Module): + """A Comprehensive Overhaul of Feature Distillation + https://sites.google.com/view/byeongho-heo/overhaul. + + The partial L2loss, only calculating loss when + `out_s > out_t` or `out_t > 0`. + + Args: + loss_weight (float, optional): loss weight. Defaults to 1.0. + mul_factor (float, optional): multiply factor. Defaults to 1000. + """ + + def __init__(self, + loss_weight: float = 1.0, + mul_factor: float = 1000.) -> None: + super(OFDLoss, self).__init__() + self.loss_weight = loss_weight + self.mul_factor = mul_factor + + def forward_train(self, s_feature: torch.Tensor, + t_feature: torch.Tensor) -> torch.Tensor: + """forward func for training. 
# Copyright (c) OpenMMLab. All rights reserved.
import torch
import torch.nn as nn

from mmrazor.registry import MODELS


@MODELS.register_module()
class OFDLoss(nn.Module):
    """A Comprehensive Overhaul of Feature Distillation
    https://sites.google.com/view/byeongho-heo/overhaul.

    The partial L2loss, only calculating loss when
    `out_s > out_t` or `out_t > 0`.

    Args:
        loss_weight (float, optional): loss weight. Defaults to 1.0.
        mul_factor (float, optional): multiply factor. Defaults to 1000.
    """

    def __init__(self,
                 loss_weight: float = 1.0,
                 mul_factor: float = 1000.) -> None:
        super(OFDLoss, self).__init__()
        self.loss_weight = loss_weight
        self.mul_factor = mul_factor

    def forward_train(self, s_feature: torch.Tensor,
                      t_feature: torch.Tensor) -> torch.Tensor:
        """Partial L2 loss, averaged over the batch and scaled down.

        Args:
            s_feature (torch.Tensor): student's feature
            t_feature (torch.Tensor): teacher's feature

        Returns:
            torch.Tensor: loss
        """
        batch_size = s_feature.shape[0]
        elementwise = torch.nn.functional.mse_loss(
            s_feature, t_feature, reduction='none')
        # Only penalize positions where the student exceeds the teacher or
        # the teacher activation is positive.
        active = ((s_feature > t_feature) | (t_feature > 0)).float()
        masked = elementwise * active
        return masked.sum() / batch_size / self.mul_factor

    def forward(self, s_feature: torch.Tensor,
                t_feature: torch.Tensor) -> torch.Tensor:
        """Weighted OFD loss.

        Args:
            s_feature (torch.Tensor): student's feature
            t_feature (torch.Tensor): teacher's feature

        Returns:
            torch.Tensor: loss
        """
        return self.loss_weight * self.forward_train(s_feature, t_feature)
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Tuple, Union

import torch
import torch.nn as nn
import torch.nn.functional as F

from mmrazor.registry import MODELS


@MODELS.register_module()
class PKDLoss(nn.Module):
    """PyTorch version of `PKD: General Distillation Framework for Object
    Detectors via Pearson Correlation Coefficient.

    `_.

    Args:
        loss_weight (float): Weight of loss. Defaults to 1.0.
        resize_stu (bool): If True, we'll down/up sample the features of the
            student model to the spatial size of those of the teacher model if
            their spatial sizes are different. And vice versa. Defaults to
            True.
    """

    def __init__(self, loss_weight=1.0, resize_stu=True):
        super(PKDLoss, self).__init__()
        self.loss_weight = loss_weight
        self.resize_stu = resize_stu

    def norm(self, feat: torch.Tensor) -> torch.Tensor:
        """Normalize the feature maps to have zero mean and unit variances.

        Args:
            feat (torch.Tensor): The original feature map with shape
                (N, C, H, W).
        """
        assert len(feat.shape) == 4
        N, C, H, W = feat.shape
        # Collapse everything but the channel dim so statistics are
        # computed per channel across the whole batch.
        per_channel = feat.permute(1, 0, 2, 3).reshape(C, -1)
        mean = per_channel.mean(dim=-1, keepdim=True)
        std = per_channel.std(dim=-1, keepdim=True)
        standardized = (per_channel - mean) / (std + 1e-6)
        return standardized.reshape(C, N, H, W).permute(1, 0, 2, 3)

    def forward(self, preds_S: Union[torch.Tensor, Tuple],
                preds_T: Union[torch.Tensor, Tuple]) -> torch.Tensor:
        """Forward computation.

        Args:
            preds_S (torch.Tensor | Tuple[torch.Tensor]): The student model
                prediction. If tuple, it should be several tensors with shape
                (N, C, H, W).
            preds_T (torch.Tensor | Tuple[torch.Tensor]): The teacher model
                prediction. If tuple, it should be several tensors with shape
                (N, C, H, W).

        Return:
            torch.Tensor: The calculated loss value.
        """
        # Promote single tensors to 1-tuples so one code path handles both.
        if isinstance(preds_S, torch.Tensor):
            preds_S, preds_T = (preds_S, ), (preds_T, )

        loss = 0.

        for pred_S, pred_T in zip(preds_S, preds_T):
            size_S, size_T = pred_S.shape[2:], pred_T.shape[2:]
            # Resize one side when spatial heights differ.
            if size_S[0] != size_T[0]:
                if self.resize_stu:
                    pred_S = F.interpolate(pred_S, size_T, mode='bilinear')
                else:
                    pred_T = F.interpolate(pred_T, size_S, mode='bilinear')
            assert pred_S.shape == pred_T.shape

            norm_S, norm_T = self.norm(pred_S), self.norm(pred_T)

            # MSE of standardized features equals 1 - Pearson correlation
            # (up to the constant factor), hence the division by 2.
            loss += F.mse_loss(norm_S, norm_T) / 2

        return loss * self.loss_weight
+ """ + pred_vec = pred.unsqueeze(0) - pred.unsqueeze(1) # (N, N, C) + norm_pred_vec = F.normalize(pred_vec, p=2, dim=2) + angle = torch.bmm(norm_pred_vec, + norm_pred_vec.transpose(1, 2)).view(-1) # (N*N*N, ) + return angle + + +@MODELS.register_module() +class DistanceWiseRKD(nn.Module): + """PyTorch version of distance-wise loss of `Relational Knowledge + Distillation. + + `_. + + Args: + loss_weight (float): Weight of distance-wise distillation loss. + Defaults to 25.0. + with_l2_norm (bool): Whether to normalize the model predictions before + calculating the loss. Defaults to True. + """ + + def __init__(self, loss_weight=25.0, with_l2_norm=True): + super(DistanceWiseRKD, self).__init__() + + self.loss_weight = loss_weight + self.with_l2_norm = with_l2_norm + + def distance_loss(self, preds_S, preds_T): + """Calculate distance-wise distillation loss.""" + d_T = euclidean_distance(preds_T, squared=False) + # mean_d_T is a normalization factor for distance + mean_d_T = d_T[d_T > 0].mean() + d_T = d_T / mean_d_T + + d_S = euclidean_distance(preds_S, squared=False) + mean_d_S = d_S[d_S > 0].mean() + d_S = d_S / mean_d_S + + return F.smooth_l1_loss(d_S, d_T) + + def forward(self, preds_S, preds_T): + """Forward computation. + + Args: + preds_S (torch.Tensor): The student model prediction with + shape (N, C, H, W) or shape (N, C). + preds_T (torch.Tensor): The teacher model prediction with + shape (N, C, H, W) or shape (N, C). + Return: + torch.Tensor: The calculated loss value. + """ + preds_S = preds_S.view(preds_S.shape[0], -1) + preds_T = preds_T.view(preds_T.shape[0], -1) + if self.with_l2_norm: + preds_S = F.normalize(preds_S, p=2, dim=1) + preds_T = F.normalize(preds_T, p=2, dim=1) + + loss = self.distance_loss(preds_S, preds_T) * self.loss_weight + + return loss + + +@MODELS.register_module() +class AngleWiseRKD(nn.Module): + """PyTorch version of angle-wise loss of `Relational Knowledge + Distillation. + + `_. 
+ + Args: + loss_weight (float): Weight of angle-wise distillation loss. + Defaults to 50.0. + with_l2_norm (bool): Whether to normalize the model predictions before + calculating the loss. Defaults to True. + """ + + def __init__(self, loss_weight=50.0, with_l2_norm=True): + super(AngleWiseRKD, self).__init__() + + self.loss_weight = loss_weight + self.with_l2_norm = with_l2_norm + + def angle_loss(self, preds_S, preds_T): + """Calculate the angle-wise distillation loss.""" + angle_T = angle(preds_T) + angle_S = angle(preds_S) + return F.smooth_l1_loss(angle_S, angle_T) + + def forward(self, preds_S, preds_T): + """Forward computation. + + Args: + preds_S (torch.Tensor): The student model prediction with + shape (N, C, H, W) or shape (N, C). + preds_T (torch.Tensor): The teacher model prediction with + shape (N, C, H, W) or shape (N, C). + Return: + torch.Tensor: The calculated loss value. + """ + preds_S = preds_S.view(preds_S.shape[0], -1) + preds_T = preds_T.view(preds_T.shape[0], -1) + if self.with_l2_norm: + preds_S = F.normalize(preds_S, p=2, dim=-1) + preds_T = F.normalize(preds_T, p=2, dim=-1) + + loss = self.angle_loss(preds_S, preds_T) * self.loss_weight + + return loss diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/losses/weighted_soft_label_distillation.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/losses/weighted_soft_label_distillation.py new file mode 100644 index 0000000000000000000000000000000000000000..e7704c5f426f554b8978d39de4801d457d8489d5 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/losses/weighted_soft_label_distillation.py @@ -0,0 +1,59 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch +import torch.nn as nn +import torch.nn.functional as F + +from mmrazor.registry import MODELS + + +@MODELS.register_module() +class WSLD(nn.Module): + """PyTorch version of `Rethinking Soft Labels for Knowledge + Distillation: A Bias-Variance Tradeoff Perspective + `_. 
# Copyright (c) OpenMMLab. All rights reserved.
import torch
import torch.nn as nn
import torch.nn.functional as F

from mmrazor.registry import MODELS


@MODELS.register_module()
class WSLD(nn.Module):
    """PyTorch version of `Rethinking Soft Labels for Knowledge
    Distillation: A Bias-Variance Tradeoff Perspective
    `_.

    Args:
        tau (float): Temperature coefficient. Defaults to 1.0.
        loss_weight (float): Weight of loss. Defaults to 1.0.
        num_classes (int): Defaults to 1000.
    """

    def __init__(self, tau=1.0, loss_weight=1.0, num_classes=1000):
        super(WSLD, self).__init__()
        self.tau = tau
        self.loss_weight = loss_weight
        self.num_classes = num_classes
        self.softmax = nn.Softmax(dim=1)
        self.logsoftmax = nn.LogSoftmax(dim=1)

    def forward(self, student, teacher, gt_labels):
        """Weighted soft-label distillation loss.

        Args:
            student (torch.Tensor): Student logits with shape (N, C).
            teacher (torch.Tensor): Teacher logits with shape (N, C).
            gt_labels (torch.Tensor): Ground-truth class indices with
                shape (N,).
        """
        # Soft cross-entropy between temperature-scaled distributions.
        scaled_student = student / self.tau
        scaled_teacher = teacher / self.tau
        teacher_probs = self.softmax(scaled_teacher)
        soft_ce = -torch.sum(
            teacher_probs * self.logsoftmax(scaled_student), 1, keepdim=True)

        # Per-sample hard CE of each (detached) network against the labels,
        # used to build the focal weight.
        log_probs_s = self.logsoftmax(student.detach())
        log_probs_t = self.logsoftmax(teacher.detach())
        one_hot = F.one_hot(gt_labels, num_classes=self.num_classes).float()
        hard_ce_s = -torch.sum(one_hot * log_probs_s, 1, keepdim=True)
        hard_ce_t = -torch.sum(one_hot * log_probs_t, 1, keepdim=True)

        # Weight samples by how much worse the student is than the teacher;
        # clamp below at 0 and squash into [0, 1).
        ratio = hard_ce_s / (hard_ce_t + 1e-7)
        ratio = torch.max(ratio, torch.zeros_like(ratio))
        focal_weight = 1 - torch.exp(-ratio)
        weighted_ce = focal_weight * soft_ce

        loss = (self.tau**2) * torch.mean(weighted_ce)
        return self.loss_weight * loss
+from .base_mutable import BaseMutable +from .derived_mutable import DerivedMutable +from .mutable_channel import (BaseMutableChannel, MutableChannelContainer, + OneShotMutableChannel, SimpleMutableChannel, + SquentialMutableChannel) +from .mutable_channel.units import (ChannelUnitType, DCFFChannelUnit, + DMCPChannelUnit, L1MutableChannelUnit, + MutableChannelUnit, + OneShotMutableChannelUnit, + SequentialMutableChannelUnit, + SlimmableChannelUnit) +from .mutable_module import (DiffChoiceRoute, DiffMutableModule, DiffMutableOP, + OneHotMutableOP, OneShotMutableModule, + OneShotMutableOP) +from .mutable_value import MutableValue, OneShotMutableValue + +__all__ = [ + 'OneShotMutableOP', 'OneShotMutableModule', 'DiffMutableOP', + 'DiffChoiceRoute', 'DiffMutableModule', 'DerivedMutable', 'MutableValue', + 'OneShotMutableValue', 'SequentialMutableChannelUnit', + 'L1MutableChannelUnit', 'OneShotMutableChannelUnit', + 'SimpleMutableChannel', 'MutableChannelUnit', 'SlimmableChannelUnit', + 'BaseMutableChannel', 'MutableChannelContainer', 'ChannelUnitType', + 'SquentialMutableChannel', 'OneHotMutableOP', 'OneShotMutableChannel', + 'BaseMutable', 'DCFFChannelUnit', 'DMCPChannelUnit' +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/base_mutable.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/base_mutable.py new file mode 100644 index 0000000000000000000000000000000000000000..2b5972d9f2ee3a052d3fdbbe2a8656f121dd5960 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/base_mutable.py @@ -0,0 +1,94 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from abc import ABC, abstractmethod +from typing import Dict, Optional + +from mmengine.model import BaseModule + +from mmrazor.utils.typing import DumpChosen + + +class BaseMutable(BaseModule, ABC): + """Base Class for mutables. Mutable means a searchable module widely used + in Neural Architecture Search(NAS). 
+ + It mainly consists of some optional operations, and achieving + searchable function by handling choice with ``MUTATOR``. + + All subclass should implement the following APIs: + + - ``fix_chosen()`` + - ``dump_chosen()`` + - ``current_choice.setter()`` + - ``current_choice.getter()`` + + Args: + alias (str, optional): alias of the `MUTABLE`. + init_cfg (dict, optional): initialization configuration dict for + ``BaseModule``. OpenMMLab has implement 5 initializer including + `Constant`, `Xavier`, `Normal`, `Uniform`, `Kaiming`, + and `Pretrained`. + """ + + def __init__(self, + alias: Optional[str] = None, + init_cfg: Optional[Dict] = None) -> None: + super().__init__(init_cfg=init_cfg) + + self.alias = alias + self._is_fixed = False + + @property # type: ignore + @abstractmethod + def current_choice(self): + """Current choice will affect :meth:`forward` and will be used in + :func:`mmrazor.core.subnet.utils.export_fix_subnet` or mutator. + """ + + @current_choice.setter # type: ignore + @abstractmethod + def current_choice(self, choice) -> None: + """Current choice setter will be executed in mutator.""" + + @property + def is_fixed(self) -> bool: + """bool: whether the mutable is fixed. + + Note: + If a mutable is fixed, it is no longer a searchable module, just + a normal fixed module. + If a mutable is not fixed, it still is a searchable module. + """ + return self._is_fixed + + @is_fixed.setter + def is_fixed(self, is_fixed: bool) -> None: + """Set the status of `is_fixed`.""" + assert isinstance(is_fixed, bool), \ + f'The type of `is_fixed` need to be bool type, ' \ + f'but got: {type(is_fixed)}' + if self._is_fixed: + raise AttributeError( + 'The mode of current MUTABLE is `fixed`. ' + 'Please do not set `is_fixed` function repeatedly.') + self._is_fixed = is_fixed + + @abstractmethod + def fix_chosen(self, chosen) -> None: + """Fix mutable with chosen. This function would fix the chosen of + mutable. 
The :attr:`is_fixed` will be set to True and only the selected + operations can be retained. All subclasses must implement this method. + + Note: + This operation is irreversible. + """ + raise NotImplementedError() + + @abstractmethod + def dump_chosen(self) -> DumpChosen: + """Save the current state of the mutable as a dictionary. + + ``DumpChosen`` has ``chosen`` and ``meta`` fields. ``chosen`` is + necessary, ``fix_chosen`` will use the ``chosen`` . ``meta`` is used to + store some non-essential information. + """ + raise NotImplementedError() diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/derived_mutable.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/derived_mutable.py new file mode 100644 index 0000000000000000000000000000000000000000..ac8a8c60ab624ea6e2485f3d8fd23f30ed943293 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/derived_mutable.py @@ -0,0 +1,456 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import sys + +if sys.version_info < (3, 8): + from typing_extensions import Protocol +else: + from typing import Protocol + +import inspect +import logging +from itertools import product +from typing import Any, Callable, Dict, Iterable, Optional, Set, Union + +import torch +from mmengine.logging import print_log +from torch import Tensor + +from mmrazor.utils.typing import DumpChosen +from ..utils import make_divisible +from .base_mutable import BaseMutable + + +class MutableProtocol(Protocol): # pragma: no cover + """Protocol for Mutable.""" + + @property + def current_choice(self) -> Any: + """Current choice.""" + + def derive_expand_mutable(self, expand_ratio: int) -> Any: + """Derive expand mutable.""" + + def derive_divide_mutable(self, ratio: int, divisor: int) -> Any: + """Derive divide mutable.""" + + +class MutableChannelProtocol(MutableProtocol): # pragma: no cover + """Protocol for MutableChannel.""" + + @property + def current_mask(self) -> Tensor: + """Current mask.""" + + 
+def _expand_choice_fn(mutable: MutableProtocol, + expand_ratio: Union[int, float]) -> Callable: + """Helper function to build `choice_fn` for expand derived mutable.""" + + def fn(): + return int(mutable.current_choice * expand_ratio) + + return fn + + +def _expand_mask_fn( + mutable: MutableProtocol, + expand_ratio: Union[int, float]) -> Callable: # pragma: no cover + """Helper function to build `mask_fn` for expand derived mutable.""" + if not hasattr(mutable, 'current_mask'): + raise ValueError('mutable must have attribute `currnet_mask`') + + def fn(): + mask = mutable.current_mask + if isinstance(expand_ratio, int): + expand_num_channels = mask.size(0) * expand_ratio + expand_choice = mutable.current_choice * expand_ratio + elif isinstance(expand_ratio, float): + expand_num_channels = int(mask.size(0) * expand_ratio) + expand_choice = int(mutable.current_choice * expand_ratio) + else: + raise NotImplementedError( + f'Not support type of expand_ratio: {type(expand_ratio)}') + expand_mask = torch.zeros(expand_num_channels).bool() + expand_mask[:expand_choice] = True + + return expand_mask + + return fn + + +def _divide_and_divise(x: int, ratio: int, divisor: int = 8) -> int: + """Helper function for divide and divise.""" + new_x = x // ratio + + return make_divisible(new_x, divisor) # type: ignore + + +def _divide_choice_fn(mutable: MutableProtocol, + ratio: int, + divisor: int = 8) -> Callable: + """Helper function to build `choice_fn` for divide derived mutable.""" + + def fn(): + return _divide_and_divise(mutable.current_choice, ratio, divisor) + + return fn + + +def _divide_mask_fn(mutable: MutableProtocol, + ratio: int, + divisor: int = 8) -> Callable: # pragma: no cover + """Helper function to build `mask_fn` for divide derived mutable.""" + if not hasattr(mutable, 'current_mask'): + raise ValueError('mutable must have attribute `currnet_mask`') + + def fn(): + mask = mutable.current_mask + divide_num_channels = _divide_and_divise(mask.size(0), ratio, 
divisor) + divide_choice = _divide_and_divise(mutable.current_choice, ratio, + divisor) + divide_mask = torch.zeros(divide_num_channels).bool() + divide_mask[:divide_choice] = True + + return divide_mask + + return fn + + +def _concat_choice_fn(mutables: Iterable[MutableChannelProtocol]) -> Callable: + """Helper function to build `choice_fn` for concat derived mutable.""" + + def fn(): + return sum((m.current_choice for m in mutables)) + + return fn + + +def _concat_mask_fn(mutables: Iterable[MutableChannelProtocol]) -> Callable: + """Helper function to build `mask_fn` for concat derived mutable.""" + + def fn(): + return torch.cat([m.current_mask for m in mutables]) + + return fn + + +class DerivedMethodMixin: + """A mixin that provides some useful method to derive mutable.""" + + def derive_same_mutable(self: MutableProtocol) -> 'DerivedMutable': + """Derive same mutable as the source.""" + return self.derive_expand_mutable(expand_ratio=1) + + def derive_expand_mutable( + self: MutableProtocol, + expand_ratio: Union[int, BaseMutable, float]) -> 'DerivedMutable': + """Derive expand mutable, usually used with `expand_ratio`.""" + # avoid circular import + if isinstance(expand_ratio, int): + choice_fn = _expand_choice_fn(self, expand_ratio=expand_ratio) + elif isinstance(expand_ratio, float): + choice_fn = _expand_choice_fn(self, expand_ratio=expand_ratio) + elif isinstance(expand_ratio, BaseMutable): + current_ratio = expand_ratio.current_choice + choice_fn = _expand_choice_fn(self, expand_ratio=current_ratio) + else: + raise NotImplementedError( + f'Not support type of ratio: {type(expand_ratio)}') + + mask_fn: Optional[Callable] = None + if hasattr(self, 'current_mask'): + if isinstance(expand_ratio, int): + mask_fn = _expand_mask_fn(self, expand_ratio=expand_ratio) + elif isinstance(expand_ratio, float): + mask_fn = _expand_mask_fn(self, expand_ratio=expand_ratio) + elif isinstance(expand_ratio, BaseMutable): + mask_fn = _expand_mask_fn(self, 
expand_ratio=current_ratio) + else: + raise NotImplementedError( + f'Not support type of ratio: {type(expand_ratio)}') + + return DerivedMutable(choice_fn=choice_fn, mask_fn=mask_fn) + + def derive_divide_mutable(self: MutableProtocol, + ratio: Union[int, float, BaseMutable], + divisor: int = 8) -> 'DerivedMutable': + """Derive divide mutable, usually used with `make_divisable`.""" + from .mutable_channel import BaseMutableChannel + + # avoid circular import + if isinstance(ratio, int): + choice_fn = _divide_choice_fn(self, ratio=ratio, divisor=divisor) + current_ratio = ratio + elif isinstance(ratio, float): + current_ratio = int(ratio) + choice_fn = _divide_choice_fn(self, ratio=current_ratio, divisor=1) + elif isinstance(ratio, BaseMutable): + current_ratio = int(ratio.current_choice) + choice_fn = _divide_choice_fn(self, ratio=current_ratio, divisor=1) + else: + raise NotImplementedError( + f'Not support type of ratio: {type(ratio)}') + + mask_fn: Optional[Callable] = None + if isinstance(self, BaseMutableChannel) and hasattr( + self, 'current_mask'): + mask_fn = _divide_mask_fn( + self, ratio=current_ratio, divisor=divisor) + elif getattr(self, 'mask_fn', None): # OneShotMutableChannel + mask_fn = _divide_mask_fn( + self, ratio=current_ratio, divisor=divisor) + + return DerivedMutable(choice_fn=choice_fn, mask_fn=mask_fn) + + @staticmethod + def derive_concat_mutable( + mutables: Iterable[MutableChannelProtocol]) -> 'DerivedMutable': + """Derive concat mutable, usually used with `torch.cat`.""" + for mutable in mutables: + if not hasattr(mutable, 'current_mask'): + raise RuntimeError('Source mutable of concat derived mutable ' + 'must have attribute `currnet_mask`') + + choice_fn = _concat_choice_fn(mutables) + mask_fn = _concat_mask_fn(mutables) + + return DerivedMutable(choice_fn=choice_fn, mask_fn=mask_fn) + + +class DerivedMutable(BaseMutable, DerivedMethodMixin): + """Class for derived mutable. 
+ + A derived mutable is a mutable derived from other mutables that has + `current_choice` and `current_mask` attributes (if any). + + Note: + A derived mutable does not have its own search space, so it is + not legal to modify its `current_choice` or `current_mask` directly. + And the only way to modify them is by modifying `current_choice` or + `current_mask` in corresponding source mutables. + + Args: + choice_fn (callable): A closure that controls how to generate + `current_choice`. + mask_fn (callable, optional): A closure that controls how to generate + `current_mask`. Defaults to None. + source_mutables (iterable, optional): Specify source mutables for this + derived mutable. If the argument is None, source mutables will be + traced automatically by parsing mutables in closure variables. + Defaults to None. + alias (str, optional): alias of the `MUTABLE`. Defaults to None. + init_cfg (dict, optional): initialization configuration dict for + ``BaseModule``. OpenMMLab has implement 5 initializer including + `Constant`, `Xavier`, `Normal`, `Uniform`, `Kaiming`, + and `Pretrained`. Defaults to None. 
+ + Examples: + >>> from mmrazor.models.mutables import SquentialMutableChannel + >>> mutable_channel = SquentialMutableChannel(num_channels=3) + >>> # derive expand mutable + >>> derived_mutable_channel = mutable_channel * 2 + >>> # source mutables will be traced automatically + >>> derived_mutable_channel.source_mutables + {SquentialMutableChannel(name=unbind, num_channels=3, current_choice=3)} # noqa: E501 + >>> # modify `current_choice` of `mutable_channel` + >>> mutable_channel.current_choice = 2 + >>> # `current_choice` and `current_mask` of derived mutable will be modified automatically # noqa: E501 + >>> derived_mutable_channel + DerivedMutable(current_choice=4, activated_channels=4, source_mutables={SquentialMutableChannel(name=unbind, num_channels=3, current_choice=2)}, is_fixed=False) # noqa: E501 + """ + + def __init__(self, + choice_fn: Callable, + mask_fn: Optional[Callable] = None, + source_mutables: Optional[Iterable[BaseMutable]] = None, + alias: Optional[str] = None, + init_cfg: Optional[Dict] = None) -> None: + super().__init__(alias, init_cfg) + + self.choice_fn = choice_fn + self.mask_fn = mask_fn + + if source_mutables is None: + source_mutables = self._trace_source_mutables() + if len(source_mutables) == 0: + raise RuntimeError( + 'Can not find source mutables automatically, ' + 'please provide manually.') + else: + source_mutables = set(source_mutables) + for mutable in source_mutables: + if not self.is_source_mutable(mutable): + raise ValueError('Expect all mutable to be source mutable, ' + f'but {mutable} is not') + self.source_mutables = source_mutables + + # TODO + # has no effect + def fix_chosen(self, chosen) -> None: + """Fix mutable with subnet config. + + Warning: + Fix derived mutable will have no actually effect. + """ + print_log( + 'Trying to fix chosen for derived mutable, ' + 'which will have no effect.', + level=logging.WARNING) + + def dump_chosen(self) -> DumpChosen: + """Dump information of chosen. 
+ + Returns: + Dict: Dumped information. + """ + print_log( + 'Trying to dump chosen for derived mutable, ' + 'but its value depend on the source mutables.', + level=logging.WARNING) + return DumpChosen(chosen=self.export_chosen(), meta=None) + + def export_chosen(self): + return self.current_choice + + @property + def is_fixed(self) -> bool: + """Whether the derived mutable is fixed. + + Note: + Depends on whether all source mutables are already fixed. + """ + return all(m.is_fixed for m in self.source_mutables) + + @is_fixed.setter + def is_fixed(self, is_fixed: bool) -> bool: + """Setter of is fixed.""" + raise RuntimeError( + '`is_fixed` of derived mutable should not be modified directly') + + @property + def choices(self): + origin_choices = [m.current_choice for m in self.source_mutables] + + all_choices = [m.choices for m in self.source_mutables] + + product_choices = product(*all_choices) + + derived_choices = list() + for item_choices in product_choices: + for m, choice in zip(self.source_mutables, item_choices): + m.current_choice = choice + + derived_choices.append(self.choice_fn()) + + for m, choice in zip(self.source_mutables, origin_choices): + m.current_choice = choice + + return derived_choices + + @property + def num_choices(self) -> int: + """Number of all choices. + + Note: + Since derive mutable does not have its own search space, the number + of choices will always be `1`. + + Returns: + int: Number of choices. + """ + return 1 + + @property + def current_choice(self): + """Current choice of derived mutable.""" + return self.choice_fn() + + @current_choice.setter + def current_choice(self, choice) -> None: + """Setter of current choice. + + Raises: + RuntimeError: Error when `current_choice` of derived mutable + is modified directly. 
+ """ + raise RuntimeError('Choice of drived mutable can not be set.') + + @property + def current_mask(self) -> Tensor: + """Current mask of derived mutable.""" + if self.mask_fn is None: + raise RuntimeError( + '`mask_fn` must be set before access `current_mask`.') + return self.mask_fn() + + @current_mask.setter + def current_mask(self, mask: Tensor) -> None: + """Setter of current mask. + + Raises: + RuntimeError: Error when `current_mask` of derived mutable + is modified directly. + """ + raise RuntimeError('Mask of drived mutable can not be set.') + + @staticmethod + def _trace_source_mutables_from_closure( + closure: Callable) -> Set[BaseMutable]: + """Trace source mutables from closure.""" + source_mutables: Set[BaseMutable] = set() + + def add_mutables_dfs( + mutable: Union[Iterable, BaseMutable, Dict]) -> None: + nonlocal source_mutables + if isinstance(mutable, BaseMutable): + if isinstance(mutable, DerivedMutable): + source_mutables |= mutable.source_mutables + else: + source_mutables.add(mutable) + # dict is also iterable, should parse first + elif isinstance(mutable, dict): + add_mutables_dfs(mutable.values()) + add_mutables_dfs(mutable.keys()) + elif isinstance(mutable, Iterable): + for m in mutable: + add_mutables_dfs(m) + + noncolcal_pars = inspect.getclosurevars(closure).nonlocals + add_mutables_dfs(noncolcal_pars.values()) + + return source_mutables + + def _trace_source_mutables(self) -> Set[BaseMutable]: + """Trace source mutables.""" + source_mutables = self._trace_source_mutables_from_closure( + self.choice_fn) + if self.mask_fn is not None: + source_mutables |= self._trace_source_mutables_from_closure( + self.mask_fn) + + return source_mutables + + @staticmethod + def is_source_mutable(mutable: object) -> bool: + """Judge whether an object is source mutable(not derived mutable). + + Args: + mutable (object): An object. + + Returns: + bool: Indicate whether the object is source mutable or not. 
+ """ + return isinstance(mutable, BaseMutable) and \ + not isinstance(mutable, DerivedMutable) + + # TODO + # should be __str__? but can not provide info when debug + def __repr__(self) -> str: # pragma: no cover + s = f'{self.__class__.__name__}(' + s += f'current_choice={self.current_choice}, ' + if self.mask_fn is not None: + s += f'activated_channels={self.current_mask.sum().item()}, ' + s += f'source_mutables={self.source_mutables}, ' + s += f'is_fixed={self.is_fixed})' + + return s diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/MutableChannel.md b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/MutableChannel.md new file mode 100644 index 0000000000000000000000000000000000000000..20b3db816064ca908e067934c06f3e616a233257 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/MutableChannel.md @@ -0,0 +1,36 @@ +# MutableChannels + +MutableChannels are used to deal with mutable number of channels in DynamicOps. + +``` +|-----------------------------------------| +| mutable_in_channel(BaseMutableChannel) | +| --------------------------------------- | +| DynamicOp | +| --------------------------------------- | +| mutable_out_channel(BaseMutableChannel) | +| --------------------------------------- | +``` + +\` +All MutableChannels inherit from BaseMutableChannel. Each MutableChannel has to implement two property. + +- current_choice: get and set the choice of the MutableChannel. +- current_mask: get the channel mask according to the current_choice. + +## MutableChannelContainer + +Here, we introduce a special MutableChannel: MutableChannelContainer. As the channels of a DynamicOp may belong to different MutableChannelUnits, we use MutableChannelContainers to store multiple MutableChannels as below. 
+ +``` +----------------------------------------------------------- +| MutableChannelContainer | +----------------------------------------------------------- +|MutableChannel1| MutableChannel2 |MutableChannel3| +----------------------------------------------------------- +``` + +MutableChannelContainer has an method to register MutableChannels. + +- register_mutable: register/store BaseMutableChannel in the + MutableChannelContainer diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..1dd78cb697b97235b58c7f534d2b6db6a7cf096d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/__init__.py @@ -0,0 +1,18 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .base_mutable_channel import BaseMutableChannel +from .mutable_channel_container import MutableChannelContainer +from .oneshot_mutable_channel import OneShotMutableChannel +from .sequential_mutable_channel import SquentialMutableChannel +from .simple_mutable_channel import SimpleMutableChannel +from .units import (ChannelUnitType, DCFFChannelUnit, DMCPChannelUnit, + L1MutableChannelUnit, MutableChannelUnit, + OneShotMutableChannelUnit, SequentialMutableChannelUnit, + SlimmableChannelUnit) + +__all__ = [ + 'SimpleMutableChannel', 'L1MutableChannelUnit', + 'SequentialMutableChannelUnit', 'MutableChannelUnit', + 'OneShotMutableChannelUnit', 'SlimmableChannelUnit', 'BaseMutableChannel', + 'MutableChannelContainer', 'SquentialMutableChannel', 'ChannelUnitType', + 'DCFFChannelUnit', 'OneShotMutableChannel', 'DMCPChannelUnit' +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/base_mutable_channel.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/base_mutable_channel.py new file mode 100644 index 
0000000000000000000000000000000000000000..65d5a44d65edfe49878ca35a2ac3672b5289dfe0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/base_mutable_channel.py @@ -0,0 +1,85 @@ +# Copyright (c) OpenMMLab. All rights reserved. +"""""" +from abc import abstractmethod + +import torch + +from mmrazor.utils.typing import DumpChosen +from ..base_mutable import BaseMutable +from ..derived_mutable import DerivedMethodMixin + + +class BaseMutableChannel(BaseMutable, DerivedMethodMixin): + """BaseMutableChannel works as a channel mask for DynamicOps to select + channels. + + |---------------------------------------| + |mutable_in_channel(BaseMutableChannel) | + |---------------------------------------| + | DynamicOp | + |---------------------------------------| + |mutable_out_channel(BaseMutableChannel)| + |---------------------------------------| + + All subclasses should implement the following APIs and the other + abstract method in ``BaseMutable`` + + - ``current_mask`` + + Args: + num_channels (int): number(dimension) of channels(mask). + """ + + def __init__(self, num_channels: int, **kwargs): + super().__init__(**kwargs) + self.name = '' + self.num_channels = num_channels + + @property # type: ignore + @abstractmethod + def current_mask(self) -> torch.Tensor: + """Return a mask indicating the channel selection.""" + raise NotImplementedError() + + @property + def activated_channels(self) -> int: + """Number of activated channels.""" + return (self.current_mask == 1).sum().item() + + # implementation of abstract methods + + def fix_chosen(self, chosen=None): + """Fix the mutable with chosen.""" + if chosen is not None: + self.current_choice = chosen + + if self.is_fixed: + raise AttributeError( + 'The mode of current MUTABLE is `fixed`. 
' + 'Please do not call `fix_chosen` function again.') + + self.is_fixed = True + + def dump_chosen(self) -> DumpChosen: + """Dump chosen.""" + meta = dict(max_channels=self.mask.size(0)) + chosen = self.export_chosen() + + return DumpChosen(chosen=chosen, meta=meta) + + def export_chosen(self) -> int: + return self.activated_channels + + def num_choices(self) -> int: + """Number of available choices.""" + raise NotImplementedError() + + # others + + def __repr__(self): + repr_str = self.__class__.__name__ + repr_str += '(' + repr_str += f'num_channels={self.num_channels}, ' + repr_str += f'activated_channels={self.activated_channels}' + repr_str += ')' + return repr_str diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/mutable_channel_container.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/mutable_channel_container.py new file mode 100644 index 0000000000000000000000000000000000000000..5706d07501be2a21358f71c94c39bc4d43e3e374 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/mutable_channel_container.py @@ -0,0 +1,123 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy + +import torch + +from mmrazor.registry import MODELS +from mmrazor.utils import IndexDict +from ...architectures.dynamic_ops.mixins import DynamicChannelMixin +from .base_mutable_channel import BaseMutableChannel +from .simple_mutable_channel import SimpleMutableChannel + + +@MODELS.register_module() +class MutableChannelContainer(BaseMutableChannel): + """MutableChannelContainer inherits from BaseMutableChannel. However, + it's not a single BaseMutableChannel, but a container for + BaseMutableChannel. The mask of MutableChannelContainer consists of + all masks of stored MutableChannels. 
+ + ----------------------------------------------------------- + | MutableChannelContainer | + ----------------------------------------------------------- + |MutableChannel1| MutableChannel2 |MutableChannel3| + ----------------------------------------------------------- + + Important interfaces: + register_mutable: register/store BaseMutableChannel in the + MutableChannelContainer + """ + + def __init__(self, num_channels: int, **kwargs): + super().__init__(num_channels, **kwargs) + self.mutable_channels = IndexDict() + + # choice + + @property + def current_choice(self) -> torch.Tensor: + """Get current choices.""" + if len(self.mutable_channels) == 0: + return torch.ones([self.num_channels]).bool() + else: + self._fill_unregistered_range() + self._assert_mutables_valid() + mutable_channels = list(self.mutable_channels.values()) + masks = [mutable.current_mask for mutable in mutable_channels] + mask = torch.cat(masks) + return mask.bool() + + @current_choice.setter + def current_choice(self, choice): + """Set current choices. + + However, MutableChannelContainer doesn't support directly set mask. You + can change the mask of MutableChannelContainer by changing its stored + BaseMutableChannel. 
+ """ + raise NotImplementedError() + + @property + def current_mask(self) -> torch.Tensor: + """Return current mask.""" + return self.current_choice.bool() + + # basic extension + + def register_mutable(self, mutable_channel: BaseMutableChannel, start: int, + end: int): + """Register/Store BaseMutableChannel in the MutableChannelContainer in + the range [start,end)""" + + self.mutable_channels[(start, end)] = mutable_channel + + @classmethod + def register_mutable_channel_to_module(cls, + module: DynamicChannelMixin, + mutable: BaseMutableChannel, + is_to_output_channel=True, + start=0, + end=-1): + """Register a BaseMutableChannel to a module with + MutableChannelContainers.""" + if end == -1: + end = mutable.current_choice + start + if is_to_output_channel: + container: MutableChannelContainer = module.get_mutable_attr( + 'out_channels') + else: + container = module.get_mutable_attr('in_channels') + assert isinstance(container, MutableChannelContainer) + container.register_mutable(mutable, start, end) + + # private methods + + def _assert_mutables_valid(self): + """Assert the current stored BaseMutableChannels are valid to generate + mask.""" + assert len(self.mutable_channels) > 0 + last_end = 0 + for start, end in self.mutable_channels: + assert start == last_end + last_end = end + assert last_end == self.num_channels, ( + f'channel mismatch: {last_end} vs {self.num_channels}') + + def _fill_unregistered_range(self): + """Fill with SimpleMutableChannels in the range without any stored + BaseMutableChannel. + + For example, if a MutableChannelContainer has 10 channels, and only the + [0,5) is registered with BaseMutableChannels, this method will + automatically register BaseMutableChannels in the range [5,10). 
+ """ + last_end = 0 + for start, end in copy.copy(self.mutable_channels): + if last_end < start: + self.register_mutable( + SimpleMutableChannel(last_end - start), last_end, start) + last_end = end + if last_end < self.num_channels: + self.register_mutable( + SimpleMutableChannel(self.num_channels - last_end), last_end, + self.num_channels) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/oneshot_mutable_channel.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/oneshot_mutable_channel.py new file mode 100644 index 0000000000000000000000000000000000000000..3265b79c68c5fbf40004223ba712b2f563478aab --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/oneshot_mutable_channel.py @@ -0,0 +1,42 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import List, Union + +from .sequential_mutable_channel import SquentialMutableChannel + + +class OneShotMutableChannel(SquentialMutableChannel): + """OneShotMutableChannel is a subclass of SquentialMutableChannel. The + difference is that a OneShotMutableChannel limits the candidates of the + choice. + + Args: + num_channels (int): number of channels. + candidate_choices (List[Union[float, int]], optional): A list of + candidate width ratios. Each candidate indicates how many + channels to be reserved. Defaults to []. + choice_mode (str, optional): Mode of choices. Defaults to 'number'. 
+ """ + + def __init__(self, + num_channels: int, + candidate_choices: List[Union[float, int]] = [], + choice_mode='number', + **kwargs): + super().__init__(num_channels, choice_mode, **kwargs) + candidate_choices.sort() + self.candidate_choices = candidate_choices + if candidate_choices == []: + candidate_choices.append(num_channels if self.is_num_mode else 1.0) + + @property + def current_choice(self) -> Union[int, float]: + """Get current choice.""" + return super().current_choice + + @current_choice.setter + def current_choice(self, choice: Union[int, float]): + """Set current choice.""" + assert choice in self.candidate_choices + SquentialMutableChannel.current_choice.fset( # type: ignore + self, # type: ignore + choice) # type: ignore diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/sequential_mutable_channel.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/sequential_mutable_channel.py new file mode 100644 index 0000000000000000000000000000000000000000..c2b4f9291554f0720faf79e7085262c63695eafa --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/sequential_mutable_channel.py @@ -0,0 +1,140 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import Callable, Union + +import torch + +from mmrazor.registry import MODELS +from ..derived_mutable import DerivedMutable +from .simple_mutable_channel import SimpleMutableChannel + +# TODO discuss later + + +@MODELS.register_module() +class SquentialMutableChannel(SimpleMutableChannel): + """SquentialMutableChannel defines a BaseMutableChannel which switch off + channel mask from right to left sequentially, like '11111000'. + + A choice of SquentialMutableChannel is an integer, which indicates how many + channel are activated from left to right. + + Args: + num_channels (int): number of channels. 
+ """ + + def __init__(self, num_channels: int, choice_mode='number', **kwargs): + + super().__init__(num_channels, **kwargs) + assert choice_mode in ['ratio', 'number'] + self.choice_mode = choice_mode + + @property + def is_num_mode(self): + """Get if the choice is number mode.""" + return self.choice_mode == 'number' + + @property + def current_choice(self) -> Union[int, float]: + """Get current choice.""" + int_choice = (self.mask == 1).sum().item() + if self.is_num_mode: + return int_choice + else: + return self._num2ratio(int_choice) + + @current_choice.setter + def current_choice(self, choice: Union[int, float]): + """Set choice.""" + if isinstance(choice, float): + int_choice = self._ratio2num(choice) + else: + int_choice = choice + self.mask.fill_(0.0) + self.mask[0:int_choice] = 1.0 + + @property + def current_mask(self) -> torch.Tensor: + """Return current mask.""" + return self.mask.bool() + + # methods for + + def fix_chosen(self, chosen=...): + """Fix chosen.""" + if chosen is ...: + chosen = self.current_choice + assert self.is_fixed is False + self.current_choice = chosen + self.is_fixed = True + + def __rmul__(self, other) -> DerivedMutable: + return self * other + + def __mul__(self, other) -> DerivedMutable: + if isinstance(other, int) or isinstance(other, float): + return self.derive_expand_mutable(other) + + from ..mutable_value import OneShotMutableValue + + def expand_choice_fn(mutable1: 'SquentialMutableChannel', + mutable2: OneShotMutableValue) -> Callable: + + def fn(): + return int(mutable1.current_choice * mutable2.current_choice) + + return fn + + def expand_mask_fn(mutable1: 'SquentialMutableChannel', + mutable2: OneShotMutableValue) -> Callable: + + def fn(): + mask = mutable1.current_mask + max_expand_ratio = mutable2.max_choice + current_expand_ratio = mutable2.current_choice + expand_num_channels = int(mask.size(0) * max_expand_ratio) + + expand_choice = int(mutable1.current_choice * + current_expand_ratio) + expand_mask = 
torch.zeros(expand_num_channels).bool() + expand_mask[:expand_choice] = True + + return expand_mask + + return fn + + if isinstance(other, OneShotMutableValue): + return DerivedMutable( + choice_fn=expand_choice_fn(self, other), + mask_fn=expand_mask_fn(self, other)) + + raise TypeError(f'Unsupported type {type(other)} for mul!') + + def __floordiv__(self, other) -> DerivedMutable: + if isinstance(other, int): + return self.derive_divide_mutable(other) + elif isinstance(other, float): + return self.derive_divide_mutable(int(other)) + if isinstance(other, tuple): + assert len(other) == 2 + return self.derive_divide_mutable(*other) + + from ..mutable_value import OneShotMutableValue + if isinstance(other, OneShotMutableValue): + ratio = other.current_choice + return self.derive_divide_mutable(ratio) + + raise TypeError(f'Unsupported type {type(other)} for div!') + + def _num2ratio(self, choice: Union[int, float]) -> float: + """Convert the a number choice to a ratio choice.""" + if isinstance(choice, float): + return choice + else: + return choice / self.num_channels + + def _ratio2num(self, choice: Union[int, float]) -> int: + """Convert the a ratio choice to a number choice.""" + if isinstance(choice, int): + return choice + else: + return max(1, int(self.num_channels * choice)) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/simple_mutable_channel.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/simple_mutable_channel.py new file mode 100644 index 0000000000000000000000000000000000000000..9e85f81a34ca07964caf596a68a040bbd0c39cb3 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/simple_mutable_channel.py @@ -0,0 +1,57 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
@MODELS.register_module()
class SimpleMutableChannel(BaseMutableChannel):
    """SimpleMutableChannel is a simple BaseMutableChannel that directly
    takes a boolean mask as its choice.

    Args:
        num_channels (int): number of channels.
    """

    def __init__(self, num_channels: int, **kwargs) -> None:
        super().__init__(num_channels, **kwargs)
        # Save the mask as float instead of bool so it can be synchronized
        # by distributed-training utilities.
        mask = torch.ones([self.num_channels])
        self.register_buffer('mask', mask)
        self.mask: torch.Tensor

    # choice

    @property
    def current_choice(self) -> torch.Tensor:
        """Get current choice (a bool mask)."""
        return self.mask.bool()

    @current_choice.setter
    def current_choice(self, choice: torch.Tensor):
        """Set current choice from a mask tensor (stored as float)."""
        self.mask = choice.to(self.mask.device).float()

    @property
    def current_mask(self) -> torch.Tensor:
        """Get current mask."""
        # ``current_choice`` already returns a bool tensor; no extra cast.
        return self.current_choice

    # basic extension

    def expand_mutable_channel(
            self, expand_ratio: Union[int, float]) -> DerivedMutable:
        """Get a derived SimpleMutableChannel whose mask repeats each entry
        ``expand_ratio`` times.

        Args:
            expand_ratio (int | float): expansion factor. A float is
                truncated to int, since ``Tensor.expand`` only accepts
                integer sizes (the previous code raised ``TypeError`` for
                float inputs despite the annotation allowing them).
        """

        def _expand_mask():
            factor = int(expand_ratio)
            mask = self.current_mask
            mask = torch.unsqueeze(
                mask, -1).expand(list(mask.shape) + [factor]).flatten(-2)
            return mask

        return DerivedMutable(_expand_mask, _expand_mask, [self])
+from .dcff_channel_unit import DCFFChannelUnit +from .dmcp_channel_unit import DMCPChannelUnit +from .l1_mutable_channel_unit import L1MutableChannelUnit +from .mutable_channel_unit import ChannelUnitType, MutableChannelUnit +from .one_shot_mutable_channel_unit import OneShotMutableChannelUnit +from .sequential_mutable_channel_unit import SequentialMutableChannelUnit +from .slimmable_channel_unit import SlimmableChannelUnit + +__all__ = [ + 'L1MutableChannelUnit', 'MutableChannelUnit', + 'SequentialMutableChannelUnit', 'OneShotMutableChannelUnit', + 'SlimmableChannelUnit', 'ChannelUnitType', 'DCFFChannelUnit', + 'DMCPChannelUnit' +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/units/channel_unit.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/units/channel_unit.py new file mode 100644 index 0000000000000000000000000000000000000000..e730245d4a5ea974ab567696a0fa0048ac07d98d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/units/channel_unit.py @@ -0,0 +1,258 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy +from typing import Dict, List + +import torch.nn as nn +from mmengine.model import BaseModule + +from mmrazor.models.architectures.dynamic_ops.mixins import DynamicChannelMixin +from mmrazor.registry import TASK_UTILS + + +class Channel(BaseModule): + """Channel records information about channels for pruning. + + Args: + name (str): The name of the channel. When the channel is related with + a module, the name should be the name of the module in the model. + module (Any): Module of the channel. + index (Tuple[int,int]): Index(start,end) of the Channel in the Module + node (ChannelNode, optional): A ChannelNode corresponding to the + Channel. Defaults to None. + is_output_channel (bool, optional): Is the channel output channel. + Defaults to True. 
+ """ + + # init + + def __init__(self, + name, + module, + index, + node=None, + is_output_channel=True) -> None: + super().__init__() + self.name = name + self.module: nn.Module = module + self.index = index + self.start = index[0] + self.end = index[1] + + self.node = node + + self.is_output_channel = is_output_channel + + @classmethod + def init_from_cfg(cls, model: nn.Module, config: Dict): + """init a Channel using a config which can be generated by + self.config_template()""" + name = config['name'] + start = config['start'] + end = config['end'] + is_output_channel = config['is_output_channel'] + + name2module = dict(model.named_modules()) + name2module.pop('') + module = name2module[name] if name in name2module else None + return Channel( + name, module, (start, end), is_output_channel=is_output_channel) + + # config template + + def config_template(self): + """Generate a config template which can be used to initialize a Channel + by cls.init_from_cfg(**kwargs)""" + + return { + 'name': str(self.name), + 'start': self.start, + 'end': self.end, + 'is_output_channel': self.is_output_channel + } + + # basic properties + + @property + def num_channels(self) -> int: + """The number of channels in the Channel.""" + return self.index[1] - self.index[0] + + @property + def is_mutable(self) -> bool: + """If the channel is prunable.""" + if self.module is not None: + has_prama = len(list(self.module.parameters())) != 0 + is_dynamic_op = isinstance(self.module, DynamicChannelMixin) + return (not has_prama) or is_dynamic_op + else: + is_unmutable = self.name in [ + 'input_placeholder', 'output_placeholder' + ] + return not is_unmutable + + def __repr__(self) -> str: + return (f'{self.__class__.__name__}(' + f'{self.name}, index={self.index}, ' + f'is_output_channel=' + f'{"true" if self.is_output_channel else "false"}, ' + ')') + + def __eq__(self, obj: object) -> bool: + if isinstance(obj, Channel): + return self.name == obj.name \ + and self.module == obj.module \ + 
class ChannelUnit(BaseModule):
    """A unit of Channels.

    A ChannelUnit has two lists, ``input_related`` and ``output_related``,
    storing Channels that depend on each other: they must all share the
    same number of activated channels.

    Args:
        num_channels (int): the number of channels of each Channel object.
    """

    # init methods

    def __init__(self, num_channels: int, **kwargs):
        super().__init__()

        self.num_channels = num_channels
        # NOTE: these lists hold ``Channel`` objects (see add_*_related),
        # not plain modules.
        self.output_related: List['Channel'] = list()
        self.input_related: List['Channel'] = list()
        # Used to generate a new channel unit with the same init args.
        self.init_args: Dict = {}

    @classmethod
    def init_from_cfg(cls, model: nn.Module, config: Dict) -> 'ChannelUnit':
        """Init a ChannelUnit using a config which can be generated by
        ``self.config_template()``."""

        # Deep-copy first so filling in defaults never mutates the
        # caller's config.
        config = copy.deepcopy(config)

        def auto_fill_channel_config(channel_config: Dict,
                                     is_output_channel: bool):
            """Fill a channel config with default start/end values."""
            channel_config.setdefault('start', 0)
            channel_config.setdefault('end',
                                      config['init_args']['num_channels'])
            channel_config['is_output_channel'] = is_output_channel

        channels = config.pop('channels', None)
        unit = cls(**(config['init_args']))
        if channels is not None:
            for channel_config in channels['input_related']:
                auto_fill_channel_config(channel_config, False)
                unit.add_input_related(
                    Channel.init_from_cfg(model, channel_config))
            for channel_config in channels['output_related']:
                auto_fill_channel_config(channel_config, True)
                unit.add_output_related(
                    Channel.init_from_cfg(model, channel_config))
        return unit

    @classmethod
    def init_from_channel_unit(cls,
                               unit: 'ChannelUnit',
                               args: Optional[Dict] = None) -> 'ChannelUnit':
        """Initial an object of the current class from a ChannelUnit
        object.

        Args:
            unit (ChannelUnit): source unit; its channel lists are shared
                (not copied) with the new unit.
            args (dict, optional): extra init kwargs. Defaults to None.

        Note:
            The previous implementation used a mutable default ``{}`` and
            wrote ``num_channels`` into it, leaking state between calls
            and mutating the caller's dict; a fresh copy is used instead.
        """
        args = dict(args) if args else {}
        args['num_channels'] = unit.num_channels
        mutable_unit = cls(**args)
        mutable_unit.input_related = unit.input_related
        mutable_unit.output_related = unit.output_related
        return mutable_unit

    @classmethod
    def init_from_channel_analyzer(cls, model, analyzer=None):
        """Init MutableChannelUnits from a ChannelAnalyzer."""

        if analyzer is None:
            from mmrazor.models.task_modules.tracer import ChannelAnalyzer
            analyzer = ChannelAnalyzer()
        if isinstance(analyzer, dict):
            analyzer = TASK_UTILS.build(analyzer)
        unit_config = analyzer.analyze(model)
        return [cls.init_from_cfg(model, cfg) for cfg in unit_config.values()]

    # tools

    @property
    def name(self) -> str:
        """str: name of the unit, derived from the first related channel
        unless explicitly set."""
        if len(self.output_related) + len(self.input_related) > 0:
            first_module = (list(self.output_related) +
                            list(self.input_related))[0]
            first_module_name = f'{first_module.name}_{first_module.index}'
        else:
            first_module_name = 'unitx'
        name = f'{first_module_name}_{self.num_channels}'
        return getattr(self, '_name', name)

    @name.setter
    def name(self, unit_name) -> None:
        self._name = unit_name

    @property
    def alias(self) -> str:
        """str: alias of the unit."""
        return self.name

    def config_template(self,
                        with_init_args=False,
                        with_channels=False) -> Dict:
        """Generate a config template which can be used to initialize a
        ChannelUnit by ``cls.init_from_cfg(**kwargs)``."""
        config: Dict = {}
        if with_init_args:
            config['init_args'] = {'num_channels': self.num_channels}
        if with_channels:
            config['channels'] = self._channel_dict()
        return config

    # node operations

    def add_output_related(self, channel: Channel):
        """Add a Channel which is output related."""
        assert channel.is_output_channel
        if channel not in self.output_related:
            self.output_related.append(channel)

    def add_input_related(self, channel: Channel):
        """Add a Channel which is input related."""
        assert channel.is_output_channel is False
        if channel not in self.input_related:
            self.input_related.append(channel)

    # others

    def extra_repr(self) -> str:
        s = super().extra_repr()
        s += f'name={self.name}'
        return s

    # private methods

    def _channel_dict(self) -> Dict:
        """Return channel config."""
        info = {
            'input_related':
            [channel.config_template() for channel in self.input_related],
            'output_related':
            [channel.config_template() for channel in self.output_related],
        }
        return info
@MODELS.register_module()
class DCFFChannelUnit(SequentialMutableChannelUnit):
    """``DCFFChannelUnit`` is for the DCFF supernet and is based on
    SequentialMutableChannelUnit. In a DCFF supernet, each module only has
    one choice. The channel choice is fixed before training.

    Args:
        num_channels (int): The raw number of channels.
        candidate_choices (List[Union[int, float]], optional): A list of
            candidate width numbers or ratios. Each candidate indicates
            how many channels to be reserved. Defaults to [1.0]
            (choice_mode='ratio'). NOTE(review): this argument is accepted
            for API compatibility but is not forwarded to the parent
            class — confirm before relying on it.
        choice_mode (str, optional): Mode of candidates.
            One of "ratio" or "number". Defaults to 'ratio'.
        divisor (int): Used to make choice divisible.
        min_value (int): the minimal value used when make divisible.
        min_ratio (float): the minimal ratio used when make divisible.
    """

    def __init__(self,
                 num_channels: int,
                 candidate_choices: List[Union[int, float]] = [1.0],
                 choice_mode: str = 'ratio',
                 divisor: int = 1,
                 min_value: int = 1,
                 min_ratio: float = 0.9) -> None:
        super().__init__(num_channels, choice_mode, divisor, min_value,
                         min_ratio)

    def prepare_for_pruning(self, model: nn.Module):
        """In ``DCFFChannelGroup`` nn.Conv2d is replaced with FuseConv2d."""
        replacements = {
            nn.Conv2d: dynamic_ops.FuseConv2d,
            nn.BatchNorm2d: dynamic_ops.DynamicBatchNorm2d,
            nn.Linear: dynamic_ops.DynamicLinear
        }
        self._replace_with_dynamic_ops(model, replacements)
        self._register_channel_container(model, MutableChannelContainer)
        self._register_mutable_channel(self.mutable_channel)
@MODELS.register_module()
class DMCPChannelUnit(SequentialMutableChannelUnit):
    """``DMCPChannelUnit`` is for the DMCP supernet and is based on
    SequentialMutableChannelUnit. In a DMCP supernet, each module only has
    one choice. The channel choice is fixed before training.

    Note:
        In a DMCP unit, a new attribute ``activated_tensor_channels`` is
        defined on ``self.mutable_channel``, which is specifically used to
        store the number of channels in the form of a tensor. Defaults to
        None.

    Args:
        num_channels (int): The raw number of channels.
        choice_mode (str, optional): Mode of candidates.
            One of "ratio" or "number". Defaults to 'number'.
        divisor (int): Used to make choice divisible.
        min_value (int): the minimal value used when make divisible.
        min_ratio (float): the minimal ratio used when make divisible.
    """

    def __init__(self,
                 num_channels: int,
                 choice_mode: str = 'number',
                 divisor: int = 1,
                 min_value: int = 1,
                 min_ratio: float = 0.5) -> None:
        super().__init__(num_channels, choice_mode, divisor, min_value,
                         min_ratio)
        self.mutable_channel.activated_tensor_channels = None

    def prepare_for_pruning(self, model: nn.Module):
        """In ``DMCPChannelGroup`` nn.BatchNorm2d is replaced with
        DMCPBatchNorm2d."""
        replacements = {
            nn.Conv2d: dynamic_ops.DynamicConv2d,
            nn.BatchNorm2d: dynamic_ops.DMCPBatchNorm2d,
            nn.Linear: dynamic_ops.DynamicLinear
        }
        self._replace_with_dynamic_ops(model, replacements)
        self._register_channel_container(model, MutableChannelContainer)
        self._register_mutable_channel(self.mutable_channel)
+""" +from mmrazor.implementations.pruning.group_fisher import \ + GroupFisherChannelUnit # noqa diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/units/l1_mutable_channel_unit.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/units/l1_mutable_channel_unit.py new file mode 100644 index 0000000000000000000000000000000000000000..8b3c258adceb2cac66dcbd4fb5c30ee9e354dd42 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/units/l1_mutable_channel_unit.py @@ -0,0 +1,82 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import Union + +import torch +import torch.nn as nn + +from mmrazor.registry import MODELS +from ..simple_mutable_channel import SimpleMutableChannel +from .sequential_mutable_channel_unit import SequentialMutableChannelUnit + + +@MODELS.register_module() +class L1MutableChannelUnit(SequentialMutableChannelUnit): + """Implementation of L1-norm pruning algorithm. It compute the l1-norm of + modules and preferly prune the modules with less l1-norm. + + Please refer to papre `https://arxiv.org/pdf/1608.08710.pdf` for more + detail. 
+ """ + + def __init__(self, + num_channels: int, + choice_mode='number', + divisor=1, + min_value=1, + min_ratio=0.9) -> None: + super().__init__(num_channels, choice_mode, divisor, min_value, + min_ratio) + self.mutable_channel = SimpleMutableChannel(num_channels) + + # choices + + @property + def current_choice(self) -> Union[int, float]: + num = self.mutable_channel.activated_channels + if self.is_num_mode: + return num + else: + return self._num2ratio(num) + + @current_choice.setter + def current_choice(self, choice: Union[int, float]): + int_choice = self._get_valid_int_choice(choice) + mask = self._generate_mask(int_choice).bool() + self.mutable_channel.current_choice = mask + + # private methods + + def _generate_mask(self, choice: int) -> torch.Tensor: + """Generate mask using choice.""" + norm = self._get_unit_norm() + idx = norm.topk(choice)[1] + mask = torch.zeros([self.num_channels]).to(idx.device) + mask.scatter_(0, idx, 1) + return mask + + def _get_l1_norm(self, module: Union[nn.modules.conv._ConvNd, nn.Linear], + start, end): + """Get l1-norm of a module.""" + if isinstance(module, nn.modules.conv._ConvNd): + weight = module.weight.flatten(1) # out_c * in_c * k * k + elif isinstance(module, nn.Linear): + weight = module.weight # out_c * in_c + weight = weight[start:end] + norm = weight.abs().mean(dim=[1]) + return norm + + def _get_unit_norm(self): + """Get l1-norm of the unit by averaging the l1-norm of the moduls in + the unit.""" + avg_norm = 0 + module_num = 0 + for channel in self.output_related: + if isinstance(channel.module, + nn.modules.conv._ConvNd) or isinstance( + channel.module, nn.Linear): + norm = self._get_l1_norm(channel.module, channel.start, + channel.end) + avg_norm += norm + module_num += 1 + avg_norm = avg_norm / module_num + return avg_norm diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/units/mutable_channel_unit.ipynb 
b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/units/mutable_channel_unit.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..bc40d191b6db6c5b417615f45c5f58310adea852 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/units/mutable_channel_unit.ipynb @@ -0,0 +1,314 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# MutableChannelUnit" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Each MutableChannelUnit is a basic unit for pruning. It records all channels which are dependent on each other.\n", + "Below, we will introduce you about:\n", + "1. The data structure of MutableChannelUnit.\n", + "2. How to prune the model with a MutableChannelUnit.\n", + "3. How to get MutableChannelUnits.\n", + "4. How to develop a new MutableChannelUnit for a new pruning algorithm.\n", + "

\"MutableChannelUnit\"

" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## The Data Structure of MutableChannelUnit" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First, let's parse a model and get several MutableChannelUnits." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# define a model\n", + "from mmengine.model import BaseModel\n", + "from torch import nn\n", + "from collections import OrderedDict\n", + "\n", + "class MyModel(nn.Module):\n", + "\n", + " def __init__(self):\n", + " super().__init__()\n", + " self.net = nn.Sequential(\n", + " OrderedDict([('conv0', nn.Conv2d(3, 8, 3, 1, 1)),\n", + " ('relu', nn.ReLU()),\n", + " ('conv1', nn.Conv2d(8, 16, 3, 1, 1))]))\n", + " self.pool = nn.AdaptiveAvgPool2d(1)\n", + " self.head = nn.Linear(16, 1000)\n", + "\n", + " def forward(self, x):\n", + " feature = self.net(x)\n", + " pool = self.pool(feature).flatten(1)\n", + " return self.head(pool)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# There are multiple types of MutableChannelUnits. 
Here, We take SequentialMutableChannelUnit as the example.\n", + "from mmrazor.models.mutables.mutable_channel.units import SequentialMutableChannelUnit\n", + "from mmrazor.structures.graph import ModuleGraph\n", + "from typing import List\n", + "\n", + "model = MyModel()\n", + "units: List[\n", + " SequentialMutableChannelUnit] = SequentialMutableChannelUnit.init_from_channel_analyzer(model) # type: ignore\n", + "print(\n", + " f'This model has {len(units)} MutableChannelUnit(SequentialMutableChannelUnit).'\n", + ")\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "unit1=units[1]\n", + "print(unit1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As shown above, each MutableChannelUnit has four important attributes: \n", + "1. name: str\n", + "2. output_related: ModuleList\n", + "3. input_related: ModuleList\n", + "4. mutable_channel: BaseMutableChannel\n", + "\n", + "\"name\" is the identifier of the MutableChannelUnit. It's automatically generated usually.\n", + "\n", + "\"output_related\" and \"input_related\" are two ModuleLists. They store all Channels with channel dependency.\n", + "The difference is that the \"output_related\" includes output channels and the \"input_related\" includes input channels.\n", + "All these channels\n", + "\n", + "\"mutable_channel\" is a BaseMutableChannel used to control the channel mask of modules. The mutable_channel is registered to the modules whose channels are stored in \"output_related\" and \"input_related\"." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## How to prune the model with a MutableChannelUnit." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There are three steps to prune the model using a MutableChannelUnit:\n", + "1. 
replace modules, whose channel are stored in the \"output_related\" and \"input_related\", with dynamic ops which are able to deal with mutable number of channels.\n", + "2. register the \"mutable_channel\" to the replaced dynamic ops.\n", + "3. change the choice of the \"mutable_channel\".\n", + "\n", + "For simplicity, we run step 1 and 2 with one method \"prepare_for_pruning\"." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# We run \"prepare_for_pruning\" once before pruning to run step 1 and 2 above.\n", + "unit1.prepare_for_pruning(model)\n", + "print(f'The current choice of unit1 is {unit1.current_choice}.')\n", + "print(model.net.conv0)\n", + "print(model.net.conv1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We prune the model by changing the current_choice of the MutableChannelUnits." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sampled_choice=unit1.sample_choice()\n", + "print(f'We get a sampled choice {sampled_choice}.')\n", + "unit1.current_choice=sampled_choice\n", + "print(model.net.conv0)\n", + "print(model.net.conv1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Besides, different types of MutableChannelUnit may have different types of choices. Please read documents for more details." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## How to get MutableChannelUnits." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There are three ways to get MutableChannelUnits.\n", + "1. Using a tracer.\n", + " This way, firstly, converts a model to a graph, then converts the graph to MutableChannelUnits. It automatically returns all available MutableChannelUnits.\n", + "2. Using a config.\n", + " This way uses a config to initialize a MutableChannelUnit.\n", + "3. 
Using a predefined model.\n", + " This way parses a predefined model with dynamic ops. It returns all available MutableChannelUnits.\n", + "\n", + "All these three ways have corresponding documents in the README of ChannelMutator." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 1. using tracer\n", + "def get_mutable_channel_units_using_tracer(model):\n", + " units = SequentialMutableChannelUnit.init_from_channel_analyzer(model)\n", + " return units\n", + "\n", + "\n", + "model = MyModel()\n", + "units = get_mutable_channel_units_using_tracer(model)\n", + "print(f'The model has {len(units)} MutableChannelUnits.')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 2. using config\n", + "config = {\n", + " 'init_args': {\n", + " 'num_channels': 8,\n", + " },\n", + " 'channels': {\n", + " 'input_related': [{\n", + " 'name': 'net.conv1',\n", + " }],\n", + " 'output_related': [{\n", + " 'name': 'net.conv0',\n", + " }]\n", + " },\n", + " 'choice': 8\n", + "}\n", + "unit=SequentialMutableChannelUnit.init_from_cfg(model, config)\n", + "print(unit)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 3. 
using predefined model\n", + "\n", + "from mmrazor.models.architectures.dynamic_ops import DynamicConv2d, DynamicLinear\n", + "from mmrazor.models.mutables import MutableChannelUnit, MutableChannelContainer,SquentialMutableChannel\n", + "from collections import OrderedDict\n", + "\n", + "class MyDynamicModel(BaseModel):\n", + "\n", + " def __init__(self):\n", + " super().__init__(None, None)\n", + " self.net = nn.Sequential(\n", + " OrderedDict([('conv0', DynamicConv2d(3, 8, 3, 1, 1)),\n", + " ('relu', nn.ReLU()),\n", + " ('conv1', DynamicConv2d(8, 16, 3, 1, 1))]))\n", + " self.pool = nn.AdaptiveAvgPool2d(1)\n", + " self.head = DynamicLinear(16, 1000)\n", + "\n", + " # register MutableChannelContainer\n", + " MutableChannelUnit._register_channel_container(\n", + " self, MutableChannelContainer)\n", + " self._register_mutables()\n", + "\n", + " def forward(self, x):\n", + " feature = self.net(x)\n", + " pool = self.pool(feature).flatten(1)\n", + " return self.head(pool)\n", + "\n", + " def _register_mutables(self):\n", + " mutable1 = SquentialMutableChannel(8)\n", + " mutable2 = SquentialMutableChannel(16)\n", + " MutableChannelContainer.register_mutable_channel_to_module(\n", + " self.net.conv0, mutable1, is_to_output_channel=True)\n", + " MutableChannelContainer.register_mutable_channel_to_module(\n", + " self.net.conv1, mutable1, is_to_output_channel=False)\n", + "\n", + " MutableChannelContainer.register_mutable_channel_to_module(\n", + " self.net.conv1, mutable2, is_to_output_channel=True)\n", + " MutableChannelContainer.register_mutable_channel_to_module(\n", + " self.head, mutable2, is_to_output_channel=False)\n", + "model=MyDynamicModel()\n", + "units=SequentialMutableChannelUnit.init_from_predefined_model(model) \n", + "print(f'The model has {len(units)} MutableChannelUnits.')" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.9.13 ('lab2max')", + "language": "python", + "name": "python3" + }, + "language_info": { + 
"codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.13" + }, + "orig_nbformat": 4, + "vscode": { + "interpreter": { + "hash": "e31a827d0913016ad78e01c7b97f787f4b9e53102dd62d238e8548bcd97ff875" + } + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/units/mutable_channel_unit.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/units/mutable_channel_unit.py new file mode 100644 index 0000000000000000000000000000000000000000..251214f701a0607ce5a6212ca9d6651641dd0c5f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/units/mutable_channel_unit.py @@ -0,0 +1,308 @@ +# Copyright (c) OpenMMLab. All rights reserved. +"""This module defines MutableChannelUnit.""" +import abc +# from collections import set +from typing import Dict, List, Type, TypeVar + +import torch +import torch.nn as nn + +from mmrazor.models.architectures.dynamic_ops.mixins import DynamicChannelMixin +from mmrazor.models.mutables import DerivedMutable +from mmrazor.models.mutables.mutable_channel import (BaseMutableChannel, + MutableChannelContainer) +from mmrazor.models.utils import get_module_device +from .channel_unit import Channel, ChannelUnit + + +class MutableChannelUnit(ChannelUnit): + # init methods + def __init__(self, num_channels: int, **kwargs) -> None: + """MutableChannelUnit inherits from ChannelUnit, which manages channels + with channel-dependency. Compared with ChannelUnit, MutableChannelUnit + defines the core interfaces for pruning. By inheriting + MutableChannelUnit, we can implement a variant pruning and nas + algorithm. These apis includes. 
+ + - basic property + - name + - is_mutable + - before pruning + - prepare_for_pruning + - pruning stage + - current_choice + - sample_choice + - after pruning + - fix_chosen + + Args: + num_channels (int): dimension of the channels of the Channel + objects in the unit. + """ + + super().__init__(num_channels) + + @classmethod + def init_from_cfg(cls, model: nn.Module, config: Dict): + """init a Channel using a config which can be generated by + self.config_template(), include init choice.""" + unit = super().init_from_cfg(model, config) + # TO DO: add illegal judgement here? + if 'choice' in config: + unit.current_choice = config['choice'] + return unit + + @classmethod + def init_from_mutable_channel(cls, mutable_channel: BaseMutableChannel): + unit = cls(mutable_channel.num_channels) + return unit + + @classmethod + def init_from_predefined_model(cls, model: nn.Module): + """Initialize units using the model with pre-defined dynamicops and + mutable-channels.""" + + def process_container(container: MutableChannelContainer, + module, + module_name, + mutable2units, + is_output=True): + for index, mutable in container.mutable_channels.items(): + derived_choices = mutable.current_choice + if isinstance(derived_choices, torch.Tensor): + derived_choices = derived_choices.sum().item() + if isinstance(mutable, DerivedMutable): + source_mutables: set = \ + mutable._trace_source_mutables() + source_channel_mutables = [ + mutable for mutable in source_mutables + if isinstance(mutable, BaseMutableChannel) + ] + assert len(source_channel_mutables) == 1, ( + 'only support one mutable channel ' + 'used in DerivedMutable') + mutable = source_channel_mutables[0] + + if mutable not in mutable2units: + mutable2units[mutable] = cls.init_from_mutable_channel( + mutable) + + unit: MutableChannelUnit = mutable2units[mutable] + if is_output: + unit.add_output_related( + Channel( + module_name, + module, + index, + is_output_channel=is_output)) + else: + unit.add_input_related( + 
Channel( + module_name, + module, + index, + is_output_channel=is_output)) + + mutable2units: Dict = {} + for name, module in model.named_modules(): + if isinstance(module, DynamicChannelMixin): + in_container: MutableChannelContainer = \ + module.get_mutable_attr( + 'in_channels') + out_container: MutableChannelContainer = \ + module.get_mutable_attr( + 'out_channels') + process_container(in_container, module, name, mutable2units, + False) + process_container(out_container, module, name, mutable2units, + True) + units = list(mutable2units.values()) + return units + + # properties + + @property + def mutable_prefix(self) -> str: + """Mutable prefix.""" + return 'channel' + + @property + def is_mutable(self) -> bool: + """If the channel-unit is prunable.""" + + def traverse(channels: List[Channel]): + has_dynamic_op = False + all_channel_prunable = True + for channel in channels: + if channel.is_mutable is False: + all_channel_prunable = False + break + if isinstance(channel.module, DynamicChannelMixin): + has_dynamic_op = True + return has_dynamic_op, all_channel_prunable + + input_has_dynamic_op, input_all_prunable = traverse(self.input_related) + output_has_dynamic_op, output_all_prunable = traverse( + self.output_related) + + return len(self.output_related) > 0 \ + and len(self.input_related) > 0 \ + and input_has_dynamic_op \ + and input_all_prunable \ + and output_has_dynamic_op \ + and output_all_prunable + + def config_template(self, + with_init_args=False, + with_channels=False) -> Dict: + """Return the config template of this unit. By default, the config + template only includes a key 'choice'. + + Args: + with_init_args (bool): if the config includes args for + initialization. + with_channels (bool): if the config includes info about + channels. the config with info about channels can used to + parse channel units without tracer. 
+ """ + config = super().config_template(with_init_args, with_channels) + config['choice'] = self.current_choice + return config + + # before pruning: prepare a model + + @abc.abstractmethod + def prepare_for_pruning(self, model): + """Post process after parse units. + + For example, we need to register mutables to dynamic-ops. + """ + raise NotImplementedError + + # pruning: choice-related + + @property + def current_choice(self): + """Choice of this unit.""" + raise NotImplementedError() + + @current_choice.setter + def current_choice(self, choice) -> None: + """setter of current_choice.""" + raise NotImplementedError() + + @abc.abstractmethod + def sample_choice(self): + """Randomly sample a valid choice and return.""" + raise NotImplementedError() + + # after pruning + + def fix_chosen(self, choice=None): + """Make the channels in this unit fixed.""" + if choice is not None: + self.current_choice = choice + + # private methods + + def _replace_with_dynamic_ops( + self, model: nn.Module, + dynamicop_map: Dict[Type[nn.Module], Type[DynamicChannelMixin]]): + """Replace torch modules with dynamic-ops.""" + + def replace_op(model: nn.Module, name: str, module: nn.Module): + names = name.split('.') + for sub_name in names[:-1]: + model = getattr(model, sub_name) + + setattr(model, names[-1], module) + + def get_module(model, name): + names = name.split('.') + for sub_name in names: + model = getattr(model, sub_name) + return model + + for channel in list(self.input_related) + list(self.output_related): + if isinstance(channel.module, nn.Module): + module = get_module(model, channel.name) + if type(module) in dynamicop_map: + new_module = dynamicop_map[type(module)].convert_from( + module).to(get_module_device(module)) + replace_op(model, channel.name, new_module) + channel.module = new_module + else: + channel.module = module + + @staticmethod + def _register_channel_container( + model: nn.Module, container_class: Type[MutableChannelContainer]): + """register channel 
container for dynamic ops.""" + device = get_module_device(model) + for module in model.modules(): + if isinstance(module, DynamicChannelMixin): + in_channels = getattr(module, + module.attr_mappings['in_channels'], 0) + if module.get_mutable_attr('in_channels') is None: + module.register_mutable_attr( + 'in_channels', + container_class(in_channels).to(device)) + out_channels = getattr(module, + module.attr_mappings['out_channels'], 0) + if module.get_mutable_attr('out_channels') is None: + + module.register_mutable_attr( + 'out_channels', + container_class(out_channels).to(device)) + + def _register_mutable_channel(self, mutable_channel: BaseMutableChannel): + # register mutable_channel + for channel in list(self.input_related) + list(self.output_related): + module = channel.module + if isinstance(module, DynamicChannelMixin): + container: MutableChannelContainer + if channel.is_output_channel and module.get_mutable_attr( + 'out_channels') is not None: + container = module.get_mutable_attr('out_channels') + elif channel.is_output_channel is False \ + and module.get_mutable_attr('in_channels') is not None: + container = module.get_mutable_attr('in_channels') + else: + raise NotImplementedError() + + if channel.num_channels == self.num_channels: + mutable_channel_ = mutable_channel + start = channel.start + end = channel.end + elif channel.num_channels > self.num_channels: + + if channel.num_channels % self.num_channels == 0: + ratio = channel.num_channels // self.num_channels + else: + ratio = channel.num_channels / self.num_channels + + mutable_channel_ = \ + mutable_channel.expand_mutable_channel(ratio) + start = channel.start + end = channel.end + else: + raise NotImplementedError() + + if (start, end) in container.mutable_channels: + existed = container.mutable_channels[(start, end)] + if not isinstance(existed, DerivedMutable): + assert mutable_channel is existed + else: + source_mutables = list( + existed._trace_source_mutables()) + is_same = [ + 
mutable_channel is mutable + for mutable in source_mutables + ] + assert any(is_same), 'existed a mutable channel.' + + else: + container.register_mutable(mutable_channel_, start, end) + + +ChannelUnitType = TypeVar('ChannelUnitType', bound=MutableChannelUnit) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/units/one_shot_mutable_channel_unit.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/units/one_shot_mutable_channel_unit.py new file mode 100644 index 0000000000000000000000000000000000000000..220d49b41853b08e0a14dd12c16ba82146492919 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/units/one_shot_mutable_channel_unit.py @@ -0,0 +1,139 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy +import random +from typing import Dict, List, Union + +import torch.nn as nn + +from mmrazor.registry import MODELS +from ..oneshot_mutable_channel import OneShotMutableChannel +from .sequential_mutable_channel_unit import SequentialMutableChannelUnit + + +@MODELS.register_module() +class OneShotMutableChannelUnit(SequentialMutableChannelUnit): + """OneShotMutableChannelUnit is for single path supernet such as AutoSlim. + In single path supernet, each module only has one choice invoked at the + same time. A path is obtained by sampling all the available choices. It is + the base class for one shot mutable channel. + + Args: + num_channels (_type_): The raw number of channels. + candidate_choices (List[Union[int, float]], optional): + A list of candidate width ratios. Each + candidate indicates how many channels to be reserved. + Defaults to [0.5, 1.0](choice_mode='ratio'). + choice_mode (str, optional): Mode of candidates. + One of "ratio" or "number". Defaults to 'ratio'. + divisor (int): Used to make choice divisible. + min_value (int): the minimal value used when make divisible. + min_ratio (float): the minimal ratio used when make divisible. 
+ """ + + def __init__(self, + num_channels: int, + candidate_choices: List[Union[int, float]] = [0.5, 1.0], + choice_mode='ratio', + divisor=1, + min_value=1, + min_ratio=0.9) -> None: + super().__init__(num_channels, choice_mode, divisor, min_value, + min_ratio) + + candidate_choices = copy.copy(candidate_choices) + if candidate_choices == []: + candidate_choices.append( + self.num_channels if self.is_num_mode else 1.0) + self.candidate_choices = self._prepare_candidate_choices( + candidate_choices, choice_mode) + + self.mutable_channel = OneShotMutableChannel(num_channels, + self.candidate_choices, + choice_mode) + + self.unit_predefined = False + + @classmethod + def init_from_mutable_channel(cls, mutable_channel: OneShotMutableChannel): + unit = cls(mutable_channel.num_channels, + mutable_channel.candidate_choices, + mutable_channel.choice_mode) + mutable_channel.candidate_choices = unit.candidate_choices + unit.mutable_channel = mutable_channel + return unit + + def prepare_for_pruning(self, model: nn.Module): + """Prepare for pruning.""" + if not self.unit_predefined: + super().prepare_for_pruning(model) + self.current_choice = self.max_choice + + # ~ + + def config_template(self, + with_init_args=False, + with_channels=False) -> Dict: + """Config template of the OneShotMutableChannelUnit.""" + config = super().config_template(with_init_args, with_channels) + if with_init_args: + init_cfg = config['init_args'] + init_cfg.pop('choice_mode') + init_cfg.update({ + 'candidate_choices': self.candidate_choices, + 'choice_mode': self.choice_mode + }) + return config + + # choice + + @property + def current_choice(self) -> Union[int, float]: + """Get current choice.""" + return super().current_choice + + @current_choice.setter + def current_choice(self, choice: Union[int, float]): + """Set current choice.""" + assert choice in self.candidate_choices + int_choice = self._get_valid_int_choice(choice) + choice_ = int_choice if self.is_num_mode else self._num2ratio( + 
int_choice) + self.mutable_channel.current_choice = choice_ + + def sample_choice(self) -> Union[int, float]: + """Sample a valid choice.""" + rand_idx = random.randint(0, len(self.candidate_choices) - 1) + return self.candidate_choices[rand_idx] + + @property + def min_choice(self) -> Union[int, float]: + """Get Minimal choice.""" + return self.candidate_choices[0] + + @property + def max_choice(self) -> Union[int, float]: + """Get Maximal choice.""" + return self.candidate_choices[-1] + + # private methods + + def _prepare_candidate_choices(self, candidate_choices: List, + choice_mode) -> List: + """Process candidate_choices.""" + choice_type = int if choice_mode == 'number' else float + for choice in candidate_choices: + assert isinstance(choice, choice_type) + if self.is_num_mode: + candidate_choices_ = [ + self._make_divisible(choice) for choice in candidate_choices + ] + else: + candidate_choices_ = [ + self._num2ratio(self._make_divisible(self._ratio2num(choice))) + for choice in candidate_choices + ] + if candidate_choices_ != candidate_choices: + self._make_divisible_info(candidate_choices, candidate_choices_) + + candidate_choices_ = sorted(candidate_choices_) + return candidate_choices_ diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/units/sequential_mutable_channel_unit.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/units/sequential_mutable_channel_unit.py new file mode 100644 index 0000000000000000000000000000000000000000..d32c5fead2b67d5633a5904aad76c9f07883ceba --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/units/sequential_mutable_channel_unit.py @@ -0,0 +1,157 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
import random
from typing import Dict, Union

import torch.nn as nn
from mmcv.cnn.bricks import Conv2dAdaptivePadding
from mmengine import MMLogger
from mmengine.model.utils import _BatchNormXd
from mmengine.utils.dl_utils.parrots_wrapper import \
    SyncBatchNorm as EngineSyncBatchNorm

from mmrazor.models.architectures import dynamic_ops
from mmrazor.registry import MODELS
from ..mutable_channel_container import MutableChannelContainer
from ..sequential_mutable_channel import SquentialMutableChannel
from .mutable_channel_unit import MutableChannelUnit


# TODO change the name of SequentialMutableChannelUnit
@MODELS.register_module()
class SequentialMutableChannelUnit(MutableChannelUnit):
    """SequentialMutableChannelUnit accepts an integer (number) or a float
    (ratio) as the choice, which indicates how many of the channels are
    remained from left to right, like 11110000.

    Args:
        num_channels (int): number of channels.
        choice_mode (str): mode of choice, which is one of 'number' or
            'ratio'.
        divisor (int): Used to make choice divisible.
        min_value (int): the minimal value used when make divisible.
        min_ratio (float): the minimal ratio used when make divisible.
    """

    def __init__(
            self,
            num_channels: int,
            choice_mode='number',
            # args for make divisible
            divisor=1,
            min_value=1,
            min_ratio=0.9) -> None:
        super().__init__(num_channels)
        assert choice_mode in ['ratio', 'number']
        self.choice_mode = choice_mode

        # NOTE: 'SquentialMutableChannel' (missing 'e') is the upstream
        # spelling of the class; kept for import compatibility.
        self.mutable_channel: SquentialMutableChannel = \
            SquentialMutableChannel(num_channels, choice_mode=choice_mode)

        # for make_divisible
        self.divisor = divisor
        self.min_value = min_value
        self.min_ratio = min_ratio

    @classmethod
    def init_from_mutable_channel(cls,
                                  mutable_channel: SquentialMutableChannel):
        """Build a unit that wraps an existing mutable channel."""
        unit = cls(mutable_channel.num_channels, mutable_channel.choice_mode)
        unit.mutable_channel = mutable_channel
        return unit

    def prepare_for_pruning(self, model: nn.Module):
        """Prepare for pruning, including register mutable channels."""
        # register MutableMask: swap plain torch/mmcv layers for their
        # dynamic counterparts so channel masks can take effect.
        self._replace_with_dynamic_ops(
            model, {
                Conv2dAdaptivePadding:
                dynamic_ops.DynamicConv2dAdaptivePadding,
                nn.Conv2d: dynamic_ops.DynamicConv2d,
                nn.BatchNorm2d: dynamic_ops.DynamicBatchNorm2d,
                nn.Linear: dynamic_ops.DynamicLinear,
                nn.SyncBatchNorm: dynamic_ops.DynamicSyncBatchNorm,
                EngineSyncBatchNorm: dynamic_ops.DynamicSyncBatchNorm,
                _BatchNormXd: dynamic_ops.DynamicBatchNormXd,
            })
        self._register_channel_container(model, MutableChannelContainer)
        self._register_mutable_channel(self.mutable_channel)

    # ~

    @property
    def is_num_mode(self):
        """Whether choices are absolute channel numbers (vs ratios)."""
        return self.choice_mode == 'number'

    def fix_chosen(self, choice=None):
        """Fix the chosen channels; the unit becomes immutable afterwards."""
        super().fix_chosen(choice)
        self.mutable_channel.fix_chosen()

    def config_template(self,
                        with_init_args=False,
                        with_channels=False) -> Dict:
        """Template of config."""
        config = super().config_template(with_init_args, with_channels)
        if with_init_args:
            init_args: Dict = config['init_args']
            init_args.update(
                dict(
                    choice_mode=self.choice_mode,
                    divisor=self.divisor,
                    min_value=self.min_value,
                    min_ratio=self.min_ratio))
        return config

    # choice

    @property
    def current_choice(self) -> Union[int, float]:
        """return current choice."""
        return self.mutable_channel.current_choice

    @current_choice.setter
    def current_choice(self, choice: Union[int, float]):
        """set choice (converted to a valid, divisible channel number)."""
        choice_num_ = self._get_valid_int_choice(choice)
        self.mutable_channel.current_choice = choice_num_

    def sample_choice(self) -> Union[int, float]:
        """Sample a valid choice: a channel number in 'number' mode, else a
        ratio in (0, 1]."""
        num_choice = random.randint(1, self.num_channels)
        num_choice = self._make_divisible(num_choice)
        if self.is_num_mode:
            return num_choice
        else:
            return self._num2ratio(num_choice)

    # private methods
    def _get_valid_int_choice(self, choice: Union[float, int]) -> int:
        # Normalize ratio -> number, then round to a divisible value; log
        # when rounding actually changed the requested choice.
        choice_num = self._ratio2num(choice)
        choice_num_ = self._make_divisible(choice_num)
        if choice_num != choice_num_:
            self._make_divisible_info(choice, self.current_choice)
        return choice_num_

    def _make_divisible(self, choice_int: int):
        """Make the choice divisible."""
        from mmrazor.models.utils import make_divisible
        return make_divisible(choice_int, self.divisor, self.min_value,
                              self.min_ratio)

    def _num2ratio(self, choice: Union[int, float]) -> float:
        """Convert a number choice to a ratio choice."""
        if isinstance(choice, float):
            return choice
        else:
            return choice / self.num_channels

    def _ratio2num(self, choice: Union[int, float]) -> int:
        """Convert a ratio choice to a number choice (at least 1)."""
        if isinstance(choice, int):
            return choice
        else:
            return max(1, int(self.num_channels * choice))

    def _make_divisible_info(self, choice, new_choice):
        logger = MMLogger.get_current_instance()
        logger.info(f'The choice={choice}, which is set to {self.name}, '
                    f'is changed to {new_choice} for a divisible choice.')
diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/units/slimmable_channel_unit.py
b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/units/slimmable_channel_unit.py
new file mode 100644
index 0000000000000000000000000000000000000000..a51dce80b03d4a5e102c998336fb225bf067dde3
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/units/slimmable_channel_unit.py
@@ -0,0 +1,59 @@
# Copyright (c) OpenMMLab. All rights reserved.

from typing import List, Union

import torch.nn as nn

from mmrazor.models.architectures import dynamic_ops
from mmrazor.registry import MODELS
from ..mutable_channel_container import MutableChannelContainer
from .one_shot_mutable_channel_unit import OneShotMutableChannelUnit


@MODELS.register_module()
class SlimmableChannelUnit(OneShotMutableChannelUnit):
    """A type of ``MutableChannelUnit`` to train several subnets together.

    Args:
        num_channels (int): The raw number of channels.
        candidate_choices (List[Union[int, float]], optional):
            A list of candidate widths (or width ratios). Each
            candidate indicates how many channels to be reserved.
            Defaults to [] (an empty list is replaced by the single
            full-width choice in the parent class).
        choice_mode (str, optional): Mode of candidates.
            One of 'ratio' or 'number'. Defaults to 'number'.
        divisor (int, optional): Used to make choice divisible.
        min_value (int, optional): The minimal value used when make divisible.
        min_ratio (float, optional): The minimal ratio used when make
            divisible.
    """

    def __init__(self,
                 num_channels: int,
                 candidate_choices: List[Union[int, float]] = [],
                 choice_mode='number',
                 divisor=1,
                 min_value=1,
                 min_ratio=0.9) -> None:
        super().__init__(num_channels, candidate_choices, choice_mode, divisor,
                         min_value, min_ratio)

    def prepare_for_pruning(self, model: nn.Module):
        """Prepare for pruning.

        Unlike the parent class, BatchNorm2d is replaced with
        SwitchableBatchNorm2d, which keeps one BN per candidate width.
        """
        self._replace_with_dynamic_ops(
            model, {
                nn.Conv2d: dynamic_ops.DynamicConv2d,
                nn.BatchNorm2d: dynamic_ops.SwitchableBatchNorm2d,
                nn.Linear: dynamic_ops.DynamicLinear
            })
        self.alter_candidates_of_switchbn(self.candidate_choices)
        self._register_channel_container(model, MutableChannelContainer)
        self._register_mutable_channel(self.mutable_channel)

    def alter_candidates_of_switchbn(self, candidates: List):
        """Change candidates of SwitchableBatchNorm2d.

        Only BNs that have not been initialized yet (empty candidate_bn)
        are touched; afterwards the unit is set to its maximal choice.
        """
        for channel in list(self.output_related) + list(self.input_related):
            if isinstance(channel.module, dynamic_ops.SwitchableBatchNorm2d) \
                    and len(channel.module.candidate_bn) == 0:
                channel.module.init_candidates(candidates)
        self.current_choice = self.max_choice
diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/units/utils.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/units/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..41601ac7ae7571d941233c8b7e4ab39211926e6f
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_channel/units/utils.py
@@ -0,0 +1,80 @@
# Copyright (c) OpenMMLab. All rights reserved.
+ +from typing import List + +import torch + +from mmrazor.models.mutables.mutable_channel.units import \ + SequentialMutableChannelUnit +from mmrazor.utils import print_log + + +def assert_model_is_changed(tensors1, tensors2): + """Return if the tensors has the same shape (length).""" + shape1 = get_shape(tensors1, only_length=True) + shape2 = get_shape(tensors2, only_length=True) + assert shape1 == shape2, f'{shape1}!={shape2}' + + +def get_shape(tensor, only_length=False): + """Get the shape of a tensor list/tuple/dict. + + Args: + tensor (Union[List,Tuple,Dict,Tensor]): input tensors. + only_length (bool, optional): If only return the length of the tensors. + Defaults to False. + """ + if isinstance(tensor, torch.Tensor): + if only_length: + return len(tensor.shape) + else: + return tensor.shape + elif isinstance(tensor, list) or isinstance(tensor, tuple): + shapes = [] + for x in tensor: + shapes.append(get_shape(x, only_length)) + return shapes + elif isinstance(tensor, dict): + shapes = {} + for key in tensor: + shapes[key] = get_shape(tensor[key], only_length) + return shapes + else: + raise NotImplementedError( + f'unsuppored type{type(tensor)} to get shape of tensors.') + + +def forward_units(model, try_units: List[SequentialMutableChannelUnit], + units: List[SequentialMutableChannelUnit], demo_input, + template_output): + """Forward a model with MutableChannelUnits and assert if the result + changed.""" + model.eval() + for unit in units: + unit.current_choice = 1.0 + for unit in try_units: + unit.current_choice = min(max(0.1, unit.sample_choice()), 0.9) + if isinstance(demo_input, dict): + tensors = model(**demo_input) + else: + tensors = model(demo_input) + assert_model_is_changed(template_output, tensors) + + +def find_mutable(model, try_units, units, demo_input, template_tensors): + """Find really mutable MutableChannelUnits in some MutableChannelUnits.""" + if len(try_units) == 0: + return [] + try: + forward_units(model, try_units, units, 
demo_input, template_tensors) + return try_units + except Exception: + if len(try_units) == 1: + print_log(f'Find an unmutable unit {try_units[0]}', level='debug') + return [] + else: + num = len(try_units) + return find_mutable(model, try_units[:num // 2], units, demo_input, + template_tensors) + find_mutable( + model, try_units[num // 2:], units, + demo_input, template_tensors) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_module/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_module/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..bcf10c3a8c6f174fa9fcf61f34eb2d399035ada7 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_module/__init__.py @@ -0,0 +1,11 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .diff_mutable_module import (DiffChoiceRoute, DiffMutableModule, + DiffMutableOP, OneHotMutableOP) +from .mutable_module import MutableModule +from .one_shot_mutable_module import OneShotMutableModule, OneShotMutableOP + +__all__ = [ + 'DiffMutableModule', 'DiffMutableOP', 'DiffChoiceRoute', + 'OneShotMutableOP', 'OneShotMutableModule', 'MutableModule', + 'OneHotMutableOP' +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_module/diff_mutable_module.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_module/diff_mutable_module.py new file mode 100644 index 0000000000000000000000000000000000000000..e524ec67c78fe32922e2306838ba861d86abeaee --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_module/diff_mutable_module.py @@ -0,0 +1,582 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
from abc import abstractmethod
from functools import partial
from typing import Any, Callable, Dict, List, Optional, Tuple, Union

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import Tensor

from mmrazor.registry import MODELS
from mmrazor.utils.typing import DumpChosen
from .mutable_module import MutableModule

# Signature of a forward that may additionally receive an arch parameter.
PartialType = Callable[[Any, Optional[nn.Parameter]], Any]


class DiffMutableModule(MutableModule):
    """Base class for differentiable mutables.

    Args:
        module_kwargs (dict[str, dict], optional): Module initialization named
            arguments. Defaults to None.
        alias (str, optional): alias of the `MUTABLE`.
        init_cfg (dict, optional): initialization configuration dict for
            ``BaseModule``. OpenMMLab has implement 5 initializer including
            `Constant`, `Xavier`, `Normal`, `Uniform`, `Kaiming`,
            and `Pretrained`.

    Note:
        :meth:`forward_all` is called when calculating FLOPs.
    """

    def __init__(self, **kwargs) -> None:
        super().__init__(**kwargs)

    @abstractmethod
    def sample_choice(self, arch_param: Tensor):
        """Sample choice according arch parameters."""
        raise NotImplementedError

    def forward(self, x: Any, arch_param: Optional[nn.Parameter] = None):
        """Calls either :func:`forward_fixed` or :func:`forward_arch_param`
        depending on whether :func:`is_fixed` is ``True`` and whether
        :func:`arch_param` is None.

        To reduce the coupling between `Mutable` and `Mutator`, the
        `arch_param` is generated by the `Mutator` and is passed to the
        forward function as an argument.

        Note:
            :meth:`forward_fixed` is called when in `fixed` mode.
            :meth:`forward_arch_param` is called when in `unfixed` mode.

        Args:
            x (Any): input data for forward computation.
            arch_param (nn.Parameter, optional): the architecture parameters
                for ``DiffMutableModule``.

        Returns:
            Any: the result of forward
        """
        if self.is_fixed:
            return self.forward_fixed(x)
        else:
            # Unfixed mode: without arch_param every candidate is summed
            # (FLOPs counting); with arch_param candidates are soft-weighted.
            if arch_param is None:
                return self.forward_all(x)
            else:
                return self.forward_arch_param(x, arch_param=arch_param)

    def compute_arch_probs(self, arch_param: nn.Parameter) -> Tensor:
        """Compute candidate probabilities (softmax over the last dim) from
        the architecture params."""
        return F.softmax(arch_param, -1)

    @abstractmethod
    def forward_arch_param(self, x, arch_param: nn.Parameter):
        """Forward when the mutable is not fixed.

        All subclasses must implement this method.
        """

    def set_forward_args(self, arch_param: nn.Parameter) -> None:
        """Interface for modifying the arch_param using partial.

        Binds ``arch_param`` as the default argument of ``forward`` on this
        instance, so later calls need not pass it explicitly.
        """
        forward_with_default_args: PartialType = \
            partial(self.forward, arch_param=arch_param)
        setattr(self, 'forward', forward_with_default_args)


@MODELS.register_module()
class DiffMutableOP(DiffMutableModule):
    """A type of ``MUTABLES`` for differentiable architecture search, such as
    DARTS. Search the best module by learnable parameters `arch_param`.

    Args:
        candidates (dict[str, dict]): the configs for the candidate
            operations.
        fix_threshold (float): The threshold that determines whether to fix
            the choice of current module as the op with the maximum `probs`.
            It happens when the maximum prob is `fix_threshold` or higher
            than all the other probs. Default to 1.0.
        module_kwargs (dict[str, dict], optional): Module initialization named
            arguments. Defaults to None.
        alias (str, optional): alias of the `MUTABLE`.
        init_cfg (dict, optional): initialization configuration dict for
            ``BaseModule``. OpenMMLab has implement 5 initializer including
            `Constant`, `Xavier`, `Normal`, `Uniform`, `Kaiming`,
            and `Pretrained`.
+ """ + + def __init__( + self, + candidates: Dict[str, Dict], + fix_threshold: float = 1.0, + module_kwargs: Optional[Dict[str, Dict]] = None, + alias: Optional[str] = None, + init_cfg: Optional[Dict] = None, + ) -> None: + super().__init__( + module_kwargs=module_kwargs, alias=alias, init_cfg=init_cfg) + assert len(candidates) >= 1, \ + f'Number of candidate op must greater than or equal to 1, ' \ + f'but got: {len(candidates)}' + + self._is_fixed = False + if fix_threshold < 0 or fix_threshold > 1.0: + raise ValueError( + f'The fix_threshold should be in [0, 1]. Got {fix_threshold}.') + self.fix_threshold = fix_threshold + self._candidates = self._build_ops(candidates, self.module_kwargs) + + @staticmethod + def _build_ops(candidates: Dict[str, Dict], + module_kwargs: Optional[Dict[str, Dict]]) -> nn.ModuleDict: + """Build candidate operations based on candidates configures. + + Args: + candidates (dict[str, dict]): the configs for the candidate + operations. + module_kwargs (dict[str, dict], optional): Module initialization + named arguments. + + Returns: + ModuleDict (dict[str, Any], optional): the key of ``ops`` is + the name of each choice in configs and the value of ``ops`` + is the corresponding candidate operation. + """ + ops = nn.ModuleDict() + for name, op_cfg in candidates.items(): + assert name not in ops + if module_kwargs is not None: + op_cfg.update(module_kwargs) + ops[name] = MODELS.build(op_cfg) + return ops + + def forward_fixed(self, x) -> Tensor: + """Forward when the mutable is in `fixed` mode. + + Args: + x (Any): x could be a Torch.tensor or a tuple of + Torch.tensor, containing input data for forward computation. + + Returns: + Tensor: the result of forward the fixed operation. + """ + return sum(self._candidates[choice](x) for choice in self._chosen) + + def forward_arch_param(self, x, arch_param: nn.Parameter) -> Tensor: + """Forward with architecture parameters. 
+ + Args: + x (Any): x could be a Torch.tensor or a tuple of + Torch.tensor, containing input data for forward computation. + arch_param (str, optional): architecture parameters for + `DiffMutableModule` + + + Returns: + Tensor: the result of forward with ``arch_param``. + """ + + # compute the probs of choice + probs = self.compute_arch_probs(arch_param=arch_param) + + # forward based on probs + outputs = list() + for prob, module in zip(probs, self._candidates.values()): + if prob > 0.: + outputs.append(prob * module(x)) + + return sum(outputs) + + def forward_all(self, x) -> Tensor: + """Forward all choices. Used to calculate FLOPs. + + Args: + x (Any): x could be a Torch.tensor or a tuple of + Torch.tensor, containing input data for forward computation. + + Returns: + Tensor: the result of forward all of the ``choice`` operation. + """ + outputs = list() + for op in self._candidates.values(): + outputs.append(op(x)) + return sum(outputs) + + def fix_chosen(self, chosen: Union[str, List[str]]) -> None: + """Fix mutable with `choice`. This operation would convert `unfixed` + mode to `fixed` mode. The :attr:`is_fixed` will be set to True and only + the selected operations can be retained. + + Args: + chosen (str): the chosen key in ``MUTABLE``. + Defaults to None. + """ + if self.is_fixed: + raise AttributeError( + 'The mode of current MUTABLE is `fixed`. 
' + 'Please do not call `fix_chosen` function again.') + + if isinstance(chosen, str): + chosen = [chosen] + + for c in self.choices: + if c not in chosen: + self._candidates.pop(c) + + self._chosen = chosen + self.is_fixed = True + + def sample_choice(self, arch_param: Tensor) -> str: + """Sample choice based on arch_parameters.""" + return self.choices[torch.argmax(arch_param).item()] + + def dump_chosen(self) -> DumpChosen: + chosen = self.export_chosen() + meta = dict(all_choices=self.choices) + return DumpChosen(chosen=chosen, meta=meta) + + def export_chosen(self) -> str: + assert self.current_choice is not None + return self.current_choice + + @property + def choices(self) -> List[str]: + """list: all choices. """ + return list(self._candidates.keys()) + + +@MODELS.register_module() +class OneHotMutableOP(DiffMutableOP): + """A type of ``MUTABLES`` for one-hot sample based architecture search, + such as DSNAS. Search the best module by learnable parameters `arch_param`. + + Args: + candidates (dict[str, dict]): the configs for the candidate + operations. + module_kwargs (dict[str, dict], optional): Module initialization named + arguments. Defaults to None. + alias (str, optional): alias of the `MUTABLE`. + init_cfg (dict, optional): initialization configuration dict for + ``BaseModule``. OpenMMLab has implement 5 initializer including + `Constant`, `Xavier`, `Normal`, `Uniform`, `Kaiming`, + and `Pretrained`. + """ + + def sample_weights(self, + arch_param: nn.Parameter, + probs: torch.Tensor, + random_sample: bool = False) -> Tensor: + """Use one-hot distributions to sample the arch weights based on the + arch params. + + Args: + arch_param (nn.Parameter): architecture parameters for + `DiffMutableModule`. + probs (Tensor): the probs of choice. + random_sample (bool): Whether to random sample arch weights or not + Defaults to False. + + Returns: + Tensor: Sampled one-hot arch weights. 
+ """ + import torch.distributions as D + if random_sample: + uni = torch.ones_like(arch_param) + m = D.one_hot_categorical.OneHotCategorical(uni) + else: + m = D.one_hot_categorical.OneHotCategorical(probs=probs) + return m.sample() + + def forward_arch_param( + self, + x: Any, + arch_param: nn.Parameter, + ) -> Tensor: + """Forward with architecture parameters. + + Args: + x (Any): x could be a Torch.tensor or a tuple of + Torch.tensor, containing input data for forward computation. + arch_param (str, optional): architecture parameters for + `DiffMutableModule`. + + Returns: + Tensor: the result of forward with ``arch_param``. + """ + + # compute the probs of choice + probs = self.compute_arch_probs(arch_param=arch_param) + + if not self.is_fixed: + self.arch_weights = self.sample_weights(arch_param, probs) + sorted_param = torch.topk(probs, 2) + index = ( + sorted_param[0][0] - sorted_param[0][1] >= self.fix_threshold) + if index: + self.fix_chosen(self.choices[index]) + + if self.is_fixed: + index = self.choices.index(self._chosen[0]) + self.arch_weights.data.zero_() + self.arch_weights.data[index].fill_(1.0) + self.arch_weights.requires_grad_() + + # forward based on self.arch_weights + outputs = list() + for prob, module in zip(self.arch_weights, self._candidates.values()): + if prob > 0.: + outputs.append(prob * module(x)) + + return sum(outputs) + + +@MODELS.register_module() +class DiffChoiceRoute(DiffMutableModule): + """A type of ``MUTABLES`` for Neural Architecture Search, which can select + inputs from different edges in a differentiable or non-differentiable way. + It is commonly used in DARTS. + + Args: + edges (nn.ModuleDict): the key of `edges` is the name of different + edges. The value of `edges` can be :class:`nn.Module` or + :class:`DiffMutableModule`. + with_arch_param (bool): whether forward with arch_param. When set to + `True`, a differentiable way is adopted. When set to `False`, + a non-differentiable way is adopted. 
@MODELS.register_module()
class DiffChoiceRoute(DiffMutableModule):
    """A type of ``MUTABLES`` for Neural Architecture Search, which can select
    inputs from different edges in a differentiable or non-differentiable way.
    It is commonly used in DARTS.

    Args:
        edges (nn.ModuleDict): the key of `edges` is the name of different
            edges. The value of `edges` can be :class:`nn.Module` or
            :class:`DiffMutableModule`.
        num_chosen (int): the number of edges to keep when sampling a choice.
            Defaults to 2.
        with_arch_param (bool): whether forward with arch_param. When set to
            `True`, a differentiable way is adopted. When set to `False`,
            a non-differentiable way is adopted.
        alias (str, optional): alias of the `DiffChoiceRoute`.
        init_cfg (dict, optional): initialization configuration dict for
            ``BaseModule``. OpenMMLab has implement 6 initializers including
            `Constant`, `Xavier`, `Normal`, `Uniform`, `Kaiming`,
            and `Pretrained`.

    Examples:
        >>> import torch
        >>> import torch.nn as nn
        >>> edges_dict=nn.ModuleDict()
        >>> edges_dict.add_module('first_edge', nn.Conv2d(32, 32, 3, 1, 1))
        >>> edges_dict.add_module('second_edge', nn.Conv2d(32, 32, 5, 1, 2))
        >>> edges_dict.add_module('third_edge', nn.MaxPool2d(3, 1, 1))
        >>> edges_dict.add_module('fourth_edge', nn.MaxPool2d(5, 1, 2))
        >>> edges_dict.add_module('fifth_edge', nn.MaxPool2d(7, 1, 3))
        >>> diff_choice_route_cfg = dict(
        ...     type="DiffChoiceRoute",
        ...     edges=edges_dict,
        ...     with_arch_param=True,
        ... )
        >>> arch_param
        Parameter containing:
        tensor([-6.1426e-04,  2.3596e-04,  1.4427e-03,  7.1668e-05,
            -8.9739e-04], requires_grad=True)
        >>> x = [torch.randn(4, 32, 64, 64) for _ in range(5)]
        >>> output=diffchoiceroute.forward_arch_param(x, arch_param)
        >>> output.shape
        torch.Size([4, 32, 64, 64])
    """

    def __init__(
        self,
        edges: nn.ModuleDict,
        num_chosen: int = 2,
        with_arch_param: bool = False,
        alias: Optional[str] = None,
        init_cfg: Optional[Dict] = None,
    ) -> None:
        super().__init__(alias=alias, init_cfg=init_cfg)
        assert len(edges) >= 1, \
            f'Number of edges must greater than or equal to 1, ' \
            f'but got: {len(edges)}'

        self._with_arch_param = with_arch_param
        self._is_fixed = False
        self._candidates: nn.ModuleDict = edges
        self.num_chosen = num_chosen

    def forward(self, x: Any, arch_param: Optional[nn.Parameter] = None):
        """Calls either :func:`forward_fixed` or :func:`forward_arch_param`
        depending on whether :func:`is_fixed` is ``True`` and whether
        :func:`arch_param` is None.

        To reduce the coupling between `Mutable` and `Mutator`, the
        `arch_param` is generated by the `Mutator` and is passed to the
        forward function as an argument.

        Note:
            :meth:`forward_fixed` is called when in `fixed` mode.
            :meth:`forward_arch_param` is called when in `unfixed` mode.

        Args:
            x (Any): input data for forward computation.
            arch_param (nn.Parameter, optional): the architecture parameters
                for ``DiffMutableModule``.

        Returns:
            Any: the result of forward
        """
        if self.is_fixed:
            return self.forward_fixed(x)
        else:
            # Differentiable routing only when explicitly enabled AND an
            # arch_param is supplied; otherwise sum all edges.
            if arch_param is not None and self._with_arch_param:
                return self.forward_arch_param(x, arch_param=arch_param)
            else:
                return self.forward_all(x)

    def forward_fixed(self, inputs: Union[List, Tuple]) -> Tensor:
        """Forward when the mutable is in `fixed` mode.

        Args:
            inputs (Union[List[Any], Tuple[Any]]): inputs could be a list or
                a tuple of Torch.tensor, containing input data for
                forward computation.

        Returns:
            Tensor: the result of forward the fixed operation.
        """
        assert self._chosen is not None, \
            'Please call fix_chosen before calling `forward_fixed`.'

        # `_unfixed_choices` (saved in fix_chosen) keeps the pre-fix edge
        # order so each input is matched against its original edge name.
        outputs = list()
        for choice, x in zip(self._unfixed_choices, inputs):
            if choice in self._chosen:
                outputs.append(self._candidates[choice](x))
        return sum(outputs)

    def forward_arch_param(self, x, arch_param: nn.Parameter) -> Tensor:
        """Forward with architecture parameters.

        Args:
            x (list[Any] | tuple[Any]]): x could be a list or a tuple
                of Torch.tensor, containing input data for forward selection.
            arch_param (nn.Parameter): architecture parameters for
                for ``DiffMutableModule``.

        Returns:
            Tensor: the result of forward with ``arch_param``.
        """
        assert len(x) == len(self._candidates), \
            f'Length of `edges` {len(self._candidates)} should be ' \
            f'same as the length of inputs {len(x)}.'

        probs = self.compute_arch_probs(arch_param=arch_param)

        outputs = list()
        for prob, module, input in zip(probs, self._candidates.values(), x):
            if prob > 0:
                # prob may equal to 0 in gumbel softmax.
                outputs.append(prob * module(input))

        return sum(outputs)

    def forward_all(self, x):
        """Forward all choices.

        Args:
            x (Any): x could be a Torch.tensor or a tuple of
                Torch.tensor, containing input data for forward computation.

        Returns:
            Tensor: the result of forward all of the ``choice`` operation.
        """
        # BUGFIX: the original message misspelt "Length" as "Lenght".
        assert len(x) == len(self._candidates), \
            f'Length of edges {len(self._candidates)} should be same as ' \
            f'the length of inputs {len(x)}.'

        outputs = list()
        for op, input in zip(self._candidates.values(), x):
            outputs.append(op(input))

        return sum(outputs)

    def fix_chosen(self, chosen: List[str]) -> None:
        """Fix mutable with `choice`. This operation would convert to `fixed`
        mode. The :attr:`is_fixed` will be set to True and only the selected
        operations can be retained.

        Args:
            chosen (list(str)): the chosen key in ``MUTABLE``.
        """
        # Remember the full edge order before pruning; `forward_fixed`
        # relies on it to align inputs with edge names.
        self._unfixed_choices = self.choices

        if self.is_fixed:
            raise AttributeError(
                'The mode of current MUTABLE is `fixed`. '
                'Please do not call `fix_chosen` function again.')

        for c in self.choices:
            if c not in chosen:
                self._candidates.pop(c)

        self._chosen = chosen
        self.is_fixed = True

    @property
    def choices(self) -> List[str]:
        """list: all choices.
        """
        return list(self._candidates.keys())

    def dump_chosen(self) -> DumpChosen:
        """Dump the chosen edges together with all available choices."""
        chosen = self.export_chosen()
        meta = dict(all_choices=self.choices)
        return DumpChosen(chosen=chosen, meta=meta)

    def export_chosen(self) -> str:
        """Export current choice; requires `current_choice` to be set."""
        assert self.current_choice is not None
        return self.current_choice

    def sample_choice(self, arch_param: Tensor) -> List[str]:
        """sample choice based on `arch_param`."""
        # Descending sort of the arch params; keep the top `num_chosen`.
        sort_idx = torch.argsort(-arch_param).cpu().numpy().tolist()
        choice_idx = sort_idx[:self.num_chosen]
        choice = [self.choices[i] for i in choice_idx]
        return choice
+ """ + + def __init__( + self, + edges: nn.ModuleDict, + tau: float = 1.0, + hard: bool = True, + with_arch_param: bool = False, + alias: Optional[str] = None, + init_cfg: Optional[Dict] = None, + ) -> None: + super().__init__( + edges=edges, + with_arch_param=with_arch_param, + alias=alias, + init_cfg=init_cfg) + self.tau = tau + self.hard = hard + + def compute_arch_probs(self, arch_param: nn.Parameter) -> Tensor: + """Compute chosen probs by Gumbel-Max trick.""" + return F.gumbel_softmax( + arch_param, tau=self.tau, hard=self.hard, dim=-1) + + def set_temperature(self, tau: float) -> None: + """Set temperature of gumbel softmax.""" + self.tau = tau diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_module/mutable_module.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_module/mutable_module.py new file mode 100644 index 0000000000000000000000000000000000000000..6e03df285abe14560e11584206ecc2f17c9aa87c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_module/mutable_module.py @@ -0,0 +1,97 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from abc import abstractmethod +from typing import Any, Dict, List, Optional + +from ..base_mutable import BaseMutable + + +class MutableModule(BaseMutable): + """Base Class for mutables. Mutable means a searchable module widely used + in Neural Architecture Search(NAS). + + It mainly consists of some optional operations, and achieving + searchable function by handling choice with ``MUTATOR``. + + All subclass should implement the following APIs and the other + abstract method in ``BaseMutable``: + + - ``forward()`` + - ``forward_all()`` + - ``forward_fix()`` + - ``choices()`` + + Args: + module_kwargs (dict[str, dict], optional): Module initialization named + arguments. Defaults to None. + alias (str, optional): alias of the `MUTABLE`. + init_cfg (dict, optional): initialization configuration dict for + ``BaseModule``. 
class MutableModule(BaseMutable):
    """Base Class for mutables. Mutable means a searchable module widely used
    in Neural Architecture Search(NAS).

    It mainly consists of some optional operations, and achieving
    searchable function by handling choice with ``MUTATOR``.

    All subclass should implement the following APIs and the other
    abstract method in ``BaseMutable``:

    - ``forward()``
    - ``forward_all()``
    - ``forward_fix()``
    - ``choices()``

    Args:
        module_kwargs (dict[str, dict], optional): Module initialization named
            arguments. Defaults to None.
        alias (str, optional): alias of the `MUTABLE`.
        init_cfg (dict, optional): initialization configuration dict for
            ``BaseModule``. OpenMMLab has implement 5 initializer including
            `Constant`, `Xavier`, `Normal`, `Uniform`, `Kaiming`,
            and `Pretrained`.
    """

    def __init__(self,
                 module_kwargs: Optional[Dict[str, Dict]] = None,
                 alias: Optional[str] = None,
                 init_cfg: Optional[Dict] = None) -> None:
        super().__init__(alias, init_cfg)

        # Extra kwargs forwarded to every candidate op when built.
        self.module_kwargs = module_kwargs
        # Backing field for the `current_choice` property.
        self._current_choice = None

    @property
    def mutable_prefix(self) -> str:
        """Mutable prefix."""
        return 'module'

    @property
    def max_choice(self):
        """max_choice shouldn't exist."""
        raise AttributeError(
            'MutableModule does not have the attr `max choice`.')

    @property
    def min_choice(self):
        """min_choice shouldn't exist."""
        raise AttributeError(
            'MutableModule does not have the attr `min choice`.')

    @property
    def current_choice(self):
        """Current choice will affect :meth:`forward` and will be used in
        :func:`mmrazor.core.subnet.utils.export_fix_subnet` or mutator.
        """
        return self._current_choice

    @current_choice.setter
    def current_choice(self, choice) -> None:
        """Current choice setter will be executed in mutator."""
        self._current_choice = choice

    @property
    @abstractmethod
    def choices(self) -> List[str]:
        """list: all choices. All subclasses must implement this method."""

    @abstractmethod
    def forward(self, x: Any) -> Any:
        """Forward computation."""

    @abstractmethod
    def forward_fixed(self, x):
        """Forward with the fixed mutable.

        All subclasses must implement this method.
        """

    @abstractmethod
    def forward_all(self, x):
        """Forward all choices.

        All subclasses must implement this method.
        """

    @property
    def num_choices(self) -> int:
        """Number of choices."""
        return len(self.choices)
+ + Returns: + Any: the result of forward + """ + if self.is_fixed: + return self.forward_fixed(x) + if self.current_choice is None: + return self.forward_all(x) + else: + return self.forward_choice(x, choice=self.current_choice) + + @abstractmethod + def sample_choice(self) -> str: + """Sample random choice. + + Returns: + str: the chosen key in ``MUTABLE``. + """ + + @abstractmethod + def forward_choice(self, x, choice: str): + """Forward with the unfixed mutable and current_choice is not None. + + All subclasses must implement this method. + """ + + +@MODELS.register_module() +class OneShotMutableOP(OneShotMutableModule): + """A type of ``MUTABLES`` for single path supernet, such as Single Path One + Shot. In single path supernet, each choice block only has one choice + invoked at the same time. A path is obtained by sampling all the choice + blocks. + + Args: + candidates (dict[str, dict]): the configs for the candidate + operations. + module_kwargs (dict[str, dict], optional): Module initialization named + arguments. Defaults to None. + alias (str, optional): alias of the `MUTABLE`. + init_cfg (dict, optional): initialization configuration dict for + ``BaseModule``. OpenMMLab has implement 5 initializer including + `Constant`, `Xavier`, `Normal`, `Uniform`, `Kaiming`, + and `Pretrained`. + + Examples: + >>> import torch + >>> from mmrazor.models.mutables import OneShotMutableOP + + >>> candidates = nn.ModuleDict({ + ... 'conv3x3': nn.Conv2d(32, 32, 3, 1, 1), + ... 
@MODELS.register_module()
class OneShotMutableOP(OneShotMutableModule):
    """A type of ``MUTABLES`` for single path supernet, such as Single Path One
    Shot. In single path supernet, each choice block only has one choice
    invoked at the same time. A path is obtained by sampling all the choice
    blocks.

    Args:
        candidates (dict[str, dict]): the configs for the candidate
            operations.
        module_kwargs (dict[str, dict], optional): Module initialization named
            arguments. Defaults to None.
        alias (str, optional): alias of the `MUTABLE`.
        init_cfg (dict, optional): initialization configuration dict for
            ``BaseModule``. OpenMMLab has implement 5 initializer including
            `Constant`, `Xavier`, `Normal`, `Uniform`, `Kaiming`,
            and `Pretrained`.

    Examples:
        >>> import torch
        >>> from mmrazor.models.mutables import OneShotMutableOP

        >>> candidates = nn.ModuleDict({
        ...     'conv3x3': nn.Conv2d(32, 32, 3, 1, 1),
        ...     'conv5x5': nn.Conv2d(32, 32, 5, 1, 2),
        ...     'conv7x7': nn.Conv2d(32, 32, 7, 1, 3)})

        >>> input = torch.randn(1, 32, 64, 64)
        >>> op = OneShotMutableOP(candidates)

        >>> op.choices
        ['conv3x3', 'conv5x5', 'conv7x7']
        >>> op.num_choices
        3
        >>> op.is_fixed
        False

        >>> op.current_choice = 'conv3x3'
        >>> unfix_output = op.forward(input)
        >>> torch.all(unfix_output == candidates['conv3x3'](input))
        True

        >>> op.fix_chosen('conv3x3')
        >>> fix_output = op.forward(input)
        >>> torch.all(fix_output == unfix_output)
        True

        >>> op.choices
        ['conv3x3']
        >>> op.num_choices
        1
        >>> op.is_fixed
        True
    """

    def __init__(
        self,
        candidates: Union[Dict[str, Dict], nn.ModuleDict],
        module_kwargs: Optional[Dict[str, Dict]] = None,
        alias: Optional[str] = None,
        init_cfg: Optional[Dict] = None,
    ) -> None:
        super().__init__(
            module_kwargs=module_kwargs, alias=alias, init_cfg=init_cfg)
        assert len(candidates) >= 1, \
            f'Number of candidate op must greater than or equal to 1, ' \
            f'but got: {len(candidates)}'

        self._chosen: Optional[str] = None
        if isinstance(candidates, dict):
            self._candidates = self._build_ops(candidates, self.module_kwargs)
        elif isinstance(candidates, nn.ModuleDict):
            self._candidates = candidates
        else:
            # BUGFIX: the original message misspelt the argument name as
            # "candidata_ops"; the parameter is `candidates`.
            raise TypeError('candidates should be a `dict` or '
                            f'`nn.ModuleDict` instance, but got '
                            f'{type(candidates)}')

        assert len(self._candidates) >= 1, \
            f'Number of candidate op must greater than or equal to 1, ' \
            f'but got {len(self._candidates)}'

    @staticmethod
    def _build_ops(
            candidates: Union[Dict[str, Dict], nn.ModuleDict],
            module_kwargs: Optional[Dict[str, Dict]] = None) -> nn.ModuleDict:
        """Build candidate operations based on choice configures.

        Args:
            candidates (dict[str, dict] | :obj:`nn.ModuleDict`): the configs
                for the candidate operations or nn.ModuleDict.
            module_kwargs (dict[str, dict], optional): Module initialization
                named arguments.

        Returns:
            ModuleDict (dict[str, Any], optional): the key of ``ops`` is
                the name of each choice in configs and the value of ``ops``
                is the corresponding candidate operation.
        """
        if isinstance(candidates, nn.ModuleDict):
            return candidates

        ops = nn.ModuleDict()
        for name, op_cfg in candidates.items():
            assert name not in ops
            if module_kwargs is not None:
                op_cfg.update(module_kwargs)
            ops[name] = MODELS.build(op_cfg)
        return ops

    def forward_fixed(self, x: Any) -> Tensor:
        """Forward with the `fixed` mutable.

        Args:
            x (Any): x could be a Torch.tensor or a tuple of
                Torch.tensor, containing input data for forward computation.

        Returns:
            Tensor: the result of forward the fixed operation.
        """
        return self._candidates[self._chosen](x)

    def forward_choice(self, x, choice: str) -> Tensor:
        """Forward with the `unfixed` mutable and current choice is not None.

        Args:
            x (Any): x could be a Torch.tensor or a tuple of
                Torch.tensor, containing input data for forward computation.
            choice (str): the chosen key in ``OneShotMutableOP``.

        Returns:
            Tensor: the result of forward the ``choice`` operation.
        """
        assert isinstance(choice, str) and choice in self.choices
        return self._candidates[choice](x)

    def forward_all(self, x) -> Tensor:
        """Forward all choices. Used to calculate FLOPs.

        Args:
            x (Any): x could be a Torch.tensor or a tuple of
                Torch.tensor, containing input data for forward computation.

        Returns:
            Tensor: the result of forward all of the ``choice`` operation.
        """
        outputs = list()
        for op in self._candidates.values():
            outputs.append(op(x))
        return sum(outputs)

    def fix_chosen(self, chosen: str) -> None:
        """Fix mutable with subnet config. This operation would convert
        `unfixed` mode to `fixed` mode. The :attr:`is_fixed` will be set to
        True and only the selected operations can be retained.

        Args:
            chosen (str): the chosen key in ``MUTABLE``. Defaults to None.
        """
        if self.is_fixed:
            raise AttributeError(
                'The mode of current MUTABLE is `fixed`. '
                'Please do not call `fix_chosen` function again.')

        # `self.choices` is a fresh list, so popping while iterating is safe.
        for c in self.choices:
            if c != chosen:
                self._candidates.pop(c)

        self._chosen = chosen
        self.is_fixed = True

    def dump_chosen(self) -> DumpChosen:
        """Dump the chosen op together with all available choices."""
        chosen = self.export_chosen()
        meta = dict(all_choices=self.choices)
        return DumpChosen(chosen=chosen, meta=meta)

    def export_chosen(self) -> str:
        """Export current choice; requires `current_choice` to be set."""
        assert self.current_choice is not None
        return self.current_choice

    def sample_choice(self) -> str:
        """uniform sampling."""
        return np.random.choice(self.choices, 1)[0]

    @property
    def choices(self) -> List[str]:
        """list: all choices. """
        return list(self._candidates.keys())
+ self.choice_probs = choice_probs + + def sample_choice(self) -> str: + """Sampling with probabilities.""" + assert len(self.choice_probs) == len(self._candidates.keys()) + choice = random.choices( + self.choices, weights=self.choice_probs, k=1)[0] + return choice diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_value/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_value/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..f83c93fe9f3fbfbf511a29d6a7d13354515942a9 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_value/__init__.py @@ -0,0 +1,4 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .mutable_value import MutableValue, OneShotMutableValue + +__all__ = ['MutableValue', 'OneShotMutableValue'] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_value/mutable_value.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_value/mutable_value.py new file mode 100644 index 0000000000000000000000000000000000000000..146e886d06c8a54460381bc359798646f07acd90 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutables/mutable_value/mutable_value.py @@ -0,0 +1,246 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import random +from typing import Any, Dict, List, Optional, Tuple, Union + +from mmrazor.registry import MODELS +from mmrazor.utils.typing import DumpChosen +from ..base_mutable import BaseMutable +from ..derived_mutable import DerivedMethodMixin, DerivedMutable + +Value = Union[int, float] + + +@MODELS.register_module() +class MutableValue(BaseMutable, DerivedMethodMixin): + """Base class for mutable value. + + A mutable value is actually a mutable that adds some functionality to a + list containing objects of the same type. + + Args: + value_list (list): List of value, each value must have the same type. 
@MODELS.register_module()
class MutableValue(BaseMutable, DerivedMethodMixin):
    """Base class for mutable value.

    A mutable value is actually a mutable that adds some functionality to a
    list containing objects of the same type.

    Args:
        value_list (list): List of value, each value must have the same type.
        default_value (any, optional): Default value, must be one in
            `value_list`. Default to None, in which case the first element
            of `value_list` is used.
        alias (str, optional): alias of the `MUTABLE`.
        init_cfg (dict, optional): initialization configuration dict for
            ``BaseModule``. OpenMMLab has implement 5 initializer including
            `Constant`, `Xavier`, `Normal`, `Uniform`, `Kaiming`,
            and `Pretrained`.
    """

    def __init__(self,
                 value_list: List[Value],
                 default_value: Optional[Any] = None,
                 alias: Optional[str] = None,
                 init_cfg: Optional[Dict] = None) -> None:
        super().__init__(alias, init_cfg)

        self._check_is_same_type(value_list)
        self._value_list = value_list

        if default_value is None:
            default_value = value_list[0]
        self.current_choice = default_value

    @staticmethod
    def _check_is_same_type(value_list: List[Any]) -> None:
        """Check whether value in `value_list` has the same type."""
        if len(value_list) == 1:
            return

        for i in range(1, len(value_list)):
            is_same_type = type(value_list[i - 1]) is \
                type(value_list[i])  # noqa: E721
            if not is_same_type:
                raise TypeError(
                    'All elements in `value_list` must have same '
                    f'type, but both types {type(value_list[i-1])} '
                    f'and type {type(value_list[i])} exist.')

    @property
    def mutable_prefix(self) -> str:
        """Mutable prefix."""
        return 'value'

    @property
    def choices(self) -> List[Any]:
        """List of choices."""
        return self._value_list

    def fix_chosen(self, chosen: Value) -> None:
        """Fix mutable value with subnet config.

        Args:
            chosen (Value): the chosen value; must be one of
                :attr:`choices`.

        Raises:
            RuntimeError: if the mutable is already fixed.
        """
        if self.is_fixed:
            raise RuntimeError('MutableValue can not be fixed twice')

        assert chosen in self.choices

        self.current_choice = chosen
        self.is_fixed = True

    def dump_chosen(self) -> DumpChosen:
        """Dump information of chosen.

        Returns:
            DumpChosen: Dumped information, including the chosen value and
                all available choices.
        """
        chosen = self.export_chosen()
        meta = dict(all_choices=self.choices)
        return DumpChosen(chosen=chosen, meta=meta)

    def export_chosen(self):
        """Export current choice."""
        return self.current_choice

    @property
    def num_choices(self) -> int:
        """Number of all choices.

        Returns:
            int: Number of choices.
        """
        return len(self.choices)

    @property
    def current_choice(self) -> Value:
        """Current choice of mutable value."""
        return self._current_choice

    @current_choice.setter
    def current_choice(self, choice: Any) -> Any:
        """Setter of current choice; rejects values outside `value_list`."""
        if choice not in self.choices:
            raise ValueError(f'Expected choice in: {self.choices}, '
                             f'but got: {choice}')

        self._current_choice = choice

    def __rmul__(self, other) -> DerivedMutable:
        """Please refer to method :func:`__mul__`."""
        return self * other

    def __mul__(self, other: Union[int, float]) -> DerivedMutable:
        """Overload `*` operator.

        Args:
            other (int | float): Expand ratio.

        Returns:
            DerivedMutable: Derived expand mutable.
        """
        # int and float were handled by two identical branches in the
        # original implementation; they are consolidated here.
        if isinstance(other, (int, float)):
            return self.derive_expand_mutable(other)
        raise TypeError(f'Unsupported type {type(other)} for mul!')

    def __floordiv__(self, other: Union[int, Tuple[int,
                                                   int]]) -> DerivedMutable:
        """Overload `//` operator.

        Args:
            other: (int, tuple): divide ratio for int or
                (divide ratio, divisor) for tuple. A float ratio is
                truncated to int before deriving.

        Returns:
            DerivedMutable: Derived divide mutable.
        """
        if isinstance(other, int):
            return self.derive_divide_mutable(other)
        elif isinstance(other, float):
            return self.derive_divide_mutable(int(other))
        if isinstance(other, tuple):
            assert len(other) == 2
            return self.derive_divide_mutable(*other)

        raise TypeError(f'Unsupported type {type(other)} for div!')

    def __repr__(self) -> str:
        s = self.__class__.__name__
        s += f'(value_list={self._value_list}, '
        s += f'current_choice={self.current_choice})'

        return s
+ """ + return self.choices[-1] + + @property + def min_choice(self) -> Any: + """Min choice of all choices. + + Returns: + Any: Min choice. + """ + return self.choices[0] + + def __mul__(self, other) -> DerivedMutable: + """Overload `*` operator. + + Args: + other (int, SquentialMutableChannel): Expand ratio or + SquentialMutableChannel. + + Returns: + DerivedMutable: Derived expand mutable. + """ + from ..mutable_channel import SquentialMutableChannel + + if isinstance(other, SquentialMutableChannel): + return other * self + + return super().__mul__(other) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutators/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutators/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..179b4455dd3fd342507e35a8e0c94a1ad540fa23 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutators/__init__.py @@ -0,0 +1,10 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .channel_mutator import (ChannelMutator, DCFFChannelMutator, + DMCPChannelMutator, OneShotChannelMutator, + SlimmableChannelMutator) +from .nas_mutator import NasMutator + +__all__ = [ + 'ChannelMutator', 'DCFFChannelMutator', 'DMCPChannelMutator', + 'SlimmableChannelMutator', 'NasMutator', 'OneShotChannelMutator' +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutators/base_mutator.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutators/base_mutator.py new file mode 100644 index 0000000000000000000000000000000000000000..994ba5e6d17273f9ebe1ca4203b0d1877bfbc797 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutators/base_mutator.py @@ -0,0 +1,53 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
# Copyright (c) OpenMMLab. All rights reserved.
from abc import ABC, abstractmethod
from typing import Dict, Generic, Optional, TypeVar

from mmengine.model import BaseModule
from torch.nn import Module

from ..mutables.base_mutable import BaseMutable

# Type variable for the concrete mutable kind a mutator manages; bound to
# BaseMutable so subclasses can specialize (e.g. value or channel mutables).
MUTABLE_TYPE = TypeVar('MUTABLE_TYPE', bound=BaseMutable)


class BaseMutator(ABC, BaseModule, Generic[MUTABLE_TYPE]):
    """The base class for mutator.

    Mutator is mainly used for subnet management, it usually provides
    functions such as sampling and setting of subnets.

    All subclasses should implement the following APIs:

    - ``prepare_from_supernet()``
    - ``search_groups``

    Args:
        init_cfg (dict, optional): The config to control the initialization.
    """

    def __init__(self, init_cfg: Optional[Dict] = None) -> None:
        # BaseModule handles `init_cfg`-driven weight initialization.
        super().__init__(init_cfg=init_cfg)

    @abstractmethod
    def prepare_from_supernet(self, supernet: Module) -> None:
        """Do some necessary preparations with supernet.

        Args:
            supernet (:obj:`torch.nn.Module`): The supernet to be searched
                in your algorithm.
        """

    @property
    @abstractmethod
    def search_groups(self) -> Dict:
        """Search group of the supernet.

        Note:
            Search group is different from search space. The key of search
            group is called ``group_id``, and the value is corresponding
            searchable modules. The searchable modules will have the same
            search space if they are in the same group.

        Returns:
            dict: Search group.
        """
+from .channel_mutator import ChannelMutator +from .dcff_channel_mutator import DCFFChannelMutator +from .dmcp_channel_mutator import DMCPChannelMutator +from .one_shot_channel_mutator import OneShotChannelMutator +from .slimmable_channel_mutator import SlimmableChannelMutator + +__all__ = [ + 'SlimmableChannelMutator', 'ChannelMutator', 'OneShotChannelMutator', + 'DCFFChannelMutator', 'DMCPChannelMutator' +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutators/channel_mutator/channel_mutator.ipynb b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutators/channel_mutator/channel_mutator.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..58b56c783bad4f4625f3d7a97253d5f4495602b4 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutators/channel_mutator/channel_mutator.ipynb @@ -0,0 +1,375 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# ChannelMutator\n", + "A channel mutator is a manager of the channel structure of a model. In other words, it manages all MutableChannelUnits of a model. \n", + "ChannelMutator is the simplest channel mutator. All other channel mutators should inherit from ChannelMutator class. We take ChannelMutator as an example." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## How to Construct a ChannelMutator" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Suppose we have a model archtecture defineed below" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/liukai/miniconda3/envs/lab2max/lib/python3.9/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. 
See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", + " from .autonotebook import tqdm as notebook_tqdm\n" + ] + } + ], + "source": [ + "# define a model\n", + "from mmengine.model import BaseModel\n", + "from torch import nn\n", + "import torch\n", + "from collections import OrderedDict\n", + "\n", + "class MyModel(nn.Module):\n", + "\n", + " def __init__(self):\n", + " super().__init__()\n", + " self.net = nn.Sequential(\n", + " OrderedDict([('conv0', nn.Conv2d(3, 8, 3, 1, 1)),\n", + " ('relu', nn.ReLU()),\n", + " ('conv1', nn.Conv2d(8, 16, 3, 1, 1))]))\n", + " self.pool = nn.AdaptiveAvgPool2d(1)\n", + " self.head = nn.Linear(16, 1000)\n", + "\n", + " def forward(self, x):\n", + " feature = self.net(x)\n", + " pool = self.pool(feature).flatten(1)\n", + " return self.head(pool)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There are two steps to fully constructing a ChannelMutator object as below. \n", + "1. we need to initialize a ChannelMutator object.\n", + "2. Then we need to init the ChannelMutator object with a model." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "11/14 14:24:13 - mmengine - \u001b[5m\u001b[4m\u001b[33mWARNING\u001b[0m - add a input before net.conv0(net.conv0), error: net.conv0(net.conv0)\n", + "11/14 14:24:13 - mmengine - \u001b[5m\u001b[4m\u001b[33mWARNING\u001b[0m - add a output after head(head), error: head(head)\n", + "The mutator has 2 mutable channel units.\n" + ] + } + ], + "source": [ + "from mmrazor.models.mutators import ChannelMutator\n", + "\n", + "model = MyModel()\n", + "# initialize a ChannelMutator object\n", + "mutator = ChannelMutator(\n", + " channel_unit_cfg=dict(\n", + " type='SequentialMutableChannelUnit',\n", + " default_args=dict(choice_mode='ratio'),\n", + " units={},\n", + " ),\n", + " parse_cfg=dict(\n", + " type='ChannelAnalyzer'))\n", + "# init the ChannelMutator object with a model\n", + "mutator.prepare_from_supernet(model)\n", + "print(f'The mutator has {len(mutator.mutable_units)} mutable channel units.')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "ChannelMutator has two arguments:\n", + "1. channel_unit_cfg: config of the MutableChannelUnit to use in the ChannelMutator.\n", + "2. parse_cfg: the way to parse the model and get MutableChannelUnits." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There are there ways to parse model and get MutableChannelUnits.\n", + "1. Use a tracer to get MutableChannelUnits automatically.\n", + "2. Use config dicts to indicate MutableChannelUnits.\n", + "3. Predefine MutableChannels in the model archtecture.\n", + " \n", + "The example of method 1 has been post above. We post the examples of method 2 and method 3 below." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The mutator has 2 mutable channel units.\n" + ] + } + ], + "source": [ + "# 2. use config dicts to indicate MutableChannelUnits.\n", + "from mmrazor.models.mutators import ChannelMutator\n", + "\n", + "model = MyModel()\n", + "# initialize a ChannelMutator object\n", + "mutator = ChannelMutator(\n", + " channel_unit_cfg=dict(\n", + " type='SequentialMutableChannelUnit',\n", + " default_args=dict(choice_mode='ratio'),\n", + " units={\n", + " 'net.conv0_(0, 8)_8': {\n", + " 'init_args': {\n", + " 'num_channels': 8,\n", + " },\n", + " 'channels': {\n", + " 'input_related': [{\n", + " 'name': 'net.conv1',\n", + " }],\n", + " 'output_related': [{\n", + " 'name': 'net.conv0',\n", + " }]\n", + " },\n", + " 'choice': 1.0\n", + " },\n", + " 'net.conv1_(0, 16)_16': {\n", + " 'init_args': {\n", + " 'num_channels': 16,\n", + " },\n", + " 'channels': {\n", + " 'input_related': [{\n", + " 'name': 'head',\n", + " }],\n", + " 'output_related': [{\n", + " 'name': 'net.conv1',\n", + " }]\n", + " },\n", + " 'choice': 1.0\n", + " }\n", + " }),\n", + " parse_cfg=dict(type='Config'))\n", + "# init the ChannelMutator object with a model\n", + "mutator.prepare_from_supernet(model)\n", + "print(f'The mutator has {len(mutator.mutable_units)} mutable channel units.')" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The mutator has 2 mutable channel units.\n" + ] + } + ], + "source": [ + "# 3. 
Predefine MutableChannels in the model archtecture.\n", + "\n", + "from mmrazor.models.architectures.dynamic_ops import DynamicConv2d, DynamicLinear\n", + "from mmrazor.models.mutables import MutableChannelUnit, MutableChannelContainer, SquentialMutableChannel\n", + "from collections import OrderedDict\n", + "\n", + "class MyDynamicModel(BaseModel):\n", + "\n", + " def __init__(self):\n", + " super().__init__(None, None)\n", + " self.net = nn.Sequential(\n", + " OrderedDict([('conv0', DynamicConv2d(3, 8, 3, 1, 1)),\n", + " ('relu', nn.ReLU()),\n", + " ('conv1', DynamicConv2d(8, 16, 3, 1, 1))]))\n", + " self.pool = nn.AdaptiveAvgPool2d(1)\n", + " self.head = DynamicLinear(16, 1000)\n", + "\n", + " # register MutableChannelContainer\n", + " MutableChannelUnit._register_channel_container(\n", + " self, MutableChannelContainer)\n", + " self._register_mutables()\n", + "\n", + " def forward(self, x):\n", + " feature = self.net(x)\n", + " pool = self.pool(feature).flatten(1)\n", + " return self.head(pool)\n", + "\n", + " def _register_mutables(self):\n", + " mutable1 = SquentialMutableChannel(8)\n", + " mutable2 = SquentialMutableChannel(16)\n", + " MutableChannelContainer.register_mutable_channel_to_module(\n", + " self.net.conv0, mutable1, is_to_output_channel=True)\n", + " MutableChannelContainer.register_mutable_channel_to_module(\n", + " self.net.conv1, mutable1, is_to_output_channel=False)\n", + "\n", + " MutableChannelContainer.register_mutable_channel_to_module(\n", + " self.net.conv1, mutable2, is_to_output_channel=True)\n", + " MutableChannelContainer.register_mutable_channel_to_module(\n", + " self.head, mutable2, is_to_output_channel=False)\n", + "\n", + "\n", + "model = MyDynamicModel()\n", + "# initialize a ChannelMutator object\n", + "mutator = ChannelMutator(\n", + " channel_unit_cfg=dict(\n", + " type='SequentialMutableChannelUnit',\n", + " default_args=dict(choice_mode='ratio'),\n", + " units={},\n", + " ),\n", + " parse_cfg=dict(type='Predefined'))\n", 
+ "# init the ChannelMutator object with a model\n", + "mutator.prepare_from_supernet(model)\n", + "print(f'The mutator has {len(mutator.mutable_units)} mutable channel units.')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## How to Change the Structure of a Model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The structure of a model is represented by a dict where the key is the name of a MutableChannelUnit and the value is a structure choice." + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{0: 8, 1: 16}\n" + ] + } + ], + "source": [ + "print(mutator.current_choices)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can change the dict to prune the model." + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "MyDynamicModel(\n", + " (data_preprocessor): BaseDataPreprocessor()\n", + " (net): Sequential(\n", + " (conv0): DynamicConv2d(\n", + " 3, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)\n", + " (mutable_attrs): ModuleDict(\n", + " (in_channels): MutableChannelContainer(num_channels=3, activated_channels=3)\n", + " (out_channels): MutableChannelContainer(num_channels=8, activated_channels=4)\n", + " )\n", + " )\n", + " (relu): ReLU()\n", + " (conv1): DynamicConv2d(\n", + " 8, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)\n", + " (mutable_attrs): ModuleDict(\n", + " (in_channels): MutableChannelContainer(num_channels=8, activated_channels=4)\n", + " (out_channels): MutableChannelContainer(num_channels=16, activated_channels=8)\n", + " )\n", + " )\n", + " )\n", + " (pool): AdaptiveAvgPool2d(output_size=1)\n", + " (head): DynamicLinear(\n", + " in_features=16, out_features=1000, bias=True\n", + " (mutable_attrs): ModuleDict(\n", + " (in_features): 
MutableChannelContainer(num_channels=16, activated_channels=8)\n", + " (out_features): MutableChannelContainer(num_channels=1000, activated_channels=1000)\n", + " )\n", + " )\n", + ")\n" + ] + } + ], + "source": [ + "mutator.set_choices(\n", + " {0: 4, 1: 8}\n", + ")\n", + "print(model)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Please refer to our documents for more choices related methods." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.9.13 ('lab2max')", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.13" + }, + "orig_nbformat": 4, + "vscode": { + "interpreter": { + "hash": "e31a827d0913016ad78e01c7b97f787f4b9e53102dd62d238e8548bcd97ff875" + } + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutators/channel_mutator/channel_mutator.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutators/channel_mutator/channel_mutator.py new file mode 100644 index 0000000000000000000000000000000000000000..910992e1ead2e4060b14a960903962b8a5087fc1 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutators/channel_mutator/channel_mutator.py @@ -0,0 +1,374 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
# Copyright (c) OpenMMLab. All rights reserved.
import copy
from typing import Any, Dict, Generic, List, Optional, Tuple, Type, Union

from mmengine import fileio
from torch.nn import Module, ModuleList

from mmrazor.models.mutables import (ChannelUnitType, MutableChannelUnit,
                                     SequentialMutableChannelUnit)
from mmrazor.models.mutables.mutable_channel.units.channel_unit import \
    ChannelUnit
from mmrazor.models.task_modules.tracer.channel_analyzer import ChannelAnalyzer
from mmrazor.registry import MODELS, TASK_UTILS
from ..base_mutator import BaseMutator


@MODELS.register_module()
class ChannelMutator(BaseMutator, Generic[ChannelUnitType]):
    """ChannelMutator manages the pruning structure of a model.

    Args:
        channel_unit_cfg (Union[ dict, Type[MutableChannelUnit]], optional):
            The config of ChannelUnits. When the channel_unit_cfg
            is a dict, it should follow the template below:
                channel_unit_cfg = dict(
                    # type of used MutableChannelUnit
                    type ='XxxMutableChannelUnit',
                    # default args for MutableChannelUnit
                    default_args={},
                    units = {
                        # config of a unit
                        "xxx_unit_name": {},
                        ...
                    }
                ),
            The config template of 'units' can be got using
            MutableChannelUnit.config_template()
            Defaults to SequentialMutableChannelUnit.

        parse_cfg (Dict, optional):
            The config to parse the model.
            Defaults to
                dict(
                    type='ChannelAnalyzer',
                    demo_input=(1, 3, 224, 224),
                    tracer_type='BackwardTracer')

        init_cfg (dict, optional): initialization configuration dict for
            BaseModule.

    Note:
        There are three ways used in ChannelMutator to parse a model and
        get MutableChannelUnits.
        1. Using tracer. It needs parse_cfg to be the config of the
           ChannelAnalyzer.
        2. Using config. When parse_cfg['type']='Config'. It needs that
           channel_unit_cfg['unit']['xxx_unit_name'] has a key 'channels',
           otherwise tracer is required.
        3. Using the model with pre-defined dynamic-ops and mutablechannels:
           When parse_cfg['type']='Predefined'.
    """

    # init

    def __init__(self,
                 channel_unit_cfg: Union[
                     dict,
                     Type[MutableChannelUnit]] = SequentialMutableChannelUnit,
                 parse_cfg: Dict = dict(
                     _scope_='mmrazor',
                     type='ChannelAnalyzer',
                     demo_input=(1, 3, 224, 224),
                     tracer_type='BackwardTracer'),
                 init_cfg: Optional[Dict] = None) -> None:

        super().__init__(init_cfg)

        # tracer: only these three parse modes are supported when a config
        # dict is given (a ready-made ChannelAnalyzer instance also works).
        if isinstance(parse_cfg, dict):
            assert parse_cfg['type'] in [
                'ChannelAnalyzer', 'Config', 'Predefined'
            ]
        self.parse_cfg = parse_cfg

        # units: name -> unit mapping is filled in prepare_from_supernet.
        self._name2unit: Dict[str, ChannelUnitType] = {}
        self.units: ModuleList[ChannelUnitType] = ModuleList()

        # unit config
        self.channel_unit_cfg = channel_unit_cfg
        self.unit_class, self.unit_default_args, self.units_cfg = \
            self._parse_channel_unit_cfg(
                channel_unit_cfg)

    def prepare_from_supernet(self, supernet: Module) -> None:
        """Prepare from a model for pruning.

        It includes two steps:
        1. parse the model and get MutableChannelUnits.
        2. call unit.prepare_for_pruning for each unit.
        """
        from mmrazor.models.utils import get_module_device
        device = get_module_device(supernet)

        self._name2module = dict(supernet.named_modules())

        # Dispatch on the parse mode; an instantiated ChannelAnalyzer
        # short-circuits before the dict lookups below.
        if isinstance(self.parse_cfg,
                      ChannelAnalyzer) or 'Analyzer' in self.parse_cfg['type']:
            if isinstance(self.parse_cfg,
                          dict) and 'from_cfg' in self.parse_cfg:
                units = self._prepare_from_cfg(supernet, self.units_cfg)
            else:
                units = self._prepare_from_tracer(supernet, self.parse_cfg)
        elif self.parse_cfg['type'] == 'Config' \
                or 'from_cfg' in self.parse_cfg:
            units = self._prepare_from_cfg(supernet, self.units_cfg)
        elif self.parse_cfg['type'] == 'Predefined':
            units = self._prepare_from_predefined_model(supernet)
        else:
            raise NotImplementedError()
        for i in range(len(units)):
            units[i] = units[i].to(device)
            units[i].prepare_for_pruning(supernet)
            self._name2unit[units[i].name] = units[i]

        self.units = ModuleList(units)

    @property
    def mutable_units(self) -> List[ChannelUnitType]:
        """Prunable units (units whose `is_mutable` flag is True)."""
        return [unit for unit in self.units if unit.is_mutable]

    def config_template(self,
                        only_mutable_units=False,
                        with_unit_init_args=False,
                        with_channels=False):
        """Config template of the mutator.

        Args:
            only_mutable_units (bool, optional): Whether only return config of
                prunable units. It can omit unmutable MutableChannelUnits
                to decrease the length of the config. Defaults to False.
            with_unit_init_args (bool, optional): Whether return init_args of
                units. Let it be true, when you want to change the init
                args of units. Defaults to False.
            with_channels (bool, optional): Whether return channel info.
                The channel info can initialize the units without a
                tracer. When you want to prune your model without a
                tracer next time, let it be true. Defaults to False.

        Example:
            dict(
                channel_unit_cfg = dict(
                    # type of used MutableChannelUnit
                    type ='XxxMutableChannelUnit',
                    # default args for MutableChannelUnit
                    default_args={},
                    # config of units
                    units = {
                        # config of a unit
                        "xxx_unit_name": {
                            'init_args':{}, # if with_unit_init_args
                            'channels':{} # if with_channels
                        },
                        ...
                    }
                ),
                # config of tracer
                parse_cfg={}
            )


        About the detail of the config of each unit, please refer to
        MutableChannelUnit.config_template()
        """
        # template of units
        units = self.mutable_units if only_mutable_units else self.units
        units_template = {}
        for unit in units:
            units_template[unit.name] = unit.config_template(
                with_init_args=with_unit_init_args,
                with_channels=with_channels)

        # template of mutator
        template = dict(
            type=str(self.__class__.__name__),
            channel_unit_cfg=dict(
                type=str(self.unit_class.__name__),
                default_args=self.unit_default_args,
                units=units_template),
            parse_cfg=self.parse_cfg)

        return template

    def fix_channel_mutables(self):
        """Fix ChannelMutables."""
        for unit in self.units:
            unit.fix_chosen()

    # choice manage

    def sample_choices(self, kind: str = 'random') -> Dict[str, Any]:
        """Sampling by search groups.

        The sampling result of the first mutable of each group is the sampling
        result of this group.

        Returns:
            Dict[str, Any]: Random choices dict, keyed by unit name.
        """
        assert kind == 'random', f'unsupported the {kind} sample method.'
        template = self.choice_template
        for key in template:
            template[key] = self._name2unit[key].sample_choice()
        return template

    def set_choices(self, choices: Dict[str, Any]) -> None:
        """Set mutables' current choice according to choices sample by
        :func:`sample_choices`.

        Args:
            choices (Dict[str, Any]): Choices dict. The key is a unit name,
                and the value is the sampling result for that unit.
        """
        for name, choice in choices.items():
            unit = self._name2unit[name]
            unit.current_choice = choice

    @property
    def current_choices(self) -> Dict:
        """Get current choices of all mutable units."""
        config = self.choice_template
        for unit in self.mutable_units:
            config[unit.name] = unit.current_choice
        return config

    @property
    def choice_template(self) -> Dict:
        """Get the choice template of the Mutator.

        Example:
            {
                'xxx_unit_name': xx_choice_value,
                ...
            }
        """
        template = {}
        for unit in self.mutable_units:
            template[unit.name] = unit.current_choice
        return template

    @property
    def search_groups(self) -> Dict[int, List]:
        """Search group of the supernet.

        Note:
            Search group is different from search space. The key of search
            group is called ``group_id``, and the value is corresponding
            searchable modules. The searchable modules will have the same
            search space if they are in the same group.

        NOTE(review): `_search_groups` is never assigned in this class;
        presumably a subclass sets it — confirm before relying on this
        property on a bare ChannelMutator.

        Returns:
            dict: Search group.
        """
        return self._search_groups

    # private methods

    def _convert_channel_unit_to_mutable(self, units: List[ChannelUnit]):
        """Convert ChannelUnits to MutableChannelUnits."""
        mutable_units = []
        for unit in units:
            # Per-unit init_args from the config override the defaults.
            args = copy.copy(self.unit_default_args)
            if unit.name in self.units_cfg and \
                    'init_args' in self.units_cfg[unit.name]:
                args = self.units_cfg[unit.name]['init_args']
            mutable_unit = self.unit_class.init_from_channel_unit(unit, args)
            mutable_units.append(mutable_unit)
        return mutable_units

    def _parse_channel_unit_cfg(
            self,
            channel_unit_cfg) -> Tuple[Type[ChannelUnitType], Dict, Dict]:
        """Parse channel_unit_cfg.

        Returns the unit class, its default constructor args, and the
        per-unit config dict (loaded from file if given as a path).
        """
        if isinstance(channel_unit_cfg, dict):
            unit_class = MODELS.module_dict[channel_unit_cfg['type']]

            default_unit_args = channel_unit_cfg[
                'default_args'] if 'default_args' in channel_unit_cfg else {}

            unit_init_cfg = channel_unit_cfg[
                'units'] if 'units' in channel_unit_cfg else {}
            if isinstance(unit_init_cfg, str):
                # load config file
                unit_init_cfg = fileio.load(unit_init_cfg)
        elif issubclass(channel_unit_cfg, MutableChannelUnit):
            unit_class = channel_unit_cfg
            default_unit_args = {}
            unit_init_cfg = {}
        else:
            raise NotImplementedError()
        return unit_class, default_unit_args, unit_init_cfg

    def _prepare_from_tracer(self, model: Module, parse_cfg: Dict):
        """Initialize units using a tracer."""

        if isinstance(parse_cfg, Dict):
            tracer: ChannelAnalyzer = TASK_UTILS.build(parse_cfg)
        else:
            # parse_cfg is already an instantiated analyzer.
            tracer = parse_cfg
        unit_configs = tracer.analyze(model)

        # get ChannelUnits
        units = [
            ChannelUnit.init_from_cfg(model, cfg)
            for cfg in unit_configs.values()
        ]
        # convert to MutableChannelUnits
        units = self._convert_channel_unit_to_mutable(units)
        return units

    def _prepare_from_cfg(self, model, config: Dict):
        """Initialize units using config dict.

        NOTE(review): the `config` parameter is immediately overwritten by
        `self.channel_unit_cfg['units']` below, so the argument is
        effectively ignored — confirm whether callers rely on passing it.
        """
        assert isinstance(self.channel_unit_cfg, dict)
        assert 'units' in self.channel_unit_cfg
        config = self.channel_unit_cfg['units']
        if isinstance(config, str):
            config = fileio.load(config)
        assert isinstance(config, dict)

        # When an analyzer is configured, run it once so units without a
        # 'channels' entry can be initialized from its analysis result.
        if 'Analyzer' in self.parse_cfg['type']:
            self.parse_cfg.pop('from_cfg')
            tracer = TASK_UTILS.build(self.parse_cfg)
            unit_configs = tracer.analyze(model)

        units = []
        for unit_key in config:
            init_args = copy.deepcopy(self.unit_default_args)
            if 'init_args' in config[unit_key]:
                init_args.update(config[unit_key]['init_args'])
            config[unit_key]['init_args'] = init_args
            if 'channels' in config[unit_key]:
                unit = self.unit_class.init_from_cfg(model, config[unit_key])
                unit.name = unit_key
            else:
                try:
                    unit = self._prepare_unit_from_init_cfg(
                        model, config[unit_key], unit_configs[unit_key])
                except ValueError:
                    raise ValueError(
                        'Initializing channel_mutator from the config needs'
                        'to include `channels` or `Analyzer` in the config.')
            units.append(unit)
        return units

    def _prepare_unit_from_init_cfg(self, model: Module, channel_cfg: dict,
                                    init_cfg: dict):
        """Initialize units using the init_cfg, which created by tracer."""
        unit = ChannelUnit.init_from_cfg(model, init_cfg)
        unit = self._convert_channel_unit_to_mutable([unit])[0]
        if 'choice' in channel_cfg:
            unit.current_choice = channel_cfg['choice']
        return unit

    def _prepare_from_predefined_model(self, model: Module):
        """Initialize units using the model with pre-defined dynamicops and
        mutable-channels.

        NOTE(review): `pop` removes 'unit_predefined' from the shared
        `unit_default_args` on the first iteration, so later units (and
        later calls) get the default `False` — confirm this is intended.
        """

        units = self.unit_class.init_from_predefined_model(model)

        for unit in units:
            unit.unit_predefined = self.unit_default_args.pop(
                'unit_predefined', False)
        return units
+ """ + + def __init__(self, + channel_unit_cfg: Union[dict, Type[ChannelUnitType]] = dict( + type='DCFFChannelUnit', units={}), + parse_cfg=dict( + type='ChannelAnalyzer', + demo_input=(1, 3, 224, 224), + tracer_type='BackwardTracer'), + **kwargs) -> None: + super().__init__(channel_unit_cfg, parse_cfg, **kwargs) + + def calc_information(self, tau: float): + """Calculate channel's kl and apply softmax pooling on channel to solve + CUDA out of memory problem. KL calculation & pool are conducted in ops. + + Args: + tau (float): temporature calculated by iter or epoch + """ + # Calculate the filter importance of the current epoch. + for layerid, unit in enumerate(self.units): + for channel in unit.output_related: + if isinstance(channel.module, FuseConv2d): + layeri_softmaxp = channel.module.get_pooled_channel(tau) + # update fuseconv op's selected layeri_softmax + channel.module.set_forward_args(choice=layeri_softmaxp) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutators/channel_mutator/dmcp_channel_mutator.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutators/channel_mutator/dmcp_channel_mutator.py new file mode 100644 index 0000000000000000000000000000000000000000..de7bbc405bc20616e71d634fec5d7aa356a075d2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutators/channel_mutator/dmcp_channel_mutator.py @@ -0,0 +1,178 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import random +from typing import Any, Dict, Optional, Tuple, Type, Union + +import torch +import torch.nn as nn +from torch.nn import Module + +from mmrazor.models.mutables import DMCPChannelUnit +from mmrazor.registry import MODELS +from ...architectures import DMCPBatchNorm2d +from .channel_mutator import ChannelMutator, ChannelUnitType + + +@MODELS.register_module() +class DMCPChannelMutator(ChannelMutator[DMCPChannelUnit]): + """DMCP channel mutable based channel mutator. It uses DMCPPChannelUnit. 
@MODELS.register_module()
class DMCPChannelMutator(ChannelMutator[DMCPChannelUnit]):
    """DMCP channel mutable based channel mutator. It uses DMCPChannelUnit.

    Args:
        channel_unit_cfg (Union[dict, Type[ChannelUnitType]], optional):
            Config of MutableChannelUnits. Defaults to
            dict( type='DMCPChannelUnit', units={}).
        parse_cfg (Dict): The config of the tracer to parse the model.
            Defaults to dict( type='BackwardTracer',
            loss_calculator=dict(type='ImageClassifierPseudoLoss')).
            Change loss_calculator according to task and backbone.
        pruning_cfg (Tuple): (min_sample_rate, max_sample_rate, sample_offset)
    """

    def __init__(self,
                 channel_unit_cfg: Union[dict, Type[ChannelUnitType]] = dict(
                     type='DMCPChannelUnit', units={}),
                 parse_cfg: Dict = dict(
                     type='ChannelAnalyzer',
                     demo_input=(1, 3, 224, 224),
                     tracer_type='BackwardTracer'),
                 pruning_cfg=(0.1, 1, 0.05),
                 **kwargs) -> None:
        super().__init__(channel_unit_cfg, parse_cfg, **kwargs)
        self.pruning_cfg = pruning_cfg

    def prepare_from_supernet(self, supernet: Module) -> None:
        """Prepare from a model for pruning.

        It includes two steps:
        1. parse the model and get MutableChannelUnits.
        2. call unit.prepare_for_pruning for each unit.
        """
        super().prepare_from_supernet(supernet)
        # DMCP additionally needs learnable architecture parameters.
        self.prepare_arch_params(supernet)

    def _build_arch_param(self, num_choices) -> nn.Parameter:
        """Build learnable architecture parameters (initialized to zero)."""
        return nn.Parameter(torch.zeros(num_choices))

    def prepare_arch_params(self, supernet: Module) -> None:
        """Prepare the arch parameters and associate them with the
        corresponding op."""
        self.arch_params = nn.ParameterDict()
        self._op_arch_align = dict()
        self._arch_params_attr = dict()
        for group_id, module in enumerate(self.units):
            # (group_size, num_groups, min_ch) for this unit's channels.
            arch_message = self._generate_arch_message(
                module.mutable_channel.num_channels)
            self._arch_params_attr[str(group_id)] = arch_message
            group_arch_param = self._build_arch_param(arch_message[1])
            self.arch_params[str(group_id)] = group_arch_param

            for unit in module.output_related:
                self._op_arch_align[str(unit.name)] = str(group_id)

        # Map each DMCP batch-norm module to the group id of the op that
        # produces its input channels.
        self._bn_arch_align = dict()
        for name, module in supernet.named_modules():
            if isinstance(module, DMCPBatchNorm2d):
                self._bn_arch_align[module] = self._op_arch_align[str(name)]

    def _generate_arch_message(self, out_channels: int) -> tuple:
        """
        Define the search space of the channel according to the pruning
        rate range, where the search space consists of two parts
        1. sampled by pruning rate (that is, maximum, minimum and random
           pruning rate)
        2. sampled by probability
        Inputs:
            out_channels (int): channel num of conv layers.
        Outputs:
            attr (tuple): (group_size, num_groups, min_ch)
        """
        (min_rate, max_rate, rate_offset) = self.pruning_cfg

        # sampled by probability
        group_size = int(rate_offset * out_channels / max_rate)
        # +1e-4 guards against float truncation, e.g. 0.9/0.05 -> 17.999...
        num_groups = int((max_rate - min_rate) / rate_offset + 1e-4)
        min_ch = out_channels - (group_size * num_groups)
        # The minimum channel count must stay positive for any valid
        # pruning_cfg / out_channels combination.
        assert min_ch > 0
        assert group_size * num_groups + min_ch == out_channels

        return (group_size, num_groups, min_ch)

    def modify_supernet_forward(self, arch_train: str) -> None:
        """According to the arch_train, assign the arch parameter to the
        forward of the corresponding op.

        NOTE(review): `arch_train` is annotated as ``str`` but is used as a
        boolean flag here and in :meth:`sample_subnet` — confirm the
        intended type.
        """
        for module, group_id in self._bn_arch_align.items():
            arch_param: Optional[nn.Parameter] = None
            arch_params_attr: Optional[Tuple] = None
            if arch_train:
                arch_param = self.arch_params[self._bn_arch_align[module]]
                arch_params_attr = self._arch_params_attr[str(group_id)]
            module.set_forward_args(
                arch_param=arch_param, arch_attr=arch_params_attr)

    def sample_subnet(self, mode: str, arch_train: str) -> None:
        """Sampling according to the input mode."""
        choices = dict()

        for group_id, _ in enumerate(self.units):
            choices[str(group_id)] = self._prune_by_arch(mode, group_id)
        self.set_choices(choices)

        self.modify_supernet_forward(arch_train)
+ + Inputs: + mode (list): one of ['max', 'min', 'random', 'direct', 'expected'] + group_id (int): index of units + + Outputs: + channels (int): for mode 'max'/'min'/'random'/'dirext' + channels (tensor): for mode 'expected' + """ + arch_param = self.arch_params[str(group_id)] + (group_size, num_groups, min_ch) =\ + self._arch_params_attr[str(group_id)] + + if mode == 'max': + return min_ch + group_size * num_groups + elif mode == 'min': + return min_ch + elif mode == 'random': + return min_ch + group_size * random.randint(0, num_groups) + else: + if num_groups == 0: + return min_ch + prob = torch.clamp(arch_param, min=0) + condition_prob = torch.exp(-prob) + if mode == 'direct': + direct_channel = min_ch + for i in range(num_groups): + if random.uniform(0, 1) > condition_prob[i]: + break + direct_channel += group_size + return direct_channel + elif mode == 'expected': + marginal_prob = torch.cumprod(condition_prob, dim=0) + expected_channel = (torch.sum(marginal_prob) * + group_size) + min_ch + return expected_channel + else: + raise NotImplementedError + + def set_choices(self, choices: Dict[str, Any]) -> None: + """Set mutables' current choice according to choices sample by + :func:`sample_choices`. + + Args: + choices (Dict[str, Any]): Choices dict. The key is group_id in + search groups, and the value is the sampling results + corresponding to this group. 
+ """ + for group_id, module in enumerate(self.units): + if str(group_id) not in choices.keys(): + # allow optional target_prune_ratio + continue + choice = choices[str(group_id)] + module.current_choice = choice + module.mutable_channel.activated_tensor_channels = choice diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutators/channel_mutator/group_fisher_mutator.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutators/channel_mutator/group_fisher_mutator.py new file mode 100644 index 0000000000000000000000000000000000000000..cde31bacf837ee126b890653f670b97e703b4c68 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutators/channel_mutator/group_fisher_mutator.py @@ -0,0 +1,7 @@ +# Copyright (c) OpenMMLab. All rights reserved. +"""This file includes the modules in the impl folder. + +As it only records impl modules, it is not initialized automatically. +""" +from mmrazor.implementations.pruning.group_fisher import \ + GroupFisherChannelMutator # noqa diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutators/channel_mutator/one_shot_channel_mutator.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutators/channel_mutator/one_shot_channel_mutator.py new file mode 100644 index 0000000000000000000000000000000000000000..cc008b0b8c9af663d088c5286fcd49b3ffa89730 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutators/channel_mutator/one_shot_channel_mutator.py @@ -0,0 +1,70 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy +from typing import Dict, Type, Union + +from mmrazor.models.mutables import OneShotMutableChannelUnit +from mmrazor.registry import MODELS +from .channel_mutator import ChannelMutator, ChannelUnitType + + +@MODELS.register_module() +class OneShotChannelMutator(ChannelMutator[OneShotMutableChannelUnit]): + """OneShotChannelMutator based on ChannelMutator. It use + OneShotMutableChannelUnit by default. 
+ + Args: + channel_unit_cfg (Union[dict, Type[ChannelUnitType]], optional): + Config of MutableChannelUnits. Defaults to + dict( type='OneShotMutableChannelUnit', + default_args=dict( num_blocks=8, min_blocks=2 ) ). + """ + + def __init__(self, + channel_unit_cfg: Union[dict, Type[ChannelUnitType]] = dict( + type='OneShotMutableChannelUnit', + default_args=dict(num_blocks=8, min_blocks=2)), + **kwargs) -> None: + + super().__init__(channel_unit_cfg, **kwargs) + + @property + def max_choices(self) -> Dict: + """Get max choice for each unit in choice_template.""" + max_choices = copy.deepcopy(self.choice_template) + for key in self.choice_template: + max_choices[key] = self._name2unit[key].max_choice + return max_choices + + @property + def min_choices(self) -> Dict: + """Get min choice for each unit in choice_template.""" + min_choices = copy.deepcopy(self.choice_template) + for key in self.choice_template: + min_choices[key] = self._name2unit[key].min_choice + return min_choices + + def sample_choices(self, kind: str = 'random') -> Dict: + """Sample choice for each unit in choice_template.""" + choices = copy.deepcopy(self.choice_template) + for key in self.choice_template: + if kind == 'max': + choices[key] = self._name2unit[key].max_choice + elif kind == 'min': + choices[key] = self._name2unit[key].min_choice + elif kind == 'random': + choices[key] = self._name2unit[key].sample_choice() + else: + raise NotImplementedError() + return choices + + def set_max_choices(self): + """Set max choice for each unit in choice_template.""" + for name, choice in self.max_choices.items(): + unit = self._name2unit[name] + unit.current_choice = choice + + def set_min_choices(self): + """Set min choice for each unit in choice_template.""" + for name, choice in self.min_choices.items(): + unit = self._name2unit[name] + unit.current_choice = choice diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutators/channel_mutator/slimmable_channel_mutator.py 
b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutators/channel_mutator/slimmable_channel_mutator.py new file mode 100644 index 0000000000000000000000000000000000000000..c3da419bf194e7cfa13209d1c778a43655418a41 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutators/channel_mutator/slimmable_channel_mutator.py @@ -0,0 +1,82 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import Dict, List, Optional + +from mmrazor.models.mutables import SlimmableChannelUnit +from mmrazor.registry import MODELS +from .channel_mutator import ChannelMutator + + +@MODELS.register_module() +class SlimmableChannelMutator(ChannelMutator[SlimmableChannelUnit]): + """SlimmableChannelMutator is the default ChannelMutator for + SlimmableNetwork algorithm. + + Args: + channel_unit_cfg (Dict): The config of ChannelUnits. Defaults to + dict( type='SlimmableChannelUnit', units={}). + parse_cfg (Dict): The config of the tracer to parse the model. + Defaults to dict( type='BackwardTracer', + loss_calculator=dict(type='ImageClassifierPseudoLoss')). + init_cfg (dict, optional): initialization configuration dict for + BaseModule. + """ + + def __init__(self, + channel_unit_cfg=dict(type='SlimmableChannelUnit', units={}), + parse_cfg=dict( + type='ChannelAnalyzer', + demo_input=(1, 3, 224, 224), + tracer_type='BackwardTracer'), + init_cfg: Optional[Dict] = None) -> None: + + super().__init__(channel_unit_cfg, parse_cfg, init_cfg) + + self.subnets = self._prepare_subnets(self.units_cfg) + + def set_choices(self, config: Dict[str, float]): # type: ignore[override] + """Set choices.""" + for name, choice in config.items(): + unit = self._name2unit[name] + unit.current_choice = choice + + def sample_choices(self): + """Sample choices(pruning structure).""" + raise RuntimeError + + # private methods + + def _prepare_subnets(self, unit_cfg: Dict) -> List[Dict[str, int]]: + """Prepare subnet config. + + Args: + unit_cfg (Dict[str, Dict[str]]): Config of the units. 
+ unit_cfg follows the below template: + { + 'xx_unit_name':{ + 'init_args':{ + 'candidate_choices':[c1,c2,c3...],... + },... + },... + } + Every unit must have the same number of candidate_choices, and + the candidate in the list of candidate_choices with the same + position compose a subnet. + + Returns: + List[Dict[str, int]]: config of the subnets. + """ + subnets: List[Dict[str, int]] = [] + num_subnets = 0 + for key in unit_cfg: + num_subnets = len(unit_cfg[key]['init_args']['candidate_choices']) + break + for _ in range(num_subnets): + subnets.append({}) + for key in unit_cfg: + assert num_subnets == len( + unit_cfg[key]['init_args']['candidate_choices']) + for i, value in enumerate( + unit_cfg[key]['init_args']['candidate_choices']): + subnets[i][key] = value + + return subnets diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutators/group_mixin.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutators/group_mixin.py new file mode 100644 index 0000000000000000000000000000000000000000..569f01ebc88773e1c7228c4821d3b22f2446dd91 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutators/group_mixin.py @@ -0,0 +1,261 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from collections import Counter +from typing import Dict, List + +from torch.nn import Module + +from mmrazor.models.mutables import MutableValue +from mmrazor.models.mutables.mutable_module import MutableModule +from .base_mutator import MUTABLE_TYPE + + +class GroupMixin(): + """A mixin for :class:`BaseMutator`, which can group mutables by + ``custom_group`` and ``alias``(see more information in + :class:`MUTABLE_TYPE`). Grouping by alias and module name are both + supported. + + Note: + Apart from user-defined search group, all other searchable + modules(mutable) will be grouped separately. + + The main difference between using alias and module name for + grouping is that the alias is One-to-Many while the module + name is One-to-One. 
+ + When using both alias and module name in `custom_group`, the + priority of alias is higher than that of module name. + + If alias is set in `custom_group`, then its corresponding module + name should not be in the `custom_group`. + + Moreover, there should be no duplicate keys in the `custom_group`. + + Example: + >>> import torch + >>> from mmrazor.models import DiffModuleMutator + + >>> # Assume that a toy model consists of three mutables + >>> # whose name are op1,op2,op3. The corresponding + >>> # alias names of the three mutables are a1, a1, a2. + >>> model = ToyModel() + + >>> # Using alias for grouping + >>> mutator = DiffModuleMutator(custom_group=[['a1'], ['a2']]) + >>> mutator.prepare_from_supernet(model) + >>> mutator.search_groups + {0: [op1, op2], 1: [op3]} + + >>> # Using module name for grouping + >>> mutator = DiffModuleMutator(custom_group=[['op1', 'op2'], ['op3']]) + + >>> # Using module name for grouping + >>> mutator.prepare_from_supernet(model) + >>> mutator.search_groups + {0: [op1, op2], 1: [op3]} + + >>> # Using both alias and module name for grouping + >>> mutator = DiffModuleMutator(custom_group=[['a2'], ['op2']]) + >>> mutator.prepare_from_supernet(model) + >>> # The last operation would be grouped + >>> mutator.search_groups + {0: [op3], 1: [op2], 2: [op1]} + + """ + + def is_supported_mutable(self, module): + """Judge whether is a supported mutable.""" + for mutable_type in [MutableModule, MutableValue]: + if isinstance(module, mutable_type): + return True + return False + + def _build_name_mutable_mapping( + self, supernet: Module) -> Dict[str, MUTABLE_TYPE]: + """Mapping module name to mutable.""" + name2mutable: Dict[str, MUTABLE_TYPE] = dict() + for name, module in supernet.named_modules(): + if self.is_supported_mutable(module): + name2mutable[name] = module + elif hasattr(module, 'source_mutables'): + for each_mutable in module.source_mutables: + if self.is_supported_mutable(each_mutable): + name2mutable[name] = each_mutable + 
+ self._name2mutable = name2mutable + + return name2mutable + + def _build_alias_names_mapping(self, + supernet: Module) -> Dict[str, List[str]]: + """Mapping alias to module names.""" + alias2mutable_names: Dict[str, List[str]] = dict() + + def _append(key, dict, name): + if key not in dict: + dict[key] = [name] + else: + dict[key].append(name) + + for name, module in supernet.named_modules(): + if self.is_supported_mutable(module): + if module.alias is not None: + _append(module.alias, alias2mutable_names, name) + elif hasattr(module, 'source_mutables'): + for each_mutable in module.source_mutables: + if self.is_supported_mutable(each_mutable): + if each_mutable.alias is not None: + _append(each_mutable.alias, alias2mutable_names, + name) + + return alias2mutable_names + + def build_search_groups( + self, supernet: Module, + custom_groups: List[List[str]]) -> Dict[str, List[MUTABLE_TYPE]]: + """Build search group with ``custom_group`` and ``alias``(see more + information in :class:`MUTABLE_TYPE`). Grouping by alias and module + name are both supported. + + Args: + supernet (:obj:`torch.nn.Module`): The supernet to be searched + in your algorithm. + support_mutables (Type): Mutable type that can be grouped. + custom_group (list, optional): User-defined search groups. + All searchable modules that are not in ``custom_group`` will be + grouped separately. + + Return: + search_groups (Dict[str, List[MUTABLE_TYPE]]): The built + search_groups. 
+ """ + name2mutable: Dict[ + str, MUTABLE_TYPE] = self._build_name_mutable_mapping(supernet) + alias2mutable_names = self._build_alias_names_mapping(supernet) + + # Check whether the custom group is valid + if len(custom_groups) > 0: + self._check_valid_groups(alias2mutable_names, name2mutable, + custom_groups) + + # Construct search_groups based on user-defined group + search_groups: Dict[str, List[MUTABLE_TYPE]] = dict() + + current_group_nums = 0 + grouped_mutable_names: List[str] = list() + grouped_alias: List[str] = list() + for group in custom_groups: + group_mutables = list() + for item in group: + if item in alias2mutable_names: + # if the item is from alias name + mutable_names: List[str] = alias2mutable_names[item] + grouped_alias.append(item) + group_mutables.extend( + [name2mutable[n] for n in mutable_names]) + grouped_mutable_names.extend(mutable_names) + else: + # if the item is in name2mutable + group_mutables.append(name2mutable[item]) + grouped_mutable_names.append(item) + + # TODO: fix prefix when constructing custom groups. 
+ prefix = name2mutable[item].mutable_prefix + group_name = prefix + '_' + str(current_group_nums) + search_groups[group_name] = group_mutables + current_group_nums += 1 + + # Construct search_groups based on alias + for alias, mutable_names in alias2mutable_names.items(): + if alias not in grouped_alias: + # Check whether all current names are already grouped + flag_all_grouped = True + for mutable_name in mutable_names: + if mutable_name not in grouped_mutable_names: + flag_all_grouped = False + + # If not all mutables are already grouped + if not flag_all_grouped: + prefix = name2mutable[mutable_names[0]].mutable_prefix + group_name = prefix + '_' + str(current_group_nums) + search_groups[group_name] = [] + for mutable_name in mutable_names: + if mutable_name not in grouped_mutable_names: + search_groups[group_name].append( + name2mutable[mutable_name]) + grouped_mutable_names.append(mutable_name) + current_group_nums += 1 + + # check whether all the mutable objects are in the search_groups + for name, module in supernet.named_modules(): + if self.is_supported_mutable(module): + if name in grouped_mutable_names: + continue + else: + prefix = module.mutable_prefix + group_name = prefix + '_' + str(current_group_nums) + search_groups[group_name] = [module] + current_group_nums += 1 + elif hasattr(module, 'source_mutables'): + for each_mutable in module.source_mutables: + if self.is_supported_mutable(each_mutable): + if name in grouped_mutable_names: + continue + else: + prefix = each_mutable.mutable_prefix + group_name = prefix + '_' + str(current_group_nums) + search_groups[group_name] = [each_mutable] + current_group_nums += 1 + + grouped_counter = Counter(grouped_mutable_names) + + # find duplicate keys + duplicate_keys = list() + for key, count in grouped_counter.items(): + if count > 1: + duplicate_keys.append(key) + + assert len(grouped_mutable_names) == len( + list(set(grouped_mutable_names))), \ + 'There are duplicate keys in grouped mutable names. 
' \ + f'The duplicate keys are {duplicate_keys}. ' \ + 'Please check if there are duplicate keys in the `custom_group`.' + + return search_groups + + def _check_valid_groups(self, alias2mutable_names: Dict[str, List[str]], + name2mutable: Dict[str, MUTABLE_TYPE], + custom_group: List[List[str]]) -> None: + """Check if all keys are legal.""" + aliases = [*alias2mutable_names.keys()] + module_names = [*name2mutable.keys()] + + expanded_custom_group: List[str] = [ + _ for group in custom_group for _ in group + ] + legal_keys: List[str] = [*aliases, *module_names] + + for key in expanded_custom_group: + if key not in legal_keys: + raise AssertionError( + f'The key: {key} in `custom_group` is not legal. ' + f'Legal keys are: {legal_keys}. ' + 'Make sure that the keys are either alias or mutable name') + + # when the mutable has alias attribute, the corresponding module + # name should not be used in `custom_group`. + used_aliases = list() + for group in custom_group: + for key in group: + if key in aliases: + used_aliases.append(key) + + for alias_key in used_aliases: + mutable_names: List = alias2mutable_names[alias_key] + # check whether module name is in custom group + for mutable_name in mutable_names: + if mutable_name in expanded_custom_group: + raise AssertionError( + f'When a mutable is set alias attribute :{alias_key},' + f'the corresponding module name {mutable_name} should ' + f'not be used in `custom_group` {custom_group}.') diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutators/nas_mutator.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutators/nas_mutator.py new file mode 100644 index 0000000000000000000000000000000000000000..4636a899c298d360cb3379adaaabe857b785eba9 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/mutators/nas_mutator.py @@ -0,0 +1,260 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import warnings +from typing import Dict, List, Optional + +import torch +import torch.nn as nn +from mmengine.model import ModuleList +from torch.nn import Module + +from mmrazor.models.architectures.dynamic_ops.mixins import DynamicChannelMixin +from mmrazor.models.mutables.mutable_module import MutableModule +from mmrazor.registry import MODELS +from .base_mutator import MUTABLE_TYPE, BaseMutator +from .group_mixin import GroupMixin + + +@MODELS.register_module() +class NasMutator(BaseMutator[MUTABLE_TYPE], GroupMixin): + """The base class for mutable based mutator. + + Args: + custom_groups (list[list[str]], optional): User-defined search groups. + All searchable modules that are not in ``custom_group`` will be + grouped separately. + """ + + def __init__(self, + custom_groups: Optional[List[List[str]]] = None, + init_cfg: Optional[Dict] = None) -> None: + super().__init__(init_cfg) + + if custom_groups is None: + custom_groups = [] + self._custom_groups = custom_groups + self._search_groups: Optional[Dict[str, List[MUTABLE_TYPE]]] = None + + def prepare_from_supernet(self, supernet: Module) -> None: + """Do some necessary preparations with supernet. + + Note: + For mutable based mutator, we need to build search group first. + + Args: + supernet (:obj:`torch.nn.Module`): The supernet to be searched + in your algorithm. 
+ """ + self._search_groups = dict() + + # prepare for channel mutables + if self.has_channel(supernet): + units = self._prepare_from_predefined_model(supernet) + self.mutable_units = [unit for unit in units if unit.is_mutable] + + _channel_groups = dict() + for id, unit in enumerate(ModuleList(self.mutable_units)): + _channel_groups['channel' + '_' + str(id)] = [unit] + self._search_groups.update(_channel_groups) + else: + self.mutable_units = [] + + # prepare for value mutables + _value_groups: Dict[str, List[MUTABLE_TYPE]] = \ + self.build_search_groups(supernet, self._custom_groups) + self._search_groups.update(_value_groups) + + def prepare_arch_params(self): + """This function will build searchable params for each layer, which are + generally used in differentiable search algorithms, such as Darts' + series. + + Each name corresponds to an search param, so the Mutables with the same + name share the same search param. + """ + self._arch_params = nn.ParameterDict() + + for name, mutables in self.search_groups.items(): + if isinstance(mutables[0], MutableModule): + self._arch_params[name] = nn.Parameter( + torch.randn(mutables[0].num_choices) * 1e-3) + + self._modify_supernet_forward() + + def has_channel(self, supernet): + """Whether to build channel space.""" + for module in supernet.modules(): + if isinstance(module, DynamicChannelMixin): + if module.get_mutable_attr('out_channels') or \ + module.get_mutable_attr('in_channels'): + return True + return False + + @property + def search_groups(self) -> Dict[str, List[MUTABLE_TYPE]]: + """Search group of supernet. + + Note: + For mutable based mutator, the search group is composed of + corresponding mutables. + + Raises: + RuntimeError: Called before search group has been built. + + Returns: + Dict[int, List[MUTABLE_TYPE]]: Search group. 
+ """ + if self._search_groups is None: + raise RuntimeError( + 'Call `prepare_from_supernet` first to get the search space.') + return self._search_groups + + @property + def arch_params(self) -> nn.ParameterDict: + """Search params of supernet. + + Note: + For mutable based mutator, the search group is composed of + corresponding mutables. + + Raises: + RuntimeError: Called before search group has been built. + + Returns: + Dict[int, List[MUTABLE_TYPE]]: Search group. + """ + if self._arch_params is None: + raise RuntimeError( + 'Call `prepare_arch_params` first to get the search params.') + return self._arch_params + + def _prepare_from_predefined_model(self, model: Module): + """Initialize units using the model with pre-defined dynamic-ops and + mutable-channels.""" + from mmrazor.models.mutables import OneShotMutableChannelUnit + + self._name2unit: Dict = {} + units = OneShotMutableChannelUnit.init_from_predefined_model(model) + + for unit in units: + unit.current_choice = unit.max_choice + self._name2unit[unit.name] = unit + + return units + + def _modify_supernet_forward(self): + """Modify the DiffMutableModule's default arch_param in forward. + + In MMRazor, the `DiffMutableModule` needs `arch_param` in the forward. + Here we use partial function to assign the corresponding `arch_param` + to each `DiffMutableModule`. 
+ """ + for name, mutables in self.search_groups.items(): + for mutable in mutables: + if isinstance(mutable, MutableModule): + mutable.set_forward_args(arch_param=self.arch_params[name]) + + # choice manage + + def sample_choices(self, kind='random') -> Dict: + """Random sample choices by search space.""" + choices = dict() + for name, mutables in self.search_groups.items(): + if hasattr(self, + 'arch_params') and name in self.arch_params.keys(): + arch_param = self.arch_params[name] + choices[name] = mutables[0].sample_choice(arch_param) + else: + if kind == 'max': + choices[name] = mutables[0].max_choice + elif kind == 'min': + choices[name] = mutables[0].min_choice + elif kind == 'random': + choices[name] = mutables[0].sample_choice() + else: + raise NotImplementedError() + return choices + + def set_choices(self, choices: Dict) -> None: + """Set choices for each mutable in search space.""" + for name, mutables in self.search_groups.items(): + choice = choices[name] + + for mutable in mutables: + mutable.current_choice = choice # type: ignore + + @property + def max_choices(self) -> Dict: + """Get max choices for each mutable in search space.""" + max_choices = dict() + warned = False + for name, mutables in self.search_groups.items(): + if hasattr(self, + 'arch_params') and name in self.arch_params.keys(): + arch_param = self.arch_params[name] + max_choices[name] = mutables[0].sample_choice(arch_param) + if not warned: + warnings.warn('mutables with `arch param` detected. ' + 'which is not supposed to have max choices. 
' + 'Sample by arch params instead.') + warned = True + else: + max_choices[name] = mutables[0].max_choice + + return max_choices + + @property + def min_choices(self) -> Dict: + """Get min choices for each mutable in search space.""" + min_choices = dict() + warned = False + for name, mutables in self.search_groups.items(): + if hasattr(self, + 'arch_params') and name in self.arch_params.keys(): + arch_param = self.arch_params[name] + min_choices[name] = mutables[0].sample_choice(arch_param) + if not warned: + warnings.warn('mutables with `arch param` detected. ' + 'which is not supposed to have min choices. ' + 'Sample by arch params instead.') + warned = True + else: + min_choices[name] = mutables[0].min_choice + + return min_choices + + @property + def current_choices(self) -> Dict: + """Get current choices by search space.""" + current_choices = dict() + for name, mutables in self.search_groups.items(): + current_choices[name] = mutables[0].current_choice + + return current_choices + + def set_max_choices(self): + """Set max choices for each mutable in search space.""" + warned = False + for name, mutables in self.search_groups.items(): + choice = self.max_choices[name] + if hasattr(self, + 'arch_params') and name in self.arch_params.keys(): + if not warned: + warnings.warn('mutables with `arch param` detected. ' + '`set_max_choices` is not available for it.') + warned = True + for mutable in mutables: + mutable.current_choice = choice + + def set_min_choices(self): + """Set min choices for each mutable in search space.""" + warned = False + for name, mutables in self.search_groups.items(): + choice = self.min_choices[name] + if hasattr(self, + 'arch_params') and name in self.arch_params.keys(): + if not warned: + warnings.warn('mutables with `arch param` detected. 
' + '`set_max_choices` is not available for it.') + warned = True + for mutable in mutables: + mutable.current_choice = choice diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/observers/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/observers/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..84d1677ddf4aedb149dce0531c904b8a23321f02 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/observers/__init__.py @@ -0,0 +1,9 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .base import BaseObserver +from .lsq import LSQObserver, LSQPerChannelObserver +from .torch_observers import register_torch_observers + +__all__ = [ + 'BaseObserver', 'register_torch_observers', 'LSQObserver', + 'LSQPerChannelObserver' +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/observers/base.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/observers/base.py new file mode 100644 index 0000000000000000000000000000000000000000..ce226cb48058c2064295dfc2a34654fa6ac31add --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/observers/base.py @@ -0,0 +1,8 @@ +# Copyright (c) OpenMMLab. All rights reserved. +try: + from torch.ao.quantization.observer import UniformQuantizationObserverBase +except ImportError: + from mmrazor.utils import get_placeholder + UniformQuantizationObserverBase = get_placeholder('torch>=1.13') + +BaseObserver = UniformQuantizationObserverBase diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/observers/lsq.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/observers/lsq.py new file mode 100644 index 0000000000000000000000000000000000000000..ccab3b0e6bc37a5cf896cc994751b0e65d1c8133 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/observers/lsq.py @@ -0,0 +1,129 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
import math

import torch
import torch.distributed as dist

from mmrazor.registry import MODELS

try:
    from torch.ao.quantization.observer import (MinMaxObserver,
                                                PerChannelMinMaxObserver)
except ImportError:
    from mmrazor.utils import get_placeholder
    MinMaxObserver = get_placeholder('torch>=1.13')
    PerChannelMinMaxObserver = get_placeholder('torch>=1.13')


def sync_tensor(tensor):
    """Synchronize the target tensor during distributed training.

    Computes the mean across ranks: each rank pre-divides by the world size
    and the subsequent ``all_reduce`` (default op: sum) yields the average.
    No-op unless a process group is initialized and the tensor is on GPU.
    """
    if torch.distributed.is_initialized() and tensor.is_cuda:
        tensor.data = tensor.data / dist.get_world_size()
        dist.all_reduce(tensor.data)
    return tensor


class LSQObserverMixIn:
    """A mixin class for LSQObserver which can provide the initialized
    floating-point scale factor.

    Must be combined with an observer class that defines ``quant_max``
    (e.g. ``MinMaxObserver``); ``_calculate_scale`` reads it from ``self``.
    """

    def __init__(self):
        # Mean absolute value of the last observed tensor; set by the
        # concrete observer's ``forward``.
        self.tensor_norm = None

    @torch.jit.export
    def _calculate_scale(self):
        """Calculate the initialized floating-point scale factor.

        Each layer of weights and each layer of activations has a distinct step
        size, represented as a fp32 value, initialized to 2<|v|> / sqrt(Q_p),
        computed on either the initial weights values or the first batch of
        activations, respectively.
        """
        scale = 2 * self.tensor_norm / math.sqrt(self.quant_max)
        # Average the scale across ranks so all replicas share one step size.
        sync_tensor(scale)
        return scale


@MODELS.register_module()
class LSQObserver(MinMaxObserver, LSQObserverMixIn):
    """LSQ observer.

    Paper: Learned Step Size Quantization. 
    """

    def __init__(self, *args, **kwargs):
        # Explicit base-class calls: both parents need initializing and
        # MinMaxObserver's MRO would not reach LSQObserverMixIn.__init__.
        MinMaxObserver.__init__(self, *args, **kwargs)
        LSQObserverMixIn.__init__(self)

    def forward(self, x_orig):
        """Records the running minimum, maximum and tensor_norm of ``x``."""
        if x_orig.numel() == 0:
            return x_orig
        x = x_orig.detach()  # avoid keeping autograd tape
        x = x.to(self.min_val.dtype)
        self.tensor_norm = x.abs().mean()
        min_val_cur, max_val_cur = torch.aminmax(x)
        # Running min/max: merge current batch extremes into the buffers.
        min_val = torch.min(min_val_cur, self.min_val)
        max_val = torch.max(max_val_cur, self.max_val)
        self.min_val.copy_(min_val)
        self.max_val.copy_(max_val)
        return x_orig

    @torch.jit.export
    def calculate_qparams(self):
        """Calculates the quantization parameters."""
        # zero_point comes from the standard min/max computation; the scale
        # is replaced by the LSQ initialization.
        _, zero_point = MinMaxObserver.calculate_qparams(self)
        scale = LSQObserverMixIn._calculate_scale(self)
        return scale, zero_point


@MODELS.register_module()
class LSQPerChannelObserver(PerChannelMinMaxObserver, LSQObserverMixIn):
    """LSQ per-channel observer.

    Paper: Learned Step Size Quantization. 
    """

    def __init__(self, *args, **kwargs):
        PerChannelMinMaxObserver.__init__(self, *args, **kwargs)
        LSQObserverMixIn.__init__(self)

    def forward(self, x_orig):
        """Records the per-channel running minimum, maximum and tensor_norm of
        ``x``."""
        if x_orig.numel() == 0:
            return x_orig
        x = x_orig.detach()  # avoid keeping autograd tape
        min_val = self.min_val
        max_val = self.max_val
        x_dim = x.size()

        # Swap ``ch_axis`` to the front so each row of the flattened view
        # below corresponds to one channel.
        new_axis_list = [i for i in range(len(x_dim))]  # noqa: C416
        new_axis_list[self.ch_axis] = 0
        new_axis_list[0] = self.ch_axis
        y = x.permute(new_axis_list)
        # Need to match dtype of min/max because the updates to buffers
        # are done in place and types need to match for comparisons
        y = y.to(self.min_val.dtype)
        y = torch.flatten(y, start_dim=1)

        # Per-channel mean absolute value (one entry per channel).
        self.tensor_norm = y.abs().mean(1)

        # Empty buffers mean "not initialized yet" (see torch observers'
        # reset semantics); take the batch extremes directly in that case.
        if min_val.numel() == 0 or max_val.numel() == 0:
            min_val, max_val = torch.aminmax(y, dim=1)
        else:
            min_val_cur, max_val_cur = torch.aminmax(y, dim=1)
            min_val = torch.min(min_val_cur, min_val)
            max_val = torch.max(max_val_cur, max_val)
        self.min_val.resize_(min_val.shape)
        self.max_val.resize_(max_val.shape)
        self.min_val.copy_(min_val)
        self.max_val.copy_(max_val)
        return x_orig

    @torch.jit.export
    def calculate_qparams(self):
        """Calculates the quantization parameters."""
        _, zero_point = PerChannelMinMaxObserver.calculate_qparams(self)
        scale = LSQObserverMixIn._calculate_scale(self)
        return scale, zero_point
diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/observers/torch_observers.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/observers/torch_observers.py
new file mode 100644
index 0000000000000000000000000000000000000000..4e540667a276a6e518325188bbe4b9e9a9fea730
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/observers/torch_observers.py
@@ -0,0 +1,66 @@
# Copyright (c) OpenMMLab. All rights reserved.
+import inspect +from typing import List + +import torch + +from mmrazor.registry import MODELS + +try: + import torch.ao.quantization.observer as torch_observer_src + from torch.ao.quantization.observer import PerChannelMinMaxObserver +except ImportError: + from mmrazor.utils import get_package_placeholder + torch_observer_src = get_package_placeholder('torch>=1.13') + PerChannelMinMaxObserver = get_package_placeholder('torch>=1.13') + + +@torch.jit.export +def reset_min_max_vals(self): + """Resets the min/max values. + + `min_val` and `max_val` are always be on cpu in the pytorch version of this + method. + """ + min_val = torch.rand(0, ) + max_val = torch.rand(0, ) + self.min_val.resize_(min_val.shape).copy_(min_val) + self.max_val.resize_(max_val.shape).copy_(max_val) + + +PerChannelMinMaxObserver.reset_min_max_vals = reset_min_max_vals + + +# TORCH_observers = register_torch_observers() +# TORCH_observers including: +# FixedQParamsObserver +# HistogramObserver +# MinMaxObserver +# MovingAverageMinMaxObserver +# MovingAveragePerChannelMinMaxObserver +# NoopObserver +# ObserverBase +# PerChannelMinMaxObserver +# PlaceholderObserver +# RecordingObserver +# ReuseInputObserver +# UniformQuantizationObserverBase +def register_torch_observers() -> List[str]: + """Register observers in ``torch.ao.quantization.observer`` to the + ``MODELS`` registry. + + Returns: + List[str]: A list of registered observers' name. 
+ """ + torch_observers = [] + for module_name in dir(torch_observer_src): + if module_name.startswith('__') or module_name.startswith('_') or \ + module_name.startswith('default'): + continue + _observer = getattr(torch_observer_src, module_name) + if inspect.isclass(_observer) and issubclass( + _observer, torch_observer_src.ObserverBase): + if MODELS.get(module_name) is None: + MODELS.register_module(module=_observer) + torch_observers.append(module_name) + return torch_observers diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..a26bb1322fa53b5e693c1bc78a73da2616fb4fd8 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/__init__.py @@ -0,0 +1,11 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .academic_quantizer import AcademicQuantizer +from .base import BaseQuantizer +from .native_quantizer import TorchNativeQuantizer +from .openvino_quantizer import OpenVINOQuantizer +from .tensorrt_quantizer import TensorRTQuantizer + +__all__ = [ + 'BaseQuantizer', 'AcademicQuantizer', 'TorchNativeQuantizer', + 'TensorRTQuantizer', 'OpenVINOQuantizer' +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/academic_quantizer.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/academic_quantizer.py new file mode 100644 index 0000000000000000000000000000000000000000..0dbe6dcdd7d13075c03271e74f0a161a63f0882c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/academic_quantizer.py @@ -0,0 +1,170 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
from typing import Dict, Optional

import torch

from mmrazor.models.task_modules.tracer import build_graphmodule
from mmrazor.models.utils import str2class
from mmrazor.registry import MODELS
from mmrazor.structures.quantization import BackendConfigs, QConfigHandler
from .base import BaseQuantizer

try:
    from torch.ao.quantization.fx import prepare
    from torch.ao.quantization.fx.custom_config import (FuseCustomConfig,
                                                        PrepareCustomConfig)
    from torch.ao.quantization.qconfig_mapping import QConfigMapping
    from torch.ao.quantization.quantize_fx import _fuse_fx
except ImportError:
    from mmrazor.utils import get_placeholder
    prepare = get_placeholder('torch>=1.13')
    FuseCustomConfig = get_placeholder('torch>=1.13')
    PrepareCustomConfig = get_placeholder('torch>=1.13')
    QConfigMapping = get_placeholder('torch>=1.13')
    _fuse_fx = get_placeholder('torch>=1.13')

# Keys recognized in `qconfig_mapping` of `AcademicQuantizer`.
GLOBAL_DICT_KEY = '_global_'
OBJECT_TYPE_DICT_KEY = 'object_type'
MODULE_NAME_DICT_KEY = 'module_name'

# keys can be used in `prepare_custom_config` of `AcademicQuantizer`.
FLOAT_TO_OBSERVED_DICT_KEY = 'float_to_observed_custom_module_class'
PRESERVED_ATTRIBUTES_DICT_KEY = 'preserved_attributes'


@MODELS.register_module()
class AcademicQuantizer(BaseQuantizer):
    """Quantizer for academic researching. Different from some quantizers for
    deploying, `AcademicQuantizer` is without the interfaces for deployment,
    but it has more flexible functions for quantizing your model. With its
    help, you can customize the qconfig for different OPs by
    `qconfig_mapping` to implement customized experiments, including using
    custom fakequant, trying mixed precision quantization, comparing different
    quantization schemes and so on.

    Args:
        qconfig_mapping (Dict): Mapping from model ops to qconfig to configure
            how a model is quantized. You can specify qconfigs using the
            following keys (in increasing match priority):
                ``_global_`` : sets the global (default) qconfig
                ``object_type`` : sets the qconfig for a given module type,
                    function, or method name
                ``module_name`` : sets the qconfig for modules matching the
                    given module name
        tracer (Dict): It can be used to trace the float model to generate the
            corresponding graph, which contributes to prepare for quantizing
            the float model with code-free. Default to
            `dict(type='mmrazor.CustomTracer')`.
        prepare_custom_config (Optional[Dict]): Custom configuration for
            :func:`~torch.ao.quantization.fx.prepare`. You can specify the
            follow:
                ``float_to_observed_custom_module_class`` : a list of dict that
                    mapping from float module classes to observed module
                    classes, e.g.
                    `[('FloatCustomModule', 'ObservedCustomModule')]`
                ``preserved_attributes``: a list of attributes that persist
                    even if they are not used in ``forward``, e.g.
                    `['attr1', 'attr2']`
    """

    def __init__(self,
                 qconfig_mapping: Dict,
                 tracer: Dict = dict(type='mmrazor.CustomTracer'),
                 prepare_custom_config: Optional[Dict] = None):
        super().__init__(tracer)
        self.qconfig_mapping = self.gen_qconfig_mapping(qconfig_mapping)
        self.prepare_custom_config = self.gen_prepare_custom_config(
            prepare_custom_config)
        self.backend_config = BackendConfigs[self.backend]
        # Dummy input used by torch.fx `prepare` for shape propagation.
        # NOTE(review): hard-coded to a 1x3x224x224 image-like tensor.
        self.example_inputs = (torch.randn(1, 3, 224, 224), )

    @property
    def backend(self):
        """The key of the corresponding backend config."""
        return 'academic'

    def prepare(self, model, concrete_args=None):
        """Prepare for quantizing model, which includes as follows:

        1. Swap floatfunctional with FXFloatFunctional;
        2. Trace model to generate `GraphModule`;
        3. Fuse some OPs combination, such as conv + bn, conv + relu and so on;
        4. Swap some conv or linear module with QAT Modules which contain
           weight fakequant nodes;
        5. Insert required fakequant nodes for activation.

        Steps 4 and 5 are implemented in
        :func:`~torch.ao.quantization.fx.prepare`
        """
        self.swap_ff_with_fxff(model)
        traced_graph = self.tracer.trace(model, concrete_args=concrete_args)
        graph_module = build_graphmodule(model, traced_graph)
        # Re-attach attributes that tracing does not carry over.
        preserved_attributes = self.prepare_custom_config.preserved_attributes
        for attr_name in preserved_attributes:
            setattr(graph_module, attr_name, getattr(model, attr_name))
        fuse_custom_config = FuseCustomConfig().set_preserved_attributes(
            preserved_attributes)

        # set the training modes of all modules to True to `_fuse_fx` correctly
        # todo: check freezebn
        self.sync_module_training_mode(graph_module, mode=True)

        graph_module = _fuse_fx(
            graph_module=graph_module,
            is_qat=True,
            fuse_custom_config=fuse_custom_config)
        prepared = prepare(
            model=graph_module,
            qconfig_mapping=self.qconfig_mapping,
            is_qat=True,
            node_name_to_scope=self.tracer.node_name_to_scope,
            example_inputs=self.example_inputs,
            prepare_custom_config=self.prepare_custom_config,
            backend_config=self.backend_config)
        # Preserved attributes must be restored again after `prepare`.
        for attr_name in preserved_attributes:
            setattr(prepared, attr_name, getattr(model, attr_name))

        return prepared

    def gen_qconfig_mapping(self, qconfig_mapping: Dict):
        """Convert qconfig_mapping in config file to `QConfigMapping`.

        `QConfigMapping` is a custom class for mapping from model ops to
        :class:`torch.ao.quantization.QConfig` s.
        """
        conf = QConfigMapping()
        if GLOBAL_DICT_KEY in qconfig_mapping:
            qconfig = QConfigHandler(
                qconfig_mapping[GLOBAL_DICT_KEY]).convert()
            conf.set_global(qconfig)

        # Entries are (type-name string, qconfig dict) pairs.
        for object_type, qconfig in qconfig_mapping.get(
                OBJECT_TYPE_DICT_KEY, []):
            qconfig = QConfigHandler(qconfig).convert()
            conf.set_object_type(str2class(object_type), qconfig)

        for module_name, qconfig in qconfig_mapping.get(
                MODULE_NAME_DICT_KEY, []):
            qconfig = QConfigHandler(qconfig).convert()
            conf.set_module_name(module_name, qconfig)

        return conf

    def gen_prepare_custom_config(self, prepare_custom_config: Optional[Dict]):
        """Convert prepare_custom_config in config file to
        `PrepareCustomConfig`.

        `PrepareCustomConfig` is a custom class for customized configuration of
        :func:`~torch.ao.quantization.fx.prepare`.
        """
        conf = PrepareCustomConfig()
        if prepare_custom_config is None:
            return conf
        else:
            # Class names in the config are resolved through the registry.
            for float_class_str, observed_class_str in prepare_custom_config.get(  # noqa: E501
                    FLOAT_TO_OBSERVED_DICT_KEY, []):
                float_class = MODELS.get(float_class_str)
                observed_class = MODELS.get(observed_class_str)
                conf.set_float_to_observed_mapping(float_class, observed_class)
            conf.set_preserved_attributes(
                prepare_custom_config.get(PRESERVED_ATTRIBUTES_DICT_KEY, []))
            return conf
diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/base.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/base.py
new file mode 100644
index 0000000000000000000000000000000000000000..78c8163c70ba7efc2ee7e3a6173aa925560e21f6
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/base.py
@@ -0,0 +1,87 @@
# Copyright (c) OpenMMLab. All rights reserved.
+from abc import abstractmethod +from typing import Dict + +import torch +import torch.nn as nn +from mmengine.model import BaseModule +from mmengine.model.utils import _BatchNormXd + +from mmrazor.registry import TASK_UTILS + + +class BaseQuantizer(BaseModule): + """Base class for quantizers. Its role for several subclass is as follows: + 1. Provide tracer for tracing model for all subclass. + 2. Define some common abstract methods, such as `prepare`. + 3. Provide some common functional interfaces, such as `swap_ff_with_fxff`. + + Args: + tracer (Dict): It can be used to trace the float model to generate the + corresponding graph, which contributes to prepare for quantizing + the float model with code-free. + """ + + def __init__(self, tracer: Dict): + super().__init__() + self.tracer = TASK_UTILS.build(tracer) + + def sync_module_training_mode(self, model, mode=True): + """Synchronize the training modes. + + Note that modes of conv and bn must be the same during ``_fuse_fx``. + """ + for module in model.modules(): + module.training = mode + return + + @staticmethod + def convert_batchnorm2d(model): + """Helper function to convert all :attr:`_BatchNormXd` layers and + :class:`torch.nn.SyncBatchNorm` layers in the model to + :class:`torch.nn.BatchNorm2d` layers. + """ + # todo: Convert all `_BatchNormXd` and `SyncBatchNorm` + # layers to `BatchNorm2d` layers but they may be :attr:`BatchNorm*D` + # layers + module_checklist = [nn.modules.batchnorm.SyncBatchNorm, _BatchNormXd] + + def traverse(module: nn.Module): + for child_name, child in module.named_children(): + if isinstance(child, tuple(module_checklist)): + bn = nn.BatchNorm2d(child.num_features, child.eps, + child.momentum, child.affine, + child.track_running_stats) + setattr(module, child_name, bn) + else: + traverse(child) + + traverse(model) + + @abstractmethod + def prepare(self, model): + """Prepare for quantizing model, which usually includes as follows: + + 1. 
Swap floatfunctional with FXFloatFunctional; + 2. Trace model to generate `GraphModule`; + 2. Fuse some OPs combination, such as conv + bn, conv + relu and so on; + 3. Swap some conv or linear module with QAT Modules which contain + weight fakequant nodes; + 4. Insert required fakequant nodes for activation. + 5. (Optional) Delete some redundant fakequant nodes according to the + special requirement of the backend for deployment. + """ + pass + + def swap_ff_with_fxff(self, model: torch.nn.Module): + """Swap FloatFunctional with FXFloatFunctional.""" + modules_to_swap = [] + for name, module in model.named_children(): + if isinstance(module, torch.ao.nn.quantized.FloatFunctional): + modules_to_swap.append(name) + else: + self.swap_ff_with_fxff(module) + + for name in modules_to_swap: + del model._modules[name] + model._modules[name] = torch.ao.nn.quantized.FXFloatFunctional() diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/exporters/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/exporters/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..b8153289de77f381dddbe4ee8d3eb67b43be61fa --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/exporters/__init__.py @@ -0,0 +1,5 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from .openvino_quantize_exporter import OpenVinoQuantizeExportor +from .tensorrt_quantize_exporter import TensorRTExplicitExporter + +__all__ = ['OpenVinoQuantizeExportor', 'TensorRTExplicitExporter'] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/exporters/base_quantize_exporter.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/exporters/base_quantize_exporter.py new file mode 100644 index 0000000000000000000000000000000000000000..6527d320759fc0eaac75e07ff134c65b96b58743 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/exporters/base_quantize_exporter.py @@ -0,0 +1,167 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import List + +from mmengine import print_log + +from .optim_utils import ONNXOptimUtils + +try: + import onnx + from onnx import numpy_helper +except ImportError: + from mmrazor.utils import get_package_placeholder + onnx = get_package_placeholder('No module named onnx') + numpy_helper = get_package_placeholder('No module named onnx.numpy_helper') + +SUPPORT_QWEIGHT_NODE = ['Gemm', 'Conv', 'ConvTranspose'] + +PERCHANNEL_FAKEQUANTIZER = [ + 'FakeQuantizeLearnablePerchannelAffine', 'FixedPerChannelAffine' +] +PERTENSOR_FAKEQUANTIZER = ['LearnablePerTensorAffine', 'FixedPerTensorAffine'] + +ALL_FAKEQUANTIZER = PERCHANNEL_FAKEQUANTIZER + PERTENSOR_FAKEQUANTIZER + + +def _parse_attrs(node_attrs): + attrs = {} + for attr in node_attrs: + if attr.type == onnx.AttributeProto.AttributeType.INTS: + attrs[attr.name] = tuple(attr.ints) + elif attr.type == onnx.AttributeProto.AttributeType.INT: + attrs[attr.name] = attr.i + elif attr.type == onnx.AttributeProto.AttributeType.FLOATS: + attrs[attr.name] = tuple(attr.floats) + elif attr.type == onnx.AttributeProto.AttributeType.FLOAT: + attrs[attr.name] = attr.f + elif attr.type == onnx.AttributeProto.AttributeType.TENSOR: + attrs[attr.name] = numpy_helper.to_array(attr.t) + elif attr.type == 
onnx.AttributeProto.AttributeType.STRING: + attrs[attr.name] = str(attr.s) + elif attr.type == onnx.AttributeProto.AttributeType.STRINGS: + attrs[attr.name] = tuple([str(x) for x in attr.strings]) + else: + raise Exception('ATTR Type [{}] Not Supported!'.format(attr.type)) + return attrs + + +class BaseQuantizeExportor(): + + optimizer = ONNXOptimUtils + + def __init__(self, onnx_model, export_path) -> None: + + if isinstance(onnx_model, str): + self.onnx_model = onnx.load(onnx_model) + elif isinstance(onnx_model, onnx.ModelProto): + self.onnx_model = onnx_model + else: + raise TypeError + + self.export_path = export_path + self._init_mappings_from_onnx(self.onnx_model) + + self.optimizer.remove_fake_pad_op(self.onnx_model, self.name2data, + self.input2node, self.output2node) + + self._remap_input_and_node() + self._remap_output_and_node() + + @property + def graph(self): + """The onnx model's graph.""" + return self.onnx_model.graph + + def _init_mappings_from_onnx(self, onnx_model): + """Build necessary mappings in a onnx model.""" + + self.input2node = self.optimizer.map_input_and_node(onnx_model) + self.output2node = self.optimizer.map_output_and_node(onnx_model) + self.name2data = self.optimizer.map_name_and_data(onnx_model) + + def _remap_input_and_node(self): + """Rebuild the mapping from input name to a (node, input index) + tuple.""" + self.input2node = self.optimizer.map_input_and_node(self.onnx_model) + + def _remap_output_and_node(self): + """Rebuild the mapping from a node's output name to this node.""" + self.output2node = self.optimizer.map_output_and_node(self.onnx_model) + + def parse_qparams(self, node: onnx.NodeProto): + """Parse the quantize-related parameters based on a node.""" + tensor_name, scale, zero_point = node.input[:3] + + scale, zero_point = self.name2data[scale], self.name2data[zero_point] + if len(node.input) > 3: + qmin, qmax = node.input[-2:] + qmin, qmax = self.name2data[qmin], self.name2data[qmax] + elif len(node.attribute) > 0: 
+ qparams = _parse_attrs(node.attribute) + qmin = qparams['quant_min'] + qmax = qparams['quant_max'] + else: + print_log(f'qmin and qmax are not found for <{node.name}>!') + qmax = qmin = None + return tensor_name, scale, zero_point, qmin, qmax + + def collect_symbolic_nodes(self, onnx_model: onnx.ModelProto): + """Collect all the fakequant nodes from a onnx model.""" + symbolic_nodes = list() + for node in onnx_model.graph.node: + if node.op_type in ALL_FAKEQUANTIZER: + symbolic_nodes.append(node) + return symbolic_nodes + + def _get_constant_inputs(self, node: onnx.NodeProto): + """Get the constant input node for the current node.""" + constant_nodes = list() + output2node = self.output2node + for inp in node.input: + if inp in output2node and output2node[inp].op_type == 'Constant': + cnode = output2node[inp] + + constant_nodes.append(cnode) + return constant_nodes + + def _collect_symbolic_constant_inputs(self, symbolic_nodes: List): + """Collect these constant nodes which is the input of all the symbolic + node.""" + + collected_constant_names = set() + constant_inputs = list() + for node in symbolic_nodes: + constant_inputs = self._get_constant_inputs(node) + for constant in constant_inputs: + if constant.name in collected_constant_names: + continue + constant_inputs.append(constant) + collected_constant_names.add(constant.name) + return constant_inputs + + def _remove_symbolic_related_from_onnx(self, symbolic_nodes: List, + symbolic_constant_inputs: List): + """Remove these out of date fakequant nodes and theirs constant input + nodes.""" + for node in symbolic_nodes: + self.onnx_model.graph.node.remove(node) + + # Remove symbolic related constant nodes. The constant node which is + # only used by those symbolic nodes can be removed. + + def _is_standalone_constant_node(constant): + for node in self.onnx_model.graph.node: + for input_name in node.input: + # A constant node always has one output. 
+ if input_name == constant.output[0]: + return False + return True + + for constant in symbolic_constant_inputs: + if _is_standalone_constant_node(constant): + self.onnx_model.graph.node.remove(constant) + + def export(self): + """Export end to end onnx model.""" + # todo: is it a abstract method? + raise NotImplementedError diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/exporters/openvino_quantize_exporter.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/exporters/openvino_quantize_exporter.py new file mode 100644 index 0000000000000000000000000000000000000000..e706251cabfd5d2844799381f6446a37b2e6b421 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/exporters/openvino_quantize_exporter.py @@ -0,0 +1,159 @@ +# Copyright (c) OpenMMLab. All rights reserved. + +from typing import List + +import numpy as np +from google.protobuf.internal.containers import RepeatedScalarFieldContainer + +try: + import onnx + from onnx import helper, numpy_helper +except ImportError: + from mmrazor.utils import get_package_placeholder + onnx = get_package_placeholder('No module named onnx') + numpy_helper = get_package_placeholder('No module named onnx.numpy_helper') + helper = get_package_placeholder('No module named onnx.helper') + +from .base_quantize_exporter import BaseQuantizeExportor + + +class OpenVinoQuantizeExportor(BaseQuantizeExportor): + + def __init__(self, onnx_model, export_path) -> None: + super().__init__(onnx_model, export_path) + + def _build_backend_node_from_symbolic(self, node: onnx.NodeProto, + tensor_name: str, qmin: np.ndarray, + qmax: np.ndarray): + """Build new onnx nodes which can be deployed to the specific backend. + + These nodes will be used to replace those symbolic nodes in the + original onnx model. 
+ """ + qmax = int(qmax) + qmin = int(qmin) + levels = qmax - qmin + 1 + # adjust weight levels + # if levels == 128: + # levels = 256 + # qmax = qmax * 2 + 1 + # qmin = qmin * 2 + output_name = node.output[0] + # Create a node (FakeQuantize) + keys = ['input_min', 'input_max', 'output_min', 'output_max'] + input_names = [f'{tensor_name}_{key}' for key in keys] + backend_node = helper.make_node( + 'FakeQuantize', # node name + [tensor_name, *input_names], # inputs + [output_name], # outputs + levels=levels, # Attributes + domain='org.openvinotoolkit', + name=node.name) + return backend_node + + def _build_backend_initializer(self, + names: RepeatedScalarFieldContainer[str], + scale: np.ndarray, zero_point: np.ndarray, + qmin: np.ndarray, qmax: np.ndarray, + shape: List[int]): + """Build onnx initializers which can be deployed to specific + backend.""" + + scale = np.abs(np.asarray(scale, dtype=np.float64).reshape(-1)) + zero_point = np.clip( + np.asarray(np.round(zero_point), dtype=np.int32).reshape(-1), + a_min=qmin, + a_max=qmax) + + qrange = float(qmax - qmin) + input_range = scale * qrange + input_high = (qmax - zero_point).astype( + np.float64) * input_range / qrange + input_low = input_high - input_range + input_low_size = input_low.size + + if input_low_size != 1: + input_low = input_low.reshape(*shape) + input_high = input_high.reshape(*shape) + + input_low = input_low.astype(np.float32) + input_high = input_high.astype(np.float32) + + initializers = list() + for init_name, value_tensor in zip( + names, [input_low, input_high, input_low, input_high]): + init = numpy_helper.from_array(value_tensor) + init.name = init_name + initializers.append(init) + return initializers + + def build_backend_nodes_and_initializers(self, symbolic_nodes: List): + """Build new onnx nodes and initializers which can be deployed to + specific backend.""" + backend_nodes = list() + backend_initializers = list() + for node in symbolic_nodes: + tensor_name, scale, zero_point, qmin, 
qmax = self.parse_qparams( + node) + new_node = self._build_backend_node_from_symbolic( + node, tensor_name, qmin, qmax) + backend_nodes.append(new_node) + + try: + # If the successor node (such as a conv node) has weight, + # we need get the length of the weight's shape. And ensure + # the length of the weight's shape and the new node's + # input shape (such as input_low and input_high) is the same. + next_node = self.input2node[node.output[0]][0][0] + # node for save weights + fake_node = self.output2node[next_node.input[1]] + tensor = self.name2data[fake_node.input[0]] + shape_length = len(tensor.shape) + new_shape = [-1] + [1] * (shape_length - 1) + except Exception: + new_shape = [-1] + + # The first element of new_node.input is the tensor name. + new_init_names = new_node.input[1:] + new_initializers = self._build_backend_initializer( + new_init_names, scale, zero_point, qmin, qmax, new_shape) + backend_initializers.extend(new_initializers) + return backend_nodes, backend_initializers + + def _insert_initializers_to_onnx(self, initializers: List): + """Insert onnx initializers to the onnx graph.""" + inserted_init_names = set() + for init in initializers: + if init.name in inserted_init_names: + continue + + self.onnx_model.graph.initializer.append(init) + inserted_init_names.add(init.name) + + def _replace_symbolic_related(self): + """Replacing symbolic related nodes and initializers in the original + onnx model with new nodes and initializers that can be deployed to the + specific backend.""" + + symbolic_nodes = self.collect_symbolic_nodes(self.onnx_model) + + collect_func = self._collect_symbolic_constant_inputs + # Usually different activation fakequants share the same constant + # input, and different weight fakequants share the same constant input. 
+ symbolic_constant_inputs = collect_func(symbolic_nodes) + + build_func = self.build_backend_nodes_and_initializers + new_nodes, new_initializers = build_func(symbolic_nodes) + + self._insert_initializers_to_onnx(new_initializers) + + self._remove_symbolic_related_from_onnx(symbolic_nodes, + symbolic_constant_inputs) + + self.onnx_model.graph.node.extend(new_nodes) + self.optimizer.optimize(self.onnx_model) + + def export(self): + """Export end to end onnx model.""" + self._replace_symbolic_related() + onnx.save(self.onnx_model, self.export_path) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/exporters/optim_utils.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/exporters/optim_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..f4adc5ee133cd6c37acfd3b6f2c69fafd049bcd0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/exporters/optim_utils.py @@ -0,0 +1,265 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy +from typing import Dict, List, Optional + +from mmengine import print_log + +try: + import onnx + from onnx import numpy_helper +except ImportError: + from mmrazor.utils import get_package_placeholder + onnx = get_package_placeholder('No module named onnx') + numpy_helper = get_package_placeholder('No module named onnx.numpy_helper') + + +class ONNXOptimUtils(): + + @classmethod + def map_name_and_data(cls, onnx_model: onnx.ModelProto): + """Build the mapping from a data's name to the data itself.""" + params = {} + for init in onnx_model.graph.initializer: + params[init.name] = numpy_helper.to_array(init) + for node in onnx_model.graph.node: + # If two zero_points are identity, one is a reference to the other + # after optimized by onnx. 
            if node.op_type == 'Identity' and len(node.input) == 1 and \
                    node.input[0] in params:
                params[node.output[0]] = copy.deepcopy(params[node.input[0]])
            if node.op_type == 'Constant':
                for attr in node.attribute:
                    if attr.name == 'value':
                        params[node.output[0]] = numpy_helper.to_array(attr.t)
        return params

    @classmethod
    def map_name_and_initializer(cls,
                                 onnx_model: onnx.ModelProto,
                                 allow_redundant=True):
        """Build the mapping from an initializer's name to a
        ``(initializer, index)`` tuple.

        NOTE(review): ``allow_redundant`` is currently unused.
        """

        initializers = dict()

        for idx, init in enumerate(onnx_model.graph.initializer):
            initializers[init.name] = (init, idx)

        return initializers

    @classmethod
    def map_output_and_node(cls, onnx_model: onnx.ModelProto):
        """Build the mapping from a node's output name to this node."""
        output2node = dict()
        for node in onnx_model.graph.node:
            for output_name in node.output:
                output2node[output_name] = node
        return output2node

    @classmethod
    def map_input_and_node(cls, onnx_model: onnx.ModelProto):
        """Build the mapping from input name to a (node, input index) tuple."""

        input2node: Dict[str, List] = dict()
        for node in onnx_model.graph.node:
            for idx, input_name in enumerate(node.input):
                if input_name not in input2node:
                    input2node[input_name] = []
                input2node[input_name].append([node, idx])
        return input2node

    @classmethod
    def remove_node_from_onnx(cls, node: onnx.NodeProto,
                              onnx_model: onnx.ModelProto):
        """Removes a node from node list."""
        onnx_model.graph.node.remove(node)

    @classmethod
    def remove_initializer_from_onnx(cls, initializer: onnx.TensorProto,
                                     onnx_model: onnx.ModelProto):
        """Removes an initializer from the initializer list."""
        onnx_model.graph.initializer.remove(initializer)

    @classmethod
    def remove_fake_pad_op(cls, onnx_model, name2data, inp2node, out2node):
        """Remove ``Pad`` ops whose paddings are all zero and reconnect their
        consumers directly to the pad's input."""
        nodes_to_be_removed = []
        # NOTE(review): the inner loop below rebinds ``idx``, shadowing the
        # enumerate index (which is otherwise unused).
        for idx, node in enumerate(onnx_model.graph.node):
            if node.op_type == 'Pad':
                pads = name2data[node.input[1]]
                if all([x == 0 for x in pads]):
                    print_log(f'Remove pad op: <{node.name}>.')
                    next_nodes = inp2node[node.output[0]]
                    for next_node, idx in next_nodes:
                        next_node.input[idx] = node.input[0]
                    nodes_to_be_removed.append(node)

        for node in nodes_to_be_removed:
            onnx_model.graph.node.remove(node)

    @classmethod
    def insert_node_to_onnx(cls,
                            node: onnx.NodeProto,
                            onnx_model: onnx.ModelProto,
                            idx: int = 0):
        """Inserts the node at the specified position."""
        onnx_model.graph.node.insert(idx, node)

    @classmethod
    def find_standalone_nodes(cls,
                              onnx_model: onnx.ModelProto,
                              input2node: Optional[Dict] = None,
                              output2node: Optional[Dict] = None):
        """Find unused nodes.

        A node is standalone when none of its inputs come from another node
        and none of its outputs feed another node.
        """

        if input2node is None:
            input2node = cls.map_input_and_node(onnx_model)
        if output2node is None:
            output2node = cls.map_output_and_node(onnx_model)

        def _is_standalone_node(node, input2node, output2node):
            for input_name in node.input:
                if input_name in output2node:
                    return False

            for out_node in node.output:
                if out_node in input2node:
                    return False

            return True

        standalone_nodes = list()
        for node in onnx_model.graph.node:

            if _is_standalone_node(node, input2node, output2node):
                standalone_nodes.append(node)
        return standalone_nodes

    @classmethod
    def find_redundant_initializers(cls,
                                    onnx_model: onnx.ModelProto,
                                    input2node: Optional[Dict] = None):
        """Find unused initializers."""
        if input2node is None:
            input2node = cls.map_input_and_node(onnx_model)

        initializers = cls.map_name_and_initializer(onnx_model)
        redundant_initializers = list()
        redundant_set = set()
        for name, init_and_idx in initializers.items():
            if name not in input2node and name not in redundant_set:
                # init_and_idx[0] is onnx.onnx_ml_pb2.TensorProto
                # init_and_idx[1] is a integer index
                redundant_initializers.append(init_and_idx[0])
                redundant_set.add(name)
        return redundant_initializers

    @classmethod
    def topo_sort(cls,
                  onnx_model: onnx.ModelProto,
                  initializers: Optional[Dict] = None,
                  inplace: bool = True):
        """Topologically sort the nodes in a directed acyclic graph.

        Note that nodes in a directed acyclic graph may be out of order
        after replacing symbolic related nodes with new nodes.

        Args:
            onnx_model (onnx.ModelProto): The onnx model to be sorted
                topologically.
            initializers (Dict | Optional): The mapping from name to
                initializers. Default to None.
            inplace (bool): Can optionally do the operation in-place.
                Defaults to True.
        """

        if inplace:
            _onnx_model = onnx_model
        else:
            _onnx_model = copy.deepcopy(onnx_model)

        if initializers is None:
            initializers = cls.map_name_and_initializer(
                _onnx_model, allow_redundant=True)

        # A node may have multiple outputs. The first output name of a node
        # named `/conv/Conv` is `/conv/Conv_output_0`
        output_name2node = {}
        for node in _onnx_model.graph.node:
            for output_name in node.output:
                output_name2node[output_name] = node
        for node in _onnx_model.graph.input:
            output_name2node[node.name] = node

        name2node = {node.name: node for node in _onnx_model.graph.node}

        # Adjacency list: node name -> successor nodes.
        graph: Dict[str,
                    List] = {node.name: []
                             for node in _onnx_model.graph.node}
        for node in _onnx_model.graph.input:
            graph[node.name] = []

        indegree = {node.name: 0 for node in _onnx_model.graph.node}

        # Build graph
        for i, node in enumerate(_onnx_model.graph.node):
            for input_name in node.input:
                # Initializer inputs do not contribute to ordering.
                if input_name not in initializers:
                    indegree[node.name] += 1
                    prev_node = output_name2node[input_name]
                    graph[prev_node.name].append(node)

        graph_input = [node.name for node in _onnx_model.graph.input]
        root = graph_input.copy()
        sorted_nodes = []

        # There are some nodes whose input are all initializers.
        for node_name, in_degree in indegree.items():
            if in_degree == 0:
                root.append(node_name)

        # Kahn's algorithm: repeatedly emit zero-indegree nodes.
        while root:
            node_name = root.pop()
            # There is no intersection between graph_input and
            # _onnx_model.graph.node
            if node_name not in graph_input:
                node = name2node[node_name]
                sorted_nodes.append(node)
            for next_node in graph[node_name]:
                indegree[next_node.name] -= 1
                if indegree[next_node.name] == 0:
                    root.append(next_node.name)

        num_nodes = len(_onnx_model.graph.node)
        if len(sorted_nodes) != num_nodes:
            raise RuntimeError('The graph is not a DAG.')

        # Rewrite the node list in sorted order (protobuf repeated fields
        # cannot be reassigned wholesale).
        for _ in range(num_nodes):
            _onnx_model.graph.node.pop()
        for node in sorted_nodes:
            _onnx_model.graph.node.append(node)

        return _onnx_model

    @classmethod
    def optimize(cls, onnx_model):
        """Remove standalone nodes and redundant initializers, and
        topologically sort the nodes in a directed acyclic graph."""

        input2node = cls.map_input_and_node(onnx_model)
        output2node = cls.map_output_and_node(onnx_model)

        standalone_nodes = cls.find_standalone_nodes(onnx_model, input2node,
                                                     output2node)
        for node in standalone_nodes:
            cls.remove_node_from_onnx(node, onnx_model)
            print_log(f'Remove node {node.name}')

        redundant_inits = cls.find_redundant_initializers(
            onnx_model, input2node)
        for init in redundant_inits:
            cls.remove_initializer_from_onnx(init, onnx_model)
            print_log(f'Remove initializer {init.name}')

        sorted_onnx_model = cls.topo_sort(onnx_model)

        return sorted_onnx_model
diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/exporters/tensorrt_quantize_exporter.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/exporters/tensorrt_quantize_exporter.py
new file mode 100644
index 0000000000000000000000000000000000000000..cde430b088147cc39bf6bdfd90f36625ab601a2e
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/exporters/tensorrt_quantize_exporter.py
@@ -0,0 +1,49 @@
# Copyright (c) OpenMMLab. 
All rights reserved. +import numpy as np + +try: + import onnx +except ImportError: + from mmrazor.utils import get_package_placeholder + onnx = get_package_placeholder('No module named onnx') + +from .base_quantize_exporter import BaseQuantizeExportor + + +class TensorRTExplicitExporter(BaseQuantizeExportor): + + def __init__(self, onnx_model, export_path) -> None: + super().__init__(onnx_model, export_path) + + def _build_backend_node_from_symbolic(self, node): + quantize_linear_node = onnx.helper.make_node( + 'QuantizeLinear', node.input[:3], [node.name + '_quantized_out'], + node.name + '_quantized') + dequantize_linear_node = onnx.helper.make_node( + 'DequantizeLinear', + [node.name + '_quantized_out'] + quantize_linear_node.input[1:3], + node.output, node.name + '_dequantized') + return [quantize_linear_node, dequantize_linear_node] + + def build_backend_nodes(self, symbolic_nodes): + backend_nodes = list() + for node in symbolic_nodes: + _, _, zero_point, qmin, qmax = self.parse_qparams(node) + assert qmax - qmin in ( + 2**8 - 1, 2**8 - + 2), 'Only 8 bit quantization support deployment to ONNX.' + assert not np.any(zero_point != 0), \ + 'This pass is only supposed to be used with TensorRT ' \ + 'Backend which does not support asymmetric quantization.' 
+ new_nodes = self._build_backend_node_from_symbolic(node) + backend_nodes.extend(new_nodes) + return backend_nodes + + def export(self): + symbolic_nodes = self.collect_symbolic_nodes(self.onnx_model) + new_nodes = self.build_backend_nodes(symbolic_nodes) + for node in symbolic_nodes: + self.onnx_model.graph.node.remove(node) + self.onnx_model.graph.node.extend(new_nodes) + self.optimizer.optimize(self.onnx_model) + onnx.save(self.onnx_model, self.export_path) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/native_quantizer.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/native_quantizer.py new file mode 100644 index 0000000000000000000000000000000000000000..7b6f2f9ad022fef9c12693e18fb8dc7331075ac4 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/native_quantizer.py @@ -0,0 +1,446 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import Any, Dict, List, Optional, Tuple, Union + +import torch +from mmengine.config import Config + +try: + from torch.ao.quantization import (disable_observer, enable_fake_quant, + enable_observer) + from torch.ao.quantization.fx import prepare + from torch.ao.quantization.fx.graph_module import ObservedGraphModule + from torch.ao.quantization.qconfig_mapping import ( + _FIXED_QPARAMS_OP_TO_OBSERVER, FixedQParamsFakeQuantize, QConfig, + QConfigMapping, default_weight_fake_quant) + from torch.ao.quantization.quantize_fx import _fuse_fx + from torch.fx.graph_module import GraphModule + from torch.nn.intrinsic.qat import modules as qat_fused_modules + from torch.nn.qat import modules as qat_modules + from torch.onnx import register_custom_op_symbolic +except ImportError: + from mmrazor.utils import get_package_placeholder, get_placeholder + GraphModule = get_placeholder('torch>=1.13') + ObservedGraphModule = get_placeholder('torch>=1.13') + enable_fake_quant = get_placeholder('torch>=1.13') + disable_observer = get_placeholder('torch>=1.13') + 
enable_observer = get_placeholder('torch>=1.13') + prepare = get_placeholder('torch>=1.13') + QConfigMapping = get_placeholder('torch>=1.13') + _fuse_fx = get_placeholder('torch>=1.13') + qat_fused_modules = get_package_placeholder('torch>=1.13') + qat_modules = get_package_placeholder('torch>=1.13') + _FIXED_QPARAMS_OP_TO_OBSERVER = get_package_placeholder('torch>=1.13') + FixedQParamsFakeQuantize = get_package_placeholder('torch>=1.13') + QConfig = get_package_placeholder('torch>=1.13') + default_weight_fake_quant = get_package_placeholder('torch>=1.13') + +from mmrazor import digit_version +from mmrazor.models.task_modules.tracer import build_graphmodule +from mmrazor.models.task_modules.tracer.fx import ( + del_fakequant_after_function, del_fakequant_after_method, + del_fakequant_after_module, del_fakequant_after_op, + del_fakequant_before_function, del_fakequant_before_method, + del_fakequant_before_module, del_fakequant_before_op) +from mmrazor.models.utils import str2class +from mmrazor.registry import MODELS +from mmrazor.structures.quantization import BackendConfigs, QConfigHandler +from .base import BaseQuantizer + +if digit_version(torch.__version__) >= digit_version('1.13.0'): + SUPPORT_QAT_MODULES: Tuple = ( + qat_fused_modules.ConvBn1d, qat_fused_modules.ConvBn2d, + qat_fused_modules.ConvBn3d, qat_fused_modules.ConvBnReLU1d, + qat_fused_modules.ConvBnReLU2d, qat_fused_modules.ConvBnReLU3d, + qat_fused_modules.ConvReLU1d, qat_fused_modules.ConvReLU2d, + qat_fused_modules.ConvReLU3d, qat_fused_modules.LinearBn1d, + qat_fused_modules.LinearReLU, qat_modules.Conv1d, qat_modules.Conv2d, + qat_modules.Conv3d, qat_modules.Linear) + + MERGE_BN_MAPPINGS: Dict = { + qat_fused_modules.ConvBn1d: qat_modules.Conv1d, + qat_fused_modules.ConvBn2d: qat_modules.Conv2d, + qat_fused_modules.ConvBn3d: qat_modules.Conv3d, + qat_fused_modules.ConvBnReLU1d: qat_fused_modules.ConvReLU1d, + qat_fused_modules.ConvBnReLU2d: qat_fused_modules.ConvReLU2d, + 
qat_fused_modules.ConvBnReLU3d: qat_fused_modules.ConvReLU3d, + qat_fused_modules.LinearBn1d: qat_modules.Linear + } + + def fake_quantize_per_channel_affine(g, x, scale, zero_point, ch_axis, + quant_min, quant_max): + return g.op('mmrazor::FixedPerChannelAffine', x, scale, zero_point, + ch_axis, quant_min, quant_max) + + register_custom_op_symbolic('::fake_quantize_per_channel_affine', + fake_quantize_per_channel_affine, 11) + + def fake_quantize_per_tensor_affine(g, x, scale, zero_point, quant_min, + quant_max): + return g.op('mmrazor::FixedPerTensorAffine', x, scale, zero_point, + quant_min, quant_max) + + register_custom_op_symbolic('::fake_quantize_per_tensor_affine', + fake_quantize_per_tensor_affine, 11) + +else: + SUPPORT_QAT_MODULES = () + MERGE_BN_MAPPINGS = {} + + +@MODELS.register_module() +class TorchNativeQuantizer(BaseQuantizer): + """Native class for quantizer. + + Args: + global_qconfig (Union[Dict, Config]): Config for quantization details + of weight and activation include observer, quantizer, and qscheme. + no_observer_modules (Optional[List]): Modules don't need observer. + To fit different backend, we need qconfig to determine the modules + which don't need observer. + tracer (Dict): Config for tracer to trace modules for torch fx . + + Raises: + NotImplementedError: _description_ + + Examples: + >>> global_qconfig = dict( + ... w_observer=dict(type='mmrazor.PerChannelMinMaxObserver'), + ... a_observer=dict(type='mmrazor.MovingAverageMinMaxObserver'), + ... w_fake_quant=dict(type='mmrazor.FakeQuantize'), + ... a_fake_quant=dict(type='mmrazor.FakeQuantize'), + ... w_qscheme=dict( + ... qdtype='qint8', bit=8, is_symmetry=True, + ... is_symmetric_range=True), + ... a_qscheme=dict( + ... qdtype='quint8', bit=8, is_symmetry=True, + ... 
averaging_constant=0.1), +) + """ + + def __init__(self, + global_qconfig: Union[Dict, Config], + no_observer_modules: Optional[List] = None, + tracer: Dict = dict(type='CustomTracer'), + extra_redundant_fakequants: Dict = dict( + extra_module_prev_wo_fakequant=tuple(), + extra_module_next_wo_fakequant=tuple(), + extra_function_prev_wo_fakequant=tuple(), + extra_function_next_wo_fakequant=tuple(), + extra_method_prev_wo_fakequant=tuple(), + extra_method_next_wo_fakequant=tuple(), + extra_op_prev_wo_fakequant=tuple(), + extra_op_next_wo_fakequant=tuple())): + super().__init__(tracer) + self.qconfig = QConfigHandler(global_qconfig) + if self.qconfig.w_qscheme.is_per_channel: + w_mode = 'per_channel' + else: + w_mode = 'per_tensor' + if self.qconfig.a_qscheme.is_per_channel: + a_mode = 'per_channel' + else: + a_mode = 'per_tensor' + assert w_mode in self.support_w_modes + assert a_mode in self.support_a_modes + + self.qconfig_mapping = self.gen_qconfig_mapping( + self.qconfig, no_observer_modules) + self.no_observer_modules = no_observer_modules + + self.backend_config = BackendConfigs[self.backend] + self.example_inputs = (torch.randn(1, 3, 224, 224), ) + + self.extra_redundant_fakequants = extra_redundant_fakequants + + def gen_qconfig_mapping(self, qconfig, no_observer_modules): + """Convert qconfig in config file to `QConfigMapping`. + + `QConfigMapping` is a custom class for mapping from model ops to + :class:`torch.ao.quantization.QConfig` s. 
+ """ + qconfig_mapping = QConfigMapping().set_global(qconfig.convert()) + + if no_observer_modules is not None: + no_observer_modules = str2class(no_observer_modules) + for mod in no_observer_modules: + qconfig_mapping.set_object_type(mod, None) + + fixed_qparams_observer_to_qconfig = {} + for fixed_qparams_op, observer in _FIXED_QPARAMS_OP_TO_OBSERVER.items( + ): + if observer in fixed_qparams_observer_to_qconfig: + fixed_qparams_qconfig = fixed_qparams_observer_to_qconfig[ + observer] + else: + activation = FixedQParamsFakeQuantize.with_args( + observer=observer) + + fixed_qparams_qconfig = QConfig( + activation=activation, weight=default_weight_fake_quant) + fixed_qparams_observer_to_qconfig[ + observer] = fixed_qparams_qconfig + qconfig_mapping.set_object_type(fixed_qparams_op, + fixed_qparams_qconfig) + + return qconfig_mapping + + @property + def backend(self): + """The key of the corresponding backend config.""" + return 'native' + + @property + def support_w_modes(self): + """Supported quantization modes for weight about per_tensor or + per_channel.""" + return ('per_tensor', 'per_channel') + + @property + def support_a_modes(self): + """Supported quantization modes for activation about per_tensor or + per_channel.""" + return ('per_tensor') + + def export_onnx(self, model: Union[torch.nn.Module, torch.jit.ScriptModule, + torch.jit.ScriptFunction], + args: Union[Tuple[Any, ...], + torch.Tensor], output_path: str, **kwargs): + """Export the onnx model that can be deployed to a native backend.""" + torch.onnx.export(model, args, output_path, **kwargs) + + def prepare(self, model, concrete_args=None): + """prepare graph to ObservedGraphModule. + + Returns: + ObservedGraphModule: GraphModules after fuse and observer. + + Notes: + 'graph_module' after '_fuse_fx()' function will fuse conv, BN, ReLU + into modules in SUPPORT_QAT_MODULES. + 'graph_module' after 'prepare()' function will become observed. 
+ + Notes: + Keep `is_qat` is True is because in Pytorch when `is_qat` is false, + the `_fuse_fx()` function only fuse module into `nn.Squential`. + In mmrazor, we aim to add more ptq algorithm into our pipeline such + as Adaround, these kind of ptq method have some additional + fake_quant operations that we need it to be fused into our + `SUPPORT_QAT_MODULES` type, which is a tricky way to deal with it. + """ + self.swap_ff_with_fxff(model) + traced_graph = self.tracer.trace(model, concrete_args=concrete_args) + graph_module = build_graphmodule(model, traced_graph) + + # set the training modes of all modules to True to `_fuse_fx` correctly + # todo: check freezebn + self.sync_module_training_mode(graph_module, mode=True) + + graph_module = _fuse_fx( + graph_module=graph_module, + is_qat=True, + backend_config=self.backend_config) + prepared = prepare( + model=graph_module, + qconfig_mapping=self.qconfig_mapping, + is_qat=True, + node_name_to_scope=self.tracer.node_name_to_scope, + example_inputs=self.example_inputs, + backend_config=self.backend_config) + prepared = self.del_redundant_fakequant(prepared) + + return prepared + + def post_process_for_deploy(self, + observed_module: ObservedGraphModule, + device: str = 'cpu', + update_weight_with_fakequant: bool = False, + keep_w_fake_quant: bool = False): + """weight fake-quant for supported QAT modules. + + Args: + observed_module (ObservedGraphModule): Modules after fused and + observed. + keep_w_fake_quant (bool, optional): Bool to determine whether to + keep weight fake-quant op, depending on the backend. Defaults + to False. + + Note: + `post_process_weight_fakequant()` function is necessary that the + `SUPPORT_QAT_MODULES` will be convert to normal modules, and + BN will be really integrated into conv layers. + """ + + def traverse(module): + for name, child in module.named_children(): + # Trace `SUPPORT_QAT_MODULES` recursively. 
+ if isinstance(child, SUPPORT_QAT_MODULES): + # We add w_fakequant once in case some ptq methods have + # specific operations such as Adaround. So we do Quantize + # to perform these operations and do dequantize to + # introduce quantization loss in advance. + weight_fakequant = child.weight_fake_quant + + # `to_float()` function fuse BN into conv or conv_relu, and + # also convert a qat module to a normal module. + # source url: https://github.com/pytorch/pytorch/blob/master/torch/nn/intrinsic/qat/modules/conv_fused.py # noqa: E501 + float_child = child.to_float() + + if update_weight_with_fakequant: + from torch.ao.nn.intrinsic import _FusedModule + if issubclass(type(float_child), _FusedModule): + float_child[0].weight.data = weight_fakequant( + float_child[0].weight.data) + else: + float_child.weight.data = weight_fakequant( + float_child.weight.data) + # This is decided by backend type, some backend need + # explicitly keep the fake quant structure, others don't. + # TODO add deploy doc link + if keep_w_fake_quant: + # make weight fakequant fixed as the consistent + # fakequant, it will help to deploy our model to + # various backends. + self.qconfig.fixed_w_fakequant() + for m in float_child.modules(): + setattr(m, 'qconfig', self.qconfig.convert()) + if type(child) in MERGE_BN_MAPPINGS: + cls = MERGE_BN_MAPPINGS[type(child)] + new_child = cls.from_float(float_child).to(device) + else: + new_child = type(child).from_float(float_child).to( + device) + + # because weight fakequants and observers are replaced + # with base fakequants and base observers, some + # initialized args need to be update by running + # weight_fake_quant. 
+ enable_observer(new_child) + new_child.weight_fake_quant(new_child.weight) + disable_observer(new_child) + else: + new_child = float_child.to(device) + setattr(module, name, new_child) + else: + traverse(child) + + observed_module.apply(enable_fake_quant) + observed_module.apply(disable_observer) + traverse(observed_module) + + def del_redundant_fakequant(self, prepared: GraphModule): + """delete redundant fakequant op in prepared model. + + Returns: + prepared (GraphModule): prepared model after delete redundant + fakequant op. + + Notes: + We can configure different ways to delete redundant nodes: + @property + def module_prev_wo_fakequant(self): + return (torch.nn.ReLU6, torch.nn.Identity) + """ + extra_module_prev_wo_fakequant = self.extra_redundant_fakequants.get( + 'extra_module_prev_wo_fakequant', tuple()) + prepared = del_fakequant_before_module( + prepared, + self.module_prev_wo_fakequant + extra_module_prev_wo_fakequant, + inplace=True) + + extra_module_next_wo_fakequant = self.extra_redundant_fakequants.get( + 'extra_module_next_wo_fakequant', tuple()) + prepared = del_fakequant_after_module( + prepared, + self.module_next_wo_fakequant + extra_module_next_wo_fakequant, + inplace=True) + + extra_function_prev_wo_fakequant = self.extra_redundant_fakequants.get( + 'extra_function_prev_wo_fakequant', tuple()) + prepared = del_fakequant_before_function( + prepared, + self.function_prev_wo_fakequant + extra_function_prev_wo_fakequant, + inplace=True) + + extra_function_next_wo_fakequant = self.extra_redundant_fakequants.get( + 'extra_function_next_wo_fakequant', tuple()) + prepared = del_fakequant_after_function( + prepared, + self.function_next_wo_fakequant + extra_function_next_wo_fakequant, + inplace=True) + + extra_method_prev_wo_fakequant = self.extra_redundant_fakequants.get( + 'extra_method_prev_wo_fakequant', tuple()) + prepared = del_fakequant_before_method( + prepared, + self.method_prev_wo_fakequant + extra_method_prev_wo_fakequant, + 
inplace=True) + + extra_method_next_wo_fakequant = self.extra_redundant_fakequants.get( + 'extra_method_next_wo_fakequant', tuple()) + prepared = del_fakequant_after_method( + prepared, + self.method_next_wo_fakequant + extra_method_next_wo_fakequant, + inplace=True) + + extra_op_prev_wo_fakequant = self.extra_redundant_fakequants.get( + 'extra_op_prev_wo_fakequant', tuple()) + prepared = del_fakequant_before_op( + prepared, + self.op_prev_wo_fakequant + extra_op_prev_wo_fakequant, + inplace=True) + + extra_op_next_wo_fakequant = self.extra_redundant_fakequants.get( + 'extra_op_next_wo_fakequant', tuple()) + prepared = del_fakequant_after_op( + prepared, + self.op_next_wo_fakequant + extra_op_next_wo_fakequant, + inplace=True) + return prepared + + @property + def module_prev_wo_fakequant(self): + """Configurate the modules that their previous nodes are redundant + fakequants.""" + return tuple() + + @property + def module_next_wo_fakequant(self): + """Configurate the modules that their next nodes are redundant + fakequants.""" + return tuple() + + @property + def function_prev_wo_fakequant(self): + """Configurate the functions that their previous nodes are redundant + fakequants.""" + return tuple() + + @property + def function_next_wo_fakequant(self): + """Configurate the functions that their next nodes are redundant + fakequants.""" + return tuple() + + @property + def method_prev_wo_fakequant(self): + """Configurate the methods that their previous nodes are redundant + fakequants.""" + return tuple() + + @property + def method_next_wo_fakequant(self): + """Configurate the methods that their next nodes are redundant + fakequants.""" + return tuple() + + @property + def op_prev_wo_fakequant(self): + """Configurate the OPs that their previous nodes are redundant + fakequants.""" + return tuple() + + @property + def op_next_wo_fakequant(self): + """Configurate the OPs that their next nodes are redundant + fakequants.""" + return tuple() diff --git 
a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/openvino_quantizer.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/openvino_quantizer.py new file mode 100644 index 0000000000000000000000000000000000000000..8f5ef3873c0c41e958395879e86eace3f99df8a0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/openvino_quantizer.py @@ -0,0 +1,86 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import Any, Optional, Tuple, Union + +import torch + +from mmrazor.registry import MODELS +from .native_quantizer import TorchNativeQuantizer + + +@MODELS.register_module() +class OpenVINOQuantizer(TorchNativeQuantizer): + """Quantizer for quantizing and deploying to Openvino backend. + + Each backend has its own features, for reducing the gap of quantized + performance between before and after deployment as possible, we should + match the backend's features in quantization. + + Openvino's some important features about quantization is as follows: + * support_w_mode = ('per_tensor', 'per_channel') + * support_a_mode = ('per_tensor') + * weight range should be symmetric, such as int 8 is [-127, 127] rather + than [-128, 127] + """ + + @property + def backend(self): + """The backend to deploy, also the key of the corresponding backend + config.""" + return 'openvino' + + @property + def support_w_modes(self): + """Supported quantization modes for weight about per_tensor or + per_channel.""" + return ('per_tensor', 'per_channel') + + @property + def support_a_modes(self): + """Supported quantization modes for activation about per_tensor or + per_channel.""" + return ('per_tensor') + + def export_onnx(self, + model: Union[torch.nn.Module, torch.jit.ScriptModule, + torch.jit.ScriptFunction], + args: Union[Tuple[Any, ...], torch.Tensor], + output_path: str, + opset_version: Optional[int] = 11, + **kwargs): + """Export the onnx model that can be deployed to OpenVino backend.""" + + symbolic_output_path = 
output_path.replace('.onnx', '_symbolic.onnx') + torch.onnx.export( + model, + args, + symbolic_output_path, + opset_version=opset_version, + **kwargs) + + from .exporters import OpenVinoQuantizeExportor + exporter = OpenVinoQuantizeExportor(symbolic_output_path, output_path) + exporter.export() + + @property + def module_prev_wo_fakequant(self): + """Configurate the modules that their previous nodes are redundant + fakequants.""" + return (torch.nn.ReLU6, torch.nn.Identity) + + @property + def module_next_wo_fakequant(self): + """Configurate the modules that their next nodes are redundant + fakequants.""" + return (torch.nn.MaxPool2d, ) + + @property + def method_next_wo_fakequant(self): + """Configurate the methods that their next nodes are redundant + fakequants.""" + return ('flatten', ) + + @property + def op_prev_wo_fakequant(self): + """Configurate the OPs that their previous nodes are redundant + fakequants.""" + return ('output', ) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/tensorrt_quantizer.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/tensorrt_quantizer.py new file mode 100644 index 0000000000000000000000000000000000000000..be067fd4f5de05bb516f39c70d32e8c99efa645f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/quantizers/tensorrt_quantizer.py @@ -0,0 +1,84 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import Any, Optional, Tuple, Union + +import torch + +from mmrazor.registry import MODELS +from .native_quantizer import TorchNativeQuantizer + + +@MODELS.register_module() +class TensorRTQuantizer(TorchNativeQuantizer): + """Quantizer for quantizing and deploying to TensorRT backend. + + Each backend has its own features, for reducing the gap of quantized + performance between before and after deployment as possible, we should + match the backend's features in quantization. 
+ + TensorRT's some important features about quantization is as follows: + * support_w_mode = ('per_tensor', 'per_channel') + * support_a_mode = ('per_tensor') + """ + + @property + def backend(self): + """The backend to deploy, also the key of the corresponding backend + config.""" + return 'tensorrt' + + @property + def support_w_modes(self): + """Supported quantization modes for weight about per_tensor or + per_channel.""" + return ('per_tensor', 'per_channel') + + @property + def support_a_modes(self): + """Supported quantization modes for activation about per_tensor or + per_channel.""" + return ('per_tensor') + + def export_onnx(self, + model: Union[torch.nn.Module, torch.jit.ScriptModule, + torch.jit.ScriptFunction], + args: Union[Tuple[Any, ...], torch.Tensor], + output_path: str, + opset_version: Optional[int] = 13, + **kwargs): + """Export the onnx model that can be deployed to OpenVino backend.""" + + symbolic_output_path = output_path.replace('.onnx', '_symbolic.onnx') + torch.onnx.export( + model, + args, + symbolic_output_path, + opset_version=opset_version, + **kwargs) + + from .exporters import TensorRTExplicitExporter + exporter = TensorRTExplicitExporter(symbolic_output_path, output_path) + exporter.export() + + @property + def module_prev_wo_fakequant(self): + """Configurate the modules that their previous nodes are redundant + fakequants.""" + return (torch.nn.ReLU6, torch.nn.Identity) + + @property + def module_next_wo_fakequant(self): + """Configurate the modules that their next nodes are redundant + fakequants.""" + return (torch.nn.MaxPool2d, ) + + @property + def method_next_wo_fakequant(self): + """Configurate the methods that their next nodes are redundant + fakequants.""" + return ('flatten', ) + + @property + def op_prev_wo_fakequant(self): + """Configurate the OPs that their previous nodes are redundant + fakequants.""" + return ('output', ) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/__init__.py 
b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..931278b8abe4f0362d944186e6c2c4a0b25849c2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/__init__.py @@ -0,0 +1,9 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .delivery import * # noqa: F401,F403 +from .demo_inputs import * # noqa: F401,F403 +from .estimators import ResourceEstimator +from .predictor import * # noqa: F401,F403 +from .recorder import * # noqa: F401,F403 +from .tracer import * # noqa: F401,F403 + +__all__ = ['ResourceEstimator'] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/delivery/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/delivery/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..814272932f790ed10afccdece7aea1bc5edb3f20 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/delivery/__init__.py @@ -0,0 +1,9 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .delivery_manager import DistillDeliveryManager +from .function_outputs_delivery import FunctionOutputsDelivery +from .method_outputs_delivery import MethodOutputsDelivery + +__all__ = [ + 'FunctionOutputsDelivery', 'MethodOutputsDelivery', + 'DistillDeliveryManager' +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/delivery/delivery_manager.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/delivery/delivery_manager.py new file mode 100644 index 0000000000000000000000000000000000000000..592d8917d1606e347e69d83421760e39767ec99f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/delivery/delivery_manager.py @@ -0,0 +1,113 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
# Copyright (c) OpenMMLab. All rights reserved.
import copy
from typing import TYPE_CHECKING, Dict, Optional

if TYPE_CHECKING:
    # Only needed for type annotations; imported lazily elsewhere to avoid
    # requiring the whole delivery package at module import time.
    from .distill_delivery import DistillDelivery

# Delivery types this manager can build. The 'Delivery' suffix is appended
# automatically when the config is resolved.
SUPPORT_DELIVERIES = ['FunctionOutputs', 'MethodOutputs']


class DistillDeliveryManager:
    """Context manager that manages a group of distill deliveries.

    A ``DistillDelivery`` is itself a context manager that overrides the
    outputs of one function (method) during the teacher (student) forward.
    Entering this manager enters every delivery it owns; exiting it exits
    them all.

    Args:
        deliveries (dict, optional): Mapping from delivery name to delivery
            config. Each config must contain a ``type`` key whose value is
            one of ``SUPPORT_DELIVERIES``. Defaults to None.

    Raises:
        TypeError: If a config's ``type`` value is not a string.
        ValueError: If a config's ``type`` value is not supported.

    Examples:
        >>> distill_deliveries = ConfigDict(
        ...     aug=dict(type='MethodOutputs', max_keep_data=1,
        ...              method_path='mmcls.models.utils.Augments.__call__'))
        >>> manager = DistillDeliveryManager(distill_deliveries)

        >>> manager.override_data = False
        >>> with manager:
        ...     imgs_tea, label_tea = augments(imgs, label)

        >>> manager.override_data = True
        >>> with manager:
        ...     imgs_stu, label_stu = augments(imgs, label)

        >>> torch.equal(label_tea, label_stu)
        True
        >>> torch.equal(imgs_tea, imgs_stu)
        True
    """

    def __init__(self, deliveries: Optional[Dict[str, Dict]] = None) -> None:

        self._deliveries: Dict[str, 'DistillDelivery'] = dict()
        self._override_data = False

        for delivery_name, delivery_cfg in (deliveries or {}).items():
            cfg = copy.deepcopy(delivery_cfg)
            delivery_type = cfg.get('type', '')
            # Explicit exceptions instead of ``assert`` so the validation
            # survives ``python -O`` (asserts are stripped).
            if not isinstance(delivery_type, str):
                raise TypeError('delivery `type` should be a str, but got '
                                f'{type(delivery_type)}')
            if delivery_type not in SUPPORT_DELIVERIES:
                raise ValueError(f'delivery type {delivery_type!r} is not '
                                 f'supported; expected one of '
                                 f'{SUPPORT_DELIVERIES}')

            cfg.update(dict(type=delivery_type + 'Delivery'))

            # Imported lazily so importing this module does not require the
            # registry machinery and avoids import cycles.
            from mmrazor.registry import TASK_UTILS
            self._deliveries[delivery_name] = TASK_UTILS.build(cfg)

    @property
    def deliveries(self) -> Dict[str, 'DistillDelivery']:
        """dict: all deliveries managed by this manager."""
        return self._deliveries

    @property
    def override_data(self) -> bool:
        """bool: indicate whether to override the data with the recorded data.
        """
        return self._override_data

    @override_data.setter
    def override_data(self, override: bool) -> None:
        """Propagate ``override`` to all managed deliveries.

        If the `override_data` of a delivery is False, the delivery will
        record the origin data. If it is True, the delivery will override
        the origin data with the recorded data.
        """
        self._override_data = override
        for delivery in self.deliveries.values():
            delivery.override_data = override

    def __enter__(self) -> None:
        """Enter the context manager: start every managed delivery."""
        for delivery in self.deliveries.values():
            delivery.__enter__()

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        """Exit the context manager: stop every managed delivery."""
        for delivery in self.deliveries.values():
            delivery.__exit__(exc_type, exc_value, traceback)
+ + If a function or method is executed more than once during the forward + of the target model, its' outputs from the source model are pushed + into the queue in order. + """ + + def __init__(self, max_keep_data: int = 1) -> None: + + self._override_data = False + self.data_queue: deque = deque([], maxlen=max_keep_data) + self.max_keep_data = max_keep_data + + @property + def override_data(self) -> bool: + """bool: indicate whether to override the data with the recorded data. + """ + return self._override_data + + @override_data.setter + def override_data(self, override: bool) -> None: + """Set the override_data property to this delivery. + + If the `override_data` of a deliver is False, the deliver will record + and keep the origin data. If the current_mode of a deliver is True, the + deliver will override the origin data with the recorded data. + """ + self._override_data = override + + @abstractmethod + def deliver_wrapper(self, origin: Callable) -> Callable: + """Wrap the specific object to make the intermediate results of the + model can be delivered.""" + + @abstractmethod + def __enter__(self) -> None: + """Enter the context manager.""" + + @abstractmethod + def __exit__(self, exc_type, exc_value, traceback) -> None: + """Exit the context manager.""" diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/delivery/function_outputs_delivery.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/delivery/function_outputs_delivery.py new file mode 100644 index 0000000000000000000000000000000000000000..15c361e3835ca4264715d819490bbb88d6667db4 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/delivery/function_outputs_delivery.py @@ -0,0 +1,158 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
import functools
from types import FunctionType
from typing import Callable

from mmengine.utils import import_modules_from_strings

from mmrazor.registry import TASK_UTILS
from .distill_delivery import DistillDelivery


@TASK_UTILS.register_module()
class FunctionOutputsDelivery(DistillDelivery):
    """Delivery for intermediate results which are ``FunctionType``'s outputs.

    Args:
        func_path (str): Dotted path of the function whose output needs to
            be delivered, e.g. ``'toy_module.toy_func'``.
        max_keep_data (int): The length limitation of the queue. Outputs from
            the source model are pushed in the queue in order.

    Notes:
        The form of `func_path` needs special attention: it must point at the
        module where the function is *used*, not where it is defined. For
        example, ``anchor_inside_flags`` is defined in
        `mmdet/core/anchor/utils.py` but used in
        `mmdet/models/dense_heads/anchor_head`, so the `func_path` should be
        `mmdet.models.dense_heads.anchor_head.anchor_inside_flags` and not
        `mmdet.core.anchor.utils.anchor_inside_flags`.

    Examples:
        >>> # Below code in toy_module.py
        >>> import random
        >>> def toy_func():
        >>>     return random.randint(0, 1000)

        >>> # Teacher and student both execute toy_func; deliver outputs
        >>> # from the teacher to the student.
        >>> import toy_module
        >>> delivery = FunctionOutputsDelivery(
        ...     max_keep_data=1, func_path='toy_module.toy_func')

        >>> delivery.override_data = False
        >>> with delivery:
        ...     output_teacher = toy_module.toy_func()

        >>> delivery.override_data = True
        >>> with delivery:
        ...     output_student = toy_module.toy_func()

        >>> output_teacher == output_student
        True
    """

    def __init__(self, func_path: str, max_keep_data: int):
        super().__init__(max_keep_data)

        self._check_valid_path(func_path)
        self.func_path = func_path

    @staticmethod
    def _check_valid_path(func_path: str) -> None:
        """Check if the `func_path` is valid."""
        if not isinstance(func_path, str):
            # Fixed error message: this guard checks for ``str`` (the dotted
            # path), not for a FunctionType object.
            raise TypeError(f'func_path should be a str '
                            f'instance, but got {type(func_path)}')

        assert len(func_path.split('.')) > 1, \
            'func_path must have at least one `.`'

    @staticmethod
    def _get_func_name(func_path: str) -> str:
        """Get the function name according to `func_path`."""
        return func_path.split('.')[-1]

    @staticmethod
    def _get_module_path(func_path: str) -> str:
        """Get the module name according to `func_path`."""
        return '.'.join(func_path.split('.')[:-1])

    def __enter__(self) -> None:
        """Enter the context manager.

        Wrap the origin function.
        """
        module_path = self._get_module_path(self.func_path)
        try:
            module = import_modules_from_strings(module_path)
        except ImportError as e:
            # Chain the original error so the real import failure is visible.
            raise ImportError(
                f'{module_path} is not imported correctly.') from e
        self.module = module

        func_name = self._get_func_name(self.func_path)
        assert hasattr(module, func_name), \
            f'{func_name} is not in {module_path}.'
        self.func_name = func_name

        origin_func = getattr(module, func_name)
        if not isinstance(origin_func, FunctionType):
            raise TypeError(f'{func_name} should be a FunctionType '
                            f'instance, but got {type(origin_func)}')
        self.origin_func = origin_func

        wrapped_func = self.deliver_wrapper(self.origin_func)
        setattr(self.module, self.func_name, wrapped_func)

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        """Exit the context manager.

        Reset the origin function.
        """
        setattr(self.module, self.func_name, self.origin_func)

        # self.module and self.origin_func can not be pickled.
        # Delete these two attributes to avoid errors when ema model is used.
        del self.module
        del self.origin_func

    def deliver_wrapper(self, origin_func: Callable) -> Callable:
        """Wrap the specific function to make the intermediate results of the
        model can be delivered."""

        @functools.wraps(origin_func)
        def wrap_func(*args, **kwargs):

            if self.override_data:
                # Replay mode: pop a recorded output instead of executing.
                assert len(self.data_queue) > 0, 'pop from an empty queue'
                outputs = self.data_queue.popleft()
            else:
                # Record mode: execute the origin function and remember its
                # output for later replay.
                assert len(self.data_queue) < self.data_queue.maxlen,\
                    'push into a full queue'
                outputs = origin_func(*args, **kwargs)
                self.data_queue.append(outputs)
            return outputs

        return wrap_func
import functools
from types import FunctionType, ModuleType
from typing import Callable

from mmengine.utils import import_modules_from_strings

from mmrazor.registry import TASK_UTILS
from .distill_delivery import DistillDelivery


@TASK_UTILS.register_module()
class MethodOutputsDelivery(DistillDelivery):
    """Delivery for intermediate results which are ``MethodType``'s outputs.

    Note:
        Different from ``FunctionType``, ``MethodType`` is the type of methods
        of class instances.

    Args:
        method_path (str): Dotted path of the method whose output needs to be
            delivered, e.g. ``'mmcls.models.utils.Augments.__call__'``. It
            must contain the module path, the class name and the method name,
            hence at least two dots.
        max_keep_data (int): The length limitation of the queue. Outputs from
            the source model are pushed in the queue in order.

    Examples:
        >>> from mmcls.models.utils import Augments

        >>> augments_cfg = dict(
        ...     type='BatchMixup', alpha=1., num_classes=10, prob=1.0)
        >>> augments = Augments(augments_cfg)
        >>> imgs = torch.randn(2, 3, 32, 32)
        >>> label = torch.randint(0, 10, (2,))

        >>> # Deliver augmentation outputs from the teacher to the student so
        >>> # both see identical augmented batches.
        >>> delivery = MethodOutputsDelivery(
        ...     max_keep_data=1,
        ...     method_path='mmcls.models.utils.Augments.__call__')

        >>> delivery.override_data = False
        >>> with delivery:
        ...     imgs_tea, label_tea = augments(imgs, label)

        >>> delivery.override_data = True
        >>> with delivery:
        ...     imgs_stu, label_stu = augments(imgs, label)

        >>> torch.equal(label_tea, label_stu)
        True
        >>> torch.equal(imgs_tea, imgs_stu)
        True
    """

    def __init__(self, method_path: str, max_keep_data: int):
        super().__init__(max_keep_data)

        self._check_valid_path(method_path)
        module_path = self._get_module_path(method_path)
        try:
            module: ModuleType = import_modules_from_strings(module_path)
        except ImportError as e:
            # Chain the original error so the real import failure is visible.
            raise ImportError(
                f'{module_path} is not imported correctly.') from e

        cls_name = self._get_cls_name(method_path)
        assert hasattr(module, cls_name), \
            f'{cls_name} is not in {module_path}.'

        imported_cls: type = getattr(module, cls_name)
        if not isinstance(imported_cls, type):
            raise TypeError(f'{cls_name} should be a type '
                            f'instance, but got {type(imported_cls)}')
        self.imported_cls = imported_cls

        method_name = self._get_method_name(method_path)
        assert hasattr(imported_cls, method_name), \
            f'{method_name} is not in {cls_name}.'
        self.method_name = method_name

        origin_method = getattr(imported_cls, method_name)
        # Before instantiation of a class, the type of a method of a class
        # is FunctionType
        if not isinstance(origin_method, FunctionType):
            raise TypeError(f'{method_name} should be a FunctionType '
                            f'instance, but got {type(origin_method)}')
        self.origin_method = origin_method

    @staticmethod
    def _check_valid_path(method_path: str) -> None:
        """Check if the `method_path` is valid."""
        if not isinstance(method_path, str):
            raise TypeError(f'method_path should be a str instance, '
                            f'but got {type(method_path)}')

        # Fixed message: the check requires module, class and method parts,
        # i.e. at least two dots (the old message said "at least one").
        assert len(method_path.split('.')) > 2, \
            'method_path must have at least two `.`'

    @staticmethod
    def _get_method_name(method_path: str) -> str:
        """Get the method name according to `method_path`."""
        return method_path.split('.')[-1]

    @staticmethod
    def _get_cls_name(method_path: str) -> str:
        """Get the class name corresponding to this method according to
        `method_path`."""
        return method_path.split('.')[-2]

    @staticmethod
    def _get_module_path(method_path: str) -> str:
        """Get the module name according to `method_path`."""
        return '.'.join(method_path.split('.')[:-2])

    def __enter__(self) -> None:
        """Enter the context manager.

        Wrap the origin method.
        """
        wrapped_method = self.deliver_wrapper(self.origin_method)
        setattr(self.imported_cls, self.method_name, wrapped_method)

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        """Exit the context manager.

        Reset the origin method.
        """
        setattr(self.imported_cls, self.method_name, self.origin_method)

    def deliver_wrapper(self, origin_method: Callable) -> Callable:
        """Wrap the specific method to make the intermediate results of the
        model can be delivered."""

        @functools.wraps(origin_method)
        def wrap_method(*args, **kwargs):

            if self.override_data:
                # Replay mode: pop a recorded output instead of executing.
                assert len(self.data_queue) > 0, 'pop from an empty queue'
                outputs = self.data_queue.popleft()
            else:
                # Record mode: execute the origin method and remember its
                # output for later replay.
                assert len(self.data_queue) < self.data_queue.maxlen,\
                    'push into a full queue'
                outputs = origin_method(*args, **kwargs)
                self.data_queue.append(outputs)
            return outputs

        return wrap_method
# Copyright (c) OpenMMLab. All rights reserved.
from collections import OrderedDict

import torch.nn as nn
from mmengine.model import BaseModel

from mmrazor.registry import TASK_UTILS
from mmrazor.utils import get_placeholder
from ...algorithms.base import BaseAlgorithm
from .demo_inputs import (BaseDemoInput, DefaultMMClsDemoInput,
                          DefaultMMDemoInput, DefaultMMDetDemoInput,
                          DefaultMMPoseDemoInput, DefaultMMRotateDemoInput,
                          DefaultMMSegDemoInput, DefaultMMYoloDemoInput)

# Optional downstream frameworks: fall back to a placeholder that raises a
# helpful error on use when the package is absent.
try:
    from mmdet.models import BaseDetector
except Exception:
    BaseDetector = get_placeholder('mmdet')

try:
    from mmcls.models import ImageClassifier
except Exception:
    ImageClassifier = get_placeholder('mmcls')

try:
    from mmseg.models import BaseSegmentor
except Exception:
    BaseSegmentor = get_placeholder('mmseg')

# New
try:
    from mmpose.models import TopdownPoseEstimator
except Exception:
    TopdownPoseEstimator = get_placeholder('mmpose')

# Ordered mapping from model base class to demo-input generator. More
# specific base classes come first because the first ``isinstance`` match
# wins; ``BaseModel``/``nn.Module`` are the generic fallbacks.
default_demo_input_class = OrderedDict([
    (BaseDetector, DefaultMMDetDemoInput),
    (ImageClassifier, DefaultMMClsDemoInput),
    (BaseSegmentor, DefaultMMSegDemoInput),
    (TopdownPoseEstimator, DefaultMMPoseDemoInput),
    (BaseModel, DefaultMMDemoInput),
    (nn.Module, BaseDemoInput),
])

# Scope-name lookup used when the caller knows which mm* repo a model
# belongs to; it takes priority over the type-based dispatch above.
default_demo_input_class_for_scope = {
    'mmcls': DefaultMMClsDemoInput,
    'mmdet': DefaultMMDetDemoInput,
    'mmseg': DefaultMMSegDemoInput,
    'mmrotate': DefaultMMRotateDemoInput,
    'mmyolo': DefaultMMYoloDemoInput,
    'mmpose': DefaultMMPoseDemoInput,
    'torchvision': BaseDemoInput,
}


def get_default_demo_input_class(model, scope):
    """Resolve the demo-input generator class for ``model`` and ``scope``.

    A known ``scope`` wins over the model's type; ``BaseDemoInput`` is the
    final fallback.
    """
    if scope is not None and scope in default_demo_input_class_for_scope:
        return default_demo_input_class_for_scope[scope]

    for model_base, demo_input_cls in default_demo_input_class.items():
        if isinstance(model, model_base):
            return demo_input_cls

    # default
    return BaseDemoInput


def defaul_demo_inputs(model, input_shape, training=False, scope=None):
    """Generate a demo input for ``model`` according to ``scope``.

    Algorithm wrappers are unwrapped first so the demo input is generated
    for the underlying architecture.
    """
    target = model
    while isinstance(target, BaseAlgorithm):
        target = target.architecture
    demo_input_cls = get_default_demo_input_class(target, scope)
    return demo_input_cls().get_data(target, input_shape, training)
+ """ + + def __init__( + self, + input_shape=None, + training=False, + scope: str = None, + kwargs={}, + ) -> None: + + default_demo_input_class = get_default_demo_input_class(None, scope) + if input_shape is None: + input_shape = default_demo_input_class.default_shape + super().__init__(input_shape, training, kwargs=kwargs) + self.scope = scope + + def _get_data(self, model, input_shape, training): + """Helper for get_data, including core logic to generate demo input.""" + return defaul_demo_inputs(model, input_shape, training, self.scope) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/demo_inputs/demo_inputs.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/demo_inputs/demo_inputs.py new file mode 100644 index 0000000000000000000000000000000000000000..e1222f2b1fbbcb8a3b06a335ca43e76b8befcce2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/demo_inputs/demo_inputs.py @@ -0,0 +1,148 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch + +from mmrazor.registry import TASK_UTILS + + +@TASK_UTILS.register_module() +class BaseDemoInput(): + """Base demo input generator. + + Args: + input_shape: Default input shape. Defaults to default_shape. + training (bool, optional): Default training mode. Defaults to None. + kwargs (dict): Other keyword args to update the generated inputs. 
+ """ + default_shape = (1, 3, 224, 224) + + def __init__(self, + input_shape=default_shape, + training=None, + kwargs={}) -> None: + + self.input_shape = input_shape + self.training = training + self.kwargs = kwargs + + def get_data(self, model, input_shape=None, training=None): + """Api to generate demo input.""" + if input_shape is None: + input_shape = self.input_shape + if training is None: + training = self.training + + data = self._get_data(model, input_shape, training) + if isinstance(data, dict): + data.update(self.kwargs) + return data + + def _get_data(self, model, input_shape, training): + """Helper for get_data, including core logic to generate demo input.""" + return torch.rand(input_shape) + + def __call__(self, + model=None, + input_shape=[1, 3, 224, 224], + training=False): + return self.get_data(model, input_shape, training) + + +@TASK_UTILS.register_module() +class DefaultMMDemoInput(BaseDemoInput): + """Default demo input generator for openmmable models.""" + + def _get_data(self, model, input_shape=None, training=None): + """Helper for get_data, including core logic to generate demo input.""" + + data = self._get_mm_data(model, input_shape, training) + data['mode'] = 'tensor' + return data + + def _get_mm_data(self, model, input_shape, training=False): + data = {'inputs': torch.rand(input_shape), 'data_samples': None} + data = model.data_preprocessor(data, training) + return data + + +@TASK_UTILS.register_module() +class DefaultMMClsDemoInput(DefaultMMDemoInput): + """Default demo input generator for mmcls models.""" + + def _get_mm_data(self, model, input_shape, training=False): + """Helper for get_data, including core logic to generate demo input.""" + from mmcls.structures import ClsDataSample + x = torch.rand(input_shape) + mm_inputs = { + 'inputs': + x, + 'data_samples': [ + ClsDataSample( + metainfo=dict(img_shape=input_shape[i], + num_classes=1000)).set_gt_label(1) + for i in range(input_shape[0]) + ], + } + mm_inputs = 
model.data_preprocessor(mm_inputs, training) + return mm_inputs + + +@TASK_UTILS.register_module() +class DefaultMMDetDemoInput(DefaultMMDemoInput): + """Default demo input generator for mmdet models.""" + + def _get_mm_data(self, model, input_shape, training=False): + """Helper for get_data, including core logic to generate demo input.""" + from mmdet.models import BaseDetector + from mmdet.testing._utils import demo_mm_inputs + assert isinstance(model, BaseDetector), f'{type(model)}' + + data = demo_mm_inputs(1, [input_shape[1:]], with_mask=True) + data = model.data_preprocessor(data, training) + return data + + +@TASK_UTILS.register_module() +class DefaultMMSegDemoInput(DefaultMMDemoInput): + """Default demo input generator for mmseg models.""" + + def _get_mm_data(self, model, input_shape, training=False): + """Helper for get_data, including core logic to generate demo input.""" + from mmseg.models import BaseSegmentor + assert isinstance(model, BaseSegmentor) + from .mmseg_demo_input import demo_mmseg_inputs + data = demo_mmseg_inputs(model, input_shape) + return data + + +@TASK_UTILS.register_module() +class DefaultMMRotateDemoInput(DefaultMMDemoInput): + """Default demo input generator for mmrotate models.""" + + def _get_mm_data(self, model, input_shape, training=False): + """Helper for get_data, including core logic to generate demo input.""" + from mmrotate.testing._utils import demo_mm_inputs + + data = demo_mm_inputs(1, [input_shape[1:]], use_box_type=True) + data = model.data_preprocessor(data, training) + return data + + +@TASK_UTILS.register_module() +class DefaultMMYoloDemoInput(DefaultMMDetDemoInput): + """Default demo input generator for mmyolo models.""" + + default_shape = (1, 3, 125, 320) + + +@TASK_UTILS.register_module() +class DefaultMMPoseDemoInput(DefaultMMDemoInput): + """Default demo input generator for mmpose models.""" + + def _get_mm_data(self, model, input_shape, training=False): + from mmpose.models import TopdownPoseEstimator + + 
from .mmpose_demo_input import demo_mmpose_inputs + assert isinstance(model, TopdownPoseEstimator), f'{type(model)}' + + data = demo_mmpose_inputs(model, input_shape) + return data diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/demo_inputs/mmpose_demo_input.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/demo_inputs/mmpose_demo_input.py new file mode 100644 index 0000000000000000000000000000000000000000..dbf5f2772decf0a082bb5e90d063cb81cde01f56 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/demo_inputs/mmpose_demo_input.py @@ -0,0 +1,119 @@ +# Copyright (c) OpenMMLab. All rights reserved. +"""Include functions to generate mmpose demo inputs. + +Modified from mmpose. +""" + +import torch +from mmpose.models.heads import (CPMHead, DSNTHead, HeatmapHead, + IntegralRegressionHead, MSPNHead, + RegressionHead, RLEHead, SimCCHead, + ViPNASHead) +from mmpose.testing._utils import get_packed_inputs + +from mmrazor.utils import get_placeholder + +try: + from mmpose.models import PoseDataPreProcessor + from mmpose.structures import PoseDataSample +except ImportError: + PoseDataPreProcessor = get_placeholder('mmpose') + PoseDataSample = get_placeholder('mmpose') + + +def demo_mmpose_inputs(model, for_training=False, batch_size=1): + input_shape = ( + 1, + 3, + ) + model.head.decoder.input_size + imgs = torch.randn(*input_shape) + + batch_data_samples = [] + from mmpose.models.heads import RTMHead + if isinstance(model.head, HeatmapHead): + batch_data_samples = get_packed_inputs( + batch_size, + num_keypoints=model.head.out_channels, + heatmap_size=model.head.decoder.heatmap_size[::-1])['data_samples'] + elif isinstance(model.head, MSPNHead): + batch_data_samples = get_packed_inputs( + batch_size=batch_size, + num_instances=1, + num_keypoints=model.head.out_channels, + heatmap_size=model.head.decoder.heatmap_size, + with_heatmap=True, + with_reg_label=False, + num_levels=model.head.num_stages * + 
model.head.num_units)['data_samples'] + elif isinstance(model.head, CPMHead): + batch_data_samples = get_packed_inputs( + batch_size=batch_size, + num_instances=1, + num_keypoints=model.head.out_channels, + heatmap_size=model.head.decoder.heatmap_size[::-1], + with_heatmap=True, + with_reg_label=False)['data_samples'] + + elif isinstance(model.head, SimCCHead): + # bug + batch_data_samples = get_packed_inputs( + batch_size, + num_keypoints=model.head.out_channels, + simcc_split_ratio=model.head.decoder.simcc_split_ratio, + input_size=model.head.decoder.input_size, + with_simcc_label=True)['data_samples'] + + elif isinstance(model.head, ViPNASHead): + batch_data_samples = get_packed_inputs( + batch_size, + num_keypoints=model.head.out_channels, + )['data_samples'] + + elif isinstance(model.head, DSNTHead): + batch_data_samples = get_packed_inputs( + batch_size, + num_keypoints=model.head.num_joints, + with_reg_label=True)['data_samples'] + + elif isinstance(model.head, IntegralRegressionHead): + batch_data_samples = get_packed_inputs( + batch_size, + num_keypoints=model.head.num_joints, + with_reg_label=True)['data_samples'] + + elif isinstance(model.head, RegressionHead): + batch_data_samples = get_packed_inputs( + batch_size, + num_keypoints=model.head.num_joints, + with_reg_label=True)['data_samples'] + + elif isinstance(model.head, RLEHead): + batch_data_samples = get_packed_inputs( + batch_size, + num_keypoints=model.head.num_joints, + with_reg_label=True)['data_samples'] + + elif isinstance(model.head, RTMHead): + batch_data_samples = get_packed_inputs( + batch_size, + num_keypoints=model.head.out_channels, + simcc_split_ratio=model.head.decoder.simcc_split_ratio, + input_size=model.head.decoder.input_size, + with_simcc_label=True)['data_samples'] + + else: + raise AssertionError(f'Head Type {type(model.head)} is Not Predefined') + + mm_inputs = { + 'inputs': torch.FloatTensor(imgs), + 'data_samples': batch_data_samples + } + + # check data preprocessor + if 
not hasattr(model, + 'data_preprocessor') or model.data_preprocessor is None: + model.data_preprocessor = PoseDataPreProcessor() + + mm_inputs = model.data_preprocessor(mm_inputs, for_training) + + return mm_inputs diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/demo_inputs/mmseg_demo_input.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/demo_inputs/mmseg_demo_input.py new file mode 100644 index 0000000000000000000000000000000000000000..49dcdf6b5222fccc3cf80b26c944f5f4112bd027 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/demo_inputs/mmseg_demo_input.py @@ -0,0 +1,81 @@ +# Copyright (c) OpenMMLab. All rights reserved. +"""Include functions to generate mmsegementation demo inputs. + +Modified from mmseg. +""" +import torch +from mmengine.structures import PixelData +from torch import nn + +from mmrazor.utils import get_placeholder + +try: + from mmseg.models import SegDataPreProcessor + from mmseg.structures import SegDataSample +except ImportError: + SegDataPreProcessor = get_placeholder('mmseg') + SegDataSample = get_placeholder('mmseg') + + +def demo_mmseg_inputs(segmentor, input_shape, for_training=False): + + if isinstance(segmentor.decode_head, nn.ModuleList): + num_classes = segmentor.decode_head[-1].num_classes + else: + num_classes = segmentor.decode_head.num_classes + # batch_size=2 for BatchNorm + mm_inputs = _demo_mmseg_inputs( + num_classes=num_classes, input_shape=input_shape) + + # convert to cuda Tensor if applicabled + # if torch.cuda.is_available(): + # segmentor = segmentor.cuda() + + # check data preprocessor + if not hasattr(segmentor, + 'data_preprocessor') or segmentor.data_preprocessor is None: + segmentor.data_preprocessor = SegDataPreProcessor() + + mm_inputs = segmentor.data_preprocessor(mm_inputs, for_training) + + return mm_inputs + + +def _demo_mmseg_inputs(input_shape=(1, 3, 8, 16), num_classes=10): + """Create a superset of inputs needed to run test or 
train batches. + + Args: + input_shape (tuple): + input batch dimensions + + num_classes (int): + number of semantic classes + """ + (N, C, H, W) = input_shape + + imgs = torch.randn(*input_shape) + segs = torch.randint( + low=0, high=num_classes - 1, size=(N, H, W), dtype=torch.long) + + img_metas = [{ + 'img_shape': (H, W), + 'ori_shape': (H, W), + 'pad_shape': (H, W, C), + 'filename': '.png', + 'scale_factor': 1.0, + 'flip': False, + 'flip_direction': 'horizontal' + } for _ in range(N)] + + data_samples = [ + SegDataSample( + gt_sem_seg=PixelData(data=segs[i]), metainfo=img_metas[i]) + for i in range(N) + ] + + mm_inputs = { + 'inputs': torch.FloatTensor(imgs), + 'data_samples': data_samples + } + + return mm_inputs diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/estimators/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/estimators/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..f1cd00f8e0bb2c5f8dee8ddee9d21c8e3a45783a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/estimators/__init__.py @@ -0,0 +1,5 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .counters import * # noqa: F401,F403 +from .resource_estimator import ResourceEstimator + +__all__ = ['ResourceEstimator'] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/estimators/base_estimator.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/estimators/base_estimator.py new file mode 100644 index 0000000000000000000000000000000000000000..1a6f69264afa09e81f99641228c606f54845bc30 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/estimators/base_estimator.py @@ -0,0 +1,51 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from abc import ABCMeta, abstractmethod +from typing import Dict, Tuple, Union + +import torch.nn + +from mmrazor.registry import TASK_UTILS + + +@TASK_UTILS.register_module() +class BaseEstimator(metaclass=ABCMeta): + """The base class of Estimator, used for estimating model infos. + + Args: + input_shape (tuple): Input data's default shape, for calculating + resources consume. Defaults to (1, 3, 224, 224). + units (dict): A dict including required units. Default to dict(). + as_strings (bool): Output FLOPs and params counts in a string + form. Default to False. + """ + + def __init__(self, + input_shape: Tuple = (1, 3, 224, 224), + units: Dict = dict(), + as_strings: bool = False): + assert len(input_shape) in [ + 3, 4, 5 + ], ('The length of input_shape must be in [3, 4, 5]. ' + f'Got `{len(input_shape)}`.') + self.input_shape = input_shape + self.units = units + self.as_strings = as_strings + + @abstractmethod + def estimate(self, + model: torch.nn.Module, + flops_params_cfg: dict = None, + latency_cfg: dict = None) -> Dict[str, Union[float, str]]: + """Estimate the resources(flops/params/latency) of the given model. + + Args: + model: The measured model. + flops_params_cfg (dict): Cfg for estimating FLOPs and parameters. + Default to None. + latency_cfg (dict): Cfg for estimating latency. Default to None. + + Returns: + Dict[str, Union[float, str]]): A dict that contains the resource + results(FLOPs, params and latency). + """ + pass diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/estimators/counters/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/estimators/counters/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..721987ec10b7f860a3ec0209a225bc69b4ae9117 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/estimators/counters/__init__.py @@ -0,0 +1,6 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from .flops_params_counter import get_model_flops_params +from .latency_counter import get_model_latency +from .op_counters import * # noqa: F401,F403 + +__all__ = ['get_model_flops_params', 'get_model_latency'] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/estimators/counters/flops_params_counter.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/estimators/counters/flops_params_counter.py new file mode 100644 index 0000000000000000000000000000000000000000..4a3e44df0eef6eeb6441447c9f7b5dab98686577 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/estimators/counters/flops_params_counter.py @@ -0,0 +1,604 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import sys +from functools import partial +from typing import Dict, List + +import mmcv +import torch +import torch.nn as nn + +from mmrazor.registry import TASK_UTILS + +no_positional_input_warned = False + + +def get_model_flops_params(model, + input_shape=(1, 3, 224, 224), + spec_modules=[], + disabled_counters=[], + print_per_layer_stat=False, + units=dict(flops='M', params='M'), + as_strings=False, + seperate_return: bool = False, + input_constructor=None, + flush=False, + ost=sys.stdout): + """Get FLOPs and parameters of a model. This method can calculate FLOPs and + parameter counts of a model with corresponding input shape. It can also + print FLOPs and params for each layer in a model. Supported layers are + listed as below: + + - Convolutions: ``nn.Conv1d``, ``nn.Conv2d``, ``nn.Conv3d``. + - Activations: ``nn.ReLU``, ``nn.PReLU``, ``nn.ELU``, ``nn.LeakyReLU``, + ``nn.ReLU6``. + - Poolings: ``nn.MaxPool1d``, ``nn.MaxPool2d``, ``nn.MaxPool3d``, + ``nn.AvgPool1d``, ``nn.AvgPool2d``, ``nn.AvgPool3d``, + ``nn.AdaptiveMaxPool1d``, ``nn.AdaptiveMaxPool2d``, + ``nn.AdaptiveMaxPool3d``, ``nn.AdaptiveAvgPool1d``, + ``nn.AdaptiveAvgPool2d``, ``nn.AdaptiveAvgPool3d``. 
+    - BatchNorms: ``nn.BatchNorm1d``,
+      ``nn.BatchNorm2d``,
+      ``nn.BatchNorm3d``.
+    - Linear: ``nn.Linear``.
+    - Deconvolution: ``nn.ConvTranspose2d``.
+    - Upsample: ``nn.Upsample``.
+
+    Args:
+        model (nn.Module): The model for complexity calculation.
+        input_shape (tuple): Input shape (including batchsize) used for
+            calculation. Default to (1, 3, 224, 224).
+        spec_modules (list): A list that contains the names of several spec
+            modules, which users want to get resources infos of them.
+            e.g., ['backbone', 'head'], ['backbone.layer1']. Default to [].
+        disabled_counters (list): One can limit which ops' spec would be
+            calculated. Default to [].
+        print_per_layer_stat (bool): Whether to print FLOPs and params
+            for each layer in a model. Default to False.
+        units (dict): A dict including converted FLOPs and params units.
+            Default to dict(flops='M', params='M').
+        as_strings (bool): Output FLOPs and params counts in a string form.
+            Default to False.
+        seperate_return (bool): Whether to return the resource information
+            separately. Default to False.
+        input_constructor (None | callable): If specified, it takes a callable
+            method that generates input. otherwise, it will generate a random
+            tensor with input shape to calculate FLOPs. Default to None.
+        flush (bool): same as that in :func:`print`. Default to False.
+        ost (stream): same as ``file`` param in :func:`print`.
+            Default to sys.stdout.
+
+    Returns:
+        tuple[float | str] | dict[str, float]: If `as_strings` is set to True,
+            it will return FLOPs and parameter counts in a string format.
+            Otherwise, it will return those in a float number format.
+        NOTE: If seperate_return, it will return a resource info dict with
+            FLOPs & params counts of each spec module in float|string format.
+ """ + assert type(input_shape) is tuple + assert len(input_shape) >= 1 + assert isinstance(model, nn.Module) + if seperate_return and not len(spec_modules): + raise AssertionError('`seperate_return` can only be set to True when ' + '`spec_modules` are not empty.') + + flops_params_model = add_flops_params_counting_methods(model) + flops_params_model.eval() + flops_params_model.start_flops_params_count(disabled_counters) + if input_constructor: + input = input_constructor(input_shape) + _ = flops_params_model(**input) + else: + try: + batch = torch.ones(()).new_empty( + tuple(input_shape), + dtype=next(flops_params_model.parameters()).dtype, + device=next(flops_params_model.parameters()).device) + except StopIteration: + # Avoid StopIteration for models which have no parameters, + # like `nn.Relu()`, `nn.AvgPool2d`, etc. + batch = torch.ones(()).new_empty(tuple(input_shape)) + + _ = flops_params_model(batch) + + flops_count, params_count = \ + flops_params_model.compute_average_flops_params_cost() + + if print_per_layer_stat: + print_model_with_flops_params( + flops_params_model, + flops_count, + params_count, + ost=ost, + flush=flush) + + if units is not None: + flops_count = params_units_convert(flops_count, units['flops']) + params_count = params_units_convert(params_count, units['params']) + + if as_strings: + flops_suffix = ' ' + units['flops'] + 'FLOPs' if units else ' FLOPs' + params_suffix = ' ' + units['params'] if units else '' + + if len(spec_modules): + flops_count, params_count = 0.0, 0.0 + module_names = [name for name, _ in flops_params_model.named_modules()] + for module in spec_modules: + assert module in module_names, \ + f'All modules in spec_modules should be in the measured ' \ + f'flops_params_model. Got module `{module}` in spec_modules.' 
+ spec_modules_resources: Dict[str, dict] = dict() + accumulate_sub_module_flops_params(flops_params_model, units=units) + for name, module in flops_params_model.named_modules(): + if name in spec_modules: + spec_modules_resources[name] = dict() + spec_modules_resources[name]['flops'] = module.__flops__ + spec_modules_resources[name]['params'] = module.__params__ + flops_count += module.__flops__ + params_count += module.__params__ + if as_strings: + spec_modules_resources[name]['flops'] = \ + str(module.__flops__) + flops_suffix + spec_modules_resources[name]['params'] = \ + str(module.__params__) + params_suffix + + flops_params_model.stop_flops_params_count() + + if seperate_return: + return spec_modules_resources + + if as_strings: + flops_string = str(flops_count) + flops_suffix + params_string = str(params_count) + params_suffix + return flops_string, params_string + + return flops_count, params_count + + +def params_units_convert(num_params, units='M', precision=3): + """Convert parameter number with units. + + Args: + num_params (float): Parameter number to be converted. + units (str | None): Converted FLOPs units. Options are None, 'M', + 'K' and ''. If set to None, it will automatically choose the most + suitable unit for Parameter number. Default to None. + precision (int): Digit number after the decimal point. Default to 2. + + Returns: + str: The converted parameter number. 
+
+    Examples:
+        >>> params_units_convert(1e9)
+        1000.0
+        >>> params_units_convert(2e5)
+        0.2
+        >>> params_units_convert(3e9, 'G')
+        3.0
+    """
+
+    if units == 'G':
+        return round(num_params / 10.**9, precision)
+    elif units == 'M':
+        return round(num_params / 10.**6, precision)
+    elif units == 'K':
+        return round(num_params / 10.**3, precision)
+    else:
+        raise ValueError(f'Unsupported units convert: {units}')
+
+
+def print_model_with_flops_params(model,
+                                  total_flops,
+                                  total_params,
+                                  units=dict(flops='M', params='M'),
+                                  precision=3,
+                                  ost=sys.stdout,
+                                  flush=False):
+    """Print a model with FLOPs and Params for each layer.
+
+    Args:
+        model (nn.Module): The model to be printed.
+        total_flops (float): Total FLOPs of the model.
+        total_params (float): Total parameter counts of the model.
+        units (dict | none): A dict including converted FLOPs & params
+            units. e.g., dict(flops='G', params='M') stands for FLOPs as 'G'
+            & params as 'M'. Default to dict(flops='M', params='M').
+        precision (int): Digit number after the decimal point. Default to 3.
+        ost (stream): same as `file` param in :func:`print`.
+            Default to sys.stdout.
+        flush (bool): same as that in :func:`print`. Default to False.
+ + Example: + >>> class ExampleModel(nn.Module): + >>> def __init__(self): + >>> super().__init__() + >>> self.conv1 = nn.Conv2d(3, 8, 3) + >>> self.conv2 = nn.Conv2d(8, 256, 3) + >>> self.conv3 = nn.Conv2d(256, 8, 3) + >>> self.avg_pool = nn.AdaptiveAvgPool2d((1, 1)) + >>> self.flatten = nn.Flatten() + >>> self.fc = nn.Linear(8, 1) + >>> def forward(self, x): + >>> x = self.conv1(x) + >>> x = self.conv2(x) + >>> x = self.conv3(x) + >>> x = self.avg_pool(x) + >>> x = self.flatten(x) + >>> x = self.fc(x) + >>> return x + >>> model = ExampleModel() + >>> x = (3, 16, 16) + to print the FLOPs and params state for each layer, you can use + >>> get_model_flops_params(model, x) + or directly use + >>> print_model_with_flops_params(model, 4579784.0, 37361) + ExampleModel( + 0.037 M, 100.000% Params, 0.005 GFLOPs, 100.000% FLOPs, + (conv1): Conv2d(0.0 M, 0.600% Params, 0.0 GFLOPs, 0.959% FLOPs, 3, 8, kernel_size=(3, 3), stride=(1, 1)) # noqa: E501 + (conv2): Conv2d(0.019 M, 50.020% Params, 0.003 GFLOPs, 58.760% FLOPs, 8, 256, kernel_size=(3, 3), stride=(1, 1)) + (conv3): Conv2d(0.018 M, 49.356% Params, 0.002 GFLOPs, 40.264% FLOPs, 256, 8, kernel_size=(3, 3), stride=(1, 1)) + (avg_pool): AdaptiveAvgPool2d(0.0 M, 0.000% Params, 0.0 GFLOPs, 0.017% FLOPs, output_size=(1, 1)) + (flatten): Flatten(0.0 M, 0.000% Params, 0.0 GFLOPs, 0.000% FLOPs, ) + (fc): Linear(0.0 M, 0.024% Params, 0.0 GFLOPs, 0.000% FLOPs, in_features=8, out_features=1, bias=True) + ) + """ + + def accumulate_params(self): + """Accumulate params by recursion.""" + if is_supported_instance(self): + return self.__params__ + else: + sum = 0 + for m in self.children(): + sum += m.accumulate_params() + return sum + + def accumulate_flops(self): + """Accumulate flops by recursion.""" + if is_supported_instance(self): + return self.__flops__ / model.__batch_counter__ + else: + sum = 0 + for m in self.children(): + sum += m.accumulate_flops() + return sum + + def flops_repr(self): + """A new extra_repr method of the 
input module.""" + accumulated_num_params = self.accumulate_params() + accumulated_flops_cost = self.accumulate_flops() + flops_string = str( + params_units_convert( + accumulated_flops_cost, units['flops'], + precision=precision)) + ' ' + units['flops'] + 'FLOPs' + params_string = str( + params_units_convert(accumulated_num_params, units['params'], + precision)) + ' M' + return ', '.join([ + params_string, + '{:.3%} Params'.format(accumulated_num_params / total_params), + flops_string, + '{:.3%} FLOPs'.format(accumulated_flops_cost / total_flops), + self.original_extra_repr() + ]) + + def add_extra_repr(m): + """Reload extra_repr method.""" + m.accumulate_flops = accumulate_flops.__get__(m) + m.accumulate_params = accumulate_params.__get__(m) + flops_extra_repr = flops_repr.__get__(m) + if m.extra_repr != flops_extra_repr: + m.original_extra_repr = m.extra_repr + m.extra_repr = flops_extra_repr + assert m.extra_repr != m.original_extra_repr + + def del_extra_repr(m): + """Recover origin extra_repr method.""" + if hasattr(m, 'original_extra_repr'): + m.extra_repr = m.original_extra_repr + del m.original_extra_repr + if hasattr(m, 'accumulate_flops'): + del m.accumulate_flops + + model.apply(add_extra_repr) + print(model, file=ost, flush=flush) + model.apply(del_extra_repr) + + +def accumulate_sub_module_flops_params(model, units=None): + """Accumulate FLOPs and params for each module in the model. Each module in + the model will have the `__flops__` and `__params__` parameters. + + Args: + model (nn.Module): The model to be accumulated. + units (tuple | none): A tuple pair including converted FLOPs & params + units. e.g., ('G', 'M') stands for FLOPs as 'G' & params as 'M'. + Default to None. 
+ """ + + def accumulate_params(module): + """Accumulate params by recursion.""" + if is_supported_instance(module): + return module.__params__ + else: + sum = 0 + for m in module.children(): + sum += accumulate_params(m) + return sum + + def accumulate_flops(module): + """Accumulate flops by recursion.""" + if is_supported_instance(module): + return module.__flops__ / model.__batch_counter__ + else: + sum = 0 + for m in module.children(): + sum += accumulate_flops(m) + return sum + + for module in model.modules(): + _flops = accumulate_flops(module) + _params = accumulate_params(module) + module.__flops__ = _flops + module.__params__ = _params + if units is not None: + module.__flops__ = params_units_convert(_flops, units['flops']) + module.__params__ = params_units_convert(_params, units['params']) + + +def get_model_parameters_number(model): + """Calculate parameter number of a model. + + Args: + model (nn.module): The model for parameter number calculation. + + Returns: + float: Parameter number of the model. + """ + num_params = sum(p.numel() for p in model.parameters() if p.requires_grad) + return num_params + + +def add_flops_params_counting_methods(net_main_module): + """Add additional methods to the existing module object. + + This is done this way so that each function has access to self object. + """ + net_main_module.start_flops_params_count = start_flops_params_count.__get__( # noqa: E501 + net_main_module) + net_main_module.stop_flops_params_count = stop_flops_params_count.__get__( + net_main_module) + net_main_module.reset_flops_params_count = reset_flops_params_count.__get__( # noqa: E501 + net_main_module) + net_main_module.compute_average_flops_params_cost = compute_average_flops_params_cost.__get__( # noqa: E501 + net_main_module) + + net_main_module.reset_flops_params_count() + + return net_main_module + + +def compute_average_flops_params_cost(self): + """Compute average FLOPs and Params cost. 
+ + A method to compute average FLOPs cost, which will be available after + `add_flops_params_counting_methods()` is called on a desired net object. + + Returns: + float: Current mean flops consumption per image. + """ + batches_count = self.__batch_counter__ + flops_sum = 0 + params_sum = 0 + for module in self.modules(): + if is_supported_instance(module): + flops_sum += module.__flops__ + params_sum += module.__params__ + return flops_sum / batches_count, params_sum + + +def start_flops_params_count(self, disabled_counters): + """Activate the computation of mean flops and params consumption per image. + + A method to activate the computation of mean flops consumption per image. + which will be available after ``add_flops_params_counting_methods()`` is + called on a desired net object. It should be called before running the + network. + """ + add_batch_counter_hook_function(self) + + def add_flops_params_counter_hook_function(module): + if is_supported_instance(module): + if hasattr(module, '__flops_params_handle__'): + return + + else: + counter_type = get_counter_type(module) + if (disabled_counters is None + or counter_type not in disabled_counters): + counter = TASK_UTILS.build( + dict(type=counter_type, _scope_='mmrazor')) + handle = module.register_forward_hook( + counter.add_count_hook) + + module.__flops_params_handle__ = handle + else: + return + + self.apply(partial(add_flops_params_counter_hook_function)) + + +def stop_flops_params_count(self): + """Stop computing the mean flops and params consumption per image. + + A method to stop computing the mean flops consumption per image, which will + be available after ``add_flops_params_counting_methods()`` is called on a + desired net object. It can be called to pause the computation whenever. + """ + remove_batch_counter_hook_function(self) + self.apply(remove_flops_params_counter_hook_function) + + +def reset_flops_params_count(self): + """Reset statistics computed so far. 
+ + A method to Reset computed statistics, which will be available after + `add_flops_params_counting_methods()` is called on a desired net object. + """ + add_batch_counter_variables_or_reset(self) + self.apply(add_flops_params_counter_variable_or_reset) + + +# ---- Internal functions +def empty_flops_params_counter_hook(module, input, output): + """Empty flops and params variables of the module.""" + module.__flops__ += 0 + module.__params__ += 0 + + +def add_batch_counter_variables_or_reset(module): + """Add or reset the batch counter variable.""" + module.__batch_counter__ = 0 + + +def add_batch_counter_hook_function(module): + """Register the batch counter hook for the module.""" + if hasattr(module, '__batch_counter_handle__'): + return + + handle = module.register_forward_hook(batch_counter_hook) + module.__batch_counter_handle__ = handle + + +def batch_counter_hook(module, input, output): + """Add batch counter variable based on the input size.""" + batch_size = 1 + if len(input) > 0: + # Can have multiple inputs, getting the first one + input = input[0] + batch_size = len(input) + else: + global no_positional_input_warned + if no_positional_input_warned: + pass + else: + print('Warning! 
No positional inputs found for a module, ' + 'assuming batch size is 1.') + no_positional_input_warned = True + module.__batch_counter__ += batch_size + + +def remove_batch_counter_hook_function(module): + """Remove batch counter handle variable.""" + if hasattr(module, '__batch_counter_handle__'): + module.__batch_counter_handle__.remove() + del module.__batch_counter_handle__ + + +def add_flops_params_counter_variable_or_reset(module): + """Add or reset flops and params variable of the module.""" + if is_supported_instance(module): + if hasattr(module, '__flops__') or hasattr(module, '__params__'): + print('Warning: variables __flops__ or __params__ are already ' + 'defined for the module' + type(module).__name__ + + ' ptflops can affect your code!') + module.__flops__ = 0 + module.__params__ = 0 + + +counter_warning_list = [] + + +def get_counter_type(module) -> str: + """Get counter type of the module based on the module class name. + + If the current module counter_type is not in TASK_UTILS._module_dict, + it will search the base classes of the module to see if it matches any + base class counter_type. + + Returns: + str: Counter type (or the base counter type) of the current module. + """ + counter_type = module.__class__.__name__ + 'Counter' + if counter_type not in TASK_UTILS._module_dict.keys(): + old_counter_type = counter_type + assert nn.Module in module.__class__.mro() + for base_cls in module.__class__.mro(): + if base_cls in get_modules_list(): + counter_type = base_cls.__name__ + 'Counter' + global counter_warning_list + if old_counter_type not in counter_warning_list: + from mmengine import MMLogger + logger = MMLogger.get_current_instance() + logger.warning(f'`{old_counter_type}` not in op_counters. 
' + f'Using `{counter_type}` instead.') + counter_warning_list.append(old_counter_type) + break + return counter_type + + +def is_supported_instance(module): + """Judge whether the module is in TASK_UTILS registry or not.""" + if get_counter_type(module) in TASK_UTILS._module_dict.keys(): + return True + return False + + +def remove_flops_params_counter_hook_function(module): + """Remove counter related variables after resource estimation.""" + if hasattr(module, '__flops_params_handle__'): + module.__flops_params_handle__.remove() + del module.__flops_params_handle__ + if hasattr(module, '__flops__'): + del module.__flops__ + if hasattr(module, '__params__'): + del module.__params__ + + +def get_modules_list() -> List: + return [ + # convolutions + nn.Conv1d, + nn.Conv2d, + nn.Conv3d, + mmcv.cnn.bricks.Conv2d, + mmcv.cnn.bricks.Conv3d, + # activations + nn.ReLU, + nn.PReLU, + nn.ELU, + nn.LeakyReLU, + nn.ReLU6, + # poolings + nn.MaxPool1d, + nn.AvgPool1d, + nn.AvgPool2d, + nn.MaxPool2d, + nn.MaxPool3d, + nn.AvgPool3d, + mmcv.cnn.bricks.MaxPool2d, + mmcv.cnn.bricks.MaxPool3d, + nn.AdaptiveMaxPool1d, + nn.AdaptiveAvgPool1d, + nn.AdaptiveMaxPool2d, + nn.AdaptiveAvgPool2d, + nn.AdaptiveMaxPool3d, + nn.AdaptiveAvgPool3d, + # normalizations + nn.BatchNorm1d, + nn.BatchNorm2d, + nn.BatchNorm3d, + nn.GroupNorm, + nn.InstanceNorm1d, + nn.InstanceNorm2d, + nn.InstanceNorm3d, + nn.LayerNorm, + # FC + nn.Linear, + mmcv.cnn.bricks.Linear, + # Upscale + nn.Upsample, + nn.UpsamplingNearest2d, + nn.UpsamplingBilinear2d, + # Deconvolution + nn.ConvTranspose2d, + mmcv.cnn.bricks.ConvTranspose2d, + ] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/estimators/counters/latency_counter.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/estimators/counters/latency_counter.py new file mode 100644 index 0000000000000000000000000000000000000000..a4241e31382b1faf6e97ff65259dcee301f372ae --- /dev/null +++ 
# Copyright (c) OpenMMLab. All rights reserved.
import logging
import time
from typing import Tuple, Union

import torch
from mmengine.logging import print_log


def get_model_latency(model: torch.nn.Module,
                      input_shape: Tuple = (1, 3, 224, 224),
                      unit: str = 'ms',
                      as_strings: bool = False,
                      max_iter: int = 100,
                      num_warmup: int = 5,
                      log_interval: int = 100,
                      repeat_num: int = 1) -> Union[float, str]:
    """Repeat speed measure for multi-times to get more precise results.

    Args:
        model (torch.nn.Module): The measured model.
        input_shape (tuple): Input shape (including batchsize) used for
            calculation. Default to (1, 3, 224, 224).
        unit (str): Unit of latency in string format. Default to 'ms'.
        as_strings (bool): Output latency counts in a string form.
            Default to False.
        max_iter (Optional[int]): Max iteration num for the measurement.
            Default to 100.
        num_warmup (Optional[int]): Iteration num for warm-up stage.
            Default to 5.
        log_interval (Optional[int]): Interval num for logging the results.
            Default to 100.
        repeat_num (Optional[int]): Num of times to repeat the measurement.
            Default to 1.

    Returns:
        latency (Union[float, str]): The measured inference speed of the model.
            if ``as_strings=True``, it will return latency in string format.
    """
    assert repeat_num >= 1

    # Each entry is the fps (img/s) of one full measurement run.
    fps_list = []

    for _ in range(repeat_num):
        fps_list.append(
            _get_model_latency(model, input_shape, max_iter, num_warmup,
                               log_interval))

    # Convert fps -> ms per image; with a single run only the first (and
    # only) measurement is used.
    latency = round(1000 / fps_list[0], 1)

    if repeat_num > 1:
        # With multiple runs, report the per-run values and use the mean
        # ms/img as the final latency.
        _fps_list = [round(fps, 1) for fps in fps_list]
        times_per_img_list = [round(1000 / fps, 1) for fps in fps_list]
        _mean_fps = sum(_fps_list) / len(_fps_list)
        mean_times_per_img = sum(times_per_img_list) / len(times_per_img_list)
        print_log(
            f'Overall fps: {_fps_list}[{_mean_fps:.1f}] img / s, '
            f'times per image: '
            f'{times_per_img_list}[{mean_times_per_img:.1f}] ms/img',
            logger='current',
            level=logging.DEBUG)
        latency = mean_times_per_img

    if as_strings:
        latency = str(latency) + ' ' + unit  # type: ignore

    return latency


def _get_model_latency(model: torch.nn.Module,
                       input_shape: Tuple = (1, 3, 224, 224),
                       max_iter: int = 100,
                       num_warmup: int = 5,
                       log_interval: int = 100) -> float:
    """Measure inference speed on GPU devices.

    Args:
        model (torch.nn.Module): The measured model.
        input_shape (tuple): Input shape (including batchsize) used for
            calculation. Default to (1, 3, 224, 224).
        max_iter (Optional[int]): Max iteration num for the measurement.
            Default to 100.
        num_warmup (Optional[int]): Iteration num for warm-up stage.
            Default to 5.
        log_interval (Optional[int]): Interval num for logging the results.
            Default to 100.

    Returns:
        fps (float): The measured inference speed of the model.
    """
    # the first several iterations may be very slow so skip them
    pure_inf_time = 0.0
    fps = 0.0
    # NOTE(review): ``data`` is initialised as a dict but is always replaced
    # by a CUDA tensor below before use — the initial value is never read.
    data = dict()
    # Device of the first parameter decides the benchmark device; CPU
    # benchmarking is explicitly unsupported.
    if next(model.parameters()).is_cuda:
        device = 'cuda'
    else:
        raise NotImplementedError('To use cpu to test latency not supported.')

    # benchmark with {max_iter} image and take the average
    # NOTE(review): ``range(1, max_iter)`` runs max_iter - 1 iterations and
    # therefore only num_warmup - 1 warm-up passes; the fps divisor
    # (i + 1 - num_warmup) still matches the number of timed samples.
    for i in range(1, max_iter):
        if device == 'cuda':
            data = torch.rand(input_shape).cuda()
            # Synchronize so that queued kernels don't pollute the timing
            # window started just below.
            torch.cuda.synchronize()
            start_time = time.perf_counter()

            with torch.no_grad():
                model(data)

            # Wait for the forward pass to actually finish on the GPU
            # before reading the clock.
            torch.cuda.synchronize()
            elapsed = time.perf_counter() - start_time

        if i >= num_warmup:
            pure_inf_time += elapsed
            if (i + 1) % log_interval == 0:
                fps = (i + 1 - num_warmup) / pure_inf_time
                print_log(
                    f'Done image [{i + 1:<3}/ {max_iter}], '
                    f'fps: {fps:.1f} img / s, '
                    f'times per image: {1000 / fps:.1f} ms / img',
                    logger='current',
                    level=logging.DEBUG)

        if (i + 1) == max_iter:
            fps = (i + 1 - num_warmup) / pure_inf_time
            print_log(
                f'Overall fps: {fps:.1f} img / s, '
                f'times per image: {1000 / fps:.1f} ms / img',
                logger='current',
                level=logging.DEBUG)
            break

    # Free the benchmark tensors so the measurement doesn't inflate the
    # caller's memory footprint.
    torch.cuda.empty_cache()

    return fps
+from .activation_layer_counter import (ELUCounter, LeakyReLUCounter, + PReLUCounter, ReLU6Counter, ReLUCounter) +from .base_counter import BaseCounter +from .conv_layer_counter import (Conv1dCounter, Conv2dCounter, Conv3dCounter, + DynamicConv2dCounter) +from .deconv_layer_counter import ConvTranspose2dCounter +from .linear_layer_counter import DynamicLinearCounter, LinearCounter +from .norm_layer_counter import (BatchNorm1dCounter, BatchNorm2dCounter, + BatchNorm3dCounter, DMCPBatchNorm2dCounter, + GroupNormCounter, InstanceNorm1dCounter, + InstanceNorm2dCounter, InstanceNorm3dCounter, + LayerNormCounter) +from .pooling_layer_counter import * # noqa: F403, F405, F401 +from .upsample_layer_counter import UpsampleCounter + +__all__ = [ + 'ReLUCounter', 'PReLUCounter', 'ELUCounter', 'LeakyReLUCounter', + 'ReLU6Counter', 'BatchNorm1dCounter', 'BatchNorm2dCounter', + 'BatchNorm3dCounter', 'Conv1dCounter', 'Conv2dCounter', 'Conv3dCounter', + 'ConvTranspose2dCounter', 'UpsampleCounter', 'LinearCounter', + 'GroupNormCounter', 'InstanceNorm1dCounter', 'InstanceNorm2dCounter', + 'InstanceNorm3dCounter', 'LayerNormCounter', 'BaseCounter', + 'DMCPBatchNorm2dCounter', 'DynamicConv2dCounter', 'DynamicLinearCounter' +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/estimators/counters/op_counters/activation_layer_counter.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/estimators/counters/op_counters/activation_layer_counter.py new file mode 100644 index 0000000000000000000000000000000000000000..e32aa552674ed968f5203d38a43fd5f561a724fc --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/estimators/counters/op_counters/activation_layer_counter.py @@ -0,0 +1,40 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
# Copyright (c) OpenMMLab. All rights reserved.
from mmrazor.registry import TASK_UTILS
from ..flops_params_counter import get_model_parameters_number
from .base_counter import BaseCounter


@TASK_UTILS.register_module()
class ReLUCounter(BaseCounter):
    """FLOPs/params counter for ReLU series activate function."""

    @staticmethod
    def add_count_hook(module, input, output):
        """Accumulate FLOPs (one op per output element) and the module's
        parameter count."""
        num_activated = output.numel()
        module.__flops__ += int(num_activated)
        module.__params__ += get_model_parameters_number(module)


@TASK_UTILS.register_module()
class PReLUCounter(ReLUCounter):
    """FLOPs/params counter for PReLU function."""


@TASK_UTILS.register_module()
class ELUCounter(ReLUCounter):
    """FLOPs/params counter for ELU function."""


@TASK_UTILS.register_module()
class LeakyReLUCounter(ReLUCounter):
    """FLOPs/params counter for LeakyReLU function."""


@TASK_UTILS.register_module()
class ReLU6Counter(ReLUCounter):
    """FLOPs/params counter for ReLU6 function."""
# Copyright (c) OpenMMLab. All rights reserved.
from abc import ABCMeta, abstractmethod


class BaseCounter(object, metaclass=ABCMeta):
    """Base class of all op module counters in `TASK_UTILS`.

    In ResourceEstimator, `XXModuleCounter` is responsible for `XXModule`,
    which refers to estimator/flops_params_counter.py::get_counter_type().
    Users can customize a `ModuleACounter` and overwrite the `add_count_hook`
    method with a self-defined module `ModuleA`.
    """

    def __init__(self) -> None:
        pass

    # NOTE: ``add_count_hook`` is a *static* method, so the correct abstract
    # marker is ``@abstractmethod`` under ``@staticmethod``. The previous
    # ``abc.abstractclassmethod`` has been deprecated since Python 3.3 and
    # wrongly implied classmethod semantics; abstract enforcement is
    # unchanged (instantiating without an override raises TypeError).
    @staticmethod
    @abstractmethod
    def add_count_hook(module, input, output):
        """The main method of a `BaseCounter` which defines the way to
        calculate resources(flops/params) of the current module.

        Args:
            module (nn.Module): the module to be tested.
            input (_type_): input_tensor. Plz refer to `torch forward_hook`
            output (_type_): output_tensor. Plz refer to `torch forward_hook`
        """
        pass
# Copyright (c) OpenMMLab. All rights reserved.
import numpy as np

from mmrazor.models.architectures.dynamic_ops import DynamicConv2d
from mmrazor.registry import TASK_UTILS
from .base_counter import BaseCounter


class ConvCounter(BaseCounter):
    """FLOPs/params counter for Conv module series."""

    @staticmethod
    def add_count_hook(module, input, output):
        """Calculate FLOPs and params based on the size of input & output."""
        # Can have multiple inputs, getting the first one
        input = input[0]

        batch_size = input.shape[0]
        output_dims = list(output.shape[2:])

        kernel_dims = list(module.kernel_size)
        in_channels = module.in_channels
        out_channels = module.out_channels
        groups = module.groups

        filters_per_channel = out_channels / groups
        # MACs per output position: kernel volume * input channels *
        # filters handled per channel (accounts for grouped convs).
        conv_per_position_flops = int(
            np.prod(kernel_dims)) * in_channels * filters_per_channel

        active_elements_count = batch_size * int(np.prod(output_dims))

        overall_conv_flops = conv_per_position_flops * active_elements_count
        # Weight params equal the per-position MAC count; bias adds one
        # param (and one FLOP per output element) per output channel.
        # (A duplicated re-assignment of ``overall_params`` was removed.)
        overall_params = conv_per_position_flops

        bias_flops = 0
        if module.bias is not None:
            bias_flops = out_channels * active_elements_count
            overall_params += out_channels

        overall_flops = overall_conv_flops + bias_flops

        module.__flops__ += overall_flops
        module.__params__ += int(overall_params)


@TASK_UTILS.register_module()
class Conv1dCounter(ConvCounter):
    """FLOPs/params counter for Conv1d module."""
    pass


@TASK_UTILS.register_module()
class Conv2dCounter(ConvCounter):
    """FLOPs/params counter for Conv2d module."""
    pass


@TASK_UTILS.register_module()
class Conv3dCounter(ConvCounter):
    """FLOPs/params counter for Conv3d module."""
    pass


@TASK_UTILS.register_module()
class DynamicConv2dCounter(ConvCounter):
    """FLOPs/params counter for DynamicConv2d, respecting the currently
    activated (mutable) in/out channel numbers."""

    @staticmethod
    def add_count_hook(module: DynamicConv2d, input, output):
        """Calculate FLOPs and params based on the dynamic channels of conv
        layers."""
        input = input[0]

        batch_size = input.shape[0]
        output_dims = list(output.shape[2:])

        kernel_dims = list(module.kernel_size)

        # Prefer the activated channel counts exposed by the mutable attrs;
        # fall back to the static module attributes otherwise.
        if 'out_channels' in module.mutable_attrs:
            out_channels = module.mutable_attrs[
                'out_channels'].activated_channels
            mutable_channel = list(
                module.mutable_attrs['out_channels'].mutable_channels.values())
            if len(mutable_channel) > 0 and hasattr(
                    mutable_channel[0], 'activated_tensor_channels'):
                out_channels = mutable_channel[0].activated_tensor_channels
        else:
            out_channels = module.out_channels
        if 'in_channels' in module.mutable_attrs:
            in_channels = module.mutable_attrs[
                'in_channels'].activated_channels
        else:
            in_channels = module.in_channels

        groups = module.groups

        filters_per_channel = out_channels / groups
        conv_per_position_flops = \
            np.prod(kernel_dims) * in_channels * filters_per_channel

        active_elements_count = batch_size * int(np.prod(output_dims))

        overall_conv_flops = conv_per_position_flops * active_elements_count
        # (Duplicated ``overall_params`` assignment removed here as well.)
        overall_params = conv_per_position_flops

        bias_flops = 0
        if module.bias is not None:
            bias_flops = out_channels * active_elements_count
            overall_params += out_channels

        overall_flops = overall_conv_flops + bias_flops

        module.__flops__ += overall_flops
        module.__params__ += int(overall_params)
# Copyright (c) OpenMMLab. All rights reserved.
from mmrazor.registry import TASK_UTILS
from ..flops_params_counter import get_model_parameters_number
from .base_counter import BaseCounter


@TASK_UTILS.register_module()
class ConvTranspose2dCounter(BaseCounter):
    """FLOPs/params counter for Deconv module series."""

    @staticmethod
    def add_count_hook(module, input, output):
        """Compute FLOPs and params based on the size of input & output."""
        # Can have multiple inputs, getting the first one
        input = input[0]

        batch_size = input.shape[0]
        input_height, input_width = input.shape[2:]

        # TODO: use more common representation
        kernel_height, kernel_width = module.kernel_size
        in_channels = module.in_channels
        out_channels = module.out_channels
        groups = module.groups

        filters_per_channel = out_channels // groups
        conv_per_position_flops = (
            kernel_height * kernel_width * in_channels * filters_per_channel)

        # For a transposed conv each *input* position contributes one
        # kernel application, hence input spatial size is used here.
        active_elements_count = batch_size * input_height * input_width
        overall_conv_flops = conv_per_position_flops * active_elements_count
        bias_flops = 0
        if module.bias is not None:
            output_height, output_width = output.shape[2:]
            # BUGFIX: previously multiplied output_height by itself; the
            # bias adds one FLOP per *output* element, i.e. H * W.
            bias_flops = out_channels * batch_size * output_height * output_width  # noqa: E501
        overall_flops = overall_conv_flops + bias_flops

        module.__flops__ += int(overall_flops)
        module.__params__ += get_model_parameters_number(module)
# Copyright (c) OpenMMLab. All rights reserved.
import numpy as np

from mmrazor.registry import TASK_UTILS
from ..flops_params_counter import get_model_parameters_number
from .base_counter import BaseCounter


@TASK_UTILS.register_module()
class LinearCounter(BaseCounter):
    """FLOPs/params counter for Linear operation series."""

    @staticmethod
    def add_count_hook(module, input, output):
        """Accumulate FLOPs and params from the input/output sizes."""
        first_input = input[0]
        # pytorch checks dimensions, so here we don't care much
        out_features = output.shape[-1]
        module.__flops__ += int(np.prod(first_input.shape) * out_features)
        module.__params__ += get_model_parameters_number(module)


@TASK_UTILS.register_module()
class DynamicLinearCounter(LinearCounter):
    """FLOPs/params counter for DynamicLinear, sharing LinearCounter's
    hook."""
# Copyright (c) OpenMMLab. All rights reserved.
import numpy as np

from mmrazor.registry import TASK_UTILS
from ..flops_params_counter import get_model_parameters_number
from .base_counter import BaseCounter


class BNCounter(BaseCounter):
    """FLOPs/params counter for BatchNormalization series."""

    @staticmethod
    def add_count_hook(module, input, output):
        """Accumulate FLOPs and params from the input size; affine layers
        cost twice as much (scale and shift)."""
        tensor = input[0]
        norm_flops = np.prod(tensor.shape)
        if getattr(module, 'affine', False):
            norm_flops *= 2
        module.__flops__ += int(norm_flops)
        module.__params__ += get_model_parameters_number(module)


@TASK_UTILS.register_module()
class BatchNorm1dCounter(BNCounter):
    """FLOPs/params counter for BatchNorm1d module."""


@TASK_UTILS.register_module()
class BatchNorm2dCounter(BNCounter):
    """FLOPs/params counter for BatchNorm2d module."""


@TASK_UTILS.register_module()
class BatchNorm3dCounter(BNCounter):
    """FLOPs/params counter for BatchNorm3d module."""


@TASK_UTILS.register_module()
class InstanceNorm1dCounter(BNCounter):
    """FLOPs/params counter for InstanceNorm1d module."""


@TASK_UTILS.register_module()
class InstanceNorm2dCounter(BNCounter):
    """FLOPs/params counter for InstanceNorm2d module."""


@TASK_UTILS.register_module()
class InstanceNorm3dCounter(BNCounter):
    """FLOPs/params counter for InstanceNorm3d module."""


@TASK_UTILS.register_module()
class LayerNormCounter(BNCounter):
    """FLOPs/params counter for LayerNorm module."""


@TASK_UTILS.register_module()
class GroupNormCounter(BNCounter):
    """FLOPs/params counter for GroupNorm module."""


@TASK_UTILS.register_module()
class DMCPBatchNorm2dCounter(BNCounter):
    """FLOPs/params counter for DynamicBatchNorm2d module."""

    @staticmethod
    def add_count_hook(module, input, output):
        """Accumulate FLOPs and params using the activated (mutable)
        channel count instead of the static one."""
        tensor = input[0]
        batch, channels, height, width = tensor.shape

        # Prefer the activated tensor channels exposed by the mutable
        # attribute when available.
        channel_mutables = list(
            module.mutable_attrs['num_features'].mutable_channels.values())
        if hasattr(channel_mutables[0], 'activated_tensor_channels'):
            channels = channel_mutables[0].activated_tensor_channels

        norm_flops = batch * channels * height * width
        if getattr(module, 'affine', False):
            norm_flops *= 2
        activated = module.mutable_attrs['num_features'].activated_channels
        module.__flops__ += norm_flops
        # Affine weight + bias: two params per activated feature.
        module.__params__ += activated * 2
# Copyright (c) OpenMMLab. All rights reserved.
import numpy as np

from mmrazor.registry import TASK_UTILS
from ..flops_params_counter import get_model_parameters_number
from .base_counter import BaseCounter


class PoolCounter(BaseCounter):
    """FLOPs/params counter for Pooling series."""

    @staticmethod
    def add_count_hook(module, input, output):
        """Accumulate FLOPs (one op per input element) and params."""
        tensor = input[0]
        module.__flops__ += int(np.prod(tensor.shape))
        module.__params__ += get_model_parameters_number(module)


@TASK_UTILS.register_module()
class MaxPool1dCounter(PoolCounter):
    """FLOPs/params counter for MaxPool1d module."""


@TASK_UTILS.register_module()
class MaxPool2dCounter(PoolCounter):
    """FLOPs/params counter for MaxPool2d module."""


@TASK_UTILS.register_module()
class MaxPool3dCounter(PoolCounter):
    """FLOPs/params counter for MaxPool3d module."""


@TASK_UTILS.register_module()
class AvgPool1dCounter(PoolCounter):
    """FLOPs/params counter for AvgPool1d module."""


@TASK_UTILS.register_module()
class AvgPool2dCounter(PoolCounter):
    """FLOPs/params counter for AvgPool2d module."""


@TASK_UTILS.register_module()
class AvgPool3dCounter(PoolCounter):
    """FLOPs/params counter for AvgPool3d module."""


@TASK_UTILS.register_module()
class AdaptiveMaxPool1dCounter(PoolCounter):
    """FLOPs/params counter for AdaptiveMaxPool1d module."""


@TASK_UTILS.register_module()
class AdaptiveMaxPool2dCounter(PoolCounter):
    """FLOPs/params counter for AdaptiveMaxPool2d module."""


@TASK_UTILS.register_module()
class AdaptiveMaxPool3dCounter(PoolCounter):
    """FLOPs/params counter for AdaptiveMaxPool3d module."""


@TASK_UTILS.register_module()
class AdaptiveAvgPool1dCounter(PoolCounter):
    """FLOPs/params counter for AdaptiveAvgPool1d module."""


@TASK_UTILS.register_module()
class AdaptiveAvgPool2dCounter(PoolCounter):
    """FLOPs/params counter for AdaptiveAvgPool2d module."""


@TASK_UTILS.register_module()
class AdaptiveAvgPool3dCounter(PoolCounter):
    """FLOPs/params counter for AdaptiveAvgPool3d module."""
# Copyright (c) OpenMMLab. All rights reserved.
from mmrazor.registry import TASK_UTILS
from ..flops_params_counter import get_model_parameters_number
from .base_counter import BaseCounter


@TASK_UTILS.register_module()
class UpsampleCounter(BaseCounter):
    """FLOPs/params counter for Upsample function."""

    @staticmethod
    def add_count_hook(module, input, output):
        """Calculate FLOPs and params based on the size of input & output."""
        # NOTE(review): ``output[0]`` indexes the first element of the
        # output tensor, dropping the batch dimension, so ``batch_size``
        # below is actually the channel count. This matches the ptflops-
        # style counters this code derives from — confirm before changing,
        # as downstream FLOPs baselines depend on it.
        output_size = output[0]
        batch_size = output_size.shape[0]
        output_elements_count = batch_size
        for val in output_size.shape[1:]:
            output_elements_count *= val
        # One FLOP per counted output element.
        module.__flops__ += int(output_elements_count)
        module.__params__ += get_model_parameters_number(module)
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Dict, Optional, Tuple, Union

import torch.nn

from mmrazor.registry import TASK_UTILS
from .base_estimator import BaseEstimator
from .counters import get_model_flops_params, get_model_latency


@TASK_UTILS.register_module()
class ResourceEstimator(BaseEstimator):
    """Estimator for calculating the resources consume.

    Args:
        input_shape (tuple): Input data's default shape, for calculating
            resources consume. Defaults to (1, 3, 224, 224).
        units (dict): Dict that contains converted FLOPs/params/latency units.
            Default to dict(flops='M', params='M', latency='ms').
        as_strings (bool): Output FLOPs/params/latency counts in a string
            form. Default to False.
        flops_params_cfg (dict): Cfg for estimating FLOPs and parameters.
            Default to None.
        latency_cfg (dict): Cfg for estimating latency. Default to None.

    Examples:
        >>> # direct calculate resource consume of nn.Conv2d
        >>> conv2d = nn.Conv2d(3, 32, 3)
        >>> estimator = ResourceEstimator(input_shape=(1, 3, 64, 64))
        >>> estimator.estimate(model=conv2d)
        {'flops': 3.444, 'params': 0.001, 'latency': 0.0}

        >>> # calculate resources of custom modules with disable_counters
        >>> flops_params_cfg = dict(input_shape=(1, 3, 64, 64),
        ...                         disabled_counters=['CustomModuleCounter'])
        >>> estimator.estimate(model=model, flops_params_cfg)
        {'flops': 0.0, 'params': 0.0, 'latency': 0.0}

        >>> # calculate resources of mmrazor.models
        NOTE: check 'EstimateResourcesHook' in
            mmrazor.engine.hooks.estimate_resources_hook for details.
    """

    def __init__(
        self,
        input_shape: Tuple = (1, 3, 224, 224),
        units: Dict = dict(flops='M', params='M', latency='ms'),
        as_strings: bool = False,
        flops_params_cfg: Optional[dict] = None,
        latency_cfg: Optional[dict] = None,
    ):
        super().__init__(input_shape, units, as_strings)
        if not isinstance(units, dict):
            raise TypeError('units for estimator should be a dict',
                            f'but got `{type(units)}`')
        for unit_key in units:
            if unit_key not in ['flops', 'params', 'latency']:
                raise KeyError(f'Got invalid key `{unit_key}` in units. ',
                               'Should be `flops`, `params` or `latency`.')
        if flops_params_cfg:
            self.flops_params_cfg = flops_params_cfg
        else:
            self.flops_params_cfg = dict()
        self.latency_cfg = latency_cfg if latency_cfg else dict()

    def estimate(self,
                 model: torch.nn.Module,
                 flops_params_cfg: Optional[dict] = None,
                 latency_cfg: Optional[dict] = None
                 ) -> Dict[str, Union[float, str]]:
        """Estimate the resources(flops/params/latency) of the given model.

        This method will first parse the merged :attr:`self.flops_params_cfg`
        and the :attr:`self.latency_cfg` to check whether the keys are valid.

        Args:
            model: The measured model.
            flops_params_cfg (dict): Cfg for estimating FLOPs and parameters.
                Default to None.
            latency_cfg (dict): Cfg for estimating latency. Default to None.

            NOTE: If the `flops_params_cfg` and `latency_cfg` are both None,
            this method will only estimate FLOPs/params with default settings.

        Returns:
            Dict[str, Union[float, str]]): A dict that contains the resource
                results(FLOPs, params and latency).
        """
        resource_metrics = dict()
        # Latency is only measured when an explicit latency cfg is given.
        measure_latency = bool(latency_cfg)

        if flops_params_cfg:
            flops_params_cfg = {**self.flops_params_cfg, **flops_params_cfg}
            self._check_flops_params_cfg(flops_params_cfg)
            flops_params_cfg = self._set_default_resource_params(
                flops_params_cfg)
        else:
            flops_params_cfg = self.flops_params_cfg

        if latency_cfg:
            latency_cfg = {**self.latency_cfg, **latency_cfg}
            self._check_latency_cfg(latency_cfg)
            latency_cfg = self._set_default_resource_params(latency_cfg)
        else:
            latency_cfg = self.latency_cfg

        model.eval()
        flops, params = get_model_flops_params(model, **flops_params_cfg)
        if measure_latency:
            latency = get_model_latency(model, **latency_cfg)
        else:
            latency = '0.0 ms' if self.as_strings else 0.0  # type: ignore

        resource_metrics.update({
            'flops': flops,
            'params': params,
            'latency': latency
        })
        return resource_metrics

    def estimate_separation_modules(
            self,
            model: torch.nn.Module,
            flops_params_cfg: Optional[dict] = None
    ) -> Dict[str, Union[float, str]]:
        """Estimate FLOPs and params of the spec modules with separate return.

        Args:
            model: The measured model.
            flops_params_cfg (dict): Cfg for estimating FLOPs and parameters.
                Default to None.

        Returns:
            Dict[str, Union[float, str]]): A dict that contains the FLOPs and
                params results (string | float format) of each modules in the
                ``flops_params_cfg['spec_modules']``.
        """
        if flops_params_cfg:
            flops_params_cfg = {**self.flops_params_cfg, **flops_params_cfg}
            self._check_flops_params_cfg(flops_params_cfg)
            flops_params_cfg = self._set_default_resource_params(
                flops_params_cfg)
        else:
            # BUGFIX: copy instead of aliasing self.flops_params_cfg --
            # otherwise the ``seperate_return`` key below is permanently
            # injected into the estimator's default cfg and leaks into
            # subsequent ``estimate()`` calls.
            flops_params_cfg = self.flops_params_cfg.copy()
        flops_params_cfg['seperate_return'] = True

        assert len(flops_params_cfg['spec_modules']), (
            'spec_modules can not be empty when calling '
            f'`estimate_separation_modules` of {self.__class__.__name__} ')

        model.eval()
        spec_modules_resources = get_model_flops_params(
            model, **flops_params_cfg)
        return spec_modules_resources

    def _check_flops_params_cfg(self, flops_params_cfg: dict) -> None:
        """Check the legality of ``flops_params_cfg``.

        Args:
            flops_params_cfg (dict): Cfg for estimating FLOPs and parameters.
        """
        for key in flops_params_cfg:
            if key not in get_model_flops_params.__code__.co_varnames[
                    1:]:  # type: ignore
                raise KeyError(f'Got invalid key `{key}` in flops_params_cfg.')

    def _check_latency_cfg(self, latency_cfg: dict) -> None:
        """Check the legality of ``latency_cfg``.

        Args:
            latency_cfg (dict): Cfg for estimating latency.
        """
        for key in latency_cfg:
            if key not in get_model_latency.__code__.co_varnames[
                    1:]:  # type: ignore
                raise KeyError(f'Got invalid key `{key}` in latency_cfg.')

    def _set_default_resource_params(self, cfg: dict) -> dict:
        """Set default attributes for the input cfgs.

        Args:
            cfg (dict): flops_params_cfg or latency_cfg.
        """
        default_common_settings = ['input_shape', 'units', 'as_strings']
        for key in default_common_settings:
            if key not in cfg:
                cfg[key] = getattr(self, key)
        return cfg
# Copyright (c) OpenMMLab. All rights reserved.
from joblib import dump, load


class BaseHandler:
    """Base class for a handler.

    Note:
        The handler works through a specific machine leanring algorithm,
        and is designed for predicting the evaluation metric of a model.
    """

    def __init__(self) -> None:
        pass

    def fit(self, train_data, train_label):
        """Training the model of handler."""
        pass

    def predict(self, test_data):
        """Predicting the metric using the model of handler."""
        pass

    def load(self, path):
        """Load pretrained weights for the handler."""
        self.model = load(path)

    def save(self, path):
        """Save the handler and return saved path for diff suffix."""
        suffix = f'_{type(self).__name__}.joblib'.lower()
        path = path + suffix
        dump(self.model, path)
        return path
@TASK_UTILS.register_module()
class CartsHandler(BaseHandler):
    """Classification and Regression Tree (CART) metric handler.

    Fits an ensemble of decision trees, each trained on a shuffled copy of
    the data restricted to a random subset of features, and predicts by
    averaging the per-tree outputs.

    Args:
        num_trees (int): number of regression trees. Defaults to 1000.
    """

    def __init__(self, num_trees=1000):
        # Ensemble size; the model itself is built lazily in ``fit``.
        self.num_trees = num_trees

    def fit(self, train_data: np.array, train_label: np.array) -> None:
        """Define the model of handler.

        Args:
            train_data (numpy.array): input data for training.
            train_label (numpy.array): input label for training.
        """
        # ``model`` is a 2-element list: [trees, per-tree feature subsets].
        self.model = self._make_decision_trees(train_data, train_label,
                                               self.num_trees)

    def predict(self, test_data: np.array) -> np.array:
        """Predict the evaluation metric of the model.

        Args:
            test_data (numpy.array): input data for testing, shape
                (num_samples, num_features) — TODO confirm with callers.

        Returns:
            numpy.array: predicted metric, shape (num_samples, 1).
        """
        trees, features = self.model[0], self.model[1]
        test_num, num_trees = len(test_data), len(trees)

        predict_labels = np.zeros((test_num, 1))
        for i in range(test_num):
            this_test_data = test_data[i, :]
            predict_this_list = np.zeros(num_trees)

            # Each tree only sees the feature subset it was trained on.
            for j, (tree, feature) in enumerate(zip(trees, features)):
                predict_this_list[j] = tree.predict([this_test_data[feature]
                                                     ])[0]

            # NOTE(review): sorting/reversing does not change the mean;
            # kept for parity with the reference implementation.
            predict_this_list = np.sort(predict_this_list)
            predict_this_list = predict_this_list[::-1]
            this_predict = np.mean(predict_this_list)
            predict_labels[i, 0] = this_predict

        return predict_labels

    @staticmethod
    def _make_decision_trees(train_data: np.array, train_label: np.array,
                             num_trees: int) -> List[list]:
        """Construct the decision trees.

        Args:
            train_data (numpy.array): input data for training.
            train_label (numpy.array): input label for training.
            num_trees (int): num of decision trees.

        Returns:
            List[list]: [list of fitted trees, list of feature-index arrays],
            index-aligned so tree ``i`` uses feature subset ``i``.
        """
        feature_record = []
        tree_record = []

        for _ in range(num_trees):
            # Reshuffle samples in place for every tree (bagging-like).
            sample_idx = np.arange(train_data.shape[0])
            np.random.shuffle(sample_idx)
            train_data = train_data[sample_idx, :]
            train_label = train_label[sample_idx]

            # Pick a random-sized random subset of features for this tree.
            feature_idx = np.arange(train_data.shape[1])
            np.random.shuffle(feature_idx)
            n_feature = np.random.randint(1, train_data.shape[1] + 1)
            selected_feature_ids = feature_idx[0:n_feature]
            feature_record.append(selected_feature_ids)

            dt = DecisionTreeRegressor()
            dt.fit(train_data[:, selected_feature_ids], train_label)
            tree_record.append(dt)

        return [tree_record, feature_record]
def get_pydacefit_func():
    """Build a function map from pydacefit.

    Returns:
        tuple(dict, dict): (REGR, CORR) mapping kernel names to the
        corresponding pydacefit regression / correlation functions.
        Imported lazily so the module can load without pydacefit.
    """
    from pydacefit.corr import (corr_cubic, corr_exp, corr_expg, corr_gauss,
                                corr_spherical, corr_spline)
    from pydacefit.dace import regr_linear, regr_quadratic
    from pydacefit.regr import regr_constant

    REGR = {
        'linear': regr_linear,
        'constant': regr_constant,
        'quadratic': regr_quadratic
    }

    CORR = {
        'gauss': corr_gauss,
        'cubic': corr_cubic,
        'exp': corr_exp,
        'expg': corr_expg,
        'spline': corr_spline,
        'spherical': corr_spherical
    }

    return REGR, CORR


class DACE_with_smooth(DACE):
    """Gaussian-process (Kriging) model that standardizes inputs/outputs
    before fitting the underlying pydacefit ``DACE`` model.
    """

    def __init__(self,
                 regr,
                 corr,
                 theta: float = 1.0,
                 thetaL: float = 0.0,
                 thetaU: float = 100.0):
        super(DACE_with_smooth, self).__init__(regr, corr, theta, thetaL,
                                               thetaU)

    def fit(self, X, Y):
        """Build the model.

        Standardizes X and Y to zero mean / unit variance, then either runs
        DACE's hyper-parameter search (``boxmin``) when theta bounds are set,
        or a single fit at the fixed theta otherwise.
        """
        # Targets must be 2-D for the DACE machinery.
        if len(Y.shape) == 1:
            Y = Y[:, None]

        if X.shape[0] != Y.shape[0]:
            raise Exception('X and Y must have the same number of rows.')

        # +1e-6 guards against division by zero for constant columns.
        mX, sX = np.mean(X, axis=0), np.std(X, axis=0, ddof=1) + 1e-6
        mY, sY = np.mean(Y, axis=0), np.std(Y, axis=0, ddof=1) + 1e-6

        nX = (X - mX) / sX
        nY = (Y - mY) / sY

        if self.tl is not None and self.tu is not None:
            # Bounds given: search theta via boxmin; best result replaces
            # the temporary model dict.
            self.model = {'nX': nX, 'nY': nY}
            self.boxmin()
            self.model = self.itpar['best']
        else:
            self.model = fit(nX, nY, self.regr, self.kernel, self.theta)

        # Stash the normalization statistics so ``predict`` can de-normalize.
        self.model = {
            **self.model, 'mX': mX,
            'sX': sX,
            'mY': mY,
            'sY': sY,
            'nX': nX,
            'nY': nY
        }
        # NOTE(review): rescales the fitted noise variance back to the
        # original Y scale — assumes pydacefit stores it under '_sigma2';
        # verify against the installed pydacefit version.
        self.model['sigma2'] = np.square(sY) @ self.model['_sigma2']
@TASK_UTILS.register_module()
class GaussProcessHandler(BaseHandler):
    """GaussProcess handler of the metric predictor.

    It uses Gaussian Process (Kriging) to predict the metric of a
    trained model.

    Args:
        regr (str): regression kernel for GP model. Defaults to 'linear'.
        corr (str): correlation kernel for GP model. Defaults to 'gauss'.

    Raises:
        ValueError: if ``regr`` or ``corr`` is not a supported kernel name.
    """

    def __init__(self, regr: str = 'linear', corr: str = 'gauss'):
        REGR, CORR = get_pydacefit_func()
        # FIX: the original used ``assert cond, ValueError(...)``, which
        # raises AssertionError (the ValueError instance was only the assert
        # message, never raised) and is removed entirely under ``python -O``.
        # Validate explicitly instead.
        if regr not in REGR:
            raise ValueError(f'`regr` should be in `REGR`. Got `{regr}`.')
        if corr not in CORR:
            raise ValueError(f'`corr` should be in `CORR`. Got `{corr}`.')
        self.regr = REGR[regr]
        self.corr = CORR[corr]

        # Non-None thetaL/thetaU bounds make DACE run its internal
        # hyper-parameter search (boxmin) during ``fit``.
        self.model = DACE_with_smooth(
            regr=self.regr,
            corr=self.corr,
            theta=1.0,
            thetaL=0.00001,
            thetaU=100)

    def fit(self, train_data: np.array, train_label: np.array) -> None:
        """Training the model of handler.

        Args:
            train_data (numpy.array): input data for training.
            train_label (numpy.array): input label for training.
        """
        self.model.fit(train_data, train_label)

    def predict(self, test_data: np.array) -> np.array:
        """Predict the evaluation metric of the model.

        Args:
            test_data (numpy.array): input data for testing.

        Returns:
            numpy.array: predicted metric.
        """
        return self.model.predict(test_data)
class MLP(BaseModule):
    """MLP implemented with nn.Linear.

    Operates on the last dimension only (all layers are ``nn.Linear``):
    input of shape ``(..., in_features)`` maps to ``(..., out_features)``.
    (The previous docstring's claim of ``[B, C, H, W]`` tensors was
    inaccurate for a linear stack.)

    Args:
        in_features (int): Dimension of input features. Defaults to 78.
        hidden_features (int): Dimension of hidden features.
            Defaults to 300.
        out_features (int): Dimension of output features. Defaults to 1.
        num_hidden_layers (int): Number of intermediate
            Linear+activation blocks between ``fc1`` and ``fc2``.
            Defaults to 2.
        act_cfg (dict): The config dict for activation between pointwise
            convolution. Defaults to ``dict(type='ReLU')``.
        drop (float): Dropout rate. Defaults to 0.0.
    """

    def __init__(self,
                 in_features: int = 78,
                 hidden_features: int = 300,
                 out_features: int = 1,
                 num_hidden_layers: int = 2,
                 act_cfg: Dict = dict(type='ReLU'),
                 drop: float = 0.):
        super().__init__()
        # Fall back to in_features when 0/None is passed explicitly.
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features

        self.fc1 = nn.Linear(in_features, hidden_features)
        self.act = build_activation_layer(act_cfg)

        hidden_layers = []
        for _ in range(num_hidden_layers):
            hidden_layers.append(nn.Linear(hidden_features, hidden_features))
            hidden_layers.append(build_activation_layer(act_cfg))
        self.hidden_layers = nn.Sequential(*hidden_layers)

        self.fc2 = nn.Linear(hidden_features, out_features)
        # Dropout is applied once, just before the output projection.
        self.drop = nn.Dropout(drop)
        self.init_weights()

    def forward(self, x):
        """Forward: fc1 -> act -> hidden blocks -> dropout -> fc2."""
        x = self.fc1(x)
        x = self.act(x)
        x = self.hidden_layers(x)
        x = self.drop(x)
        x = self.fc2(x)
        return x
@TASK_UTILS.register_module()
class MLPHandler(BaseHandler):
    """MLP handler of the metric predictor.

    It trains a small MLP on (architecture encoding, metric) pairs and
    uses it to predict the metric of a trained model.

    Args:
        epochs (int, optional): num of epochs for MLP network training.
            Defaults to 100.
        data_split_ratio (float, optional): split ratio of train/valid of
            input data. Defaults to 0.8.
        model_cfg (dict, optional): configs for MLP network. Defaults to None.
        device (str, optional): device for MLP Handler. Defaults to 'cpu'.
            (FIX: the original docstring said 'cuda', contradicting the
            code's actual default.)
    """

    def __init__(self,
                 epochs: int = 100,
                 data_split_ratio: float = 0.8,
                 model_cfg: Dict = None,
                 device: str = 'cpu'):
        self.epochs = epochs
        self.data_split_ratio = data_split_ratio

        # Build the MLP eagerly; ``fit`` rebuilds the input layer when the
        # encoding width differs from the configured ``in_features``.
        self.model_cfg = model_cfg if model_cfg is not None else dict()
        self.model = MLP(**self.model_cfg)

        self.device = device

    def fit(self, train_data: np.array, train_label: np.array) -> None:
        """Training the model of handler.

        Args:
            train_data (numpy.array): input data for training.
            train_label (numpy.array): input label for training.
        """
        # Adapt the first layer if the encoding width does not match.
        if train_data.shape[1] != self.model.fc1.in_features:
            self.model.fc1 = nn.Linear(train_data.shape[1],
                                       self.model.fc1.out_features)
        self.model = self.train_mlp(train_data, train_label)

    def predict(self, test_data: np.array) -> np.array:
        """Predict the evaluation metric of the model.

        Args:
            test_data (numpy.array): input data for testing.

        Returns:
            numpy.array: predicted metric.
        """
        # Promote a single 1-D sample to a batch of one.
        if test_data.ndim < 2:
            data = torch.zeros(1, test_data.shape[0])
            data[0, :] = torch.from_numpy(test_data).float()
        else:
            data = torch.from_numpy(test_data).float()

        self.model = self.model.to(device=self.device)
        self.model.eval()
        with torch.no_grad():
            data = data.to(device=self.device)
            pred = self.model(data)

        return pred.cpu().detach().numpy()

    def load(self, path: str) -> None:
        """Load predictor's pretrained weights."""
        self.model.load_state_dict(
            torch.load(path, map_location='cpu')['state_dict'])

    def save(self, path: str) -> str:
        """Save predictor and return saved path for diff suffix."""
        path = path + '_mlp.pth'
        torch.save({'state_dict': self.model.state_dict(), 'meta': {}}, path)
        return path

    def train_mlp(self, train_data: np.array,
                  train_label: np.array) -> nn.Module:
        """Train MLP network.

        Args:
            train_data (numpy.array): input data for training.
            train_label (numpy.array): input label for training.

        Returns:
            nn.Module: the well-trained MLP network (moved to CPU).
        """
        num_samples = train_data.shape[0]
        target = torch.zeros(num_samples, 1)
        # Random train/valid split by permutation of sample indices.
        perm = torch.randperm(target.size(0))
        train_index = perm[:int(num_samples * self.data_split_ratio)]
        valid_index = perm[int(num_samples * self.data_split_ratio):]

        inputs = torch.from_numpy(train_data).float()
        target[:, 0] = torch.from_numpy(train_label).float()

        self.model = self.model.to(device=self.device)
        self.optimizer = optim.Adam(self.model.parameters(), lr=8e-4)
        self.criterion = SmoothL1Loss()

        self.scheduler = CosineAnnealingLR(
            self.optimizer, T_max=self.epochs, eta_min=0, by_epoch=True)

        best_loss = 1e33
        # FIX: bind a valid fallback up front; previously ``best_net`` was
        # unbound (UnboundLocalError) when ``epochs == 0`` or when the
        # validation loss was NaN (e.g. an empty validation split) so the
        # `<` comparison never succeeded.
        best_net = copy.deepcopy(self.model)
        for _ in range(self.epochs):
            train_inputs = inputs[train_index].to(self.device)
            train_labels = target[train_index].to(self.device)

            self.model.train()
            self.optimizer.zero_grad()
            pred = self.model(train_inputs)
            loss = self.criterion(pred, train_labels)
            loss.backward()
            self.optimizer.step()

            self.model.eval()
            with torch.no_grad():
                valid_inputs = inputs[valid_index].to(self.device)
                valid_labels = target[valid_index].to(self.device)

                pred = self.model(valid_inputs)
                valid_loss = self.criterion(pred, valid_labels).item()

            self.scheduler.step()

            # Keep a snapshot of the best-validating weights.
            if valid_loss < best_loss:
                best_loss = valid_loss
                best_net = copy.deepcopy(self.model)

        return best_net.to(device='cpu')
@TASK_UTILS.register_module()
class RBFHandler(BaseHandler):
    """RBF handler of the metric predictor. It uses `Radial Basis Function` to
    predict the metric of a trained model.

    Args:
        kernel (str): RBF kernel object. Defaults to 'tps'.
        tail (str): RBF polynomial tail object. Defaults to 'linear'.
    """

    def __init__(self, kernel: str = 'tps', tail: str = 'linear'):
        # NOTE(review): unlike the module-level RBFInterpolant import, this
        # import is not guarded with a placeholder fallback, so constructing
        # the handler without pySOT raises ImportError directly.
        from pySOT.surrogate import (ConstantTail, CubicKernel, Kernel,
                                     LinearTail, Tail, TPSKernel)

        self.kernel_mapping = {'cubic': CubicKernel, 'tps': TPSKernel}
        self.tail_mapping = {'linear': LinearTail, 'constant': ConstantTail}

        assert kernel in self.kernel_mapping.keys(), (
            f'Got unknown RBF kernel `{kernel}`.')
        # Stores the kernel *class*; instantiated lazily in ``fit``.
        self.kernel: Kernel = self.kernel_mapping[kernel]

        assert tail in self.tail_mapping.keys(), (
            f'Got unknown RBF tail `{tail}`.')
        # Stores the tail *class*; instantiated lazily in ``fit``.
        self.tail: Tail = self.tail_mapping[tail]

    def fit(self, train_data: np.array, train_label: np.array) -> None:
        """Training the model of handler.

        Args:
            train_data (numpy.array): input data for training, shape
                (num_samples, num_features).
            train_label (numpy.array): input label for training.

        Raises:
            ValueError: if there are not strictly more samples than
                features (the RBF system would be under-determined).
        """
        if train_data.shape[0] <= train_data.shape[1]:
            raise ValueError('In RBF, dim 0 of data (got '
                             f'{train_data.shape[0]}) should be larger than '
                             f'dim 1 of data (got {train_data.shape[1]}).')

        self.model = RBFInterpolant(
            dim=train_data.shape[1],
            kernel=self.kernel(),
            tail=self.tail(train_data.shape[1]))

        # pySOT interpolants are built incrementally, one point at a time.
        for i in range(len(train_data)):
            self.model.add_points(train_data[i, :], train_label[i])

    def predict(self, test_data: np.array) -> np.array:
        """Predict the evaluation metric of the model.

        Args:
            test_data (numpy.array): input data for testing.

        Returns:
            numpy.array: predicted metric.
        """
        return self.model.predict(test_data)
@TASK_UTILS.register_module()
class MetricPredictor:
    """A predictor for predicting evaluation metrics in different tasks.

    Args:
        handler_cfg (dict): Config to build a predict handler.
        search_groups (dict) : The search_groups of the specified supernet.
        train_samples (int): Num of training samples for the handler.
            Defaults to 2.
        handler_ckpt (str, optional): Path to handler's checkpoint. If given,
            predictor will load weights directly instead of handler training.
        encoding_type (str, optional): Type of how to encode the search space
            to integer bit-string. Defaults to `onehot`.
        score_key (str): Specify one metric in evaluation results to score
            models. Defaults to 'accuracy_top-1'.
    """

    def __init__(self,
                 handler_cfg: Dict,
                 search_groups: Dict,
                 train_samples: int = 2,
                 handler_ckpt: str = None,
                 encoding_type: str = 'onehot',
                 score_key: str = 'accuracy_top-1',
                 **kwargs):
        self.handler_cfg = handler_cfg
        self.handler = TASK_UTILS.build(handler_cfg)

        assert encoding_type in [
            'normal', 'onehot'
        ], ('encoding_type must be `normal` or `onehot`.'
            f'Got `{encoding_type}`.')
        # RBF interpolation works on the compact integer encoding, not the
        # one-hot expansion.
        if isinstance(self.handler, RBFHandler):
            encoding_type = 'normal'
        self.encoding_type = encoding_type

        self.search_groups = search_groups
        self.train_samples = train_samples
        self.handler_ckpt = handler_ckpt

        # [score_key, 'anticipate']: the second key is used once the first
        # is already present in the metric dict.
        self.score_key_list = [score_key] + ['anticipate']
        self.initialize = False

    def predict(self, model) -> Dict[str, float]:
        """Predict the evaluation metric of input model using the handler.

        Args:
            model: input model.

        Returns:
            Dict[str, float]: evaluation metric of the model.
        """
        metric: Dict[str, float] = {}
        assert self.initialize is True, (
            'Before predicting, evaluator is required to be executed first, '
            'cause the model of handler in predictor needs to be initialized.')

        if self.initialize:
            model, _ = export_fix_subnet(model)
            data = self.preprocess(np.array([self.model2vector(model)]))
            score = float(np.squeeze(self.handler.predict(data)))
            if metric.get(self.score_key_list[0], None):
                metric.update({self.score_key_list[1]: score})
            else:
                metric.update({self.score_key_list[0]: score})
        return metric

    def model2vector(
            self, model: Dict[str, Union[str,
                                         DumpChosen]]) -> Dict[str, list]:
        """Convert the input model to N-dims vector.

        Args:
            model (Dict[str, Union[str, DumpChosen]]): input model.

        Returns:
            Dict[str, list]: dict with both the compact integer encoding
            ('normal_vector') and the one-hot encoding ('onehot_vector').
        """
        index = 0
        vector_dict: Dict[str, list] = \
            dict(normal_vector=[], onehot_vector=[])

        assert len(model.keys()) == len(self.search_groups.keys()), (
            f'Length mismatch for model({len(model.keys())}) and search_groups'
            f'({len(self.search_groups.keys())}).')

        for key, choice in model.items():
            if isinstance(choice, DumpChosen):
                assert choice.meta is not None, (
                    f'`DumpChosen.meta` of current {key} should not be None '
                    'when converting the search space.')
                # FIX: ``dtype=np.int`` relied on the deprecated NumPy alias
                # removed in NumPy 1.24; the builtin ``int`` is equivalent.
                onehot = np.zeros(
                    len(choice.meta['all_choices']), dtype=int)
                _chosen_index = choice.meta['all_choices'].index(choice.chosen)
            else:
                if key is not None:
                    from mmrazor.models.mutables import MutableChannelUnit
                    if isinstance(self.search_groups[key][0],
                                  MutableChannelUnit):
                        choices = self.search_groups[key][0].candidate_choices
                    else:
                        choices = self.search_groups[key][0].choices
                else:
                    assert len(self.search_groups[index]) == 1
                    choices = self.search_groups[index][0].choices
                # FIX: same ``np.int`` -> ``int`` replacement as above.
                onehot = np.zeros(len(choices), dtype=int)
                _chosen_index = choices.index(choice)
            onehot[_chosen_index] = 1

            vector_dict['normal_vector'].extend([_chosen_index])
            vector_dict['onehot_vector'].extend(onehot)
            index += 1

        return vector_dict

    def vector2model(self, vector: np.array) -> Dict[str, str]:
        """Convert the N-dims vector to original model.

        Args:
            vector (numpy.array): input vector which represents the model.

        Returns:
            Dict[str, str]: converted model (group name -> chosen value).
        """
        from mmrazor.models.mutables import OneShotMutableChannelUnit

        start = 0
        model = {}
        vector = np.squeeze(vector)
        for name, mutables in self.search_groups.items():
            if isinstance(mutables[0], OneShotMutableChannelUnit):
                choices = mutables[0].candidate_choices
            else:
                choices = mutables[0].choices

            if self.encoding_type == 'onehot':
                # Locate the hot bit within this group's slice.
                index = np.where(vector[start:start + len(choices)] == 1)[0][0]
                start += len(choices)
            else:
                index = vector[start]
                start += 1

            chosen = choices[int(index)] if len(choices) > 1 else choices[0]
            model[name] = chosen

        return model

    @staticmethod
    def get_correlation(prediction: np.array,
                        label: np.array) -> List[np.array]:
        """Compute the correlations between prediction and ground-truth label.

        Args:
            prediction (numpy.array): predict vector.
            label (numpy.array): ground-truth label.

        Returns:
            List[numpy.array]: [RMSE, Spearman's rho, Kendall's tau].
        """
        rmse = np.sqrt(((prediction - label)**2).mean())
        rho, _ = stats.spearmanr(prediction, label)
        tau, _ = stats.kendalltau(prediction, label)
        return [rmse, rho, tau]

    def preprocess(self, data: List[Dict[str, list]]) -> np.array:
        """Preprocess the data, convert it into np.array format.

        Args:
            data (List[Dict[str, list]]): input data for training.

        Returns:
            numpy.array: input data in numpy.array format.
        """
        if self.encoding_type == 'normal':
            data = np.array([x['normal_vector'] for x in data])
        else:
            data = np.array([x['onehot_vector'] for x in data])
        return data

    def fit(self, data: List[Dict[str, list]], label: np.array) -> None:
        """Training the handler using the structure information of a model.

        The weights of handler will be fixed after that.

        Args:
            data (List[Dict[str, list]]): input data for training.
            label (numpy.array): input label for training.
        """
        data = self.preprocess(data)
        self.handler.fit(data, label)
        self.initialize = True

    def load_checkpoint(self) -> None:
        """Load checkpoint for handler."""
        self.handler.load(self.handler_ckpt)
        self.initialize = True

    def save_checkpoint(self, path: str) -> str:
        """Save checkpoint of handler and return saved path for diff suffix.

        Args:
            path (str): save path for the handler.

        Returns:
            (str): specific checkpoint path of the current handler.
        """
        return self.handler.save(path)
class BaseRecorder(metaclass=ABCMeta):
    """Abstract base class for recorders.

    A recorder is a context manager that captures intermediate results
    produced while a model runs forward. Concrete subclasses record
    different kinds of data (module outputs, function inputs, parameters,
    ...) and are usually driven through a ``RecorderManager``.

    Note:
        By default the ``RecorderManager`` initializes recorders lazily.
        When using a recorder stand-alone, call :meth:`initialize` before
        entering its context.
    """

    def __init__(self, source: str) -> None:
        # Identifier of the data source to record.
        self._source = source
        # A source may fire several times per forward, so every result is
        # appended to this list in order of occurrence.
        self._data_buffer: List = list()
        # Set to True by ``initialize``; entering the context asserts this.
        self._initialized = False

    @property
    def source(self) -> str:
        """str: source of recorded data."""
        return self._source

    @property
    def data_buffer(self) -> List:
        """list: data buffer."""
        return self._data_buffer

    @abstractmethod
    def prepare_from_model(self, model: Optional[nn.Module] = None) -> None:
        """Make the intermediate results of the model can be record."""

    def initialize(self, model: Optional[nn.Module] = None) -> None:
        """Prepare the recorder for use.

        Args:
            model (nn.Module): The model whose intermediate results should
                be recorded.
        """
        self.prepare_from_model(model)
        self._initialized = True

    def get_record_data(self,
                        record_idx: int = 0,
                        data_idx: Optional[int] = None) -> Any:
        """Fetch one recorded result from ``data_buffer``.

        Args:
            record_idx (int): Index of the record inside ``data_buffer``
                (a source executed N times yields N records).
            data_idx (int, optional): Index inside one record when that
                record is a list/tuple; ``None`` returns the whole record.

        Returns:
            Any: the selected record (or element of it); its concrete type
            depends on the recorded source.
        """
        assert record_idx < len(self._data_buffer), \
            'record_idx is illegal. The length of data_buffer is ' \
            f'{len(self._data_buffer)}, but record_idx is ' \
            f'{record_idx}.'

        record = self._data_buffer[record_idx]
        if data_idx is None:
            return record

        if not isinstance(record, (list, tuple)):
            raise TypeError('When data_idx is not None, record should be '
                            'a list or tuple instance, but got '
                            f'{type(record)}.')
        assert data_idx < len(record), \
            'data_idx is illegal. The length of record is ' \
            f'{len(record)}, but data_idx is {data_idx}.'
        return record[data_idx]

    def reset_data_buffer(self) -> None:
        """Clear data in data_buffer."""
        self._data_buffer = []

    def __enter__(self):
        """Enter the context manager, starting from an empty buffer."""
        assert self._initialized, \
            'The recorder will be initialized in the RecorderManager by '\
            'default. If you want to use the recorder without the '\
            'RecorderManager, you need to initialize it first.'
        self.reset_data_buffer()

    def __exit__(self, exc_type, exc_value, traceback):
        """Exit the context manager."""
@TASK_UTILS.register_module()
class FunctionInputsRecorder(FunctionOutputsRecorder):
    """Recorder for intermediate results which are ``FunctionType``'s inputs.

    Notes:
        ``source`` must point at the module that *uses* the function, not
        the one that defines it. For example, ``anchor_inside_flags`` lives
        in ``mmdet/core/anchor/utils.py`` but is called from
        ``mmdet/models/dense_heads/anchor_head.py``, so the source is
        `mmdet.models.dense_heads.anchor_head.anchor_inside_flags` and not
        `mmdet.core.anchor.utils.anchor_inside_flags`.

    Examples:
        >>> # toy_module.py
        >>> def toy_func(a, b):
        ...     return a, b
        >>> def execute_toy_func(a, b):
        ...     toy_func(a, b)

        >>> from toy_module import execute_toy_func
        >>> r1 = FunctionInputsRecorder('toy_module.toy_func')
        >>> r1.initialize()
        >>> with r1:
        ...     execute_toy_func(1, 2)
        ...     execute_toy_func(1, b=2)
        ...     execute_toy_func(b=2, a=1)
        >>> r1.data_buffer
        [[1, 2], [1, 2], [1, 2]]
    """

    def func_record_wrapper(self, origin_func: Callable,
                            data_buffer: List) -> Callable:
        """Return a wrapper of ``origin_func`` that records its inputs.

        Args:
            origin_func (FunctionType): The function whose inputs need to
                be recorded.
            data_buffer (list): A list of data.
        """
        param_names = signature(origin_func).parameters.keys()

        @functools.wraps(origin_func)
        def wrap_func(*args, **kwargs):
            outputs = origin_func(*args, **kwargs)
            # Positional args first, then keyword args in declared
            # signature order, so equivalent calls produce equal records.
            # One record is appended per call.
            recorded = list(args)
            recorded.extend(kwargs[name] for name in param_names
                            if name in kwargs)
            data_buffer.append(recorded)
            return outputs

        return wrap_func
@TASK_UTILS.register_module()
class FunctionOutputsRecorder(BaseRecorder):
    """Recorder for intermediate results which are ``FunctionType``'s outputs.

    On ``__enter__`` the target function is monkey-patched with a wrapper
    that appends every return value to ``data_buffer``; ``__exit__``
    restores the original function.

    Notes:
        The form of `source` needs special attention: it must name the
        module that *uses* the function, not the module defining it.
        E.g. ``anchor_inside_flags`` is defined in
        `mmdet/core/anchor/utils.py` but called in
        `mmdet/models/dense_heads/anchor_head.py`, so the source should be
        `mmdet.models.dense_heads.anchor_head.anchor_inside_flags` and not
        `mmdet.core.anchor.utils.anchor_inside_flags`.

    Examples:
        >>> import toy_module  # defines toy_func() returning random ints
        >>> r1 = FunctionOutputsRecorder('toy_module.toy_func')
        >>> r1.initialize()
        >>> with r1:
        ...     out1 = toy_module.toy_func()
        ...     out2 = toy_module.toy_func()
        >>> r1.data_buffer  # one entry per call
        [33, 41]
        >>> r1.get_record_data(record_idx=1)
        41
    """

    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        # Fail fast on malformed sources (must be 'module.path.func').
        self._check_valid_source(self.source)

    @staticmethod
    def _check_valid_source(source):
        """Check if the source's format is valid."""
        if not isinstance(source, str):
            raise TypeError(f'source should be a str '
                            f'instance, but got {type(source)}')

        assert len(source.split('.')) > 1, \
            'source must have at least one `.`'

    @property
    def func_name(self):
        """Get the function name according to `func_path`."""
        return self.source.split('.')[-1]

    @property
    def module_string(self):
        """Get the module name according to `func_path`."""
        return '.'.join(self.source.split('.')[:-1])

    def prepare_from_model(self, model: Optional[nn.Module] = None) -> None:
        """The `model` is useless in `FunctionOutputsRecorder`."""
        pass

    def func_record_wrapper(self, origin_func: Callable,
                            data_buffer: List) -> Callable:
        """Save the function's outputs.

        Args:
            origin_func (FunctionType): The method whose outputs need to be
                recorded.
            data_buffer (list): A list of data.
        """

        @functools.wraps(origin_func)
        def wrap_func(*args, **kwargs):
            outputs = origin_func(*args, **kwargs)
            # assume a func execute N times, there will be N outputs need to
            # save.
            data_buffer.append(outputs)
            return outputs

        return wrap_func

    def __enter__(self):
        """Enter the context manager, patching the target function."""
        super().__enter__()

        # import the module corresponding to the function
        try:
            mod = import_modules_from_strings(self.module_string)
        except ImportError:
            raise ImportError(
                f'{self.module_string} is not imported correctly.')

        self.imported_module: ModuleType = mod

        assert hasattr(mod, self.func_name), \
            f'{self.func_name} is not in {self.module_string}.'

        origin_func = getattr(mod, self.func_name)
        if not isinstance(origin_func, FunctionType):
            raise TypeError(f'{self.func_name} should be a FunctionType '
                            f'instance, but got {type(origin_func)}')

        self.origin_func: Callable = origin_func

        # add record wrapper to origin function.
        record_func = self.func_record_wrapper(origin_func, self.data_buffer)

        assert hasattr(mod, self.func_name), \
            f'{self.func_name} is not in {self.module_string}.'

        # rewrite the origin function
        setattr(mod, self.func_name, record_func)

    def __exit__(self, exc_type, exc_value, traceback):
        """Exit the context manager, restoring the original function."""
        super().__exit__(exc_type, exc_value, traceback)

        mod = self.imported_module
        origin_func = self.origin_func

        assert hasattr(mod, self.func_name), \
            f'{self.func_name} is not in {self.module_string}.'

        # restore the origin function
        setattr(mod, self.func_name, origin_func)

        # self.imported_module and self.origin_func can not be pickled.
        # Delete these two attributes to avoid errors when ema model is used.
        del self.imported_module
        del self.origin_func
+ + Note: + Different from ``FunctionType``, ``MethodType`` is the type of methods + of class instances. + + Examples: + >>> # Below code in toy_module.py + >>> import random + >>> class Toy(): + ... def toy_func(self, x, y=0): + ... return x + y + + >>> # Below code in main.py + >>> # Now, we want to get teacher's inputs by recorder. + + >>> from toy_module import Toy + >>> toy = Toy() + >>> r1 = MethodInputsRecorder('toy_module.Toy.toy_func') + >>> r1.initialize() + >>> with r1: + ... _ = toy.toy_func(1, 2) + + >>> r1.data_buffer + [[1, 2]] + >>> r1.get_record_data(record_idx=0, data_idx=0) + 1 + >>> r1.get_record_data(record_idx=0, data_idx=1) + 2 + + >>> from toy_module import Toy + >>> toy = Toy() + >>> r1 = MethodInputsRecorder('toy_module.Toy.toy_func') + >>> r1.initialize() + >>> with r1: + ... _ = toy.toy_func(1, 2) + ... _ = toy.toy_func(y=2, x=1) + + >>> r1.data_buffer + [[1, 2], [1, 2]] + >>> r1.get_record_data(record_idx=1, data_idx=0) + 1 + >>> r1.get_record_data(record_idx=1, data_idx=1) + 2 + """ + + def method_record_wrapper(self, orgin_method: Callable, + data_buffer: List) -> Callable: + """Save the method's inputs. + + Args: + origin_method (MethodType): The method whose inputs need to be + recorded. + data_buffer (list): A list of data. + """ + + method_input_params = signature(orgin_method).parameters.keys() + + @functools.wraps(orgin_method) + def wrap_method(*args, **kwargs): + outputs = orgin_method(*args, **kwargs) + # the first element of a class method is the class itself + inputs = list(args[1:]) + for keyword in method_input_params: + if keyword in kwargs: + inputs.append(kwargs[keyword]) + # Assume a func execute N times, there will be N inputs need to + # save. 
+ data_buffer.append(inputs) + return outputs + + return wrap_method diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/recorder/method_outputs_recorder.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/recorder/method_outputs_recorder.py new file mode 100644 index 0000000000000000000000000000000000000000..6d3fb6593a0849e0567d6b269bac2e59685dbece --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/recorder/method_outputs_recorder.py @@ -0,0 +1,167 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import functools +from types import FunctionType, ModuleType +from typing import Callable, List, Optional + +from mmengine.utils import import_modules_from_strings +from torch import nn + +from mmrazor.registry import TASK_UTILS +from .base_recorder import BaseRecorder + + +@TASK_UTILS.register_module() +class MethodOutputsRecorder(BaseRecorder): + """Recorder for intermediate results which are ``MethodType``'s outputs. + + Note: + Different from ``FunctionType``, ``MethodType`` is the type of methods + of class instances. + + Examples: + >>> # Below code in toy_module.py + >>> import random + >>> class Toy(): + ... def toy_func(self): + ... return random.randint(0, 1000) + ... def toy_list_func(self): + ... return [random.randint(0, 1000) for _ in range(3)] + + >>> # Below code in main.py + >>> # Now, we want to get teacher's outputs by recorder. + + >>> from toy_module import Toy + >>> toy = Toy() + >>> r1 = MethodOutputsRecorder('toy_module.Toy.toy_func') + >>> r1.initialize() + >>> with r1: + ... output_teacher1 = toy.toy_func() + ... output_teacher2 = toy.toy_func() + ... output_teacher3 = toy.toy_func() + + >>> r1.data_buffer + [33, 41, 12] + >>> r1.get_record_data(record_idx=2) + 12 + >>> output_teacher1==33 and output_teacher2==41 and output_teacher3==12 + True + + >>> r2 = MethodOutputsRecorder('toy_module.Toy.toy_list_func') + >>> r2.initialize() + >>> with r2: + ... 
output_teacher1 = toy.toy_list_func() + ... output_teacher2 = toy.toy_list_func() + ... output_teacher3 = toy.toy_list_func() + + >>> r2.data_buffer + [[1, 2, 3], [4, 5, 6], [7, 8, 9]] + >>> r2.get_record_data(record_idx=2, data_idx=2) + 9 + """ + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + + self._check_valid_source(self.source) + + # import the function corresponding module + try: + mod: ModuleType = import_modules_from_strings(self.module_string) + except ImportError: + raise ImportError( + f'{self.module_string} is not imported correctly.') + + assert hasattr(mod, self.cls_name), \ + f'{self.cls_name} is not in {self.module_string}.' + + imported_cls: type = getattr(mod, self.cls_name) + if not isinstance(imported_cls, type): + raise TypeError(f'{self.cls_name} should be a type ' + f'instance, but got {type(imported_cls)}') + self.imported_class = imported_cls + + assert hasattr(imported_cls, self.method_name), \ + f'{self.method_name} is not in {self.cls_name}.' 
+ + origin_method = getattr(imported_cls, self.method_name) + if not isinstance(origin_method, FunctionType): + raise TypeError(f'{self.method_name} should be a FunctionType ' + f'instance, but got {type(origin_method)}') + self.origin_method = origin_method + + @staticmethod + def _check_valid_source(source: str) -> None: + """Check if the `source` is valid.""" + if not isinstance(source, str): + raise TypeError(f'source should be a str ' + f'instance, but got {type(source)}') + + assert len(source.split('.')) > 2, \ + 'source must have at least two `.`' + + @property + def method_name(self): + """Get the method name according to `method_path`.""" + return self.source.split('.')[-1] + + @property + def cls_name(self): + """Get the class name corresponding to this method according to + `method_path`.""" + return self.source.split('.')[-2] + + @property + def module_string(self): + """Get the module name according to `method_path`.""" + return '.'.join(self.source.split('.')[:-2]) + + def prepare_from_model(self, model: Optional[nn.Module] = None) -> None: + """Wrapper the origin source methods. + + The ``model`` is useless in this recorder, just to be consistent with + other recorders. + """ + pass + + def method_record_wrapper(self, orgin_method: Callable, + data_buffer: List) -> Callable: + """Save the method's outputs. + + Args: + origin_method (MethodType): The method whose outputs need to be + recorded. + data_buffer (list): A list of data. + """ + + @functools.wraps(orgin_method) + def wrap_method(*args, **kwargs): + outputs = orgin_method(*args, **kwargs) + # assume a func execute N times, there will be N outputs need to + # save. + data_buffer.append(outputs) + return outputs + + return wrap_method + + def __enter__(self): + """Enter the context manager.""" + super().__enter__() + + imported_cls = self.imported_class + origin_method = self.origin_method + # add record wrapper to origin method. 
+ record_method = self.method_record_wrapper(origin_method, + self.data_buffer) + + # rewrite the origin method. + setattr(imported_cls, self.method_name, record_method) + + def __exit__(self, exc_type, exc_value, traceback): + """Exit the context manager.""" + super().__exit__(exc_type, exc_value, traceback) + + imported_cls = self.imported_class + origin_method = self.origin_method + + # restore the origin method + setattr(imported_cls, self.method_name, origin_method) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/recorder/module_inputs_recorder.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/recorder/module_inputs_recorder.py new file mode 100644 index 0000000000000000000000000000000000000000..53ea90940f3fdd555bcf3a848fd39fea66d7a3ba --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/recorder/module_inputs_recorder.py @@ -0,0 +1,27 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import Any, Tuple + +from torch import nn + +from mmrazor.registry import TASK_UTILS +from .module_outputs_recorder import ModuleOutputsRecorder + + +@TASK_UTILS.register_module() +class ModuleInputsRecorder(ModuleOutputsRecorder): + """Recorder for intermediate results which are Pytorch module's inputs.""" + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + + def forward_hook(self, module: nn.Module, inputs: Tuple, + outputs: Any) -> None: + """Save the module's forward input. + + Args: + module (:obj:`torch.nn.Module`): The module to register hook. + inputs (tuple): The input of the module. + outputs : The output of the module. 
+ """ + if self.recording: + self.data_buffer.append(inputs) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/recorder/module_outputs_recorder.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/recorder/module_outputs_recorder.py new file mode 100644 index 0000000000000000000000000000000000000000..277d73935188c34b0c42f82ec9a494b8be3a2028 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/recorder/module_outputs_recorder.py @@ -0,0 +1,96 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import Any, Optional, Tuple + +from torch import nn + +from mmrazor.registry import TASK_UTILS +from .base_recorder import BaseRecorder + + +@TASK_UTILS.register_module() +class ModuleOutputsRecorder(BaseRecorder): + """Recorder for intermediate results which are Pytorch module's outputs. + + Examples: + >>> from torch import nn + >>> class ToyModel(nn.Module): + ... def __init__(self): + ... super().__init__() + ... self.conv1 = nn.Conv2d(1,1,1) + ... self.conv2 = nn.Conv2d(1,1,1) + ... def forward(self, x): + ... x1 = self.conv1(x) + ... x2 = self.conv1(x+1) + ... 
return self.conv2(x1 + x2) + + >>> model = ToyModel() + >>> [ name for name,_ in model.named_modules() ] + ['conv1', 'conv2'] + + >>> r1 = ModuleOutputsRecorder('conv1') + >>> r1.initialize(model) + + >>> with r1: + >>> res = model(torch.randn(1,1,1,1)) + + >>> r1.data_buffer + [tensor([[[[0.6734]]]]), tensor([[[[1.2514]]]]) ] + >>> r1.get_record_data(record_idx=1) + tensor([[[[1.2514]]]]) + + >>> r2 = ModuleOutputsRecorder('conv2') + >>> r2.initialize(model) + + >>> with r2: + >>> res = model(torch.randn(1,1,1,1)) + + >>> r2.data_buffer + [tensor([[[[0.9534]]]])] + >>> r2.get_record_data() + tensor([[[[0.9534]]]]) + """ + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + self._recording = False + + @property + def recording(self) -> bool: + """bool: whether to record data in forward hook.""" + return self._recording + + def prepare_from_model(self, model: Optional[nn.Module] = None) -> None: + """Register Pytorch forward hook to corresponding module.""" + + assert model is not None, 'model can not be None.' + + founded = False + for name, module in model.named_modules(): + if name == self.source: + module.register_forward_hook(self.forward_hook) + founded = True + break + + assert founded, f'"{self.source}" is not in the model.' + + def forward_hook(self, module: nn.Module, inputs: Tuple, + outputs: Any) -> None: + """Save the module's forward output. + + Args: + module (:obj:`torch.nn.Module`): The module to register hook. + inputs (tuple): The input of the module. + outputs : The output of the module. 
+ """ + if self._recording: + self.data_buffer.append(outputs) + + def __enter__(self): + """Enter the context manager.""" + super().__enter__() + self._recording = True + + def __exit__(self, exc_type, exc_value, traceback): + """Exit the context manager.""" + super().__exit__(exc_type, exc_value, traceback) + self._recording = False diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/recorder/param_recorder.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/recorder/param_recorder.py new file mode 100644 index 0000000000000000000000000000000000000000..afd0d1c0f9a969b6add167ee04d60d112e701ca1 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/recorder/param_recorder.py @@ -0,0 +1,57 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import Optional + +from torch import nn + +from mmrazor.registry import TASK_UTILS +from .base_recorder import BaseRecorder + + +@TASK_UTILS.register_module() +class ParameterRecorder(BaseRecorder): + """Recorder for Pytorch model's parameters. + + Examples: + >>> from torch import nn + >>> class ToyModel(nn.Module): + ... def __init__(self): + ... super().__init__() + ... self.toy_conv = nn.Conv2d(1,1,1) + ... def forward(self, x): + ... return self.toy_conv(x) + + >>> model = ToyModel() + >>> [ name for name,_ in model.named_parameters() ] + ['toy_conv.weight', 'toy_conv.bias'] + + >>> recorder = ParameterRecorder('toy_conv.weight') + >>> recorder.initialize(model) + + >>> recorder.data_buffer + [Parameter containing: tensor([[[[0.3244]]]], requires_grad=True)] + >>> recorder.get_record_data() + Parameter containing: tensor([[[[0.3244]]]], requires_grad=True) + """ + + def prepare_from_model(self, model: Optional[nn.Module] = None) -> None: + """Record the Pytorch model's parameters.""" + assert model is not None, \ + 'model can not be None when use ParameterRecorder.' 
+ + founded = False + for param_name, param in model.named_parameters(): + if param_name == self.source: + self.data_buffer.append(param) + founded = True + break + + assert founded, f'"{self.source}" is not in the model.' + + def reset_data_buffer(self): + """Clear data in data_buffer. + + Note: + The data_buffer stores the address of the parameter in memory and + does not need to be reset. + """ + pass diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/recorder/recorder_manager.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/recorder/recorder_manager.py new file mode 100644 index 0000000000000000000000000000000000000000..30fbbb39f1612625793624b4c2f65b0e081074fc --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/recorder/recorder_manager.py @@ -0,0 +1,116 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy +from typing import Dict, Optional + +from torch import nn + +from mmrazor.registry import TASK_UTILS +from .base_recorder import BaseRecorder + + +class RecorderManager: + """Various types recorders' manager. The ``RecorderManager`` also is a + context manager, managing various types of Recorder. When entering the + ``RecorderManager``, all recorders managed by it will be started. + + Note: + The recorders will be initialized in the ``RecorderManager`` by + default. If you want to just use a recorder without the + ``RecorderManager``, you need to initialize it first. + + Args: + recorders (dict, optional): All recorders' config. + + + Examples: + >>> # Below code in toy_module.py + >>> import random + >>> class Toy(): + ... def toy_func(self): + ... return random.randint(0, 1000) + + >>> # Below code in main.py + >>> from torch import nn + >>> from toy_module import Toy + + >>> class ToyModel(nn.Module): + ... def __init__(self): + ... super().__init__() + ... self.conv1 = nn.Conv2d(1,1,1) + ... self.conv2 = nn.Conv2d(1,1,1) + ... self.toy = Toy() + ... 
def forward(self, x): + ... return self.conv2(self.conv1(x)) + self.toy.toy_func() + + >>> model = ToyModel() + >>> [ name for name,_ in model.named_modules() ] + ['conv1', 'conv2'] + + >>> conv1_rec = ModuleOutputsRecorder('conv1') + >>> conv2_rec = ModuleOutputsRecorder('conv2') + >>> func_rec = MethodOutputsRecorder('toy_module.Toy.toy_func') + >>> manager = RecorderManager( + ... {'conv1_rec': conv1_rec , + ... 'conv2_rec': conv2_rec, + ... 'func_rec': func_rec}) + >>> manager.initialize(model) + + >>> with manager: + ... res = model(torch.ones(1,1,1,1)) + >>> res + tensor([[[[22.9534]]]]) + + >>> conv2_data = manager.get_recorder('conv2_rec').get_record_data() + >>> conv2_data + tensor([[[[0.9534]]]]) + + >>> func_data = manager.get_recorder('func_rec').get_record_data() + >>> func_data + 22 + + >>> res.sum() == (conv2_data + func_data).sum() + True + """ + + def __init__(self, recorders: Optional[Dict] = None) -> None: + + self._recorders: Dict[str, BaseRecorder] = dict() + if recorders: + for name, cfg in recorders.items(): + recorder_cfg = copy.deepcopy(cfg) + recorder_type = cfg['type'] + recorder_type_ = recorder_type + 'Recorder' + + recorder_cfg['type'] = recorder_type_ + recorder = TASK_UTILS.build(recorder_cfg) + + self._recorders[name] = recorder + + @property + def recorders(self) -> Dict[str, BaseRecorder]: + """dict: all recorders.""" + return self._recorders + + def get_recorder(self, recorder: str) -> BaseRecorder: + """Get the corresponding recorder according to the name.""" + return self.recorders[recorder] + + def initialize(self, model: nn.Module): + """Init all recorders. + + Args: + model (nn.Module): The model which need to record intermediate + results. 
+ """ + for recorder in self.recorders.values(): + recorder.initialize(model) + + def __enter__(self): + """Enter the context manager.""" + for recorder in self.recorders.values(): + recorder.__enter__() + + def __exit__(self, exc_type, exc_value, traceback): + """Exit the context manager.""" + for recorder in self.recorders.values(): + recorder.__exit__(exc_type, exc_value, traceback) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..987030d814b84d97a8293d79a1162d373ab74b2e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/__init__.py @@ -0,0 +1,17 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .backward_tracer import BackwardTracer +from .channel_analyzer import ChannelAnalyzer +# from .razor_tracer import RazorFxTracer +from .fx import (CustomTracer, UntracedMethodRegistry, build_graphmodule, + custom_symbolic_trace) +from .loss_calculator import * # noqa: F401,F403 +from .parsers import * # noqa: F401,F403 +from .path import (Path, PathConcatNode, PathConvNode, PathDepthWiseConvNode, + PathLinearNode, PathList, PathNode, PathNormNode) + +__all__ = [ + 'BackwardTracer', 'PathConvNode', 'PathLinearNode', 'PathNormNode', + 'PathConcatNode', 'Path', 'PathList', 'PathNode', 'PathDepthWiseConvNode', + 'ChannelAnalyzer', 'CustomTracer', 'UntracedMethodRegistry', + 'custom_symbolic_trace', 'build_graphmodule' +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/backward_tracer.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/backward_tracer.py new file mode 100644 index 0000000000000000000000000000000000000000..f87c760d117da8145f9c809a8738bd164894258b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/backward_tracer.py @@ -0,0 +1,201 @@ +# 
Copyright (c) OpenMMLab. All rights reserved. +import copy +import re +from collections import OrderedDict + +from mmengine import ConfigDict +from torch.nn import Conv2d, Linear +from torch.nn.modules import GroupNorm +from torch.nn.modules.batchnorm import _NormBase + +from mmrazor.registry import TASK_UTILS +from .parsers import DEFAULT_BACKWARD_TRACER +from .path import Path, PathList + +SUPPORT_MODULES = (Conv2d, Linear, _NormBase, GroupNorm) + + +@TASK_UTILS.register_module() +class BackwardTracer: + """A topology tracer via backward. + + Args: + loss_calculator (dict or Callable): Calculate the pseudo loss to trace + the topology of a model. + """ + + def __init__(self, loss_calculator): + if isinstance(loss_calculator, (dict, ConfigDict)): + loss_calculator = TASK_UTILS.build(loss_calculator) + + assert callable( + loss_calculator + ), 'loss_calculator should be a dict, ConfigDict or ' \ + 'callable object' + self.loss_calculator = loss_calculator + + @property + def backward_parser(self): + """The mapping from the type of a backward op to the corresponding + parser.""" + return DEFAULT_BACKWARD_TRACER + + def backward_trace(self, grad_fn, module2name, param2module, cur_path, + result_paths, visited, shared_module): + """Trace the topology of all the ``NON_PASS_MODULE``.""" + grad_fn = grad_fn[0] if isinstance(grad_fn, (list, tuple)) else grad_fn + + if grad_fn is not None: + name = type(grad_fn).__name__ + # In pytorch graph, there may be an additional '0' or '1' + # (e.g. ThnnConv2DBackward0) after a backward op. Delete the + # digit numbers to build the corresponding parser. 
+ name = re.sub(r'[0-1]+', '', name) + parse_module = self.backward_parser.get(name) + + if parse_module is not None: + parse_module(self, grad_fn, module2name, param2module, + cur_path, result_paths, visited, shared_module) + else: + # If the op is AccumulateGrad, parents is (), + parents = grad_fn.next_functions + if parents is not None: + for parent in parents: + self.backward_trace(parent, module2name, param2module, + cur_path, result_paths, visited, + shared_module) + else: + result_paths.append(copy.deepcopy(cur_path)) + + def _trace_shared_module_hook(self, module, inputs, outputs): + """Trace shared modules. Modules such as the detection head in + RetinaNet which are visited more than once during :func:`forward` are + shared modules. + + Args: + module (:obj:`torch.nn.Module`): The module to register hook. + inputs (tuple): The input of the module. + outputs (tuple): The output of the module. + """ + module._cnt += 1 + + def _build_mappings(self, model): + """Build the mappings which are used during tracing.""" + + module2name = OrderedDict() + # build a mapping from the identity of a module's params + # to this module + param2module = OrderedDict() + # record the visited module name during trace path + visited = dict() + + def traverse(module, prefix=''): + for name, child in module.named_children(): + full_name = f'{prefix}.{name}' if prefix else name + + if isinstance(child, SUPPORT_MODULES): + module2name[child] = full_name + for param in child.parameters(): + param2module[id(param)] = child + visited[full_name] = False + else: + traverse(child, full_name) + + traverse(model) + + return module2name, param2module, visited + + def _register_share_module_hook(self, model): + """Record shared modules which will be visited more than once during + forward such as shared detection head in RetinaNet. + + If a module is not a shared module and it has been visited during + forward, its parent modules must have been traced already. 
However, a + shared module will be visited more than once during forward, so it is + still need to be traced even if it has been visited. + """ + self._shared_module_hook_handles = list() + for module in model.modules(): + if hasattr(module, 'weight'): + # trace shared modules + module._cnt = 0 + # the handle is only to remove the corresponding hook later + handle = module.register_forward_hook( + self._trace_shared_module_hook) + self._shared_module_hook_handles.append(handle) + + def _remove_share_module_hook(self, model): + """`_trace_shared_module_hook` and `_cnt` are only used to trace the + shared modules in a model and need to be remove later.""" + for module in model.modules(): + if hasattr(module, 'weight'): + del module._cnt + + for handle in self._shared_module_hook_handles: + handle.remove() + + del self._shared_module_hook_handles + + def _set_all_requires_grad(self, model): + """Set `requires_grad` of a parameter to True to trace the whole + architecture topology.""" + self._param_requires_grad = dict() + for param in model.parameters(): + self._param_requires_grad[id(param)] = param.requires_grad + param.requires_grad = True + + def _restore_requires_grad(self, model): + """We set requires_grad to True to trace the whole architecture + topology. + + So it should be reset after that. 
+ """ + for param in model.parameters(): + param.requires_grad = self._param_requires_grad[id(param)] + del self._param_requires_grad + + @staticmethod + def _find_share_modules(model): + """Find shared modules which will be visited more than once during + forward such as shared detection head in RetinaNet.""" + share_modules = list() + for name, module in model.named_modules(): + if hasattr(module, 'weight'): + if module._cnt > 1: + share_modules.append(name) + + return share_modules + + @staticmethod + def _reset_norm_running_stats(model): + """As we calculate the pseudo loss during tracing, we need to reset + states of parameters.""" + for module in model.modules(): + if isinstance(module, _NormBase): + module.reset_parameters() + + def trace(self, model): + """Trace trace the architecture topology of the input model.""" + module2name, param2module, visited = self._build_mappings(model) + + # Set requires_grad to True. If the `requires_grad` of a module's + # weight is False, we can not trace this module by parsing backward. + self._set_all_requires_grad(model) + + self._register_share_module_hook(model) + + pseudo_loss = self.loss_calculator(model) + + share_modules = self._find_share_modules(model) + + self._remove_share_module_hook(model) + self._restore_requires_grad(model) + + module_path_list = PathList() + + self.backward_trace(pseudo_loss.grad_fn, module2name, param2module, + Path(), module_path_list, visited, share_modules) + + self._reset_norm_running_stats(model) + + return module_path_list diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/channel_analyzer.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/channel_analyzer.py new file mode 100644 index 0000000000000000000000000000000000000000..9f754e97e6d68beceb1972f5a8b168964e46776c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/channel_analyzer.py @@ -0,0 +1,178 @@ +# Copyright (c) OpenMMLab. 
All rights reserved. +""" +- How to config ChannelAnalyzer by hard code + - fxtracer + - demo_inputs + ./mmrazor/models/task_modules/demo_inputs/default_demo_inputs.py + - leaf module + - ChannelAnalyzer.default_leaf_modules + - method + - ./mmrazor/models/task_modules/tracer/fx_tracer.py + - ChannelNode + - ./mmrazor/structures/graph/channel_nodes.py + - DynamicOp + ./mmrazor/models/architectures/dynamic_ops/bricks/dynamic_conv.py +""" +import copy +from typing import Dict, List, Tuple, Union + +import torch +import torch.nn as nn +from mmcv.cnn.bricks import Scale +from mmengine.model.utils import revert_sync_batchnorm + +from mmrazor.models.architectures.dynamic_ops import DynamicChannelMixin +from mmrazor.models.mutables.mutable_channel import ( + MutableChannelUnit, SequentialMutableChannelUnit) +from mmrazor.models.mutables.mutable_channel.units.utils import find_mutable +from mmrazor.registry import TASK_UTILS +from mmrazor.structures.graph import ModuleGraph +from mmrazor.structures.graph.channel_graph import ( + ChannelGraph, default_channel_node_converter) +from mmrazor.structures.graph.module_graph import (FxTracerToGraphConverter, + PathToGraphConverter) +from mmrazor.structures.graph.pseudo_fx_graph import parse_torch_graph +from mmrazor.utils import print_log +from ..demo_inputs import BaseDemoInput, DefaultDemoInput +from .backward_tracer import BackwardTracer +from .fx_tracer import MMFxTracer +from .loss_calculator.sum_loss_calculator import SumPseudoLoss + + +@TASK_UTILS.register_module() +class ChannelAnalyzer: + """The tracer for pruning. It return the configs of MutableChannelUnits as + result. + + Args: + demo_input (Union[List, Dict, Tuple, BaseDemoInput], optional): + The demo input for the model. demo_input can be one of + input_shape(list), config of a demo input generator, a demoinput + generator. Defaults to (1, 3, 224, 224). + tracer_type (str, optional): str indicates which basic tracer to use. + Defaults to 'BackwardTracer'. 
+ """ + default_leaf_modules = ( + # dynamic op + DynamicChannelMixin, + # torch + nn.Conv2d, + nn.Linear, + nn.modules.batchnorm._BatchNorm, + # mmcv + Scale, + ) + + def __init__(self, + demo_input: Union[List, Dict, Tuple, + BaseDemoInput] = (1, 3, 224, 224), + tracer_type='BackwardTracer') -> None: + + if isinstance(demo_input, dict): + self.demo_input = TASK_UTILS.build(demo_input) + elif isinstance(demo_input, list) or isinstance(demo_input, tuple): + self.demo_input = DefaultDemoInput(demo_input, False) + elif isinstance(demo_input, BaseDemoInput): + self.demo_input = demo_input + else: + raise NotImplementedError(f'{type(demo_input)},{demo_input}') + + self.input_shape = demo_input + + assert tracer_type in ['BackwardTracer', 'FxTracer'] + self.tracer_type = tracer_type + if tracer_type == 'BackwardTracer': + self.tracer = BackwardTracer( + loss_calculator=SumPseudoLoss( + input_shape=self.demo_input.input_shape)) + elif tracer_type == 'FxTracer': + from mmrazor import digit_version + assert digit_version(torch.__version__) >= digit_version( + '1.12.0' + ), 'Please install torch>=1.12.0, if you want to use fx tracer.' 
+ self.tracer = MMFxTracer(leaf_module=self.default_leaf_modules) + else: + raise NotImplementedError() + + def analyze(self, model): + """Tracer the model, and return configs of channel dependency.""" + model = copy.deepcopy(model) + model = revert_sync_batchnorm(model) + model.eval() + if self.tracer_type == 'BackwardTracer': + path_list = self.tracer.trace(model) + module_graph: ModuleGraph = PathToGraphConverter(path_list, + model).graph + elif self.tracer_type == 'FxTracer': + fx_graph = self._fx_trace(model) + fx_graph.owning_module = model + base_graph = parse_torch_graph(fx_graph) + + module_graph = FxTracerToGraphConverter(base_graph, model).graph + module_graph._model = model + else: + raise NotImplementedError() + + module_graph.refresh_module_name() + module_graph.check(fix=True) + module_graph.check() + + channel_graph = ChannelGraph.copy_from(module_graph, + default_channel_node_converter) + channel_graph.check(fix=True) + channel_graph.check() + + channel_graph.forward(self.demo_input.input_shape[1]) + unit_configs = channel_graph.generate_units_config() + + return self._find_mutable_units(model, unit_configs) + + def _fx_trace(self, model): + """Tracer the model using fx tracer.""" + args = self.demo_input.get_data(model) + if isinstance(args, dict): + args.pop('inputs') + args['mode'] = 'tensor' + return self.tracer.trace(model, concrete_args=args) + else: + return self.tracer.trace(model) + + def _find_mutable_units(self, model: nn.Module, units_config: Dict): + """Test the tracer result and filter unforwardable units.""" + model = copy.deepcopy(model).cpu() + units: List[SequentialMutableChannelUnit] = [ + SequentialMutableChannelUnit.init_from_cfg(model, cfg) + for cfg in units_config.values() + ] + for unit in units: + unit.prepare_for_pruning(model) + mutable_units = [unit for unit in units if unit.is_mutable] + inputs = self.demo_input.get_data(model) + model.eval() + + template_output = None + if isinstance(inputs, dict): + for mode in 
['loss', 'tensor', 'predict']: + try: + inputs['mode'] = mode + template_output = model(**inputs) + break + except Exception as e: + print_log(f'Forward failed in {mode} mode as {e}') + else: + try: + template_output = model(inputs) + except Exception as e: + print_log(f'Forward failed in as {e}') + if template_output is None: + raise Exception( + 'Forward failed, there may be an error in demo input.', + f'{inputs}') + mutable_units = find_mutable(model, mutable_units, units, inputs, + template_output) + mutable_unit_config = {} + for unit in mutable_units: + mutable_unit_config[ + unit.name] = MutableChannelUnit.config_template( + unit, with_channels=True, with_init_args=True) + return mutable_unit_config diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/fx/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/fx/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..82f723f10b4a1a95301a78ceceaef0772c7516b1 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/fx/__init__.py @@ -0,0 +1,18 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from .custom_tracer import (CustomTracer, UntracedMethodRegistry, + build_graphmodule, custom_symbolic_trace) +from .graph_utils import (del_fakequant_after_function, + del_fakequant_after_method, + del_fakequant_after_module, del_fakequant_after_op, + del_fakequant_before_function, + del_fakequant_before_method, + del_fakequant_before_module, del_fakequant_before_op) + +__all__ = [ + 'CustomTracer', 'UntracedMethodRegistry', 'custom_symbolic_trace', + 'build_graphmodule', 'del_fakequant_before_module', + 'del_fakequant_after_module', 'del_fakequant_after_function', + 'del_fakequant_before_function', 'del_fakequant_after_op', + 'del_fakequant_before_op', 'del_fakequant_before_method', + 'del_fakequant_after_method' +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/fx/custom_tracer.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/fx/custom_tracer.py new file mode 100644 index 0000000000000000000000000000000000000000..68d5f08097211f1e7cfdbafd5d1ca93216d7aa87 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/fx/custom_tracer.py @@ -0,0 +1,477 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import functools +from copy import deepcopy +from types import FunctionType +from typing import Any, Callable, Dict, List, Optional, Tuple, Type, Union + +import torch +import torch.nn as nn + +try: + from torch._C import ScriptObject # type: ignore[attr-defined] + from torch.ao.quantization.quantize_fx import QuantizationTracer + from torch.fx import Graph, GraphModule, Tracer + from torch.fx._symbolic_trace import (_autowrap_check, + _patch_wrapped_functions, _Patcher) + from torch.fx.proxy import Proxy +except ImportError: + from mmrazor.utils import get_placeholder + ScriptObject = get_placeholder('torch>=1.13') + QuantizationTracer = get_placeholder('torch>=1.13') + GraphModule = get_placeholder('torch>=1.13') + Tracer = get_placeholder('torch>=1.13') + Graph = get_placeholder('torch>=1.13') + _autowrap_check = get_placeholder('torch>=1.13') + _patch_wrapped_functions = get_placeholder('torch>=1.13') + _Patcher = get_placeholder('torch>=1.13') + Proxy = get_placeholder('torch>=1.13') + +from mmengine.utils import import_modules_from_strings + +from mmrazor.registry import TASK_UTILS + +_orig_module_call: Callable = nn.Module.__call__ +_orig_module_getattr: Callable = nn.Module.__getattr__ + + +class UntracedMethodRegistry: + """A `Descriptor` class which records untraced methods. Thus, when the + class is traced with CustomTracer, the decorated method will be as a leaf + node, not be nested traced. + + Example: + >>> # `imported_cls` is the owner of the untraced method; + >>> # `method_str` is the name of the untraced method. + >>> method_registry = UntracedMethodRegistry(method) + >>> method_registry.__set_name__(imported_cls, method_str) + + Args: + method (FunctionType): Function to be registered. 
+ """ + method_dict: Dict = dict() + tracer = None + + def __init__(self, method: FunctionType): + self.method = method + self.owner = None + + def __set_name__(self, owner, name): + self.owner = owner + self.name = name + wrapped = self.method_wrapper() + self.method_dict[name] = dict(mod=self.owner, wrapped=wrapped) + + def method_wrapper(self): + + @functools.wraps(self.method) + def wrapped_method(mod, *args, **kwargs): + + def method(*args, **kwargs): + return self.method(mod, *args, **kwargs) + + return self.tracer.call_method(mod, self.name, method, args, + kwargs) + + return wrapped_method + + +def _prepare_module_dict(model: torch.nn.Module, fx_graph): + """If there is a class method that can not be traced by the symbolic + tracer, a ``call_method`` ``Node`` will be inserted into the ``Graph`` in + ``CustomTracer``. + + Example: + >>> class Model: + ... def __init__(self): + ... self.head = ClsHead() + ... + >>> class ClsHead(nn.Module): + ... def forward(self, feats: Tuple[torch.Tensor]) -> torch.Tensor: + ... return feats[-1] + ... + ... def loss(self, feats: Tuple[torch.Tensor], + ... data_samples: List[ClsDataSample], **kwargs) -> dict: + ... cls_score = self(feats) + ... # The part can not be traced by torch.fx + ... losses = self._get_loss(cls_score, data_samples, **kwargs) + ... return losses + ... + ... def _get_loss(self, cls_score: torch.Tensor, + ... data_samples: List[ClsDataSample], **kwargs): + ... if 'score' in data_samples[0].gt_label: + ... xxx + ... else: + ... xxx + ... losses = xxx + ... return losses + + As the ``_get_loss`` can not be traced by torch.fx, ``Toy._get_loss`` need + to be added to ``skipped_methods`` in ``CustomTracer``. Hence the code + above will product the following Graph:: + + .. code-block:: text + ... ... 
+ %head : [#users=1] = get_attr[target=head] + %_get_loss : [#users=1] = call_method[target=_get_loss](args = (%head, %head_fc, %data_samples), kwargs = {}) # noqa: E501 + return _get_loss + + Hence, the head module in the ``GraphModule`` and that in the original + model are the same one (refer to https://github.com/pytorch/pytorch/blob/master/torch/fx/graph_module.py#L346). # noqa: E501 + So changes made to the graph module (in ``prepare()``) will also modify + the original model. + + Args: + model (torch.nn.Module): Module or function to be + traced and converted into a Graph representation. + fx_graph (torch.fx.Graph): The fx Graph traced by fx tracer. It + contains the nodes this GraphModule should use for code generation. + """ + + def _get_attrs(target, attrs): + attrs = attrs.split('.') + for att in attrs: + target = getattr(target, att) + return target + + module_dict = dict() + special_nodes = [] + + for node in fx_graph.nodes: + if node.op == 'get_attr': + attr = _get_attrs(model, node.target) + if isinstance(attr, nn.Module): + module_dict[node.target] = nn.Module() + special_nodes.append(node) + elif node.op == 'call_method': + for special_node in special_nodes: + if special_node in node.args or \ + special_node in node.kwargs.values(): + origin_module = getattr(model, special_node.target) + setattr(module_dict[special_node.target], node.target, + getattr(origin_module, node.target)) + + return module_dict + + +def duplicate_reused_nodes(graph: Graph, modules: Dict[str, Any] = {}): + """Deepcopy the shared modules (e.g. shared detection head in RetinaNet) to + make sure modules can be fused correctly. 

    Modified from https://github.com/ModelTC/MQBench/blob/main/mqbench/prepare_by_platform.py # noqa: E501
    """
    _dup_prefix = '_dup'
    target_dict = dict()
    dup_modules = dict()
    # Group ``call_module`` nodes by the module target they invoke.
    for node in graph.nodes:
        if node.op == 'call_module':
            if node.target not in target_dict:
                target_dict[node.target] = [node]
            else:
                target_dict[node.target].append(node)
    # For every module called more than once, point each extra call site at a
    # fresh deep copy registered under a ``_dup<idx>``-suffixed target.
    for key in target_dict:
        if len(target_dict[key]) > 1:
            for idx, node in enumerate(target_dict[key]):
                if idx == 0:
                    continue
                module = deepcopy(modules[node.target])
                node.target += _dup_prefix + str(idx)
                dup_modules[node.target] = module
    graph.lint()
    return graph, dup_modules


def build_graphmodule(model: torch.nn.Module,
                      fx_graph,
                      name: str = 'GraphModule'):
    """To build GraphModule with the generated graph by CustomTracer. The
    implementation of skipping methods in CustomTracer will cause the conflict
    that a node is both a leaf node and a non-leaf node, which will lead to the
    modification to the ``graph`` also changing the original ``forward``.

    Args:
        model (torch.nn.Module): Module or function to be
            traced and converted into a Graph representation.
        fx_graph (torch.fx.Graph): The fx Graph traced by fx tracer. It
            contains the nodes this GraphModule should use for code generation.
        name (str): The name of generated GraphModule.

    Returns:
        GraphModule: GraphModule is an nn.Module generated from an fx.Graph.
            Graphmodule has a ``graph`` attribute, as well as ``code`` and
            ``forward`` attributes generated from that ``graph``.

    .. warning::
        When ``graph`` is reassigned, ``code`` and ``forward`` will be
        automatically regenerated. However, if you edit the contents of the
        ``graph`` without reassigning the ``graph`` attribute itself, you must
        call ``recompile()`` to update the generated code.
+ """ + modules = dict(model.named_modules()) + module_dict = _prepare_module_dict(model, fx_graph) + fx_graph, duplicated_modules = duplicate_reused_nodes(fx_graph, modules) + modules.update(module_dict) + modules.update(duplicated_modules) + return GraphModule(modules, fx_graph, name) + + +@TASK_UTILS.register_module() +class CustomTracer(QuantizationTracer): + """Custom tracer based on QuantizationTracer of pytorch. It can not only + skip some modules and classes while tracing, but also skip some methods + untraced by torch.fx.Tracer. + + Args: + skipped_methods (List[str], optional): Methods to be skipped while + tracing. Defaults to None. + skipped_module_names (List[str], optional): Modules to be skipped + while tracing. Defaults to None. + skipped_module_classes (List[Callable], optional): Class to be skipped + while tracing. Defaults to None. + """ + + def __init__(self, + skipped_methods: List[str] = [], + skipped_module_names: List[str] = [], + skipped_module_classes: List[Callable] = [], + *args, + **kwargs): + super(CustomTracer, self).__init__(skipped_module_names, + skipped_module_classes) + UntracedMethodRegistry.tracer = self # type: ignore + self.skipped_methods = skipped_methods + if self.skipped_methods: + self.register_skipped_methods() + + @staticmethod + def _check_valid_source(source): + """Check if the source's format is valid.""" + if not isinstance(source, str): + raise TypeError(f'source should be a str ' + f'instance, but got {type(source)}') + + assert len(source.split('.')) > 1, \ + 'source must have at least one `.`' + + def register_skipped_methods(self): + """Register skipped methods to UntracedMethodRegistry.method_dict.""" + if not isinstance(self.skipped_methods, list): + self.skipped_methods = [self.skipped_methods] + for s_method in self.skipped_methods: + self._check_valid_source(s_method) + mod_str = '.'.join(s_method.split('.')[:-2]) + cls_str = s_method.split('.')[-2] + method_str = s_method.split('.')[-1] + + try: + mod = 
import_modules_from_strings(mod_str) + except ImportError: + raise ImportError(f'{mod_str} is not imported correctly.') + + imported_cls: type = getattr(mod, cls_str) + if not isinstance(imported_cls, type): + raise TypeError(f'{cls_str} should be a type ' + f'instance, but got {type(imported_cls)}') + assert hasattr(imported_cls, method_str), \ + f'{method_str} is not in {mod_str}.' + + method = getattr(imported_cls, method_str) + + method_registry = UntracedMethodRegistry(method) + method_registry.__set_name__(imported_cls, method_str) + + def call_method(self, m: torch.nn.Module, name: str, method: Callable, + args: Tuple, kwargs: Dict): + """Method that specifies the behavior of this ``Tracer`` when it + encounters a call to an ``nn.Module`` instance. + + By default, the behavior is to check if the called module is a leaf + module via ``is_leaf_module``. If it is, emit a ``call_module`` + node referring to ``m`` in the ``Graph``. Otherwise, call the + ``Module`` normally, tracing through the operations in its ``forward`` + function. + + This method can be overridden to--for example--create nested traced + GraphModules, or any other behavior you would want while tracing across + ``Module`` boundaries. + + Args: + m (torch.nn.Module): The module for which a call is being emitted + name (str): The name of proxy to be created. + method (Callable): The method of the ``Module`` to be invoked + args (Tuple): args of the module callsite + kwargs (Dict): kwargs of the module callsite + + Return: + + The return value from the Module call. In the case that a + ``call_module`` node was emitted, this is a ``Proxy`` value. + Otherwise, it is whatever value was returned from the ``Module`` + invocation. 
+ """ + # module_qualified_name = self.path_of_module(m) + if not self.is_skipped_method(m): + return method(*args, **kwargs) + args_l = list(args) + args_l.insert(0, m) + args = tuple(args_l) + return self.create_proxy('call_method', name, args, kwargs) + + def trace(self, + root: Union[torch.nn.Module, Callable[..., Any]], + concrete_args: Optional[Dict[str, Any]] = None) -> Graph: + """Trace ``root`` and return the corresponding FX ``Graph`` + representation. ``root`` can either be an ``nn.Module`` instance or a + Python callable. Note that after this call, ``self.root`` may be + different from the ``root`` passed in here. For example, when a free + function is passed to ``trace()``, we will create an ``nn.Module`` + instance to use as the root and add embedded constants to. + + Args: + root (Union[Module, Callable]): Either a ``Module`` or a function + to be traced through. Backwards-compatibility for this + parameter is guaranteed. + concrete_args (Optional[Dict[str, any]]): Concrete arguments that + should not be treated as Proxies. This parameter is + experimental and its backwards-compatibility is *NOT* + guaranteed. + + Returns: + A ``Graph`` representing the semantics of the passed-in ``root``. + """ + if isinstance(root, torch.nn.Module): + self.root = root + fn = type(root).forward + self.submodule_paths: Optional[Dict[torch.nn.Module, str]] = { + mod: name + for name, mod in root.named_modules() + } + else: + self.root = nn.Module() + fn = root + + tracer_cls: Optional[Type['Tracer']] = getattr(self, '__class__', None) + self.graph = Graph(tracer_cls=tracer_cls) + + # When we encounter a Tensor value that's not a parameter, we look if + # it is some other attribute on the model. Construct a dict mapping + # Tensor values to the qualified name here for efficiency. 
This is + # used downstream in create_arg + self.tensor_attrs: Dict[Union[torch.Tensor, ScriptObject], str] = {} + + def collect_tensor_attrs(m: nn.Module, prefix_atoms: List[str]): + for k, v in m.__dict__.items(): + if isinstance(v, (torch.Tensor, ScriptObject)): + self.tensor_attrs[v] = '.'.join(prefix_atoms + [k]) + for k, v in m.named_children(): + collect_tensor_attrs(v, prefix_atoms + [k]) + + collect_tensor_attrs(self.root, []) + + assert isinstance(fn, FunctionType) + + fn_globals = fn.__globals__ # run before it gets patched + fn, args = self.create_args_for_root(fn, isinstance(root, nn.Module), + concrete_args) + + # Reduce number of get_attr calls + parameter_proxy_cache: Dict[str, Proxy] = {} + + # Method dispatch on parameters is not recorded unless it's directly + # used. Thus, we need to insert a proxy when __getattr__ requests a + # parameter. + @functools.wraps(_orig_module_getattr) + def module_getattr_wrapper(mod, attr): + attr_val = _orig_module_getattr(mod, attr) + return self.getattr(attr, attr_val, parameter_proxy_cache) + + @functools.wraps(_orig_module_call) + def module_call_wrapper(mod, *args, **kwargs): + + def forward(*args, **kwargs): + return _orig_module_call(mod, *args, **kwargs) + + _autowrap_check( + patcher, + getattr(getattr(mod, 'forward', mod), '__globals__', {}), + self._autowrap_function_ids) + return self.call_module(mod, forward, args, kwargs) + + with _Patcher() as patcher: + # allow duplicate patches to support the case of nested calls + patcher.patch_method( + nn.Module, + '__getattr__', + module_getattr_wrapper, + deduplicate=False) + patcher.patch_method( + nn.Module, '__call__', module_call_wrapper, deduplicate=False) + + for name, value in UntracedMethodRegistry.method_dict.items(): + wrapped = value['wrapped'] + patcher.patch_method( + value['mod'], name, wrapped, deduplicate=False) + + _patch_wrapped_functions(patcher) + _autowrap_check(patcher, fn_globals, self._autowrap_function_ids) + for module in 
self._autowrap_search: + _autowrap_check(patcher, module.__dict__, + self._autowrap_function_ids) + self.create_node( + 'output', + 'output', (self.create_arg(fn(*args)), ), {}, + type_expr=fn.__annotations__.get('return', None)) + + self.submodule_paths = None + + return self.graph + + def is_skipped_method(self, m: torch.nn.Module): + """Judge if ``m`` is registered skipped method.""" + mods = tuple(value['mod'] + for value in UntracedMethodRegistry.method_dict.values()) + custom = isinstance(m, mods) + return custom + + def is_leaf_module(self, m: torch.nn.Module, + module_qualified_name: str) -> bool: + """A method to specify whether a given ``nn.Module`` is a "leaf" + module. Leaf modules are the atomic units that appear in the IR, + referenced by ``call_module`` calls. By default, Modules in the PyTorch + standard library namespace (torch.nn) are leaf modules. All other + modules are traced through and their constituent ops are recorded, + unless specified otherwise via this parameter. + + Args: + m (Module): The module being queried about + module_qualified_name (str): The path to root of this module. + For example, if you have a module hierarchy where submodule + ``foo`` contains submodule ``bar``, which contains submodule + ``baz``, that module will appear with the qualified name + ``foo.bar.baz`` here. + """ + leaf = super().is_leaf_module(m, module_qualified_name) + return leaf + + +def custom_symbolic_trace( + root: Union[torch.nn.Module, Callable[..., Any]], + concrete_args: Optional[Dict[str, Any]] = None) -> GraphModule: + """Modified `symbolic_trace` function in pytorch. Given an ``nn.Module`` or + function instance ``root``, this function will return a ``GraphModule`` + constructed by recording operations seen while tracing through ``root``. + + Args: + root (torch.nn.Module): Module or function to be + traced and converted into a Graph representation. + concrete_args (Optional[Dict[str, any]]): Inputs to be partially + specialized. 
+ + Returns: + GraphModule: a Module created from the recorded operations from + ``root``. + """ + tracer = CustomTracer() + graph = tracer.trace(root, concrete_args) + name = root.__class__.__name__ if isinstance( + root, torch.nn.Module) else root.__name__ + return GraphModule(tracer.root, graph, name) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/fx/graph_utils.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/fx/graph_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..ca12917115fed7328ff7faae703da5cd5a3f3de9 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/fx/graph_utils.py @@ -0,0 +1,387 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy +from typing import Any, List, Tuple + +import torch + +try: + from torch.ao.quantization.fake_quantize import FakeQuantizeBase + from torch.fx import Node +except ImportError: + from mmrazor.utils import get_placeholder + FakeQuantizeBase = get_placeholder('torch>=1.13') + Node = get_placeholder('torch>=1.13') + + +def _get_attrs(target: torch.nn.Module, attr: str) -> Any: + """Get the attribute from target. + + Args: + target (torch.nn.Module): Get the attribute from target module. + attr (str): The target attribute. + + Returns: + Any: The target attribute. + """ + + attrs: List[str] = attr.split('.') + + for att in attrs: + target = getattr(target, att, None) + return target + + +def recursive_find_erased_nodes(node, prepared_model): + """Find FakeQuant before target node recursively. 
+ + Examples: + head_fc = self.head.fc(activation_post_process_87); \ + activation_post_process_87 = None + activation_post_process_88 = \ + self.activation_post_process_88(head_fc); head_fc = None + head = self.head + _get_loss = head._get_loss(activation_post_process_88, + data_samples); \ + head = activation_post_process_88 = data_samples = None + return _get_loss + + node | node.args + -------------------- + output | (_get_loss, ) + _get_loss | (head, activation_post_process_88, + data_samples) + head | () + activation_post_process_88 | (head_fc, ) + data_samples | (None, ) + """ + if node is None: + return [] + + if node.op == 'call_module' and isinstance( + _get_attrs(prepared_model, node.target), FakeQuantizeBase): + return [node] + + nodes_to_erase = [] + for prev_node in node.args: + if isinstance(prev_node, Node): + nodes_to_erase.extend( + recursive_find_erased_nodes(prev_node, prepared_model)) + for prev_node in node.kwargs.values(): + if isinstance(prev_node, Node): + nodes_to_erase.extend( + recursive_find_erased_nodes(prev_node, prepared_model)) + + return nodes_to_erase + + +def del_fakequant_before_op(prepared_model, + target_ops: Tuple, + inplace: bool = True): + """Delete useless fakequant before nodes whose ``op`` attribute (node.op) + is in `target_ops`. + + Args: + prepared_model (GraphModule): Prepared standalone module. + target_ops (tuple): Fakequants before nodes whose op attribute + (node.op) is in `target_ops` will be deleted. + inplace (bool): Can optionally do the operation in-place. Defaults to + True. + + Returns: + GraphModule: Prepared standalone module after deletion. 
+ """ + + if not inplace: + prepared_model = copy.deepcopy(prepared_model) + new_graph = copy.deepcopy(prepared_model.graph) + for node in new_graph.nodes: + if node.op in target_ops: + nodes_to_erase: List[Node] = recursive_find_erased_nodes( + node, prepared_model) + for to_erase in nodes_to_erase: + assert to_erase.op == 'call_module' and isinstance( + _get_attrs(prepared_model, to_erase.target), + FakeQuantizeBase) and len(to_erase.args) == 1 + to_erase.replace_all_uses_with(to_erase.args[0]) + new_graph.erase_node(to_erase) + delattr(prepared_model, to_erase.target) + + new_graph.lint() + prepared_model.graph = new_graph + return prepared_model + + +def del_fakequant_after_op(prepared_model, + target_ops: Tuple, + inplace: bool = True): + """Delete useless fakequant after nodes whose ``op`` attribute (node.op) is + in `target_ops`. + + Args: + prepared_model (GraphModule): Prepared standalone module. + target_ops (tuple): Fakequants after nodes whose op attribute + (node.op) is in `target_ops` will be deleted. + inplace (bool): Can optionally do the operation in-place. Defaults to + True. + + Returns: + GraphModule: Prepared standalone module after deletion. 
+ """ + if not inplace: + prepared_model = copy.deepcopy(prepared_model) + new_graph = copy.deepcopy(prepared_model.graph) + + target_nodes = [] + for node in new_graph.nodes: + if node.op in target_ops: + target_nodes.append(node) + + for node in new_graph.nodes: + if node.op == 'call_module' and isinstance( + _get_attrs(prepared_model, node.target), FakeQuantizeBase): + assert len(node.args) == 1 + prev_node = node.args[0] + if prev_node not in target_nodes: + continue + node.replace_all_uses_with(prev_node) + new_graph.erase_node(node) + delattr(prepared_model, node.target) + + new_graph.lint() + prepared_model.graph = new_graph + return prepared_model + + +def del_fakequant_before_method(prepared_model, + method_patterns: Tuple, + inplace: bool = True): + """Delete useless fakequant before nodes whose op attribute (node.op) is + `call_method` and target attribute (node.target) is in `target_patterns`. + + Args: + prepared_model (GraphModule): Prepared standalone module. + target_patterns (tuple): Fakequants before nodes whose op attribute + (node.op) is `call_method` and target attribute (node.target) is + in `target_patterns` will be deleted. + inplace (bool): Can optionally do the operation in-place. Defaults to + True. + + Returns: + GraphModule: Prepared standalone module after deletion. 
+ """ + if not inplace: + prepared_model = copy.deepcopy(prepared_model) + new_graph = copy.deepcopy(prepared_model.graph) + for node in new_graph.nodes: + if node.op == 'call_method' and node.target in method_patterns: + nodes_to_erase: List[Node] = recursive_find_erased_nodes( + node, prepared_model) + for to_erase in nodes_to_erase: + assert to_erase.op == 'call_module' and isinstance( + _get_attrs(prepared_model, to_erase.target), + FakeQuantizeBase) and len(to_erase.args) == 1 + to_erase.replace_all_uses_with(to_erase.args[0]) + new_graph.erase_node(to_erase) + delattr(prepared_model, to_erase.target) + + new_graph.lint() + prepared_model.graph = new_graph + return prepared_model + + +def del_fakequant_after_method(prepared_model, + method_patterns: Tuple, + inplace: bool = True): + """Delete useless fakequant after nodes whose op attribute (node.op) is + `call_method` and target attribute (node.target) is in `target_patterns`. + + Args: + prepared_model (GraphModule): Prepared standalone module. + target_patterns (tuple): Fakequants after nodes whose op attribute + (node.op) is `call_method` and target attribute (node.target) + is in `target_patterns` will be deleted. + inplace (bool): Can optionally do the operation in-place. Defaults to + True. + + Returns: + GraphModule: Prepared standalone module after deletion. 
+ """ + if not inplace: + prepared_model = copy.deepcopy(prepared_model) + new_graph = copy.deepcopy(prepared_model.graph) + + target_nodes = [] + for node in new_graph.nodes: + if node.op == 'call_method' and node.target in method_patterns: + target_nodes.append(node) + + for node in new_graph.nodes: + if node.op == 'call_module' and isinstance( + _get_attrs(prepared_model, node.target), FakeQuantizeBase): + assert len(node.args) == 1 + prev_node = node.args[0] + if prev_node not in target_nodes: + continue + node.replace_all_uses_with(prev_node) + new_graph.erase_node(node) + delattr(prepared_model, node.target) + + new_graph.lint() + prepared_model.graph = new_graph + return prepared_model + + +def del_fakequant_before_function(prepared_model, + function_patterns: Tuple, + inplace: bool = True): + """Delete useless fakequant before nodes whose op attribute (node.op) is + `call_function` and target attribute (node.target) is in `target_patterns`. + + Args: + prepared_model (GraphModule): Prepared standalone module. + target_patterns (tuple): Fakequants before nodes whose op attribute + (node.op) is `call_function` and target attribute (node.target) is + in `target_patterns` will be deleted. + inplace (bool): Can optionally do the operation in-place. Defaults to + True. + + Returns: + GraphModule: Prepared standalone module after deletion. 
+ """ + if not inplace: + prepared_model = copy.deepcopy(prepared_model) + new_graph = copy.deepcopy(prepared_model.graph) + for node in new_graph.nodes: + if node.op == 'call_function' and node.target in function_patterns: + nodes_to_erase: List[Node] = recursive_find_erased_nodes( + node, prepared_model) + for to_erase in nodes_to_erase: + assert to_erase.op == 'call_module' and isinstance( + _get_attrs(prepared_model, to_erase.target), + FakeQuantizeBase) and len(to_erase.args) == 1 + to_erase.replace_all_uses_with(to_erase.args[0]) + new_graph.erase_node(to_erase) + delattr(prepared_model, to_erase.target) + + new_graph.lint() + prepared_model.graph = new_graph + return prepared_model + + +def del_fakequant_after_function(prepared_model, + function_patterns: Tuple, + inplace: bool = True): + """Delete useless fakequant after nodes whose op attribute (node.op) is + `call_function` and target attribute (node.target) is in `target_patterns`. + + Args: + prepared_model (GraphModule): Prepared standalone module. + function_patterns (tuple): Fakequants after nodes whose op attribute + (node.op) is `call_function` and target attribute (node.target) is + in `target_patterns` will be deleted. + inplace (bool): Can optionally do the operation in-place. Defaults to + True. + + Returns: + GraphModule: Prepared standalone module after deletion. 
+ """ + if not inplace: + prepared_model = copy.deepcopy(prepared_model) + new_graph = copy.deepcopy(prepared_model.graph) + + target_nodes = [] + for node in new_graph.nodes: + if node.op == 'call_function' and node.target in function_patterns: + target_nodes.append(node) + + for node in new_graph.nodes: + if node.op == 'call_module' and isinstance( + _get_attrs(prepared_model, node.target), FakeQuantizeBase): + assert len(node.args) == 1 + prev_node = node.args[0] + if prev_node not in target_nodes: + continue + node.replace_all_uses_with(prev_node) + new_graph.erase_node(node) + delattr(prepared_model, node.target) + + new_graph.lint() + prepared_model.graph = new_graph + return prepared_model + + +def del_fakequant_before_module(prepared_model, + module_patterns: Tuple, + inplace: bool = True): + """Delete useless fakequant before modules whose type are in + `module_patterns`. + + Args: + prepared_model (GraphModule): Prepared standalone module. + target_patterns (tuple): Fakequants before modules whose type is in + `module_patterns` will be deleted. + inplace (bool): Can optionally do the operation in-place. + Defaults to True. + + Returns: + GraphModule: Prepared standalone module after deletion. 
+ """ + if not inplace: + prepared_model = copy.deepcopy(prepared_model) + new_graph = copy.deepcopy(prepared_model.graph) + for node in new_graph.nodes: + if node.op == 'call_module' and isinstance( + _get_attrs(prepared_model, node.target), module_patterns): + to_erase = node.args[0] + if not (to_erase.op == 'call_module' and isinstance( + _get_attrs(prepared_model, to_erase.target), + FakeQuantizeBase)): + continue + to_erase.replace_all_uses_with(to_erase.args[0]) + new_graph.erase_node(to_erase) + delattr(prepared_model, to_erase.target) + + new_graph.lint() + prepared_model.graph = new_graph + return prepared_model + + +def del_fakequant_after_module(prepared_model, + module_patterns: Tuple, + inplace: bool = True): + """Delete useless fakequant after modules whose type are in + `module_patterns`. + + Args: + prepared_model (GraphModule): Prepared standalone module. + target_patterns (tuple): Fakequants after modules whose type is in + `module_patterns` will be deleted. + inplace (bool): Can optionally do the operation in-place. + Defaults to True. + + Returns: + GraphModule: Prepared standalone module after deletion. 
+ """ + if not inplace: + prepared_model = copy.deepcopy(prepared_model) + new_graph = copy.deepcopy(prepared_model.graph) + target_nodes = [] + for node in new_graph.nodes: + if node.op == 'call_module' and isinstance( + _get_attrs(prepared_model, node.target), module_patterns): + target_nodes.append(node) + + for node in new_graph.nodes: + if node.op == 'call_module' and isinstance( + _get_attrs(prepared_model, node.target), FakeQuantizeBase): + assert len(node.args) == 1 + prev_node = node.args[0] + if prev_node not in target_nodes: + continue + node.replace_all_uses_with(prev_node) + new_graph.erase_node(node) + delattr(prepared_model, node.target) + + new_graph.lint() + prepared_model.graph = new_graph + return prepared_model diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/fx_tracer.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/fx_tracer.py new file mode 100644 index 0000000000000000000000000000000000000000..23b4c3325b77445687a6bc434584fb2c7a6c7184 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/fx_tracer.py @@ -0,0 +1,359 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+"""This module define FxTracer and related classes.""" +# flake8: noqa +import functools +import inspect +import sys +import types +from types import FunctionType, ModuleType +from typing import Any, Callable, Dict, List, Optional, Set, Tuple, Type, Union + +import torch +import torch.nn as nn +from mmengine import MMLogger +from torch._C import ScriptObject # type: ignore[attr-defined] + +from mmrazor.utils import get_placeholder + +try: + from torch.fx._symbolic_trace import (Tracer, _find_proxy, + _orig_module_call, + _orig_module_getattr, + _patch_wrapped_functions, _Patcher) + from torch.fx.graph import Graph + from torch.fx.node import Argument + from torch.fx.proxy import Proxy +except ImportError: + Tracer = get_placeholder('torch>=1.12') + _find_proxy = get_placeholder('torch>=1.12') + _orig_module_call = get_placeholder('torch>=1.12') + _orig_module_getattr = get_placeholder('torch>=1.12') + _patch_wrapped_functions = get_placeholder('torch>=1.12') + _Patcher = get_placeholder('torch>=1.12') + Graph = get_placeholder('torch>=1.12') + Argument = get_placeholder('torch>=1.12') + Proxy = get_placeholder('torch>=1.12') + +from mmrazor import digit_version + +sys.setrecursionlimit(int(pow(2, 20))) + +logger = MMLogger.get_current_instance() + + +def _autowrap_check(patcher: _Patcher, frame_dict: Dict[str, Any], + function_ids: Set[int]): + auto_wrapper = AutoWrapper(patcher) + auto_wrapper.wrap(None, '', frame_dict) + + +def auto_wrap(patcher, owner): + auto_wrapper = AutoWrapper(patcher) + auto_wrapper.wrap(None, '', owner) + + +class AutoWrapper: + + def __init__(self, patcher) -> None: + self.patcher: _Patcher = patcher + + # wrap + + def wrap(self, owner, name, val): + + def is_method(val): + return (inspect.ismethod(val) or inspect.isfunction(val) + or isinstance(val, types.BuiltinFunctionType) + or isinstance(val, staticmethod) + or isinstance(val, classmethod)) + + if owner is None and isinstance(val, dict): + self.wrap_frame(owner, name, val) + else: 
+ # class + if inspect.isclass(val): + self.wrap_class(owner, name, val) + # method + elif inspect.isclass(owner) and is_method(val): + self.wrap_method(owner, name, val) + # function + elif inspect.isfunction(val) or isinstance( + val, types.BuiltinFunctionType): + self.wrap_function(owner, name, val) + # package + elif isinstance(val, ModuleType): + self.wrap_module(owner, name, val) + # instance + elif isinstance(val, object): + self.wrap_class(None, '', type(val)) + # else + else: + logger.debug(f'unsupported type to wrap: {name}/{type(val)}') + + def wrap_frame(self, owner, name: str, val: dict): + assert isinstance(val, dict) + + if self.patcher.visit_once(val): + frame_name = val['__name__'] if '__name__' in val else '' + logger.debug(f'wrap a frame {frame_name}') + for key in val: + self.wrap(val, key, val[key]) + + def wrap_module(self, owner, name, val): + if self.visit_once(val): + if val in [torch]: + logger.debug(f'wrap a module {owner[name]}') + self.wrap(None, '', val.__dict__) + + def wrap_class(self, owner, name, val): + assert inspect.isclass(val) + if issubclass(val, nn.Module): + if self.visit_once(val): + logger.debug(f'wrap a class {val}') + for key in val.__dict__: + key: str + if not (key.startswith('__')): + self.wrap(val, key, val.__dict__[key]) + + def wrap_function(self, owner, name, val): + if self.visit_once(val): + self.patcher.patch(owner, name, self.func_wapper(val)) + logger.debug(f'wrap a function {name}') + + def wrap_method(self, owner, name, val): + assert inspect.isclass(owner) + if self.visit_once(val): + try: + if isinstance(val, staticmethod): + pass + logger.debug(f'wrap a staticmethod {name} (unimplement)') + elif isinstance(val, classmethod): + pass + logger.debug(f'wrap a classmethod {name} (unimplement)') + else: + self.patcher.patch_method(owner, name, + self.method_wrapper(val)) + logger.debug(f'wrap an instance method {name}') + except Exception: + self.patcher.patches_made.pop() + + # wrapper + def 
func_wapper(self, orig_fn): + + @functools.wraps(orig_fn) + def wrapped(*args, **kwargs): + """Given an closed-over ``orig_function`` to invoke, search the + args and kwargs for a Proxy object. + + If there is one, emit a ``call_function`` node to preserve the call + to this leaf function directly. Otherwise, just return the results + of this function call, as this function is not being traced. + """ + _autowrap_check(self.patcher, getattr(orig_fn, '__globals__', {}), + set()) + try: + end = orig_fn(*args, **kwargs) + return end + except Exception: + logger.debug(f'auto wrap {orig_fn}') + proxy = _find_proxy(args, kwargs) + if proxy is not None: + return_proxy = proxy.tracer.create_proxy( + 'call_function', orig_fn, args, kwargs) + return_proxy.node.meta['is_wrapped'] = True + return return_proxy + else: + return orig_fn(*args, **kwargs) + + return wrapped + + def method_wrapper(self, orig_fn): + + @functools.wraps(orig_fn) + def wrapped(*args, **kwargs): + """Given an closed-over ``orig_function`` to invoke, search the + args and kwargs for a Proxy object. + + If there is one, emit a ``call_function`` node to preserve the call + to this leaf function directly. Otherwise, just return the results + of this function call, as this function is not being traced. 
+ """ + _autowrap_check(self.patcher, getattr(orig_fn, '__globals__', {}), + set()) + # logger.debug(f'call method {orig_fn}') + try: + end = orig_fn(*args, **kwargs) + return end + except Exception: + logger.debug(f'auto wrap {orig_fn}') + proxy: Proxy = _find_proxy(args, kwargs) + if proxy is not None: + return_proxy = proxy.tracer.create_proxy( + 'call_method', orig_fn.__name__, args, kwargs) + return_proxy.node.meta['is_wrapped'] = True + return return_proxy + else: + return orig_fn(*args, **kwargs) + + return wrapped + + # others + def visit_once(self, obj): + return self.patcher.visit_once(obj) + + def is_visited(self, obj): + id_ = id(obj) + return id_ in self.patcher.visited + + +class FxTracer(Tracer): + + def trace(self, + root: Union[torch.nn.Module, Callable[..., Any]], + concrete_args: Optional[Dict[str, Any]] = None) -> Graph: + """Please refer to torch.fx._symbolic_trace.Tracer.""" + if isinstance(root, torch.nn.Module): + self.root = root + + assert hasattr(type(root), self.traced_func_name), ( + f"traced_func_name={self.traced_func_name} doesn't exist in {type(root).__name__}" # noqa + ) # noqa + + fn = getattr(type(root), self.traced_func_name) + self.submodule_paths = { + mod: name + for name, mod in root.named_modules() + } + else: + self.root = torch.nn.Module() + fn = root + + tracer_cls: Optional[Type['Tracer']] = getattr(self, '__class__', None) + self.graph = Graph(tracer_cls=tracer_cls) + + # When we encounter a Tensor value that's not a parameter, we look if it + # is some other attribute on the model. Construct a dict mapping Tensor + # values to the qualified name here for efficiency. 
This is used downstream + # in create_arg + self.tensor_attrs: Dict[Union[torch.Tensor, ScriptObject], str] = {} + + def collect_tensor_attrs(m: torch.nn.Module, prefix_atoms: List[str]): + for k, v in m.__dict__.items(): + if isinstance(v, (torch.Tensor, ScriptObject)): + self.tensor_attrs[v] = '.'.join(prefix_atoms + [k]) + for k, v in m.named_children(): + collect_tensor_attrs(v, prefix_atoms + [k]) + + collect_tensor_attrs(self.root, []) + + assert isinstance(fn, FunctionType) + + fn_globals = fn.__globals__ # run before it gets patched + fn, args = self.create_args_for_root(fn, + isinstance(root, torch.nn.Module), + concrete_args) + + parameter_proxy_cache: Dict[str, Proxy] = { + } # Reduce number of get_attr calls + + # Method dispatch on parameters is not recorded unless it's directly used. + # Thus, we need to insert a proxy when __getattr__ requests a parameter. + @functools.wraps(_orig_module_getattr) + def module_getattr_wrapper(mod, attr): + attr_val = _orig_module_getattr(mod, attr) + ######################################################################## + if digit_version(torch.__version__) >= digit_version('1.13.0'): + return self.getattr(attr, attr_val, parameter_proxy_cache) + else: + return self._module_getattr(attr, attr_val, + parameter_proxy_cache) + ######################################################################## + @functools.wraps(_orig_module_call) + def module_call_wrapper(mod, *args, **kwargs): + + def forward(*args, **kwargs): + return _orig_module_call(mod, *args, **kwargs) + + _autowrap_check( + patcher, + getattr(getattr(mod, 'forward', mod), '__globals__', {}), + self._autowrap_function_ids) + ######################################################################## + auto_wrap(patcher, mod) + ######################################################################## + + return self.call_module(mod, forward, args, kwargs) + + with _Patcher() as patcher: + # allow duplicate patches to support the case of nested calls + 
patcher.patch_method( + torch.nn.Module, + '__getattr__', + module_getattr_wrapper, + deduplicate=False) + patcher.patch_method( + torch.nn.Module, + '__call__', + module_call_wrapper, + deduplicate=False) + _patch_wrapped_functions(patcher) + ######################################################################## + patcher.visit_once(globals()) + auto_wrap(patcher, self.root) + ######################################################################## + _autowrap_check(patcher, fn_globals, self._autowrap_function_ids) + for module in self._autowrap_search: + _autowrap_check(patcher, module.__dict__, + self._autowrap_function_ids) + self.create_node( + 'output', + 'output', (self.create_arg(fn(*args)), ), {}, + type_expr=fn.__annotations__.get('return', None)) + + self.submodule_paths = None # type:ignore + + return self.graph + + def call_module(self, m: torch.nn.Module, forward: Callable[..., Any], + args: Tuple[Any, ...], kwargs: Dict[str, Any]) -> Any: + + try: + return super().call_module(m, forward, args, kwargs) + except Exception: + module_qualified_name = self.path_of_module(m) + return self.create_proxy('call_module', module_qualified_name, + args, kwargs) + + def create_arg(self, a: Any) -> 'Argument': + try: + arg = super().create_arg(a) + return arg + except Exception: + return a + + +class MMFxTracer(FxTracer): + + def __init__( + self, + autowrap_modules: Tuple = (), + autowrap_functions: Tuple[Callable, ...] 
= (), + param_shapes_constant: bool = False, + leaf_module: Tuple = (), + ) -> None: + super().__init__(autowrap_modules, autowrap_functions, + param_shapes_constant) + + self.leaf_module = leaf_module + + def is_leaf_module(self, m: torch.nn.Module, + module_qualified_name: str) -> bool: + is_torch_module = super().is_leaf_module(m, module_qualified_name) + + is_leaf = False + for module_type in self.leaf_module: + if isinstance(m, module_type): + is_leaf = True + break + + return is_leaf or is_torch_module diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/loss_calculator/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/loss_calculator/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..f013c8cb96db5aa2311caac8b54bb6c6374b9bbd --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/loss_calculator/__init__.py @@ -0,0 +1,16 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from .cascade_encoder_decoder_loss_calculator import \ + CascadeEncoderDecoderPseudoLoss +from .image_classifier_loss_calculator import ImageClassifierPseudoLoss +from .single_stage_detector_loss_calculator import \ + SingleStageDetectorPseudoLoss +from .sum_loss_calculator import SumPseudoLoss +from .top_down_pose_estimator_loss_calculator import \ + TopdownPoseEstimatorPseudoLoss +from .two_stage_detector_loss_calculator import TwoStageDetectorPseudoLoss + +__all__ = [ + 'ImageClassifierPseudoLoss', 'SingleStageDetectorPseudoLoss', + 'TwoStageDetectorPseudoLoss', 'TopdownPoseEstimatorPseudoLoss', + 'CascadeEncoderDecoderPseudoLoss', 'SumPseudoLoss' +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/loss_calculator/cascade_encoder_decoder_loss_calculator.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/loss_calculator/cascade_encoder_decoder_loss_calculator.py new file mode 100644 index 0000000000000000000000000000000000000000..f4f60c843a0a20bfce35cfe048a8f72fe57159a2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/loss_calculator/cascade_encoder_decoder_loss_calculator.py @@ -0,0 +1,26 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch + +from mmrazor.registry import TASK_UTILS + +try: + from mmseg.models import CascadeEncoderDecoder +except ImportError: + from mmrazor.utils import get_placeholder + CascadeEncoderDecoder = get_placeholder('mmseg') + + +@TASK_UTILS.register_module() +class CascadeEncoderDecoderPseudoLoss: + """Calculate the pseudo loss to trace the topology of a + `CascadeEncoderDecoder` in MMSegmentation with `BackwardTracer`.""" + + def __call__(self, model: CascadeEncoderDecoder) -> torch.Tensor: + pseudo_img = torch.rand(1, 3, 224, 224) + pseudo_output = model.backbone(pseudo_img) + pseudo_output = model.neck(pseudo_output) + # unmodified decode_heads + out = torch.tensor(0.) 
+ for levels in pseudo_output: + out += sum([level.sum() for level in levels]) + return out diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/loss_calculator/image_classifier_loss_calculator.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/loss_calculator/image_classifier_loss_calculator.py new file mode 100644 index 0000000000000000000000000000000000000000..65e908e30ed90e34555af0c2d07138050f9d9faa --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/loss_calculator/image_classifier_loss_calculator.py @@ -0,0 +1,29 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch + +from mmrazor.registry import TASK_UTILS + +try: + from mmcls.models import ImageClassifier +except ImportError: + from mmrazor.utils import get_placeholder + ImageClassifier = get_placeholder('mmcls') + + +@TASK_UTILS.register_module() +class ImageClassifierPseudoLoss: + """Calculate the pseudo loss to trace the topology of a `ImageClassifier` + in MMClassification with `BackwardTracer`. + + Args: + input_shape (Tuple): The shape of the pseudo input. Defaults to + (2, 3, 224, 224). + """ + + def __init__(self, input_shape=(2, 3, 224, 224)): + self.input_shape = input_shape + + def __call__(self, model: ImageClassifier) -> torch.Tensor: + pseudo_img = torch.rand(self.input_shape) + pseudo_output = model(pseudo_img) + return pseudo_output.sum() diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/loss_calculator/single_stage_detector_loss_calculator.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/loss_calculator/single_stage_detector_loss_calculator.py new file mode 100644 index 0000000000000000000000000000000000000000..f8554580d2e45ff544312b79f574715a90293d14 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/loss_calculator/single_stage_detector_loss_calculator.py @@ -0,0 +1,33 @@ +# Copyright (c) OpenMMLab. 
All rights reserved. +import torch + +from mmrazor.registry import TASK_UTILS + +try: + from mmdet.models import SingleStageDetector +except ImportError: + from mmrazor.utils import get_placeholder + SingleStageDetector = get_placeholder('mmdet') + + +@TASK_UTILS.register_module() +class SingleStageDetectorPseudoLoss: + """Calculate the pseudo loss to trace the topology of a + `SingleStageDetector` in MMDetection with `BackwardTracer`. + + Args: + input_shape (Tuple): The shape of the pseudo input. Defaults to + (2, 3, 224, 224). + """ + + def __init__(self, input_shape=(2, 3, 224, 224)): + self.input_shape = input_shape + + def __call__(self, model: SingleStageDetector) -> torch.Tensor: + pseudo_img = torch.rand(self.input_shape) + pseudo_output = model(pseudo_img) + out = torch.tensor(0.) + for levels in pseudo_output: + out += sum([level.sum() for level in levels]) + + return out diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/loss_calculator/sum_loss_calculator.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/loss_calculator/sum_loss_calculator.py new file mode 100644 index 0000000000000000000000000000000000000000..408b687b14fdf28ee814c46fd1975da7f7e7526a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/loss_calculator/sum_loss_calculator.py @@ -0,0 +1,42 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch + +from mmrazor.registry import TASK_UTILS + + +@TASK_UTILS.register_module() +class SumPseudoLoss: + """Calculate the pseudo loss to trace the topology by summing all output + tensors. + + Args: + input_shape (Tuple): The shape of the pseudo input. Defaults to + (2, 3, 224, 224). 
+ """ + + def __init__(self, input_shape=(2, 3, 224, 224)): + self.input_shape = input_shape + + def __call__(self, model) -> torch.Tensor: + pseudo_img = torch.rand(self.input_shape) + model.eval() + pseudo_output = model(pseudo_img) + return self._sum_of_output(pseudo_output) + + def _sum_of_output(self, tensor): + """Get a loss by summing all tensors.""" + if isinstance(tensor, torch.Tensor): + return tensor.sum() + elif isinstance(tensor, list) or isinstance(tensor, tuple): + loss = 0 + for t in tensor: + loss = loss + self._sum_of_output(t) + return loss + elif isinstance(tensor, dict): + loss = 0 + for t in tensor.values(): + loss = loss + self._sum_of_output(t) + return loss + else: + raise NotImplementedError( + f'unsuppored type{type(tensor)} to get shape of tensors.') diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/loss_calculator/top_down_pose_estimator_loss_calculator.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/loss_calculator/top_down_pose_estimator_loss_calculator.py new file mode 100644 index 0000000000000000000000000000000000000000..9720194f4e3b319106bb374ed714c4c77e1a12ac --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/loss_calculator/top_down_pose_estimator_loss_calculator.py @@ -0,0 +1,25 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import torch + +from mmrazor.registry import TASK_UTILS + +try: + from mmpose.models import TopdownPoseEstimator +except ImportError: + from mmrazor.utils import get_placeholder + TopdownPoseEstimator = get_placeholder('mmpose') + + +@TASK_UTILS.register_module() +class TopdownPoseEstimatorPseudoLoss: + """Calculate the pseudo loss to trace the topology of a + `TopdownPoseEstimator` in MMPose with `BackwardTracer`.""" + + def __call__(self, model: TopdownPoseEstimator) -> torch.Tensor: + pseudo_img = torch.rand(1, 3, 224, 224) + pseudo_output = model.backbone(pseudo_img) + # immutable decode_heads + out = torch.tensor(0.) + for levels in pseudo_output: + out += sum([level.sum() for level in levels]) + return out diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/loss_calculator/two_stage_detector_loss_calculator.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/loss_calculator/two_stage_detector_loss_calculator.py new file mode 100644 index 0000000000000000000000000000000000000000..97ff7d2826c8cc033bb11f28ea692c5199073e9c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/loss_calculator/two_stage_detector_loss_calculator.py @@ -0,0 +1,27 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch + +from mmrazor.registry import TASK_UTILS + +try: + from mmdet.models import TwoStageDetector +except ImportError: + from mmrazor.utils import get_placeholder + TwoStageDetector = get_placeholder('mmdet') + + +# todo: adapt to mmdet 2.0 +@TASK_UTILS.register_module() +class TwoStageDetectorPseudoLoss: + """Calculate the pseudo loss to trace the topology of a `TwoStageDetector` + in MMDet with `BackwardTracer`.""" + + def __call__(self, model: TwoStageDetector) -> torch.Tensor: + pseudo_img = torch.rand(1, 3, 224, 224) + pseudo_output = model.backbone(pseudo_img) + pseudo_output = model.neck(pseudo_output) + out = torch.tensor(0.) 
+ for levels in pseudo_output: + out += sum([level.sum() for level in levels]) + + return out diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/parsers.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/parsers.py new file mode 100644 index 0000000000000000000000000000000000000000..c342da71689b3675657322aee91a9459bad1d345 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/parsers.py @@ -0,0 +1,187 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy +from typing import Callable, Dict + +from .path import (Path, PathConcatNode, PathConvNode, PathDepthWiseConvNode, + PathLinearNode, PathList, PathNormNode) + + +def _is_leaf_grad_fn(grad_fn): + """Determine whether the current node is a leaf node.""" + if type(grad_fn).__name__ == 'AccumulateGrad': + return True + return False + + +def parse_conv(tracer, grad_fn, module2name, param2module, cur_path, + result_paths, visited, shared_module): + """Parse the backward of a conv layer. 
+ + Example: + >>> conv = nn.Conv2d(3, 3, 3) + >>> pseudo_img = torch.rand(1, 3, 224, 224) + >>> out = conv(pseudo_img) + >>> out.grad_fn.next_functions + ((None, 0), (, 0), + (, 0)) + >>> # op.next_functions[0][0] is None means this ThnnConv2DBackward + >>> # op has no parents + >>> # op.next_functions[1][0].variable is the weight of this Conv2d + >>> # module + >>> # op.next_functions[2][0].variable is the bias of this Conv2d + >>> # module + """ + leaf_grad_fn = grad_fn.next_functions[1][0] + while not _is_leaf_grad_fn(leaf_grad_fn): + leaf_grad_fn = leaf_grad_fn.next_functions[0][0] + variable = leaf_grad_fn.variable + param_id = id(variable) + module = param2module[param_id] + name = module2name[module] + parent = grad_fn.next_functions[0][0] + if module.in_channels == module.groups: + cur_path.append(PathDepthWiseConvNode(name)) + else: + cur_path.append(PathConvNode(name)) + # If a module is not a shared module and it has been visited during + # forward, its parent modules must have been traced already. + # However, a shared module will be visited more than once during + # forward, so it is still need to be traced even if it has been + # visited. + if visited[name] and name not in shared_module: + result_paths.append(copy.deepcopy(cur_path)) + else: + visited[name] = True + tracer.backward_trace(parent, module2name, param2module, cur_path, + result_paths, visited, shared_module) + cur_path.pop(-1) + + +# todo: support parsing `MultiheadAttention` and user-defined matrix +# multiplication +def parse_linear(tracer, grad_fn, module2name, param2module, cur_path, + result_paths, visited, shared_module): + """Parse the backward of a conv layer. 
def parse_cat(tracer, grad_fn, module2name, param2module, cur_path,
              result_paths, visited, shared_module):
    """Parse the backward of a concat operation.

    Example:
        >>> conv = nn.Conv2d(3, 3, 3)
        >>> pseudo_img = torch.rand(1, 3, 224, 224)
        >>> out1 = conv(pseudo_img)
        >>> out2 = conv(pseudo_img)
        >>> out = torch.cat([out1, out2], dim=1)
        >>> out.grad_fn.next_functions
        ((<ThnnConv2DBackward>, 0),
        (<ThnnConv2DBackward>, 0))
        >>> # the length of ``out.grad_fn.next_functions`` is two means
        >>> # ``out`` is obtained by concatenating two tensors
    """
    parents = grad_fn.next_functions
    # Build a deterministic identifier for this concat node: sort the parent
    # ids so the name does not depend on tuple ordering. (The previous
    # unsorted '_'.join was dead code, immediately overwritten.)
    concat_id_list = [str(id(p)) for p in parents]
    concat_id_list.sort()
    concat_id = '_'.join(concat_id_list)
    name = f'concat_{concat_id}'

    visited[name] = True
    # Trace each concatenated branch into its own PathList.
    sub_path_lists = list()
    for parent in parents:
        sub_path_list = PathList()
        tracer.backward_trace(parent, module2name, param2module, Path(),
                              sub_path_list, visited, shared_module)
        sub_path_lists.append(sub_path_list)
    cur_path.append(PathConcatNode(name, sub_path_lists))

    result_paths.append(copy.deepcopy(cur_path))
    cur_path.pop(-1)
+ + Example: + >>> conv = nn.Conv2d(3, 3, 3) + >>> pseudo_img = torch.rand(1, 3, 224, 224) + >>> out1 = conv(pseudo_img) + >>> out2 = conv(pseudo_img) + >>> out = torch.cat([out1, out2], dim=1) + >>> out.grad_fn.next_functions + ((, 0), + (, 0)) + >>> # the length of ``out.grad_fn.next_functions`` is two means + >>> # ``out`` is obtained by concatenating two tensors + """ + leaf_grad_fn = grad_fn.next_functions[1][0] + while not _is_leaf_grad_fn(leaf_grad_fn): + leaf_grad_fn = leaf_grad_fn.next_functions[0][0] + variable = leaf_grad_fn.variable + param_id = id(variable) + module = param2module[param_id] + name = module2name[module] + parent = grad_fn.next_functions[0][0] + cur_path.append(PathNormNode(name)) + + visited[name] = True + tracer.backward_trace(parent, module2name, param2module, cur_path, + result_paths, visited, shared_module) + cur_path.pop(-1) + + +DEFAULT_BACKWARD_TRACER: Dict[str, Callable] = { + 'ConvolutionBackward': parse_conv, + 'SlowConv2DBackward': parse_conv, + 'ThnnConv2DBackward': parse_conv, + 'CudnnConvolutionBackward': parse_conv, + 'MkldnnConvolutionBackward': parse_conv, + 'SlowConvDilated2DBackward': parse_conv, + 'ThAddmmBackward': parse_linear, + 'AddmmBackward': parse_linear, + 'MmBackward': parse_linear, + 'CatBackward': parse_cat, + 'ThnnBatchNormBackward': parse_norm, + 'CudnnBatchNormBackward': parse_norm, + 'NativeBatchNormBackward': parse_norm, + 'NativeGroupNormBackward': parse_norm +} diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/path.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/path.py new file mode 100644 index 0000000000000000000000000000000000000000..c6597703f4b2811487efc9a11e5b211822a3b6a9 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/task_modules/tracer/path.py @@ -0,0 +1,359 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from typing import Dict, List, Optional, Tuple, Union + + +def _addindent(s_, numSpaces): + s = s_.split('\n') + # don't do anything for single-line stuff + if len(s) == 1: + return s_ + first = s.pop(0) + s = [(numSpaces * ' ') + line for line in s] + s = '\n'.join(s) + s = first + '\n' + s + return s + + +def _merge_node_parents(node2parents, _node2parents): + for node, parents in _node2parents.items(): + if node in node2parents: + cur_parents = node2parents[node] + new_parents_set = set(cur_parents + parents) + new_parents = list(new_parents_set) + node2parents[node] = new_parents + else: + node2parents[node] = parents + + +class PathNode: + """``Node`` is the data structure that represents individual instances + within a ``Path``. It corresponds to a module or an operation such as + concatenation in the model. + + Args: + name (str): Unique identifier of a node. + """ + + def __init__(self, name: str) -> None: + self._name = name + + def get_module_names(self) -> List: + return [self.name] + + @property + def name(self) -> str: + """Get the name of current node.""" + return self._name + + def _get_class_name(self): + return self.__class__.__name__ + + def __eq__(self, other): + if isinstance(other, self.__class__): + return self.name == other.name + else: + return False + + def __hash__(self): + return hash(self.name) + + def __repr__(self): + return f'{self._get_class_name()}(\'{self.name}\')' + + +class PathConvNode(PathNode): + """A `ConvNode` corresponds to a Conv module in the original model.""" + pass + + +class PathDepthWiseConvNode(PathNode): + """A `DepthWiseConvNode` corresponds to a depth-wise conv module in the + original model.""" + pass + + +class PathNormNode(PathNode): + """A `NormNode` corresponds to a normalization module in the original + model.""" + pass + + +class PathLinearNode(PathNode): + """A `LinearNode` corresponds to a linear module in the original model.""" + pass + + +class Path: + """``Path`` is the data structure that represents 
a list of ``Node`` traced + by a tracer. + + Args: + nodes(:obj:`Node` or List[:obj:`Node`], optional): Nodes in a path. + Default to None. + """ + + def __init__(self, + nodes: Optional[Union[PathNode, List[PathNode]]] = None): + self._nodes: List[PathNode] = list() + if nodes is not None: + if isinstance(nodes, PathNode): + nodes = [nodes] + assert isinstance(nodes, (list, tuple)) + for node in nodes: + assert isinstance(node, PathNode) + self._nodes.append(node) + + def get_root_names(self) -> List[str]: + """Get the name of the first node in a path.""" + return self._nodes[0].get_module_names() + + def find_nodes_parents(self, + target_nodes: Tuple, + non_pass: Optional[Tuple] = None) -> Dict: + """Find the parents of a specific node. + + Args: + target_nodes (Tuple): Find the parents of nodes whose types + are one of `target_nodes`. + non_pass (Tuple): Ancestor nodes whose types are one of + `non_pass` are the parents of a specific node. Default to None. + """ + node2parents: Dict[str, List[PathNode]] = dict() + for i, node in enumerate(self._nodes): + if isinstance(node, PathConcatNode): + _node2parents: Dict[str, + List[PathNode]] = node.find_nodes_parents( + target_nodes, non_pass) + _merge_node_parents(node2parents, _node2parents) + continue + + if isinstance(node, target_nodes): + parents = list() + for behind_node in self._nodes[i + 1:]: + if non_pass is None or isinstance(behind_node, non_pass): + parents.append(behind_node) + break + _node2parents = {node.name: parents} + _merge_node_parents(node2parents, _node2parents) + return node2parents + + @property + def nodes(self) -> List: + """Return a list of nodes in the current path.""" + return self._nodes + + def append(self, x: PathNode) -> None: + """Add a node to the end of the current path.""" + assert isinstance(x, PathNode) + self._nodes.append(x) + + def pop(self, *args, **kwargs): + """Temoves the node at the given index from the path and returns the + removed node.""" + return 
class PathList:
    """Ordered collection of ``Path`` objects produced by a tracer.

    Args:
        paths(:obj:`Path` or List[:obj:`Path`], optional): A list of `Path`.
            Default to None.
    """

    def __init__(self, paths: Optional[Union[Path, List[Path]]] = None):
        self._paths = list()
        if paths is None:
            return
        if isinstance(paths, Path):
            paths = [paths]
        assert isinstance(paths, (list, tuple))
        for path in paths:
            assert isinstance(path, Path)
            self._paths.append(path)

    def get_root_names(self) -> List[str]:
        """Get the root node names shared by every path in this list."""
        root_name_list = [path.get_root_names() for path in self._paths]
        reference = root_name_list[0]
        for root_names in root_name_list[1:]:
            assert root_names == reference, \
                f'If the input of a module is a concatenation of several ' \
                f'modules\' outputs, we can use `get_root_names` to get the' \
                f' names of these modules. As `get_root_names` is only used' \
                f' in this case, each element in `root_name_list` should be' \
                f' the same. Got root_name_list = {root_name_list}'
        return reference

    def find_nodes_parents(self,
                           target_nodes: Tuple,
                           non_pass: Optional[Tuple] = None):
        """Find the parents of a specific node.

        Args:
            target_nodes (Tuple): Find the parents of nodes whose types
                are one of `target_nodes`.
            non_pass (Tuple): Ancestor nodes whose types are one of
                `non_pass` are the parents of a specific node. Default to None.
        """
        node2parents: Dict[str, List[PathNode]] = dict()
        for path in self._paths:
            _merge_node_parents(
                node2parents, path.find_nodes_parents(target_nodes, non_pass))
        return node2parents

    def append(self, x: Path) -> None:
        """Add a path to the end of the current PathList."""
        assert isinstance(x, Path)
        self._paths.append(x)

    @property
    def paths(self):
        """Return all paths in the current PathList."""
        return self._paths

    def __eq__(self, other):
        return isinstance(other, self.__class__) and self.paths == other.paths

    def __len__(self):
        return len(self._paths)

    def __getitem__(self, item):
        return self._paths[item]

    def __iter__(self):
        return iter(self._paths)

    def _get_class_name(self) -> str:
        """Get the name of the current class."""
        return self.__class__.__name__

    def __repr__(self):
        child_lines = [_addindent(repr(path), 2) for path in self._paths]
        main_str = self._get_class_name() + '('
        if child_lines:
            main_str += '\n  ' + ',\n  '.join(child_lines) + '\n'
        main_str += ')'
        return main_str
+ """ + + def __init__(self, name: str, path_lists: List[PathList]): + super().__init__(name) + self._path_lists = list() + for path_list in path_lists: + assert isinstance(path_list, PathList) + self._path_lists.append(path_list) + + def get_module_names(self) -> List[str]: + """Several nodes are concatenated. + + Get the names of these nodes. + """ + module_names = list() + for path_list in self._path_lists: + module_names.extend(path_list.get_root_names()) + return module_names + + def find_nodes_parents(self, + target_nodes: Tuple, + non_pass: Optional[Tuple] = None): + """Find the parents of a specific node. + + Args: + target_nodes (Tuple): Find the parents of nodes whose types + are one of `target_nodes`. + non_pass (Tuple): Ancestor nodes whose types are one of + `non_pass` are the parents of a specific node. Default to None. + """ + node2parents: Dict[str, List[PathNode]] = dict() + for p in self._path_lists: + _node2parents = p.find_nodes_parents(target_nodes, non_pass) + _merge_node_parents(node2parents, _node2parents) + return node2parents + + @property + def path_lists(self) -> List[PathList]: + """Return all the path_list.""" + return self._path_lists + + def __len__(self): + return len(self._path_lists) + + def __getitem__(self, item): + return self._path_lists[item] + + def __iter__(self): + for path_list in self._path_lists: + yield path_list + + def _get_class_name(self) -> str: + """Get the name of the current class.""" + return self.__class__.__name__ + + def __repr__(self): + child_lines = [] + for node in self._path_lists: + node_str = repr(node) + node_str = _addindent(node_str, 2) + child_lines.append(node_str) + lines = child_lines + + main_str = self._get_class_name() + '(' + if lines: + main_str += '\n ' + ',\n '.join(lines) + '\n' + main_str += ')' + return main_str diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/utils/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/utils/__init__.py new file mode 100644 index 
0000000000000000000000000000000000000000..e3be9494657d8c5f28dddf1e30887f0d4fcbad87 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/utils/__init__.py @@ -0,0 +1,13 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .make_divisible import make_divisible +from .misc import add_prefix +from .optim_wrapper import reinitialize_optim_wrapper_count_status +from .parse_values import parse_values +from .quantization_util import pop_rewriter_function_record, str2class +from .utils import get_module_device, set_requires_grad + +__all__ = [ + 'make_divisible', 'add_prefix', 'reinitialize_optim_wrapper_count_status', + 'str2class', 'get_module_device', 'set_requires_grad', 'parse_values', + 'pop_rewriter_function_record' +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/utils/expandable_utils/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/utils/expandable_utils/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..23eeb6073745516b536f1a78098a50f0898bf95c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/utils/expandable_utils/__init__.py @@ -0,0 +1,15 @@ +# Copyright (c) OpenMMLab. All rights reserved. +"""This module is used to expand the channels of a supernet. + +We only expose some tool functions, rather than all DynamicOps and +MutableChannelUnits, as They uses a few hacky operations. 
+""" +from .tools import (expand_expandable_dynamic_model, expand_static_model, + make_channel_divisible, to_expandable_model) + +__all__ = [ + 'make_channel_divisible', + 'to_expandable_model', + 'expand_expandable_dynamic_model', + 'expand_static_model', +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/utils/expandable_utils/ops.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/utils/expandable_utils/ops.py new file mode 100644 index 0000000000000000000000000000000000000000..f2bc2b04650f04eb59df9c199f4deefc126bef75 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/utils/expandable_utils/ops.py @@ -0,0 +1,238 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch +import torch.nn as nn + +from mmrazor.models.architectures import dynamic_ops +from mmrazor.models.mutables import MutableChannelContainer +from mmrazor.models.utils import get_module_device + + +class ExpandableMixin: + """This minin coroperates with dynamic ops. + + It defines interfaces to expand the channels of ops. We can get a wider + network than original supernet with it. + """ + + def expand(self, zero=False): + """Expand the op. + + Args: + zero (bool, optional): whether to set new weights to zero. Defaults + to False. + """ + return self.get_expand_op( + self.expanded_in_channel, + self.expanded_out_channel, + zero=zero, + ) + + def get_expand_op(self, in_c, out_c, zero=False): + """Get an expanded op. + + Args: + in_c (int): New input channels + out_c (int): New output channels + zero (bool, optional): Whether to zero new weights. Defaults to + False. 
+ """ + pass + + @property + def _original_in_channel(self): + """Return original in channel.""" + raise NotImplementedError() + + @property + def _original_out_channel(self): + """Return original out channel.""" + + @property + def expanded_in_channel(self): + """Return expanded in channel number.""" + if self.in_mutable is not None: + return self.in_mutable.current_mask.numel() + else: + return self._original_in_channel + + @property + def expanded_out_channel(self): + """Return expanded out channel number.""" + if self.out_mutable is not None: + return self.out_mutable.current_mask.numel() + else: + return self._original_out_channel + + @property + def mutable_in_mask(self): + """Return the mutable in mask.""" + device = get_module_device(self) + if self.in_mutable is not None: + return self.in_mutable.current_mask.to(device) + else: + return torch.ones([self.expanded_in_channel]).to(device) + + @property + def mutable_out_mask(self): + """Return the mutable out mask.""" + device = get_module_device(self) + if self.out_mutable is not None: + return self.out_mutable.current_mask.to(device) + else: + return torch.ones([self.expanded_out_channel]).to(device) + + @property + def in_mutable(self) -> MutableChannelContainer: + """In channel mask.""" + return self.get_mutable_attr('in_channels') # type: ignore + + @property + def out_mutable(self) -> MutableChannelContainer: + """Out channel mask.""" + return self.get_mutable_attr('out_channels') # type: ignore + + def zero_weight_(self: nn.Module): + """Zero all weights.""" + for p in self.parameters(): + p.data.zero_() + + @torch.no_grad() + def expand_matrix(self, weight: torch.Tensor, old_weight: torch.Tensor): + """Expand weight matrix.""" + assert len(weight.shape) == 3 # out in c + assert len(old_weight.shape) == 3 # out in c + mask = self.mutable_out_mask.float().unsqueeze( + -1) * self.mutable_in_mask.float().unsqueeze(0) + mask = mask.unsqueeze(-1).expand(*weight.shape) + 
class ExpandableConv2d(dynamic_ops.DynamicConv2d, ExpandableMixin):
    """Expandable conv2d: builds a wider plain ``nn.Conv2d`` whose old
    channels carry the original weights."""

    @property
    def _original_in_channel(self):
        # Plain conv attribute; used when no in-mutable is attached.
        return self.in_channels

    @property
    def _original_out_channel(self):
        return self.out_channels

    def get_expand_op(self, in_c, out_c, zero=False):
        # Only plain convs (groups == 1) and pure depth-wise convs
        # (in == out == groups) are handled; other grouped convs are
        # rejected explicitly.
        if self.groups == 1:
            return self._get_expand_op_normal_conv(in_c, out_c, zero=zero)
        elif self.in_channels == self.out_channels == self.groups:
            return self._get_expand_op_dw_conv(in_c, out_c, zero=zero)
        else:
            raise NotImplementedError('Groupwise conv is not supported yet.')

    def _get_expand_op_normal_conv(self, in_c, out_c, zero=False):
        # Build a fresh conv with the target channel numbers on the same
        # device, then scatter the original weights into the positions
        # selected by the in/out masks.
        module = nn.Conv2d(in_c, out_c, self.kernel_size, self.stride,
                           self.padding, self.dilation, self.groups, self.bias
                           is not None,
                           self.padding_mode).to(get_module_device(self))
        if zero:
            ExpandableMixin.zero_weight_(module)

        # Flatten the (kh, kw) trailing dims so the weight matches the
        # (out, in, c) layout expected by ``expand_matrix``.
        weight = self.expand_matrix(
            module.weight.flatten(2), self.weight.flatten(2))
        module.weight.data = weight.reshape(module.weight.shape)
        if module.bias is not None and self.bias is not None:
            bias = self.expand_vector(
                module.bias.unsqueeze(-1), self.bias.unsqueeze(-1))
            module.bias.data = bias.reshape(module.bias.shape)
        return module

    def _get_expand_op_dw_conv(self, in_c, out_c, zero=False):
        # Depth-wise: channels expand in lockstep, so only the out-mask is
        # needed (``expand_vector``) and groups == in_c.
        assert in_c == out_c
        module = nn.Conv2d(in_c, out_c, self.kernel_size, self.stride,
                           self.padding, self.dilation, in_c, self.bias
                           is not None,
                           self.padding_mode).to(get_module_device(self))
        if zero:
            ExpandableMixin.zero_weight_(module)

        weight = self.expand_vector(
            module.weight.flatten(1), self.weight.flatten(1))
        module.weight.data = weight.reshape(module.weight.shape)
        if module.bias is not None and self.bias is not None:
            bias = self.expand_vector(
                module.bias.unsqueeze(-1), self.bias.unsqueeze(-1))
            module.bias.data = bias.reshape(module.bias.shape)
        return module
class ExpandLinear(dynamic_ops.DynamicLinear, ExpandableMixin):
    """Expandable linear layer: builds a wider plain ``nn.Linear`` whose old
    features carry the original weights."""

    @property
    def _original_in_channel(self):
        """Input features of the wrapped linear layer."""
        return self.in_features

    @property
    def _original_out_channel(self):
        """Output features of the wrapped linear layer."""
        return self.out_features

    def get_expand_op(self, in_c, out_c, zero=False):
        """Return a plain ``nn.Linear`` of size (in_c, out_c) with the
        original weights scattered into the masked positions.

        Args:
            in_c (int): New input features.
            out_c (int): New output features.
            zero (bool, optional): Whether to zero new weights. Defaults to
                False.
        """
        module = nn.Linear(in_c, out_c, self.bias
                           is not None).to(get_module_device(self))
        if zero:
            ExpandableMixin.zero_weight_(module)

        # Linear weight is (out, in); add a trailing dim so it matches the
        # (out, in, c) layout expected by ``expand_matrix``.
        weight = self.expand_matrix(
            module.weight.unsqueeze(-1), self.weight.unsqueeze(-1))
        module.weight.data = weight.reshape(module.weight.shape)
        # Fix for consistency with the conv variants: guard on both biases.
        # (They are None/non-None together here since ``module`` is built
        # with ``self.bias is not None``, so behavior is unchanged.)
        if module.bias is not None and self.bias is not None:
            bias = self.expand_vector(
                module.bias.unsqueeze(-1), self.bias.unsqueeze(-1))
            module.bias.data = bias.reshape(module.bias.shape)
        return module
class ExpandableBatchNorm2d(dynamic_ops.DynamicBatchNorm2d, ExpandableMixin):
    """Expandable BN: builds a wider plain ``nn.BatchNorm2d`` whose old
    channels carry the original affine params and running statistics."""

    @property
    def _original_in_channel(self):
        # BN has a single channel dimension, so in == out == num_features.
        return self.num_features

    @property
    def _original_out_channel(self):
        return self.num_features

    def get_expand_op(self, in_c, out_c, zero=False):
        # BN cannot change channel count across in/out.
        assert in_c == out_c
        module = nn.BatchNorm2d(in_c, self.eps, self.momentum, self.affine,
                                self.track_running_stats).to(
                                    get_module_device(self))
        if zero:
            ExpandableMixin.zero_weight_(module)

        # Running stats are plain rank-1 buffers; scatter old values into the
        # masked positions (None when track_running_stats=False).
        if module.running_mean is not None:
            module.running_mean.data = self.expand_bias(
                module.running_mean, self.running_mean)

        if module.running_var is not None:
            module.running_var.data = self.expand_bias(module.running_var,
                                                       self.running_var)
        # NOTE(review): weight/bias are expanded unconditionally — assumes
        # affine=True on the original BN; confirm with callers.
        module.weight.data = self.expand_bias(module.weight, self.weight)
        module.bias.data = self.expand_bias(module.bias, self.bias)
        return module
+ """ + + def traverse_children(module: nn.Module) -> None: + for name, mutable in module.items(): + if isinstance(mutable, ExpandableMixin): + module[name] = mutable.expand(zero=zero) + if hasattr(mutable, '_modules'): + traverse_children(mutable._modules) + + if isinstance(model, ExpandableMixin): + raise RuntimeError('Root model can not be dynamic op.') + + if hasattr(model, '_modules'): + traverse_children(model._modules) + return model + + +def expand_static_model(model: nn.Module, structure: Dict, zero_weight=True): + """Expand the channels of a model. + + Args: + model (nn.Module): the model to be expanded. + structure (Dict): the channel structure for the model. + divisor (_type_): the divisor to make the channels divisible. + """ + mutator = to_expandable_model(model) + for key, value in structure.items(): + mutator._name2unit[key].expand_to(value) + expand_expandable_dynamic_model(model, zero=zero_weight) + return model + + +def make_channel_divisible(model: nn.Module, divisor, zero_weight=True): + """Expand the channels of a model and return the new divisible channel + structure. + + Args: + model (nn.Module): the model to be expanded. + divisor (_type_): the divisor to make the channels divisible. 
+ """ + # to sta + mutator = to_expandable_model(model) + + structure = mutator.choice_template + for key, num in structure.items(): + unit = mutator._name2unit[key] + if num % divisor == 0: + continue + else: + num = (num // divisor + 1) * divisor + num = max(num, unit.num_channels) + unit.expand_to(num) + + model = expand_expandable_dynamic_model(model, zero=zero_weight) + mutator = to_expandable_model(copy.deepcopy(model)) + + return mutator.choice_template diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/utils/expandable_utils/unit.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/utils/expandable_utils/unit.py new file mode 100644 index 0000000000000000000000000000000000000000..6d0b036dcf91d6210d9a3c59595d7e36eb1a7a4a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/utils/expandable_utils/unit.py @@ -0,0 +1,31 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch +import torch.nn as nn +from mmengine.model.utils import _BatchNormXd + +from mmrazor.models.mutables import (L1MutableChannelUnit, + MutableChannelContainer) +from .ops import ExpandableBatchNorm2d, ExpandableConv2d, ExpandLinear + + +class ExpandableUnit(L1MutableChannelUnit): + """The units to inplace modules with expandable dynamic ops.""" + + def prepare_for_pruning(self, model: nn.Module): + self._replace_with_dynamic_ops( + model, { + nn.Conv2d: ExpandableConv2d, + nn.BatchNorm2d: ExpandableBatchNorm2d, + _BatchNormXd: ExpandableBatchNorm2d, + nn.Linear: ExpandLinear, + }) + self._register_channel_container(model, MutableChannelContainer) + self._register_mutable_channel(self.mutable_channel) + + def expand(self, num): + expand_mask = self.mutable_channel.mask.new_zeros([num]) + mask = torch.cat([self.mutable_channel.mask, expand_mask]) + self.mutable_channel.mask = mask + + def expand_to(self, num): + self.expand(num - self.num_channels) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/utils/make_divisible.py 
def make_divisible(value: int,
                   divisor: int,
                   min_value: Optional[int] = None,
                   min_ratio: float = 0.9) -> int:
    """Round ``value`` to the nearest multiple of ``divisor``.

    The result never drops below ``min_value`` and never loses more than
    ``1 - min_ratio`` of the original value (if rounding would, one extra
    ``divisor`` is added back).

    Args:
        value (int): The original channel number.
        divisor (int): The divisor to fully divide the channel number.
        min_value (int, optional): The minimum value of the output channel.
            Default: None, means that the minimum value equal to the divisor.
        min_ratio (float): The minimum ratio of the rounded channel
            number to the original channel number. Default: 0.9.
    Returns:
        int: The modified output channel number
    """
    if min_value is None:
        min_value = divisor
    elif min_value < divisor:
        # Warn once per process, then clamp the floor up to the divisor.
        global warn_once
        if not warn_once:
            print_log((f'min_value=={min_value} should greater or equal to '
                       f'divisor=={divisor}, '
                       'so we make min_value equal divisor.'),
                      level='warning')
            warn_once = True
        min_value = divisor

    rounded = int(value + divisor / 2) // divisor * divisor
    rounded = max(min_value, rounded)
    # Make sure that round down does not go down by more than (1-min_ratio).
    if rounded < min_ratio * value:
        rounded += divisor
    return rounded
def reinitialize_optim_wrapper_count_status(model: Module,
                                            optim_wrapper: OptimWrapper,
                                            accumulative_counts: int,
                                            verbose: bool = True) -> None:
    """Re-init an optim wrapper's count status for gradient accumulation.

    Scales both the current iteration counter and the max-iteration budget
    recorded in the wrapper's message hub by ``accumulative_counts``, then
    re-initializes the wrapper's count status with the scaled values.

    Args:
        model (Module): The model the optim wrapper drives.
        optim_wrapper (OptimWrapper): The wrapper whose counters are reset.
        accumulative_counts (int): Gradient accumulation factor.
        verbose (bool): Whether to log the old/new counter values.
            Defaults to True.
    """
    if verbose:
        logger = MMLogger.get_current_instance()
        logger.warning('Reinitialize count status of optim wrapper')

    original_max_iters = \
        optim_wrapper.message_hub.runtime_info['max_iters']
    new_max_iters = original_max_iters * accumulative_counts
    original_init_iters = \
        optim_wrapper.message_hub.runtime_info['iter']
    new_init_iters = original_init_iters * accumulative_counts

    if verbose:
        # Fix: the log message previously misspelled 'original' as 'orginal'.
        logger.info(f'original `init_iters`: {original_init_iters}, '
                    f'new `init_iters`: {new_init_iters}; '
                    f'original `max_iters`: {original_max_iters}, '
                    f'new `max_iters`: {new_max_iters}')

    optim_wrapper.initialize_count_status(
        model=model, init_counts=new_init_iters, max_counts=new_max_iters)
+ """ + + def _range_to_list(input_range: List[int]) -> List[int]: + assert len(input_range) == 3, ( + 'The format should be `(min_range, max_range, step)` with dim=3, ' + f'but got dim={len(input_range)}.') + start, end, step = input_range + return list(range(start, end + 1, step)) + + return [_range_to_list(i) for i in candidate_lists] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/utils/quantization_util.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/utils/quantization_util.py new file mode 100644 index 0000000000000000000000000000000000000000..36d108372618f20f4d8094bcbafa951735e246d8 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/models/utils/quantization_util.py @@ -0,0 +1,60 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from mmengine.utils import import_modules_from_strings + + +def pop_rewriter_function_record(rewriter_context, function_record_to_pop): + """Delete user-specific rewriters from `RewriterContext._rewriter_manager`. + + We use the model which is rewritten by mmdeploy to build quantized models. + However not all the functions rewritten by mmdeploy need to be rewritten in + mmrazor. For example, mmdeploy rewrite + `mmcls.models.classifiers.ImageClassifier.forward` and + `mmcls.models.classifiers.BaseClassifier.forward` for deployment. But they + can't be rewritten by mmrazor as ptq and qat are done in mmrazor. So to + ensure ptq and qat proceed normally, we have to remove these record from + `RewriterContext._rewriter_manager`. + """ + function_record_backup = {} + for record in function_record_to_pop: + records = rewriter_context._rewriter_manager.function_rewriter. 
def str2class(str_inputs):
    """Resolve dotted-path string(s) into the class object(s) they name.

    Accepts a single string, a list, or a tuple; the return mirrors the
    input container (single class, list of classes, or tuple of classes).

    Raises:
        ImportError: when the module part cannot be imported.
        TypeError: when the resolved attribute is not a type.
    """
    is_single = not isinstance(str_inputs, (tuple, list))
    sources = [str_inputs] if is_single else str_inputs

    clss = []
    for s_class in sources:
        _check_valid_source(s_class)
        # Split "pkg.mod.Cls" into module path and class name; at least one
        # dot is guaranteed by the check above.
        mod_str, _, cls_str = s_class.rpartition('.')
        try:
            mod = import_modules_from_strings(mod_str)
        except ImportError:
            raise ImportError(f'{mod_str} is not imported correctly.')
        imported_cls: type = getattr(mod, cls_str)
        if not isinstance(imported_cls, type):
            raise TypeError(f'{cls_str} should be a type '
                            f'instance, but got {type(imported_cls)}')
        clss.append(imported_cls)

    if isinstance(str_inputs, list):
        return clss
    if isinstance(str_inputs, tuple):
        return tuple(clss)
    return clss[0]
+ """ + try: + next(module.parameters()) + except StopIteration as e: + raise ValueError('The input module should contain parameters.') from e + + if next(module.parameters()).is_cuda: + return next(module.parameters()).get_device() + + return torch.device('cpu') + + +def set_requires_grad(nets: Union[nn.Module, List[nn.Module]], + requires_grad: bool = False) -> None: + """Set requires_grad for all the networks. + + Args: + nets (nn.Module | list[nn.Module]): A list of networks or a single + network. + requires_grad (bool): Whether the networks require gradients or not + """ + if not isinstance(nets, list): + nets = [nets] + for net in nets: + if net is not None: + for param in net.parameters(): + param.requires_grad = requires_grad diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/registry/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/registry/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..63ce9b1ef3163cab5c68c2db02e3cc686ecf35c7 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/registry/__init__.py @@ -0,0 +1,15 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from .registry import (DATA_SAMPLERS, DATASETS, HOOKS, LOOPS, METRICS, + MODEL_WRAPPERS, MODELS, OPTIM_WRAPPER_CONSTRUCTORS, + OPTIM_WRAPPERS, OPTIMIZERS, PARAM_SCHEDULERS, + RUNNER_CONSTRUCTORS, RUNNERS, TASK_UTILS, TRANSFORMS, + VISBACKENDS, VISUALIZERS, WEIGHT_INITIALIZERS, + sub_model) + +__all__ = [ + 'RUNNERS', 'RUNNER_CONSTRUCTORS', 'HOOKS', 'DATASETS', 'DATA_SAMPLERS', + 'TRANSFORMS', 'MODELS', 'WEIGHT_INITIALIZERS', 'OPTIMIZERS', + 'OPTIM_WRAPPERS', 'OPTIM_WRAPPER_CONSTRUCTORS', 'TASK_UTILS', + 'PARAM_SCHEDULERS', 'METRICS', 'MODEL_WRAPPERS', 'LOOPS', 'VISBACKENDS', + 'VISUALIZERS', 'sub_model' +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/registry/registry.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/registry/registry.py new file mode 100644 index 0000000000000000000000000000000000000000..7f915ee74f927036f19d590f58c1b5d08ee235ee --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/registry/registry.py @@ -0,0 +1,141 @@ +# Copyright (c) OpenMMLab. All rights reserved. +"""MMRazor provides 17 registry nodes to support using modules across projects. +Each node is a child of the root registry in MMEngine. + +More details can be found at +https://mmengine.readthedocs.io/en/latest/tutorials/registry.html. 
+""" +from typing import Any, Dict, Optional, Union + +from mmengine.config import Config, ConfigDict +from mmengine.registry import DATA_SAMPLERS as MMENGINE_DATA_SAMPLERS +from mmengine.registry import DATASETS as MMENGINE_DATASETS +from mmengine.registry import HOOKS as MMENGINE_HOOKS +from mmengine.registry import LOOPS as MMENGINE_LOOPS +from mmengine.registry import METRICS as MMENGINE_METRICS +from mmengine.registry import MODEL_WRAPPERS as MMENGINE_MODEL_WRAPPERS +from mmengine.registry import MODELS as MMENGINE_MODELS +from mmengine.registry import \ + OPTIM_WRAPPER_CONSTRUCTORS as MMENGINE_OPTIM_WRAPPER_CONSTRUCTORS +from mmengine.registry import OPTIM_WRAPPERS as MMENGINE_OPTIM_WRAPPERS +from mmengine.registry import OPTIMIZERS as MMENGINE_OPTIMIZERS +from mmengine.registry import PARAM_SCHEDULERS as MMENGINE_PARAM_SCHEDULERS +from mmengine.registry import \ + RUNNER_CONSTRUCTORS as MMENGINE_RUNNER_CONSTRUCTORS +from mmengine.registry import RUNNERS as MMENGINE_RUNNERS +from mmengine.registry import TASK_UTILS as MMENGINE_TASK_UTILS +from mmengine.registry import TRANSFORMS as MMENGINE_TRANSFORMS +from mmengine.registry import VISBACKENDS as MMENGINE_VISBACKENDS +from mmengine.registry import VISUALIZERS as MMENGINE_VISUALIZERS +from mmengine.registry import \ + WEIGHT_INITIALIZERS as MMENGINE_WEIGHT_INITIALIZERS +from mmengine.registry import Registry, build_from_cfg + + +def build_razor_model_from_cfg( + cfg: Union[dict, ConfigDict, Config], + registry: 'Registry', + default_args: Optional[Union[dict, ConfigDict, Config]] = None) -> Any: + + # TODO relay on mmengine:HAOCHENYE/config_new_feature + if cfg.get('cfg_path', None) and not cfg.get('type', None): + from mmengine.hub import get_model + model = get_model(**cfg) # type: ignore + return model + + return_architecture = False + if cfg.get('_return_architecture_', None): + return_architecture = cfg.pop('_return_architecture_') + razor_model = build_from_cfg(cfg, registry, default_args) + if 
return_architecture: + return razor_model.architecture + else: + return razor_model + + +# Registries For Runner and the related +# manage all kinds of runners like `EpochBasedRunner` and `IterBasedRunner` +RUNNERS = Registry('runner', parent=MMENGINE_RUNNERS) +# manage runner constructors that define how to initialize runners +RUNNER_CONSTRUCTORS = Registry( + 'runner constructor', parent=MMENGINE_RUNNER_CONSTRUCTORS) +# manage all kinds of loops like `EpochBasedTrainLoop` +LOOPS = Registry('loop', parent=MMENGINE_LOOPS) +# manage all kinds of hooks like `CheckpointHook` +HOOKS = Registry('hook', parent=MMENGINE_HOOKS) + +# Registries For Data and the related +# manage data-related modules +DATASETS = Registry('dataset', parent=MMENGINE_DATASETS) +DATA_SAMPLERS = Registry('data sampler', parent=MMENGINE_DATA_SAMPLERS) +TRANSFORMS = Registry('transform', parent=MMENGINE_TRANSFORMS) + +# manage all kinds of modules inheriting `nn.Module` +MODELS = Registry( + 'model', parent=MMENGINE_MODELS, build_func=build_razor_model_from_cfg) +# manage all kinds of model wrappers like 'MMDistributedDataParallel' +MODEL_WRAPPERS = Registry('model_wrapper', parent=MMENGINE_MODEL_WRAPPERS) +# manage all kinds of weight initialization modules like `Uniform` +WEIGHT_INITIALIZERS = Registry( + 'weight initializer', parent=MMENGINE_WEIGHT_INITIALIZERS) + +# Registries For Optimizer and the related +# manage all kinds of optimizers like `SGD` and `Adam` +OPTIMIZERS = Registry('optimizer', parent=MMENGINE_OPTIMIZERS) +# manage optimizer wrapper +OPTIM_WRAPPERS = Registry('optimizer_wrapper', parent=MMENGINE_OPTIM_WRAPPERS) +# manage constructors that customize the optimization hyperparameters. 
# manage sub models for downstream repos
@MODELS.register_module()
def sub_model(cfg,
              fix_subnet,
              mode: str = 'mutable',
              prefix: str = '',
              extra_prefix: str = '',
              init_weight_from_supernet: bool = False,
              init_cfg: Optional[Dict] = None,
              **kwargs):
    """Build a fixed sub model from a supernet config and a fixed subnet.

    Args:
        cfg: config of the supernet model to build.
        fix_subnet: the fixed subnet to load into the built model.
        mode (str): load mode forwarded to ``load_fix_subnet``.
        prefix (str): key prefix forwarded to ``load_fix_subnet``.
        extra_prefix (str): extra key prefix forwarded to ``load_fix_subnet``.
        init_weight_from_supernet (bool): when True, ``init_weights`` runs on
            the supernet BEFORE the subnet is loaded; otherwise it runs on the
            sub model AFTER loading.
        init_cfg (Dict, optional): overrides the built model's ``init_cfg``.
    """
    model = MODELS.build(cfg)
    # Save path type cfg process, set init_cfg directly.
    if init_cfg:
        # update init_cfg when init_cfg is valid.
        model.init_cfg = init_cfg

    if init_weight_from_supernet:
        # init weights from supernet first before it turns into a sub model.
        model.init_weights()

    # Local import avoids a circular dependency between registry and
    # structures at module import time.
    from mmrazor.structures import load_fix_subnet

    load_fix_subnet(
        model,
        fix_subnet,
        load_subnet_mode=mode,
        prefix=prefix,
        extra_prefix=extra_prefix)

    if not init_weight_from_supernet:
        # init weights from the specific sub model.
        model.init_weights()

    return model
from .quantization import *  # noqa: F401,F403
from .subnet import *  # noqa: F401,F403

# ---- mmrazor/structures/graph/__init__.py ----
# Copyright (c) OpenMMLab. All rights reserved.
from .base_graph import BaseGraph, BaseNode
from .module_graph import ModuleGraph, ModuleNode

__all__ = ['BaseGraph', 'BaseNode', 'ModuleNode', 'ModuleGraph']

# ---- mmrazor/structures/graph/base_graph.py ----
# Copyright (c) OpenMMLab. All rights reserved.
"""This module defines BaseNode and BaseGraph, which are used to model
Directed Acyclic Graphs (DAGs)."""
import copy
from collections import OrderedDict
from typing import Any, Callable, Generic, Iterator, List, TypeVar

# BaseNode && BaseGraph


class BaseNode:
    """A single node in a graph.

    Args:
        name (str): name of the node.
        val (Any): content of the node.
    """

    def __init__(self, name: str, val: Any) -> None:
        self.val = val
        self.name = name
        self.prev_nodes: List = []
        self.next_nodes: List = []

    # node operation

    def add_prev_node(self, node: 'BaseNode'):
        """Add a previous node, keeping both adjacency lists consistent."""
        if node not in self.prev_nodes:
            self.prev_nodes.append(node)
        if self not in node.next_nodes:
            node.next_nodes.append(self)

    def add_next_node(self, node: 'BaseNode'):
        """Add a next node, keeping both adjacency lists consistent."""
        if node not in self.next_nodes:
            self.next_nodes.append(node)
        if self not in node.prev_nodes:
            node.prev_nodes.append(self)

    @classmethod
    def copy_from(cls, node: 'BaseNode'):
        """Copy a node, and generate a new node with the current node type."""
        return cls(node.name, node.val)

    # compare operation

    def __hash__(self) -> int:
        """Hash the node by (val, name); requires ``val`` to be hashable."""
        return hash((self.val, self.name))

    def __eq__(self, other):
        """Two nodes are equal iff they share the same val object and name."""
        return self.val is other.val and self.name == other.name

    # other

    def __repr__(self) -> str:
        return self.name


BASENODE = TypeVar('BASENODE', bound=BaseNode)


class BaseGraph(Generic[BASENODE]):
    """A Directed Acyclic Graph (DAG), storing nodes by name."""

    def __init__(self) -> None:
        super().__init__()
        self.nodes: OrderedDict[str, BASENODE] = OrderedDict()

    # graph operations

    @classmethod
    def copy_from(cls,
                  graph: 'BaseGraph',
                  node_converter: Callable = BaseNode.copy_from):
        """Copy a graph, and generate a new graph of the current class.

        Args:
            graph (BaseGraph): the graph to be copied.
            node_converter (Callable): a function that converts each node
                when copying the graph.
        """
        old2new = {}
        new_graph = cls()
        # copy nodes
        for old in graph:
            old2new[old] = new_graph.add_or_find_node(node_converter(old))

        # reproduce the edges between the copied nodes
        for old in graph:
            for pre in old.prev_nodes:
                new_graph.connect(old2new[pre], old2new[old])
        return new_graph

    # node operations

    def add_or_find_node(self, node: BASENODE):
        """Add a node to the graph.

        If the node has existed in the graph, the function will return the
        node recorded in the graph.
        """
        find = self.find_node(node)
        if find is not None:
            return find
        self.add_node(node)
        return node

    def find_node(self, node: BaseNode):
        """Return the stored node matching ``node`` by name and val, else
        None."""
        if node.name in self.nodes and node.val == self.nodes[node.name].val:
            return self.nodes[node.name]
        return None

    def add_node(self, node: BASENODE):
        """Add a node; raise if a node with the same name already exists."""
        if node.name in self.nodes:
            raise Exception(f'{node.name} already exists in graph')
        self.nodes[node.name] = node

    def connect(self, pre_node: BASENODE, next_node: BASENODE):
        """Add an edge from pre_node to next_node."""
        pre_node_ = self.find_node(pre_node)
        next_node_ = self.find_node(next_node)
        assert pre_node_ is not None and next_node_ is not None, \
            f"{pre_node},{next_node} don't exist in the graph."
        pre_node = pre_node_
        next_node = next_node_
        pre_node.add_next_node(next_node)
        next_node.add_prev_node(pre_node)

    def disconnect(self, pre_node: BASENODE, next_node: BASENODE):
        """Remove the edge from pre_node to next_node."""
        pre_node_ = self.find_node(pre_node)
        next_node_ = self.find_node(next_node)
        assert pre_node_ is not None and next_node_ is not None, \
            f"{pre_node},{next_node} don't exist in the graph."
        pre_node = pre_node_
        next_node = next_node_
        if next_node in pre_node.next_nodes:
            pre_node.next_nodes.remove(next_node)
        if pre_node in next_node.prev_nodes:
            next_node.prev_nodes.remove(pre_node)

    def delete_node(self, node: BASENODE):
        """Delete a node with its related edges.

        Only a node with at most one input or at most one output can be
        deleted; its single neighbour (if any) is re-connected to the nodes
        on the other side.
        """
        node = self.find_node(node)
        assert node is not None

        if len(node.prev_nodes) == 0:
            for next in copy.copy(node.next_nodes):
                self.disconnect(node, next)
        elif len(node.next_nodes) == 0:
            for pre in copy.copy(node.prev_nodes):
                self.disconnect(pre, node)
        elif len(node.prev_nodes) == 1:
            pre_node = node.prev_nodes[0]
            self.disconnect(pre_node, node)
            for next in copy.copy(node.next_nodes):
                self.disconnect(node, next)
                self.connect(pre_node, next)
        elif len(node.next_nodes) == 1:
            next_node = node.next_nodes[0]
            self.disconnect(node, next_node)
            for pre in copy.copy(node.prev_nodes):
                self.disconnect(pre, node)
                self.connect(pre, next_node)
        else:
            # Fixed: the original message used a backslash continuation
            # inside the f-string, which embedded a run of indentation
            # spaces in the message text.
            raise Exception(f'not delete {node}, '
                            'as it has more than one inputs and outputs')
        self.nodes.pop(node.name)

    # work as a collection

    def __iter__(self) -> Iterator[BASENODE]:
        """Traverse all nodes in the graph (insertion order)."""
        yield from self.nodes.values()

    def __contains__(self, node: BASENODE) -> bool:
        """Check if a node (by name) is contained in the graph."""
        return node.name in self.nodes

    def __len__(self) -> int:
        """Number of nodes in the graph."""
        return len(self.nodes)

    # other

    def __repr__(self):
        res = f'Graph with {len(self)} nodes:\n'
        for node in self:
            res += '{0:<80} -> {1:^80} -> {2:<80}\n'.format(
                str(node.prev_nodes), node.__repr__(), str(node.next_nodes))
        return res

    # traverse

    def topo_traverse(self) -> Iterator[BASENODE]:
        """Traverse the graph in topological order.

        Kahn-style: repeatedly yield a node whose in-degree dropped to zero.
        NOTE(review): finding the zero-degree node is a linear scan, so the
        traversal is O(V^2 + E); acceptable for the graph sizes used here.
        """

        def _in_degree(graph: BaseGraph):
            degree = {}
            for name, node in graph.nodes.items():
                degree[name] = len(node.prev_nodes)
            return degree

        def find_zero_degree_node(in_degree):
            for node_name in in_degree:
                if in_degree[node_name] == 0:
                    return node_name
            # A cycle (or dangling edge) leaves no zero-degree node.
            raise Exception(f'no zero degree node\n{in_degree}')

        in_degree = _in_degree(self)

        while len(in_degree) > 0:
            node_name = find_zero_degree_node(in_degree)  # visit the node
            in_degree.pop(node_name)
            yield self.nodes[node_name]
            for next in self.nodes[node_name].next_nodes:
                in_degree[next.name] -= 1

    def topo_sort(self):
        """Sort all nodes in topological order (in place)."""
        sorted_nodes = OrderedDict()
        for node in self.topo_traverse():
            sorted_nodes[node.name] = node
        self.nodes = sorted_nodes

# ---- mmrazor/structures/graph/channel_flow.py ----
# Copyright (c) OpenMMLab. All rights reserved.
"""Including modules for ChannelFlow to analyze channel dependency."""
import itertools
import sys
from typing import Set, Union

from mmrazor.utils import IndexDict

# The union-find below (ChannelElem.root / subs) is recursive; deep graphs
# need a large recursion limit.
sys.setrecursionlimit(int(pow(2, 20)))


class ChannelElem:
    """A ChannelElem represents a channel in ChannelFlow.

    ChannelElems form a disjoint set (union-find): elements united into the
    same set belong to one channel unit.

    Args:
        owning_tensor (ChannelTensor): the ChannelTensor which the
            ChannelElem belongs to.
        index_in_tensor (int): the index in the owning_tensor.
    """

    def __init__(self, owning_tensor: 'ChannelTensor',
                 index_in_tensor: int) -> None:
        self._parent: Union[None, 'ChannelElem'] = None
        self._subs: Set[ChannelElem] = set()
        # NOTE(review): the misspelled attribute names 'owing_tensor' and
        # 'index_in_tensoor' are kept as-is for backward compatibility with
        # code outside this chunk that may read them.
        self.owing_tensor = owning_tensor
        self.index_in_tensoor = index_in_tensor
        self._hash_cache = None
        self._min_elem_set_index_cache = None

    # channel elem operations

    @classmethod
    def union_two(cls, elem1: 'ChannelElem', elem2: 'ChannelElem'):
        """Bind two ChannelElems into one set."""
        root1 = elem1.root
        root2 = elem2.root
        if root1 is not root2:
            root2._set_parent(root1)

    def union(self, elem: 'ChannelElem'):
        """Bind with another ChannelElem."""
        ChannelElem.union_two(self, elem)

    # hash related

    @property
    def owing_elem_set(self):
        """Get ChannelElem set representation (all elements in this set)."""
        root = self.root
        return root.subs

    def reset_cache(self):
        """Reset hash cache."""
        self._hash_cache = None
        self._min_elem_set_index_cache = None

    @property
    def elem_set_hash(self):
        """Get the hash of the owning ChannelElem set.

        The hash is derived from the frozen set of ChannelTensors that the
        elements of this set belong to, and is cached on every element.
        """
        if self._hash_cache is not None:
            return self._hash_cache
        elem_set = self.owing_elem_set
        # Fixed: the original bound the result to a local named `hash`,
        # shadowing the builtin.
        set_hash = hash(frozenset(elem.owing_tensor for elem in elem_set))
        for elem in elem_set:
            assert elem._hash_cache is None
            elem._hash_cache = set_hash
        return set_hash

    @property
    def min_elem_set_index(self):
        """Minimal index over the owning set's ChannelTensors, cached on
        every element of the set."""
        if self._min_elem_set_index_cache is not None:
            return self._min_elem_set_index_cache
        elem_set = self.owing_elem_set
        min_index = min(elem.index_in_tensoor for elem in elem_set)
        for elem in elem_set:
            assert elem._min_elem_set_index_cache is None
            elem._min_elem_set_index_cache = min_index
        return min_index

    # work as a disjoint set

    @property
    def root(self) -> 'ChannelElem':
        """Get root of the owning ChannelElem set, with path compression."""
        if self._parent is None:
            return self
        root = self._parent.root
        # Path compression: re-attach directly under the root.
        self._unset_parent()
        self._set_parent(root)
        return root

    @property
    def subs(self):
        """Get all elements in the set rooted at self (including self)."""
        subs = copy.copy(self._subs)
        subs.add(self)
        for elem in self._subs:
            subs = subs.union(elem.subs)
        return subs

    def _set_parent(self, parent: 'ChannelElem'):
        """Set parent for the ChannelElem."""
        assert self._parent is None
        assert parent.root is not self
        self._parent = parent
        parent._subs.add(self)

    def _unset_parent(self):
        """Unset parent of the ChannelElem."""
        assert self._parent is not None
        old_parent = self._parent
        old_parent._subs.remove(self)
        self._parent = None


class ChannelTensor:
    """The ChannelTensor in ChannelFlow.

    A ChannelTensor works as a proxy of a tensor: it owns one ChannelElem
    per channel.

    Args:
        num_channel_elem (int): number of channels (ChannelElems).
    """

    def __init__(self, num_channel_elem: int) -> None:
        self.elems = [ChannelElem(self, i) for i in range(num_channel_elem)]

    # tensor operations

    def union(self, tensor: 'ChannelTensor'):
        """Bind with another ChannelTensor element-wise."""
        return self.__class__.union_two(self, tensor)

    @classmethod
    def union_two(cls, tensor1: 'ChannelTensor', tensor2: 'ChannelTensor'):
        """Bind two ChannelTensors element-wise; lengths must match."""
        assert len(tensor1) == len(tensor2), f'{len(tensor1)}!={len(tensor2)}'
        for e1, e2 in zip(tensor1, tensor2):
            ChannelElem.union_two(e1, e2)

    @classmethod
    def cat(cls, tensors: List['ChannelTensor']):
        """Concatenate multiple ChannelTensors (shares their elems)."""
        elems = list(itertools.chain(*[t.elems for t in tensors]))
        new_tensor = ChannelTensor(len(elems))
        new_tensor.elems = elems
        return new_tensor

    def expand(self, expand_ratio: int):
        """Expand self ChannelTensor: each channel is united with
        ``expand_ratio`` consecutive channels of the new tensor."""
        new_tensor = ChannelTensor(expand_ratio * len(self))

        for i in range(len(self)):
            for j in range(expand_ratio):
                self[i].union(new_tensor[i * expand_ratio + j])
        return new_tensor

    # hash operation

    @property
    def elems_hash_with_index(self):
        """Return (set-hash, min-index) pairs of the ChannelElems in the
        ChannelTensor."""
        elem_hashes = [(elem.elem_set_hash, elem.min_elem_set_index)
                       for elem in self.elems]
        return elem_hashes

    @property
    def elems_hash_dict(self):
        """Group consecutive channels with the same set-hash into
        (start, end) -> hash entries of an IndexDict."""
        elem_hash_with_index = self.elems_hash_with_index
        unit_dict = IndexDict()
        start = 0
        for e in range(1, len(self)):
            # A new unit starts when the hash changes, or when the min
            # index decreases (a repeated pattern restarting).
            if (elem_hash_with_index[e][0] != elem_hash_with_index[e - 1][0]
                    or elem_hash_with_index[e][1] <
                    elem_hash_with_index[e - 1][1]):

                unit_dict[(start, e)] = elem_hash_with_index[start][0]
                start = e
        unit_dict[start, len(self)] = elem_hash_with_index[start][0]
        return unit_dict

    # work as a tensor

    def __getitem__(self, key: Union[int, slice]):
        """Index a single ChannelElem or slice out a view ChannelTensor."""
        if isinstance(key, int):
            return self.elems[key]
        elif isinstance(key, slice):
            elems = self.elems[key]
            tensor = ChannelTensor(len(elems))
            tensor.elems = elems
            return tensor
        else:
            raise NotImplementedError()

    def __len__(self):
        return len(self.elems)

    def __iter__(self):
        yield from self.elems

    def __add__(self, tensor: 'ChannelTensor'):
        return ChannelTensor.cat([self, tensor])

    # others

    def _reset_channel_elem_cache(self):
        """Reset hash of all ChannelElems in the ChannelTensor."""
        for elem in self.elems:
            elem.reset_cache()
import copy
from typing import Callable, Dict, List

from torch.nn import Module

from mmrazor.utils import print_log
from .base_graph import BaseGraph
from .channel_flow import ChannelTensor
from .channel_nodes import (ChannelDismatchError, ChannelNode, EndNode,
                            ExpandChannelNode, InputChannelNode,
                            default_channel_node_converter)
from .module_graph import ModuleGraph, NoInputError, NoOutputError


class ChannelGraph(ModuleGraph[ChannelNode]):
    """ChannelGraph is used to trace the channel dependency of a model.

    A ChannelGraph generates a ChannelTensor as the input to the model.
    Then, the tensor can forward through all nodes and collect channel
    dependency.
    """

    @classmethod
    def copy_from(cls,
                  graph: 'BaseGraph',
                  node_converter: Callable = default_channel_node_converter):
        """Copy from a ModuleGraph, inserting expand nodes where channel
        counts between neighbours only differ by an integer ratio."""
        assert isinstance(graph, ModuleGraph)
        channel_graph: ChannelGraph = super().copy_from(graph, node_converter)
        channel_graph._insert_expand_node()
        return channel_graph

    def generate_units_config(self) -> Dict:
        """Generate configs of MutableChannelUnits according to the Graph.

        Returned structure, keyed by unit hash:

        "hash"{
            'init_args':{
                'num_channels': 10
            }
            'channels':{
                'input_related':[
                    {
                        "name":"backbone.bn1",
                        "start":0,
                        "end":64,
                        "expand_ratio":1,
                        "is_output_channel":false
                    }
                ],
                'output_related':[
                    ...
                ]
            }
        }
        """

        channel_config_template: Dict = {
            'init_args': {
                'num_channels': 1
            },
            'channels': {
                'input_related': [],
                'output_related': []
            }
        }

        def process_tensor(node: ChannelNode, is_output_tensor,
                           unit_hash_dict: Dict):
            # Record every (start, end) channel range of the node's in/out
            # tensor under the hash of its channel unit.
            if is_output_tensor:
                tensor = node.out_channel_tensor
            else:
                tensor = node.in_channel_tensor
            assert tensor is not None
            for (start, end), unit_hash in tensor.elems_hash_dict.items():
                channel_config = {
                    'name': node.module_name if node.is_module else node.val,
                    'start': start,
                    'end': end,
                    'is_output_channel': is_output_tensor
                }
                if unit_hash not in unit_hash_dict:
                    unit_hash_dict[unit_hash] = copy.deepcopy(
                        channel_config_template)
                related_channels = unit_hash_dict[unit_hash]['channels'][
                    'output_related' if is_output_tensor else 'input_related']
                if channel_config not in related_channels:
                    related_channels.append(channel_config)

        def fill_num_channels(units_config: Dict):
            # A unit's num_channels is the smallest range length among all
            # its related channel configs.

            def min_num_channels(channel_configs: List[Dict]):
                smallest = int(pow(2, 32))
                for channel in channel_configs:
                    smallest = min(smallest,
                                   channel['end'] - channel['start'])
                return smallest

            for name in units_config:
                units_config[name]['init_args'][
                    'num_channels'] = min_num_channels(
                        units_config[name]['channels']['input_related'] +
                        units_config[name]['channels']['output_related'])

        unit_hash_dict: Dict = {}
        self._reset_channel_elem_cache()
        for node in self.topo_traverse():
            process_tensor(node, True, unit_hash_dict)
            process_tensor(node, False, unit_hash_dict)
        fill_num_channels(unit_hash_dict)
        return unit_hash_dict

    def forward(self, num_input_channel=3):
        """Generate a ChannelTensor and let it forward through the graph."""
        for node in self.topo_traverse():
            node.reset_channel_tensors()
        for node in self.topo_traverse():
            if len(node.prev_nodes) == 0:
                # Root nodes receive a fresh input tensor.
                tensor = ChannelTensor(num_input_channel)
                node.forward([tensor])
            else:
                node.forward()
        self._merge_same_module()

    # graph modification

    def _add_input_before(self, node: ChannelNode):
        """Add an input node before a ChannelNode."""
        try:
            in_channels = node.in_channels
        except Exception:
            # Fall back to 3 (image channels) when in_channels is
            # unresolvable.
            in_channels = 3
        input_node = InputChannelNode(
            f'auto_input_{in_channels}',
            'input_placeholder',
            input_channels=in_channels)  # type: ignore
        input_node = self.add_or_find_node(input_node)
        self.connect(input_node, node)

    def _add_output_after(self, node: ChannelNode):
        """Add an output node after a ChannelNode."""

        output_node = EndNode('auto_output',
                              'output_placeholder')  # type: ignore
        output_node = self.add_or_find_node(output_node)
        self.connect(node, output_node)

    def _convert_a_node_to_end_node(self, node: ChannelNode):
        """Convert a node to an end node: its inputs are rerouted to an
        EndNode and it gets a fresh auto input instead."""

        end_node = EndNode('auto_end', 'output_placeholder')
        end_node = self.add_or_find_node(end_node)
        for prev in copy.copy(node.prev_nodes):
            self.disconnect(prev, node)
            self.connect(prev, end_node)
        self._add_input_before(node)

    def _merge_same_module(self):
        """Union all nodes with the same module to the same unit.

        Only parameterized modules are merged (weight sharing implies the
        channels must stay consistent).
        """
        module2node: Dict[Module, List[ChannelNode]] = dict()
        for node in self:
            if isinstance(node.val,
                          Module) and len(list(node.val.parameters())) > 0:
                if node.val not in module2node:
                    module2node[node.val] = []
                if node not in module2node[node.val]:
                    module2node[node.val].append(node)

        for module in module2node:
            if len(module2node[module]) > 1:
                nodes = module2node[module]
                assert nodes[0].in_channel_tensor is not None and \
                    nodes[0].out_channel_tensor is not None
                for node in nodes[1:]:
                    nodes[0].in_channel_tensor.union(node.in_channel_tensor)
                    nodes[0].out_channel_tensor.union(node.out_channel_tensor)

    def _insert_expand_node(self):
        """Insert expand nodes where a node's in_channels is an integer
        multiple of its predecessor's out_channels."""
        num_expand_nodes = 0
        nodes: List[ChannelNode] = copy.copy(list(self.topo_traverse()))
        for node in nodes:
            try:
                node.check_channel()
            except Exception:
                for pre_node in node.prev_nodes:
                    if (pre_node.out_channels < node.in_channels
                            and node.in_channels % pre_node.out_channels
                            == 0):
                        print_log(
                            (f'As the channels of {pre_node} and {node} '
                             'dismatch, we add an ExpandNode between them.'),
                            level='warning')
                        expand_ratio = (
                            node.in_channels // pre_node.out_channels)
                        # insert an expand node
                        new_node = ExpandChannelNode(
                            f'expand_{num_expand_nodes}',
                            'expand',
                            expand_ratio=expand_ratio)
                        num_expand_nodes += 1
                        self.add_node(new_node)
                        self.connect(pre_node, new_node)
                        self.connect(new_node, node)
                        self.disconnect(pre_node, node)

    # others

    def _check(self, node: ChannelNode, fix=False):
        """Helper for self.check: check whether the node has any error and,
        when ``fix`` is True, repair it and re-check."""
        try:
            node.check_channel()
            node.check()
        except Exception as e:
            if not fix:
                raise e
            else:
                # Re-raise to dispatch on the concrete error type.
                try:
                    raise e
                except NoOutputError as e:
                    # Fixed: 'debug' was passed positionally, which made it
                    # the `logger` argument of print_log, not the level.
                    print_log(
                        f'add a output after {node}, error: {e}',
                        level='debug')
                    self._add_output_after(node)
                except NoInputError as e:
                    print_log(
                        f'add a input before {node}, error: {e}',
                        level='debug')
                    self._add_input_before(node)
                except ChannelDismatchError as e:
                    print_log((f'{node} has channel error, so'
                               f'we convert it to a EndNode. error: {e}'),
                              level='debug')
                    self._convert_a_node_to_end_node(node)

                self._check(node, fix=True)

    def _reset_channel_elem_cache(self):
        """Reset hash cache of ChannelTensors."""
        # may have bugs, as some tensors are not recorded by
        # node.xxxx_tensors
        for node in self.topo_traverse():
            assert (node.in_channel_tensor is not None
                    and node.out_channel_tensor is not None), f'{node}'
            node.in_channel_tensor._reset_channel_elem_cache()
            node.out_channel_tensor._reset_channel_elem_cache()

# ---- mmrazor/structures/graph/channel_nodes.py ----
# Copyright (c) OpenMMLab. All rights reserved.
"""ChannelNodes are basic node type of ChannelGraph.

Different ChannelNodes represent different modules.
"""
import operator
from abc import abstractmethod
from typing import List, Union

import torch
import torch.nn as nn
from mmcv.cnn.bricks import Scale
from mmengine import MMLogger

from mmrazor.utils import print_log
from .channel_flow import ChannelTensor
from .module_graph import ModuleNode

# error types


class ChannelDismatchError(Exception):
    """Raised when the channel numbers of connected nodes do not match."""
    pass


def assert_channel(condition, node):
    """Raise ChannelDismatchError naming ``node`` if condition is false."""
    if not condition:
        raise ChannelDismatchError(node.name)


# ChannelNode


class ChannelNode(ModuleNode):
    """A ChannelNode is like a torch module. It accepts a ChannelTensor and
    outputs a ChannelTensor. The difference is that the torch module
    transforms a tensor, while the ChannelNode records the information of
    channel dependency in the ChannelTensor.

    Args:
        name (str): The name of the node.
        val (Union[nn.Module, str]): value of the node.
        module_name (str, optional): the module name of the module of the
            node.
    """

    # init

    def __init__(self,
                 name: str,
                 val: Union[nn.Module, str],
                 module_name='') -> None:

        super().__init__(name, val, module_name)
        self.in_channel_tensor: Union[None, ChannelTensor] = None
        self.out_channel_tensor: Union[None, ChannelTensor] = None
        self.return_tensor: Union[None, ChannelTensor] = None

    @classmethod
    def copy_from(cls, node):
        """Copy from a ModuleNode."""
        assert isinstance(node, ModuleNode)
        return cls(node.name, node.val, node.module_name)

    def reset_channel_tensors(self):
        """Reset the owning ChannelTensors."""
        self.in_channel_tensor = None
        self.out_channel_tensor = None

    # forward

    def forward(self, in_channel_tensors=None):
        """Forward with ChannelTensors.

        When no inputs are given, collect the return tensors of all
        predecessor nodes.
        """
        if in_channel_tensors is None:
            in_channel_tensors = [
                node.return_tensor for node in self.prev_nodes
            ]
        try:
            self.return_tensor = self.channel_forward(in_channel_tensors)
        except Exception as e:
            # Attach the node name; `from e` keeps the original traceback.
            raise Exception(f'{e},{self.name}') from e

    @abstractmethod
    def channel_forward(self, channel_tensors: List[ChannelTensor]):
        """Forward with ChannelTensors (default single-input behavior)."""
        assert len(channel_tensors) == 1, f'{len(channel_tensors)}'

        self.in_channel_tensor = channel_tensors[0]
        self.out_channel_tensor = ChannelTensor(self.out_channels)
        return self.out_channel_tensor

    # channels

    @property
    def in_channels(self) -> int:
        """Number of input channels: from the module itself when known,
        otherwise inferred from the previous nodes."""
        try:
            return self._in_channels
        except NotImplementedError:
            return \
                self._get_in_channels_by_prev_nodes(self.prev_nodes)

    @property
    def out_channels(self) -> int:
        """Number of output channels: from the module itself when known,
        otherwise derived from in_channels."""
        try:
            return self._out_channels
        except NotImplementedError:
            return self._get_out_channel_by_in_channels(self.in_channels)

    def check_channel(self):
        """Check if the node has a channel error."""
        for node in self.prev_nodes:
            assert_channel(node.out_channels == self.in_channels, self)

    @property
    def _in_channels(self) -> int:
        """In-channel number from the module itself; subclasses override."""
        raise NotImplementedError(
            f'{self.name}({self.__class__.__name__}) has no _in_channels')

    @property
    def _out_channels(self) -> int:
        """Out-channel number from the module itself; subclasses override."""
        raise NotImplementedError(
            f'{self.name}({self.__class__.__name__}) has no _out_channels')

    def _get_out_channel_by_in_channels(self, in_channels):
        """Get output channel number by the input channel number."""
        return in_channels

    def _get_in_channels_by_prev_nodes(self, prev_nodes):
        """Get input channel numbers by previous nodes."""
        if len(prev_nodes) == 0:
            print_log(
                (f'As {self.name} '
                 'has no prev nodes, so we set the in channels of it to 3.'),
                level='debug')
            return 3
        else:
            return prev_nodes[0].out_channels

    def __repr__(self) -> str:
        return f'{self.name}_({self.in_channels},{self.out_channels})'


# basic nodes


class PassUnionChannelNode(ChannelNode):
    """A PassUnionChannelNode has the same number of input channels and
    output channels.

    Besides, the corresponding input channels and output channels belong to
    one channel unit. Such as BatchNorm, Relu.
    """

    def channel_forward(self, channel_tensors: List[ChannelTensor]):
        """Channel forward."""
        return PassUnionChannelNode._channel_forward(self, channel_tensors[0])

    @staticmethod
    def _channel_forward(node: ChannelNode, tensor: ChannelTensor):
        """Channel forward: input and output share the same tensor."""
        assert node.in_channels == node.out_channels
        assert isinstance(tensor, ChannelTensor)
        node.in_channel_tensor = tensor
        node.out_channel_tensor = tensor
        return node.out_channel_tensor

    def __repr__(self) -> str:
        # Fixed typo: was '_uion'.
        return super().__repr__() + '_union'


class PassChannelNode(ChannelNode):
    """A PassChannelNode passes its input tensor through unchanged and does
    not take part in channel dependency (e.g. pooling, activation)."""

    def _get_in_channels_by_prev_nodes(self, prev_nodes):
        assert len(self.prev_nodes) == 1
        node0: ChannelNode = self.prev_nodes[0]
        return node0.out_channels

    def channel_forward(self, channel_tensors: List[ChannelTensor]):
        assert len(channel_tensors) == 1
        # Own dummy 1-channel tensors; the real tensor is just passed on.
        self.in_channel_tensor = ChannelTensor(1)
        self.out_channel_tensor = ChannelTensor(1)
        return channel_tensors[0]

    def __repr__(self) -> str:
        return super().__repr__() + '_pass'


class MixChannelNode(ChannelNode):
    """A MixChannelNode has independent input channels and output
    channels."""

    def channel_forward(self, channel_tensors: List[ChannelTensor]):
        """Channel forward: a fresh output tensor, independent of input."""
        assert len(channel_tensors) <= 1
        if len(channel_tensors) == 1:
            self.in_channel_tensor = channel_tensors[0]
            self.out_channel_tensor = ChannelTensor(self.out_channels)
        else:
            raise NotImplementedError()
        return self.out_channel_tensor

    def __repr__(self) -> str:
        return super().__repr__() + '_mix'


class BindChannelNode(ChannelNode):
    """A BindChannelNode has multiple inputs, and all input channels belong
    to the same channel unit (e.g. element-wise add)."""

    def channel_forward(self, channel_tensors: List[ChannelTensor]):
        """Channel forward: union all inputs into one unit."""
        assert len(channel_tensors) > 0, f'{self}'
        # align channel_tensors
        for tensor in channel_tensors[1:]:
            channel_tensors[0].union(tensor)
        self.in_channel_tensor = channel_tensors[0]
        self.out_channel_tensor = channel_tensors[0]
        return self.out_channel_tensor

    def __repr__(self) -> str:
        return super().__repr__() + '_bind'

    def check_channel(self):
        for node in self.prev_nodes:
            assert_channel(node.out_channels == self.in_channels, self)


class CatChannelNode(ChannelNode):
    """A CatChannelNode concatenates all input channels."""

    def channel_forward(self, channel_tensors: List[ChannelTensor]):
        tensor_cat = ChannelTensor.cat(channel_tensors)
        self.in_channel_tensor = tensor_cat
        self.out_channel_tensor = tensor_cat
        return self.out_channel_tensor

    def check_channel(self):
        in_num = [node.out_channels for node in self.prev_nodes]
        assert_channel(sum(in_num) == self.in_channels, self)

    def _get_in_channels_by_prev_nodes(self, prev_nodes):
        assert len(prev_nodes) > 0
        nums = [node.out_channels for node in prev_nodes]
        return sum(nums)

    def __repr__(self) -> str:
        return super().__repr__() + '_cat'


class ExpandChannelNode(ChannelNode):
    """An ExpandChannelNode multiplies its input channels by a fixed integer
    expand_ratio."""

    def __init__(self,
                 name: str,
                 val: Union[nn.Module, str],
                 module_name='',
                 expand_ratio=1) -> None:
        super().__init__(name, val, module_name)
        self.expand_ratio = expand_ratio

    def _get_out_channel_by_in_channels(self, in_channels):
        return in_channels * self.expand_ratio

    def channel_forward(self, channel_tensors: List[ChannelTensor]):
        assert len(channel_tensors) == 1, f'{self}'
        assert self.out_channels >= self.in_channels, f'{self}'
        assert self.out_channels % self.in_channels == 0, f'{self}'
        tensor0 = channel_tensors[0]
        self.in_channel_tensor = tensor0
        self.out_channel_tensor = tensor0.expand(self.expand_ratio)
        return self.out_channel_tensor

    def __repr__(self) -> str:
        return super().__repr__() + f'_expand({self.expand_ratio})'


class InputChannelNode(ChannelNode):
    """An InputChannelNode produces a fresh tensor with a fixed number of
    input channels (a model input placeholder)."""

    def __init__(self,
                 name: str,
                 val: Union[nn.Module, str],
                 module_name='',
                 input_channels=3) -> None:
        super().__init__(name, val, module_name)
        self._input_channels = input_channels

    def channel_forward(self, channel_tensors: List[ChannelTensor]):
        input_tensor = ChannelTensor(self._input_channels)
        self.in_channel_tensor = input_tensor
        self.out_channel_tensor = input_tensor
        return input_tensor

    @property
    def _in_channels(self) -> int:
        return self._input_channels

    def __repr__(self) -> str:
        return super().__repr__() + '_input'


class EndNode(ChannelNode):
    """An EndNode absorbs any inputs (a model output placeholder); its
    channels never constrain anything."""

    def channel_forward(self, channel_tensors: List[ChannelTensor]):
        tensor_end = ChannelTensor(1)
        self.in_channel_tensor = tensor_end
        self.out_channel_tensor = tensor_end
        for channel in channel_tensors:
            # Union every input channel with the single end element.
            channel.union(tensor_end.expand(len(channel)))
        return self.out_channel_tensor

    def __repr__(self) -> str:
        return super().__repr__() + '_end'

    def check_channel(self):
        pass


# module nodes


class ConvNode(MixChannelNode):
    """A ConvNode corresponds to a Conv2d module.

    It can deal with normal conv, dwconv and gwconv.
    """

    def __init__(self,
                 name: str,
                 val: Union[nn.Module, str],
                 module_name='') -> None:
        super().__init__(name, val, module_name)
        assert isinstance(self.val, nn.Conv2d)

    @property
    def conv_type(self):
        """Classify the conv by its groups: normal, depth-wise or
        group-wise."""
        if self.val.groups == 1:
            return 'conv'
        elif self.val.in_channels == self.out_channels == self.val.groups:
            return 'dwconv'
        else:
            return 'gwconv'

    def channel_forward(self, channel_tensors: List[ChannelTensor]):
        if self.conv_type == 'conv':
            return super().channel_forward(channel_tensors)
        elif self.conv_type == 'dwconv':
            # Depth-wise: in and out channels are one unit.
            return PassUnionChannelNode._channel_forward(
                self, channel_tensors[0])
        elif self.conv_type == 'gwconv':
            return self._gw_conv_channel_forward(channel_tensors)
        else:
            raise NotImplementedError(f'{self}')

    def _gw_conv_channel_forward(self, channel_tensors: List[ChannelTensor]):
        """Group-wise conv: unite channels group-wise on both sides."""
        assert len(channel_tensors) == 1
        tensor0 = channel_tensors[0]
        conv: nn.Conv2d = self.val
        # NOTE(review): group_union is defined elsewhere in this module,
        # beyond this chunk — TODO confirm.
        group_union(tensor0, conv.groups)
        self.in_channel_tensor = tensor0
        self.out_channel_tensor = ChannelTensor(self.out_channels)
        group_union(self.out_channel_tensor, conv.groups)
        return self.out_channel_tensor

    @property
    def _in_channels(self) -> int:
        return self.val.in_channels

    @property
    def _out_channels(self) -> int:
        return self.val.out_channels

    def __repr__(self) -> str:
        return super().__repr__() + '_conv'


class LinearNode(MixChannelNode):
    """A LinearNode corresponds to a Linear module."""

    def __init__(self,
                 name: str,
                 val: Union[nn.Module, str],
                 module_name='') -> None:
        super().__init__(name, val, module_name)
        assert isinstance(self.val, nn.Linear)

    @property
    def _in_channels(self) -> int:
        return self.val.in_features

    @property
    def _out_channels(self) -> int:
        return self.val.out_features

    def __repr__(self) -> str:
        return super().__repr__() + '_linear'


class BnNode(PassUnionChannelNode):
    """A BnNode corresponds to a BatchNorm module."""

    def __init__(self,
                 name: str,
                 val: Union[nn.Module, str],
                 module_name='') -> None:
        super().__init__(name, val, module_name)
        assert isinstance(self.val,
                          nn.modules.batchnorm._BatchNorm), f'{type(self.val)}'

    @property
    def _in_channels(self) -> int:
        return self.val.num_features

    @property
    def _out_channels(self) -> int:
        return self.val.num_features

    def __repr__(self) -> str:
        return super().__repr__() + '_bn'


class GroupNormNode(PassUnionChannelNode):
    """A GroupNormNode corresponds to a GroupNorm module."""

    def __init__(self,
                 name: str,
                 val: Union[nn.Module, str],
                 module_name='') -> None:
        super().__init__(name, val, module_name)
        assert isinstance(self.val, nn.GroupNorm)
        self.val: nn.GroupNorm

    @property
    def _in_channels(self) -> int:
        return self.val.num_channels

    @property
    def _out_channels(self) -> int:
        return self.val.num_channels

    def channel_forward(self, channel_tensors: List[ChannelTensor]):
        out_tensor = super().channel_forward(channel_tensors)
        group_tensor = ChannelTensor(self.in_channels // self.val.num_groups)
        # NOTE(review): group_union is defined elsewhere in this module,
        # beyond this chunk — TODO confirm.
        group_union(out_tensor, self.val.num_groups, group_tensor)
        return out_tensor

    def __repr__(self) -> str:
        return super().__repr__() + '_gn'


# converter

channel_nodes_mapping = {
    'module': {
        nn.Conv2d: ConvNode,
        nn.modules.batchnorm._BatchNorm: BnNode,
        nn.Linear: LinearNode,
        nn.modules.ReLU: PassChannelNode,
        nn.modules.Hardtanh: PassChannelNode,
        # pools
        nn.modules.pooling._AvgPoolNd: PassChannelNode,
        nn.modules.pooling._AdaptiveAvgPoolNd: PassChannelNode,
        nn.modules.pooling._MaxPoolNd: PassChannelNode,
        nn.modules.pooling._AdaptiveMaxPoolNd: PassChannelNode,
        Scale: PassChannelNode,
        nn.modules.GroupNorm: GroupNormNode,
    },
    'function': {
        torch.add: BindChannelNode,
        torch.cat: CatChannelNode,
        operator.add: BindChannelNode,
    },
    'str': {
        'bind_placeholder': BindChannelNode,
        'pass_placeholder': PassUnionChannelNode,
        'cat_placeholder': CatChannelNode,
        'input_placeholder': InputChannelNode,
        'output_placeholder': EndNode
    },
}

# NOTE(review): `default_channel_node_converter` continues beyond this chunk
# of the file (it is cut off mid-definition here) and is therefore not
# reproduced above.
len(node.prev_nodes) > 1: + warn('BindChannelNode') + return BindChannelNode.copy_from(node) + else: + warn('PassUnionChannelNode') + return PassUnionChannelNode.copy_from(node) + + +# helper functions + + +def group_union(tensor: ChannelTensor, groups: int, group_tensor=None): + """Group-wise union for ChannelTensor.""" + c_per_group = len(tensor) // groups + if group_tensor is None: + group_tensor = ChannelTensor(c_per_group) + assert groups * len(group_tensor) == len(tensor) + for i in range(groups): + tensor[i * c_per_group:(i + 1) * c_per_group].union(group_tensor) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/structures/graph/module_graph.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/structures/graph/module_graph.py new file mode 100644 index 0000000000000000000000000000000000000000..d90771940c752c1f12f1521bced8467b111eb13f --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/structures/graph/module_graph.py @@ -0,0 +1,507 @@ +# Copyright (c) OpenMMLab. All rights reserved. +"""This module defines ModuleNode and ModuleGraph. 
+ +They model the computation graph of a model based on BaseNode and BaseGraph +""" +import copy +from collections import OrderedDict +from typing import Dict, List, TypeVar, Union + +import torch.nn as nn +from torch.nn import Module + +from mmrazor.models.task_modules.tracer.backward_tracer import BackwardTracer +from mmrazor.models.task_modules.tracer.loss_calculator import \ + ImageClassifierPseudoLoss +from mmrazor.models.task_modules.tracer.path import (Path, PathConcatNode, + PathList, PathNode) +from mmrazor.registry import TASK_UTILS +from mmrazor.utils import print_log +from .base_graph import BaseGraph, BaseNode +from .pseudo_fx_graph import FxBaseNode + + +# ModuleNode && ModuleGraph +class NoOutputError(Exception): + """An error occurs when no output node for a leaf node.""" + + def __init__(self, node, *args: object) -> None: + super().__init__(f'{node}', *args) + self.node = node + + pass + + +class NoInputError(Exception): + """An error occurs when no input node for a leaf node.""" + + def __init__(self, node, *args: object) -> None: + super().__init__(f'{node}', *args) + self.node = node + + +def my_assert(condiion, exception): + """assert helper function.""" + if not condiion: + raise exception + + +class ModuleNode(BaseNode): + """A node in a computation graph. + + All nodes are divided to four types, the detail of definition can be found + in functions self.is_{xxx}_node. + """ + + pre_defined_node_val_str = [ + 'cat_placeholder', 'bind_placeholder', 'pass_placeholder' + ] + + def __init__(self, + name: str, + val: Union[Module, str], + module_name='') -> None: + """ + Args: + name (str): the name of the node + val (Module | str): content of the node. It can be Module or + string. If val is a string, the string can only be one of + self.pre_defined_node_val_str + Note: + Here, we give an example of expand_ratio. 
+ >>> class Pool(nn.Module): + def forward(x): + return F.adaptive_avg_pool2d(x,2).flatten(1) + >>> node= ModuleNode('pass_0',Pool(),expand_ratio=4) + >>> assert node.out_channels == node.in_channels*4 + """ + + super().__init__(name, val) + self.module_name = module_name + + # other + + @property + def is_module(self): + """Whether the node includes a module.""" + return isinstance(self.val, nn.Module) + + def __repr__(self) -> str: + repr = f'{self.name}' + if self.module_name != '': + repr += f'({self.module_name})' + return repr + + # node type + + @property + def basic_type(self) -> str: + """The basic type of the node. + + Basic types are divided into seveval major types, detailed in + self.is_{xxx}_node + """ + if isinstance(self.val, Module): + if isinstance(self.val, nn.Conv2d): + if self.val.groups == 1: + return 'conv2d' + elif self.val.groups == self.val.in_channels == \ + self.val.out_channels: + return 'dwconv2d' + else: + return 'gwconv2d' + elif isinstance(self.val, nn.modules.batchnorm._BatchNorm): + return 'bn' + elif isinstance(self.val, nn.Linear): + return 'linear' + else: + raise NotImplementedError(f'{self.val}') + else: + if self.val in [ + 'cat_placeholder', 'bind_placeholder', 'pass_placeholder' + ]: + return self.val + else: + raise NotImplementedError() + + def is_pass_node(self): + """pass node represent a module whose in-channels correspond out- + channels one-to-one.""" + return self.basic_type in ['bn', 'dwconv2d', 'pass_placeholder'] + + def is_cat_node(self): + """cat node represents a cat module.""" + return self.basic_type == 'cat_placeholder' + + def is_bind_node(self): + """bind node represent a node that has multiple inputs, and their + channels are bound one-to-one.""" + return self.basic_type == 'bind_placeholder' + + def is_mix_node(self): + """mix node represents a module that mixs all input channels and + generete new output channels, such as conv and linear.""" + return self.basic_type in ['conv2d', 'linear', 'gwconv2d'] 
+ + def is_input(self): + """Whether the node is an input node.""" + return self.val == 'input_placeholder' + + def is_output(self): + """Whether the node is an output node.""" + return self.val == 'output_placeholder' + + def check(self): + """Check whether the node has any error.""" + if self.is_input(): + assert len(self.prev_nodes) == 0, f'{self}' + my_assert(len(self.next_nodes) > 0, NoOutputError(self)) + elif self.is_output(): + my_assert(len(self.prev_nodes) > 0, NoInputError(self)) + assert len(self.next_nodes) == 0, f'{self}' + else: + my_assert(len(self.prev_nodes) > 0, NoInputError(self)) + my_assert(len(self.next_nodes) > 0, NoOutputError(self)) + + +MODULENODE = TypeVar('MODULENODE', bound=ModuleNode) + + +class ModuleGraph(BaseGraph[MODULENODE]): + """Computatation Graph.""" + + def __init__(self, model=None) -> None: + super().__init__() + self._model: nn.Module = model + + # functions to generate module graph. + + @staticmethod + def init_from_backward_tracer( + model: Module, + backward_tracer=BackwardTracer( + loss_calculator=ImageClassifierPseudoLoss()), + ): + """init module graph using backward tracer.""" + if isinstance(backward_tracer, dict): + backward_tracer = TASK_UTILS.build(backward_tracer) + path_lists = backward_tracer.trace(model) + converter = PathToGraphConverter(path_lists, model) + converter.graph.refresh_module_name() + return converter.graph + + @staticmethod + def init_from_model(model: Module): + """init module graph from a model which uses connect_module to record + the relation among modules.""" + pass + + # others + def refresh_module_name(self): + """Refresh the module name.""" + module2name = {} + for name, module in self._model.named_modules(): + module2name[module] = name + + for node in self: + if isinstance(node.val, nn.Module): + node.module_name = module2name[node.val] + + def check(self, fix=False): + """Check whether the Graph has any error.""" + for node in copy.copy(list(self.topo_traverse())): + 
self._check(node, fix=fix) + + def _check(self, node, fix=False): + """Helper method for self.check.""" + try: + node.check() + except Exception as e: + if not fix: + raise e + else: + try: + raise e + except NoOutputError as e: + print_log( + f'add a output after {node}, error: {e}', + level='debug') + self._add_output_after(node) + except NoInputError as e: + print_log( + f'add a input before {node}, error: {e}', + level='debug') + self._add_input_before(node) + + self._check(node, fix=True) + + def _add_input_before(self, node): + """Add an input node before a node.""" + input_node = ModuleNode('auto_input', + 'input_placeholder') # type: ignore + input_node = self.add_or_find_node(input_node) + self.connect(input_node, node) + + def _add_output_after(self, node): + """Add an output node after a node.""" + output_node = ModuleNode('auto_output', + 'output_placeholder') # type: ignore + output_node = self.add_or_find_node(output_node) + self.connect(node, output_node) + + +# Converter + + +class GraphConverter: + """Base class for converters for ModuleGraph.""" + + def __init__(self, model) -> None: + self.graph = ModuleGraph[ModuleNode](model) + self.cat_placeholder_num = 0 + self.bind_placeholder_num = 0 + self.pass_placeholder_num = 0 + + # add node + + def _new_placeholder_node(self, type: str, expand_ratio=1): + """New cat/bind/pass node.""" + assert type in [ + 'cat_placeholder', 'pass_placeholder', 'bind_placeholder' + ] + if expand_ratio != 1: + assert type == 'pass_placeholder' + if type == 'cat_placeholder': + num = self.cat_placeholder_num + self.cat_placeholder_num += 1 + elif type == 'pass_placeholder': + num = self.pass_placeholder_num + self.pass_placeholder_num += 1 + elif type == 'bind_placeholder': + num = self.bind_placeholder_num + self.bind_placeholder_num += 1 + else: + pass + node = ModuleNode(f'{type}_{num}', type) + self.graph.add_or_find_node(node) + return node + + # insert nodes + + def _insert_node_before(self, node: ModuleNode, 
new_node: ModuleNode): + """Insert a new node before a node.""" + for pre in node.prev_nodes: + self.graph.connect(pre, new_node) + for pre in new_node.prev_nodes: + self.graph.disconnect(pre, node) + self.graph.connect(new_node, node) + + def _insert_bind_nodes(self): + """Add bind nodes before the nodes which only need one previous node + but have more than one.""" + + need_bind_nodes = [] + for node in self.graph: + if (isinstance(node.val, nn.Conv2d) + or isinstance(node.val, nn.Linear) + or isinstance(node.val, nn.modules.batchnorm._BatchNorm)): + if len(node.prev_nodes) > 1: + need_bind_nodes.append(node) + for node in need_bind_nodes: + bind_node = self._new_placeholder_node('bind_placeholder') + self._insert_node_before(node, bind_node) + + def _insert_pass_nodes(self): + """Add pass nodes where the channel conflict.""" + for node in copy.copy(list(self.graph.nodes.values())): + if len(node.prev_nodes) == 1: + pre: ModuleNode = node.prev_nodes[0] + if node.in_channels != pre.out_channels: + assert node.in_channels % pre.out_channels == 0, \ + f'{node.name} channel error' + pass_node = self._new_placeholder_node( + 'pass_placeholder', + node.in_channels // pre.out_channels) + self._insert_node_before(node, pass_node) + + def _remove_redundant_pass_nodes(self): + """Remove redundant pass nodes, which do not change number of channels + and do not represent any module.""" + for node in copy.copy(list(self.graph.nodes.values())): + if (node.is_pass_node() and len(node.prev_nodes) == 1 + and len(node.next_nodes) == 1 + and not isinstance(node.val, nn.Module) + and node.in_channels == node.out_channels): + self.graph.delete_node(node) + + # topo_rename_nodes + def _topo_rename(self): + """Rename cat, bind, pass nodes in topological order.""" + self.cat_placeholder_num = 0 + self.bind_placeholder_num = 0 + self.pass_placeholder_num = 0 + sorted_nodes = OrderedDict() + for node in self.graph.topo_traverse(): + node: ModuleNode + if isinstance(node.val, Module): + 
pass + elif node.is_pass_node(): + node.name = f'pass_{self.pass_placeholder_num}' + self.pass_placeholder_num += 1 + elif node.is_cat_node(): + node.name = f'cat_{self.cat_placeholder_num}' + self.cat_placeholder_num += 1 + elif node.is_bind_node(): + node.name = f'bind_{self.bind_placeholder_num}' + self.bind_placeholder_num += 1 + else: + pass + sorted_nodes[node.name] = node + self.graph.nodes = sorted_nodes + + # other + def _post_process(self): + """Some post process after init a basic module graph.""" + # self._remove_redundant_pass_nodes() + self._insert_bind_nodes() + self._topo_rename() + + +class PathToGraphConverter(GraphConverter): + """The class converts pathlist, which is generated by backward tracer, to a + module graph.""" + + def __init__(self, path_list: PathList, model: Module) -> None: + """ + Args: + path_list (PathList): path_list generated by backward tracer. + model (Module): the model corresponding to the path_list + """ + super().__init__(model) + self.path_list = path_list + self.cat_dict: Dict[str, str] = {} + self.name2module = dict(model.named_modules()) + self._parse(self.path_list) + + self._insert_bind_nodes() + self._topo_rename() + + def _parse(self, path_list: PathList): + """Parse path list.""" + self._parse_helper(path_list, []) + + def _parse_helper(self, path_unit: Union[PathList, Path, PathNode], + next_nodes: List[ModuleNode]): + """Parse a node(unit) in path list.""" + current_node = None + # path_list + if isinstance(path_unit, PathList): + for single_path in path_unit: # sibling + self._parse_helper(single_path, next_nodes) + + # path: + elif isinstance(path_unit, Path): + current_nexts = next_nodes + for node in path_unit: # parent -> children + current_node = self._parse_helper(node, current_nexts) + current_nexts = [current_node] + + # Node + elif isinstance(path_unit, PathNode): + + # cat node: [cat_path_lists] + if isinstance(path_unit, PathConcatNode): + current_node = self._add_or_find_node(path_unit) + 
self._connect_nexts(current_node, next_nodes) + for catpath in path_unit.path_lists: # sibling + self._parse_helper(catpath, [current_node]) + + # single node + else: + current_node = self._add_or_find_node(path_unit) + self._connect_nexts(current_node, next_nodes) + return current_node + + def _add_or_find_cat_node(self, pathnode: PathConcatNode): + """Receive a cat-node. + + If the cat-node exists in the graph, the corresponding node is + returned, or a new cat node is added to the graph. + """ + + def unify_cat_name(name: str): + cat_name = name.split('_') + inputs = sorted(cat_name[1:]) + return f"cat_{'_'.join(inputs)}" + + name_id = pathnode.name + name_id = unify_cat_name(name_id) + if name_id in self.cat_dict: + name = self.cat_dict[name_id] + else: + name = f'cat_{self.cat_placeholder_num}' + self.cat_placeholder_num += 1 + self.cat_dict[name_id] = name + node = self.graph.add_or_find_node(ModuleNode(name, 'cat_placeholder')) + return node + + def _add_or_find_node(self, pathnode: PathNode) -> Module: + """Receive a cat-node. + + If the cat-node exists in the graph, the corresponding node is + returned, or a new cat node is added to the graph. 
+ """ + if isinstance(pathnode, PathConcatNode): + return self._add_or_find_cat_node(pathnode) + else: + name = pathnode.name + assert name in self.name2module, f"{name} doesn't exist in model" + module = self.name2module[name] + return self.graph.add_or_find_node(ModuleNode(name, module)) + + def _connect_nexts(self, node, nexts: List[ModuleNode]): + """Connext the node and the nodes in nexts.""" + for next in nexts: + self.graph.connect(node, next) + + +class FxTracerToGraphConverter(GraphConverter): + """Use fx tracer to parse model, and generate module-graph.""" + + def __init__(self, base_graph, model=None) -> None: + """ + Args: + model (Module): the model which will be parsed + is_extra_leaf_module (Callable): a function used to determine, + if a module is a leaf module except torch pre-defined modules + """ + super().__init__(model) + self.base_graph = base_graph + self._convert_graph() + + def _node_converter(self, node: FxBaseNode): + """Convert a fxnode to a module-node.""" + if node.is_function(): + val = node.function() + elif node.is_input(): + val = 'input_placeholder' + elif node.is_output(): + val = 'output_placeholder' + elif node.is_method(): + val = node.method() + elif node.is_get_attr(): + val = 'get_attr' + elif node.is_module(): + val = node.module() + else: + raise NotImplementedError(f'{node} is unsupported') + + new_node = ModuleNode(node.name, val) + return new_node + + def _convert_graph(self): + """Convert a torch-graph to a module-graph.""" + base_graph = self.base_graph + # copy_nodes and connect + module_graph = ModuleGraph.copy_from(base_graph, self._node_converter) + self.graph = module_graph diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/structures/graph/pseudo_fx_graph.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/structures/graph/pseudo_fx_graph.py new file mode 100644 index 0000000000000000000000000000000000000000..210fc2302a665bc3217b4a7357ef5ef9f54b0333 --- /dev/null +++ 
# (continued) cv/distiller/CWD/pytorch/mmrazor/mmrazor/structures/graph/pseudo_fx_graph.py
# ---------------------------------------------------------------------------
# Copyright (c) OpenMMLab. All rights reserved.
"""This module defines FxTracer and related classes."""

import torch

from mmrazor.utils import get_placeholder

try:
    import torch.fx as fx
    from torch.fx.node import Node as FxNode
except ImportError:
    fx = get_placeholder('torch>=1.12')
    FxNode = get_placeholder('torch>=1.12')
from mmrazor.structures.graph.base_graph import BaseGraph, BaseNode


class FxBaseNode(BaseNode):
    """Node to record FxNode."""

    def __init__(self, name: str, val: FxNode) -> None:
        super().__init__(name, val)

    def module(self):
        """Union[Module | None]: the module the fxnode corresponding to."""
        self.val: FxNode
        model = self.val.graph.owning_module
        if self.val.op == 'call_module':
            # Resolve the dotted target path attribute by attribute.
            target = self.val.target
            target = target.split('.')
            obj = model
            for t in target:
                obj = getattr(obj, t)
            return obj
        else:
            return None

    def function(self):
        """Union[Callable | None]: the function the fxnode corresponding
        to."""
        if self.is_function():
            return self.val.target
        else:
            return None

    def method(self):
        """Union[str | None]: the method name the fxnode corresponding to."""
        if self.is_method():
            return self.val.target
        else:
            return None

    # base type
    # placeholder|call_method|call_module|call_function|get_attr|output

    def is_function(self):
        """Bool: if the fxnode represents 'call_function'"""
        return self.val.op == 'call_function'

    def is_module(self):
        """Bool: if the fxnode represents 'call_module'"""
        return self.val.op == 'call_module'

    def is_input(self):
        """Bool: if the fxnode represents input tensors"""
        return self.val.op == 'placeholder'

    def is_output(self):
        """Bool: if the fxnode represents output tensors"""
        return self.val.op == 'output'

    def is_method(self):
        """Bool: if the fxnode represents 'call_method'"""
        return self.val.op == 'call_method'

    def is_get_attr(self):
        """Bool: if the fxnode represents 'get_attr'"""
        return self.val.op == 'get_attr'

    # extended type

    def is_cat(self):
        """Bool: if the fxnode represents a cat node"""
        return self.is_function() and self.function() is torch.cat

    # other

    def __repr__(self) -> str:
        return f'{self.name}({self.val.op})'


def parse_torch_graph(torch_graph):
    """BaseGraph: convert torch graph to self.graph"""
    torch_graph: fx.graph.Graph

    def add_node(graph, fxnode):
        node = graph.add_or_find_node(FxBaseNode(fxnode.name, fxnode))
        return node

    graph = BaseGraph[FxBaseNode]()
    # copy_nodes
    for fxnode in torch_graph.nodes:
        add_node(graph, fxnode)

    # connect nodes
    for fxnode in torch_graph.nodes:
        for pre_node in fxnode.all_input_nodes:
            graph.connect(add_node(graph, pre_node), add_node(graph, fxnode))
    return graph

# ---------------------------------------------------------------------------
# new file: cv/distiller/CWD/pytorch/mmrazor/mmrazor/structures/quantization/__init__.py
# ---------------------------------------------------------------------------
# Copyright (c) OpenMMLab. All rights reserved.
from .backend_config import *  # noqa: F401,F403
from .qconfig import *  # noqa: F401,F403

# ---------------------------------------------------------------------------
# new file: cv/distiller/CWD/pytorch/mmrazor/mmrazor/structures/quantization/backend_config/__init__.py
# ---------------------------------------------------------------------------
# Copyright (c) OpenMMLab. All rights reserved.
from .academic import (get_academic_backend_config,
                       get_academic_backend_config_dict)
from .mapping import BackendConfigs
from .native import get_native_backend_config, get_native_backend_config_dict
from .openvino import (get_openvino_backend_config,
                       get_openvino_backend_config_dict)
from .tensorrt import (get_tensorrt_backend_config,
                       get_tensorrt_backend_config_dict)

__all__ = [
    'BackendConfigs',
    'get_native_backend_config',
    'get_native_backend_config_dict',
    'get_academic_backend_config',
    'get_academic_backend_config_dict',
    'get_openvino_backend_config',
    'get_openvino_backend_config_dict',
    'get_tensorrt_backend_config',
    'get_tensorrt_backend_config_dict',
]

# ---------------------------------------------------------------------------
# new file: cv/distiller/CWD/pytorch/mmrazor/mmrazor/structures/quantization/backend_config/academic.py
# ---------------------------------------------------------------------------
# Copyright (c) OpenMMLab. All rights reserved.
import torch

try:
    from torch.ao.quantization.backend_config import BackendConfig, DTypeConfig
except ImportError:
    from mmrazor.utils import get_placeholder
    BackendConfig = get_placeholder('torch>=1.13')
    DTypeConfig = get_placeholder('torch>=1.13')

from .common_operator_config_utils import (_get_conv_configs,
                                           _get_linear_configs)

# =====================
# |  BACKEND CONFIGS  |
# =====================


def get_academic_backend_config() -> BackendConfig:
    """Return the `BackendConfig` for academic researching.

    Note:
        Learn more about BackendConfig, please refer to:
        https://github.com/pytorch/pytorch/tree/master/torch/ao/quantization/backend_config # noqa: E501
    """

    # ===================
    # |  DTYPE CONFIGS  |
    # ===================
    # weighted op int8 dtype config
    # this is config for ops that has quantized weights, like linear, conv
    weighted_op_int8_dtype_config = DTypeConfig(
        input_dtype=torch.quint8,
        output_dtype=torch.quint8,
        weight_dtype=torch.qint8,
        bias_dtype=torch.float,
    )

    conv_dtype_configs = [weighted_op_int8_dtype_config]
    linear_dtype_configs = [weighted_op_int8_dtype_config]

    return BackendConfig('academic') \
        .set_backend_pattern_configs(_get_conv_configs(conv_dtype_configs)) \
        .set_backend_pattern_configs(_get_linear_configs(linear_dtype_configs))


def get_academic_backend_config_dict():
    """Return the `BackendConfig` for academic researching in dictionary
    form."""
    return get_academic_backend_config().to_dict()


__all__ = [
    'get_academic_backend_config',
    'get_academic_backend_config_dict',
]

# ---------------------------------------------------------------------------
# new file: cv/distiller/CWD/pytorch/mmrazor/mmrazor/structures/quantization/backend_config/common_operator_config_utils.py
# ---------------------------------------------------------------------------
# Copyright (c) OpenMMLab. All rights reserved.
import operator
from collections import namedtuple
from typing import List

import torch
import torch.nn as nn

from mmrazor import digit_version

try:
    import torch.nn.functional as F
    import torch.nn.intrinsic as nni
    import torch.nn.intrinsic.qat as nniqat
    import torch.nn.qat as nnqat
    import torch.nn.quantized._reference as nnqr
    from torch.ao.quantization.backend_config import (BackendPatternConfig,
                                                      DTypeConfig,
                                                      ObservationType)
    from torch.ao.quantization.fake_quantize import FixedQParamsFakeQuantize
    from torch.ao.quantization.fuser_method_mappings import (
        fuse_conv_bn, fuse_conv_bn_relu, fuse_convtranspose_bn, fuse_linear_bn,
        reverse2, reverse3, reverse_sequential_wrapper2)
    from torch.ao.quantization.qconfig_mapping import \
        _FIXED_QPARAMS_OP_TO_OBSERVER
except ImportError:
    # Placeholders keep the module importable on torch < 1.13; they raise on
    # first use with a clear version hint.
    from mmrazor.utils import get_package_placeholder, get_placeholder
    F = get_package_placeholder('torch>=1.13')
    nni = get_package_placeholder('torch>=1.13')
    nniqat = get_package_placeholder('torch>=1.13')
    nnqat = get_package_placeholder('torch>=1.13')
    nnqr = get_package_placeholder('torch>=1.13')
    BackendPatternConfig = get_placeholder('torch>=1.13')
    DTypeConfig = get_placeholder('torch>=1.13')
    ObservationType = get_placeholder('torch>=1.13')
    FixedQParamsFakeQuantize = get_placeholder('torch>=1.13')
    fuse_conv_bn = get_placeholder('torch>=1.13')
    fuse_conv_bn_relu = get_placeholder('torch>=1.13')
    fuse_convtranspose_bn = get_placeholder('torch>=1.13')
    fuse_linear_bn = get_placeholder('torch>=1.13')
    reverse2 = get_placeholder('torch>=1.13')
    reverse3 = get_placeholder('torch>=1.13')
    reverse_sequential_wrapper2 = get_placeholder('torch>=1.13')
    _FIXED_QPARAMS_OP_TO_OBSERVER = get_placeholder('torch>=1.13')

# Per-dimensionality bundle of every conv-related module/function variant
# used when building backend pattern configs below.
_ConvMetadata = namedtuple('_ConvMetadata', [
    'root', 'transpose', 'bn', 'reference', 'transpose_reference',
    'fused_conv_relu', 'fused_conv_bn', 'fused_conv_bn_relu', 'qat',
    'relu_qat', 'bn_qat', 'bn_relu_qat', 'func'
])

if digit_version(torch.__version__) >= digit_version('1.13.0'):
    _Conv1dMetadata = _ConvMetadata(
        nn.Conv1d, nn.ConvTranspose1d, nn.BatchNorm1d, nnqr.Conv1d,
        nnqr.ConvTranspose1d, nni.ConvReLU1d, nni.ConvBn1d, nni.ConvBnReLU1d,
        nnqat.Conv1d, nniqat.ConvReLU1d, nniqat.ConvBn1d, nniqat.ConvBnReLU1d,
        F.conv1d)
    _Conv2dMetadata = _ConvMetadata(
        nn.Conv2d, nn.ConvTranspose2d, nn.BatchNorm2d, nnqr.Conv2d,
        nnqr.ConvTranspose2d, nni.ConvReLU2d, nni.ConvBn2d, nni.ConvBnReLU2d,
        nnqat.Conv2d, nniqat.ConvReLU2d, nniqat.ConvBn2d, nniqat.ConvBnReLU2d,
        F.conv2d)
    _Conv3dMetadata = _ConvMetadata(
        nn.Conv3d, nn.ConvTranspose3d, nn.BatchNorm3d, nnqr.Conv3d,
        nnqr.ConvTranspose3d, nni.ConvReLU3d, nni.ConvBn3d, nni.ConvBnReLU3d,
        nnqat.Conv3d, nniqat.ConvReLU3d, nniqat.ConvBn3d, nniqat.ConvBnReLU3d,
        F.conv3d)
else:
    # Dummy metadata so module-level names exist on unsupported torch.
    toy_val = _ConvMetadata(*[i for i in range(13)])
    _Conv1dMetadata = toy_val
    _Conv2dMetadata = toy_val
    _Conv3dMetadata = toy_val


def _get_binary_op_configs(
        dtype_configs: List[DTypeConfig]) -> List[BackendPatternConfig]:
    binary_op_configs: List[BackendPatternConfig] = []
    num_tensor_args_to_observation_type_mapping = {
        # TODO: this is not used right now since we have extra check in prepare
        # will need to change this to NO_OBSERVER later after we implemented
        # Tensor dtype inference properly
        0: ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT,
        1: ObservationType.OUTPUT_SHARE_OBSERVER_WITH_INPUT,
        2: ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT,
    }
    for op_with_quantized_bop_scalar_variant in [
            operator.add, torch.add, operator.mul, torch.mul
    ]:
        bop_patterns = [(torch.nn.ReLU, op_with_quantized_bop_scalar_variant),
                        (torch.nn.functional.relu,
                         op_with_quantized_bop_scalar_variant),
                        (torch.relu, op_with_quantized_bop_scalar_variant),
                        op_with_quantized_bop_scalar_variant]
        for bop_pattern in bop_patterns:
            binary_op_configs.append(
                BackendPatternConfig(bop_pattern).set_dtype_configs(
                    dtype_configs)  # noqa: E131
                ._set_num_tensor_args_to_observation_type(
                    num_tensor_args_to_observation_type_mapping))
    # matmul
    binary_op_configs.append(
        BackendPatternConfig(torch.matmul).set_dtype_configs(
            dtype_configs)  # noqa: E131
    )
    return binary_op_configs


def _get_linear_configs(
        dtype_configs: List[DTypeConfig]) -> List[BackendPatternConfig]:
    """Return all configs related to linear modules and ops."""
    observation_type = ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT
    linear_configs: List[BackendPatternConfig] = []

    # (1) Single linear modules/functions
    # -------------------------------------
    # linear module
    linear_configs.append(
        BackendPatternConfig(torch.nn.Linear).set_observation_type(
            observation_type)  # noqa: E131
        .set_dtype_configs(dtype_configs).set_root_module(
            torch.nn.Linear).set_reference_quantized_module(
                nnqr.Linear).set_qat_module(nnqat.Linear))
    # linear qat module
    linear_configs.append(
        BackendPatternConfig(nnqat.Linear).set_observation_type(
            observation_type)  # noqa: E131
        .set_dtype_configs(dtype_configs).set_root_module(
            torch.nn.Linear).set_reference_quantized_module(nnqr.Linear))
    # functional linear
    linear_configs.append(
        BackendPatternConfig(torch.nn.functional.linear).set_observation_type(
            observation_type)  # noqa: E131
        .set_dtype_configs(dtype_configs)._set_input_type_to_index({
            'weight': 1,
            'bias': 2
        }))

    # (2) Linear + relu
    # -------------------
    # 2.1 linear module + relu fusion config
    # linear relu, linear module + relu module
    linear_configs.append(
        BackendPatternConfig(
            (torch.nn.ReLU,
             torch.nn.Linear)).set_dtype_configs(dtype_configs)  # noqa: E131
        .set_fuser_method(reverse_sequential_wrapper2(
            nni.LinearReLU)).set_fused_module(nni.LinearReLU))
    # linear relu, linear module + functional relu
    linear_configs.append(
        BackendPatternConfig(
            (torch.nn.functional.relu,
             torch.nn.Linear)).set_dtype_configs(dtype_configs)  # noqa: E131
        .set_fuser_method(reverse_sequential_wrapper2(
            nni.LinearReLU)).set_fused_module(nni.LinearReLU))

    # 2.2 linear module + relu, fused module configs
    # linear relu, fused module
    linear_configs.append(
        BackendPatternConfig(nni.LinearReLU).set_observation_type(
            observation_type)  # noqa: E131
        .set_dtype_configs(dtype_configs).set_root_module(
            torch.nn.Linear).set_reference_quantized_module(
                nnqr.Linear).set_qat_module(nniqat.LinearReLU))
    # linear relu, qat fused module
    linear_configs.append(
        BackendPatternConfig(nniqat.LinearReLU).set_observation_type(
            observation_type)  # noqa: E131
        .set_dtype_configs(dtype_configs).set_root_module(
            torch.nn.Linear).set_reference_quantized_module(nnqr.Linear))
    # 2.3 functional linear + relu configs
    # linear relu, functional linear + relu module
    linear_configs.append(
        BackendPatternConfig(
            (torch.nn.ReLU,
             F.linear)).set_observation_type(observation_type)  # noqa: E131
        .set_dtype_configs(dtype_configs))
    # linear relu, functional linear + functional relu
    linear_configs.append(
        BackendPatternConfig(
            (F.relu,
             F.linear)).set_observation_type(observation_type)  # noqa: E131
        .set_dtype_configs(dtype_configs))

    # (3) Linear + batchnorm
    # ------------------------
    # 3.1 linear bn fusion
    linear_configs.append(
        BackendPatternConfig(
            (nn.BatchNorm1d,
             nn.Linear)).set_dtype_configs(dtype_configs)  # noqa: E131
        .set_fuser_method(reverse2(fuse_linear_bn)).set_fused_module(
            nni.LinearBn1d))

    # 3.2 linear bn fused
    # linear bn, fused module
    linear_configs.append(
        BackendPatternConfig(nni.LinearBn1d).set_observation_type(
            observation_type)  # noqa: E131
        .set_dtype_configs(dtype_configs).set_root_module(
            torch.nn.Linear).set_reference_quantized_module(
                nnqr.Linear).set_qat_module(nniqat.LinearBn1d))
    # linear bn, qat fused module
    linear_configs.append(
        BackendPatternConfig(nniqat.LinearBn1d).set_observation_type(
            observation_type)  # noqa: E131
        .set_dtype_configs(dtype_configs).set_root_module(
            torch.nn.Linear).set_reference_quantized_module(nnqr.Linear))
    return linear_configs


def _get_conv_configs(dtype_configs):
    """Return all configs related to conv modules and ops."""
    conv_configs = []
    observation_type = ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT
    for convs in [_Conv1dMetadata, _Conv2dMetadata, _Conv3dMetadata]:

        # (1) Single conv modules/functions
        # -----------------------------------
        # conv module
        conv_configs.append(
            BackendPatternConfig(convs.root).set_observation_type(
                observation_type)  # noqa: E131
            .set_dtype_configs(dtype_configs).set_root_module(
                convs.root).set_reference_quantized_module(
                    convs.reference).set_qat_module(convs.qat))
        # conv qat module
        conv_configs.append(
            BackendPatternConfig(convs.qat).set_observation_type(
                observation_type)  # noqa: E131
            .set_dtype_configs(dtype_configs).set_root_module(
                convs.root).set_reference_quantized_module(convs.reference))
        # functional conv
        conv_configs.append(
            BackendPatternConfig(convs.func).set_observation_type(
                observation_type)  # noqa: E131
            .set_dtype_configs(dtype_configs)._set_input_type_to_index({
                'weight':
                1,
                'bias':
                2
            }))

        # (2) Conv + relu
        # -----------------
        # 2.1 conv module + relu fusion configs
        # conv relu fusion, conv module + relu module
        conv_configs.append(
            BackendPatternConfig(
                (torch.nn.ReLU,
                 convs.root)).set_dtype_configs(dtype_configs)  # noqa: E131
            .set_fuser_method(
                reverse_sequential_wrapper2(
                    convs.fused_conv_relu)).set_fused_module(
                        convs.fused_conv_relu))
        # conv relu fusion, conv module + functional relu
        conv_configs.append(
            BackendPatternConfig(
                (F.relu,
                 convs.root)).set_dtype_configs(dtype_configs)  # noqa: E131
            .set_fuser_method(
                reverse_sequential_wrapper2(
                    convs.fused_conv_relu)).set_fused_module(
                        convs.fused_conv_relu))
        # 2.2 conv module + relu fused module configs
        # conv relu, fused module
conv_configs.append( + BackendPatternConfig(convs.fused_conv_relu).set_observation_type( + observation_type) # noqa: E131 + .set_dtype_configs(dtype_configs).set_root_module( + convs.root).set_reference_quantized_module( + convs.reference).set_qat_module(convs.relu_qat)) + # conv relu, qat fused module + conv_configs.append( + BackendPatternConfig(convs.relu_qat).set_observation_type( + observation_type) # noqa: E131 + .set_dtype_configs(dtype_configs).set_root_module( + convs.root).set_reference_quantized_module(convs.reference)) + # 2.3 functional conv + relu configs + # conv relu, functional conv + relu module + conv_configs.append( + BackendPatternConfig( + (torch.nn.ReLU, convs.func)).set_observation_type( + observation_type) # noqa: E131 + .set_dtype_configs(dtype_configs)) + # conv relu, functional conv + functional relu + conv_configs.append( + BackendPatternConfig((F.relu, convs.func)).set_observation_type( + observation_type) # noqa: E131 + .set_dtype_configs(dtype_configs)) + + # fused conv relu + conv_configs.append( + BackendPatternConfig(convs.fused_conv_relu).set_dtype_configs( + dtype_configs) # noqa: E131 + .set_qat_module(convs.relu_qat)) + + conv_configs.append( + BackendPatternConfig(convs.relu_qat).set_dtype_configs( + dtype_configs) # noqa: E131 + .set_root_module(convs.root).set_reference_quantized_module( + convs.reference)) + + # (3) Conv + batchnorm (+ relu) + # ------------------------------- + # 3.1 conv bn fusion configs + # conv + bn fusion + conv_configs.append( + BackendPatternConfig( + (convs.bn, + convs.root)).set_dtype_configs(dtype_configs) # noqa: E131 + .set_fuser_method(reverse2(fuse_conv_bn)).set_fused_module( + convs.fused_conv_bn)) + # conv + bn + relu module fusion + conv_configs.append( + BackendPatternConfig( + (nn.ReLU, + (convs.bn, + convs.root))).set_dtype_configs(dtype_configs) # noqa: E131 + .set_fuser_method(reverse3(fuse_conv_bn_relu)).set_fused_module( + convs.fused_conv_bn_relu)) + # conv + bn + relu functional 
fusion + conv_configs.append( + BackendPatternConfig( + (F.relu, + (convs.bn, + convs.root))).set_dtype_configs(dtype_configs) # noqa: E131 + .set_root_module(convs.root).set_fuser_method( + reverse3(fuse_conv_bn_relu)).set_fused_module( + convs.fused_conv_bn_relu)) + # TODO: we can add fusion for torch.relu as well + + # 3.2 conv + bn (+ relu) fused module configs + # fused conv bn + conv_configs.append( + BackendPatternConfig(convs.fused_conv_bn).set_dtype_configs( + dtype_configs) # noqa: E131 + .set_qat_module(convs.bn_qat)) + + # fused conv bn relu + conv_configs.append( + BackendPatternConfig(convs.fused_conv_bn_relu).set_dtype_configs( + dtype_configs) # noqa: E131 + .set_qat_module(convs.bn_relu_qat)) + + # conv bn, qat fused module + conv_configs.append( + BackendPatternConfig(convs.bn_qat).set_observation_type( + observation_type) # noqa: E131 + .set_dtype_configs(dtype_configs).set_root_module( + convs.root).set_reference_quantized_module(convs.reference)) + # conv bn relu, qat fused module + conv_configs.append( + BackendPatternConfig(convs.bn_relu_qat).set_observation_type( + observation_type) # noqa: E131 + .set_dtype_configs(dtype_configs).set_root_module( + convs.root).set_reference_quantized_module(convs.reference)) + + # (4) conv transpose and its fusion + # 4.1 conv transpose config + conv_configs.append( + BackendPatternConfig(convs.transpose).set_dtype_configs( + dtype_configs) # noqa: E131 + .set_root_module(convs.transpose).set_reference_quantized_module( + convs.transpose_reference)) + + # 4.2 conv transpose + bn fusion + conv_configs.append( + BackendPatternConfig( + (convs.bn, convs.transpose)).set_dtype_configs( + dtype_configs) # noqa: E131 + .set_fuser_method(reverse2(fuse_convtranspose_bn)).set_root_module( + convs.transpose).set_reference_quantized_module( + convs.transpose_reference)) + + return conv_configs + + +def _get_cat_config(dtype_configs: List[DTypeConfig]) -> BackendPatternConfig: + return BackendPatternConfig(torch.cat) \ 
+ .set_observation_type( + ObservationType.OUTPUT_SHARE_OBSERVER_WITH_INPUT) \ + .set_dtype_configs(dtype_configs) + + +def _get_ln_configs( + dtype_configs: List[DTypeConfig]) -> List[BackendPatternConfig]: + ln_configs = [] + ln_configs.append( + BackendPatternConfig(torch.nn.LayerNorm).set_observation_type( + ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT + ) # noqa: E131 + .set_dtype_configs(dtype_configs)) + ln_configs.append( + BackendPatternConfig( + torch.nn.functional.layer_norm).set_observation_type( + ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT + ) # noqa: E131 + .set_dtype_configs(dtype_configs)._set_input_type_to_index({ + 'weight': 2, + 'bias': 3 + })) + return ln_configs + + +def _get_default_op_configs( + dtype_configs: List[DTypeConfig]) -> List[BackendPatternConfig]: + configs = [] + default_ops = [ + torch.nn.ELU, + torch.nn.LeakyReLU, + torch.nn.Hardswish, + torch.nn.InstanceNorm1d, + torch.nn.InstanceNorm2d, + torch.nn.InstanceNorm3d, + torch.nn.Dropout, + torch.nn.PReLU, + torch.nn.functional.elu, + torch.nn.functional.hardswish, + torch.nn.functional.leaky_relu, + torch.nn.functional.dropout, + ] + for op in default_ops: + configs.append( + BackendPatternConfig(op).set_observation_type( + ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT + ) # noqa: E131 + .set_dtype_configs(dtype_configs)) + + configs.append( + BackendPatternConfig( + torch.nn.functional.group_norm).set_observation_type( + ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT + ) # noqa: E131 + .set_dtype_configs(dtype_configs)._set_input_type_to_index({ + 'weight': 2, + 'bias': 3 + })) + + configs.append( + BackendPatternConfig( + torch.nn.functional.instance_norm).set_observation_type( + ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT + ) # noqa: E131 + .set_dtype_configs(dtype_configs)._set_input_type_to_index({ + 'weight': 3, + 'bias': 4 + })) + return configs + + +def _get_fixed_qparams_op_configs( + dtype_configs: List[DTypeConfig]) -> 
List[BackendPatternConfig]: + fixed_qparams_op_configs = [] + op_to_obs = _FIXED_QPARAMS_OP_TO_OBSERVER.items() + for fixed_qparam_op, output_observer in op_to_obs: + fixed_qparams_op_configs.append( + # TODO: The _overwrite_output keys are temporary, since we don't + # want to put observer in the configs; we expect that it's provided + # by user. What we want to put here is the requirement on observers, + # in this case dtype, quant_min, quant_max etc., but we need to + # first move all configs to backend_config_dict to do that; we'll + # remove these keys after we fully migrated everything to use + # backend_config_dict + BackendPatternConfig(fixed_qparam_op).set_observation_type( + ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT + ) # noqa: E131 + .set_dtype_configs(dtype_configs). + _set_overwrite_output_fake_quantize( + FixedQParamsFakeQuantize.with_args(observer=output_observer) + )._set_overwrite_output_observer(output_observer)) + return fixed_qparams_op_configs + + +def _get_share_qparams_op_configs(dtype_configs): + """Get the operator config for the operators that work for both float and + quantized input. If input is quantized, the output Tensor shares the same + quantization parameter with input. 
Example operator: avgpool2d, reshape, + transpose, maxpool2d Example observed operator: + + observer_0 - avgpool2d - observer_0 (same observer instance as input) + """ + + def _get_share_qprams_op_backend_config(op): + return BackendPatternConfig(op) \ + .set_observation_type( + ObservationType.OUTPUT_SHARE_OBSERVER_WITH_INPUT) \ + .set_dtype_configs(dtype_configs) + + share_qparams_ops = [ + torch.nn.AdaptiveAvgPool1d, + torch.nn.AdaptiveAvgPool2d, + torch.nn.AdaptiveAvgPool3d, + torch.nn.AvgPool1d, + torch.nn.AvgPool2d, + torch.nn.AvgPool3d, + torch.nn.Hardtanh, + torch.nn.Identity, + torch.nn.MaxPool1d, + torch.nn.MaxPool2d, + torch.nn.MaxPool3d, + torch.nn.ReLU, + torch.adaptive_avg_pool1d, + torch.nn.functional.adaptive_avg_pool2d, + torch.nn.functional.adaptive_avg_pool3d, + torch.nn.functional.hardtanh, + torch.nn.functional.hardtanh_, + torch.nn.functional.interpolate, + torch.nn.functional.max_pool1d, + torch.nn.functional.max_pool2d, + torch.nn.functional.max_pool3d, + torch.nn.functional.relu, + torch.nn.functional.relu6, + torch.avg_pool1d, + torch._C._nn.avg_pool2d, + torch._C._nn.avg_pool3d, + torch.clamp, + torch.flatten, + torch.mean, + torch.repeat_interleave, + torch.transpose, + torch.squeeze, + torch.stack, + torch.unsqueeze, + operator.floordiv, + 'contiguous', + 'clamp', + 'detach', + 'detach_', + 'mean', + 'permute', + 'repeat', + 'repeat_interleave', + 'reshape', + 'resize_', + 'relu', + 'relu_', + 'shape', + 'size', + 'squeeze', + 'squeeze_', + 'transpose', + 'unsqueeze', + 'unsqueeze_', + 'view', + ] + return [ + _get_share_qprams_op_backend_config(op) for op in share_qparams_ops + ] + + +def _get_bn_configs( + dtype_configs: List[DTypeConfig]) -> List[BackendPatternConfig]: + """Get configs related to batchnorm.""" + bn_configs = [] + bn_to_fused_bn = { + torch.nn.BatchNorm2d: nni.BNReLU2d, + torch.nn.BatchNorm3d: nni.BNReLU3d, + } + for bn in bn_to_fused_bn.keys(): + fused_bn = bn_to_fused_bn[bn] + # bn module + relu module fusion config 
+ bn_configs.append( + BackendPatternConfig( + (torch.nn.ReLU, + bn)).set_dtype_configs(dtype_configs) # noqa: E131 + .set_fuser_method(reverse_sequential_wrapper2( + fused_bn)).set_fused_module(fused_bn)) + # bn module + F.relu fusion config + bn_configs.append( + BackendPatternConfig( + (torch.nn.functional.relu, + bn)).set_dtype_configs(dtype_configs) # noqa: E131 + .set_fuser_method(reverse_sequential_wrapper2( + bn_to_fused_bn[bn])).set_fused_module(fused_bn)) + bn_configs.append( + BackendPatternConfig(bn).set_observation_type( + ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT + ) # noqa: E131 + .set_dtype_configs(dtype_configs)) + + # fused bn configs + for fused_bn in bn_to_fused_bn.values(): + bn_configs.append( + BackendPatternConfig(fused_bn).set_observation_type( + ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT + ) # noqa: E131 + .set_dtype_configs(dtype_configs)) + return bn_configs + + +def _get_rnn_op_configs( + dtype_configs: List[DTypeConfig]) -> List[BackendPatternConfig]: + rnn_op_configs = [] + for rnn_op, ref_rnn_op in [(nn.GRUCell, nnqr.GRUCell), + (nn.LSTMCell, nnqr.LSTMCell), + (nn.RNNCell, nnqr.RNNCell), + (nn.LSTM, nnqr.LSTM)]: + rnn_op_configs.append( + BackendPatternConfig(rnn_op).set_observation_type( + ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT + ) # noqa: E131 + .set_dtype_configs(dtype_configs).set_root_module( + rnn_op).set_reference_quantized_module(ref_rnn_op)) + return rnn_op_configs + + +def _get_embedding_op_configs( + dtype_configs: List[DTypeConfig]) -> List[BackendPatternConfig]: + embedding_op_configs = [] + for embedding_op, qat_embedding_op, ref_embedding_op in [ + (nn.Embedding, nnqat.Embedding, nnqr.Embedding), + (nn.EmbeddingBag, nnqat.EmbeddingBag, nnqr.EmbeddingBag), + ]: + embedding_op_configs.append( + BackendPatternConfig(embedding_op).set_observation_type( + ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT + ) # noqa: E131 + 
.set_dtype_configs(dtype_configs).set_qat_module(qat_embedding_op). + set_root_module(embedding_op).set_reference_quantized_module( + ref_embedding_op)._set_input_output_observed( + False)) # This is temporary, and will be removed soon + # config for qat op + embedding_op_configs.append( + BackendPatternConfig(qat_embedding_op).set_observation_type( + ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT + ) # noqa: E131 + .set_dtype_configs(dtype_configs).set_root_module( + embedding_op).set_reference_quantized_module( + ref_embedding_op)._set_input_output_observed( + False)) # This is temporary, and will be removed soon + return embedding_op_configs + + +__all__ = [ + '_get_binary_op_configs', + '_get_linear_configs', + '_get_conv_configs', + '_get_share_qparams_op_configs', +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/structures/quantization/backend_config/mapping.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/structures/quantization/backend_config/mapping.py new file mode 100644 index 0000000000000000000000000000000000000000..b9cc5372bb188f4d641c1e128723ea54dd1e57de --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/structures/quantization/backend_config/mapping.py @@ -0,0 +1,23 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import torch + +from mmrazor import digit_version +from .academic import get_academic_backend_config +from .native import get_native_backend_config +from .openvino import get_openvino_backend_config +from .tensorrt import get_tensorrt_backend_config + +if digit_version(torch.__version__) >= digit_version('1.13.0'): + BackendConfigs = { + 'academic': get_academic_backend_config(), + 'native': get_native_backend_config(), + 'tensorrt': get_tensorrt_backend_config(), + 'openvino': get_openvino_backend_config() + } +else: + BackendConfigs = { + 'academic': None, + 'native': None, + 'tensorrt': None, + 'openvino': None + } diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/structures/quantization/backend_config/native.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/structures/quantization/backend_config/native.py new file mode 100644 index 0000000000000000000000000000000000000000..59085a56abe494b4ca7caa0ca137a47fd726dfe1 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/structures/quantization/backend_config/native.py @@ -0,0 +1,147 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch + +try: + from torch.ao.quantization.backend_config import BackendConfig, DTypeConfig +except ImportError: + from mmrazor.utils import get_placeholder + BackendConfig = get_placeholder('torch>=1.13') + DTypeConfig = get_placeholder('torch>=1.13') + +from .common_operator_config_utils import ( # noqa: F401,F403 + _get_binary_op_configs, _get_bn_configs, _get_cat_config, + _get_conv_configs, _get_default_op_configs, _get_embedding_op_configs, + _get_fixed_qparams_op_configs, _get_linear_configs, _get_ln_configs, + _get_rnn_op_configs, _get_share_qparams_op_configs) + +# ===================== +# | BACKEND CONFIGS | +# ===================== + + +def get_native_backend_config() -> BackendConfig: + """Return the `BackendConfig` for PyTorch Native backend (fbgemm/qnnpack). 
+ + Note: + Learn more about BackendConfig, please refer to: + https://github.com/pytorch/pytorch/tree/master/torch/ao/quantization/backend_config # noqa: E501 + """ + # TODO: express this BackendConfig as a union of the FBGEMM and QNNPACK + # BackendConfigs + + # =================== + # | DTYPE CONFIGS | + # =================== + # weighted op int8 dtype config + # this is config for ops that has quantized weights, like linear, conv + weighted_op_int8_dtype_config = DTypeConfig( + input_dtype=torch.quint8, + output_dtype=torch.quint8, + weight_dtype=torch.qint8, + bias_dtype=torch.float, + ) + + default_op_quint8_dtype_config = DTypeConfig( + input_dtype=torch.quint8, + output_dtype=torch.quint8, + ) + + default_dynamic_int8_dtype_config = DTypeConfig( + input_dtype=torch.quint8, + output_dtype=torch.float, + weight_dtype=torch.qint8, + bias_dtype=torch.float, + # currently the dtype check is not yet enabled, so we provided the + # dtype_configs but it is not really used yet, + # we will enable it a bit later after we moved everything to + # backend_config_dict + is_dynamic=True, + ) + + default_dynamic_float16_dtype_config = DTypeConfig( + input_dtype=torch.float16, + output_dtype=torch.float, + weight_dtype=torch.float16, + bias_dtype=torch.float, + # currently the dtype check is not yet enabled, so we provided the + # dtype_configs but it is not really used yet, we will enable it a bit + # later after we moved everything to backend_config_dict + is_dynamic=True, + ) + + # Needed for LayerNorm and f.layer_norm, since currently the kernel only + # supports float weights + input_output_only_quint8_dtype_config = DTypeConfig( + input_dtype=torch.quint8, + output_dtype=torch.quint8, + weight_dtype=torch.float, + bias_dtype=torch.float, + ) + + weight_only_quint8_dtype_config = DTypeConfig( + input_dtype=torch.float, + output_dtype=torch.float, + weight_dtype=torch.quint8, + ) + + weight_only_quint4x2_dtype_config = DTypeConfig( + input_dtype=torch.float, + 
output_dtype=torch.float, + weight_dtype=torch.quint4x2, + ) + + conv_dtype_configs = [weighted_op_int8_dtype_config] + linear_dtype_configs = [ + weighted_op_int8_dtype_config, + default_dynamic_int8_dtype_config, + default_dynamic_float16_dtype_config, + ] + binary_op_dtype_configs = [weighted_op_int8_dtype_config] + default_op_dtype_configs = [default_op_quint8_dtype_config] + fixed_qparams_op_dtype_configs = [weighted_op_int8_dtype_config] + share_qparams_op_dtype_configs = [default_op_quint8_dtype_config] + rnn_op_dtype_configs = [ + default_dynamic_int8_dtype_config, + default_dynamic_float16_dtype_config, + ] + embedding_op_dtype_configs = [ + weight_only_quint8_dtype_config, + weight_only_quint4x2_dtype_config, + ] + layer_norm_op_dtype_configs = [input_output_only_quint8_dtype_config] + + return BackendConfig('native') \ + .set_backend_pattern_configs( + _get_conv_configs(conv_dtype_configs)) \ + .set_backend_pattern_configs( + _get_linear_configs(linear_dtype_configs)) \ + .set_backend_pattern_configs( + _get_binary_op_configs(binary_op_dtype_configs)) \ + .set_backend_pattern_config( + _get_cat_config(default_op_dtype_configs)) \ + .set_backend_pattern_configs( + _get_default_op_configs(default_op_dtype_configs)) \ + .set_backend_pattern_configs( + _get_fixed_qparams_op_configs(fixed_qparams_op_dtype_configs)) \ + .set_backend_pattern_configs( + _get_share_qparams_op_configs(share_qparams_op_dtype_configs)) \ + .set_backend_pattern_configs( + _get_bn_configs(default_op_dtype_configs)) \ + .set_backend_pattern_configs( + _get_ln_configs(layer_norm_op_dtype_configs)) \ + .set_backend_pattern_configs( + _get_rnn_op_configs(rnn_op_dtype_configs)) \ + .set_backend_pattern_configs( + _get_embedding_op_configs(embedding_op_dtype_configs)) + + +def get_native_backend_config_dict(): + """Return the `BackendConfig` for PyTorch Native backend (fbgemm/qnnpack) + in dictionary form.""" + return get_native_backend_config().to_dict() + + +__all__ = [ + 
'get_native_backend_config', + 'get_native_backend_config_dict', +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/structures/quantization/backend_config/openvino.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/structures/quantization/backend_config/openvino.py new file mode 100644 index 0000000000000000000000000000000000000000..5e3051f752bd748f639f2ad9f345319876dc11e2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/structures/quantization/backend_config/openvino.py @@ -0,0 +1,89 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch + +try: + from torch.ao.quantization.backend_config import (BackendConfig, + BackendPatternConfig, + DTypeConfig, + ObservationType) +except ImportError: + from mmrazor.utils import get_placeholder + BackendConfig = get_placeholder('torch>=1.13') + BackendPatternConfig = get_placeholder('torch>=1.13') + DTypeConfig = get_placeholder('torch>=1.13') + ObservationType = get_placeholder('torch>=1.13') + +from .common_operator_config_utils import (_get_binary_op_configs, + _get_conv_configs, + _get_linear_configs, + _get_share_qparams_op_configs) + + +def get_openvino_backend_config() -> BackendConfig: + """Return the `BackendConfig` for the OpenVINO backend. 
+ + Note: + Learn more about BackendConfig, please refer to: + https://github.com/pytorch/pytorch/tree/master/torch/ao/quantization/backend_config # noqa: E501 + """ + # dtype configs + weighted_op_qint8_dtype_config = DTypeConfig( + input_dtype=torch.quint8, + output_dtype=torch.quint8, + weight_dtype=torch.qint8, + bias_dtype=torch.float, + ) + non_weighted_op_qint8_dtype_config = DTypeConfig( + input_dtype=torch.quint8, + output_dtype=torch.quint8, + ) + + addmm_config = BackendPatternConfig(torch.addmm) \ + .set_observation_type( + ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT) \ + .add_dtype_config(weighted_op_qint8_dtype_config) \ + ._set_input_type_to_index({ + 'bias': 0, + 'input': 1, + 'weight': 2, + }) + cat_config = BackendPatternConfig(torch.cat) \ + .set_observation_type( + ObservationType.OUTPUT_SHARE_OBSERVER_WITH_INPUT) \ + .add_dtype_config(non_weighted_op_qint8_dtype_config) + conv_dtype_configs = [ + weighted_op_qint8_dtype_config, + ] + linear_dtype_configs = [ + weighted_op_qint8_dtype_config, + ] + binary_op_dtype_configs = [ + weighted_op_qint8_dtype_config, + ] + share_qparams_op_dtype_configs = [ + non_weighted_op_qint8_dtype_config, + ] + # there might be things not supported in fx2trt, but it will error out + # during fx2trt conversion and can support them after that + return BackendConfig('openvino') \ + .set_backend_pattern_configs(_get_conv_configs(conv_dtype_configs)) \ + .set_backend_pattern_config(addmm_config) \ + .set_backend_pattern_config(cat_config) \ + .set_backend_pattern_configs( + _get_linear_configs(linear_dtype_configs)) \ + .set_backend_pattern_configs( + _get_binary_op_configs(binary_op_dtype_configs)) \ + .set_backend_pattern_configs( + _get_share_qparams_op_configs(share_qparams_op_dtype_configs)) + + +def get_openvino_backend_config_dict(): + """Return the `BackendConfig` for the OpenVINO backend in dictionary + form.""" + return get_openvino_backend_config().to_dict() + + +__all__ = [ + 
'get_openvino_backend_config', + 'get_openvino_backend_config_dict', +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/structures/quantization/backend_config/tensorrt.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/structures/quantization/backend_config/tensorrt.py new file mode 100644 index 0000000000000000000000000000000000000000..8dddbac91bedfcb533c54d5e38e857ab5883bc42 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/structures/quantization/backend_config/tensorrt.py @@ -0,0 +1,68 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch + +try: + from torch.ao.quantization.backend_config import (BackendConfig, + BackendPatternConfig, + DTypeConfig, + ObservationType) +except ImportError: + from mmrazor.utils import get_placeholder + BackendConfig = get_placeholder('torch>=1.13') + BackendPatternConfig = get_placeholder('torch>=1.13') + DTypeConfig = get_placeholder('torch>=1.13') + ObservationType = get_placeholder('torch>=1.13') + +from .common_operator_config_utils import (_get_conv_configs, + _get_linear_configs) + + +def get_tensorrt_backend_config() -> BackendConfig: + """Return the `BackendConfig` for the TensorRT backend. 
+ + Note: + Learn more about BackendConfig, please refer to: + https://github.com/pytorch/pytorch/tree/master/torch/ao/quantization/backend_config # noqa: E501 + """ + # dtype configs + weighted_op_qint8_dtype_config = DTypeConfig( + input_dtype=torch.qint8, + output_dtype=torch.qint8, + weight_dtype=torch.qint8, + bias_dtype=torch.float, + ) + + addmm_config = BackendPatternConfig(torch.addmm) \ + .set_observation_type( + ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT) \ + .add_dtype_config(weighted_op_qint8_dtype_config) \ + ._set_input_type_to_index({ + 'bias': 0, + 'input': 1, + 'weight': 2, + }) + conv_dtype_configs = [ + weighted_op_qint8_dtype_config, + ] + linear_dtype_configs = [ + weighted_op_qint8_dtype_config, + ] + # there might be things not supported in fx2trt, but it will error out + # during fx2trt conversion and can support them after that + return BackendConfig('tensorrt') \ + .set_backend_pattern_configs(_get_conv_configs(conv_dtype_configs)) \ + .set_backend_pattern_config(addmm_config) \ + .set_backend_pattern_configs( + _get_linear_configs(linear_dtype_configs)) + + +def get_tensorrt_backend_config_dict(): + """Return the `BackendConfig` for the TensorRT backend in dictionary + form.""" + return get_tensorrt_backend_config().to_dict() + + +__all__ = [ + 'get_tensorrt_backend_config', + 'get_tensorrt_backend_config_dict', +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/structures/quantization/qconfig.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/structures/quantization/qconfig.py new file mode 100644 index 0000000000000000000000000000000000000000..ab682be39d921382f01127e044187aca18902e6b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/structures/quantization/qconfig.py @@ -0,0 +1,200 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from typing import Dict, Union + +import torch +from mmengine.config import Config + +try: + from torch.ao.quantization import FakeQuantize, QConfig + from torch.ao.quantization.utils import is_per_tensor +except ImportError: + from mmrazor.utils import get_placeholder + QConfig = get_placeholder('torch>=1.13') + FakeQuantize = get_placeholder('torch>=1.13') + is_per_tensor = get_placeholder('torch>=1.13') + +from mmrazor.registry import MODELS + +RequiredArgs = [ + 'w_qscheme', 'a_qscheme', 'w_fake_quant', 'a_fake_quant', 'w_observer', + 'a_observer' +] + +RetainArgsPerTensor = [ + 'dtype', 'qscheme', 'quant_min', 'quant_max', 'reduce_range' +] +RetainArgsPerChannel = RetainArgsPerTensor + ['ch_axis'] + + +class QSchemeHandler(object): + """Convert the qscheme of custom user-friendly qconfig to args needed in + observers. + + Args: + qdtype (str): Quantization dtype. It should be 'quint8' or 'qint8', + and should be supported by the deploy backend. Defaults to 'quint8'. + bit (int): Quantization bit number. Defaults to 8. + is_symmetry (bool): Is symmetry quantization or not. Defaults to True. + is_per_channel (bool): Is per-channel quantization or not. + Defaults to False. + """ + + def __init__(self, + qdtype: str = 'quint8', + bit: int = 8, + is_symmetry: bool = True, + is_per_channel: bool = False, + **kwargs): + assert qdtype in ('quint8', 'qint8'), \ + 'qdtype is incorrect, it should be quint8 or qint8.' 
+ self.qdtype = qdtype + self.bit = bit + self.is_symmetry = is_symmetry + self.is_per_channel = is_per_channel + + if self.is_per_channel: + self.torch_qscheme = torch.per_channel_symmetric \ + if self.is_symmetry else torch.per_channel_affine + else: + self.torch_qscheme = torch.per_tensor_symmetric \ + if self.is_symmetry else torch.per_tensor_affine + if 'is_symmetric_range' in kwargs: + self.is_symmetric_range = kwargs['is_symmetric_range'] + del kwargs['is_symmetric_range'] + else: + self.is_symmetric_range = False + self.kwargs = kwargs + + def to_observer_params(self): + """Generate the args needed in observers.""" + if self.qdtype == 'quint8': + quant_min = 0 + quant_max = 2**self.bit - 1 + else: + quant_max = 2**(self.bit - 1) - 1 + if self.is_symmetric_range: + quant_min = -2**(self.bit - 1) + 1 + else: + quant_min = -2**(self.bit - 1) + + # `dtype` will be the same as BackendConfig's + naive_para = { + 'dtype': torch.quint8 if self.qdtype == 'quint8' else torch.qint8, + 'quant_min': quant_min, + 'quant_max': quant_max, + 'qscheme': self.torch_qscheme, + 'reduce_range': False + } + if self.is_per_channel: + naive_para['ch_axis'] = 0 + all_para = self.kwargs.copy() + all_para.update(naive_para) + return all_para + + def __str__(self): + """Print generated args for observers.""" + return f'dtype: {self.dtype} / bit: {self.bit} / is_symmetry: {self.is_symmetry} / \ + is_per_channel: {self.is_per_channel} \ + / extra_kwargs: {self.kwargs}' + + +class QConfigHandler(): + """Convert custom user-friendly qconfig format to torch's QConfig. + + Args: + qconfig (Dict | Config): custom user-friendly qconfig format, + including setting observers, fakequants and quantization schemes + for weights and activations. + Note: + whether quantization scheme is per-channel or not depends on + used observer, if observer support per-channel quantization, its name + should contain 'PerChannel'. 
+ """ + + def __init__(self, qconfig: Union[Dict, Config]): + if not self.check_qconfig(qconfig): + raise ValueError('The format of qconfig is incorrect.') + else: + w_observer = MODELS.get(qconfig['w_observer']['type']) + a_observer = MODELS.get(qconfig['a_observer']['type']) + w_is_per_channel = False + a_is_per_channel = False + # import pdb;pdb.set_trace() + if 'PerChannel' in w_observer.__name__: + w_is_per_channel = True + if 'PerChannel' in a_observer.__name__: + a_is_per_channel = True + self.w_qscheme = QSchemeHandler( + is_per_channel=w_is_per_channel, **qconfig['w_qscheme']) + self.a_qscheme = QSchemeHandler( + is_per_channel=a_is_per_channel, **qconfig['a_qscheme']) + + w_fake_quant = MODELS.get(qconfig['w_fake_quant']['type']) + w_observer_kwargs = self.w_qscheme.to_observer_params() + a_fake_quant = MODELS.get(qconfig['a_fake_quant']['type']) + a_observer_kwargs = self.a_qscheme.to_observer_params() + + self.w_fake_quant = w_fake_quant.with_args( + observer=w_observer, **w_observer_kwargs) + self.a_fake_quant = a_fake_quant.with_args( + observer=a_observer, **a_observer_kwargs) + + @staticmethod + def check_qconfig(qconfig: Union[Dict, Config]): + """Check whether the passed qconfig's format meets requirement.""" + is_pass = True + for arg in RequiredArgs: + val = qconfig.get(arg, None) + if isinstance(val, dict) and arg in qconfig.keys(): + continue + else: + is_pass = False + break + return is_pass + + def convert(self): + """Generate torch's QConfig with built fake_quants.""" + torch_qconfig = QConfig( + weight=self.w_fake_quant, activation=self.a_fake_quant) + return torch_qconfig + + @staticmethod + def replace_fakequant(fake_quant_org: FakeQuantize, + qscheme_org: QSchemeHandler, + update_qparams: bool = True): + """Replace origin fakequants in model with the specified fakequant, + which is in favor of deploying the quantized model.""" + assert isinstance(qscheme_org, QSchemeHandler) + observer_kwargs = qscheme_org.to_observer_params() + if 
    def fixed_w_fakequant(self):
        """Make `self.w_fake_quant` fixed as the consistent fakequant.

        Rebuilds the weight fake-quant wrapper through ``replace_fakequant``
        so deployment uses the plain torch ``FakeQuantize`` implementation.
        ``update_qparams=False``: a factory (``with_args`` wrapper) is
        stored, no scale/zero-point values are copied.
        """
        self.w_fake_quant = self.replace_fakequant(
            self.w_fake_quant(), self.w_qscheme, update_qparams=False)
+from .candidate import Candidates +from .fix_subnet import convert_fix_subnet, export_fix_subnet, load_fix_subnet + +__all__ = [ + 'load_fix_subnet', 'export_fix_subnet', 'convert_fix_subnet', 'Candidates' +] diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/structures/subnet/candidate.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/structures/subnet/candidate.py new file mode 100644 index 0000000000000000000000000000000000000000..9f0ebc344ebc9645ec54255ab3ead0062261e4b4 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/structures/subnet/candidate.py @@ -0,0 +1,184 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from collections import UserList +from typing import Any, Dict, List, Optional, Union + + +class Candidates(UserList): + """The data structure of sampled candidate. The format is Union[Dict[str, + Dict], List[Dict[str, Dict]]]. + Examples: + >>> candidates = Candidates() + >>> subnet_1 = {'1': 'choice1', '2': 'choice2'} + >>> candidates.append(subnet_1) + >>> candidates + [{"{'1': 'choice1', '2': 'choice2'}": + {'score': 0.0, 'flops': 0.0, 'params': 0.0, 'latency': 0.0}}] + >>> candidates.set_resources(0, 49.9, 'flops') + >>> candidates.set_score(0, 100.) 
+ >>> candidates + [{"{'1': 'choice1', '2': 'choice2'}": + {'score': 100.0, 'flops': 49.9, 'params': 0.0, 'latency': 0.0}}] + >>> subnet_2 = {'choice_3': 'layer_3', 'choice_4': 'layer_4'} + >>> candidates.append(subnet_2) + >>> candidates + [{"{'1': 'choice1', '2': 'choice2'}": + {'score': 100.0, 'flops': 49.9, 'params': 0.0, 'latency': 0.0}}, + {"{'choice_3': 'layer_3', 'choice_4':'layer_4'}": + {'score': 0.0, 'flops': 0.0, 'params': 0.0, 'latency': 0.0}}] + >>> candidates.subnets + [{'1': 'choice1', '2': 'choice2'}, + {'choice_3': 'layer_3', 'choice_4': 'layer_4'}] + >>> candidates.resources('flops') + [49.9, 0.0] + >>> candidates.scores + [100.0, 0.0] + """ + _format_return = Union[Dict[str, Dict], List[Dict[str, Dict]]] + _format_input = Union[Dict, List[Dict], Dict[str, Dict], List[Dict[str, + Dict]]] + _indicators = ('score', 'flops', 'params', 'latency') + + def __init__(self, initdata: Optional[_format_input] = None): + self.data = [] + if initdata is not None: + initdata = self._format(initdata) + if isinstance(initdata, list): + self.data = initdata + else: + self.data.append(initdata) + + @property + def scores(self) -> List[float]: + """The scores of candidates.""" + return [ + round(value.get('score', 0.), 2) for item in self.data + for _, value in item.items() + ] + + def resources(self, key_indicator: str = 'flops') -> List[float]: + """The resources of candidates.""" + assert key_indicator in ['flops', 'params', 'latency'] + return [ + value.get(key_indicator, 0.) 
    @property
    def subnets(self) -> List[Dict]:
        """The subnets of candidates, stripped of indicator entries.

        NOTE(review): `eval` turns the stringified subnet dict (built via
        ``str()`` in `_format`) back into a dict. Safe only because keys are
        produced internally; never feed externally-supplied strings here.
        """
        import copy
        assert len(self.data) > 0, ('Got empty candidates.')
        # 'value_subnet' marks the nested two-level layout (value subnet +
        # channel subnet); otherwise keys are stringified subnet dicts.
        if 'value_subnet' in self.data[0]:
            subnets = []
            for data in self.data:
                subnet = dict()
                _data = copy.deepcopy(data)
                for k1 in ['value_subnet', 'channel_subnet']:
                    for k2 in self._indicators:
                        _data[k1].pop(k2)
                    subnet[k1] = _data[k1]
                subnets.append(subnet)
            return subnets
        else:
            return [eval(key) for item in self.data for key, _ in item.items()]
+ return cond + else: + return {str(cond): {}.fromkeys(self._indicators, -1)} + + if isinstance(data, UserList): + return [_format_item(i) for i in data.data] + + elif isinstance(data, list): + return [_format_item(i) for i in data] + + else: + return _format_item(data) + + def append(self, item: _format_input) -> None: + """Append operation.""" + item = self._format(item) + if isinstance(item, list): + self.data = self.data + item + else: + self.data.append(item) + + def insert(self, i: int, item: _format_input) -> None: + """Insert operation.""" + item = self._format(item) + self.data.insert(i, item) + + def extend(self, other: Any) -> None: + """Extend operation.""" + other = self._format(other) + if isinstance(other, list): + self.data.extend(other) + else: + self.data.extend([other]) + + def set_score(self, i: int, score: float) -> None: + """Set score to the specified subnet by index.""" + self.set_resource(i, score, 'score') + + def set_resource(self, + i: int, + resources: float, + key_indicator: str = 'flops') -> None: + """Set resources to the specified subnet by index.""" + assert key_indicator in ['score', 'flops', 'params', 'latency'] + for _, value in self.data[i].items(): + value[key_indicator] = resources + + def update_resources(self, resources: list, start: int = 0) -> None: + """Update resources to the specified candidate.""" + end = start + len(resources) + assert len( + self.data) >= end, 'Check the number of candidate resources.' + for i, item in enumerate(self.data[start:end]): + for _, value in item.items(): + value.update(resources[i]) + + def sort_by(self, + key_indicator: str = 'score', + reverse: bool = True) -> None: + """Sort by a specific indicator in descending order. + + Args: + key_indicator (str): sort all candidates by key_indicator. + Defaults to 'score'. + reverse (bool): sort all candidates in descending order. 
+ """ + self.data.sort( + key=lambda x: list(x.values())[0][key_indicator], reverse=reverse) diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/structures/subnet/fix_subnet.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/structures/subnet/fix_subnet.py new file mode 100644 index 0000000000000000000000000000000000000000..56b33be76379846cdf1042ee2d99a25588860a17 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/structures/subnet/fix_subnet.py @@ -0,0 +1,218 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy +from typing import Dict, Optional, Tuple + +from mmengine import fileio +from torch import nn + +from mmrazor.registry import MODELS +from mmrazor.utils import FixMutable, ValidFixMutable +from mmrazor.utils.typing import DumpChosen + + +def _dynamic_to_static(model: nn.Module) -> None: + # Avoid circular import + from mmrazor.models.architectures.dynamic_ops import DynamicMixin + + def traverse_children(module: nn.Module) -> None: + for name, mutable in module.items(): + if isinstance(mutable, DynamicMixin): + module[name] = mutable.to_static_op() + if hasattr(mutable, '_modules'): + traverse_children(mutable._modules) + + if isinstance(model, DynamicMixin): + raise RuntimeError('Root model can not be dynamic op.') + + if hasattr(model, '_modules'): + traverse_children(model._modules) + + +def load_fix_subnet(model: nn.Module, + subnet_dict: ValidFixMutable, + load_subnet_mode: str = 'mutable', + prefix: str = '', + extra_prefix: str = '') -> None: + """Load fix subnet.""" + if prefix and extra_prefix: + raise RuntimeError('`prefix` and `extra_prefix` can not be set at the ' + f'same time, but got {prefix} vs {extra_prefix}') + if isinstance(subnet_dict, str): + subnet_dict = fileio.load(subnet_dict) + if not isinstance(subnet_dict, dict): + raise TypeError('subnet_dict should be a `str` or `dict`' + f'but got {type(subnet_dict)}') + + from mmrazor.models.architectures.dynamic_ops import DynamicMixin + if isinstance(model, 
def _load_fix_subnet_by_mutable(model: nn.Module,
                                subnet_dict: Dict,
                                prefix: str = '',
                                extra_prefix: str = '') -> None:
    """Fix every ``BaseMutable`` in ``model`` according to ``subnet_dict``.

    Args:
        model (nn.Module): model containing mutables to fix.
        subnet_dict (Dict): mapping from mutable name (or alias) to its
            chosen value, as produced by ``export_fix_subnet``.
        prefix (str): prefix stripped from each mutable's name before the
            lookup in ``subnet_dict``.
        extra_prefix (str): prefix prepended to each mutable's name before
            the lookup. Mutually exclusive with ``prefix`` (enforced by the
            caller ``load_fix_subnet``).
    """
    # Avoid circular import
    from mmrazor.models.mutables import DerivedMutable, MutableChannelContainer
    from mmrazor.models.mutables.base_mutable import BaseMutable

    def load_fix_module(module):
        """Load fix module."""
        if getattr(module, 'alias', None):
            alias = module.alias
            assert alias in subnet_dict, \
                f'The alias {alias} is not in fix_modules, ' \
                'please check your `subnet_dict`.'
            # {chosen=xx, meta=xx)
            chosen = subnet_dict.get(alias, None)
        else:
            if prefix:
                # Bug fix: the original used `name.lstrip(prefix)`, but
                # lstrip strips any leading run of the *characters* in
                # `prefix`, not the prefix string itself, so it could also
                # eat leading characters of the real mutable name.
                mutable_name = name[len(prefix):] if name.startswith(
                    prefix) else name
            elif extra_prefix:
                mutable_name = extra_prefix + name
            else:
                mutable_name = name
            if mutable_name not in subnet_dict and not isinstance(
                    module, MutableChannelContainer):
                raise RuntimeError(
                    f'The module name {mutable_name} is not in '
                    'subnet_dict, please check your `subnet_dict`.')
            # {chosen=xx, meta=xx)
            chosen = subnet_dict.get(mutable_name, None)

        if not isinstance(chosen, DumpChosen):
            chosen = DumpChosen(**chosen)
        if not module.is_fixed:
            module.fix_chosen(chosen.chosen)

    for name, module in model.named_modules():
        # The format of `chosen` is different for each type of mutable.
        # In the corresponding mutable, it will check whether the `chosen`
        # format is correct.
        if isinstance(module, (MutableChannelContainer)):
            continue

        if isinstance(module, BaseMutable):
            if isinstance(module, DerivedMutable):
                for source_mutable in module.source_mutables:
                    load_fix_module(source_mutable)
            else:
                load_fix_module(module)
+ """ + fix_subnet = dict() + if export_subnet_mode == 'mutable': + fix_subnet = _export_subnet_by_mutable(model) + elif export_subnet_mode == 'mutator': + fix_subnet = _export_subnet_by_mutator(model, export_channel) + else: + raise ValueError(f'Invalid export_subnet_mode {export_subnet_mode}, ' + 'only mutable or mutator is supported.') + + if slice_weight: + # export subnet ckpt + from mmrazor.models.mutators import ChannelMutator + + copied_model = copy.deepcopy(model) + if hasattr(model, 'mutator') and \ + isinstance(model.mutator, ChannelMutator): + _dynamic_to_static(copied_model) + else: + load_fix_subnet(copied_model, fix_subnet) + + if next(copied_model.parameters()).is_cuda: + copied_model.cuda() + return fix_subnet, copied_model + else: + return fix_subnet, None + + +def _export_subnet_by_mutable(model: nn.Module) -> Dict: + + # Avoid circular import + from mmrazor.models.mutables import DerivedMutable, MutableChannelContainer + from mmrazor.models.mutables.base_mutable import BaseMutable + + def module_dump_chosen(module, fix_subnet): + if module.alias: + fix_subnet[module.alias] = module.dump_chosen() + else: + fix_subnet[name] = module.dump_chosen() + + fix_subnet: Dict[str, DumpChosen] = dict() + for name, module in model.named_modules(): + if isinstance(module, BaseMutable): + if isinstance(module, MutableChannelContainer): + continue + elif isinstance(module, DerivedMutable): + for source_mutable in module.source_mutables: + module_dump_chosen(source_mutable, fix_subnet) + else: + module_dump_chosen(module, fix_subnet) + return fix_subnet + + +def _export_subnet_by_mutator(model: nn.Module, export_channel: bool) -> Dict: + if not hasattr(model, 'mutator'): + raise ValueError('model should contain `mutator` attribute, but got ' + f'{type(model)} model') + fix_subnet = model.mutator.config_template( + with_channels=export_channel, with_unit_init_args=True) + + return fix_subnet + + +def convert_fix_subnet(fix_subnet: Dict[str, DumpChosen]): + 
"""Convert the fixed subnet to avoid python typing error.""" + from mmrazor.utils.typing import DumpChosen + + converted_fix_subnet = dict() + for k, v in fix_subnet.items(): + assert isinstance(v, DumpChosen) + converted_fix_subnet[k] = dict(chosen=v.chosen) + + return converted_fix_subnet diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/testing/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/testing/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..54dfd30ed4b18e8a06a029deb1f4fad0d52d39fb --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/testing/__init__.py @@ -0,0 +1,3 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from ._fast_stop_training_hook import FastStopTrainingHook # noqa: F401,F403 +from ._fx_models import * # noqa: F401, F403 diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/testing/_fast_stop_training_hook.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/testing/_fast_stop_training_hook.py new file mode 100644 index 0000000000000000000000000000000000000000..9d029b9b89842cdc93ac886659007471b5510368 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/testing/_fast_stop_training_hook.py @@ -0,0 +1,27 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from mmengine.hooks import Hook + +from mmrazor.registry import HOOKS + + +@HOOKS.register_module() +class FastStopTrainingHook(Hook): + """Set runner's epoch information to the model.""" + + def __init__(self, by_epoch, save_ckpt=False, stop_iter_or_epoch=5): + self.by_epoch = by_epoch + self.save_ckpt = save_ckpt + self.stop_iter_or_epoch = stop_iter_or_epoch + + def after_train_iter(self, runner, batch_idx: int, data_batch: None, + outputs: None) -> None: + if self.save_ckpt and self.by_epoch: + # If it is epoch-based and want to save weights, + # we must run at least 1 epoch. 
+ return + if runner.iter >= self.stop_iter_or_epoch: + raise RuntimeError('quick exit') + + def after_train_epoch(self, runner) -> None: + if runner.epoch >= self.stop_iter_or_epoch - 1: + raise RuntimeError('quick exit') diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/testing/_fx_models.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/testing/_fx_models.py new file mode 100644 index 0000000000000000000000000000000000000000..6bf42e16a5b8de947fa1761116f48c0fc918da85 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/testing/_fx_models.py @@ -0,0 +1,44 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import Dict, Optional, Tuple, Union + +import torch.nn as nn +from mmcv.cnn import ConvModule + +from mmrazor.registry import MODELS + + +@MODELS.register_module() +class ConvBNReLU(nn.Module): + + def __init__( + self, + in_channel: int, + out_channel: int, + kernel_size: Union[int, Tuple[int, int]] = 1, + stride: Union[int, Tuple[int, int]] = 1, + padding: Union[int, Tuple[int, int]] = 0, + dilation: Union[int, Tuple[int, int]] = 1, + groups: int = 1, + bias: Union[str, bool] = 'auto', + conv_cfg: Optional[Dict] = None, + norm_cfg: Optional[Dict] = None, + act_cfg: Dict = dict(type='ReLU'), + inplace: bool = True, + with_spectral_norm: bool = False, + padding_mode: str = 'zeros', + order: tuple = ('conv', 'norm', 'act'), + init_cfg: Optional[Dict] = None, + ) -> None: + super().__init__() + self.conv_module = ConvModule(in_channel, out_channel, kernel_size, + stride, padding, dilation, groups, bias, + conv_cfg, norm_cfg, act_cfg, inplace, + with_spectral_norm, padding_mode, order) + self.toy_attr1 = 1 + self.toy_attr2 = 2 + + def forward(self, x): + x = self.conv_module.conv(x) + x = self.conv_module.norm(x) + x = self.conv_module.activate(x) + return x diff --git a/cv/distiller/CWD/pytorch/mmrazor/mmrazor/utils/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/mmrazor/utils/__init__.py new file mode 100644 index 
class IndexDict(OrderedDict):
    """An ``OrderedDict`` keyed by half-open integer intervals.

    Each key is a tuple ``(a, b)`` with ``a < b`` denoting the interval
    ``[a, b)``. IndexDict maintains three invariants:

    1. every key is an index, i.e. a ``Tuple[int, int]``;
    2. keys are kept sorted in ascending order;
    3. no two keys overlap.
    """

    def __setitem__(self, __k: Tuple[int, int], __v):
        """Insert ``__v`` under interval ``__k``, keeping keys sorted."""
        start, end = __k
        assert start < end
        self._assert_no_over_lap(start, end)
        super().__setitem__(__k, __v)
        self._sort()

    def _sort(self):
        """Re-insert all items in ascending key order."""
        items = sorted(self.items())
        self.clear()
        for k, v in items:
            super().__setitem__(k, v)

    def _assert_no_over_lap(self, start, end):
        """Assert the interval [start, end) does not overlap any existing
        key."""
        assert (start, end) not in self, 'index overlap'

    def __contains__(self, __o) -> bool:
        """bool: whether ``__o`` equals or overlaps any existing key."""
        if super().__contains__(__o):
            return True
        self._assert_is_index(__o)
        start, end = __o
        # Half-open intervals [s, e) and [start, end) overlap iff
        # start < e and s < end. Bug fix: the original comparison chain
        # missed the case where [start, end) strictly contains an existing
        # key (e.g. (0, 3) vs existing (1, 2) was reported as disjoint).
        return any(start < e and s < end for s, e in self.keys())

    def _assert_is_index(self, index):
        """Assert ``index`` is a ``Tuple[int, int]``."""
        assert isinstance(index, Tuple) \
            and len(index) == 2 \
            and isinstance(index[0], int) \
            and isinstance(index[1], int)
def find_latest_checkpoint(path, suffix='pth'):
    """Find the latest checkpoint from the working directory.

    Args:
        path(str): The path to find checkpoints.
        suffix(str): File extension. Defaults to pth.

    Returns:
        latest_path(str | None): File path of the latest checkpoint.

    References:
        .. [1] https://github.com/microsoft/SoftTeacher
                  /blob/main/ssod/utils/patch.py
    """
    if not osp.exists(path):
        warnings.warn('The path of checkpoints does not exist.')
        return None
    # `latest.pth` (a symlink/copy of the newest checkpoint) wins outright.
    if osp.exists(osp.join(path, f'latest.{suffix}')):
        return osp.join(path, f'latest.{suffix}')

    checkpoints = glob.glob(osp.join(path, f'*.{suffix}'))
    if len(checkpoints) == 0:
        warnings.warn('There are no checkpoints in the path.')
        return None
    latest = -1
    latest_path = None
    for checkpoint in checkpoints:
        # Checkpoints are conventionally named `epoch_12.pth` /
        # `iter_8000.pth`; use the trailing integer as recency. Robustness
        # fix: skip files whose stem does not end in an integer (e.g.
        # `best.pth`) instead of crashing with ValueError as the original
        # `int(...)` did.
        stem = osp.basename(checkpoint).split('_')[-1].split('.')[0]
        if not stem.isdigit():
            continue
        count = int(stem)
        if count > latest:
            latest = count
            latest_path = checkpoint
    return latest_path
class RuntimeInfo():
    """A tools to get runtime info in MessageHub."""

    @classmethod
    def info(cls):
        # Runtime-info dict maintained by the current MessageHub instance.
        hub = MessageHub.get_current_instance()
        return hub.runtime_info

    @classmethod
    def get_info(cls, key):
        # Return `key` from the runtime info; raise KeyError when absent.
        info = cls.info()
        if key in info:
            return info[key]
        else:
            raise KeyError(key)

    @classmethod
    def epoch(cls):
        return cls.get_info('epoch')

    @classmethod
    def max_epochs(cls):
        return cls.get_info('max_epochs')

    @classmethod
    def iter(cls):
        return cls.get_info('iter')

    @classmethod
    def max_iters(cls):
        return cls.get_info('max_iters')

    @classmethod
    def iter_by_epoch(cls):
        # Iteration index within the current epoch.
        iter_per_epoch = math.ceil(cls.max_iters() / cls.max_epochs())
        return cls.iter() % iter_per_epoch

    @classmethod
    def iter_pre_epoch(cls):
        # Number of iterations per epoch. NOTE(review): the name looks like
        # a typo for `iter_per_epoch`; kept as-is for interface
        # compatibility with existing callers.
        iter_per_epoch = math.ceil(cls.max_iters() / cls.max_epochs())
        return iter_per_epoch

    @classmethod
    def config(cls):
        # The full config is stored in MessageHub as a python-syntax string.
        cfg: str = cls.get_info('cfg')
        config = Config.fromstring(cfg, '.py')
        return config

    @classmethod
    def work_dir(cls):
        config = cls.config()
        return config['work_dir']
def register_all_modules(init_default_scope: bool = True) -> None:
    """Register all modules in mmrazor into the registries.

    Args:
        init_default_scope (bool): Whether initialize the mmrazor default scope.
            When `init_default_scope=True`, the global default scope will be
            set to `mmrazor`, and all registries will build modules from
            mmrazor's registry node. To understand more about the registry,
            please refer to
            https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/registry.md
            Defaults to True.
    """  # noqa
    # Importing the sub-packages runs their registry decorators as a side
    # effect, which is what actually populates the registries.
    import mmrazor.datasets  # noqa: F401,F403
    import mmrazor.engine  # noqa: F401,F403
    import mmrazor.implementations  # noqa: F401,F403
    import mmrazor.models  # noqa: F401,F403
    import mmrazor.structures  # noqa: F401,F403
    if init_default_scope:
        never_created = DefaultScope.get_current_instance() is None \
            or not DefaultScope.check_instance_created('mmrazor')
        if never_created:
            DefaultScope.get_instance('mmrazor', scope_name='mmrazor')
            return
        current_scope = DefaultScope.get_current_instance()
        if current_scope.scope_name != 'mmrazor':  # type: ignore
            warnings.warn(
                'The current default scope '  # type: ignore
                f'"{current_scope.scope_name}" is not '
                '"mmrazor", `register_all_modules` will force the current'
                'default scope to be "mmrazor". If this is not expected, '
                'please set `init_default_scope=False`.')
            # avoid name conflict: DefaultScope instances are keyed by name,
            # so a timestamped name guarantees a fresh instance.
            new_instance_name = f'mmrazor-{datetime.datetime.now()}'
            DefaultScope.get_instance(new_instance_name, scope_name='mmrazor')
def parse_version_info(version_str):
    """Parse a version string into a tuple.

    Args:
        version_str (str): The version string.
    Returns:
        tuple[int | str]: The version info, e.g., "1.3.0" is parsed into
            (1, 3, 0), and "2.0.0rc1" is parsed into (2, 0, 0, 'rc1').
    """
    parsed = []
    for segment in version_str.split('.'):
        if segment.isdigit():
            # Plain numeric component, e.g. '3' in '1.3.0'.
            parsed.append(int(segment))
        elif 'rc' in segment:
            # Release-candidate component, e.g. '0rc1' -> 0, 'rc1'.
            numeric, _, candidate = segment.partition('rc')
            parsed.append(int(numeric))
            parsed.append(f'rc{candidate}')
    return tuple(parsed)
import warnings
from typing import Optional, Tuple

import cv2
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn.functional as F
from mmengine.dist import master_only
from mmengine.visualization.utils import (convert_overlay_heatmap,
                                          img_from_canvas)


@master_only
def modify(featmap: torch.Tensor,
           overlaid_image: Optional[np.ndarray] = None,
           channel_reduction: Optional[str] = 'pixel_wise_max',
           topk: int = 20,
           arrangement: Tuple[int, int] = (4, 5),
           resize_shape: Optional[tuple] = None,
           alpha: float = 0.5):
    """Render a (C, H, W) feature map as a heatmap image.

    Three mutually exclusive display paths:

    * ``channel_reduction`` is not None: collapse the channel axis with one
      of 'squeeze_mean' / 'select_max' / 'pixel_wise_max' and return a single
      heatmap (optionally blended over ``overlaid_image`` with weight
      ``alpha``).
    * ``channel_reduction`` is None and ``topk <= 0``: the feature map must
      already have 1 or 3 channels and is rendered directly.
    * otherwise: the ``topk`` channels with the largest activation sum are
      drawn in a ``row x col`` matplotlib grid given by ``arrangement``.

    Args:
        featmap (torch.Tensor): Feature map of shape (C, H, W); asserted
            to be a 3-D tensor.
        overlaid_image (np.ndarray, optional): Image to blend under the
            heatmap; grayscale input is converted to RGB. Spatial mismatch
            with ``featmap`` triggers interpolation (with a warning).
        channel_reduction (str, optional): Channel-collapse mode, see above.
        topk (int): Number of channels to display when no reduction is used.
        arrangement (Tuple[int, int]): (rows, cols) of the topk grid;
            ``rows * cols`` must cover ``topk``.
        resize_shape (tuple, optional): Target (h, w) to resize both the
            feature map and the overlay to.
        alpha (float): Blend weight passed to ``convert_overlay_heatmap``.

    Returns:
        np.ndarray: The rendered heatmap image.

    NOTE(review): decorated with mmengine's ``master_only``, so in a
    distributed run only the main process executes this — confirm callers
    do not rely on a return value on worker ranks.
    """
    assert isinstance(featmap,
                      torch.Tensor), (f'`featmap` should be torch.Tensor,'
                                      f' but got {type(featmap)}')
    assert featmap.ndim == 3, f'Input dimension must be 3, ' \
                              f'but got {featmap.ndim}'
    # Work on a CPU copy detached from the autograd graph.
    featmap = featmap.detach().cpu()

    if overlaid_image is not None:
        if overlaid_image.ndim == 2:
            # Grayscale -> RGB so the overlay blend has 3 channels.
            overlaid_image = cv2.cvtColor(overlaid_image, cv2.COLOR_GRAY2RGB)

        if overlaid_image.shape[:2] != featmap.shape[1:]:
            warnings.warn(f'Since the spatial dimensions of '
                          f'overlaid_image: {overlaid_image.shape[:2]} and '
                          f'featmap: {featmap.shape[1:]} are not same, '
                          f'the feature map will be interpolated. '
                          f'This may cause mismatch problems !')
            if resize_shape is None:
                # Crop the feature map to the overlay's aspect ratio before
                # interpolating, so the upsampled map is not distorted.
                overlaid_image_h, overlaid_image_w = overlaid_image.shape[:2]
                feat_h, feat_w = featmap.shape[-2:]
                if feat_h / feat_w > overlaid_image_h / overlaid_image_w:
                    feat_h = round(feat_w * overlaid_image_h /
                                   overlaid_image_w)
                else:
                    feat_w = round(feat_h * overlaid_image_w /
                                   overlaid_image_h)
                featmap = featmap[..., :feat_h, :feat_w]
                featmap = F.interpolate(
                    featmap[None], overlaid_image.shape[:2],
                    mode='bilinear')[0]

    if resize_shape is not None:
        # Explicit target size takes precedence; resize both tensors.
        featmap = F.interpolate(
            featmap[None], resize_shape, mode='bilinear',
            align_corners=False)[0]
        if overlaid_image is not None:
            # cv2.resize expects (w, h), hence the reversed tuple.
            overlaid_image = cv2.resize(overlaid_image, resize_shape[::-1])

    if channel_reduction is not None:
        assert channel_reduction in [
            'squeeze_mean', 'select_max', 'pixel_wise_max'], \
            f'Mode only support "squeeze_mean", "select_max", ' \
            f'"pixel_wise_max", but got {channel_reduction}'
        if channel_reduction == 'select_max':
            # Keep only the channel with the largest total activation.
            sum_channel_featmap = torch.sum(featmap, dim=(1, 2))
            _, indices = torch.topk(sum_channel_featmap, 1)
            feat_map = featmap[indices]
        elif channel_reduction == 'squeeze_mean':
            feat_map = torch.mean(featmap, dim=0)
        else:
            # 'pixel_wise_max': per-pixel max across channels.
            feat_map = torch.max(featmap, dim=0)[0]
        return convert_overlay_heatmap(feat_map, overlaid_image, alpha)
    elif topk <= 0:
        featmap_channel = featmap.shape[0]
        assert featmap_channel in [
            1, 3
        ], ('The input tensor channel dimension must be 1 or 3 '
            'when topk is less than 1, but the channel '
            f'dimension you input is {featmap_channel}, you can use the'
            ' channel_reduction parameter or set topk greater than '
            '0 to solve the error')
        return convert_overlay_heatmap(featmap, overlaid_image, alpha)
    else:
        row, col = arrangement
        channel, height, width = featmap.shape
        assert row * col >= topk, 'The product of row and col in ' \
                                  'the `arrangement` is less than ' \
                                  'topk, please set the ' \
                                  '`arrangement` correctly'

        # Extract the feature map of topk
        topk = min(channel, topk)
        sum_channel_featmap = torch.sum(featmap, dim=(1, 2))
        _, indices = torch.topk(sum_channel_featmap, topk)
        topk_featmap = featmap[indices]

        fig = plt.figure(frameon=False)
        # Set the window layout
        fig.subplots_adjust(
            left=0, right=1, bottom=0, top=1, wspace=0, hspace=0)
        dpi = fig.get_dpi()
        # Size the canvas so each subplot is rendered at the feature map's
        # native pixel resolution (1e-2 guards against rounding to zero).
        fig.set_size_inches((width * col + 1e-2) / dpi,
                            (height * row + 1e-2) / dpi)
        for i in range(topk):
            axes = fig.add_subplot(row, col, i + 1)
            axes.axis('off')
            # Label each cell with the original channel index.
            axes.text(2, 15, f'channel: {indices[i]}', fontsize=10)
            axes.imshow(
                convert_overlay_heatmap(topk_featmap[i], overlaid_image,
                                        alpha))
        image = img_from_canvas(fig.canvas)
        plt.close(fig)
        return image
configs/distill/mmcls/deit/metafile.yml + - configs/pruning/mmcls/group_fisher/mobilenet/metafile.yml + - configs/pruning/mmcls/group_fisher/resnet50/metafile.yml + - configs/pruning/mmdet/group_fisher/retinanet/metafile.yml + - configs/pruning/mmcls/l1-norm/metafile.yml + # - configs/pruning/mmcls/dmcp/metafile.yml + - configs/quantization/ptq/base/metafile.yml + - configs/quantization/qat/base/metafile.yml + - configs/quantization/qat/lsq/metafile.yml diff --git a/cv/distiller/CWD/pytorch/mmrazor/requirements.txt b/cv/distiller/CWD/pytorch/mmrazor/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..6da5adea757ffc79ac35e544d4afe85c5f44a90d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/requirements.txt @@ -0,0 +1,3 @@ +-r requirements/optional.txt +-r requirements/runtime.txt +-r requirements/tests.txt diff --git a/cv/distiller/CWD/pytorch/mmrazor/requirements/docs.txt b/cv/distiller/CWD/pytorch/mmrazor/requirements/docs.txt new file mode 100644 index 0000000000000000000000000000000000000000..6934d41bd43cb07a00231cd5c26dc3b7cf635739 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/requirements/docs.txt @@ -0,0 +1,7 @@ +docutils==0.16.0 +m2r +myst-parser +git+https://github.com/open-mmlab/pytorch_sphinx_theme.git#egg=pytorch_sphinx_theme +sphinx==4.0.2 +sphinx-copybutton +sphinx_markdown_tables diff --git a/cv/distiller/CWD/pytorch/mmrazor/requirements/mminstall.txt b/cv/distiller/CWD/pytorch/mmrazor/requirements/mminstall.txt new file mode 100644 index 0000000000000000000000000000000000000000..e9a81c4359b8d0a7d754fb69497a959b3bdba0cf --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/requirements/mminstall.txt @@ -0,0 +1,2 @@ +mmcv>=2.0.0rc1 +mmengine>=0.1.0,<1.0.0 diff --git a/cv/distiller/CWD/pytorch/mmrazor/requirements/optional.txt b/cv/distiller/CWD/pytorch/mmrazor/requirements/optional.txt new file mode 100644 index 0000000000000000000000000000000000000000..bb7173848927341cd82701deb8cb8857c4ae03d6 --- /dev/null 
+++ b/cv/distiller/CWD/pytorch/mmrazor/requirements/optional.txt @@ -0,0 +1,4 @@ +pydacefit +pySOT==0.2.3 +scipy +timm diff --git a/cv/distiller/CWD/pytorch/mmrazor/requirements/readthedocs.txt b/cv/distiller/CWD/pytorch/mmrazor/requirements/readthedocs.txt new file mode 100644 index 0000000000000000000000000000000000000000..d1e7e86f500417e23c55481e4032e48ef28d2e19 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/requirements/readthedocs.txt @@ -0,0 +1,4 @@ +mmcv>=1.3.8 +ordered_set +torch +torchvision diff --git a/cv/distiller/CWD/pytorch/mmrazor/requirements/runtime.txt b/cv/distiller/CWD/pytorch/mmrazor/requirements/runtime.txt new file mode 100644 index 0000000000000000000000000000000000000000..67e2c66bf8e9aa6ff1729526f0322ad61927babd --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/requirements/runtime.txt @@ -0,0 +1,2 @@ +ordered_set +typing_extensions;python_version<"3.8" diff --git a/cv/distiller/CWD/pytorch/mmrazor/requirements/tests.txt b/cv/distiller/CWD/pytorch/mmrazor/requirements/tests.txt new file mode 100644 index 0000000000000000000000000000000000000000..5980dc3034b04d1bffc13ce7e3b1d3f30fc28202 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/requirements/tests.txt @@ -0,0 +1,11 @@ +coverage +flake8 +interrogate +isort==4.3.21 +nbconvert +nbformat +numpy < 1.24.0 # A temporary solution for tests with mmdet. 
+onnx +pytest +xdoctest >= 0.10.0 +yapf diff --git a/cv/distiller/CWD/pytorch/mmrazor/resources/design_and_implement.png b/cv/distiller/CWD/pytorch/mmrazor/resources/design_and_implement.png new file mode 100644 index 0000000000000000000000000000000000000000..547559d4f33c5adfa4aea23bcc3ab3175abb9f13 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmrazor/resources/design_and_implement.png differ diff --git a/cv/distiller/CWD/pytorch/mmrazor/resources/mmrazor-logo.png b/cv/distiller/CWD/pytorch/mmrazor/resources/mmrazor-logo.png new file mode 100644 index 0000000000000000000000000000000000000000..f2714e3702a3434a996f2b1432e8d0e67b5b8bd5 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmrazor/resources/mmrazor-logo.png differ diff --git a/cv/distiller/CWD/pytorch/mmrazor/resources/qq_group_qrcode.jpg b/cv/distiller/CWD/pytorch/mmrazor/resources/qq_group_qrcode.jpg new file mode 100644 index 0000000000000000000000000000000000000000..7c6b04f561da283ae622f4219ea9b8cabf8f301a Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmrazor/resources/qq_group_qrcode.jpg differ diff --git a/cv/distiller/CWD/pytorch/mmrazor/resources/xiaozhushou_weixin_qrcode.jpeg b/cv/distiller/CWD/pytorch/mmrazor/resources/xiaozhushou_weixin_qrcode.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..873a0ba40a5af1baec49c11b16f86edd79714eab Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmrazor/resources/xiaozhushou_weixin_qrcode.jpeg differ diff --git a/cv/distiller/CWD/pytorch/mmrazor/resources/zhihu_qrcode.jpg b/cv/distiller/CWD/pytorch/mmrazor/resources/zhihu_qrcode.jpg new file mode 100644 index 0000000000000000000000000000000000000000..c745fb027f06564d41794e9a40069b06c34e2bb5 Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmrazor/resources/zhihu_qrcode.jpg differ diff --git a/cv/distiller/CWD/pytorch/mmrazor/setup.cfg b/cv/distiller/CWD/pytorch/mmrazor/setup.cfg new file mode 100644 index 
def parse_requirements(fname='requirements.txt', with_version=True):
    """Parse the package dependencies listed in a requirements file but strips
    specific versioning information.

    Args:
        fname (str): path to requirements file
        with_version (bool, default=False): if True include version specs

    Returns:
        List[str]: list of requirements items

    CommandLine:
        python -c "import setup; print(setup.parse_requirements())"
    """
    import re
    import sys
    from os.path import exists
    require_fpath = fname

    def parse_line(line):
        """Parse information from a line in a requirements text file."""
        if line.startswith('-r '):
            # Allow specifying requirements in other files
            target = line.split(' ')[1]
            for info in parse_require_file(target):
                yield info
        else:
            info = {'line': line}
            if line.startswith('-e '):
                info['package'] = line.split('#egg=')[1]
            else:
                # Remove versioning from the package. Fix: the original
                # pattern only knew '>=', '==' and '>', so specs such as
                # 'numpy < 1.24.0' (present in requirements/tests.txt) were
                # returned verbatim as package names. Longer operators must
                # precede their one-character prefixes in the alternation.
                pat = '(' + '|'.join(
                    ['>=', '==', '~=', '!=', '<=', '>', '<']) + ')'
                parts = re.split(pat, line, maxsplit=1)
                parts = [p.strip() for p in parts]

                info['package'] = parts[0]
                if len(parts) > 1:
                    op, rest = parts[1:]
                    if ';' in rest:
                        # Handle platform specific dependencies
                        # http://setuptools.readthedocs.io/en/latest/setuptools.html#declaring-platform-specific-dependencies
                        version, platform_deps = map(str.strip,
                                                     rest.split(';'))
                        info['platform_deps'] = platform_deps
                    else:
                        version = rest  # NOQA
                    info['version'] = (op, version)
            yield info

    def parse_require_file(fpath):
        with open(fpath, 'r') as f:
            for line in f.readlines():
                # Fix: drop inline comments ('pkg < 1.0  # reason') so they
                # are not parsed as part of the requirement spec. ' #' is
                # used (not '#') to keep '-e ...#egg=name' lines intact.
                line = line.split(' #')[0].strip()
                if line and not line.startswith('#'):
                    for info in parse_line(line):
                        yield info

    def gen_packages_items():
        # Reassemble each parsed requirement into a normalized spec string,
        # e.g. {'package': 'numpy', 'version': ('<', '1.24.0')} -> 'numpy<1.24.0'.
        if exists(require_fpath):
            for info in parse_require_file(require_fpath):
                parts = [info['package']]
                if with_version and 'version' in info:
                    parts.extend(info['version'])
                if not sys.version.startswith('3.4'):
                    # apparently package_deps are broken in 3.4
                    platform_deps = info.get('platform_deps')
                    if platform_deps is not None:
                        parts.append(';' + platform_deps)
                item = ''.join(parts)
                yield item

    packages = list(gen_packages_items())
    return packages
def add_mim_extension():
    """Add extra files that are required to support MIM into the package.

    These files will be added by creating a symlink to the originals if the
    package is installed in `editable` mode (e.g. pip install -e .), or by
    copying from the originals otherwise.

    Returns:
        None. Runs purely for its filesystem side effects; returns early
        when the current ``sys.argv`` is neither an install nor a dist build.
    """

    # parse installment mode
    if 'develop' in sys.argv:
        # installed by `pip install -e .`
        mode = 'symlink'
    elif 'sdist' in sys.argv or 'bdist_wheel' in sys.argv:
        # installed by `pip install .`
        # or create source distribution by `python setup.py sdist`
        mode = 'copy'
    else:
        return

    filenames = ['tools', 'configs', 'model-index.yml']
    repo_path = osp.dirname(__file__)
    mim_path = osp.join(repo_path, 'mmrazor', '.mim')
    os.makedirs(mim_path, exist_ok=True)

    for filename in filenames:
        if osp.exists(filename):
            src_path = osp.join(repo_path, filename)
            tar_path = osp.join(mim_path, filename)

            # Remove any stale target so the link/copy below is fresh.
            if osp.isfile(tar_path) or osp.islink(tar_path):
                os.remove(tar_path)
            elif osp.isdir(tar_path):
                shutil.rmtree(tar_path)

            if mode == 'symlink':
                src_relpath = osp.relpath(src_path, osp.dirname(tar_path))
                try:
                    os.symlink(src_relpath, tar_path)
                except OSError:
                    # Creating a symbolic link on windows may raise an
                    # `OSError: [WinError 1314]` due to privilege. If
                    # the error happens, the src file will be copied
                    mode = 'copy'
                    warnings.warn(
                        f'Failed to create a symbolic link for {src_relpath}, '
                        f'and it will be copied to {tar_path}')
                else:
                    continue

            # Fix: this was `elif mode == 'copy':`, which is never reached in
            # the iteration where the symlink fails (the `if` branch above was
            # already taken), so the file the warning promised to copy was
            # silently skipped. A separate `if` lets the fallback copy run.
            if mode == 'copy':
                if osp.isfile(src_path):
                    shutil.copyfile(src_path, tar_path)
                elif osp.isdir(src_path):
                    shutil.copytree(src_path, tar_path)
                else:
                    warnings.warn(f'Cannot copy file {src_path}.')
            else:
                raise ValueError(f'Invalid mode {mode}')


if __name__ == '__main__':
    add_mim_extension()
    setup(
        name='mmrazor',
        version=get_version(),
        description='OpenMMLab Model Compression Toolbox and Benchmark',
        long_description=readme(),
        long_description_content_type='text/markdown',
        keywords='computer vision, model compression',
        packages=find_packages(exclude=('configs', 'tools', 'demo')),
        include_package_data=True,
        classifiers=[
            'Development Status :: 4 - Beta',
            'License :: OSI Approved :: Apache Software License',
            'Operating System :: OS Independent',
            'Programming Language :: Python :: 3',
            'Programming Language :: Python :: 3.5',
            'Programming Language :: Python :: 3.6',
            'Programming Language :: Python :: 3.7',
            'Programming Language :: Python :: 3.8',
            'Programming Language :: Python :: 3.9',
            'Topic :: Scientific/Engineering :: Artificial Intelligence',
        ],
        url='https://github.com/open-mmlab/mmrazor',
        author='MMRazor Contributors',
        author_email='openmmlab@gmail.com',
        license='Apache License 2.0',
        install_requires=parse_requirements('requirements/runtime.txt'),
        extras_require={
            'all': parse_requirements('requirements.txt'),
            'tests': parse_requirements('requirements/tests.txt'),
            'optional': parse_requirements('requirements/optional.txt'),
            'mim': parse_requirements('requirements/mminstall.txt'),
        },
        zip_safe=False)
0000000000000000000000000000000000000000..ef101fec61e72abc0eb90266d453b5b22331378d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/__init__.py @@ -0,0 +1 @@ +# Copyright (c) OpenMMLab. All rights reserved. diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/data/MBV2_220M.yaml b/cv/distiller/CWD/pytorch/mmrazor/tests/data/MBV2_220M.yaml new file mode 100644 index 0000000000000000000000000000000000000000..2aa967c8d31f13cc58ee21b9778c911374b81304 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/data/MBV2_220M.yaml @@ -0,0 +1,474 @@ +backbone.conv1.bn.mutable_num_features: + current_choice: 8 + origin_channels: 48 +backbone.conv1.conv.mutable_in_channels: + current_choice: 3 + origin_channels: 3 +backbone.conv1.conv.mutable_out_channels: + current_choice: 8 + origin_channels: 48 +backbone.conv2.bn.mutable_num_features: + current_choice: 1920 + origin_channels: 1920 +backbone.conv2.conv.mutable_in_channels: + current_choice: 280 + origin_channels: 480 +backbone.conv2.conv.mutable_out_channels: + current_choice: 1920 + origin_channels: 1920 +backbone.layer1.0.conv.0.bn.mutable_num_features: + current_choice: 8 + origin_channels: 48 +backbone.layer1.0.conv.0.conv.mutable_in_channels: + current_choice: 8 + origin_channels: 48 +backbone.layer1.0.conv.0.conv.mutable_out_channels: + current_choice: 8 + origin_channels: 48 +backbone.layer1.0.conv.1.bn.mutable_num_features: + current_choice: 8 + origin_channels: 24 +backbone.layer1.0.conv.1.conv.mutable_in_channels: + current_choice: 8 + origin_channels: 48 +backbone.layer1.0.conv.1.conv.mutable_out_channels: + current_choice: 8 + origin_channels: 24 +backbone.layer2.0.conv.0.bn.mutable_num_features: + current_choice: 96 + origin_channels: 144 +backbone.layer2.0.conv.0.conv.mutable_in_channels: + current_choice: 8 + origin_channels: 24 +backbone.layer2.0.conv.0.conv.mutable_out_channels: + current_choice: 96 + origin_channels: 144 +backbone.layer2.0.conv.1.bn.mutable_num_features: + current_choice: 96 + 
origin_channels: 144 +backbone.layer2.0.conv.1.conv.mutable_in_channels: + current_choice: 96 + origin_channels: 144 +backbone.layer2.0.conv.1.conv.mutable_out_channels: + current_choice: 96 + origin_channels: 144 +backbone.layer2.0.conv.2.bn.mutable_num_features: + current_choice: 16 + origin_channels: 40 +backbone.layer2.0.conv.2.conv.mutable_in_channels: + current_choice: 96 + origin_channels: 144 +backbone.layer2.0.conv.2.conv.mutable_out_channels: + current_choice: 16 + origin_channels: 40 +backbone.layer2.1.conv.0.bn.mutable_num_features: + current_choice: 96 + origin_channels: 240 +backbone.layer2.1.conv.0.conv.mutable_in_channels: + current_choice: 16 + origin_channels: 40 +backbone.layer2.1.conv.0.conv.mutable_out_channels: + current_choice: 96 + origin_channels: 240 +backbone.layer2.1.conv.1.bn.mutable_num_features: + current_choice: 96 + origin_channels: 240 +backbone.layer2.1.conv.1.conv.mutable_in_channels: + current_choice: 96 + origin_channels: 240 +backbone.layer2.1.conv.1.conv.mutable_out_channels: + current_choice: 96 + origin_channels: 240 +backbone.layer2.1.conv.2.bn.mutable_num_features: + current_choice: 16 + origin_channels: 40 +backbone.layer2.1.conv.2.conv.mutable_in_channels: + current_choice: 96 + origin_channels: 240 +backbone.layer2.1.conv.2.conv.mutable_out_channels: + current_choice: 16 + origin_channels: 40 +backbone.layer3.0.conv.0.bn.mutable_num_features: + current_choice: 96 + origin_channels: 240 +backbone.layer3.0.conv.0.conv.mutable_in_channels: + current_choice: 16 + origin_channels: 40 +backbone.layer3.0.conv.0.conv.mutable_out_channels: + current_choice: 96 + origin_channels: 240 +backbone.layer3.0.conv.1.bn.mutable_num_features: + current_choice: 96 + origin_channels: 240 +backbone.layer3.0.conv.1.conv.mutable_in_channels: + current_choice: 96 + origin_channels: 240 +backbone.layer3.0.conv.1.conv.mutable_out_channels: + current_choice: 96 + origin_channels: 240 +backbone.layer3.0.conv.2.bn.mutable_num_features: + 
current_choice: 24 + origin_channels: 48 +backbone.layer3.0.conv.2.conv.mutable_in_channels: + current_choice: 96 + origin_channels: 240 +backbone.layer3.0.conv.2.conv.mutable_out_channels: + current_choice: 24 + origin_channels: 48 +backbone.layer3.1.conv.0.bn.mutable_num_features: + current_choice: 144 + origin_channels: 288 +backbone.layer3.1.conv.0.conv.mutable_in_channels: + current_choice: 24 + origin_channels: 48 +backbone.layer3.1.conv.0.conv.mutable_out_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer3.1.conv.1.bn.mutable_num_features: + current_choice: 144 + origin_channels: 288 +backbone.layer3.1.conv.1.conv.mutable_in_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer3.1.conv.1.conv.mutable_out_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer3.1.conv.2.bn.mutable_num_features: + current_choice: 24 + origin_channels: 48 +backbone.layer3.1.conv.2.conv.mutable_in_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer3.1.conv.2.conv.mutable_out_channels: + current_choice: 24 + origin_channels: 48 +backbone.layer3.2.conv.0.bn.mutable_num_features: + current_choice: 144 + origin_channels: 288 +backbone.layer3.2.conv.0.conv.mutable_in_channels: + current_choice: 24 + origin_channels: 48 +backbone.layer3.2.conv.0.conv.mutable_out_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer3.2.conv.1.bn.mutable_num_features: + current_choice: 144 + origin_channels: 288 +backbone.layer3.2.conv.1.conv.mutable_in_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer3.2.conv.1.conv.mutable_out_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer3.2.conv.2.bn.mutable_num_features: + current_choice: 24 + origin_channels: 48 +backbone.layer3.2.conv.2.conv.mutable_in_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer3.2.conv.2.conv.mutable_out_channels: + current_choice: 24 + origin_channels: 48 
+backbone.layer4.0.conv.0.bn.mutable_num_features: + current_choice: 144 + origin_channels: 288 +backbone.layer4.0.conv.0.conv.mutable_in_channels: + current_choice: 24 + origin_channels: 48 +backbone.layer4.0.conv.0.conv.mutable_out_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer4.0.conv.1.bn.mutable_num_features: + current_choice: 144 + origin_channels: 288 +backbone.layer4.0.conv.1.conv.mutable_in_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer4.0.conv.1.conv.mutable_out_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer4.0.conv.2.bn.mutable_num_features: + current_choice: 48 + origin_channels: 96 +backbone.layer4.0.conv.2.conv.mutable_in_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer4.0.conv.2.conv.mutable_out_channels: + current_choice: 48 + origin_channels: 96 +backbone.layer4.1.conv.0.bn.mutable_num_features: + current_choice: 288 + origin_channels: 576 +backbone.layer4.1.conv.0.conv.mutable_in_channels: + current_choice: 48 + origin_channels: 96 +backbone.layer4.1.conv.0.conv.mutable_out_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.1.conv.1.bn.mutable_num_features: + current_choice: 288 + origin_channels: 576 +backbone.layer4.1.conv.1.conv.mutable_in_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.1.conv.1.conv.mutable_out_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.1.conv.2.bn.mutable_num_features: + current_choice: 48 + origin_channels: 96 +backbone.layer4.1.conv.2.conv.mutable_in_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.1.conv.2.conv.mutable_out_channels: + current_choice: 48 + origin_channels: 96 +backbone.layer4.2.conv.0.bn.mutable_num_features: + current_choice: 288 + origin_channels: 576 +backbone.layer4.2.conv.0.conv.mutable_in_channels: + current_choice: 48 + origin_channels: 96 +backbone.layer4.2.conv.0.conv.mutable_out_channels: + current_choice: 288 
+ origin_channels: 576 +backbone.layer4.2.conv.1.bn.mutable_num_features: + current_choice: 288 + origin_channels: 576 +backbone.layer4.2.conv.1.conv.mutable_in_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.2.conv.1.conv.mutable_out_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.2.conv.2.bn.mutable_num_features: + current_choice: 48 + origin_channels: 96 +backbone.layer4.2.conv.2.conv.mutable_in_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.2.conv.2.conv.mutable_out_channels: + current_choice: 48 + origin_channels: 96 +backbone.layer4.3.conv.0.bn.mutable_num_features: + current_choice: 288 + origin_channels: 576 +backbone.layer4.3.conv.0.conv.mutable_in_channels: + current_choice: 48 + origin_channels: 96 +backbone.layer4.3.conv.0.conv.mutable_out_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.3.conv.1.bn.mutable_num_features: + current_choice: 288 + origin_channels: 576 +backbone.layer4.3.conv.1.conv.mutable_in_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.3.conv.1.conv.mutable_out_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.3.conv.2.bn.mutable_num_features: + current_choice: 48 + origin_channels: 96 +backbone.layer4.3.conv.2.conv.mutable_in_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.3.conv.2.conv.mutable_out_channels: + current_choice: 48 + origin_channels: 96 +backbone.layer5.0.conv.0.bn.mutable_num_features: + current_choice: 288 + origin_channels: 576 +backbone.layer5.0.conv.0.conv.mutable_in_channels: + current_choice: 48 + origin_channels: 96 +backbone.layer5.0.conv.0.conv.mutable_out_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer5.0.conv.1.bn.mutable_num_features: + current_choice: 288 + origin_channels: 576 +backbone.layer5.0.conv.1.conv.mutable_in_channels: + current_choice: 288 + origin_channels: 576 
+backbone.layer5.0.conv.1.conv.mutable_out_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer5.0.conv.2.bn.mutable_num_features: + current_choice: 64 + origin_channels: 144 +backbone.layer5.0.conv.2.conv.mutable_in_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer5.0.conv.2.conv.mutable_out_channels: + current_choice: 64 + origin_channels: 144 +backbone.layer5.1.conv.0.bn.mutable_num_features: + current_choice: 432 + origin_channels: 864 +backbone.layer5.1.conv.0.conv.mutable_in_channels: + current_choice: 64 + origin_channels: 144 +backbone.layer5.1.conv.0.conv.mutable_out_channels: + current_choice: 432 + origin_channels: 864 +backbone.layer5.1.conv.1.bn.mutable_num_features: + current_choice: 432 + origin_channels: 864 +backbone.layer5.1.conv.1.conv.mutable_in_channels: + current_choice: 432 + origin_channels: 864 +backbone.layer5.1.conv.1.conv.mutable_out_channels: + current_choice: 432 + origin_channels: 864 +backbone.layer5.1.conv.2.bn.mutable_num_features: + current_choice: 64 + origin_channels: 144 +backbone.layer5.1.conv.2.conv.mutable_in_channels: + current_choice: 432 + origin_channels: 864 +backbone.layer5.1.conv.2.conv.mutable_out_channels: + current_choice: 64 + origin_channels: 144 +backbone.layer5.2.conv.0.bn.mutable_num_features: + current_choice: 432 + origin_channels: 864 +backbone.layer5.2.conv.0.conv.mutable_in_channels: + current_choice: 64 + origin_channels: 144 +backbone.layer5.2.conv.0.conv.mutable_out_channels: + current_choice: 432 + origin_channels: 864 +backbone.layer5.2.conv.1.bn.mutable_num_features: + current_choice: 432 + origin_channels: 864 +backbone.layer5.2.conv.1.conv.mutable_in_channels: + current_choice: 432 + origin_channels: 864 +backbone.layer5.2.conv.1.conv.mutable_out_channels: + current_choice: 432 + origin_channels: 864 +backbone.layer5.2.conv.2.bn.mutable_num_features: + current_choice: 64 + origin_channels: 144 +backbone.layer5.2.conv.2.conv.mutable_in_channels: + 
current_choice: 432 + origin_channels: 864 +backbone.layer5.2.conv.2.conv.mutable_out_channels: + current_choice: 64 + origin_channels: 144 +backbone.layer6.0.conv.0.bn.mutable_num_features: + current_choice: 648 + origin_channels: 864 +backbone.layer6.0.conv.0.conv.mutable_in_channels: + current_choice: 64 + origin_channels: 144 +backbone.layer6.0.conv.0.conv.mutable_out_channels: + current_choice: 648 + origin_channels: 864 +backbone.layer6.0.conv.1.bn.mutable_num_features: + current_choice: 648 + origin_channels: 864 +backbone.layer6.0.conv.1.conv.mutable_in_channels: + current_choice: 648 + origin_channels: 864 +backbone.layer6.0.conv.1.conv.mutable_out_channels: + current_choice: 648 + origin_channels: 864 +backbone.layer6.0.conv.2.bn.mutable_num_features: + current_choice: 176 + origin_channels: 240 +backbone.layer6.0.conv.2.conv.mutable_in_channels: + current_choice: 648 + origin_channels: 864 +backbone.layer6.0.conv.2.conv.mutable_out_channels: + current_choice: 176 + origin_channels: 240 +backbone.layer6.1.conv.0.bn.mutable_num_features: + current_choice: 720 + origin_channels: 1440 +backbone.layer6.1.conv.0.conv.mutable_in_channels: + current_choice: 176 + origin_channels: 240 +backbone.layer6.1.conv.0.conv.mutable_out_channels: + current_choice: 720 + origin_channels: 1440 +backbone.layer6.1.conv.1.bn.mutable_num_features: + current_choice: 720 + origin_channels: 1440 +backbone.layer6.1.conv.1.conv.mutable_in_channels: + current_choice: 720 + origin_channels: 1440 +backbone.layer6.1.conv.1.conv.mutable_out_channels: + current_choice: 720 + origin_channels: 1440 +backbone.layer6.1.conv.2.bn.mutable_num_features: + current_choice: 176 + origin_channels: 240 +backbone.layer6.1.conv.2.conv.mutable_in_channels: + current_choice: 720 + origin_channels: 1440 +backbone.layer6.1.conv.2.conv.mutable_out_channels: + current_choice: 176 + origin_channels: 240 +backbone.layer6.2.conv.0.bn.mutable_num_features: + current_choice: 720 + origin_channels: 1440 
+backbone.layer6.2.conv.0.conv.mutable_in_channels: + current_choice: 176 + origin_channels: 240 +backbone.layer6.2.conv.0.conv.mutable_out_channels: + current_choice: 720 + origin_channels: 1440 +backbone.layer6.2.conv.1.bn.mutable_num_features: + current_choice: 720 + origin_channels: 1440 +backbone.layer6.2.conv.1.conv.mutable_in_channels: + current_choice: 720 + origin_channels: 1440 +backbone.layer6.2.conv.1.conv.mutable_out_channels: + current_choice: 720 + origin_channels: 1440 +backbone.layer6.2.conv.2.bn.mutable_num_features: + current_choice: 176 + origin_channels: 240 +backbone.layer6.2.conv.2.conv.mutable_in_channels: + current_choice: 720 + origin_channels: 1440 +backbone.layer6.2.conv.2.conv.mutable_out_channels: + current_choice: 176 + origin_channels: 240 +backbone.layer7.0.conv.0.bn.mutable_num_features: + current_choice: 1440 + origin_channels: 1440 +backbone.layer7.0.conv.0.conv.mutable_in_channels: + current_choice: 176 + origin_channels: 240 +backbone.layer7.0.conv.0.conv.mutable_out_channels: + current_choice: 1440 + origin_channels: 1440 +backbone.layer7.0.conv.1.bn.mutable_num_features: + current_choice: 1440 + origin_channels: 1440 +backbone.layer7.0.conv.1.conv.mutable_in_channels: + current_choice: 1440 + origin_channels: 1440 +backbone.layer7.0.conv.1.conv.mutable_out_channels: + current_choice: 1440 + origin_channels: 1440 +backbone.layer7.0.conv.2.bn.mutable_num_features: + current_choice: 280 + origin_channels: 480 +backbone.layer7.0.conv.2.conv.mutable_in_channels: + current_choice: 1440 + origin_channels: 1440 +backbone.layer7.0.conv.2.conv.mutable_out_channels: + current_choice: 280 + origin_channels: 480 +head.fc.mutable_in_features: + current_choice: 1920 + origin_channels: 1920 +head.fc.mutable_out_features: + current_choice: 1000 + origin_channels: 1000 diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/data/MBV2_320M.yaml b/cv/distiller/CWD/pytorch/mmrazor/tests/data/MBV2_320M.yaml new file mode 100644 index 
0000000000000000000000000000000000000000..2c63bcf767dfca156277d43f0c0f96135f383986 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/data/MBV2_320M.yaml @@ -0,0 +1,474 @@ +backbone.conv1.bn.mutable_num_features: + current_choice: 8 + origin_channels: 48 +backbone.conv1.conv.mutable_in_channels: + current_choice: 3 + origin_channels: 3 +backbone.conv1.conv.mutable_out_channels: + current_choice: 8 + origin_channels: 48 +backbone.conv2.bn.mutable_num_features: + current_choice: 1920 + origin_channels: 1920 +backbone.conv2.conv.mutable_in_channels: + current_choice: 480 + origin_channels: 480 +backbone.conv2.conv.mutable_out_channels: + current_choice: 1920 + origin_channels: 1920 +backbone.layer1.0.conv.0.bn.mutable_num_features: + current_choice: 8 + origin_channels: 48 +backbone.layer1.0.conv.0.conv.mutable_in_channels: + current_choice: 8 + origin_channels: 48 +backbone.layer1.0.conv.0.conv.mutable_out_channels: + current_choice: 8 + origin_channels: 48 +backbone.layer1.0.conv.1.bn.mutable_num_features: + current_choice: 8 + origin_channels: 24 +backbone.layer1.0.conv.1.conv.mutable_in_channels: + current_choice: 8 + origin_channels: 48 +backbone.layer1.0.conv.1.conv.mutable_out_channels: + current_choice: 8 + origin_channels: 24 +backbone.layer2.0.conv.0.bn.mutable_num_features: + current_choice: 96 + origin_channels: 144 +backbone.layer2.0.conv.0.conv.mutable_in_channels: + current_choice: 8 + origin_channels: 24 +backbone.layer2.0.conv.0.conv.mutable_out_channels: + current_choice: 96 + origin_channels: 144 +backbone.layer2.0.conv.1.bn.mutable_num_features: + current_choice: 96 + origin_channels: 144 +backbone.layer2.0.conv.1.conv.mutable_in_channels: + current_choice: 96 + origin_channels: 144 +backbone.layer2.0.conv.1.conv.mutable_out_channels: + current_choice: 96 + origin_channels: 144 +backbone.layer2.0.conv.2.bn.mutable_num_features: + current_choice: 16 + origin_channels: 40 +backbone.layer2.0.conv.2.conv.mutable_in_channels: + current_choice: 
96 + origin_channels: 144 +backbone.layer2.0.conv.2.conv.mutable_out_channels: + current_choice: 16 + origin_channels: 40 +backbone.layer2.1.conv.0.bn.mutable_num_features: + current_choice: 96 + origin_channels: 240 +backbone.layer2.1.conv.0.conv.mutable_in_channels: + current_choice: 16 + origin_channels: 40 +backbone.layer2.1.conv.0.conv.mutable_out_channels: + current_choice: 96 + origin_channels: 240 +backbone.layer2.1.conv.1.bn.mutable_num_features: + current_choice: 96 + origin_channels: 240 +backbone.layer2.1.conv.1.conv.mutable_in_channels: + current_choice: 96 + origin_channels: 240 +backbone.layer2.1.conv.1.conv.mutable_out_channels: + current_choice: 96 + origin_channels: 240 +backbone.layer2.1.conv.2.bn.mutable_num_features: + current_choice: 16 + origin_channels: 40 +backbone.layer2.1.conv.2.conv.mutable_in_channels: + current_choice: 96 + origin_channels: 240 +backbone.layer2.1.conv.2.conv.mutable_out_channels: + current_choice: 16 + origin_channels: 40 +backbone.layer3.0.conv.0.bn.mutable_num_features: + current_choice: 96 + origin_channels: 240 +backbone.layer3.0.conv.0.conv.mutable_in_channels: + current_choice: 16 + origin_channels: 40 +backbone.layer3.0.conv.0.conv.mutable_out_channels: + current_choice: 96 + origin_channels: 240 +backbone.layer3.0.conv.1.bn.mutable_num_features: + current_choice: 96 + origin_channels: 240 +backbone.layer3.0.conv.1.conv.mutable_in_channels: + current_choice: 96 + origin_channels: 240 +backbone.layer3.0.conv.1.conv.mutable_out_channels: + current_choice: 96 + origin_channels: 240 +backbone.layer3.0.conv.2.bn.mutable_num_features: + current_choice: 24 + origin_channels: 48 +backbone.layer3.0.conv.2.conv.mutable_in_channels: + current_choice: 96 + origin_channels: 240 +backbone.layer3.0.conv.2.conv.mutable_out_channels: + current_choice: 24 + origin_channels: 48 +backbone.layer3.1.conv.0.bn.mutable_num_features: + current_choice: 144 + origin_channels: 288 +backbone.layer3.1.conv.0.conv.mutable_in_channels: + 
current_choice: 24 + origin_channels: 48 +backbone.layer3.1.conv.0.conv.mutable_out_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer3.1.conv.1.bn.mutable_num_features: + current_choice: 144 + origin_channels: 288 +backbone.layer3.1.conv.1.conv.mutable_in_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer3.1.conv.1.conv.mutable_out_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer3.1.conv.2.bn.mutable_num_features: + current_choice: 24 + origin_channels: 48 +backbone.layer3.1.conv.2.conv.mutable_in_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer3.1.conv.2.conv.mutable_out_channels: + current_choice: 24 + origin_channels: 48 +backbone.layer3.2.conv.0.bn.mutable_num_features: + current_choice: 144 + origin_channels: 288 +backbone.layer3.2.conv.0.conv.mutable_in_channels: + current_choice: 24 + origin_channels: 48 +backbone.layer3.2.conv.0.conv.mutable_out_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer3.2.conv.1.bn.mutable_num_features: + current_choice: 144 + origin_channels: 288 +backbone.layer3.2.conv.1.conv.mutable_in_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer3.2.conv.1.conv.mutable_out_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer3.2.conv.2.bn.mutable_num_features: + current_choice: 24 + origin_channels: 48 +backbone.layer3.2.conv.2.conv.mutable_in_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer3.2.conv.2.conv.mutable_out_channels: + current_choice: 24 + origin_channels: 48 +backbone.layer4.0.conv.0.bn.mutable_num_features: + current_choice: 144 + origin_channels: 288 +backbone.layer4.0.conv.0.conv.mutable_in_channels: + current_choice: 24 + origin_channels: 48 +backbone.layer4.0.conv.0.conv.mutable_out_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer4.0.conv.1.bn.mutable_num_features: + current_choice: 144 + origin_channels: 288 
+backbone.layer4.0.conv.1.conv.mutable_in_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer4.0.conv.1.conv.mutable_out_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer4.0.conv.2.bn.mutable_num_features: + current_choice: 56 + origin_channels: 96 +backbone.layer4.0.conv.2.conv.mutable_in_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer4.0.conv.2.conv.mutable_out_channels: + current_choice: 56 + origin_channels: 96 +backbone.layer4.1.conv.0.bn.mutable_num_features: + current_choice: 288 + origin_channels: 576 +backbone.layer4.1.conv.0.conv.mutable_in_channels: + current_choice: 56 + origin_channels: 96 +backbone.layer4.1.conv.0.conv.mutable_out_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.1.conv.1.bn.mutable_num_features: + current_choice: 288 + origin_channels: 576 +backbone.layer4.1.conv.1.conv.mutable_in_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.1.conv.1.conv.mutable_out_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.1.conv.2.bn.mutable_num_features: + current_choice: 56 + origin_channels: 96 +backbone.layer4.1.conv.2.conv.mutable_in_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.1.conv.2.conv.mutable_out_channels: + current_choice: 56 + origin_channels: 96 +backbone.layer4.2.conv.0.bn.mutable_num_features: + current_choice: 288 + origin_channels: 576 +backbone.layer4.2.conv.0.conv.mutable_in_channels: + current_choice: 56 + origin_channels: 96 +backbone.layer4.2.conv.0.conv.mutable_out_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.2.conv.1.bn.mutable_num_features: + current_choice: 288 + origin_channels: 576 +backbone.layer4.2.conv.1.conv.mutable_in_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.2.conv.1.conv.mutable_out_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.2.conv.2.bn.mutable_num_features: + current_choice: 
56 + origin_channels: 96 +backbone.layer4.2.conv.2.conv.mutable_in_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.2.conv.2.conv.mutable_out_channels: + current_choice: 56 + origin_channels: 96 +backbone.layer4.3.conv.0.bn.mutable_num_features: + current_choice: 288 + origin_channels: 576 +backbone.layer4.3.conv.0.conv.mutable_in_channels: + current_choice: 56 + origin_channels: 96 +backbone.layer4.3.conv.0.conv.mutable_out_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.3.conv.1.bn.mutable_num_features: + current_choice: 288 + origin_channels: 576 +backbone.layer4.3.conv.1.conv.mutable_in_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.3.conv.1.conv.mutable_out_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.3.conv.2.bn.mutable_num_features: + current_choice: 56 + origin_channels: 96 +backbone.layer4.3.conv.2.conv.mutable_in_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.3.conv.2.conv.mutable_out_channels: + current_choice: 56 + origin_channels: 96 +backbone.layer5.0.conv.0.bn.mutable_num_features: + current_choice: 288 + origin_channels: 576 +backbone.layer5.0.conv.0.conv.mutable_in_channels: + current_choice: 56 + origin_channels: 96 +backbone.layer5.0.conv.0.conv.mutable_out_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer5.0.conv.1.bn.mutable_num_features: + current_choice: 288 + origin_channels: 576 +backbone.layer5.0.conv.1.conv.mutable_in_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer5.0.conv.1.conv.mutable_out_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer5.0.conv.2.bn.mutable_num_features: + current_choice: 96 + origin_channels: 144 +backbone.layer5.0.conv.2.conv.mutable_in_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer5.0.conv.2.conv.mutable_out_channels: + current_choice: 96 + origin_channels: 144 
+backbone.layer5.1.conv.0.bn.mutable_num_features: + current_choice: 432 + origin_channels: 864 +backbone.layer5.1.conv.0.conv.mutable_in_channels: + current_choice: 96 + origin_channels: 144 +backbone.layer5.1.conv.0.conv.mutable_out_channels: + current_choice: 432 + origin_channels: 864 +backbone.layer5.1.conv.1.bn.mutable_num_features: + current_choice: 432 + origin_channels: 864 +backbone.layer5.1.conv.1.conv.mutable_in_channels: + current_choice: 432 + origin_channels: 864 +backbone.layer5.1.conv.1.conv.mutable_out_channels: + current_choice: 432 + origin_channels: 864 +backbone.layer5.1.conv.2.bn.mutable_num_features: + current_choice: 96 + origin_channels: 144 +backbone.layer5.1.conv.2.conv.mutable_in_channels: + current_choice: 432 + origin_channels: 864 +backbone.layer5.1.conv.2.conv.mutable_out_channels: + current_choice: 96 + origin_channels: 144 +backbone.layer5.2.conv.0.bn.mutable_num_features: + current_choice: 432 + origin_channels: 864 +backbone.layer5.2.conv.0.conv.mutable_in_channels: + current_choice: 96 + origin_channels: 144 +backbone.layer5.2.conv.0.conv.mutable_out_channels: + current_choice: 432 + origin_channels: 864 +backbone.layer5.2.conv.1.bn.mutable_num_features: + current_choice: 432 + origin_channels: 864 +backbone.layer5.2.conv.1.conv.mutable_in_channels: + current_choice: 432 + origin_channels: 864 +backbone.layer5.2.conv.1.conv.mutable_out_channels: + current_choice: 432 + origin_channels: 864 +backbone.layer5.2.conv.2.bn.mutable_num_features: + current_choice: 96 + origin_channels: 144 +backbone.layer5.2.conv.2.conv.mutable_in_channels: + current_choice: 432 + origin_channels: 864 +backbone.layer5.2.conv.2.conv.mutable_out_channels: + current_choice: 96 + origin_channels: 144 +backbone.layer6.0.conv.0.bn.mutable_num_features: + current_choice: 864 + origin_channels: 864 +backbone.layer6.0.conv.0.conv.mutable_in_channels: + current_choice: 96 + origin_channels: 144 +backbone.layer6.0.conv.0.conv.mutable_out_channels: + 
current_choice: 864 + origin_channels: 864 +backbone.layer6.0.conv.1.bn.mutable_num_features: + current_choice: 864 + origin_channels: 864 +backbone.layer6.0.conv.1.conv.mutable_in_channels: + current_choice: 864 + origin_channels: 864 +backbone.layer6.0.conv.1.conv.mutable_out_channels: + current_choice: 864 + origin_channels: 864 +backbone.layer6.0.conv.2.bn.mutable_num_features: + current_choice: 240 + origin_channels: 240 +backbone.layer6.0.conv.2.conv.mutable_in_channels: + current_choice: 864 + origin_channels: 864 +backbone.layer6.0.conv.2.conv.mutable_out_channels: + current_choice: 240 + origin_channels: 240 +backbone.layer6.1.conv.0.bn.mutable_num_features: + current_choice: 1440 + origin_channels: 1440 +backbone.layer6.1.conv.0.conv.mutable_in_channels: + current_choice: 240 + origin_channels: 240 +backbone.layer6.1.conv.0.conv.mutable_out_channels: + current_choice: 1440 + origin_channels: 1440 +backbone.layer6.1.conv.1.bn.mutable_num_features: + current_choice: 1440 + origin_channels: 1440 +backbone.layer6.1.conv.1.conv.mutable_in_channels: + current_choice: 1440 + origin_channels: 1440 +backbone.layer6.1.conv.1.conv.mutable_out_channels: + current_choice: 1440 + origin_channels: 1440 +backbone.layer6.1.conv.2.bn.mutable_num_features: + current_choice: 240 + origin_channels: 240 +backbone.layer6.1.conv.2.conv.mutable_in_channels: + current_choice: 1440 + origin_channels: 1440 +backbone.layer6.1.conv.2.conv.mutable_out_channels: + current_choice: 240 + origin_channels: 240 +backbone.layer6.2.conv.0.bn.mutable_num_features: + current_choice: 960 + origin_channels: 1440 +backbone.layer6.2.conv.0.conv.mutable_in_channels: + current_choice: 240 + origin_channels: 240 +backbone.layer6.2.conv.0.conv.mutable_out_channels: + current_choice: 960 + origin_channels: 1440 +backbone.layer6.2.conv.1.bn.mutable_num_features: + current_choice: 960 + origin_channels: 1440 +backbone.layer6.2.conv.1.conv.mutable_in_channels: + current_choice: 960 + origin_channels: 1440 
+backbone.layer6.2.conv.1.conv.mutable_out_channels: + current_choice: 960 + origin_channels: 1440 +backbone.layer6.2.conv.2.bn.mutable_num_features: + current_choice: 240 + origin_channels: 240 +backbone.layer6.2.conv.2.conv.mutable_in_channels: + current_choice: 960 + origin_channels: 1440 +backbone.layer6.2.conv.2.conv.mutable_out_channels: + current_choice: 240 + origin_channels: 240 +backbone.layer7.0.conv.0.bn.mutable_num_features: + current_choice: 1440 + origin_channels: 1440 +backbone.layer7.0.conv.0.conv.mutable_in_channels: + current_choice: 240 + origin_channels: 240 +backbone.layer7.0.conv.0.conv.mutable_out_channels: + current_choice: 1440 + origin_channels: 1440 +backbone.layer7.0.conv.1.bn.mutable_num_features: + current_choice: 1440 + origin_channels: 1440 +backbone.layer7.0.conv.1.conv.mutable_in_channels: + current_choice: 1440 + origin_channels: 1440 +backbone.layer7.0.conv.1.conv.mutable_out_channels: + current_choice: 1440 + origin_channels: 1440 +backbone.layer7.0.conv.2.bn.mutable_num_features: + current_choice: 480 + origin_channels: 480 +backbone.layer7.0.conv.2.conv.mutable_in_channels: + current_choice: 1440 + origin_channels: 1440 +backbone.layer7.0.conv.2.conv.mutable_out_channels: + current_choice: 480 + origin_channels: 480 +head.fc.mutable_in_features: + current_choice: 1920 + origin_channels: 1920 +head.fc.mutable_out_features: + current_choice: 1000 + origin_channels: 1000 diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/data/MBV2_530M.yaml b/cv/distiller/CWD/pytorch/mmrazor/tests/data/MBV2_530M.yaml new file mode 100644 index 0000000000000000000000000000000000000000..19bf6fccfc679b1bf931cc3b5c1a992b22c58792 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/data/MBV2_530M.yaml @@ -0,0 +1,474 @@ +backbone.conv1.bn.mutable_num_features: + current_choice: 32 + origin_channels: 48 +backbone.conv1.conv.mutable_in_channels: + current_choice: 3 + origin_channels: 3 +backbone.conv1.conv.mutable_out_channels: + current_choice: 
32 + origin_channels: 48 +backbone.conv2.bn.mutable_num_features: + current_choice: 1920 + origin_channels: 1920 +backbone.conv2.conv.mutable_in_channels: + current_choice: 480 + origin_channels: 480 +backbone.conv2.conv.mutable_out_channels: + current_choice: 1920 + origin_channels: 1920 +backbone.layer1.0.conv.0.bn.mutable_num_features: + current_choice: 32 + origin_channels: 48 +backbone.layer1.0.conv.0.conv.mutable_in_channels: + current_choice: 32 + origin_channels: 48 +backbone.layer1.0.conv.0.conv.mutable_out_channels: + current_choice: 32 + origin_channels: 48 +backbone.layer1.0.conv.1.bn.mutable_num_features: + current_choice: 16 + origin_channels: 24 +backbone.layer1.0.conv.1.conv.mutable_in_channels: + current_choice: 32 + origin_channels: 48 +backbone.layer1.0.conv.1.conv.mutable_out_channels: + current_choice: 16 + origin_channels: 24 +backbone.layer2.0.conv.0.bn.mutable_num_features: + current_choice: 144 + origin_channels: 144 +backbone.layer2.0.conv.0.conv.mutable_in_channels: + current_choice: 16 + origin_channels: 24 +backbone.layer2.0.conv.0.conv.mutable_out_channels: + current_choice: 144 + origin_channels: 144 +backbone.layer2.0.conv.1.bn.mutable_num_features: + current_choice: 144 + origin_channels: 144 +backbone.layer2.0.conv.1.conv.mutable_in_channels: + current_choice: 144 + origin_channels: 144 +backbone.layer2.0.conv.1.conv.mutable_out_channels: + current_choice: 144 + origin_channels: 144 +backbone.layer2.0.conv.2.bn.mutable_num_features: + current_choice: 24 + origin_channels: 40 +backbone.layer2.0.conv.2.conv.mutable_in_channels: + current_choice: 144 + origin_channels: 144 +backbone.layer2.0.conv.2.conv.mutable_out_channels: + current_choice: 24 + origin_channels: 40 +backbone.layer2.1.conv.0.bn.mutable_num_features: + current_choice: 176 + origin_channels: 240 +backbone.layer2.1.conv.0.conv.mutable_in_channels: + current_choice: 24 + origin_channels: 40 +backbone.layer2.1.conv.0.conv.mutable_out_channels: + current_choice: 176 + 
origin_channels: 240 +backbone.layer2.1.conv.1.bn.mutable_num_features: + current_choice: 176 + origin_channels: 240 +backbone.layer2.1.conv.1.conv.mutable_in_channels: + current_choice: 176 + origin_channels: 240 +backbone.layer2.1.conv.1.conv.mutable_out_channels: + current_choice: 176 + origin_channels: 240 +backbone.layer2.1.conv.2.bn.mutable_num_features: + current_choice: 24 + origin_channels: 40 +backbone.layer2.1.conv.2.conv.mutable_in_channels: + current_choice: 176 + origin_channels: 240 +backbone.layer2.1.conv.2.conv.mutable_out_channels: + current_choice: 24 + origin_channels: 40 +backbone.layer3.0.conv.0.bn.mutable_num_features: + current_choice: 192 + origin_channels: 240 +backbone.layer3.0.conv.0.conv.mutable_in_channels: + current_choice: 24 + origin_channels: 40 +backbone.layer3.0.conv.0.conv.mutable_out_channels: + current_choice: 192 + origin_channels: 240 +backbone.layer3.0.conv.1.bn.mutable_num_features: + current_choice: 192 + origin_channels: 240 +backbone.layer3.0.conv.1.conv.mutable_in_channels: + current_choice: 192 + origin_channels: 240 +backbone.layer3.0.conv.1.conv.mutable_out_channels: + current_choice: 192 + origin_channels: 240 +backbone.layer3.0.conv.2.bn.mutable_num_features: + current_choice: 48 + origin_channels: 48 +backbone.layer3.0.conv.2.conv.mutable_in_channels: + current_choice: 192 + origin_channels: 240 +backbone.layer3.0.conv.2.conv.mutable_out_channels: + current_choice: 48 + origin_channels: 48 +backbone.layer3.1.conv.0.bn.mutable_num_features: + current_choice: 240 + origin_channels: 288 +backbone.layer3.1.conv.0.conv.mutable_in_channels: + current_choice: 48 + origin_channels: 48 +backbone.layer3.1.conv.0.conv.mutable_out_channels: + current_choice: 240 + origin_channels: 288 +backbone.layer3.1.conv.1.bn.mutable_num_features: + current_choice: 240 + origin_channels: 288 +backbone.layer3.1.conv.1.conv.mutable_in_channels: + current_choice: 240 + origin_channels: 288 
+backbone.layer3.1.conv.1.conv.mutable_out_channels: + current_choice: 240 + origin_channels: 288 +backbone.layer3.1.conv.2.bn.mutable_num_features: + current_choice: 48 + origin_channels: 48 +backbone.layer3.1.conv.2.conv.mutable_in_channels: + current_choice: 240 + origin_channels: 288 +backbone.layer3.1.conv.2.conv.mutable_out_channels: + current_choice: 48 + origin_channels: 48 +backbone.layer3.2.conv.0.bn.mutable_num_features: + current_choice: 144 + origin_channels: 288 +backbone.layer3.2.conv.0.conv.mutable_in_channels: + current_choice: 48 + origin_channels: 48 +backbone.layer3.2.conv.0.conv.mutable_out_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer3.2.conv.1.bn.mutable_num_features: + current_choice: 144 + origin_channels: 288 +backbone.layer3.2.conv.1.conv.mutable_in_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer3.2.conv.1.conv.mutable_out_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer3.2.conv.2.bn.mutable_num_features: + current_choice: 48 + origin_channels: 48 +backbone.layer3.2.conv.2.conv.mutable_in_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer3.2.conv.2.conv.mutable_out_channels: + current_choice: 48 + origin_channels: 48 +backbone.layer4.0.conv.0.bn.mutable_num_features: + current_choice: 264 + origin_channels: 288 +backbone.layer4.0.conv.0.conv.mutable_in_channels: + current_choice: 48 + origin_channels: 48 +backbone.layer4.0.conv.0.conv.mutable_out_channels: + current_choice: 264 + origin_channels: 288 +backbone.layer4.0.conv.1.bn.mutable_num_features: + current_choice: 264 + origin_channels: 288 +backbone.layer4.0.conv.1.conv.mutable_in_channels: + current_choice: 264 + origin_channels: 288 +backbone.layer4.0.conv.1.conv.mutable_out_channels: + current_choice: 264 + origin_channels: 288 +backbone.layer4.0.conv.2.bn.mutable_num_features: + current_choice: 88 + origin_channels: 96 +backbone.layer4.0.conv.2.conv.mutable_in_channels: + current_choice: 264 
+ origin_channels: 288 +backbone.layer4.0.conv.2.conv.mutable_out_channels: + current_choice: 88 + origin_channels: 96 +backbone.layer4.1.conv.0.bn.mutable_num_features: + current_choice: 288 + origin_channels: 576 +backbone.layer4.1.conv.0.conv.mutable_in_channels: + current_choice: 88 + origin_channels: 96 +backbone.layer4.1.conv.0.conv.mutable_out_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.1.conv.1.bn.mutable_num_features: + current_choice: 288 + origin_channels: 576 +backbone.layer4.1.conv.1.conv.mutable_in_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.1.conv.1.conv.mutable_out_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.1.conv.2.bn.mutable_num_features: + current_choice: 88 + origin_channels: 96 +backbone.layer4.1.conv.2.conv.mutable_in_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.1.conv.2.conv.mutable_out_channels: + current_choice: 88 + origin_channels: 96 +backbone.layer4.2.conv.0.bn.mutable_num_features: + current_choice: 336 + origin_channels: 576 +backbone.layer4.2.conv.0.conv.mutable_in_channels: + current_choice: 88 + origin_channels: 96 +backbone.layer4.2.conv.0.conv.mutable_out_channels: + current_choice: 336 + origin_channels: 576 +backbone.layer4.2.conv.1.bn.mutable_num_features: + current_choice: 336 + origin_channels: 576 +backbone.layer4.2.conv.1.conv.mutable_in_channels: + current_choice: 336 + origin_channels: 576 +backbone.layer4.2.conv.1.conv.mutable_out_channels: + current_choice: 336 + origin_channels: 576 +backbone.layer4.2.conv.2.bn.mutable_num_features: + current_choice: 88 + origin_channels: 96 +backbone.layer4.2.conv.2.conv.mutable_in_channels: + current_choice: 336 + origin_channels: 576 +backbone.layer4.2.conv.2.conv.mutable_out_channels: + current_choice: 88 + origin_channels: 96 +backbone.layer4.3.conv.0.bn.mutable_num_features: + current_choice: 432 + origin_channels: 576 
+backbone.layer4.3.conv.0.conv.mutable_in_channels: + current_choice: 88 + origin_channels: 96 +backbone.layer4.3.conv.0.conv.mutable_out_channels: + current_choice: 432 + origin_channels: 576 +backbone.layer4.3.conv.1.bn.mutable_num_features: + current_choice: 432 + origin_channels: 576 +backbone.layer4.3.conv.1.conv.mutable_in_channels: + current_choice: 432 + origin_channels: 576 +backbone.layer4.3.conv.1.conv.mutable_out_channels: + current_choice: 432 + origin_channels: 576 +backbone.layer4.3.conv.2.bn.mutable_num_features: + current_choice: 88 + origin_channels: 96 +backbone.layer4.3.conv.2.conv.mutable_in_channels: + current_choice: 432 + origin_channels: 576 +backbone.layer4.3.conv.2.conv.mutable_out_channels: + current_choice: 88 + origin_channels: 96 +backbone.layer5.0.conv.0.bn.mutable_num_features: + current_choice: 576 + origin_channels: 576 +backbone.layer5.0.conv.0.conv.mutable_in_channels: + current_choice: 88 + origin_channels: 96 +backbone.layer5.0.conv.0.conv.mutable_out_channels: + current_choice: 576 + origin_channels: 576 +backbone.layer5.0.conv.1.bn.mutable_num_features: + current_choice: 576 + origin_channels: 576 +backbone.layer5.0.conv.1.conv.mutable_in_channels: + current_choice: 576 + origin_channels: 576 +backbone.layer5.0.conv.1.conv.mutable_out_channels: + current_choice: 576 + origin_channels: 576 +backbone.layer5.0.conv.2.bn.mutable_num_features: + current_choice: 144 + origin_channels: 144 +backbone.layer5.0.conv.2.conv.mutable_in_channels: + current_choice: 576 + origin_channels: 576 +backbone.layer5.0.conv.2.conv.mutable_out_channels: + current_choice: 144 + origin_channels: 144 +backbone.layer5.1.conv.0.bn.mutable_num_features: + current_choice: 576 + origin_channels: 864 +backbone.layer5.1.conv.0.conv.mutable_in_channels: + current_choice: 144 + origin_channels: 144 +backbone.layer5.1.conv.0.conv.mutable_out_channels: + current_choice: 576 + origin_channels: 864 +backbone.layer5.1.conv.1.bn.mutable_num_features: + 
current_choice: 576 + origin_channels: 864 +backbone.layer5.1.conv.1.conv.mutable_in_channels: + current_choice: 576 + origin_channels: 864 +backbone.layer5.1.conv.1.conv.mutable_out_channels: + current_choice: 576 + origin_channels: 864 +backbone.layer5.1.conv.2.bn.mutable_num_features: + current_choice: 144 + origin_channels: 144 +backbone.layer5.1.conv.2.conv.mutable_in_channels: + current_choice: 576 + origin_channels: 864 +backbone.layer5.1.conv.2.conv.mutable_out_channels: + current_choice: 144 + origin_channels: 144 +backbone.layer5.2.conv.0.bn.mutable_num_features: + current_choice: 648 + origin_channels: 864 +backbone.layer5.2.conv.0.conv.mutable_in_channels: + current_choice: 144 + origin_channels: 144 +backbone.layer5.2.conv.0.conv.mutable_out_channels: + current_choice: 648 + origin_channels: 864 +backbone.layer5.2.conv.1.bn.mutable_num_features: + current_choice: 648 + origin_channels: 864 +backbone.layer5.2.conv.1.conv.mutable_in_channels: + current_choice: 648 + origin_channels: 864 +backbone.layer5.2.conv.1.conv.mutable_out_channels: + current_choice: 648 + origin_channels: 864 +backbone.layer5.2.conv.2.bn.mutable_num_features: + current_choice: 144 + origin_channels: 144 +backbone.layer5.2.conv.2.conv.mutable_in_channels: + current_choice: 648 + origin_channels: 864 +backbone.layer5.2.conv.2.conv.mutable_out_channels: + current_choice: 144 + origin_channels: 144 +backbone.layer6.0.conv.0.bn.mutable_num_features: + current_choice: 864 + origin_channels: 864 +backbone.layer6.0.conv.0.conv.mutable_in_channels: + current_choice: 144 + origin_channels: 144 +backbone.layer6.0.conv.0.conv.mutable_out_channels: + current_choice: 864 + origin_channels: 864 +backbone.layer6.0.conv.1.bn.mutable_num_features: + current_choice: 864 + origin_channels: 864 +backbone.layer6.0.conv.1.conv.mutable_in_channels: + current_choice: 864 + origin_channels: 864 +backbone.layer6.0.conv.1.conv.mutable_out_channels: + current_choice: 864 + origin_channels: 864 
+backbone.layer6.0.conv.2.bn.mutable_num_features: + current_choice: 240 + origin_channels: 240 +backbone.layer6.0.conv.2.conv.mutable_in_channels: + current_choice: 864 + origin_channels: 864 +backbone.layer6.0.conv.2.conv.mutable_out_channels: + current_choice: 240 + origin_channels: 240 +backbone.layer6.1.conv.0.bn.mutable_num_features: + current_choice: 1440 + origin_channels: 1440 +backbone.layer6.1.conv.0.conv.mutable_in_channels: + current_choice: 240 + origin_channels: 240 +backbone.layer6.1.conv.0.conv.mutable_out_channels: + current_choice: 1440 + origin_channels: 1440 +backbone.layer6.1.conv.1.bn.mutable_num_features: + current_choice: 1440 + origin_channels: 1440 +backbone.layer6.1.conv.1.conv.mutable_in_channels: + current_choice: 1440 + origin_channels: 1440 +backbone.layer6.1.conv.1.conv.mutable_out_channels: + current_choice: 1440 + origin_channels: 1440 +backbone.layer6.1.conv.2.bn.mutable_num_features: + current_choice: 240 + origin_channels: 240 +backbone.layer6.1.conv.2.conv.mutable_in_channels: + current_choice: 1440 + origin_channels: 1440 +backbone.layer6.1.conv.2.conv.mutable_out_channels: + current_choice: 240 + origin_channels: 240 +backbone.layer6.2.conv.0.bn.mutable_num_features: + current_choice: 1440 + origin_channels: 1440 +backbone.layer6.2.conv.0.conv.mutable_in_channels: + current_choice: 240 + origin_channels: 240 +backbone.layer6.2.conv.0.conv.mutable_out_channels: + current_choice: 1440 + origin_channels: 1440 +backbone.layer6.2.conv.1.bn.mutable_num_features: + current_choice: 1440 + origin_channels: 1440 +backbone.layer6.2.conv.1.conv.mutable_in_channels: + current_choice: 1440 + origin_channels: 1440 +backbone.layer6.2.conv.1.conv.mutable_out_channels: + current_choice: 1440 + origin_channels: 1440 +backbone.layer6.2.conv.2.bn.mutable_num_features: + current_choice: 240 + origin_channels: 240 +backbone.layer6.2.conv.2.conv.mutable_in_channels: + current_choice: 1440 + origin_channels: 1440 
+backbone.layer6.2.conv.2.conv.mutable_out_channels: + current_choice: 240 + origin_channels: 240 +backbone.layer7.0.conv.0.bn.mutable_num_features: + current_choice: 1440 + origin_channels: 1440 +backbone.layer7.0.conv.0.conv.mutable_in_channels: + current_choice: 240 + origin_channels: 240 +backbone.layer7.0.conv.0.conv.mutable_out_channels: + current_choice: 1440 + origin_channels: 1440 +backbone.layer7.0.conv.1.bn.mutable_num_features: + current_choice: 1440 + origin_channels: 1440 +backbone.layer7.0.conv.1.conv.mutable_in_channels: + current_choice: 1440 + origin_channels: 1440 +backbone.layer7.0.conv.1.conv.mutable_out_channels: + current_choice: 1440 + origin_channels: 1440 +backbone.layer7.0.conv.2.bn.mutable_num_features: + current_choice: 480 + origin_channels: 480 +backbone.layer7.0.conv.2.conv.mutable_in_channels: + current_choice: 1440 + origin_channels: 1440 +backbone.layer7.0.conv.2.conv.mutable_out_channels: + current_choice: 480 + origin_channels: 480 +head.fc.mutable_in_features: + current_choice: 1920 + origin_channels: 1920 +head.fc.mutable_out_features: + current_choice: 1000 + origin_channels: 1000 diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/data/MBV2_slimmable_channel_config.json b/cv/distiller/CWD/pytorch/mmrazor/tests/data/MBV2_slimmable_channel_config.json new file mode 100644 index 0000000000000000000000000000000000000000..4b9e421f39562e83da76b26ebd7a6e9cefe39cb0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/data/MBV2_slimmable_channel_config.json @@ -0,0 +1,362 @@ +{ + "backbone.conv1.conv_(0, 48)_48": { + "init_args": { + "num_channels": 48, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 8, + 8, + 32 + ], + "choice_mode": "number" + }, + "choice": 32 + }, + "backbone.layer1.0.conv.1.conv_(0, 24)_24": { + "init_args": { + "num_channels": 24, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 8, + 8, + 16 + ], + "choice_mode": "number" + }, + "choice": 16 
+ }, + "backbone.layer2.0.conv.0.conv_(0, 144)_144": { + "init_args": { + "num_channels": 144, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 96, + 96, + 144 + ], + "choice_mode": "number" + }, + "choice": 144 + }, + "backbone.layer2.0.conv.2.conv_(0, 40)_40": { + "init_args": { + "num_channels": 40, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 16, + 16, + 24 + ], + "choice_mode": "number" + }, + "choice": 24 + }, + "backbone.layer2.1.conv.0.conv_(0, 240)_240": { + "init_args": { + "num_channels": 240, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 96, + 96, + 176 + ], + "choice_mode": "number" + }, + "choice": 176 + }, + "backbone.layer3.0.conv.0.conv_(0, 240)_240": { + "init_args": { + "num_channels": 240, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 96, + 96, + 192 + ], + "choice_mode": "number" + }, + "choice": 192 + }, + "backbone.layer3.0.conv.2.conv_(0, 48)_48": { + "init_args": { + "num_channels": 48, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 24, + 24, + 48 + ], + "choice_mode": "number" + }, + "choice": 48 + }, + "backbone.layer3.1.conv.0.conv_(0, 288)_288": { + "init_args": { + "num_channels": 288, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 144, + 144, + 240 + ], + "choice_mode": "number" + }, + "choice": 240 + }, + "backbone.layer3.2.conv.0.conv_(0, 288)_288": { + "init_args": { + "num_channels": 288, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 144, + 144, + 144 + ], + "choice_mode": "number" + }, + "choice": 144 + }, + "backbone.layer4.0.conv.0.conv_(0, 288)_288": { + "init_args": { + "num_channels": 288, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 144, + 144, + 264 + ], + "choice_mode": "number" + }, + "choice": 264 + }, + "backbone.layer4.0.conv.2.conv_(0, 96)_96": 
{ + "init_args": { + "num_channels": 96, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 48, + 56, + 88 + ], + "choice_mode": "number" + }, + "choice": 88 + }, + "backbone.layer4.1.conv.0.conv_(0, 576)_576": { + "init_args": { + "num_channels": 576, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 288, + 288, + 288 + ], + "choice_mode": "number" + }, + "choice": 288 + }, + "backbone.layer4.2.conv.0.conv_(0, 576)_576": { + "init_args": { + "num_channels": 576, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 288, + 288, + 336 + ], + "choice_mode": "number" + }, + "choice": 336 + }, + "backbone.layer4.3.conv.0.conv_(0, 576)_576": { + "init_args": { + "num_channels": 576, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 288, + 288, + 432 + ], + "choice_mode": "number" + }, + "choice": 432 + }, + "backbone.layer5.0.conv.0.conv_(0, 576)_576": { + "init_args": { + "num_channels": 576, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 288, + 288, + 576 + ], + "choice_mode": "number" + }, + "choice": 576 + }, + "backbone.layer5.0.conv.2.conv_(0, 144)_144": { + "init_args": { + "num_channels": 144, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 64, + 96, + 144 + ], + "choice_mode": "number" + }, + "choice": 144 + }, + "backbone.layer5.1.conv.0.conv_(0, 864)_864": { + "init_args": { + "num_channels": 864, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 432, + 432, + 576 + ], + "choice_mode": "number" + }, + "choice": 576 + }, + "backbone.layer5.2.conv.0.conv_(0, 864)_864": { + "init_args": { + "num_channels": 864, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 432, + 432, + 648 + ], + "choice_mode": "number" + }, + "choice": 648 + }, + "backbone.layer6.0.conv.0.conv_(0, 864)_864": { + "init_args": { + "num_channels": 
864, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 648, + 864, + 864 + ], + "choice_mode": "number" + }, + "choice": 864 + }, + "backbone.layer6.0.conv.2.conv_(0, 240)_240": { + "init_args": { + "num_channels": 240, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 176, + 240, + 240 + ], + "choice_mode": "number" + }, + "choice": 240 + }, + "backbone.layer6.1.conv.0.conv_(0, 1440)_1440": { + "init_args": { + "num_channels": 1440, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 720, + 1440, + 1440 + ], + "choice_mode": "number" + }, + "choice": 1440 + }, + "backbone.layer6.2.conv.0.conv_(0, 1440)_1440": { + "init_args": { + "num_channels": 1440, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 720, + 960, + 1440 + ], + "choice_mode": "number" + }, + "choice": 1440 + }, + "backbone.layer7.0.conv.0.conv_(0, 1440)_1440": { + "init_args": { + "num_channels": 1440, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 1440, + 1440, + 1440 + ], + "choice_mode": "number" + }, + "choice": 1440 + }, + "backbone.layer7.0.conv.2.conv_(0, 480)_480": { + "init_args": { + "num_channels": 480, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 280, + 480, + 480 + ], + "choice_mode": "number" + }, + "choice": 480 + } +} \ No newline at end of file diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/data/MBV2_slimmable_config.json b/cv/distiller/CWD/pytorch/mmrazor/tests/data/MBV2_slimmable_config.json new file mode 100644 index 0000000000000000000000000000000000000000..9010b83e212dd52885cef96524aef4ad681d39c8 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/data/MBV2_slimmable_config.json @@ -0,0 +1,377 @@ +{ + "backbone.conv1.conv_(0, 48)_48": { + "init_args": { + "num_channels": 48, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 8, + 8, + 32 + ], + 
"choice_mode": "number" + }, + "choice": 32 + }, + "backbone.layer1.0.conv.1.conv_(0, 24)_24": { + "init_args": { + "num_channels": 24, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 8, + 8, + 16 + ], + "choice_mode": "number" + }, + "choice": 16 + }, + "backbone.layer2.0.conv.0.conv_(0, 144)_144": { + "init_args": { + "num_channels": 144, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 96, + 96, + 144 + ], + "choice_mode": "number" + }, + "choice": 144 + }, + "backbone.layer2.0.conv.2.conv_(0, 40)_40": { + "init_args": { + "num_channels": 40, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 16, + 16, + 24 + ], + "choice_mode": "number" + }, + "choice": 24 + }, + "backbone.layer2.1.conv.0.conv_(0, 240)_240": { + "init_args": { + "num_channels": 240, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 96, + 96, + 176 + ], + "choice_mode": "number" + }, + "choice": 176 + }, + "backbone.layer3.0.conv.0.conv_(0, 240)_240": { + "init_args": { + "num_channels": 240, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 96, + 96, + 192 + ], + "choice_mode": "number" + }, + "choice": 192 + }, + "backbone.layer3.0.conv.2.conv_(0, 48)_48": { + "init_args": { + "num_channels": 48, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 24, + 24, + 48 + ], + "choice_mode": "number" + }, + "choice": 48 + }, + "backbone.layer3.1.conv.0.conv_(0, 288)_288": { + "init_args": { + "num_channels": 288, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 144, + 144, + 240 + ], + "choice_mode": "number" + }, + "choice": 240 + }, + "backbone.layer3.2.conv.0.conv_(0, 288)_288": { + "init_args": { + "num_channels": 288, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 144, + 144, + 144 + ], + "choice_mode": "number" + }, + "choice": 144 + }, + 
"backbone.layer4.0.conv.0.conv_(0, 288)_288": { + "init_args": { + "num_channels": 288, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 144, + 144, + 264 + ], + "choice_mode": "number" + }, + "choice": 264 + }, + "backbone.layer4.0.conv.2.conv_(0, 96)_96": { + "init_args": { + "num_channels": 96, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 48, + 56, + 88 + ], + "choice_mode": "number" + }, + "choice": 88 + }, + "backbone.layer4.1.conv.0.conv_(0, 576)_576": { + "init_args": { + "num_channels": 576, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 288, + 288, + 288 + ], + "choice_mode": "number" + }, + "choice": 288 + }, + "backbone.layer4.2.conv.0.conv_(0, 576)_576": { + "init_args": { + "num_channels": 576, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 288, + 288, + 336 + ], + "choice_mode": "number" + }, + "choice": 336 + }, + "backbone.layer4.3.conv.0.conv_(0, 576)_576": { + "init_args": { + "num_channels": 576, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 288, + 288, + 432 + ], + "choice_mode": "number" + }, + "choice": 432 + }, + "backbone.layer5.0.conv.0.conv_(0, 576)_576": { + "init_args": { + "num_channels": 576, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 288, + 288, + 576 + ], + "choice_mode": "number" + }, + "choice": 576 + }, + "backbone.layer5.0.conv.2.conv_(0, 144)_144": { + "init_args": { + "num_channels": 144, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 64, + 96, + 144 + ], + "choice_mode": "number" + }, + "choice": 144 + }, + "backbone.layer5.1.conv.0.conv_(0, 864)_864": { + "init_args": { + "num_channels": 864, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 432, + 432, + 576 + ], + "choice_mode": "number" + }, + "choice": 576 + }, + "backbone.layer5.2.conv.0.conv_(0, 
864)_864": { + "init_args": { + "num_channels": 864, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 432, + 432, + 648 + ], + "choice_mode": "number" + }, + "choice": 648 + }, + "backbone.layer6.0.conv.0.conv_(0, 864)_864": { + "init_args": { + "num_channels": 864, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 648, + 864, + 864 + ], + "choice_mode": "number" + }, + "choice": 864 + }, + "backbone.layer6.0.conv.2.conv_(0, 240)_240": { + "init_args": { + "num_channels": 240, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 176, + 240, + 240 + ], + "choice_mode": "number" + }, + "choice": 240 + }, + "backbone.layer6.1.conv.0.conv_(0, 1440)_1440": { + "init_args": { + "num_channels": 1440, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 720, + 1440, + 1440 + ], + "choice_mode": "number" + }, + "choice": 1440 + }, + "backbone.layer6.2.conv.0.conv_(0, 1440)_1440": { + "init_args": { + "num_channels": 1440, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 720, + 960, + 1440 + ], + "choice_mode": "number" + }, + "choice": 1440 + }, + "backbone.layer7.0.conv.0.conv_(0, 1440)_1440": { + "init_args": { + "num_channels": 1440, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 1440, + 1440, + 1440 + ], + "choice_mode": "number" + }, + "choice": 1440 + }, + "backbone.layer7.0.conv.2.conv_(0, 480)_480": { + "init_args": { + "num_channels": 480, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 280, + 480, + 480 + ], + "choice_mode": "number" + }, + "choice": 480 + }, + "backbone.conv2.conv_(0, 1920)_1920": { + "init_args": { + "num_channels": 1920, + "divisor": 1, + "min_value": 1, + "min_ratio": 0.9, + "candidate_choices": [ + 1920, + 1920, + 1920 + ], + "choice_mode": "number" + }, + "choice": 1920 + } +} \ No newline at end of file diff --git 
a/cv/distiller/CWD/pytorch/mmrazor/tests/data/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/tests/data/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ef101fec61e72abc0eb90266d453b5b22331378d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/data/__init__.py @@ -0,0 +1 @@ +# Copyright (c) OpenMMLab. All rights reserved. diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/data/color.jpeg b/cv/distiller/CWD/pytorch/mmrazor/tests/data/color.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..2f19ebc6c6e867372f61dceadba4d66de46e31ab Binary files /dev/null and b/cv/distiller/CWD/pytorch/mmrazor/tests/data/color.jpeg differ diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/data/concat_subnet1.yaml b/cv/distiller/CWD/pytorch/mmrazor/tests/data/concat_subnet1.yaml new file mode 100644 index 0000000000000000000000000000000000000000..c15cab25f2ea4e201d3a01cc6be6bddad1f65cec --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/data/concat_subnet1.yaml @@ -0,0 +1,24 @@ +op1.mutable_in_channels: + current_choice: 3 + origin_channels: 3 +op1.mutable_out_channels: + current_choice: 4 + origin_channels: 8 +bn1.mutable_num_features: + current_choice: 4 + origin_channels: 8 +op2.mutable_in_channels: + current_choice: 3 + origin_channels: 3 +op2.mutable_out_channels: + current_choice: 4 + origin_channels: 8 +bn2.mutable_num_features: + current_choice: 4 + origin_channels: 8 +op3.mutable_in_channels: + current_choice: 8 + origin_channels: 16 +op3.mutable_out_channels: + current_choice: 8 + origin_channels: 8 \ No newline at end of file diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/data/concat_subnet2.yaml b/cv/distiller/CWD/pytorch/mmrazor/tests/data/concat_subnet2.yaml new file mode 100644 index 0000000000000000000000000000000000000000..f2c6e7ab24da74b3b15d28412ffa23c39efbaebd --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/data/concat_subnet2.yaml @@ -0,0 +1,24 @@ +op1.mutable_in_channels: + 
current_choice: 3 + origin_channels: 3 +op1.mutable_out_channels: + current_choice: 8 + origin_channels: 8 +bn1.mutable_num_features: + current_choice: 8 + origin_channels: 8 +op2.mutable_in_channels: + current_choice: 3 + origin_channels: 3 +op2.mutable_out_channels: + current_choice: 8 + origin_channels: 8 +bn2.mutable_num_features: + current_choice: 8 + origin_channels: 8 +op3.mutable_in_channels: + current_choice: 16 + origin_channels: 16 +op3.mutable_out_channels: + current_choice: 8 + origin_channels: 8 \ No newline at end of file diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/data/dataset/a/1.JPG b/cv/distiller/CWD/pytorch/mmrazor/tests/data/dataset/a/1.JPG new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/data/dataset/ann.json b/cv/distiller/CWD/pytorch/mmrazor/tests/data/dataset/ann.json new file mode 100644 index 0000000000000000000000000000000000000000..a55539329966e7f36233099a6c9e37ce50a09ecf --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/data/dataset/ann.json @@ -0,0 +1,28 @@ +{ + "metainfo": { + "categories": [ + { + "category_name": "first", + "id": 0 + }, + { + "category_name": "second", + "id": 1 + } + ] + }, + "data_list": [ + { + "img_path": "a/1.JPG", + "gt_label": 0 + }, + { + "img_path": "b/2.jpeg", + "gt_label": 1 + }, + { + "img_path": "b/subb/2.jpeg", + "gt_label": 1 + } + ] +} diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/data/dataset/ann.txt b/cv/distiller/CWD/pytorch/mmrazor/tests/data/dataset/ann.txt new file mode 100644 index 0000000000000000000000000000000000000000..f929e873b79cd93a8517ef05dd4302271a605af1 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/data/dataset/ann.txt @@ -0,0 +1,3 @@ +a/1.JPG 0 +b/2.jpeg 1 +b/subb/3.jpg 1 diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/data/dataset/b/2.jpeg b/cv/distiller/CWD/pytorch/mmrazor/tests/data/dataset/b/2.jpeg new file mode 100644 index 
0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/data/dataset/b/subb/3.jpg b/cv/distiller/CWD/pytorch/mmrazor/tests/data/dataset/b/subb/3.jpg new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/data/dataset/classes.txt b/cv/distiller/CWD/pytorch/mmrazor/tests/data/dataset/classes.txt new file mode 100644 index 0000000000000000000000000000000000000000..c012a51e609848598ce299aef954746393c49303 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/data/dataset/classes.txt @@ -0,0 +1,2 @@ +bus +car diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/data/dataset/multi_label_ann.json b/cv/distiller/CWD/pytorch/mmrazor/tests/data/dataset/multi_label_ann.json new file mode 100644 index 0000000000000000000000000000000000000000..5cd8a84d086b619e1dffd25cc8800c2dc5f097dd --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/data/dataset/multi_label_ann.json @@ -0,0 +1,28 @@ +{ + "metainfo": { + "categories": [ + { + "category_name": "first", + "id": 0 + }, + { + "category_name": "second", + "id": 1 + } + ] + }, + "data_list": [ + { + "img_path": "a/1.JPG", + "gt_label": [0] + }, + { + "img_path": "b/2.jpeg", + "gt_label": [1] + }, + { + "img_path": "b/subb/2.jpeg", + "gt_label": [0, 1] + } + ] +} diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/data/model_library.py b/cv/distiller/CWD/pytorch/mmrazor/tests/data/model_library.py new file mode 100644 index 0000000000000000000000000000000000000000..d917dcc3039c8319972c0dbae0fce18e380039a3 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/data/model_library.py @@ -0,0 +1,693 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import copy +from typing import List +from typing import Dict, Callable +from mmrazor.registry import MODELS +from mmengine.config import Config +import os +from mmengine.utils import get_installed_path +from mmrazor.registry import MODELS +import torch +import torch.nn as nn +from .models import (AddCatModel, ConcatModel, ConvAttnModel, DwConvModel, + ExpandLineModel, GroupWiseConvModel, SingleLineModel, + MultiBindModel, MultiConcatModel, MultiConcatModel2, + ResBlock, Xmodel, MultipleUseModel, Icep, SelfAttention) +import json +# model generator +from mmdet.testing._utils import demo_mm_inputs +import string +import copy +# helper functions + + +def get_shape(tensor, only_length=False): + if isinstance(tensor, torch.Tensor): + if only_length: + return len(tensor.shape) + else: + return tensor.shape + elif isinstance(tensor, list) or isinstance(tensor, tuple): + shapes = [] + for x in tensor: + shapes.append(get_shape(x, only_length)) + return shapes + elif isinstance(tensor, dict): + shapes = {} + for key in tensor: + shapes[key] = get_shape(tensor[key], only_length) + return shapes + else: + raise NotImplementedError( + f'unsuppored type{type(tensor)} to get shape of tensors.') + + +# generators + + +class ModelGenerator(nn.Module): + + def __init__(self, name: str, model_src) -> None: + super().__init__() + self.name = name + self.model_src = model_src + self._model = None + + def __call__(self, *args, **kwargs): + return self.init_model() + + def init_model(self): + return self.model_src() + + def forward(self, x): + assert self._model is not None + return self._model(x, *self.input()) + + def input(self): + return [] + + def assert_model_is_changed(self, tensors_org, tensors_new): + shape1 = get_shape(tensors_org) + shape2 = get_shape(tensors_new) + assert shape1 == shape2, f'{shape1}!={shape2}' + + def __repr__(self) -> str: + return self.name + + @classmethod + def get_base_name(cls, name: str): + names = name.split('.') + return '.'.join(names[1:]) + + 
@classmethod + def get_short_name(cls, name: str): + scope = name.split('.')[0] + base_name = cls.get_base_name(name) + names = base_name.replace('-', '.').replace('_', '.').split('.') + name = names[0] + name = name.rstrip(string.digits) + + return f'{scope}.{name}' + + @property + def base_name(self): + return self.__class__.get_base_name(self.name) + + @property + def short_name(self): + return self.__class__.get_short_name(self.name) + + @property + def scope(self): + return self.name.split('.')[0] + + +class MMModelGenerator(ModelGenerator): + + def __init__(self, name, cfg) -> None: + self.cfg = cfg + super().__init__(name, self.get_model_src) + + def get_model_src(self): + model = MODELS.build(self.cfg) + model = revert_sync_batchnorm(model) + return model + + def __repr__(self) -> str: + return self.name + + +class MMDetModelGenerator(MMModelGenerator): + + def forward(self, x): + assert self._model is not None + self._model.eval() + return self._model(x, **self.input(), mode='tensor') + + def input(self): + data = demo_mm_inputs(1, [[3, 224, 224]]) + data = self._model.data_preprocessor(data, False) + data.pop('inputs') + return data + + def assert_model_is_changed(self, tensors_org, tensors_new): + assert get_shape(tensors_org, True) == get_shape(tensors_new, True) + + +# model library + + +class ModelLibrary: + default_includes: List = [] + _models = None + + def __init__(self, include=default_includes, exclude=[]) -> None: + self.include_key = include + self.exclude_key = exclude + self._include_models, self._uninclude_models, self.exclude_models =\ + self._classify_models(self.models) + + @property + def models(self): + if self.__class__._models is None: + self.__class__._models: Dict[ + str, Callable] = self.__class__.get_models() + return self.__class__._models + + @classmethod + def get_models(cls): + raise NotImplementedError() + + def include_models(self): + return self._include_models + + def uninclude_models(self): + return 
self._uninclude_models + + def is_include(self, name: str, includes: List[str], start_with=True): + for key in includes: + if start_with: + if name.startswith(key): + return True + else: + if key in name: + return True + return False + + def is_default_includes_cover_all_models(self): + models = copy.copy(self._models) + is_covered = True + for name in models: + if self.is_include(name, self.__class__.default_includes): + pass + else: + is_covered = False + print(name, '\tnot include') + return is_covered + + def short_names(self): + short_names = set() + for name in self.models: + short_names.add(self.models[name].short_name) + return short_names + + def _classify_models(self, models: Dict): + include = [] + uninclude = [] + exclude = [] + for name in models: + if self.is_include(name, self.exclude_key, start_with=False): + exclude.append(models[name]) + elif self.is_include(name, self.include_key, start_with=True): + include.append(models[name]) + else: + uninclude.append(models[name]) + return include, uninclude, exclude + + def get_short_name_of_model(self, name: str): + names = name.replace('-', '.').replace('_', '.').split('.') + return names[0] + + +class DefaultModelLibrary(ModelLibrary): + _mm_models = None + + default_includes: List = [ + 'SingleLineModel', + 'ResBlock', + 'AddCatModel', + 'ConcatModel', + 'MultiConcatModel', + 'MultiConcatModel2', + 'GroupWiseConvModel', + 'Xmodel', + 'MultipleUseModel', + 'Icep', + 'ExpandLineModel', + 'MultiBindModel', + 'DwConvModel', + 'ConvAttnModel', + 'SelfAttention', + # mm models + 'resnet', + 'pspnet', + 'yolo' + ] + + def __init__(self, + include=default_includes, + exclude=[], + with_mm_models=False) -> None: + self.with_mm_models = with_mm_models + super().__init__(include, exclude) + + @property + def models(self): + models = copy.copy(super().models) + if self.with_mm_models: + models.update(self.mm_models) + return models + + @property + def mm_models(self): + if self.__class__._mm_models is None: + 
self.__class__._mm_models = self.get_mm_models() + return self.__class__._mm_models + + @classmethod + def get_models(cls): + models = [ + SingleLineModel, + ResBlock, + AddCatModel, + ConcatModel, + MultiConcatModel, + MultiConcatModel2, + GroupWiseConvModel, + Xmodel, + MultipleUseModel, + Icep, + ExpandLineModel, + MultiBindModel, + DwConvModel, # + ConvAttnModel, + SelfAttention, + ] + model_dict = {} + for model in models: + model_dict[model.__name__] = ModelGenerator( + 'default.' + model.__name__, model) + return model_dict + + @classmethod + def get_mm_models(cls): + paths = [ + 'mmcls::resnet/resnet34_8xb32_in1k.py', + 'mmseg::pspnet/pspnet_r18-d8_4xb4-80k_potsdam-512x512.py', + 'mmdet::yolo/yolov3_d53_8xb8-320-273e_coco.py' + ] + models = {} + for path in paths: + Model = MMModelLibrary.get_model_from_path(path) + models[Model.base_name] = Model + return models + + +class TorchModelLibrary(ModelLibrary): + + default_includes = [ + 'alexnet', 'densenet', 'efficientnet', 'googlenet', 'inception', + 'mnasnet', 'mobilenet', 'regnet', 'resnet', 'resnext', 'shufflenet', + 'squeezenet', 'vgg', 'wide_resnet', "vit", "swin", "convnext" + ] + + def __init__(self, include=default_includes, exclude=[]) -> None: + super().__init__(include, exclude) + + @classmethod + def get_models(cls): + from inspect import isfunction + + import torchvision + + attrs = dir(torchvision.models) + models = {} + for name in attrs: + module = getattr(torchvision.models, name) + if isfunction(module) and name is not 'get_weight': + models[name] = ModelGenerator('torch.' 
+ name, module) + return models + + +class MMModelLibrary(ModelLibrary): + default_includes = [] + base_config_path = '/' + repo = 'mmxx' + + def __init__(self, include=default_includes, exclude=[]) -> None: + super().__init__(include, exclude) + + @classmethod + def scope_path(cls): + path = cls._scope_path(cls.repo) + cls.base_config_path + return path + + @classmethod + def get_models(cls): + models = {} + added_models = set() + for dirpath, dirnames, filenames in os.walk(cls.scope_path()): + for filename in filenames: + if filename.endswith('.py'): + + cfg_path = dirpath + '/' + filename + try: + config = Config.fromfile(cfg_path) + except: + continue + if 'model' in config: + + # get model_name + model_name = cls.get_model_name_from_path( + cfg_path, cls.scope_path()) + + model_cfg = config['model'] + model_cfg = cls._config_process(model_cfg) + if json.dumps(model_cfg) not in added_models: + models[model_name] = cls.generator_type()( + cls.repo + '.' + model_name, model_cfg) + added_models.add(json.dumps(model_cfg)) + return models + + @classmethod + def generator_type(cls): + return MMModelGenerator + + @classmethod + def get_model_name_from_path(cls, config_path, scope_path): + import os + dirpath = os.path.dirname(config_path) + '/' + filename = os.path.basename(config_path) + + model_type_name = '_'.join(dirpath.replace(scope_path, '').split('/')) + model_type_name = model_type_name if model_type_name == '' else model_type_name + '_' + model_name = model_type_name + \ + os.path.basename(filename).split('.')[0] + return model_name + + @classmethod + def get_model_from_path(cls, config_path): + path, scope = Config._get_cfg_path(config_path, '') + if scope is None: + scope = 'mmrazor' + config = Config.fromfile(path)['model'] + config = cls._config_process(config=config) + config['_scope_'] = scope + name = cls.get_model_name_from_path(path, cls._scope_path(scope)) + return cls.generator_type()(scope + '.' 
+ name, config) + + @staticmethod + def _scope_path(scope): + if scope == 'mmseg': + scope = 'mmsegmentation' + repo_path = get_installed_path(scope) + path = repo_path + '/.mim/configs/' + return path + + @classmethod + def _config_process(cls, config: Dict): + config['_scope_'] = cls.repo + config = cls._remove_certain_key(config, 'init_cfg') + config = cls._remove_certain_key(config, 'pretrained') + config = cls._remove_certain_key(config, 'Pretrained') + return config + + @classmethod + def _remove_certain_key(cls, config: Dict, key: str = 'init_cfg'): + if isinstance(config, dict): + if key in config: + config.pop(key) + for keyx in config: + config[keyx] = cls._remove_certain_key(config[keyx], key) + return config + + +class MMClsModelLibrary(MMModelLibrary): + + default_includes = [ + 'vgg', + 'efficientnet', + 'resnet', + 'mobilenet', + 'resnext', + 'wide-resnet', + 'shufflenet', + 'hrnet', + 'resnest', + 'inception', + 'res2net', + 'densenet', + 'convnext', + 'regnet', + 'van', + 'swin_transformer', + 'convmixer', + 't2t', + 'twins', + 'repmlp', + 'tnt', + 't2t', + 'mlp_mixer', + 'conformer', + 'poolformer', + 'vit', + 'efficientformer', + 'mobileone', + 'edgenext', + 'mvit', + 'seresnet', + 'repvgg', + 'seresnext', + 'deit', + 'replknet', + 'hornet', + 'mobilevit', + 'davit', + ] + base_config_path = '_base_/models/' + repo = 'mmcls' + + def __init__( + self, + include=default_includes, + exclude=['cutmix', 'cifar', 'gem', 'efficientformer']) -> None: + super().__init__(include=include, exclude=exclude) + + +class MMDetModelLibrary(MMModelLibrary): + + default_includes = [ + '_base', + 'gfl', + 'sparse', + 'simple', + 'pisa', + 'lvis', + 'carafe', + 'selfsup', + 'solo', + 'ssd', + 'res2net', + 'yolof', + 'reppoints', + 'htc', + 'groie', + 'dyhead', + 'grid', + 'soft', + 'swin', + 'regnet', + 'gcnet', + 'ddod', + 'instaboost', + 'point', + 'vfnet', + 'pafpn', + 'ghm', + 'mask', + 'resnest', + 'tood', + 'detectors', + 'cornernet', + 'convnext', + 'cascade', 
+ 'paa', + 'detr', + 'rpn', + 'ld', + 'lad', + 'ms', + 'faster', + 'centripetalnet', + 'gn', + 'dcnv2', + 'legacy', + 'panoptic', + 'strong', + 'fpg', + 'deformable', + 'free', + 'scratch', + 'openimages', + 'fsaf', + 'rtmdet', + 'solov2', + 'yolact', + 'empirical', + 'centernet', + 'hrnet', + 'guided', + 'deepfashion', + 'fast', + 'mask2former', + 'retinanet', + 'autoassign', + 'gn+ws', + 'dcn', + 'yolo', + 'foveabox', + 'libra', + 'double', + 'queryinst', + 'resnet', + 'nas', + 'sabl', + 'fcos', + 'scnet', + 'maskformer', + 'pascal', + 'cityscapes', + 'timm', + 'seesaw', + 'pvt', + 'atss', + 'efficientnet', + 'wider', + 'tridentnet', + 'dynamic', + 'yolox', + 'albu', + 'misc', + 'crowddet', + 'condins', + ] + base_config_path = '/' + repo = 'mmdet' + + def __init__( + self, + include=default_includes, + exclude=[ + 'lad', + 'ld', + 'faster_rcnn_faster-rcnn_r50-caffe-c4_ms-1x_coco', + ] + ) -> None: + super().__init__(include=include, exclude=exclude) + + @classmethod + def _config_process(cls, config: Dict): + config = super()._config_process(config) + if 'preprocess_cfg' in config: + config.pop('preprocess_cfg') + return config + + @classmethod + def generator_type(cls): + return MMModelGenerator + + +class MMSegModelLibrary(MMModelLibrary): + default_includes: List = [ + '_base_', + 'knet', + 'sem', + 'dnlnet', + 'dmnet', + 'icnet', + 'apcnet', + 'swin', + 'isanet', + 'fastfcn', + 'poolformer', + 'mae', + 'segformer', + 'ccnet', + 'twins', + 'emanet', + 'upernet', + 'beit', + 'hrnet', + 'bisenetv2', + 'vit', + 'setr', + 'cgnet', + 'ocrnet', + 'ann', + 'erfnet', + 'point', + 'bisenetv1', + 'nonlocal', + 'unet', + 'danet', + 'stdc', + 'fcn', + 'encnet', + 'resnest', + 'mobilenet', + 'convnext', + 'deeplabv3', + 'pspnet', + 'gcnet', + 'fastscnn', + 'segmenter', + 'dpt', + 'deeplabv3plus', + 'psanet', + ] + base_config_path = '/' + repo = 'mmsegmentation' + + def __init__(self, include=default_includes, exclude=['_base_']) -> None: + super().__init__(include, 
exclude) + + @classmethod + def _config_process(cls, config: Dict): + config['_scope_'] = 'mmseg' + return config + + +class MMPoseModelLibrary(MMModelLibrary): + default_includes: List = [ + 'hand', + 'face', + 'wholebody', + 'body', + 'animal', + ] + base_config_path = '/' + repo = 'mmpose' + + def __init__(self, include=default_includes, exclude=[]) -> None: + super().__init__(include, exclude=exclude) + + @classmethod + def _config_process(cls, config: Dict): + config['_scope_'] = 'mmpose' + return config + + +# tools + +def revert_sync_batchnorm(module): + # this is very similar to the function that it is trying to revert: + # https://github.com/pytorch/pytorch/blob/c8b3686a3e4ba63dc59e5dcfe5db3430df256833/torch/nn/modules/batchnorm.py#L679 + module_output = module + if isinstance(module, torch.nn.modules.batchnorm.SyncBatchNorm): + new_cls = nn.BatchNorm2d + module_output = nn.BatchNorm2d(module.num_features, module.eps, + module.momentum, module.affine, + module.track_running_stats) + if module.affine: + with torch.no_grad(): + module_output.weight = module.weight + module_output.bias = module.bias + module_output.running_mean = module.running_mean + module_output.running_var = module.running_var + module_output.num_batches_tracked = module.num_batches_tracked + if hasattr(module, "qconfig"): + module_output.qconfig = module.qconfig + for name, child in module.named_children(): + module_output.add_module(name, revert_sync_batchnorm(child)) + del module + return module_output diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/data/models.py b/cv/distiller/CWD/pytorch/mmrazor/tests/data/models.py new file mode 100644 index 0000000000000000000000000000000000000000..0347b91477b85c4b8fd66341b20874f74fb6aeb1 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/data/models.py @@ -0,0 +1,1073 @@ +# Copyright (c) OpenMMLab. All rights reserved. +# this file includes models for tesing. 
def untracable_function(x: torch.Tensor):
    """Shift ``x`` by one toward a zero sum.

    Subtracts 1 elementwise when the tensor's total is positive,
    otherwise adds 1. The data-dependent branch makes this function
    untracable by symbolic tracers such as ``torch.fx``.
    """
    delta = -1 if x.sum() > 0 else 1
    return x + delta
@MODELS.register_module()
class LinearHeadForTest(Module):
    """Minimal classification head used by the test models.

    Applies a global average pool over the spatial dimensions followed
    by a single fully-connected layer.
    """

    def __init__(self, in_channel, num_class=1000) -> None:
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.linear = nn.Linear(in_channel, num_class)

    def forward(self, x):
        # (B, C, H, W) -> (B, C) -> (B, num_class)
        pooled = torch.flatten(self.pool(x), start_dim=1)
        return self.linear(pooled)
class ConcatModel(Module):
    """Two parallel conv+BN branches concatenated on the channel axis.

    x ──op1,bn1──┐
                 ├─cat──op3──avg_pool──fc──▶ output
    x ──op2,bn2──┘
    """

    def __init__(self) -> None:
        super().__init__()

        self.op1 = nn.Conv2d(3, 8, 1)
        self.bn1 = nn.BatchNorm2d(8)
        self.op2 = nn.Conv2d(3, 8, 1)
        self.bn2 = nn.BatchNorm2d(8)
        self.op3 = nn.Conv2d(16, 8, 1)

        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(8, 1000)

    def forward(self, x: Tensor) -> Tensor:
        branch_a = self.bn1(self.op1(x))
        branch_b = self.bn2(self.op2(x))
        # Channel-wise concat doubles the width before fusion by op3.
        fused = self.op3(torch.cat([branch_a, branch_b], dim=1))
        pooled = self.avg_pool(fused).flatten(1)
        return self.fc(pooled)
class SingleLineModel(nn.Module):
    """A single straight-line chain of conv/BN/ReLU blocks ending in a
    global pool and a linear classifier (no branching).

    x ──net(conv,bn,relu,conv,bn,pool)──fc──▶ output
    """

    def __init__(self) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 8, 3, 1, 1), nn.BatchNorm2d(8), nn.ReLU(),
            nn.Conv2d(8, 16, 3, 1, 1), nn.BatchNorm2d(16),
            nn.AdaptiveAvgPool2d(1))
        self.linear = nn.Linear(16, 1000)

    def forward(self, x):
        features = self.net(x)
        # Collapse the pooled (B, 16, 1, 1) map to (B, 16).
        flat = features.reshape([features.shape[0], -1])
        return self.linear(flat)
class Xmodel(nn.Module):
    """X-shaped topology: two branches merged by elementwise product,
    then re-split into two branches merged by elementwise sum.

    x ─op1─┐            ┌─op3─┐
           ├─(mul)─x12──┤     ├─(add)──avg_pool──fc──▶ y
    x ─op2─┘            └─op4─┘
    """

    def __init__(self) -> None:
        super().__init__()
        self.op1 = nn.Conv2d(3, 8, 3, 1, 1)
        self.op2 = nn.Conv2d(3, 8, 3, 1, 1)
        self.op3 = nn.Conv2d(8, 16, 3, 1, 1)
        self.op4 = nn.Conv2d(8, 16, 3, 1, 1)
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(16, 1000)

    def forward(self, x):
        merged = self.op1(x) * self.op2(x)
        combined = self.op3(merged) + self.op4(merged)
        pooled = self.avg_pool(combined).flatten(1)
        return self.fc(pooled)
class IcepBlock(nn.Module):
    """Inception-style block: four parallel 3x3 convs over the same
    input, concatenated along the channel axis.

    The output therefore has ``4 * out_c`` channels.
    """

    def __init__(self, in_c=3, out_c=32) -> None:
        super().__init__()
        self.op1 = nn.Conv2d(in_c, out_c, 3, 1, 1)
        self.op2 = nn.Conv2d(in_c, out_c, 3, 1, 1)
        self.op3 = nn.Conv2d(in_c, out_c, 3, 1, 1)
        self.op4 = nn.Conv2d(in_c, out_c, 3, 1, 1)

    def forward(self, x):
        branches = [op(x) for op in (self.op1, self.op2, self.op3, self.op4)]
        return torch.cat(branches, 1)
class SelfAttention(nn.Module):
    """Toy vision model with one residual multi-head self-attention layer.

    A strided conv stem patchifies the image, the feature map is
    flattened to a token sequence, a single attention block with a
    residual connection is applied, and the sequence is folded back to
    an image before the classification head.
    """

    def __init__(self) -> None:
        super().__init__()
        # Patchify stem: kernel 4, stride 4, padding 4 -> 32-dim embeddings.
        self.stem = nn.Conv2d(3, 32, 4, 4, 4)

        self.num_head = 4
        # Fused projection producing query/key/value in a single matmul.
        self.qkv = nn.Linear(32, 32 * 3)
        self.proj = nn.Linear(32, 32)

        self.head = LinearHeadForTest(32, 1000)

    def forward(self, x: torch.Tensor):
        x = self.stem(x)
        h, w = x.shape[-2:]
        x = self._to_token(x)
        # Residual connection around the attention block.
        x = x + self._forward_attention(x)
        x = self._to_img(x, h, w)
        return self.head(x)

    def _to_img(self, x, h, w):
        # (B, N, C) -> (B, C, h, w), with N == h * w.
        x = x.reshape([x.shape[0], h, w, x.shape[2]])
        x = x.permute(0, 3, 1, 2)
        return x

    def _to_token(self, x):
        # (B, C, H, W) -> (B, H*W, C) token sequence.
        x = x.flatten(2).transpose(-1, -2)
        return x

    def _forward_attention(self, x: torch.Tensor):
        qkv = self.qkv(x)
        # (B, N, 3C) -> (3, B, num_head, N, C // num_head).
        qkv = qkv.reshape([
            x.shape[0], x.shape[1], 3, self.num_head,
            x.shape[2] // self.num_head
        ]).permute(2, 0, 3, 1, 4).contiguous()
        q, k, v = qkv
        # Scaled dot-product scores. NOTE(review): no softmax is applied
        # to `attn` — presumably acceptable for a tracer-test model, but
        # this is not standard attention; confirm intent.
        attn = q @ k.transpose(-1, -2) / math.sqrt(32 // self.num_head)
        y = attn @ v  # B H N h
        y = y.permute(0, 2, 1, 3).flatten(-2)
        return self.proj(y)
class SampleExpandDerivedMutable(BaseMutable):
    """Toy mutable whose product with a ``OneShotMutableChannel`` yields a
    :class:`DerivedMutable` that repeats each channel-mask entry
    ``expand_ratio`` times.

    Only ``__mul__`` carries real logic; the remaining overrides simply
    defer to :class:`BaseMutable`.
    """

    def __init__(self, expand_ratio=1) -> None:
        super().__init__()
        # Number of times each mask element is repeated when deriving.
        self.ratio = expand_ratio

    def __mul__(self, other):
        """Return a derived, channel-expanded mutable for ``self * other``.

        Raises:
            NotImplementedError: If ``other`` is not a
                ``OneShotMutableChannel``.
        """
        if isinstance(other, OneShotMutableChannel):

            def _expand_mask():
                # Repeat every mask element `self.ratio` times along the
                # channel dim: (C,) -> (C, ratio) -> (C * ratio,).
                mask = other.current_mask
                mask = torch.unsqueeze(
                    mask,
                    -1).expand(list(mask.shape) + [self.ratio]).flatten(-2)
                return mask

            return DerivedMutable(_expand_mask, _expand_mask, [self, other])
        else:
            raise NotImplementedError()

    def dump_chosen(self):
        return super().dump_chosen()

    def export_chosen(self):
        return super().export_chosen()

    def fix_chosen(self, chosen):
        return super().fix_chosen(chosen)

    def num_choices(self) -> int:
        return super().num_choices

    @property
    def current_choice(self):
        return super().current_choice

    @current_choice.setter
    def current_choice(self, choice):
        # BUG FIX: the original `super().current_choice(choice)` invoked
        # the property *getter* through the super proxy and then tried to
        # call its return value — `super()` never routes attribute
        # assignment to a parent property setter — so it raised TypeError
        # at runtime. Delegate to the parent property's setter explicitly.
        # NOTE(review): assumes BaseMutable.current_choice is a property;
        # verify against mmrazor's BaseMutable.
        BaseMutable.current_choice.fset(self, choice)
OneShotMutableChannel(16, candidate_choices=[2, 8, 16]) + mutable_value = SampleExpandDerivedMutable(1) + + MutableChannelContainer.register_mutable_channel_to_module( + self.net[0], mutable1, True) + MutableChannelContainer.register_mutable_channel_to_module( + self.net[1], mutable1.expand_mutable_channel(1), True, 0, 8) + MutableChannelContainer.register_mutable_channel_to_module( + self.net[3], mutable_value * mutable1, False, 0, 8) + + MutableChannelContainer.register_mutable_channel_to_module( + self.net[3], mutable2, True) + MutableChannelContainer.register_mutable_channel_to_module( + self.net[4], mutable2, True) + MutableChannelContainer.register_mutable_channel_to_module( + self.linear, mutable2, False) + + +class DynamicAttention(nn.Module): + """ + x + |blocks: DynamicSequential(depth) + |(blocks) + x1 + |fc (OneShotMutableChannel * OneShotMutableValue) + output + """ + + def __init__(self) -> None: + super().__init__() + + self.mutable_depth = OneShotMutableValue( + value_list=[1, 2], default_value=2) + self.mutable_embed_dims = OneShotMutableChannel( + num_channels=624, candidate_choices=[576, 624]) + self.base_embed_dims = OneShotMutableChannel( + num_channels=64, candidate_choices=[64]) + self.mutable_num_heads = [ + OneShotMutableValue(value_list=[8, 10], default_value=10) + for _ in range(2) + ] + self.mutable_mlp_ratios = [ + OneShotMutableValue(value_list=[3.0, 3.5, 4.0], default_value=4.0) + for _ in range(2) + ] + self.mutable_q_embed_dims = [ + i * self.base_embed_dims for i in self.mutable_num_heads + ] + + self.patch_embed = DynamicPatchEmbed( + img_size=224, + in_channels=3, + embed_dims=self.mutable_embed_dims.num_channels) + + # cls token and pos embed + self.pos_embed = nn.Parameter( + torch.zeros(1, 197, self.mutable_embed_dims.num_channels)) + self.cls_token = nn.Parameter( + torch.zeros(1, 1, self.mutable_embed_dims.num_channels)) + + layers = [] + for i in range(self.mutable_depth.max_choice): + layer = TransformerEncoderLayer( + 
embed_dims=self.mutable_embed_dims.num_channels, + num_heads=self.mutable_num_heads[i].max_choice, + mlp_ratio=self.mutable_mlp_ratios[i].max_choice) + layers.append(layer) + self.blocks = DynamicSequential(*layers) + + # OneShotMutableChannelUnit + OneShotMutableChannelUnit._register_channel_container( + self, MutableChannelContainer) + + self.register_mutables() + + def register_mutables(self): + # mutablevalue + self.blocks.register_mutable_attr('depth', self.mutable_depth) + # mutablechannel + MutableChannelContainer.register_mutable_channel_to_module( + self.patch_embed, self.mutable_embed_dims, True) + + for i in range(self.mutable_depth.max_choice): + layer = self.blocks[i] + layer.register_mutables( + mutable_num_heads=self.mutable_num_heads[i], + mutable_mlp_ratios=self.mutable_mlp_ratios[i], + mutable_q_embed_dims=self.mutable_q_embed_dims[i], + mutable_head_dims=self.base_embed_dims, + mutable_embed_dims=self.mutable_embed_dims) + + def forward(self, x: torch.Tensor): + B = x.shape[0] + x = self.patch_embed(x) + embed_dims = self.mutable_embed_dims.current_choice + cls_tokens = self.cls_token[..., :embed_dims].expand(B, -1, -1) + x = torch.cat((cls_tokens, x), dim=1) + x = x + self.pos_embed[..., :embed_dims] + x = self.blocks(x) + return torch.mean(x[:, 1:], dim=1) + + +class DynamicMMBlock(nn.Module): + + arch_setting = dict( + kernel_size=[ # [min_kernel_size, max_kernel_size, step] + [3, 5, 2], + [3, 5, 2], + [3, 5, 2], + [3, 5, 2], + [3, 5, 2], + [3, 5, 2], + [3, 5, 2], + ], + num_blocks=[ # [min_num_blocks, max_num_blocks, step] + [1, 2, 1], + [3, 5, 1], + [3, 6, 1], + [3, 6, 1], + [3, 8, 1], + [3, 8, 1], + [1, 2, 1], + ], + expand_ratio=[ # [min_expand_ratio, max_expand_ratio, step] + [1, 1, 1], + [4, 6, 1], + [4, 6, 1], + [4, 6, 1], + [4, 6, 1], + [6, 6, 1], + [6, 6, 1], + ], + num_out_channels=[ # [min_channel, max_channel, step] + [16, 24, 8], + [24, 32, 8], + [32, 40, 8], + [64, 72, 8], + [112, 128, 8], + [192, 216, 8], + [216, 224, 8], + ]) + 
+ def __init__( + self, + conv_cfg: Dict = dict(type='mmrazor.BigNasConv2d'), + norm_cfg: Dict = dict(type='mmrazor.DynamicBatchNorm2d'), + fine_grained_mode: bool = False, + ) -> None: + super().__init__() + + self.conv_cfg = conv_cfg + self.norm_cfg = norm_cfg + self.act_list = ['Swish'] * 7 + self.stride_list = [1, 2, 2, 2, 1, 2, 1] + self.with_se_list = [False, False, True, False, True, True, True] + self.kernel_size_list = parse_values(self.arch_setting['kernel_size']) + self.num_blocks_list = parse_values(self.arch_setting['num_blocks']) + self.expand_ratio_list = \ + parse_values(self.arch_setting['expand_ratio']) + self.num_channels_list = \ + parse_values(self.arch_setting['num_out_channels']) + assert len(self.kernel_size_list) == len(self.num_blocks_list) == \ + len(self.expand_ratio_list) == len(self.num_channels_list) + + self.fine_grained_mode = fine_grained_mode + self.with_attentive_shortcut = True + self.in_channels = 24 + + self.first_out_channels_list = [16] + self.first_conv = ConvModule( + in_channels=3, + out_channels=24, + kernel_size=3, + stride=2, + padding=1, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + act_cfg=dict(type='Swish')) + + self.layers = [] + for i, (num_blocks, kernel_sizes, expand_ratios, num_channels) in \ + enumerate(zip(self.num_blocks_list, self.kernel_size_list, + self.expand_ratio_list, self.num_channels_list)): + inverted_res_layer = self._make_single_layer( + out_channels=num_channels, + num_blocks=num_blocks, + kernel_sizes=kernel_sizes, + expand_ratios=expand_ratios, + act_cfg=self.act_list[i], + stride=self.stride_list[i], + use_se=self.with_se_list[i]) + layer_name = f'layer{i + 1}' + self.add_module(layer_name, inverted_res_layer) + self.layers.append(inverted_res_layer) + + last_expand_channels = 1344 + self.out_channels = 1984 + self.last_out_channels_list = [1792, 1984] + self.last_expand_ratio_list = [6] + + last_layers = Sequential( + OrderedDict([('final_expand_layer', + ConvModule( + 
in_channels=self.in_channels, + out_channels=last_expand_channels, + kernel_size=1, + padding=0, + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + act_cfg=dict(type='Swish'))), + ('pool', nn.AdaptiveAvgPool2d((1, 1))), + ('feature_mix_layer', + ConvModule( + in_channels=last_expand_channels, + out_channels=self.out_channels, + kernel_size=1, + padding=0, + bias=False, + conv_cfg=self.conv_cfg, + norm_cfg=None, + act_cfg=dict(type='Swish')))])) + self.add_module('last_conv', last_layers) + self.layers.append(last_layers) + + self.register_mutables() + + def _make_single_layer(self, out_channels, num_blocks, kernel_sizes, + expand_ratios, act_cfg, stride, use_se): + _layers = [] + for i in range(max(num_blocks)): + if i >= 1: + stride = 1 + if use_se: + se_cfg = dict( + act_cfg=(dict(type='ReLU'), dict(type='HSigmoid')), + ratio=4, + conv_cfg=self.conv_cfg) + else: + se_cfg = None # type: ignore + + mb_layer = MBBlock( + in_channels=self.in_channels, + out_channels=max(out_channels), + kernel_size=max(kernel_sizes), + stride=stride, + expand_ratio=max(expand_ratios), + conv_cfg=self.conv_cfg, + norm_cfg=self.norm_cfg, + act_cfg=dict(type=act_cfg), + se_cfg=se_cfg, + with_attentive_shortcut=self.with_attentive_shortcut) + + _layers.append(mb_layer) + self.in_channels = max(out_channels) + + dynamic_seq = DynamicSequential(*_layers) + return dynamic_seq + + def register_mutables(self): + """Mutate the BigNAS-style MobileNetV3.""" + OneShotMutableChannelUnit._register_channel_container( + self, MutableChannelContainer) + + self.first_mutable_channels = OneShotMutableChannel( + alias='backbone.first_channels', + num_channels=max(self.first_out_channels_list), + candidate_choices=self.first_out_channels_list) + + mutate_conv_module( + self.first_conv, mutable_out_channels=self.first_mutable_channels) + + mid_mutable = self.first_mutable_channels + # mutate the built mobilenet layers + for i, layer in enumerate(self.layers[:-1]): + num_blocks = self.num_blocks_list[i] 
+ kernel_sizes = self.kernel_size_list[i] + expand_ratios = self.expand_ratio_list[i] + out_channels = self.num_channels_list[i] + + prefix = 'backbone.layers.' + str(i + 1) + '.' + + mutable_out_channels = OneShotMutableChannel( + alias=prefix + 'out_channels', + candidate_choices=out_channels, + num_channels=max(out_channels)) + + if not self.fine_grained_mode: + mutable_kernel_size = OneShotMutableValue( + alias=prefix + 'kernel_size', value_list=kernel_sizes) + + mutable_expand_ratio = OneShotMutableValue( + alias=prefix + 'expand_ratio', value_list=expand_ratios) + + mutable_depth = OneShotMutableValue( + alias=prefix + 'depth', value_list=num_blocks) + layer.register_mutable_attr('depth', mutable_depth) + + for k in range(max(self.num_blocks_list[i])): + + if self.fine_grained_mode: + mutable_kernel_size = OneShotMutableValue( + alias=prefix + str(k) + '.kernel_size', + value_list=kernel_sizes) + + mutable_expand_ratio = OneShotMutableValue( + alias=prefix + str(k) + '.expand_ratio', + value_list=expand_ratios) + + mutate_mobilenet_layer(layer[k], mid_mutable, + mutable_out_channels, + mutable_expand_ratio, + mutable_kernel_size) + mid_mutable = mutable_out_channels + + self.last_mutable_channels = OneShotMutableChannel( + alias='backbone.last_channels', + num_channels=self.out_channels, + candidate_choices=self.last_out_channels_list) + + last_mutable_expand_value = OneShotMutableValue( + value_list=self.last_expand_ratio_list, + default_value=max(self.last_expand_ratio_list)) + + derived_expand_channels = mid_mutable * last_mutable_expand_value + mutate_conv_module( + self.layers[-1].final_expand_layer, + mutable_in_channels=mid_mutable, + mutable_out_channels=derived_expand_channels) + mutate_conv_module( + self.layers[-1].feature_mix_layer, + mutable_in_channels=derived_expand_channels, + mutable_out_channels=self.last_mutable_channels) + + def forward(self, x): + x = self.first_conv(x) + for _, layer in enumerate(self.layers): + x = layer(x) + + return 
tuple([x]) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/data/subnet1.yaml b/cv/distiller/CWD/pytorch/mmrazor/tests/data/subnet1.yaml new file mode 100644 index 0000000000000000000000000000000000000000..f7886351b81c8913bb0d9cded6ed9b524fdf5fa8 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/data/subnet1.yaml @@ -0,0 +1,24 @@ +op1.mutable_in_channels: + current_choice: 3 + origin_channels: 3 +op1.mutable_out_channels: + current_choice: 4 + origin_channels: 8 +bn1.mutable_num_features: + current_choice: 4 + origin_channels: 8 +op2.mutable_in_channels: + current_choice: 4 + origin_channels: 8 +op2.mutable_out_channels: + current_choice: 4 + origin_channels: 8 +bn2.mutable_num_features: + current_choice: 4 + origin_channels: 8 +op3.mutable_in_channels: + current_choice: 4 + origin_channels: 8 +op3.mutable_out_channels: + current_choice: 8 + origin_channels: 8 \ No newline at end of file diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/data/subnet2.yaml b/cv/distiller/CWD/pytorch/mmrazor/tests/data/subnet2.yaml new file mode 100644 index 0000000000000000000000000000000000000000..bd49b2c7da309b001a5ffdba81eb31135ace9a64 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/data/subnet2.yaml @@ -0,0 +1,24 @@ +op1.mutable_in_channels: + current_choice: 3 + origin_channels: 3 +op1.mutable_out_channels: + current_choice: 8 + origin_channels: 8 +bn1.mutable_num_features: + current_choice: 8 + origin_channels: 8 +op2.mutable_in_channels: + current_choice: 8 + origin_channels: 8 +op2.mutable_out_channels: + current_choice: 8 + origin_channels: 8 +bn2.mutable_num_features: + current_choice: 8 + origin_channels: 8 +op3.mutable_in_channels: + current_choice: 8 + origin_channels: 8 +op3.mutable_out_channels: + current_choice: 8 + origin_channels: 8 \ No newline at end of file diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/data/test_models/test_algorithm/MBV2_220M.yaml 
b/cv/distiller/CWD/pytorch/mmrazor/tests/data/test_models/test_algorithm/MBV2_220M.yaml new file mode 100644 index 0000000000000000000000000000000000000000..b96ebeb493d029920429902103d5ca572f958d5a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/data/test_models/test_algorithm/MBV2_220M.yaml @@ -0,0 +1,474 @@ +backbone.conv1.bn.mutable_num_features: + current_choice: 8 + origin_channels: 48 +backbone.conv1.conv.mutable_in_channels: + current_choice: 3 + origin_channels: 3 +backbone.conv1.conv.mutable_out_channels: + current_choice: 8 + origin_channels: 48 +backbone.conv2.bn.mutable_num_features: + current_choice: 1920 + origin_channels: 1920 +backbone.conv2.conv.mutable_in_channels: + current_choice: 280 + origin_channels: 480 +backbone.conv2.conv.mutable_out_channels: + current_choice: 1920 + origin_channels: 1920 +backbone.layer1.0.conv.0.bn.mutable_num_features: + current_choice: 8 + origin_channels: 48 +backbone.layer1.0.conv.0.conv.mutable_in_channels: + current_choice: 8 + origin_channels: 48 +backbone.layer1.0.conv.0.conv.mutable_out_channels: + current_choice: 8 + origin_channels: 48 +backbone.layer1.0.conv.1.bn.mutable_num_features: + current_choice: 8 + origin_channels: 24 +backbone.layer1.0.conv.1.conv.mutable_in_channels: + current_choice: 8 + origin_channels: 48 +backbone.layer1.0.conv.1.conv.mutable_out_channels: + current_choice: 8 + origin_channels: 24 +backbone.layer2.0.conv.0.bn.mutable_num_features: + current_choice: 96 + origin_channels: 144 +backbone.layer2.0.conv.0.conv.mutable_in_channels: + current_choice: 8 + origin_channels: 24 +backbone.layer2.0.conv.0.conv.mutable_out_channels: + current_choice: 96 + origin_channels: 144 +backbone.layer2.0.conv.1.bn.mutable_num_features: + current_choice: 96 + origin_channels: 144 +backbone.layer2.0.conv.1.conv.mutable_in_channels: + current_choice: 96 + origin_channels: 144 +backbone.layer2.0.conv.1.conv.mutable_out_channels: + current_choice: 96 + origin_channels: 144 
+backbone.layer2.0.conv.2.bn.mutable_num_features: + current_choice: 16 + origin_channels: 40 +backbone.layer2.0.conv.2.conv.mutable_in_channels: + current_choice: 96 + origin_channels: 144 +backbone.layer2.0.conv.2.conv.mutable_out_channels: + current_choice: 16 + origin_channels: 40 +backbone.layer2.1.conv.0.bn.mutable_num_features: + current_choice: 96 + origin_channels: 240 +backbone.layer2.1.conv.0.conv.mutable_in_channels: + current_choice: 16 + origin_channels: 40 +backbone.layer2.1.conv.0.conv.mutable_out_channels: + current_choice: 96 + origin_channels: 240 +backbone.layer2.1.conv.1.bn.mutable_num_features: + current_choice: 96 + origin_channels: 240 +backbone.layer2.1.conv.1.conv.mutable_in_channels: + current_choice: 96 + origin_channels: 240 +backbone.layer2.1.conv.1.conv.mutable_out_channels: + current_choice: 96 + origin_channels: 240 +backbone.layer2.1.conv.2.bn.mutable_num_features: + current_choice: 16 + origin_channels: 40 +backbone.layer2.1.conv.2.conv.mutable_in_channels: + current_choice: 96 + origin_channels: 240 +backbone.layer2.1.conv.2.conv.mutable_out_channels: + current_choice: 16 + origin_channels: 40 +backbone.layer3.0.conv.0.bn.mutable_num_features: + current_choice: 96 + origin_channels: 240 +backbone.layer3.0.conv.0.conv.mutable_in_channels: + current_choice: 16 + origin_channels: 40 +backbone.layer3.0.conv.0.conv.mutable_out_channels: + current_choice: 96 + origin_channels: 240 +backbone.layer3.0.conv.1.bn.mutable_num_features: + current_choice: 96 + origin_channels: 240 +backbone.layer3.0.conv.1.conv.mutable_in_channels: + current_choice: 96 + origin_channels: 240 +backbone.layer3.0.conv.1.conv.mutable_out_channels: + current_choice: 96 + origin_channels: 240 +backbone.layer3.0.conv.2.bn.mutable_num_features: + current_choice: 24 + origin_channels: 48 +backbone.layer3.0.conv.2.conv.mutable_in_channels: + current_choice: 96 + origin_channels: 240 +backbone.layer3.0.conv.2.conv.mutable_out_channels: + current_choice: 24 + 
origin_channels: 48 +backbone.layer3.1.conv.0.bn.mutable_num_features: + current_choice: 144 + origin_channels: 288 +backbone.layer3.1.conv.0.conv.mutable_in_channels: + current_choice: 24 + origin_channels: 48 +backbone.layer3.1.conv.0.conv.mutable_out_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer3.1.conv.1.bn.mutable_num_features: + current_choice: 144 + origin_channels: 288 +backbone.layer3.1.conv.1.conv.mutable_in_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer3.1.conv.1.conv.mutable_out_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer3.1.conv.2.bn.mutable_num_features: + current_choice: 24 + origin_channels: 48 +backbone.layer3.1.conv.2.conv.mutable_in_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer3.1.conv.2.conv.mutable_out_channels: + current_choice: 24 + origin_channels: 48 +backbone.layer3.2.conv.0.bn.mutable_num_features: + current_choice: 144 + origin_channels: 288 +backbone.layer3.2.conv.0.conv.mutable_in_channels: + current_choice: 24 + origin_channels: 48 +backbone.layer3.2.conv.0.conv.mutable_out_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer3.2.conv.1.bn.mutable_num_features: + current_choice: 144 + origin_channels: 288 +backbone.layer3.2.conv.1.conv.mutable_in_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer3.2.conv.1.conv.mutable_out_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer3.2.conv.2.bn.mutable_num_features: + current_choice: 24 + origin_channels: 48 +backbone.layer3.2.conv.2.conv.mutable_in_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer3.2.conv.2.conv.mutable_out_channels: + current_choice: 24 + origin_channels: 48 +backbone.layer4.0.conv.0.bn.mutable_num_features: + current_choice: 144 + origin_channels: 288 +backbone.layer4.0.conv.0.conv.mutable_in_channels: + current_choice: 24 + origin_channels: 48 +backbone.layer4.0.conv.0.conv.mutable_out_channels: + 
current_choice: 144 + origin_channels: 288 +backbone.layer4.0.conv.1.bn.mutable_num_features: + current_choice: 144 + origin_channels: 288 +backbone.layer4.0.conv.1.conv.mutable_in_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer4.0.conv.1.conv.mutable_out_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer4.0.conv.2.bn.mutable_num_features: + current_choice: 48 + origin_channels: 96 +backbone.layer4.0.conv.2.conv.mutable_in_channels: + current_choice: 144 + origin_channels: 288 +backbone.layer4.0.conv.2.conv.mutable_out_channels: + current_choice: 48 + origin_channels: 96 +backbone.layer4.1.conv.0.bn.mutable_num_features: + current_choice: 288 + origin_channels: 576 +backbone.layer4.1.conv.0.conv.mutable_in_channels: + current_choice: 48 + origin_channels: 96 +backbone.layer4.1.conv.0.conv.mutable_out_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.1.conv.1.bn.mutable_num_features: + current_choice: 288 + origin_channels: 576 +backbone.layer4.1.conv.1.conv.mutable_in_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.1.conv.1.conv.mutable_out_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.1.conv.2.bn.mutable_num_features: + current_choice: 48 + origin_channels: 96 +backbone.layer4.1.conv.2.conv.mutable_in_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.1.conv.2.conv.mutable_out_channels: + current_choice: 48 + origin_channels: 96 +backbone.layer4.2.conv.0.bn.mutable_num_features: + current_choice: 288 + origin_channels: 576 +backbone.layer4.2.conv.0.conv.mutable_in_channels: + current_choice: 48 + origin_channels: 96 +backbone.layer4.2.conv.0.conv.mutable_out_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.2.conv.1.bn.mutable_num_features: + current_choice: 288 + origin_channels: 576 +backbone.layer4.2.conv.1.conv.mutable_in_channels: + current_choice: 288 + origin_channels: 576 
+backbone.layer4.2.conv.1.conv.mutable_out_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.2.conv.2.bn.mutable_num_features: + current_choice: 48 + origin_channels: 96 +backbone.layer4.2.conv.2.conv.mutable_in_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.2.conv.2.conv.mutable_out_channels: + current_choice: 48 + origin_channels: 96 +backbone.layer4.3.conv.0.bn.mutable_num_features: + current_choice: 288 + origin_channels: 576 +backbone.layer4.3.conv.0.conv.mutable_in_channels: + current_choice: 48 + origin_channels: 96 +backbone.layer4.3.conv.0.conv.mutable_out_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.3.conv.1.bn.mutable_num_features: + current_choice: 288 + origin_channels: 576 +backbone.layer4.3.conv.1.conv.mutable_in_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.3.conv.1.conv.mutable_out_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.3.conv.2.bn.mutable_num_features: + current_choice: 48 + origin_channels: 96 +backbone.layer4.3.conv.2.conv.mutable_in_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer4.3.conv.2.conv.mutable_out_channels: + current_choice: 48 + origin_channels: 96 +backbone.layer5.0.conv.0.bn.mutable_num_features: + current_choice: 288 + origin_channels: 576 +backbone.layer5.0.conv.0.conv.mutable_in_channels: + current_choice: 48 + origin_channels: 96 +backbone.layer5.0.conv.0.conv.mutable_out_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer5.0.conv.1.bn.mutable_num_features: + current_choice: 288 + origin_channels: 576 +backbone.layer5.0.conv.1.conv.mutable_in_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer5.0.conv.1.conv.mutable_out_channels: + current_choice: 288 + origin_channels: 576 +backbone.layer5.0.conv.2.bn.mutable_num_features: + current_choice: 64 + origin_channels: 144 +backbone.layer5.0.conv.2.conv.mutable_in_channels: + current_choice: 
288 + origin_channels: 576 +backbone.layer5.0.conv.2.conv.mutable_out_channels: + current_choice: 64 + origin_channels: 144 +backbone.layer5.1.conv.0.bn.mutable_num_features: + current_choice: 432 + origin_channels: 864 +backbone.layer5.1.conv.0.conv.mutable_in_channels: + current_choice: 64 + origin_channels: 144 +backbone.layer5.1.conv.0.conv.mutable_out_channels: + current_choice: 432 + origin_channels: 864 +backbone.layer5.1.conv.1.bn.mutable_num_features: + current_choice: 432 + origin_channels: 864 +backbone.layer5.1.conv.1.conv.mutable_in_channels: + current_choice: 432 + origin_channels: 864 +backbone.layer5.1.conv.1.conv.mutable_out_channels: + current_choice: 432 + origin_channels: 864 +backbone.layer5.1.conv.2.bn.mutable_num_features: + current_choice: 64 + origin_channels: 144 +backbone.layer5.1.conv.2.conv.mutable_in_channels: + current_choice: 432 + origin_channels: 864 +backbone.layer5.1.conv.2.conv.mutable_out_channels: + current_choice: 64 + origin_channels: 144 +backbone.layer5.2.conv.0.bn.mutable_num_features: + current_choice: 432 + origin_channels: 864 +backbone.layer5.2.conv.0.conv.mutable_in_channels: + current_choice: 64 + origin_channels: 144 +backbone.layer5.2.conv.0.conv.mutable_out_channels: + current_choice: 432 + origin_channels: 864 +backbone.layer5.2.conv.1.bn.mutable_num_features: + current_choice: 432 + origin_channels: 864 +backbone.layer5.2.conv.1.conv.mutable_in_channels: + current_choice: 432 + origin_channels: 864 +backbone.layer5.2.conv.1.conv.mutable_out_channels: + current_choice: 432 + origin_channels: 864 +backbone.layer5.2.conv.2.bn.mutable_num_features: + current_choice: 64 + origin_channels: 144 +backbone.layer5.2.conv.2.conv.mutable_in_channels: + current_choice: 432 + origin_channels: 864 +backbone.layer5.2.conv.2.conv.mutable_out_channels: + current_choice: 64 + origin_channels: 144 +backbone.layer6.0.conv.0.bn.mutable_num_features: + current_choice: 648 + origin_channels: 864 
+backbone.layer6.0.conv.0.conv.mutable_in_channels: + current_choice: 64 + origin_channels: 144 +backbone.layer6.0.conv.0.conv.mutable_out_channels: + current_choice: 648 + origin_channels: 864 +backbone.layer6.0.conv.1.bn.mutable_num_features: + current_choice: 648 + origin_channels: 864 +backbone.layer6.0.conv.1.conv.mutable_in_channels: + current_choice: 648 + origin_channels: 864 +backbone.layer6.0.conv.1.conv.mutable_out_channels: + current_choice: 648 + origin_channels: 864 +backbone.layer6.0.conv.2.bn.mutable_num_features: + current_choice: 176 + origin_channels: 240 +backbone.layer6.0.conv.2.conv.mutable_in_channels: + current_choice: 648 + origin_channels: 864 +backbone.layer6.0.conv.2.conv.mutable_out_channels: + current_choice: 176 + origin_channels: 240 +backbone.layer6.1.conv.0.bn.mutable_num_features: + current_choice: 720 + origin_channels: 1440 +backbone.layer6.1.conv.0.conv.mutable_in_channels: + current_choice: 176 + origin_channels: 240 +backbone.layer6.1.conv.0.conv.mutable_out_channels: + current_choice: 720 + origin_channels: 1440 +backbone.layer6.1.conv.1.bn.mutable_num_features: + current_choice: 720 + origin_channels: 1440 +backbone.layer6.1.conv.1.conv.mutable_in_channels: + current_choice: 720 + origin_channels: 1440 +backbone.layer6.1.conv.1.conv.mutable_out_channels: + current_choice: 720 + origin_channels: 1440 +backbone.layer6.1.conv.2.bn.mutable_num_features: + current_choice: 176 + origin_channels: 240 +backbone.layer6.1.conv.2.conv.mutable_in_channels: + current_choice: 720 + origin_channels: 1440 +backbone.layer6.1.conv.2.conv.mutable_out_channels: + current_choice: 176 + origin_channels: 240 +backbone.layer6.2.conv.0.bn.mutable_num_features: + current_choice: 720 + origin_channels: 1440 +backbone.layer6.2.conv.0.conv.mutable_in_channels: + current_choice: 176 + origin_channels: 240 +backbone.layer6.2.conv.0.conv.mutable_out_channels: + current_choice: 720 + origin_channels: 1440 +backbone.layer6.2.conv.1.bn.mutable_num_features: 
+ current_choice: 720 + origin_channels: 1440 +backbone.layer6.2.conv.1.conv.mutable_in_channels: + current_choice: 720 + origin_channels: 1440 +backbone.layer6.2.conv.1.conv.mutable_out_channels: + current_choice: 720 + origin_channels: 1440 +backbone.layer6.2.conv.2.bn.mutable_num_features: + current_choice: 176 + origin_channels: 240 +backbone.layer6.2.conv.2.conv.mutable_in_channels: + current_choice: 720 + origin_channels: 1440 +backbone.layer6.2.conv.2.conv.mutable_out_channels: + current_choice: 176 + origin_channels: 240 +backbone.layer7.0.conv.0.bn.mutable_num_features: + current_choice: 1440 + origin_channels: 1440 +backbone.layer7.0.conv.0.conv.mutable_in_channels: + current_choice: 176 + origin_channels: 240 +backbone.layer7.0.conv.0.conv.mutable_out_channels: + current_choice: 1440 + origin_channels: 1440 +backbone.layer7.0.conv.1.bn.mutable_num_features: + current_choice: 1440 + origin_channels: 1440 +backbone.layer7.0.conv.1.conv.mutable_in_channels: + current_choice: 1440 + origin_channels: 1440 +backbone.layer7.0.conv.1.conv.mutable_out_channels: + current_choice: 1440 + origin_channels: 1440 +backbone.layer7.0.conv.2.bn.mutable_num_features: + current_choice: 280 + origin_channels: 480 +backbone.layer7.0.conv.2.conv.mutable_in_channels: + current_choice: 1440 + origin_channels: 1440 +backbone.layer7.0.conv.2.conv.mutable_out_channels: + current_choice: 280 + origin_channels: 480 +head.fc.mutable_in_features: + current_choice: 1920 + origin_channels: 1920 +head.fc.mutable_out_features: + current_choice: 1000 + origin_channels: 1000 \ No newline at end of file diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/data/test_models/test_mutator/subnet1.json b/cv/distiller/CWD/pytorch/mmrazor/tests/data/test_models/test_mutator/subnet1.json new file mode 100644 index 0000000000000000000000000000000000000000..2fed960b2e5c1d5ba178635f0b49fb17feca5e00 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/data/test_models/test_mutator/subnet1.json @@ 
-0,0 +1,15 @@ +{ + "op1_(0, 8)_8": { + "init_args":{ + "num_channels":8, + "divisor":1, + "min_value":1, + "min_ratio":0.9, + "candidate_choices":[ + 6 + ], + "choice_mode":"number" + }, + "choice":6 + } +} diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/data/test_models/test_subnet/mockmodel_subnet.yaml b/cv/distiller/CWD/pytorch/mmrazor/tests/data/test_models/test_subnet/mockmodel_subnet.yaml new file mode 100644 index 0000000000000000000000000000000000000000..b7262da5b6daffa8791be8dc23040f0c0839811b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/data/test_models/test_subnet/mockmodel_subnet.yaml @@ -0,0 +1,6 @@ +mutable1: + chosen: conv1 +mutable2: + chosen: conv2 +mutable3.0.kernel_size: + chosen: 3 diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/data/test_models/test_task_modules/mmcls_cfg.py b/cv/distiller/CWD/pytorch/mmrazor/tests/data/test_models/test_task_modules/mmcls_cfg.py new file mode 100644 index 0000000000000000000000000000000000000000..117b9383e228be4202aa7f13ea3ba7a10b2b5a1a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/data/test_models/test_task_modules/mmcls_cfg.py @@ -0,0 +1,2 @@ +# Copyright (c) OpenMMLab. All rights reserved. +_base_ = ['mmcls::resnet/resnet18_8xb32_in1k.py'] \ No newline at end of file diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/data/test_registry/registry_architecture_config.py b/cv/distiller/CWD/pytorch/mmrazor/tests/data/test_registry/registry_architecture_config.py new file mode 100644 index 0000000000000000000000000000000000000000..d5d2204754f156a8ba967df8806d375d88c6b67a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/data/test_registry/registry_architecture_config.py @@ -0,0 +1,14 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from platform import architecture + + +supernet = dict( + type='MockModel', +) + +model = dict( + type='MockAlgorithm', + architecture=supernet, + _return_architecture_ = True, +) + diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/data/test_registry/registry_subnet_config.py b/cv/distiller/CWD/pytorch/mmrazor/tests/data/test_registry/registry_subnet_config.py new file mode 100644 index 0000000000000000000000000000000000000000..7311c1e240a8c33145ee491490b1c539c2762d1e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/data/test_registry/registry_subnet_config.py @@ -0,0 +1,17 @@ +# Copyright (c) OpenMMLab. All rights reserved. +supernet = dict( + type='mmrazor.sub_model', + cfg=dict( + type='MockModel', + ), + fix_subnet = { + 'backbone.mutable1': {'chosen':'conv1'}, + 'backbone.mutable2': {'chosen':'conv2'}, + }, + extra_prefix='backbone.' +) + +model = dict( + type='MockAlgorithm', + architecture=supernet +) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/data/test_registry/subnet.json b/cv/distiller/CWD/pytorch/mmrazor/tests/data/test_registry/subnet.json new file mode 100644 index 0000000000000000000000000000000000000000..4fe63bda23479f26a70d6f3de6fda73764019a61 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/data/test_registry/subnet.json @@ -0,0 +1,141 @@ +{ + "type":"DCFFChannelMutator", + "channel_unit_cfg":{ + "type":"DCFFChannelUnit", + "default_args":{ + "choice_mode":"ratio" + }, + "units":{ + "backbone.conv1_(0, 64)_64":{ + "init_args":{ + "num_channels":64, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":1.0 + }, + "backbone.layer1.0.conv1_(0, 64)_64":{ + "init_args":{ + "num_channels":64, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.640625 + }, + "backbone.layer1.1.conv1_(0, 64)_64":{ + "init_args":{ + "num_channels":64, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.640625 + }, + 
"backbone.layer2.0.conv1_(0, 128)_128":{ + "init_args":{ + "num_channels":128, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.6484375 + }, + "backbone.layer2.0.conv2_(0, 128)_128":{ + "init_args":{ + "num_channels":128, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.59375 + }, + "backbone.layer2.1.conv1_(0, 128)_128":{ + "init_args":{ + "num_channels":128, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.6484375 + }, + "backbone.layer3.0.conv1_(0, 256)_256":{ + "init_args":{ + "num_channels":256, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.6484375 + }, + "backbone.layer3.0.conv2_(0, 256)_256":{ + "init_args":{ + "num_channels":256, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.59765625 + }, + "backbone.layer3.1.conv1_(0, 256)_256":{ + "init_args":{ + "num_channels":256, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.6484375 + }, + "backbone.layer4.0.conv1_(0, 512)_512":{ + "init_args":{ + "num_channels":512, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.69921875 + }, + "backbone.layer4.0.conv2_(0, 512)_512":{ + "init_args":{ + "num_channels":512, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.69921875 + }, + "backbone.layer4.1.conv1_(0, 512)_512":{ + "init_args":{ + "num_channels":512, + "choice_mode":"ratio", + "divisor":1, + "min_value":1, + "min_ratio":0.9 + }, + "choice":0.69921875 + } + } + }, + "parse_cfg":{ + "type":"ChannelAnalyzer", + "demo_input":[ + 1, + 3, + 224, + 224 + ], + "tracer_type":"BackwardTracer" + } +} \ No newline at end of file diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/data/tracer_passed_models.py b/cv/distiller/CWD/pytorch/mmrazor/tests/data/tracer_passed_models.py 
new file mode 100644 index 0000000000000000000000000000000000000000..ade282141a74dc1c1762567740c1b3ef6da97ac1 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/data/tracer_passed_models.py @@ -0,0 +1,520 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from .model_library import (MMClsModelLibrary, MMDetModelLibrary, + DefaultModelLibrary, TorchModelLibrary, + MMPoseModelLibrary, MMSegModelLibrary) + + +class PassedModelManager: + + def __init__(self) -> None: + pass + + def include_models(self, full_test=False): + models = [] + for library in self.libraries(full_test): + models.extend(library.include_models()) + return models + + def uninclude_models(self, full_test=False): + models = [] + for library in self.libraries(full_test): + models.extend(library.uninclude_models()) + return models + + def libraries(self, full=False): + return [] + + +class FxPassedModelManager(PassedModelManager): + + _default_library = None + _torch_library = None + _mmcls_library = None + _mmseg_library = None + _mmdet_library = None + _mmpose_library = None + + def libraries(self, full=False): + if full: + return [ + self.__class__.default_library(), + self.__class__.torch_library(), + self.__class__.mmcls_library(), + self.__class__.mmseg_library(), + self.__class__.mmdet_library(), + self.__class__.mmpose_library(), + ] + else: + return [self.__class__.default_library()] + + @classmethod + def default_library(cls): + if cls._default_library is None: + cls._default_library = DefaultModelLibrary(include=[ + 'SingleLineModel', + 'ResBlock', + 'AddCatModel', + 'ConcatModel', + 'MultiConcatModel', + 'MultiConcatModel2', + 'GroupWiseConvModel', + 'Xmodel', + 'MultipleUseModel', + 'Icep', + 'ExpandLineModel', + 'MultiBindModel', + 'DwConvModel', + 'ConvAttnModel', + # mm models + 'resnet', + 'pspnet', + 'yolo' + ],with_mm_models=True) + + return cls._default_library + + @classmethod + def torch_library(cls): + """ + googlenet: return a tuple when training, so it should + trace 
in eval mode + """ + torch_includes = [ + 'resnext', + 'efficientnet', + 'inception', + 'wide', + 'resnet', + 'regnet', + 'shufflenet', + 'mnasnet', + 'vit', + 'convnext', + 'googlenet', + 'densenet', + 'swin', + 'vgg', + 'mobilenet', + 'squeezenet', + 'alexnet', + ] + if cls._torch_library is None: + cls._torch_library = TorchModelLibrary(include=torch_includes) + return cls._torch_library + + @classmethod + def mmcls_library(cls): + """ + shufflenet consists of chunk operations. + resnest: resnest has two problems. First it uses *x.shape() which is + not tracerable using fx tracer. Second, it uses channel folding. + res2net: res2net consists of split operations. + convnext: consist of layernorm. + """ + mmcls_include = [ + 'tnt', + 'resnet', + 'resnetv1c', + 'mobileone', + 'mlp', + 'densenet', + 'hrnet', + 'seresnet', + 'van', + 'repmlp', + 'repvgg', + 'vgg', + 'vgg11bn', + 'edgenext', + 'vgg19bn', + 'wide', + 'res2net', + 'vgg13bn', + 'resnetv1d', + 'mobilenet', + 'convmixer', + 'resnest', + 'inception', + 'resnext', + 'twins', + 'vgg16bn', + 'shufflenet', + 'conformer', + 'regnet', + 'seresnext', + 'vit', + 'poolformer', + 't2t', + 'efficientnet', + ## error + # 'deit', + # 'swin', + # 'convnext', + # 'mvit' + ] + if cls._mmcls_library is None: + cls._mmcls_library = MMClsModelLibrary(include=mmcls_include) + return cls._mmcls_library + + @classmethod + def mmdet_library(cls): + mmdet_include = [ + 'pafpn', + 'gn+ws', + 'paa', + 'fcos', + 'autoassign', + 'centripetalnet', + 'retinanet', + 'cornernet', + 'gn', + 'instaboost', + 'rpn', + 'fpg', + 'crowddet', + 'resnest', + 'pvt', + 'solo', + 'grid', + 'free', + 'point', + 'yolo', + 'double', + 'dynamic', + 'maskformer', + 'scratch', + 'nas', + 'yolof', + 'faster', + 'atss', + 'yolox', + 'fsaf', + 'ghm', + 'centernet', + 'seesaw', + 'regnet', + 'cityscapes', + 'lvis', + 'sabl', + 'gfl', + 'tridentnet', + 'selfsup', + 'deepfashion', + 'efficientnet', + 'foveabox', + 'mask', + ## errors + # 'timm', + # 'swin', + # 
'dyhead', + # 'hrnet', + # 'deformable', + # 'ssd', + # 'empirical', + # 'detectors', + # 'reppoints', + # 'scnet', + # 'legacy', + # 'htc', + # 'dcnv', + # 'carafe', + # 'yolact', + # 'panoptic', + # 'misc', + # 'rtmdet', + # 'pascal', + # 'ddod', + # 'mask2former', + # 'tood', + # 'queryinst', + # 'simple', + # 'pisa', + # 'fast', + # 'cascade', + # 'wider', + # 'openimages', + # '', + # 'strong', + # 'res2net', + # 'libra', + # 'vfnet', + # 'soft', + # 'sparse', + # 'gcnet', + # 'convnext', + # 'ms', + # 'dcn', + # 'guided', + # 'groie', + # 'solov', + # 'detr', + ] + if cls._mmdet_library is None: + cls._mmdet_library = MMDetModelLibrary(mmdet_include) + return cls._mmdet_library + + @classmethod + def mmseg_library(cls): + # a common error: unet related models + include = [ + 'bisenetv', + 'erfnet', + 'dmnet', + 'twins', + 'segformer', + 'isanet', + 'vit', + 'resnest', + 'setr', + 'cgnet', + 'stdc', + 'dpt', + 'pspnet', + 'upernet', + 'apcnet', + 'gcnet', + 'ann', + 'ocrnet', + 'ccnet', + 'deeplabv', + 'dnlnet', + 'point', + 'fastscnn', + 'psanet', + 'segmenter', + 'danet', + 'emanet', + 'icnet', + 'unet', + 'fcn', + 'swin', + 'nonlocal', + 'deeplabv3plus', + 'sem', + ## errors + # 'mobilenet', + # 'mae', + # 'knet', + # 'poolformer', + # 'beit', + # 'encnet', + # 'hrnet', + # 'convnext', + # 'fastfcn' + ] + if cls._mmseg_library is None: + cls._mmseg_library = MMSegModelLibrary(include=include) + return cls._mmseg_library + + + @classmethod + def mmpose_library(cls): + mmpose_include = [ + 'hand', + 'face', + 'wholebody', + 'body', + 'animal', + ] + if cls._mmpose_library is None: + cls._mmpose_library = MMPoseModelLibrary(include=mmpose_include) + + return cls._mmpose_library + + # for backward tracer + + +class BackwardPassedModelManager(PassedModelManager): + + _default_library = None + _torch_library = None + _mmcls_library = None + _mmseg_library = None + _mmdet_library = None + _mmpose_library = None + + + def libraries(self, full=False): + if full: + 
return [ + self.__class__.default_library(), + self.__class__.torch_library(), + self.__class__.mmcls_library(), + self.__class__.mmseg_library(), + self.__class__.mmdet_library(), + self.__class__.mmpose_library(), + ] + else: + return [self.__class__.default_library()] + + @classmethod + def default_library(cls): + if cls._default_library is None: + cls._default_library = DefaultModelLibrary(include=[ + 'SingleLineModel', + 'ResBlock', + 'AddCatModel', + 'ConcatModel', + 'MultiConcatModel', + 'MultiConcatModel2', + 'GroupWiseConvModel', + 'Xmodel', + # 'MultipleUseModel', # bug + 'Icep', + 'ExpandLineModel', + 'MultiBindModel', + 'DwConvModel', + 'ConvAttnModel', + ]) + return cls._default_library + + @classmethod + def torch_library(cls): + """ + googlenet return a tuple when training, so it + should trace in eval mode + """ + + torch_includes = [ + 'alexnet', + 'densenet', + 'efficientnet', + 'googlenet', + 'inception', + 'mnasnet', + 'mobilenet', + 'regnet', + 'resnet', + 'resnext', + # 'shufflenet', # bug + 'squeezenet', + 'vgg', + 'wide_resnet', + # "vit", + # "swin", + # "convnext" + ] + if cls._torch_library is None: + cls._torch_library = TorchModelLibrary(include=torch_includes) + return cls._torch_library + + @classmethod + def mmcls_library(cls): + """ + shufflenet consists of chunk operations. + resnest: resnest has two problems. First it uses *x.shape() which is + not tracerable using fx tracer. Second, it uses channel folding. + res2net: res2net consists of split operations. + convnext: consist of layernorm. 
+ """ + mmcls_model_include = [ + 'vgg', + 'efficientnet', + 'resnet', + 'mobilenet', + 'resnext', + 'wide-resnet', + # 'shufflenet', # bug + 'hrnet', + # 'resnest', # bug + 'inception', + # 'res2net', # bug + 'densenet', + # 'convnext', # bug + 'regnet', + # 'van', # bug + # 'swin_transformer', # bug + # 'convmixer', # bug + # 't2t', # bug + # 'twins', # bug + # 'repmlp', # bug + # 'tnt', # bug + # 't2t', # bug + # 'mlp_mixer', # bug + # 'conformer', # bug + # 'poolformer', # bug + # 'vit', # bug + # 'efficientformer', + # 'mobileone', + # 'edgenext' + ] + mmcls_exclude = ['cutmix', 'cifar', 'gem'] + if cls._mmcls_library is None: + cls._mmcls_library = MMClsModelLibrary( + include=mmcls_model_include, exclude=mmcls_exclude) + return cls._mmcls_library + + @classmethod + def mmdet_library(cls): + mmdet_include = [ + # 'rpn', # + # 'faster-rcnn', + # 'cascade-rcnn', + # 'fast-rcnn', # mmdet has bug. + # 'retinanet', + # 'mask-rcnn', + # 'ssd300' + ] + if cls._mmdet_library is None: + cls._mmdet_library = MMDetModelLibrary(mmdet_include) + return cls._mmdet_library + + @classmethod + def mmseg_library(cls): + include = [ + # 'cgnet', + # 'gcnet', + # 'setr', + # 'deeplabv3', + # 'twins', + # 'fastfcn', + # 'fpn', + # 'upernet', + # 'dnl', + # 'icnet', + # 'segmenter', + # 'encnet', + # 'erfnet', + # 'segformer', + # 'apcnet', + # 'fast', + # 'ocrnet', + # 'lraspp', + # 'dpt', + # 'fcn', + # 'psanet', + # 'bisenetv2', + # 'pointrend', + # 'ccnet', + 'pspnet', + # 'dmnet', + # 'stdc', + # 'ann', + # 'nonlocal', + # 'isanet', + # 'danet', + # 'emanet', + # 'deeplabv3plus', + # 'bisenetv1', + ] + if cls._mmseg_library is None: + cls._mmseg_library = MMSegModelLibrary(include=include) + return cls._mmseg_library + + @classmethod + def mmpose_library(cls): + mmpose_include = [ + 'hand', + 'face', + 'wholebody', + 'body', + 'animal', + ] + + if cls._mmpose_library is None: + cls._mmpose_library = MMPoseModelLibrary(include=mmpose_include) + return cls._mmpose_library + + 
+fx_passed_library = FxPassedModelManager() +backward_passed_library = BackwardPassedModelManager() diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ef101fec61e72abc0eb90266d453b5b22331378d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/__init__.py @@ -0,0 +1 @@ +# Copyright (c) OpenMMLab. All rights reserved. diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_delivers/test_deliver_manager.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_delivers/test_deliver_manager.py new file mode 100644 index 0000000000000000000000000000000000000000..a476a7e3e92e5671ec44eef205797f4ada1b3e8a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_delivers/test_deliver_manager.py @@ -0,0 +1,60 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from unittest import TestCase + +from mmengine import ConfigDict + +from mmrazor.models.task_modules import DistillDeliveryManager + + +class TestDeliverManager(TestCase): + + def test_init(self): + + distill_deliveries = ConfigDict( + delivery1=dict( + type='MethodOutputs', + max_keep_data=2, + method_path='toy_module.ToyClass.random_int')) + + manager = DistillDeliveryManager(distill_deliveries) + self.assertEquals(len(manager.deliveries), 1) + + manager = DistillDeliveryManager() + self.assertEquals(len(manager.deliveries), 0) + + def test_context_manager(self): + from toy_module import ToyClass + + distill_deliveries = ConfigDict( + delivery1=dict( + type='MethodOutputs', + max_keep_data=2, + method_path='toy_module.ToyClass.random_int')) + + manager = DistillDeliveryManager(distill_deliveries) + + manager.override_data = False + self.assertFalse(manager.override_data) + with manager: + toy_class = ToyClass() + output1_tea = toy_class.random_int() + output2_tea = toy_class.random_int() + + with 
self.assertRaisesRegex(AssertionError, 'push into an full queue'): + with manager: + _ = toy_class.random_int() + + self.assertFalse(manager.override_data) + manager.override_data = True + self.assertTrue(manager.override_data) + with manager: + output1_stu = toy_class.random_int() + output2_stu = toy_class.random_int() + + # With ``DistillDeliverManager``, outputs of the teacher and + # the student are the same. + assert output1_stu == output1_tea and output2_stu == output2_tea + + with self.assertRaisesRegex(AssertionError, 'pop from an empty queue'): + with manager: + _ = toy_class.random_int() diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_delivers/test_function_outputs_deliver.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_delivers/test_function_outputs_deliver.py new file mode 100644 index 0000000000000000000000000000000000000000..531e5979558d5c1b473fb8da26bd9e51178ff5e9 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_delivers/test_function_outputs_deliver.py @@ -0,0 +1,163 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import logging +import os.path as osp +import tempfile +from unittest import TestCase +from unittest.mock import Mock + +import torch +import torch.nn as nn +from mmengine.evaluator import Evaluator +from mmengine.hooks import EMAHook +from mmengine.logging import MMLogger +from mmengine.model import BaseModel, ExponentialMovingAverage +from mmengine.optim import OptimWrapper +from mmengine.runner import Runner +from torch.utils.data import Dataset + +from mmrazor.models.task_modules import FunctionOutputsDelivery + + +class ToyModel(BaseModel): + + def __init__(self): + super().__init__() + self.linear = nn.Linear(2, 1) + # test FunctionOutputsDelivery when ema_hook is used + self.deliver = FunctionOutputsDelivery( + max_keep_data=2, func_path='toy_module.toy_func') + + def forward(self, inputs, data_sample, mode='tensor'): + labels = torch.stack(data_sample) + inputs = torch.stack(inputs) + with self.deliver: + outputs = self.linear(inputs) + if mode == 'tensor': + return outputs + elif mode == 'loss': + loss = (labels - outputs).sum() + outputs = dict(loss=loss) + return outputs + else: + return outputs + + +class DummyDataset(Dataset): + METAINFO = dict() # type: ignore + data = torch.randn(12, 2) + label = torch.ones(12) + + @property + def metainfo(self): + return self.METAINFO + + def __len__(self): + return self.data.size(0) + + def __getitem__(self, index): + return dict(inputs=self.data[index], data_sample=self.label[index]) + + +class TestFuncOutputsDeliver(TestCase): + + def setUp(self): + self.temp_dir = tempfile.TemporaryDirectory() + + def tearDown(self): + # `FileHandler` should be closed in Windows, otherwise we cannot + # delete the temporary directory + logging.shutdown() + MMLogger._instance_dict.clear() + self.temp_dir.cleanup() + + def test_init(self): + + with self.assertRaisesRegex(TypeError, 'func_path should be'): + _ = FunctionOutputsDelivery(max_keep_data=1, func_path=1) + + with self.assertRaisesRegex(AssertionError, 'func_path must 
have at '): + _ = FunctionOutputsDelivery(max_keep_data=1, func_path='toy_func') + + def test_context_manager(self): + import toy_module + + delivery = FunctionOutputsDelivery(max_keep_data=2, func_path='aaa.bb') + with self.assertRaisesRegex(ImportError, 'aaa is not imported'): + with delivery: + _ = toy_module.toy_func() + + delivery = FunctionOutputsDelivery( + max_keep_data=1, func_path='toy_module.bb') + with self.assertRaisesRegex(AssertionError, 'bb is not in toy_mod'): + with delivery: + _ = toy_module.toy_func() + + delivery = FunctionOutputsDelivery( + max_keep_data=1, func_path='toy_module.TOY_VAR') + with self.assertRaisesRegex(TypeError, 'TOY_VAR should be'): + with delivery: + _ = toy_module.toy_func() + + delivery = FunctionOutputsDelivery( + max_keep_data=2, func_path='toy_module.toy_func') + + delivery.override_data = False + with delivery: + output1_tea = toy_module.toy_func() + output2_tea = toy_module.toy_func() + + with self.assertRaisesRegex(AssertionError, 'push into an full queue'): + with delivery: + _ = toy_module.toy_func() + + delivery.override_data = True + with delivery: + output1_stu = toy_module.toy_func() + output2_stu = toy_module.toy_func() + + # With ``FunctionOutputsDeliver``, outputs of the teacher and + # the student are the same. 
+ assert output1_stu == output1_tea and output2_stu == output2_tea + + with self.assertRaisesRegex(AssertionError, 'pop from an empty queue'): + with delivery: + _ = toy_module.toy_func() + + def test_ema_hook(self): + device = 'cuda:0' if torch.cuda.is_available() else 'cpu' + model = ToyModel().to(device) + evaluator = Evaluator([]) + evaluator.evaluate = Mock(return_value=dict(acc=0.5)) + runner = Runner( + model=model, + train_dataloader=dict( + dataset=DummyDataset(), + sampler=dict(type='DefaultSampler', shuffle=True), + batch_size=3, + num_workers=0), + val_dataloader=dict( + dataset=DummyDataset(), + sampler=dict(type='DefaultSampler', shuffle=False), + batch_size=3, + num_workers=0), + val_evaluator=evaluator, + work_dir=self.temp_dir.name, + default_scope='mmrazor', + optim_wrapper=OptimWrapper( + torch.optim.Adam(ToyModel().parameters())), + train_cfg=dict(by_epoch=True, max_epochs=2, val_interval=1), + val_cfg=dict(), + default_hooks=dict(logger=None), + custom_hooks=[dict(type='EMAHook', )], + experiment_name='test_func_outputs_deliver') + runner.train() + for hook in runner.hooks: + if isinstance(hook, EMAHook): + self.assertTrue( + isinstance(hook.ema_model, ExponentialMovingAverage)) + + self.assertTrue( + osp.exists(osp.join(self.temp_dir.name, 'epoch_2.pth'))) + checkpoint = torch.load(osp.join(self.temp_dir.name, 'epoch_2.pth')) + self.assertTrue('ema_state_dict' in checkpoint) + self.assertTrue(checkpoint['ema_state_dict']['steps'] == 8) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_delivers/test_method_outputs_deliver.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_delivers/test_method_outputs_deliver.py new file mode 100644 index 0000000000000000000000000000000000000000..a76967977ff8438b5ea9274a206cc1d073780354 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_delivers/test_method_outputs_deliver.py @@ -0,0 +1,70 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from unittest import TestCase + +from mmrazor.models.task_modules import MethodOutputsDelivery + + +class TestMethodOutputsDeliver(TestCase): + + def test_init(self): + with self.assertRaisesRegex(TypeError, 'method_path should be'): + _ = MethodOutputsDelivery(max_keep_data=1, method_path=1) + + with self.assertRaisesRegex(AssertionError, + 'method_path must have at '): + _ = MethodOutputsDelivery(max_keep_data=1, method_path='toy_func') + + with self.assertRaisesRegex(ImportError, 'aaa is not imported'): + _ = MethodOutputsDelivery(max_keep_data=1, method_path='aaa.bb.b') + + with self.assertRaisesRegex(AssertionError, 'bb is not in toy_module'): + _ = MethodOutputsDelivery( + max_keep_data=1, method_path='toy_module.bb.bbb') + + with self.assertRaisesRegex(TypeError, 'toy_func should be a type'): + _ = MethodOutputsDelivery( + max_keep_data=1, method_path='toy_module.toy_func.bbb') + + with self.assertRaisesRegex(AssertionError, 'bbb is not in'): + _ = MethodOutputsDelivery( + max_keep_data=1, method_path='toy_module.ToyClass.bbb') + + with self.assertRaisesRegex(TypeError, 'count should be'): + _ = MethodOutputsDelivery( + max_keep_data=1, method_path='toy_module.ToyClass.count') + + def test_context_manager(self): + from toy_module import ToyClass + + delivery = MethodOutputsDelivery( + max_keep_data=2, method_path='toy_module.ToyClass.random_int') + + # Without ``MethodOutputsDelivery``, outputs of the teacher and the + # student are very likely to be different. 
+ # from toy_module import ToyClass + # toy_class = ToyClass() + # output_tea = toy_class.random_int() + # output_stu = toy_class.random_int() + + delivery.override_data = False + with delivery: + toy_class = ToyClass() + output1_tea = toy_class.random_int() + output2_tea = toy_class.random_int() + + with self.assertRaisesRegex(AssertionError, 'push into an full queue'): + with delivery: + _ = toy_class.random_int() + + delivery.override_data = True + with delivery: + output1_stu = toy_class.random_int() + output2_stu = toy_class.random_int() + + # With ``MethodOutputsDeliver``, outputs of the teacher and the + # student are the same. + assert output1_stu == output1_tea and output2_stu == output2_tea + + with self.assertRaisesRegex(AssertionError, 'pop from an empty queue'): + with delivery: + _ = toy_class.random_int() diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_delivers/toy_module.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_delivers/toy_module.py new file mode 100644 index 0000000000000000000000000000000000000000..2bca28d52f27f242a233f668393ea7babc7f17a7 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_delivers/toy_module.py @@ -0,0 +1,25 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import random + +TOY_VAR = 'aaa' + + +def toy_func(): + return random.randint(0, 1000) + + +class ToyClass: + + def __init__(self): + self._count = 0 + + def random_int(self): + return random.randint(0, 1000) + + @property + def count(self): + return self._count + + def __call__(self): + self._count += 1 + return self._count diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_graph/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_graph/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ef101fec61e72abc0eb90266d453b5b22331378d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_graph/__init__.py @@ -0,0 +1 @@ +# Copyright (c) OpenMMLab. All rights reserved. diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_graph/test_channel_flow.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_graph/test_channel_flow.py new file mode 100644 index 0000000000000000000000000000000000000000..87dee6747aa55bf6d62ede34a7047133066ded42 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_graph/test_channel_flow.py @@ -0,0 +1,80 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import unittest + +from mmrazor.structures.graph.channel_flow import ChannelElem, ChannelTensor + + +class TestChannelTensor(unittest.TestCase): + + def test_union(self): + tensor1 = ChannelTensor(8) + tensor2 = ChannelTensor(8) + tensor3 = ChannelTensor(8) + tensor4 = ChannelTensor(8) + + ChannelTensor.union_two(tensor1, tensor2) + ChannelTensor.union_two(tensor3, tensor4) + self.assertUionedTensor(tensor1, tensor2) + self.assertUionedTensor(tensor3, tensor4) + + ChannelTensor.union_two(tensor1, tensor4) + + self.assertUionedTensor(tensor1, tensor2) + self.assertUionedTensor(tensor2, tensor3) + self.assertUionedTensor(tensor3, tensor4) + self.assertUionedTensor(tensor1, tensor4) + + def test_cat(self): + tensor1 = ChannelTensor(8) + tensor2 = ChannelTensor(8) + tensor3 = ChannelTensor(16) + + tensor_cat = ChannelTensor.cat([tensor1, tensor2]) + self.assertEqual(len(tensor_cat), 16) + ChannelTensor.union_two(tensor_cat, tensor3) + + tensor31 = tensor3[:8] + tensor32 = tensor3[8:] + self.assertUionedTensor(tensor1, tensor31) + self.assertUionedTensor(tensor2, tensor32) + + def test_add_cat(self): + """8+8 && 4+12 -> 4+4+8.""" + tensor1 = ChannelTensor(8) + tensor2 = ChannelTensor(8) + tensor_cat1 = ChannelTensor.cat([tensor1, tensor2]) + + tensor3 = ChannelTensor(4) + tensor4 = ChannelTensor(12) + tensor_cat2 = ChannelTensor.cat([tensor3, tensor4]) + + ChannelTensor.union_two(tensor_cat1, tensor_cat2) + self.assertUionedTensor(tensor_cat1, tensor_cat2) + + self.assertUionedTensor(tensor_cat1[0:4], tensor3[0:4]) + self.assertUionedTensor(tensor_cat1[4:8], tensor4[0:4]) + self.assertUionedTensor(tensor_cat1[8:16], tensor4[4:12]) + + self.assertUionedTensor(tensor_cat2[0:4], tensor1[0:4]) + self.assertUionedTensor(tensor_cat2[4:8], tensor1[4:8]) + self.assertUionedTensor(tensor_cat2[8:], tensor2) + + def assertUionedTensor(self, tensor1: ChannelTensor, + tensor2: ChannelTensor): + assert len(tensor1) == len(tensor2) + for e1, e2 in zip(tensor1, tensor2): + 
self.assertEqual(e1.root, e2.root) + + +class TestChannelElem(unittest.TestCase): + + def test_union(self): + tensor = ChannelTensor(10) + elem1 = tensor[1] + elem2 = tensor[2] + ChannelElem.union_two(elem1, elem2) + self.assertEqual(elem1.root, elem2.root) + + elem3 = tensor[3] + ChannelElem.union_two(elem2, elem3) + self.assertEqual(elem1.root, elem3.root) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_graph/test_channel_graph.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_graph/test_channel_graph.py new file mode 100644 index 0000000000000000000000000000000000000000..d6f1c3ffa250815bdd8bb39dfc36f03018863623 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_graph/test_channel_graph.py @@ -0,0 +1,74 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import unittest + +import torch +from torch import nn + +from mmrazor.models.task_modules import BackwardTracer +from mmrazor.registry import TASK_UTILS +from mmrazor.structures.graph import ModuleGraph +from mmrazor.structures.graph.channel_graph import ChannelGraph +from mmrazor.structures.graph.channel_nodes import \ + default_channel_node_converter +from ...data.models import SingleLineModel + +NodeMap = {} + + +@TASK_UTILS.register_module() +class ImageClassifierPseudoLossWithSixChannel: + """Calculate the pseudo loss to trace the topology of a `ImageClassifier` + in MMClassification with `BackwardTracer`.""" + + def __call__(self, model) -> torch.Tensor: + pseudo_img = torch.rand(1, 6, 224, 224) + pseudo_output = model(pseudo_img) + return sum(pseudo_output) + + +class TestChannelGraph(unittest.TestCase): + + def test_init(self): + model = SingleLineModel() + module_graph = ModuleGraph.init_from_backward_tracer(model) + + _ = ChannelGraph.copy_from(module_graph, + default_channel_node_converter) + + # def test_forward(self): + # for model_data in BackwardPassedModelManager.include_models( # noqa + # ): # noqa + # with self.subTest(model=model_data): + # 
model = model_data() + # module_graph = ModuleGraph.init_from_backward_tracer(model) + + # channel_graph = ChannelGraph.copy_from( + # module_graph, default_channel_node_converter) + # channel_graph.forward() + + # # units = channel_graph.collect_units() + # _ = channel_graph.generate_units_config() + + def test_forward_with_config_num_in_channel(self): + + class MyModel(nn.Module): + + def __init__(self) -> None: + super().__init__() + self.conv1 = nn.Conv2d(6, 3, 3, 1, 1) + self.net = SingleLineModel() + + def forward(self, x): + return self.net(self.conv1(x)) + + model = MyModel() + module_graph = ModuleGraph.init_from_backward_tracer( + model, + backward_tracer=BackwardTracer( + loss_calculator=ImageClassifierPseudoLossWithSixChannel())) + + channel_graph = ChannelGraph.copy_from(module_graph, + default_channel_node_converter) + channel_graph.forward(num_input_channel=6) + + _ = channel_graph.generate_units_config diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_graph/test_graph.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_graph/test_graph.py new file mode 100644 index 0000000000000000000000000000000000000000..14df464c8866d11aedfb1a82dcb74a95051c523d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_graph/test_graph.py @@ -0,0 +1,31 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import sys +from unittest import TestCase + +import torch + +sys.setrecursionlimit(int(1e8)) + +DEVICE = torch.device('cpu') + + +class TestGraph(TestCase): + pass + # def test_init_from_fx_tracer(self) -> None: + # TestData = BackwardPassedModelManager.include_models() + # with SetTorchThread(1): + # with mp.Pool() as p: + # result = p.map(_test_init_from_fx_tracer, TestData) + # for res, model in zip(result, TestData): + # with self.subTest(model=model): + # self.assertTrue(res[0], res[1]) + + # def test_init_from_backward_tracer(self) -> None: + # TestData = FxPassedModelManager.include_models() + # with SetTorchThread(1) as _: + # with mp.Pool() as p: + # result = p.map(_test_init_from_backward_tracer, TestData) + # for res, model in zip(result, TestData): + # # test_init_from_backward_tracer(model) + # with self.subTest(model=model): + # self.assertTrue(res[0], res[1]) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_graph/test_prune_tracer_model.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_graph/test_prune_tracer_model.py new file mode 100644 index 0000000000000000000000000000000000000000..9d459a6d9fa278ff3a775fbfc807510a33e4b949 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_graph/test_prune_tracer_model.py @@ -0,0 +1,193 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import multiprocessing as mp +import os +import signal +import sys +import time +from concurrent.futures import ProcessPoolExecutor +from contextlib import contextmanager +from functools import partial +from unittest import TestCase + +import torch +from mmengine import MMLogger + +from mmrazor.models.task_modules.tracer.channel_analyzer import ChannelAnalyzer +from ...data.model_library import ModelGenerator +from ...data.tracer_passed_models import (PassedModelManager, + backward_passed_library, + fx_passed_library) +from ...utils import SetTorchThread + +sys.setrecursionlimit(int(pow(2, 20))) +# test config + +DEVICE = torch.device('cpu') +FULL_TEST = os.getenv('FULL_TEST') == 'true' +try: + MP = int(os.getenv('MP')) +except Exception: + MP = 1 + +DEBUG = os.getenv('DEBUG') == 'true' +if DEBUG: + import logging + logger = MMLogger.get_current_instance() + logger.handlers[0].setLevel(logging.DEBUG) + logger.setLevel(logging.DEBUG) + +if MP > 1: + POOL_SIZE = MP + TORCH_THREAD_SIZE = mp.cpu_count() // POOL_SIZE + torch.set_num_interop_threads(TORCH_THREAD_SIZE) +else: + POOL_SIZE = 1 + TORCH_THREAD_SIZE = -1 + +print(f'DEBUG: {DEBUG}') +print(f'FULL_TEST: {FULL_TEST}') +print(f'POOL_SIZE: {POOL_SIZE}') +print(f'TORCH_THREAD_SIZE: {TORCH_THREAD_SIZE}') + +# tools for tesing + +# test functions for mp + + +@contextmanager +def time_limit(seconds, msg='', activated=(not DEBUG)): + + class TimeoutException(Exception): + pass + + def signal_handler(signum, frame): + if activated: + raise TimeoutException(f'{msg} run over {seconds} s!') + + signal.signal(signal.SIGALRM, signal_handler) + signal.alarm(seconds) + try: + yield + finally: + signal.alarm(0) + + +def _test_a_model(Model, tracer_type='fx'): + start = time.time() + + try: + print(f'test {Model}.') + model = Model.init_model() + model.eval() + if tracer_type == 'fx': + tracer_type = 'FxTracer' + elif tracer_type == 'backward': + tracer_type = 'BackwardTracer' + else: + raise NotImplementedError() + + tracer = 
ChannelAnalyzer( + tracer_type=tracer_type, + demo_input={ + 'type': 'DefaultDemoInput', + 'scope': Model.scope + }) + with time_limit(60): + unit_configs = tracer.analyze(model) + + out = len(unit_configs) + print(f'test {Model} successful.') + + return Model.name, True, '', time.time() - start, out + except Exception as e: + if DEBUG: + raise e + else: + print(f'test {Model} failed.') + return Model.name, False, f'{e}', time.time() - start, -1 + + +# TestCase + + +class TestTraceModel(TestCase): + + def test_init_from_fx_tracer(self) -> None: + from mmrazor import digit_version + if digit_version(torch.__version__) < digit_version('1.12.0'): + self.skipTest('version of torch < 1.12.0') + TestData = fx_passed_library.include_models(FULL_TEST) + + with SetTorchThread(TORCH_THREAD_SIZE): + if POOL_SIZE != 1: + with ProcessPoolExecutor(POOL_SIZE) as p: + result = p.map( + partial(_test_a_model, tracer_type='fx'), TestData) + + else: + result = map( + partial(_test_a_model, tracer_type='fx'), TestData) + result = list(result) + self.report(result, fx_passed_library, 'fx') + + def test_init_from_backward_tracer(self) -> None: + TestData = backward_passed_library.include_models(FULL_TEST) + with SetTorchThread(TORCH_THREAD_SIZE): + if POOL_SIZE != 1: + with ProcessPoolExecutor(POOL_SIZE) as p: + result = p.map( + partial(_test_a_model, tracer_type='backward'), + TestData) + else: + result = map( + partial(_test_a_model, tracer_type='fx'), TestData) + self.report(result, backward_passed_library, 'backward') + + def report(self, result, model_manager: PassedModelManager, fx_type='fx'): + print() + print(f'Trace model summary using {fx_type} tracer.') + + passd_test = [res for res in result if res[1] is True] + unpassd_test = [res for res in result if res[1] is False] + + # long summary + + print(f'{len(passd_test)},{len(unpassd_test)},' + f'{len(model_manager.uninclude_models(full_test=FULL_TEST))}') + + print('Passed:') + print('\tmodel\ttime\tlen(mutable)') + for model, 
passed, msg, used_time, out in passd_test: + with self.subTest(model=model): + print(f'\t{model}\t{int(used_time)}s\t{out}') + self.assertTrue(passed, msg) + + print('UnPassed:') + for model, passed, msg, used_time, out in unpassd_test: + with self.subTest(model=model): + print(f'\t{model}\t{int(used_time)}s\t{out}') + print(f'\t\t{msg}') + self.assertTrue(passed, msg) + + print('UnTest:') + untest_models = model_manager.uninclude_models(full_test=FULL_TEST) + for model in untest_models: + print(f'\t{model}') + + # short summary + short_passed = set( + [ModelGenerator.get_short_name(res[0]) for res in passd_test]) + + short_unpassed = set( + [ModelGenerator.get_short_name(res[0]) for res in unpassd_test]) + + short_untest = set([model.short_name for model in untest_models]) + + for name in short_unpassed: + if name in short_passed: + short_passed.remove(name) + + print('Short Summary:') + print('Passed\n', short_passed) + print('Unpassed\n', short_unpassed) + print('Untest\n', short_untest) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_recorders/test_base_recorder.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_recorders/test_base_recorder.py new file mode 100644 index 0000000000000000000000000000000000000000..15faf5d95b45256c0b295faf79632a93ccc6ff96 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_recorders/test_base_recorder.py @@ -0,0 +1,50 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from unittest import TestCase + +from toy_mod import Toy + +from mmrazor.models.task_modules import MethodOutputsRecorder + + +class TestFuncOutputsRecorder(TestCase): + + def test_get_record_data(self): + + toy = Toy() + + recorder = MethodOutputsRecorder('toy_mod.Toy.toy_func') + recorder.initialize() + + with recorder: + res0 = toy.toy_func() + res1 = toy.toy_func() + + self.assertEquals(res0, recorder.get_record_data(record_idx=0)) + self.assertEquals(res1, recorder.get_record_data(record_idx=1)) + + with self.assertRaisesRegex( + AssertionError, + 'record_idx is illegal. The length of data_buffer is 2, ' + 'but record_idx is 2'): + _ = recorder.get_record_data(record_idx=2) + + with self.assertRaisesRegex( + TypeError, + 'When data_idx is not None, record should be a list or ' + 'tuple instance'): + _ = recorder.get_record_data(data_idx=0) + + recorder = MethodOutputsRecorder('toy_mod.Toy.toy_list_func') + recorder.initialize() + + with recorder: + res = toy.toy_list_func() + + self.assertEqual(len(res), 3) + + with self.assertRaisesRegex( + AssertionError, + 'data_idx is illegal. The length of record is 3'): + _ = recorder.get_record_data(data_idx=3) + + self.assertEquals(res[2], recorder.get_record_data(data_idx=2)) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_recorders/test_func_inputs_recorder.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_recorders/test_func_inputs_recorder.py new file mode 100644 index 0000000000000000000000000000000000000000..6fa9655a1ff2af1d8f835831d92305715fc3fed1 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_recorders/test_func_inputs_recorder.py @@ -0,0 +1,138 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import logging +import os.path as osp +import tempfile +from unittest import TestCase +from unittest.mock import Mock + +import torch +import torch.nn as nn +from mmengine.evaluator import Evaluator +from mmengine.hooks import EMAHook +from mmengine.logging import MMLogger +from mmengine.model import BaseModel, ExponentialMovingAverage +from mmengine.optim import OptimWrapper +from mmengine.runner import Runner +from torch.utils.data import Dataset + +from mmrazor.models.task_modules import FunctionInputsRecorder, RecorderManager + + +class ToyModel(BaseModel): + + def __init__(self): + super().__init__() + self.linear = nn.Linear(2, 1) + # test FunctionInputsRecorder when ema_hook is used + recorders_cfg = dict( + out=dict(type='FunctionInputs', source='toy_mod.toy_func')) + self.recorders = RecorderManager(recorders_cfg) + self.recorders.initialize(self) + + def forward(self, inputs, data_sample, mode='tensor'): + labels = torch.stack(data_sample) + inputs = torch.stack(inputs) + with self.recorders: + outputs = self.linear(inputs) + if mode == 'tensor': + return outputs + elif mode == 'loss': + loss = (labels - outputs).sum() + outputs = dict(loss=loss) + return outputs + else: + return outputs + + +class DummyDataset(Dataset): + METAINFO = dict() # type: ignore + data = torch.randn(12, 2) + label = torch.ones(12) + + @property + def metainfo(self): + return self.METAINFO + + def __len__(self): + return self.data.size(0) + + def __getitem__(self, index): + return dict(inputs=self.data[index], data_sample=self.label[index]) + + +class TestFuncInputsRecorder(TestCase): + + def setUp(self): + self.temp_dir = tempfile.TemporaryDirectory() + + def tearDown(self): + # `FileHandler` should be closed in Windows, otherwise we cannot + # delete the temporary directory + logging.shutdown() + MMLogger._instance_dict.clear() + self.temp_dir.cleanup() + + def test_context_manager(self): + from toy_mod import execute_toy_func2 as execute_toy_func + + recorder = 
FunctionInputsRecorder('toy_mod.toy_func2') + recorder.initialize() + + with recorder: + execute_toy_func(1, 2) + execute_toy_func(1, b=2) + execute_toy_func(b=2, a=1) + + self.assertTrue( + recorder.get_record_data(record_idx=0, data_idx=0) == 1) + self.assertTrue( + recorder.get_record_data(record_idx=0, data_idx=1) == 2) + + self.assertTrue( + recorder.get_record_data(record_idx=1, data_idx=0) == 1) + self.assertTrue( + recorder.get_record_data(record_idx=1, data_idx=1) == 2) + + self.assertTrue( + recorder.get_record_data(record_idx=2, data_idx=0) == 1) + self.assertTrue( + recorder.get_record_data(record_idx=2, data_idx=1) == 2) + + def test_ema_hook(self): + device = 'cuda:0' if torch.cuda.is_available() else 'cpu' + model = ToyModel().to(device) + evaluator = Evaluator([]) + evaluator.evaluate = Mock(return_value=dict(acc=0.5)) + runner = Runner( + model=model, + train_dataloader=dict( + dataset=DummyDataset(), + sampler=dict(type='DefaultSampler', shuffle=True), + batch_size=3, + num_workers=0), + val_dataloader=dict( + dataset=DummyDataset(), + sampler=dict(type='DefaultSampler', shuffle=False), + batch_size=3, + num_workers=0), + val_evaluator=evaluator, + work_dir=self.temp_dir.name, + default_scope='mmrazor', + optim_wrapper=OptimWrapper( + torch.optim.Adam(ToyModel().parameters())), + train_cfg=dict(by_epoch=True, max_epochs=2, val_interval=1), + val_cfg=dict(), + default_hooks=dict(logger=None), + custom_hooks=[dict(type='EMAHook', )], + experiment_name='test_func_inputs_recorder') + runner.train() + for hook in runner.hooks: + if isinstance(hook, EMAHook): + self.assertTrue( + isinstance(hook.ema_model, ExponentialMovingAverage)) + + self.assertTrue( + osp.exists(osp.join(self.temp_dir.name, 'epoch_2.pth'))) + checkpoint = torch.load(osp.join(self.temp_dir.name, 'epoch_2.pth')) + self.assertTrue('ema_state_dict' in checkpoint) + self.assertTrue(checkpoint['ema_state_dict']['steps'] == 8) diff --git 
a/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_recorders/test_func_outputs_recorder.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_recorders/test_func_outputs_recorder.py new file mode 100644 index 0000000000000000000000000000000000000000..1d65614955b223e52a3f40340b40e7b4d1c3b491 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_recorders/test_func_outputs_recorder.py @@ -0,0 +1,51 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from unittest import TestCase + +from mmrazor.models.task_modules import FunctionOutputsRecorder + + +class TestFuncOutputsRecorder(TestCase): + + def test_init(self): + + _ = FunctionOutputsRecorder('toy_mod.toy_func') + + with self.assertRaisesRegex(TypeError, 'source should be'): + _ = FunctionOutputsRecorder([1]) + + with self.assertRaisesRegex(AssertionError, 'source must have at '): + _ = FunctionOutputsRecorder('aaaaa') + + def test_context_manager(self): + from toy_mod import execute_toy_func + + recorder = FunctionOutputsRecorder('aaa.bbb') + recorder.initialize() + with self.assertRaisesRegex(ImportError, 'aaa is not imported'): + with recorder: + execute_toy_func(1) + + recorder = FunctionOutputsRecorder('toy_mod.aaa') + recorder.initialize() + with self.assertRaisesRegex(AssertionError, 'aaa is not in toy_mod'): + with recorder: + execute_toy_func(1) + + recorder = FunctionOutputsRecorder('toy_mod.TOY_VAR') + recorder.initialize() + with self.assertRaisesRegex(TypeError, 'TOY_VAR should be'): + with recorder: + execute_toy_func(1) + + recorder = FunctionOutputsRecorder('toy_mod.toy_func') + recorder.initialize() + + with recorder: + execute_toy_func(1) + + data = recorder.get_record_data() + self.assertTrue(data == 1) + + execute_toy_func(1) + data = recorder.get_record_data() + self.assertTrue(data == 1) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_recorders/test_method_inputs_recorder.py 
b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_recorders/test_method_inputs_recorder.py new file mode 100644 index 0000000000000000000000000000000000000000..7450a231c4a3414fa3bf4695f970364c2cff628e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_recorders/test_method_inputs_recorder.py @@ -0,0 +1,35 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from unittest import TestCase + +from mmrazor.models.task_modules import MethodInputsRecorder + + +class TestFuncOutputsRecorder(TestCase): + + def test_context_manager(self): + from toy_mod import ToyClass + + toy = ToyClass() + + recorder = MethodInputsRecorder('toy_mod.ToyClass.func') + recorder.initialize() + + with recorder: + _ = toy.func(x=1, y=2) + _ = toy.func(1, y=2) + _ = toy.func(y=2, x=1) + + self.assertTrue( + recorder.get_record_data(record_idx=0, data_idx=0) == 1) + self.assertTrue( + recorder.get_record_data(record_idx=0, data_idx=1) == 2) + + self.assertTrue( + recorder.get_record_data(record_idx=1, data_idx=0) == 1) + self.assertTrue( + recorder.get_record_data(record_idx=1, data_idx=1) == 2) + + self.assertTrue( + recorder.get_record_data(record_idx=2, data_idx=0) == 1) + self.assertTrue( + recorder.get_record_data(record_idx=2, data_idx=1) == 2) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_recorders/test_method_outputs_recorder.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_recorders/test_method_outputs_recorder.py new file mode 100644 index 0000000000000000000000000000000000000000..83fdbc3c05ef546446fe699b672abdc14e9d415c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_recorders/test_method_outputs_recorder.py @@ -0,0 +1,55 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from unittest import TestCase + +from mmrazor.models.task_modules import MethodOutputsRecorder + + +class TestFuncOutputsRecorder(TestCase): + + def test_init(self): + + _ = MethodOutputsRecorder('toy_mod.ToyClass.toy') + + with self.assertRaisesRegex(TypeError, 'source should be'): + _ = MethodOutputsRecorder([1]) + + with self.assertRaisesRegex(AssertionError, 'source must have at '): + _ = MethodOutputsRecorder('aaaaa') + + with self.assertRaisesRegex(AssertionError, 'source must have at '): + _ = MethodOutputsRecorder('aaa.bbb') + + with self.assertRaisesRegex(ImportError, 'aaa is not imported'): + _ = MethodOutputsRecorder('aaa.bbb.ccc') + + with self.assertRaisesRegex(AssertionError, 'aaa is not in toy_mod'): + _ = MethodOutputsRecorder('toy_mod.aaa.bbb') + + with self.assertRaisesRegex(TypeError, 'toy_func should be'): + _ = MethodOutputsRecorder('toy_mod.toy_func.bbb') + + with self.assertRaisesRegex(AssertionError, 'bbb is not in ToyClass'): + _ = MethodOutputsRecorder('toy_mod.ToyClass.bbb') + + with self.assertRaisesRegex(TypeError, 'TOY_CLS should be'): + _ = MethodOutputsRecorder('toy_mod.ToyClass.TOY_CLS') + + def test_context_manager(self): + from toy_mod import ToyClass + + toy = ToyClass() + + recorder = MethodOutputsRecorder('toy_mod.ToyClass.toy') + recorder.initialize() + + with recorder: + result = toy.toy() + + data = recorder.get_record_data() + self.assertTrue(data == result) + + result_ = toy.toy() + + data = recorder.get_record_data() + self.assertTrue(data == result) + self.assertFalse(result_ == result) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_recorders/test_module_recorders.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_recorders/test_module_recorders.py new file mode 100644 index 0000000000000000000000000000000000000000..c267903ca1ccdb52a1b84d81441ba4bd42a54b93 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_recorders/test_module_recorders.py @@ -0,0 +1,75 @@ +# Copyright 
(c) OpenMMLab. All rights reserved. +from unittest import TestCase + +import torch +from torch import nn + +from mmrazor.models.task_modules import (ModuleInputsRecorder, + ModuleOutputsRecorder) + + +class ToyModel(nn.Module): + + def __init__(self): + super().__init__() + self.conv1 = nn.Conv2d(1, 1, 1) + self.conv2 = nn.Conv2d(1, 1, 1) + + def forward(self, x): + return self.conv2(self.conv1(x)) + + +class TestModuleOutputsRecorder(TestCase): + + def test_prepare_from_model(self): + + recorder = ModuleOutputsRecorder('conv1') + with self.assertRaisesRegex(AssertionError, 'model can not be'): + recorder.prepare_from_model() + + recorder = ModuleOutputsRecorder('conv3') + model = ToyModel() + with self.assertRaisesRegex(AssertionError, '"conv3" is not in'): + recorder.prepare_from_model(model) + + recorder = ModuleOutputsRecorder('conv2') + model = ToyModel() + recorder.prepare_from_model(model) + + def test_module_outputs(self): + + recorder = ModuleOutputsRecorder('conv2') + model = ToyModel() + recorder.initialize(model) + + with recorder: + self.assertTrue(recorder.recording) + res = model(torch.randn(1, 1, 1, 1)) + + self.assertEquals(res, recorder.get_record_data()) + + with recorder: + self.assertTrue(len(recorder.data_buffer) == 0) + + _ = model(torch.randn(1, 1, 1, 1)) + self.assertTrue(len(recorder.data_buffer) == 0) + + def test_module_intputs(self): + + recorder = ModuleInputsRecorder('conv1') + model = ToyModel() + recorder.initialize(model) + + tensor = torch.randn(1, 1, 1, 1) + with recorder: + self.assertTrue(recorder.recording) + _ = model(tensor) + + conv1_input = recorder.get_record_data(data_idx=0) + self.assertEquals(conv1_input.sum(), tensor.sum()) + + with recorder: + self.assertTrue(len(recorder.data_buffer) == 0) + + _ = model(torch.randn(1, 1, 1, 1)) + self.assertTrue(len(recorder.data_buffer) == 0) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_recorders/test_param_recorder.py 
b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_recorders/test_param_recorder.py new file mode 100644 index 0000000000000000000000000000000000000000..e604bdf97fe999a6343b0d981ff40be7da36f9d4 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_recorders/test_param_recorder.py @@ -0,0 +1,42 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from unittest import TestCase + +import torch +from torch import nn + +from mmrazor.models.task_modules import ParameterRecorder + + +class ToyModel(nn.Module): + + def __init__(self): + super().__init__() + self.toy_conv = nn.Conv2d(1, 1, 1) + self.no_record_conv = nn.Conv2d(1, 1, 1) + + def forward(self, x): + return self.toy_conv(x) + + +class TestParameterRecorder(TestCase): + + def test_prepare_from_model(self): + + model = ToyModel() + recorder = ParameterRecorder('AAA') + with self.assertRaisesRegex(AssertionError, '"AAA" is not in the'): + recorder.initialize(model) + + recorder = ParameterRecorder('toy_conv.bias') + with self.assertRaisesRegex(AssertionError, 'model can not be None'): + recorder.prepare_from_model() + + recorder.initialize(model) + bias_weight = recorder.get_record_data() + + self.assertEquals(bias_weight, model.toy_conv.bias) + + with recorder: + _ = model(torch.randn(1, 1, 1, 1)) + + self.assertEquals(bias_weight, model.toy_conv.bias) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_recorders/test_recorder_manager.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_recorders/test_recorder_manager.py new file mode 100644 index 0000000000000000000000000000000000000000..beab7477c7a4309ec1a5eba36e65d963a745604a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_recorders/test_recorder_manager.py @@ -0,0 +1,58 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from unittest import TestCase + +import torch +from mmengine import ConfigDict +from torch import nn +from toy_mod import Toy + +from mmrazor.models.task_modules import RecorderManager + + +class ToyModel(nn.Module): + + def __init__(self): + super().__init__() + self.conv1 = nn.Conv2d(1, 1, 1) + self.conv2 = nn.Conv2d(1, 1, 1) + self.toy = Toy() + + def forward(self, x): + return self.conv2(self.conv1(x)) + self.toy.toy_func() + + +class TestRecorderManager(TestCase): + + def test_init(self): + + manager = RecorderManager() + self.assertEquals(len(manager.recorders), 0) + + recorders = ConfigDict( + r1=dict(type='ModuleOutputs', source='conv1'), + r2=dict(type='MethodOutputs', source='toy_mod.Toy.toy_func'), + ) + manager = RecorderManager(recorders) + model = ToyModel() + manager.initialize(model) + + def test_context_manager(self): + + recorders = ConfigDict( + r1=dict(type='ModuleOutputs', source='conv2'), + r2=dict(type='MethodOutputs', source='toy_mod.Toy.toy_func'), + ) + manager = RecorderManager(recorders) + model = ToyModel() + manager.initialize(model) + + self.assertEquals(manager.get_recorder('r1'), manager.recorders['r1']) + self.assertEquals(manager.get_recorder('r2'), manager.recorders['r2']) + + with manager: + res = model(torch.ones(1, 1, 1, 1)) + + method_outputs = manager.recorders['r2'].get_record_data() + conv2_outputs = manager.recorders['r1'].get_record_data() + + self.assertEquals(res.sum(), method_outputs + conv2_outputs.sum()) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_recorders/toy_mod.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_recorders/toy_mod.py new file mode 100644 index 0000000000000000000000000000000000000000..0df3e2d70690fb72cb1660acd64379b12164b309 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_recorders/toy_mod.py @@ -0,0 +1,56 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import random + +TOY_VAR = 'aaa' + + +def toy_func(a): + return a + + +def toy_func2(a, b): + return a, b + + +def toy_list_func(a): + return [a, a, a] + + +def execute_toy_func(a): + toy_func(a) + + +def execute_toy_func2(a, b): + toy_func2(a, b) + + +def execute_toy_list_func(a): + toy_list_func(a) + + +class ToyClass: + + TOY_CLS = 'TOY_CLASS' + + def __init__(self): + self._count = 0 + + def toy(self): + self._count += 1 + return self._count + + def func(self, x, y=0): + return x + y + + def __call__(self): + self._count += 1 + return self._count + + +class Toy(): + + def toy_func(self): + return random.randint(0, 1000) + + def toy_list_func(self): + return [random.randint(0, 1000) for _ in range(3)] diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_tracer/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_tracer/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ef101fec61e72abc0eb90266d453b5b22331378d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_tracer/__init__.py @@ -0,0 +1 @@ +# Copyright (c) OpenMMLab. All rights reserved. diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_tracer/test_backward_tracer.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_tracer/test_backward_tracer.py new file mode 100644 index 0000000000000000000000000000000000000000..55ddaccc0413737fe4e942452c484fbe96f087b0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_tracer/test_backward_tracer.py @@ -0,0 +1,309 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from unittest import TestCase + +import pytest +import torch +from torch import Tensor, nn +from torch.nn import Module + +from mmrazor.models.task_modules import (BackwardTracer, Path, PathConcatNode, + PathConvNode, PathDepthWiseConvNode, + PathLinearNode, PathList, + PathNormNode) + +NONPASS_NODES = (PathConvNode, PathLinearNode, PathConcatNode) +PASS_NODES = (PathNormNode, PathDepthWiseConvNode) + + +class MultiConcatModel(Module): + + def __init__(self) -> None: + super().__init__() + + self.op1 = nn.Conv2d(3, 8, 1) + self.op2 = nn.Conv2d(3, 8, 1) + self.op3 = nn.Conv2d(16, 8, 1) + self.op4 = nn.Conv2d(3, 8, 1) + + def forward(self, x: Tensor) -> Tensor: + x1 = self.op1(x) + x2 = self.op2(x) + cat1 = torch.cat([x1, x2], dim=1) + x3 = self.op3(cat1) + x4 = self.op4(x) + output = torch.cat([x3, x4], dim=1) + + return output + + +class MultiConcatModel2(Module): + + def __init__(self) -> None: + super().__init__() + + self.op1 = nn.Conv2d(3, 8, 1) + self.op2 = nn.Conv2d(3, 8, 1) + self.op3 = nn.Conv2d(3, 8, 1) + self.op4 = nn.Conv2d(24, 8, 1) + + def forward(self, x: Tensor) -> Tensor: + x1 = self.op1(x) + x2 = self.op2(x) + x3 = self.op3(x) + cat1 = torch.cat([x1, x2], dim=1) + cat2 = torch.cat([cat1, x3], dim=1) + output = self.op4(cat2) + + return output + + +class MultiConcatModel3(Module): + + def __init__(self) -> None: + super().__init__() + + self.op1 = nn.Conv2d(3, 8, 1) + self.op2 = nn.Conv2d(3, 8, 1) + self.op3 = nn.Conv2d(3, 8, 1) + self.op4 = nn.Conv2d(24, 8, 1) + self.op5 = nn.Conv2d(24, 8, 1) + self.op6 = nn.Conv2d(24, 8, 1) + self.op7 = nn.Conv2d(24, 8, 1) + + def forward(self, x: Tensor) -> Tensor: + x1 = self.op1(x) + x2 = self.op2(x) + x3 = self.op3(x) + cat1 = torch.cat([x1, x2, x3], dim=1) + x4 = self.op4(cat1) + x5 = self.op5(cat1) + x6 = self.op6(cat1) + x7 = self.op7(cat1) + return torch.cat([x4, x5, x6, x7], dim=1) + + +class ResBlock(Module): + + def __init__(self) -> None: + super().__init__() + + self.op1 = nn.Conv2d(3, 8, 1) + 
self.bn1 = nn.BatchNorm2d(8) + self.op2 = nn.Conv2d(8, 8, 1) + self.bn2 = nn.BatchNorm2d(8) + self.op3 = nn.Conv2d(8, 8, 1) + + def forward(self, x: Tensor) -> Tensor: + x1 = self.bn1(self.op1(x)) + x2 = self.bn2(self.op2(x1)) + x3 = self.op3(x2 + x1) + return x3 + + +class ToyCNNPseudoLoss: + + def __init__(self, input_shape=(2, 3, 16, 16)): + self.input_shape = input_shape + + def __call__(self, model): + pseudo_img = torch.rand(self.input_shape) + pseudo_output = model(pseudo_img) + return pseudo_output.sum() + + +class TestBackwardTracer(TestCase): + + def test_trace_resblock(self) -> None: + model = ResBlock() + loss_calculator = ToyCNNPseudoLoss() + tracer = BackwardTracer(loss_calculator=loss_calculator) + path_list = tracer.trace(model) + + # test tracer and parser + assert len(path_list) == 2 + assert len(path_list[0]) == 5 + + # test path_list + nonpass2parents = path_list.find_nodes_parents(NONPASS_NODES) + assert len(nonpass2parents) == 3 + assert nonpass2parents['op1'] == list() + assert nonpass2parents['op2'] == list({PathNormNode('bn1')}) + assert nonpass2parents['op3'] == list( + {PathNormNode('bn2'), PathNormNode('bn1')}) + + nonpass2nonpassparents = path_list.find_nodes_parents( + NONPASS_NODES, non_pass=NONPASS_NODES) + assert len(nonpass2nonpassparents) == 3 + assert nonpass2nonpassparents['op1'] == list() + assert nonpass2nonpassparents['op2'] == list({PathConvNode('op1')}) + assert nonpass2nonpassparents['op3'] == list( + {PathConvNode('op2'), PathConvNode('op1')}) + + pass2nonpassparents = path_list.find_nodes_parents( + PASS_NODES, non_pass=NONPASS_NODES) + assert len(pass2nonpassparents) == 2 + assert pass2nonpassparents['bn1'] == list({PathConvNode('op1')}) + assert pass2nonpassparents['bn2'] == list({PathConvNode('op2')}) + + def test_trace_multi_cat(self) -> None: + loss_calculator = ToyCNNPseudoLoss() + + model = MultiConcatModel() + tracer = BackwardTracer(loss_calculator=loss_calculator) + path_list = tracer.trace(model) + + assert 
len(path_list) == 1 + + nonpass2parents = path_list.find_nodes_parents(NONPASS_NODES) + assert len(nonpass2parents) == 4 + assert nonpass2parents['op1'] == list() + assert nonpass2parents['op2'] == list() + path_list1 = PathList(Path(PathConvNode('op1'))) + path_list2 = PathList(Path(PathConvNode('op2'))) + # only one parent + assert len(nonpass2parents['op3']) == 1 + assert isinstance(nonpass2parents['op3'][0], PathConcatNode) + assert len(nonpass2parents['op3'][0]) == 2 + assert nonpass2parents['op3'][0].get_module_names() == ['op1', 'op2'] + assert nonpass2parents['op3'][0].path_lists == [path_list1, path_list2] + assert nonpass2parents['op3'][0][0] == path_list1 + assert nonpass2parents['op4'] == list() + + model = MultiConcatModel2() + tracer = BackwardTracer(loss_calculator=loss_calculator) + path_list = tracer.trace(model) + assert len(path_list) == 1 + + nonpass2parents = path_list.find_nodes_parents(NONPASS_NODES) + assert len(nonpass2parents) == 4 + assert nonpass2parents['op1'] == list() + assert nonpass2parents['op2'] == list() + assert nonpass2parents['op3'] == list() + # only one parent + assert len(nonpass2parents['op4']) == 1 + assert isinstance(nonpass2parents['op4'][0], PathConcatNode) + assert nonpass2parents['op4'][0].get_module_names() == [ + 'op1', 'op2', 'op3' + ] + + model = MultiConcatModel3() + tracer = BackwardTracer(loss_calculator=loss_calculator) + path_list = tracer.trace(model) + assert len(path_list) == 1 + + nonpass2parents = path_list.find_nodes_parents(NONPASS_NODES) + assert nonpass2parents['op1'] == list() + assert nonpass2parents['op2'] == list() + assert nonpass2parents['op3'] == list() + assert nonpass2parents['op4'] == nonpass2parents['op5'] == \ + nonpass2parents['op6'] == nonpass2parents['op7'] + + def test_repr(self): + toy_node = PathConvNode('op1') + assert repr(toy_node) == 'PathConvNode(\'op1\')' + + toy_path = Path([PathConvNode('op1'), PathConvNode('op2')]) + assert repr( + toy_path + ) == 'Path(\n 
PathConvNode(\'op1\'),\n PathConvNode(\'op2\')\n)' + + toy_path_list = PathList(Path(PathConvNode('op1'))) + assert repr( + toy_path_list + ) == 'PathList(\n Path(\n PathConvNode(\'op1\')\n )\n)' + + path_list1 = PathList(Path(PathConvNode('op1'))) + path_list2 = PathList(Path(PathConvNode('op2'))) + toy_concat_node = PathConcatNode('op3', [path_list1, path_list2]) + assert repr( + toy_concat_node + ) == 'PathConcatNode(\n PathList(\n Path(\n PathConvNode(\'op1\')\n )\n ),\n PathList(\n Path(\n PathConvNode(\'op2\')\n )\n )\n)' # noqa: E501 + + def test_reset_bn_running_stats(self): + _test_reset_bn_running_stats(False) + with pytest.raises(AssertionError): + _test_reset_bn_running_stats(True) + + def test_node(self): + node1 = PathConvNode('conv1') + node2 = PathConvNode('conv2') + assert node1 != node2 + + node1 = PathConvNode('conv1') + node2 = PathConvNode('conv1') + assert node1 == node2 + + def test_path(self): + node1 = PathConvNode('conv1') + node2 = PathConvNode('conv2') + + path1 = Path([node1]) + path2 = Path([node2]) + assert path1 != path2 + + path1 = Path([node1]) + path2 = Path([node1]) + assert path1 == path2 + + assert path1[0] == node1 + + def test_path_list(self): + node1 = PathConvNode('conv1') + node2 = PathConvNode('conv2') + + path1 = Path([node1]) + path2 = Path([node2]) + assert PathList(path1) == PathList([path1]) + assert PathList(path1) != PathList(path2) + + with self.assertRaisesRegex(AssertionError, ''): + _ = PathList({}) + + def test_sum_pseudo_loss(self): + model = ResBlock() + tracer = BackwardTracer(loss_calculator={'type': 'SumPseudoLoss'}) + path = tracer.trace(model) + print(path) + + +def _test_reset_bn_running_stats(should_fail): + import os + import random + + import numpy as np + + def set_seed(seed: int) -> None: + random.seed(seed) + os.environ['PYTHONHASHSEED'] = str(seed) + np.random.seed(seed) + torch.manual_seed(seed) + + set_seed(1024) + imgs = torch.randn(2, 3, 4, 4) + loss_calculator = ToyCNNPseudoLoss() + tracer 
= BackwardTracer(loss_calculator=loss_calculator) + if should_fail: + tracer._reset_norm_running_stats = lambda *_: None + + torch_rng_state = torch.get_rng_state() + np_rng_state = np.random.get_state() + random_rng_state = random.getstate() + + model1 = ResBlock() + set_seed(1) + tracer.trace(model1) + model1.eval() + output1 = model1(imgs) + + set_seed(1024) + torch.set_rng_state(torch_rng_state) + np.random.set_state(np_rng_state) + random.setstate(random_rng_state) + + model2 = ResBlock() + set_seed(2) + tracer.trace(model2) + model2.eval() + output2 = model2(imgs) + + assert torch.equal(output1.norm(p='fro'), output2.norm(p='fro')) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_tracer/test_fx_tracer.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_tracer/test_fx_tracer.py new file mode 100644 index 0000000000000000000000000000000000000000..544f2aab879399e9eec93bb38bfbf33fbe9854f3 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_tracer/test_fx_tracer.py @@ -0,0 +1,68 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import unittest +from functools import partial + +from mmcls.models.classifiers.image import ImageClassifier + +from mmrazor.utils import get_placeholder + +try: + from torch.fx.graph_module import GraphModule +except ImportError: + GraphModule = get_placeholder('torch>=1.12') +import torch + +from mmrazor import digit_version +from mmrazor.models.task_modules.demo_inputs import DefaultDemoInput +from mmrazor.models.task_modules.tracer.fx_tracer import FxTracer +from ...data.models import UntracableModel + +MODELS = [ + UntracableModel, + partial( + ImageClassifier, + backbone=dict(type='mmrazor.UntracableBackBone'), + head=dict( + type='mmrazor.LinearHeadForTest', + in_channel=16, + )), +] + + +class TestFxTracer(unittest.TestCase): + + def test_model(self): + if digit_version(torch.__version__) < digit_version('1.12.0'): + self.skipTest('version of torch < 1.12.0') + + for Model in MODELS: + with self.subTest(model=Model): + model = Model() + + tracer = FxTracer() + demo_input = DefaultDemoInput() + inputs = demo_input.get_data(model) + + if isinstance(inputs, dict): + # args = copy.copy(inputs) + # args.pop('inputs') + # args['mode'] = 'tensor' + args = {'mode': 'tensor'} + torch_graph = tracer.trace(model, concrete_args=args) + else: + torch_graph = tracer.trace(model) + print(model) + + print(torch_graph) + + graph_module = GraphModule(model, torch_graph) + print(graph_module) + print(graph_module.code) + + inputs = demo_input.get_data(model) + if isinstance(inputs, dict): + inputs['mode_1'] = inputs['mode'] + inputs.pop('mode') + graph_module(**inputs) + else: + graph_module(inputs) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_tracer/test_loss_calculator.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_tracer/test_loss_calculator.py new file mode 100644 index 0000000000000000000000000000000000000000..a567c002ce356016514c833d5e208f924d9e89ab --- /dev/null +++ 
b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_tracer/test_loss_calculator.py @@ -0,0 +1,39 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from unittest import TestCase + +import torch +from mmengine.hub import get_model + +from mmrazor.models.task_modules.tracer import (ImageClassifierPseudoLoss, + SingleStageDetectorPseudoLoss, + SumPseudoLoss) + + +class TestLossCalculator(TestCase): + + def test_image_classifier_pseudo_loss(self): + model = get_model( + 'mmcls::resnet/resnet34_8xb32_in1k.py', pretrained=False) + loss_calculator = ImageClassifierPseudoLoss() + loss = loss_calculator(model) + assert isinstance(loss, torch.Tensor) and loss.dim() == 0 + + def test_single_stage_detector_pseudo_loss(self): + model = get_model( + 'mmdet::retinanet/retinanet_r50_fpn_1x_coco.py', pretrained=False) + loss_calculator = SingleStageDetectorPseudoLoss() + loss = loss_calculator(model) + assert isinstance(loss, torch.Tensor) and loss.dim() == 0 + + def test_sumloss(self): + model = get_model( + 'mmdet::retinanet/retinanet_r50_fpn_1x_coco.py', pretrained=False) + loss_calculator = SumPseudoLoss() + loss = loss_calculator(model) + assert isinstance(loss, torch.Tensor) and loss.dim() == 0 + + model = get_model( + 'mmcls::resnet/resnet34_8xb32_in1k.py', pretrained=False) + loss_calculator = SumPseudoLoss() + loss = loss_calculator(model) + assert isinstance(loss, torch.Tensor) and loss.dim() == 0 diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_tracer/test_prune_tracer.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_tracer/test_prune_tracer.py new file mode 100644 index 0000000000000000000000000000000000000000..63674350def0accc32c0275ec87a5b67f9d535a5 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_core/test_tracer/test_prune_tracer.py @@ -0,0 +1,25 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from unittest import TestCase + +import torch + +from mmrazor import digit_version +from mmrazor.models.task_modules.tracer import ChannelAnalyzer +from ...data.models import SingleLineModel + + +class TestChannelAnalyzer(TestCase): + + def test_backward_tracer(self): + model = SingleLineModel() + tracer = ChannelAnalyzer(tracer_type='BackwardTracer') + unit_configs = tracer.analyze(model) + print(unit_configs) + + def test_fx_tracer(self): + if digit_version(torch.__version__) < digit_version('1.12.0'): + self.skipTest('torch<1.12.0') + model = SingleLineModel() + tracer = ChannelAnalyzer(tracer_type='FxTracer') + unit_configs = tracer.analyze(model) + print(unit_configs) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_data.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_data.py new file mode 100644 index 0000000000000000000000000000000000000000..df3e07f698b334c2046b8cc1e5a9de7be1447481 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_data.py @@ -0,0 +1,93 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import os +import unittest + +import torch + +from .data.model_library import (DefaultModelLibrary, MMClsModelLibrary, + MMDetModelLibrary, MMModelLibrary, + MMPoseModelLibrary, MMSegModelLibrary, + ModelGenerator, TorchModelLibrary) +from .data.models import SingleLineModel +from .data.tracer_passed_models import (BackwardPassedModelManager, + FxPassedModelManager) + +TEST_DATA = os.getenv('TEST_DATA') == 'true' + + +class TestModelLibrary(unittest.TestCase): + + def test_mmcls(self): + if not TEST_DATA: + self.skipTest('not test data to save time.') + library = MMClsModelLibrary(exclude=['cutmax', 'cifar']) + self.assertTrue(library.is_default_includes_cover_all_models()) + + def test_defaul_library(self): + if not TEST_DATA: + self.skipTest('not test data to save time.') + library = DefaultModelLibrary() + self.assertTrue(library.is_default_includes_cover_all_models()) + + def test_torchlibrary(self): + if not TEST_DATA: + self.skipTest('not test data to save time.') + library = TorchModelLibrary() + self.assertTrue(library.is_default_includes_cover_all_models()) + + def test_mmdet(self): + if not TEST_DATA: + self.skipTest('not test data to save time.') + library = MMDetModelLibrary() + self.assertTrue(library.is_default_includes_cover_all_models()) + + def test_mmseg(self): + if not TEST_DATA: + self.skipTest('not test data to save time.') + library = MMSegModelLibrary() + print(library.short_names()) + + self.assertTrue(library.is_default_includes_cover_all_models()) + + # New + def test_mmpose(self): + if not TEST_DATA: + self.skipTest('not test data to save time.') + library = MMPoseModelLibrary() + print(library.short_names()) + self.assertTrue(library.is_default_includes_cover_all_models()) + + def test_get_model_by_config(self): + config = 'mmcls::resnet/resnet34_8xb32_in1k.py' + Model = MMModelLibrary.get_model_from_path(config) + _ = Model() + + def test_passed_models(self): + try: + print(FxPassedModelManager().include_models()) + 
print(BackwardPassedModelManager().include_models()) + except Exception: + self.fail() + + +class TestModels(unittest.TestCase): + + def _test_a_model(self, Model): + model = Model() + x = torch.rand(2, 3, 224, 224) + y = model(x) + self.assertSequenceEqual(y.shape, [2, 1000]) + + def test_models(self): + library = DefaultModelLibrary() + for Model in library.include_models(): + with self.subTest(model=Model): + self._test_a_model(Model) + + def test_generator(self): + Model = ModelGenerator('model', SingleLineModel) + model = Model() + model.eval() + self.assertEqual(model.training, False) + model.train() + self.assertEqual(model.training, True) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_datasets/test_datasets.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_datasets/test_datasets.py new file mode 100644 index 0000000000000000000000000000000000000000..1eaf72ec89121e0f9b4168e2032f060023ba372d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_datasets/test_datasets.py @@ -0,0 +1,88 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import os +import os.path as osp +import pickle +import tempfile +from unittest import TestCase + +import numpy as np +from mmcls.registry import DATASETS as CLS_DATASETS + +from mmrazor.registry import DATASETS +from mmrazor.utils import register_all_modules + +register_all_modules() +ASSETS_ROOT = osp.abspath(osp.join(osp.dirname(__file__), '../data/dataset')) + + +class Test_CRD_CIFAR10(TestCase): + ORI_DATASET_TYPE = 'CIFAR10' + DATASET_TYPE = 'CRDDataset' + + @classmethod + def setUpClass(cls) -> None: + super().setUpClass() + + tmpdir = tempfile.TemporaryDirectory() + cls.tmpdir = tmpdir + data_prefix = tmpdir.name + cls.ORI_DEFAULT_ARGS = dict( + data_prefix=data_prefix, pipeline=[], test_mode=False) + cls.DEFAULT_ARGS = dict(neg_num=1, percent=0.5) + + dataset_class = CLS_DATASETS.get(cls.ORI_DATASET_TYPE) + base_folder = osp.join(data_prefix, dataset_class.base_folder) + os.mkdir(base_folder) + + cls.fake_imgs = np.random.randint( + 0, 255, size=(6, 3 * 32 * 32), dtype=np.uint8) + cls.fake_labels = np.random.randint(0, 10, size=(6, )) + cls.fake_classes = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] + + batch1 = dict( + data=cls.fake_imgs[:2], labels=cls.fake_labels[:2].tolist()) + with open(osp.join(base_folder, 'data_batch_1'), 'wb') as f: + f.write(pickle.dumps(batch1)) + + batch2 = dict( + data=cls.fake_imgs[2:4], labels=cls.fake_labels[2:4].tolist()) + with open(osp.join(base_folder, 'data_batch_2'), 'wb') as f: + f.write(pickle.dumps(batch2)) + + test_batch = dict( + data=cls.fake_imgs[4:], fine_labels=cls.fake_labels[4:].tolist()) + with open(osp.join(base_folder, 'test_batch'), 'wb') as f: + f.write(pickle.dumps(test_batch)) + + meta = {dataset_class.meta['key']: cls.fake_classes} + meta_filename = dataset_class.meta['filename'] + with open(osp.join(base_folder, meta_filename), 'wb') as f: + f.write(pickle.dumps(meta)) + + dataset_class.train_list = [['data_batch_1', None], + ['data_batch_2', None]] + dataset_class.test_list = [['test_batch', None]] + 
dataset_class.meta['md5'] = None + + def test_initialize(self): + dataset_class = DATASETS.get(self.DATASET_TYPE) + + # Test overriding metainfo by `metainfo` argument + ori_cfg = { + **self.ORI_DEFAULT_ARGS, 'metainfo': { + 'classes': ('bus', 'car') + }, + 'type': self.ORI_DATASET_TYPE, + '_scope_': 'mmcls' + } + cfg = {'dataset': ori_cfg, **self.DEFAULT_ARGS} + dataset = dataset_class(**cfg) + self.assertEqual(dataset.dataset.CLASSES, ('bus', 'car')) + + @classmethod + def tearDownClass(cls): + cls.tmpdir.cleanup() + + +class Test_CRD_CIFAR100(Test_CRD_CIFAR10): + ORI_DATASET_TYPE = 'CIFAR100' diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_datasets/test_transforms/test_formatting.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_datasets/test_transforms/test_formatting.py new file mode 100644 index 0000000000000000000000000000000000000000..69e211aad574be41c8bdea8eb16944f8b4dbdbbc --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_datasets/test_transforms/test_formatting.py @@ -0,0 +1,56 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy +import os.path as osp +import unittest + +import numpy as np +import torch +from mmcls.structures import ClsDataSample +from mmengine.structures import LabelData + +from mmrazor.datasets.transforms import PackCRDClsInputs + + +class TestPackClsInputs(unittest.TestCase): + + def setUp(self): + """Setup the model and optimizer which are used in every test method. + + TestCase calls functions in this order: setUp() -> testMethod() -> + tearDown() -> cleanUp() + """ + data_prefix = osp.join(osp.dirname(__file__), '../../data') + img_path = osp.join(data_prefix, 'color.jpg') + rng = np.random.RandomState(0) + self.results1 = { + 'sample_idx': 1, + 'img_path': img_path, + 'ori_height': 300, + 'ori_width': 400, + 'height': 600, + 'width': 800, + 'scale_factor': 2.0, + 'flip': False, + 'img': rng.rand(300, 400), + 'gt_label': rng.randint(3, ), + # TODO. 
+ 'contrast_sample_idxs': rng.randint(3, ) + } + self.meta_keys = ('sample_idx', 'img_path', 'ori_shape', 'img_shape', + 'scale_factor', 'flip') + + def test_transform(self): + transform = PackCRDClsInputs(meta_keys=self.meta_keys) + results = transform(copy.deepcopy(self.results1)) + self.assertIn('inputs', results) + self.assertIsInstance(results['inputs'], torch.Tensor) + self.assertIn('data_samples', results) + self.assertIsInstance(results['data_samples'], ClsDataSample) + + data_sample = results['data_samples'] + self.assertIsInstance(data_sample.gt_label, LabelData) + + def test_repr(self): + transform = PackCRDClsInputs(meta_keys=self.meta_keys) + self.assertEqual( + repr(transform), f'PackCRDClsInputs(meta_keys={self.meta_keys})') diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_doc.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_doc.py new file mode 100644 index 0000000000000000000000000000000000000000..275b9ecec87074a223cc60b30099475c285103d7 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_doc.py @@ -0,0 +1,32 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import os +from unittest import TestCase + +import nbformat +from nbconvert.preprocessors import ExecutePreprocessor + +TEST_DOC = os.getenv('TEST_DOC') == 'true' +notebook_paths = [ + './mmrazor/models/mutators/channel_mutator/channel_mutator.ipynb', + './mmrazor/models/mutables/mutable_channel/units/mutable_channel_unit.ipynb', # noqa + './demo/config_pruning.ipynb' +] + + +class TestDocs(TestCase): + + def setUp(self) -> None: + if not TEST_DOC: + self.skipTest('disabled') + + def test_notebooks(self): + for path in notebook_paths: + with self.subTest(path=path): + with open(path) as file: + nb_in = nbformat.read(file, nbformat.NO_CONVERT) + ep = ExecutePreprocessor( + timeout=600, kernel_name='python3') + try: + _ = ep.preprocess(nb_in) + except Exception: + self.fail() diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_engine/test_hooks/test_stop_distillation_hook.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_engine/test_hooks/test_stop_distillation_hook.py new file mode 100644 index 0000000000000000000000000000000000000000..49e753b640f30e85c7afa6ea9e3198ed33c925a1 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_engine/test_hooks/test_stop_distillation_hook.py @@ -0,0 +1,26 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from unittest import TestCase +from unittest.mock import Mock + +from mmrazor.engine import StopDistillHook + + +class TestStopDistillHook(TestCase): + + def setUp(self): + self.hook = StopDistillHook(stop_epoch=5) + runner = Mock() + runner.model = Mock() + runner.model.distillation_stopped = False + + runner.epoch = 0 + self.runner = runner + + def test_before_train_epoch(self): + max_epochs = 10 + target = [False] * 5 + [True] * 5 + for epoch in range(max_epochs): + self.hook.before_train_epoch(self.runner) + self.assertEqual(self.runner.model.distillation_stopped, + target[epoch]) + self.runner.epoch += 1 diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_engine/test_hooks/test_visualization_hook.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_engine/test_hooks/test_visualization_hook.py new file mode 100644 index 0000000000000000000000000000000000000000..64004843a2efb243cdec7d51ce26c591f531a8cd --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_engine/test_hooks/test_visualization_hook.py @@ -0,0 +1,129 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import os +import os.path as osp +import shutil +import time +from os.path import dirname +from typing import Optional +from unittest import TestCase +from unittest.mock import Mock + +import torch +import torch.nn as nn +# TODO: The argument `out_file` has not been supported in MMEngine yet.
+# Temporarily, we use `ClsVisualizer` here +from mmcls.visualization import ClsVisualizer +from mmengine import ConfigDict +from mmengine.model import BaseModel + +from mmrazor.engine.hooks import RazorVisualizationHook + + +def get_data_info(idx): + root_path = dirname(dirname(dirname(dirname(__file__)))) + return { + 'img_path': os.path.join(root_path, 'tools/visualizations/demo.jpg') + } + + +class ToyModel(BaseModel): + + def __init__(self): + data_preprocessor = dict( + type='mmcls.ClsDataPreprocessor', + # RGB format normalization parameters + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + # convert image from BGR to RGB + to_rgb=True, + ) + super().__init__(data_preprocessor=data_preprocessor) + self.op = nn.Conv2d(3, 3, 1) + + def forward(self, + inputs: torch.Tensor, + data_samples: Optional[list] = None, + mode: str = 'tensor'): + out = self.op(inputs) + return out + + +class TestVisualizationHook(TestCase): + + def setUp(self) -> None: + # TODO: The argument `out_file` has not been supported in MMEngine yet. 
+ # Temporarily, we use `ClsVisualizer` here + ClsVisualizer.get_instance('visualizer') + + test_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='Resize', scale=(1333, 800), keep_ratio=True), + dict(type='mmcls.PackClsInputs') + ] + + self.runner = Mock() + self.runner.val_loop.dataloader.dataset.get_data_info = get_data_info + self.runner.cfg = ConfigDict( + test_dataloader=dict(dataset=dict(pipeline=test_pipeline))) + self.runner.model = ToyModel() + + self.recorders = ConfigDict( + out=dict(_scope_='mmrazor', type='ModuleOutputs', source='op')) + self.mappings = ConfigDict(out=dict(recorder='out')) + + def test_before_run(self): + hook = RazorVisualizationHook(self.recorders, self.mappings) + hook.before_run(self.runner) + + def test_before_train(self): + timestamp = time.strftime('%Y%m%d_%H%M%S', time.localtime()) + out_dir = timestamp + '1' + self.runner.work_dir = timestamp + self.runner.timestamp = '1' + self.runner.epoch = 0 + + hook = RazorVisualizationHook( + self.recorders, self.mappings, out_dir=out_dir, enabled=False) + # initialize recorders + hook.before_run(self.runner) + hook.before_train(self.runner) + self.assertTrue(not osp.exists(f'{timestamp}/1/{out_dir}')) + + hook = RazorVisualizationHook( + self.recorders, self.mappings, out_dir=out_dir, enabled=True) + # initialize recorders + hook.before_run(self.runner) + hook.before_train(self.runner) + self.assertTrue(osp.exists(f'{timestamp}/1/{out_dir}')) + shutil.rmtree(f'{timestamp}') + + def test_after_train_epoch(self): + timestamp = time.strftime('%Y%m%d_%H%M%S', time.localtime()) + out_dir = timestamp + '1' + self.runner.work_dir = timestamp + self.runner.timestamp = '1' + + hook = RazorVisualizationHook( + self.recorders, self.mappings, out_dir=out_dir, enabled=False) + # initialize recorders + hook.before_run(self.runner) + self.runner.epoch = 0 + hook.after_train_epoch(self.runner) + self.assertTrue(not osp.exists(f'{timestamp}/1/{out_dir}')) + + self.runner.epoch = 1 + hook = 
RazorVisualizationHook( + self.recorders, + self.mappings, + out_dir=out_dir, + enabled=True, + interval=2) + # initialize recorders + hook.before_run(self.runner) + hook.after_train_epoch(self.runner) + self.assertTrue(not osp.exists(f'{timestamp}/1/{out_dir}')) + + self.runner.epoch = 2 + hook.after_train_epoch(self.runner) + self.assertTrue(osp.exists(f'{timestamp}/1/{out_dir}')) + shutil.rmtree(f'{timestamp}') diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_impl/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_impl/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ef101fec61e72abc0eb90266d453b5b22331378d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_impl/__init__.py @@ -0,0 +1 @@ +# Copyright (c) OpenMMLab. All rights reserved. diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_impl/test_pruning/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_impl/test_pruning/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ef101fec61e72abc0eb90266d453b5b22331378d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_impl/test_pruning/__init__.py @@ -0,0 +1 @@ +# Copyright (c) OpenMMLab. All rights reserved. diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_impl/test_pruning/test_group_fisher/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_impl/test_pruning/test_group_fisher/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ef101fec61e72abc0eb90266d453b5b22331378d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_impl/test_pruning/test_group_fisher/__init__.py @@ -0,0 +1 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
# Copyright (c) OpenMMLab. All rights reserved.
"""Tests for the GroupFisher pruning algorithm."""
from unittest import TestCase

import torch
from mmcls.structures import ClsDataSample
from mmengine import MessageHub

from mmrazor.implementations.pruning.group_fisher.algorithm import \
    GroupFisherAlgorithm
from mmrazor.implementations.pruning.group_fisher.ops import GroupFisherConv2d
from ....data.models import MMClsResNet18

# Prefer the first CUDA device when available, otherwise fall back to CPU.
if torch.cuda.is_available():
    DEVICE = torch.device('cuda:0')
else:
    DEVICE = torch.device('cpu')


class TestGroupFisherPruneAlgorithm(TestCase):
    """Smoke test: run the GroupFisher algorithm for a few iterations."""

    def fake_cifar_data(self):
        """Return a fake CIFAR-like batch (16 images, 10 classes)."""
        imgs = torch.randn(16, 3, 32, 32).to(DEVICE)
        data_samples = [
            ClsDataSample().set_gt_label(torch.randint(0, 10,
                                                       (16, ))).to(DEVICE)
        ]
        return {'inputs': imgs, 'data_samples': data_samples}

    def test_group_fisher_prune(self):
        """Forward the algorithm over two fake epochs and check `interval`."""
        data = self.fake_cifar_data()

        mutator_config = dict(
            type='GroupFisherChannelMutator',
            parse_cfg=dict(
                type='ChannelAnalyzer', tracer_type='BackwardTracer'),
            channel_unit_cfg=dict(type='GroupFisherChannelUnit'))

        epoch = 2
        interval = 1

        algorithm = GroupFisherAlgorithm(
            MMClsResNet18(), mutator=mutator_config,
            interval=interval).to(DEVICE)
        mutator = algorithm.mutator

        for e in range(epoch):
            for ite in range(10):
                self._set_epoch_ite(e, ite, epoch)
                algorithm.forward(
                    data['inputs'], data['data_samples'], mode='loss')
                # Fake gradients so the fisher update has data to consume.
                self.gen_fake_grad(mutator)
        self.assertEqual(interval, algorithm.interval)

    def gen_fake_grad(self, mutator):
        """Copy each recorded input into the recorded grad slot of every
        GroupFisherConv2d, standing in for a real backward pass."""
        for unit in mutator.mutable_units:
            for channel in unit.input_related:
                module = channel.module
                if isinstance(module, GroupFisherConv2d):
                    module.recorded_grad = module.recorded_input

    def _set_epoch_ite(self, epoch, ite, max_epoch):
        """Publish epoch/iteration counters to the global MessageHub, as a
        real runner would during training."""
        iter_per_epoch = 10
        message_hub = MessageHub.get_current_instance()
        message_hub.update_info('epoch', epoch)
        message_hub.update_info('max_epochs', max_epoch)
        message_hub.update_info('max_iters', max_epoch * iter_per_epoch)
        message_hub.update_info('iter', ite + iter_per_epoch * epoch)
+import copy +import os +from unittest import TestCase + +import torch +from mmengine import fileio + +from mmrazor import digit_version +from mmrazor.implementations.pruning.group_fisher.prune_deploy_sub_model import \ + GroupFisherDeploySubModel # noqa +from ....data.models import MMClsResNet18 +from .test_prune_sub_model import PruneAlgorithm, get_model_structure + + +class TestPruneDeploySubModel(TestCase): + + def check_torch_version(self): + if digit_version(torch.__version__) < digit_version('1.12.0'): + self.skipTest('version of torch < 1.12.0') + + def test_build_sub_model(self): + self.check_torch_version() + model = MMClsResNet18() + + parse_cfg = dict( + _scope_='mmrazor', + type='ChannelAnalyzer', + demo_input=(1, 3, 224, 224), + tracer_type='BackwardTracer') + # get structure + algorithm = PruneAlgorithm(copy.deepcopy(model)) + algorithm.random_prune() + strucutrue = algorithm.mutator.current_choices + + # test divisor + wrapper = GroupFisherDeploySubModel( + copy.deepcopy(model), strucutrue, divisor=1, parse_cfg=parse_cfg) + self.assertSequenceEqual( + list(strucutrue.values()), + list(get_model_structure(wrapper).values())) + + wrapper = GroupFisherDeploySubModel( + copy.deepcopy(model), strucutrue, divisor=8, parse_cfg=parse_cfg) + self.assertSequenceEqual( + list(strucutrue.values()), + list(get_model_structure(wrapper).values())) + + mutable_path = os.path.dirname(__file__) + '/mutable.json' + fileio.dump(algorithm.mutator.current_choices, mutable_path) + GroupFisherDeploySubModel( + copy.deepcopy(model), + divisor=1, + mutable_cfg=mutable_path, + parse_cfg=parse_cfg) + os.remove(mutable_path) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_impl/test_pruning/test_group_fisher/test_prune_sub_model.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_impl/test_pruning/test_group_fisher/test_prune_sub_model.py new file mode 100644 index 0000000000000000000000000000000000000000..bb85b71a6ac7e0af144ad871293d36c58f0fb2a5 --- /dev/null +++ 
# Copyright (c) OpenMMLab. All rights reserved.
"""Tests for exporting a pruned GroupFisher sub-model."""
import copy
import unittest
from typing import Dict, Union
from unittest import TestCase

import torch

from mmrazor import digit_version
from mmrazor.implementations.pruning.group_fisher import \
    GroupFisherChannelMutator
from mmrazor.implementations.pruning.group_fisher.prune_sub_model import \
    GroupFisherSubModel
from mmrazor.models import BaseAlgorithm
from mmrazor.models.mutators import ChannelMutator
from mmrazor.registry import MODELS
from ....data.models import MMClsResNet18


class PruneAlgorithm(BaseAlgorithm):
    """Minimal pruning algorithm used as a test fixture."""

    def __init__(self,
                 architecture,
                 mutator: Union[Dict, ChannelMutator] = dict(
                     type='ChannelMutator',
                     channel_unit_cfg=dict(
                         type='SequentialMutableChannelUnit')),
                 data_preprocessor=None,
                 init_cfg=None) -> None:
        super().__init__(
            architecture, data_preprocessor, init_cfg, module_inplace=False)
        if isinstance(mutator, dict):
            mutator = MODELS.build(mutator)
        assert isinstance(mutator, ChannelMutator)
        self.mutator = mutator
        mutator.prepare_from_supernet(self.architecture)

    def random_prune(self):
        """Apply a randomly sampled channel configuration."""
        self.mutator.set_choices(self.mutator.sample_choices())


def get_model_structure(model):
    """Return the channel structure (unit -> choice) of ``model``."""
    return PruneAlgorithm(copy.deepcopy(model)).mutator.current_choices


class TestPruneSubModel(TestCase):
    """Check GroupFisherSubModel export for different divisors."""

    def check_torch_version(self):
        """Skip the test on torch < 1.12.0."""
        if digit_version(torch.__version__) < digit_version('1.12.0'):
            self.skipTest('version of torch < 1.12.0')

    def test_build_sub_model(self):
        self.check_torch_version()
        x = torch.rand([1, 3, 224, 224])
        model = MMClsResNet18()
        algorithm = PruneAlgorithm(model)
        algorithm.random_prune()

        # divisor=1 keeps the pruned structure exactly.
        static_model1 = GroupFisherSubModel(algorithm, divisor=1)
        self.assertSequenceEqual(
            list(algorithm.mutator.current_choices.values()),
            list(get_model_structure(static_model1).values()))

        # divisor=8 rounds every unit's channel count to a multiple of 8.
        static_model2 = GroupFisherSubModel(algorithm, divisor=8)
        for value in get_model_structure(static_model2).values():
            self.assertTrue(value % 8 == 0)

        # Both exported models should stay numerically close.
        y1 = static_model1(x)
        y2 = static_model2(x)
        self.assertTrue((y1 - y2).abs().max() < 1e-3)


# ---- Tests for GroupFisherChannelUnit (originally tests/.../test_unit.py) --


class TestGroupFisherChannelUnit(unittest.TestCase):
    """Exercise recording and fisher-info computation on a unit."""

    def test_init(self):
        model = MMClsResNet18()
        mutator = GroupFisherChannelMutator(
            parse_cfg=dict(
                type='ChannelAnalyzer',
                demo_input=(1, 3, 224, 224),
                tracer_type='BackwardTracer'))
        mutator.prepare_from_supernet(model)

        # Record inputs/grads over two forward+backward passes.
        x = torch.rand([1, 3, 224, 224])
        mutator.start_record_info()
        for _ in range(2):
            model.train()
            loss = model(x).sum()
            loss.backward()
        mutator.end_record_info()

        for unit in mutator.mutable_units:
            for module in unit.input_related_dynamic_ops:
                self.assertEqual(len(module.recorded_input), 2)
                self.assertEqual(len(module.recorded_grad), 2)
                self.assertIsInstance(module.recorded_grad[0], torch.Tensor)

        # Per-module fisher: one row of `num_channels` values.
        unit = mutator.mutable_units[0]
        fisher = unit._fisher_of_a_module(next(unit.input_related_dynamic_ops))
        self.assertEqual(list(fisher.shape), [1, unit.num_channels])

        fisher = unit.current_batch_fisher
        self.assertEqual(list(fisher.shape), [unit.num_channels])

        # Normalization and accumulation should run without error.
        fisher = unit._get_normalized_fisher_info(fisher, unit.delta_type)
        unit.update_fisher_info()
a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ef101fec61e72abc0eb90266d453b5b22331378d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/__init__.py @@ -0,0 +1 @@ +# Copyright (c) OpenMMLab. All rights reserved. diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_algorithms/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_algorithms/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ef101fec61e72abc0eb90266d453b5b22331378d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_algorithms/__init__.py @@ -0,0 +1 @@ +# Copyright (c) OpenMMLab. All rights reserved. diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_algorithms/test_autoformer.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_algorithms/test_autoformer.py new file mode 100644 index 0000000000000000000000000000000000000000..2baa703fe479b1c19ccba231c527612ebbf5fc4a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_algorithms/test_autoformer.py @@ -0,0 +1,73 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
"""Tests for the Autoformer NAS algorithm."""
import copy
from unittest import TestCase

import torch

from mmrazor.models import Autoformer, NasMutator
from mmrazor.registry import MODELS

# Searchable dimensions of the Autoformer backbone.
arch_setting = dict(
    mlp_ratios=[3.0, 3.5, 4.0],
    num_heads=[8, 9, 10],
    depth=[14, 15, 16],
    embed_dims=[528, 576, 624])

MUTATOR_CFG = dict(type='NasMutator')

ARCHITECTURE_CFG = dict(
    _scope_='mmrazor',
    type='SearchableImageClassifier',
    backbone=dict(
        _scope_='mmrazor',
        type='AutoformerBackbone',
        arch_setting=arch_setting),
    neck=None,
    head=dict(
        type='DynamicLinearClsHead',
        num_classes=1000,
        in_channels=624,
        loss=dict(
            type='mmcls.LabelSmoothLoss',
            mode='original',
            num_classes=1000,
            label_smooth_val=0.1,
            loss_weight=1.0),
        topk=(1, 5)),
    connect_head=dict(connect_with_backbone='backbone.last_mutable'),
)

ALGORITHM_CFG = dict(
    type='mmrazor.Autoformer',
    architecture=ARCHITECTURE_CFG,
    mutator=MUTATOR_CFG)


class TestAutoFormer(TestCase):
    """Construction and forward tests for Autoformer."""

    def test_init(self):
        supernet_cfg = copy.deepcopy(ALGORITHM_CFG)

        # Building from the registry yields an Autoformer with a NasMutator.
        autoformer_algo = MODELS.build(supernet_cfg)
        self.assertIsInstance(autoformer_algo, Autoformer)
        self.assertIsInstance(autoformer_algo.mutator, NasMutator)

        # Sampling from the search groups returns a subnet description dict.
        random_subnet = autoformer_algo.mutator.sample_choices()
        self.assertIsInstance(random_subnet, dict)

        # Passing mutator=None must raise a TypeError.
        supernet_cfg.pop('type')
        supernet_cfg['mutator'] = None
        none_type = type(supernet_cfg['mutator'])
        with self.assertRaisesRegex(
                TypeError, 'mutator should be a `dict` or `NasMutator` '
                f'instance, but got {none_type}.'):
            _ = Autoformer(**supernet_cfg)

    def test_loss(self):
        # Forward the supernet; the head emits 1000 class scores.
        inputs = torch.randn(1, 3, 224, 224)
        autoformer = MODELS.build(ALGORITHM_CFG)
        loss = autoformer(inputs)
        assert loss.size(1) == 1000
out_features=ONESHOT_MUTABLE_CFG, + in_channels=ONESHOT_MUTABLE_CFG, + out_channels=ONESHOT_MUTABLE_CFG, + num_features=ONESHOT_MUTABLE_CFG) + +MUTATOR_CFG = dict( + type='OneShotChannelMutator', + channel_unit_cfg=dict( + type='OneShotMutableChannelUnit', + default_args=dict( + candidate_choices=list(i / 12 for i in range(2, 13)), + choice_mode='ratio')), + parse_cfg=dict(type='ChannelAnalyzer')) + +DISTILLER_CFG = dict( + type='ConfigurableDistiller', + teacher_recorders=dict(fc=dict(type='ModuleOutputs', source='head.fc')), + student_recorders=dict(fc=dict(type='ModuleOutputs', source='head.fc')), + distill_losses=dict( + loss_kl=dict(type='KLDivergence', tau=1, loss_weight=1)), + loss_forward_mappings=dict( + loss_kl=dict( + preds_S=dict(recorder='fc', from_student=True), + preds_T=dict(recorder='fc', from_student=False)))) + +OPTIM_WRAPPER_CFG = dict( + optimizer=dict( + type='mmcls.SGD', + lr=0.5, + momentum=0.9, + weight_decay=4e-05, + _scope_='mmrazor'), + paramwise_cfg=dict( + bias_decay_mult=0.0, norm_decay_mult=0.0, dwconv_decay_mult=0.0), + clip_grad=None, + accumulative_counts=4) + + +class FakeMutator: + ... 
class ToyDataPreprocessor(torch.nn.Module):
    """Identity preprocessor: hands the data dict through unchanged."""

    def forward(
            self,
            data: Dict,
            training: bool = True) -> Tuple[torch.Tensor, List[ClsDataSample]]:
        return data


@unittest.skipIf(
    digit_version(torch.__version__) == digit_version('1.8.1'),
    'PyTorch version 1.8.1 is not supported by the Backward Tracer.')
class TestAutoSlim(TestCase):
    """Construction and train-step tests for the AutoSlim algorithm."""

    # Subclasses (DDP) may switch this to 'cuda'.
    device: str = 'cpu'

    def _prepare_fake_data(self) -> Dict:
        """Return a fake ImageNet-like batch of 16 images."""
        imgs = torch.randn(16, 3, 224, 224).to(self.device)
        data_samples = [
            ClsDataSample().set_gt_label(
                torch.randint(0, 1000, (16, ))).to(self.device)
        ]
        return {'inputs': imgs, 'data_samples': data_samples}

    def prepare_model(self,
                      mutator_cfg: MUTATOR_TYPE = MUTATOR_CFG,
                      distiller_cfg: DISTILLER_TYPE = DISTILLER_CFG,
                      architecture_cfg: Dict = ARCHITECTURE_CFG,
                      num_random_samples: int = 2) -> AutoSlim:
        """Build an AutoSlim algorithm on ``self.device``."""
        model = AutoSlim(
            mutator=mutator_cfg,
            distiller=distiller_cfg,
            architecture=architecture_cfg,
            data_preprocessor=ToyDataPreprocessor(),
            num_random_samples=num_random_samples)
        model.to(self.device)
        return model

    def test_init(self) -> None:
        # A non-config, non-mutator object must be rejected.
        mutator_wrong_type = FakeMutator()
        with pytest.raises(Exception):
            _ = self.prepare_model(mutator_wrong_type)

        # Candidate choices come from the channel-unit default_args.
        algo = self.prepare_model()
        self.assertSequenceEqual(
            algo.mutator.mutable_units[0].candidate_choices,
            list(i / 12 for i in range(2, 13)),
        )

    def test_autoslim_train_step(self) -> None:
        algo = self.prepare_model()
        data = self._prepare_fake_data()
        optim_wrapper = build_optim_wrapper(algo, OPTIM_WRAPPER_CFG)
        fake_message_hub = Mock()
        fake_message_hub.runtime_info = {'iter': 0, 'max_iters': 100}
        optim_wrapper.message_hub = fake_message_hub

        assert not algo._optim_wrapper_count_status_reinitialized
        losses = algo.train_step(data, optim_wrapper)

        # max/min subnets plus 2 random subnets; every subnet except the
        # largest also carries a distillation (KL) loss.
        assert len(losses) == 7
        assert losses['max_subnet.loss'] > 0
        assert losses['min_subnet.loss'] > 0
        assert losses['min_subnet.loss_kl'] + 1e-5 > 0
        assert losses['random0_subnet.loss'] > 0
        assert losses['random0_subnet.loss_kl'] + 1e-5 > 0
        assert losses['random1_subnet.loss'] > 0
        assert losses['random1_subnet.loss_kl'] + 1e-5 > 0

        assert algo._optim_wrapper_count_status_reinitialized
        assert optim_wrapper._inner_count == 4
        assert optim_wrapper._max_counts == 400

        # A second step must not re-trigger the count-status initialization.
        losses = algo.train_step(data, optim_wrapper)
        assert algo._optim_wrapper_count_status_reinitialized


class TestAutoSlimDDP(TestAutoSlim):
    """Same checks, with the algorithm wrapped in AutoSlimDDP."""

    @classmethod
    def setUpClass(cls) -> None:
        os.environ['MASTER_ADDR'] = 'localhost'
        os.environ['MASTER_PORT'] = '12355'

        # Single-process group: nccl on GPU, gloo otherwise.
        if torch.cuda.is_available():
            backend = 'nccl'
            cls.device = 'cuda'
        else:
            backend = 'gloo'
        dist.init_process_group(backend, rank=0, world_size=1)

    def prepare_model(self,
                      mutator_cfg: MUTATOR_TYPE = MUTATOR_CFG,
                      distiller_cfg: DISTILLER_TYPE = DISTILLER_CFG,
                      architecture_cfg: Dict = ARCHITECTURE_CFG,
                      num_random_samples: int = 2) -> AutoSlim:
        model = super().prepare_model(
            mutator_cfg=mutator_cfg,
            distiller_cfg=distiller_cfg,
            architecture_cfg=architecture_cfg,
            num_random_samples=num_random_samples)
        return AutoSlimDDP(module=model, find_unused_parameters=True)

    @classmethod
    def tearDownClass(cls) -> None:
        dist.destroy_process_group()

    @pytest.mark.skipif(
        not torch.cuda.is_available(), reason='cuda device is not avaliable')
    def test_init(self) -> None:
        model = super().prepare_model()
        ddp_model = AutoSlimDDP(module=model, device_ids=[0])

        self.assertIsInstance(ddp_model, AutoSlimDDP)
"""Tests for BaseAlgorithm construction, forward modes and inplace handling."""
from unittest import TestCase

import torch
import torch.nn as nn
from mmengine.model import BaseDataPreprocessor, BaseModel

from mmrazor.models import BaseAlgorithm
from mmrazor.models.task_modules import ModuleOutputsRecorder
from mmrazor.registry import MODELS


@MODELS.register_module()
class CustomDataPreprocessor(BaseDataPreprocessor):
    """Toy preprocessor: returns 1 when training, 2 otherwise."""

    def forward(self, data, training=False):
        return 1 if training else 2


@MODELS.register_module()
class ToyModel(BaseModel):
    """conv -> bn -> relu toy model; the three modes add different offsets."""

    def __init__(self, data_preprocessor=None):
        super().__init__(data_preprocessor=data_preprocessor, init_cfg=None)
        self.conv = nn.Conv2d(3, 1, 1)
        self.bn = nn.BatchNorm2d(1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, batch_inputs, data_samples=None, mode='tensor'):
        if mode == 'loss':
            return dict(loss=self.relu(self.bn(self.conv(batch_inputs))))
        elif mode == 'predict':
            return self.relu(self.bn(self.conv(batch_inputs) + 1))
        elif mode == 'tensor':
            return self.relu(self.bn(self.conv(batch_inputs) + 2))


class TestBaseAlgorithm(TestCase):

    def test_init(self):
        # No `data_preprocessor`: the algorithm provides a default one.
        model = ToyModel()
        alg = BaseAlgorithm(ToyModel())
        self.assertIsInstance(alg.data_preprocessor, BaseDataPreprocessor)

        # Unbuilt `data_preprocessor` config gets built from the registry.
        data_preprocessor = dict(type='mmrazor.CustomDataPreprocessor')
        alg = BaseAlgorithm(ToyModel(), data_preprocessor=data_preprocessor)
        self.assertIsInstance(alg.data_preprocessor, CustomDataPreprocessor)

        # An already-built preprocessor is shared with the architecture.
        data_preprocessor = CustomDataPreprocessor()
        alg = BaseAlgorithm(
            ToyModel(data_preprocessor), data_preprocessor=data_preprocessor)
        self.assertIs(alg.data_preprocessor, data_preprocessor)
        self.assertIs(alg.data_preprocessor,
                      alg.architecture.data_preprocessor)
        alg = BaseAlgorithm(
            ToyModel(data_preprocessor), data_preprocessor=None)
        self.assertIs(alg.data_preprocessor, data_preprocessor)
        self.assertIs(alg.data_preprocessor,
                      alg.architecture.data_preprocessor)

        # A built model is kept as-is.
        model = ToyModel()
        alg = BaseAlgorithm(model)
        self.assertIs(alg.architecture, model)

        # An unbuilt model config is built from the registry.
        model = dict(type='ToyModel')
        alg = BaseAlgorithm(model)
        self.assertIsInstance(alg.architecture, ToyModel)

        # Anything else is rejected.
        with self.assertRaisesRegex(TypeError, 'architecture should be'):
            BaseAlgorithm(architecture=[model])

    def test_forward(self):
        model = ToyModel()
        alg = BaseAlgorithm(model)

        inputs = torch.randn(1, 3, 8, 8)

        # Each mode delegates to the matching private entry point.
        loss = alg(inputs, mode='loss')
        loss_ = alg.loss(inputs)
        self.assertEqual(loss['loss'].sum(), loss_['loss'].sum())

        predict = alg(inputs, mode='predict')
        predict_ = alg._predict(inputs)
        self.assertEqual(predict.sum(), predict_.sum())

        tensor = alg(inputs, mode='tensor')
        tensor_ = alg._forward(inputs)
        self.assertEqual(tensor.sum(), tensor_.sum())

        with self.assertRaisesRegex(RuntimeError, 'Invalid mode "A"'):
            alg(inputs, mode='A')

    def test_set_module_inplace_false(self):
        # Wrapping a model in BaseAlgorithm must switch inplace ReLUs off so
        # recorders can observe intermediate outputs without corruption.
        inputs = torch.randn(1, 3, 8, 8)

        model = ToyModel()
        res_before = model(inputs)
        _ = BaseAlgorithm(model)

        r1 = ModuleOutputsRecorder('bn')
        r1.initialize(model)
        with r1:
            res_after = model(inputs)
        self.assertIs(torch.equal(res_before, res_after), True)

        self.assertIs(model.relu.inplace, False)

        self.assertIs(
            torch.equal(r1.data_buffer[0], model.bn(model.conv(inputs) + 2)),
            True)
"""Tests for the BigNAS algorithm."""
import copy
from unittest import TestCase

import torch

from mmrazor.models import BigNAS, NasMutator
from mmrazor.registry import MODELS

# Searchable backbone settings: [min, max, step] per stage.
# NOTE(review): expand_ratio lists 5 stages while kernel_size/num_blocks list
# 4 — presumably intentional for this backbone; verify against
# AttentiveMobileNetV3's expected arch_setting layout.
arch_setting = dict(
    kernel_size=[
        [3, 5, 2],
        [3, 5, 2],
        [3, 5, 2],
        [3, 5, 2],
    ],
    num_blocks=[
        [1, 2, 1],
        [3, 6, 1],
        [3, 6, 1],
        [1, 2, 1],
    ],
    expand_ratio=[
        [1, 1, 1],
        [4, 6, 1],
        [4, 6, 1],
        [4, 6, 1],
        [4, 6, 1],
    ],
    num_out_channels=[
        [16, 24, 8],  # first layer
        [16, 24, 8],
        [24, 32, 8],
        [32, 40, 8],
        [64, 72, 8],
        [72, 72, 8],  # last layer
    ])

MUTATOR_CFG = dict(type='NasMutator')

DISTILLER_CFG = dict(
    _scope_='mmrazor',
    type='ConfigurableDistiller',
    teacher_recorders=dict(fc=dict(type='ModuleOutputs', source='head.fc')),
    student_recorders=dict(fc=dict(type='ModuleOutputs', source='head.fc')),
    distill_losses=dict(
        loss_kl=dict(type='KLDivergence', tau=1, loss_weight=1)),
    loss_forward_mappings=dict(
        loss_kl=dict(
            preds_S=dict(recorder='fc', from_student=True),
            preds_T=dict(recorder='fc', from_student=False))))

ARCHITECTURE_CFG = dict(
    type='mmrazor.SearchableImageClassifier',
    backbone=dict(
        type='mmrazor.AttentiveMobileNetV3',
        arch_setting=arch_setting,
        out_indices=(4, ),
        conv_cfg=dict(type='mmrazor.BigNasConv2d'),
        norm_cfg=dict(type='mmrazor.DynamicBatchNorm2d', momentum=0.0)),
    neck=dict(type='mmcls.GlobalAveragePooling'),
    head=dict(
        _scope_='mmrazor',
        type='DynamicLinearClsHead',
        num_classes=1000,
        in_channels=72,
        loss=dict(
            type='mmcls.LabelSmoothLoss',
            mode='original',
            num_classes=1000,
            label_smooth_val=0.1,
            loss_weight=1.0),
        topk=(1, 5)),
    connect_head=dict(connect_with_backbone='backbone.last_mutable_channels'),
)

ALGORITHM_CFG = dict(
    type='mmrazor.BigNAS',
    architecture=ARCHITECTURE_CFG,
    mutator=MUTATOR_CFG,
    distiller=DISTILLER_CFG)


class TestBigNAS(TestCase):
    """Construction and forward tests for BigNAS."""

    def test_init(self):
        supernet_cfg = copy.deepcopy(ALGORITHM_CFG)

        # Building from the registry yields a BigNAS with a NasMutator.
        bignas_algo = MODELS.build(supernet_cfg)
        self.assertIsInstance(bignas_algo, BigNAS)
        self.assertIsInstance(bignas_algo.mutator, NasMutator)

        # Sampling from the search groups returns a subnet description dict.
        random_subnet = bignas_algo.mutator.sample_choices()
        self.assertIsInstance(random_subnet, dict)

        # Passing mutator=None must raise a TypeError.
        supernet_cfg.pop('type')
        supernet_cfg['mutator'] = None
        none_type = type(supernet_cfg['mutator'])
        with self.assertRaisesRegex(
                TypeError, 'mutator should be a `dict` or `NasMutator` '
                f'instance, but got {none_type}.'):
            _ = BigNAS(**supernet_cfg)

    def test_loss(self):
        # Forward the supernet; the head emits 1000 class scores.
        inputs = torch.randn(1, 3, 224, 224)
        bignas = MODELS.build(ALGORITHM_CFG)
        loss = bignas(inputs)
        assert loss.size(1) == 1000
"""Tests for the Darts algorithm and its DDP wrapper."""
import os
from typing import Dict
from unittest import TestCase

import pytest
import torch
import torch.distributed as dist
import torch.nn as nn
from mmcls.structures import ClsDataSample
from mmengine.model import BaseModel
from mmengine.optim import build_optim_wrapper
from mmengine.optim.optimizer import OptimWrapper, OptimWrapperDict
from torch import Tensor
from torch.optim import SGD

from mmrazor.models import Darts, DiffMutableOP, NasMutator
from mmrazor.models.algorithms.nas.darts import DartsDDP
from mmrazor.registry import MODELS
from mmrazor.structures import load_fix_subnet

# Expose plain torch layers through the registry for the candidate configs.
MODELS.register_module(name='torchConv2d', module=nn.Conv2d, force=True)
MODELS.register_module(name='torchMaxPool2d', module=nn.MaxPool2d, force=True)
MODELS.register_module(name='torchAvgPool2d', module=nn.AvgPool2d, force=True)


@MODELS.register_module()
class ToyDiffModule2(BaseModel):
    """Toy model with one differentiable mutable op (3 conv candidates)."""

    def __init__(self, data_preprocessor=None):
        super().__init__(data_preprocessor=data_preprocessor, init_cfg=None)

        self.candidates = dict(
            torch_conv2d_3x3=dict(
                type='torchConv2d',
                kernel_size=3,
                padding=1,
            ),
            torch_conv2d_5x5=dict(
                type='torchConv2d',
                kernel_size=5,
                padding=2,
            ),
            torch_conv2d_7x7=dict(
                type='torchConv2d',
                kernel_size=7,
                padding=3,
            ),
        )
        module_kwargs = dict(
            in_channels=3,
            out_channels=8,
            stride=1,
        )
        self.mutable = DiffMutableOP(
            candidates=self.candidates,
            module_kwargs=module_kwargs,
            alias='normal')

        self.bn = nn.BatchNorm2d(8)

    def forward(self, batch_inputs, data_samples=None, mode='tensor'):
        # The three modes only differ by a constant offset.
        if mode == 'loss':
            return dict(loss=self.bn(self.mutable(batch_inputs)))
        elif mode == 'predict':
            return self.bn(self.mutable(batch_inputs)) + 1
        elif mode == 'tensor':
            return self.bn(self.mutable(batch_inputs)) + 2


class TestDarts(TestCase):

    def setUp(self) -> None:
        self.device: str = 'cpu'

        OPTIMIZER_CFG = dict(
            type='SGD',
            lr=0.5,
            momentum=0.9,
            nesterov=True,
            weight_decay=0.0001)

        self.OPTIM_WRAPPER_CFG = dict(optimizer=OPTIMIZER_CFG)

    def test_init(self) -> None:
        # norm_training=True keeps norm layers in train mode during eval.
        model = ToyDiffModule2()
        mutator = NasMutator()
        algo = Darts(architecture=model, mutator=mutator, norm_training=True)
        algo.eval()
        self.assertTrue(model.bn.training)

        # A built mutator is kept as-is.
        model = ToyDiffModule2()
        mutator = NasMutator()
        algo = Darts(model, mutator)
        self.assertIs(algo.mutator, mutator)

        # An unbuilt mutator config gets built.
        mutator = dict(type='NasMutator')
        algo = Darts(model, mutator)
        self.assertIsInstance(algo.mutator, NasMutator)

        # Loading a fixed subnet narrows the mutable's choices.
        fix_subnet = {
            'normal': {
                'chosen': ['torch_conv2d_3x3', 'torch_conv2d_7x7']
            }
        }
        load_fix_subnet(model, fix_subnet)
        algo = Darts(model, mutator)
        self.assertEqual(algo.architecture.mutable.num_choices, 2)

        # Anything that is neither dict nor NasMutator is rejected.
        with self.assertRaisesRegex(TypeError, 'mutator should be'):
            Darts(model, model)

    def test_forward_loss(self) -> None:
        inputs = torch.randn(1, 3, 8, 8)
        model = ToyDiffModule2()

        # supernet
        mutator = NasMutator()
        mutator.prepare_from_supernet(model)
        mutator.prepare_arch_params()

        # subnet
        fix_subnet = {
            'normal': {
                'chosen': ['torch_conv2d_3x3', 'torch_conv2d_7x7']
            }
        }
        load_fix_subnet(model, fix_subnet)
        loss = model(inputs, mode='loss')
        self.assertIsInstance(loss, dict)

    def _prepare_fake_data(self) -> Dict:
        """Return a fake ImageNet-like batch of 16 images."""
        imgs = torch.randn(16, 3, 224, 224).to(self.device)
        data_samples = [
            ClsDataSample().set_gt_label(
                torch.randint(0, 1000, (16, ))).to(self.device)
        ]
        return {'inputs': imgs, 'data_samples': data_samples}

    def test_search_subnet(self) -> None:
        model = ToyDiffModule2()

        mutator = NasMutator()
        mutator.prepare_from_supernet(model)
        mutator.prepare_arch_params()

        algo = Darts(model, mutator)
        subnet = algo.mutator.sample_choices()
        self.assertIsInstance(subnet, dict)

    def test_darts_train_step(self) -> None:
        model = ToyDiffModule2()

        mutator = NasMutator()
        mutator.prepare_from_supernet(model)
        mutator.prepare_arch_params()

        # Single data batch with a single optim wrapper.
        algo = Darts(model, mutator)
        data = self._prepare_fake_data()
        optim_wrapper = build_optim_wrapper(algo, self.OPTIM_WRAPPER_CFG)
        loss = algo.train_step(data, optim_wrapper)

        self.assertTrue(isinstance(loss['loss'], Tensor))

        # A list of batches with separate arch/weight optimizers.
        algo = Darts(model, mutator)
        data = [self._prepare_fake_data() for _ in range(2)]
        optim_wrapper_dict = OptimWrapperDict(
            architecture=OptimWrapper(SGD(model.parameters(), lr=0.1)),
            mutator=OptimWrapper(SGD(model.parameters(), lr=0.01)))
        loss = algo.train_step(data, optim_wrapper_dict)

        self.assertIsNotNone(loss)

    def test_darts_with_unroll(self) -> None:
        model = ToyDiffModule2()

        mutator = NasMutator()
        mutator.prepare_from_supernet(model)
        mutator.prepare_arch_params()

        # Second-order (unrolled) optimization needs two batches as well.
        algo = Darts(model, mutator, unroll=True)
        data = [self._prepare_fake_data() for _ in range(2)]
        optim_wrapper_dict = OptimWrapperDict(
            architecture=OptimWrapper(SGD(model.parameters(), lr=0.1)),
            mutator=OptimWrapper(SGD(model.parameters(), lr=0.01)))
        loss = algo.train_step(data, optim_wrapper_dict)

        self.assertIsNotNone(loss)


class TestDartsDDP(TestDarts):
    """Same checks, with the algorithm wrapped in DartsDDP."""

    @classmethod
    def setUpClass(cls) -> None:
        os.environ['MASTER_ADDR'] = 'localhost'
        os.environ['MASTER_PORT'] = '12345'

        # Single-process group: nccl on GPU, gloo otherwise.
        backend = 'nccl' if torch.cuda.is_available() else 'gloo'
        dist.init_process_group(backend, rank=0, world_size=1)

    def prepare_model(self, unroll=False, device_ids=None) -> Darts:
        """Build a Darts algorithm and wrap it in DartsDDP."""
        self.device = 'cuda' if torch.cuda.is_available() else 'cpu'

        model = ToyDiffModule2()

        mutator = NasMutator()
        mutator.prepare_from_supernet(model)
        mutator.prepare_arch_params()

        algo = Darts(model, mutator, unroll=unroll).to(self.device)

        return DartsDDP(
            module=algo, find_unused_parameters=True, device_ids=device_ids)

    @classmethod
    def tearDownClass(cls) -> None:
        dist.destroy_process_group()

    @pytest.mark.skipif(
        not torch.cuda.is_available(), reason='cuda device is not avaliable')
    def test_init(self) -> None:
        ddp_model = self.prepare_model()
        self.assertIsInstance(ddp_model, DartsDDP)

    def test_dartsddp_train_step(self) -> None:
        # Single data batch with a single optim wrapper.
        ddp_model = self.prepare_model()
        data = self._prepare_fake_data()
        optim_wrapper = build_optim_wrapper(ddp_model, self.OPTIM_WRAPPER_CFG)
        loss = ddp_model.train_step(data, optim_wrapper)

        self.assertIsNotNone(loss)

        # A list of batches with separate arch/weight optimizers.
        ddp_model = self.prepare_model()
        data = [self._prepare_fake_data() for _ in range(2)]
        optim_wrapper_dict = OptimWrapperDict(
            architecture=OptimWrapper(SGD(ddp_model.parameters(), lr=0.1)),
            mutator=OptimWrapper(SGD(ddp_model.parameters(), lr=0.01)))
        loss = ddp_model.train_step(data, optim_wrapper_dict)

        self.assertIsNotNone(loss)

    def test_dartsddp_with_unroll(self) -> None:
        # Unrolled optimization under DDP.
        ddp_model = self.prepare_model(unroll=True)
        data = [self._prepare_fake_data() for _ in range(2)]
        optim_wrapper_dict = OptimWrapperDict(
            architecture=OptimWrapper(SGD(ddp_model.parameters(), lr=0.1)),
            mutator=OptimWrapper(SGD(ddp_model.parameters(), lr=0.01)))
        loss = ddp_model.train_step(data, optim_wrapper_dict)

        self.assertIsNotNone(loss)
import copy
from unittest import TestCase

import torch
from mmengine import ConfigDict
from mmengine.optim import build_optim_wrapper

from mmrazor.models import DAFLDataFreeDistillation, DataFreeDistillation


class TestDataFreeDistill(TestCase):
    """Tests for ``DataFreeDistillation`` construction and training."""

    @staticmethod
    def _get_alg_kwargs():
        """Build the algorithm config shared by every test case.

        Two toy teachers supervise the student through ``distiller`` while
        the generator is trained against ``generator_distiller``.
        """
        recorders_cfg = ConfigDict(
            conv=dict(type='ModuleOutputs', source='conv'))
        return ConfigDict(
            architecture=dict(type='ToyStudent'),
            teachers=dict(
                tea1=dict(build_cfg=dict(type='ToyTeacher')),
                tea2=dict(build_cfg=dict(type='ToyTeacher'))),
            generator=dict(type='ToyGenerator'),
            distiller=dict(
                type='ConfigurableDistiller',
                student_recorders=recorders_cfg,
                teacher_recorders=dict(
                    tea1_conv=dict(type='ModuleOutputs', source='tea1.conv')),
                distill_losses=dict(loss_dis=dict(type='ToyDistillLoss')),
                loss_forward_mappings=dict(
                    loss_dis=dict(
                        arg1=dict(from_student=True, recorder='conv'),
                        arg2=dict(from_student=False,
                                  recorder='tea1_conv')))),
            generator_distiller=dict(
                type='ConfigurableDistiller',
                student_recorders=recorders_cfg,
                teacher_recorders=dict(
                    tea2_conv=dict(type='ModuleOutputs', source='tea2.conv')),
                distill_losses=dict(loss_gen=dict(type='ToyDistillLoss')),
                loss_forward_mappings=dict(
                    loss_gen=dict(
                        arg1=dict(from_student=True, recorder='conv'),
                        arg2=dict(from_student=False,
                                  recorder='tea2_conv')))),
        )

    def test_init(self):
        alg_kwargs = self._get_alg_kwargs()

        alg = DataFreeDistillation(**alg_kwargs)
        # `assertEqual` replaces the deprecated `assertEquals` alias.
        self.assertEqual(len(alg.teachers), len(alg_kwargs['teachers']))

        # Non-dict teachers must be rejected.
        alg_kwargs_ = copy.deepcopy(alg_kwargs)
        alg_kwargs_['teachers'] = 'ToyTeacher'
        with self.assertRaisesRegex(TypeError,
                                    'teacher should be a `dict` but got '):
            alg = DataFreeDistillation(**alg_kwargs_)

        # Non-dict generator must be rejected.
        alg_kwargs_ = copy.deepcopy(alg_kwargs)
        alg_kwargs_['generator'] = 'ToyGenerator'
        with self.assertRaisesRegex(
                TypeError, 'generator should be a `dict` instance, but got '):
            _ = DataFreeDistillation(**alg_kwargs_)

    def test_loss(self):
        alg_kwargs = self._get_alg_kwargs()

        optim_wrapper_cfg = dict(
            type='OptimWrapper',
            optimizer=dict(
                type='SGD', lr=0.1, weight_decay=0.01, momentum=0.9))

        data = dict(inputs=torch.randn(3, 1, 1), data_samples=None)

        alg = DataFreeDistillation(**alg_kwargs)
        optim_wrapper = build_optim_wrapper(alg, optim_wrapper_cfg)
        # Student and generator are stepped by separate wrapper entries.
        optim_wrapper_dict = dict(
            architecture=optim_wrapper, generator=optim_wrapper)

        losses = alg.train_step(data, optim_wrapper_dict)
        self.assertIn('distill.loss_dis', losses)
        self.assertIn('distill.loss', losses)
        self.assertIn('generator.loss_gen', losses)
        self.assertIn('generator.loss', losses)

        # Several student updates per generator update.
        alg_kwargs_ = copy.deepcopy(alg_kwargs)
        alg_kwargs_['student_iter'] = 5
        alg = DataFreeDistillation(**alg_kwargs_)
        losses = alg.train_step(data, optim_wrapper_dict)
        self.assertIn('distill.loss_dis', losses)
        self.assertIn('distill.loss', losses)
        self.assertIn('generator.loss_gen', losses)
        self.assertIn('generator.loss', losses)
class TestDAFLDataFreeDistill(TestCase):
    """Tests for the DAFL variant of data-free distillation."""

    @staticmethod
    def _get_alg_kwargs():
        """Build the base algorithm config used by both tests."""
        recorders_cfg = ConfigDict(
            conv=dict(type='ModuleOutputs', source='conv'))
        return ConfigDict(
            architecture=dict(type='ToyStudent'),
            teachers=dict(
                tea1=dict(build_cfg=dict(type='ToyTeacher')),
                tea2=dict(build_cfg=dict(type='ToyTeacher'))),
            generator=dict(type='ToyGenerator'),
            distiller=dict(
                type='ConfigurableDistiller',
                student_recorders=recorders_cfg,
                teacher_recorders=dict(
                    tea1_conv=dict(type='ModuleOutputs', source='tea1.conv')),
                distill_losses=dict(loss_dis=dict(type='ToyDistillLoss')),
                loss_forward_mappings=dict(
                    loss_dis=dict(
                        arg1=dict(from_student=True, recorder='conv'),
                        arg2=dict(from_student=False,
                                  recorder='tea1_conv')))),
            generator_distiller=dict(
                type='ConfigurableDistiller',
                student_recorders=recorders_cfg,
                teacher_recorders=dict(
                    tea2_conv=dict(type='ModuleOutputs', source='tea2.conv')),
                distill_losses=dict(loss_gen=dict(type='ToyDistillLoss')),
                loss_forward_mappings=dict(
                    loss_gen=dict(
                        arg1=dict(from_student=True, recorder='conv'),
                        arg2=dict(from_student=False,
                                  recorder='tea2_conv')))))

    def test_init(self):
        alg_kwargs = self._get_alg_kwargs()

        alg = DAFLDataFreeDistillation(**alg_kwargs)
        # `assertEqual` replaces the deprecated `assertEquals` alias.
        self.assertEqual(len(alg.teachers), len(alg_kwargs['teachers']))

        alg_kwargs_ = copy.deepcopy(alg_kwargs)
        alg_kwargs_['teachers'] = 'ToyTeacher'
        with self.assertRaisesRegex(TypeError,
                                    'teacher should be a `dict` but got '):
            alg = DAFLDataFreeDistillation(**alg_kwargs_)

        alg_kwargs_ = copy.deepcopy(alg_kwargs)
        alg_kwargs_['generator'] = 'ToyGenerator'
        with self.assertRaisesRegex(
                TypeError, 'generator should be a `dict` instance, but got '):
            _ = DAFLDataFreeDistillation(**alg_kwargs_)

    def test_loss(self):
        alg_kwargs = self._get_alg_kwargs()
        # DAFL's generator loss compares the two teachers against each other
        # instead of reading the student recorder, so the generator distiller
        # records both teacher convs and maps them into the loss.
        gen_distiller = alg_kwargs['generator_distiller']
        gen_distiller['teacher_recorders'] = dict(
            tea1_conv=dict(type='ModuleOutputs', source='tea1.conv'),
            tea2_conv=dict(type='ModuleOutputs', source='tea2.conv'))
        gen_distiller['loss_forward_mappings'] = dict(
            loss_gen=dict(
                arg1=dict(from_student=False, recorder='tea1_conv'),
                arg2=dict(from_student=False, recorder='tea2_conv')))

        optim_wrapper_cfg = dict(
            type='OptimWrapper',
            optimizer=dict(
                type='SGD', lr=0.1, weight_decay=0.01, momentum=0.9))

        data = dict(inputs=torch.randn(3, 1, 1), data_samples=None)

        alg = DAFLDataFreeDistillation(**alg_kwargs)
        optim_wrapper = build_optim_wrapper(alg, optim_wrapper_cfg)
        optim_wrapper_dict = dict(
            architecture=optim_wrapper, generator=optim_wrapper)

        losses = alg.train_step(data, optim_wrapper_dict)
        self.assertIn('distill.loss_dis', losses)
        self.assertIn('distill.loss', losses)
        self.assertIn('generator.loss_gen', losses)
        self.assertIn('generator.loss', losses)
# Copyright (c) OpenMMLab. All rights reserved.
# Tests for the DCFF pruning algorithm: prune-schedule bookkeeping,
# iterative pruning, checkpoint loading, grouped ratios and subnet export.
import copy
import json
import os
import os.path as osp
import unittest

import torch
from mmcls.structures import ClsDataSample
from mmengine import MessageHub
from mmengine.model import BaseModel

from mmrazor.models.algorithms.pruning.dcff import DCFF
from mmrazor.models.algorithms.pruning.ite_prune_algorithm import \
    ItePruneConfigManager
from mmrazor.registry import MODELS
from mmrazor.structures import export_fix_subnet


# @TASK_UTILS.register_module()
class ImageClassifierPseudoLoss:
    """Calculate the pseudo loss to trace the topology of a `ImageClassifier`
    in MMClassification with `BackwardTracer`."""

    def __call__(self, model) -> torch.Tensor:
        pseudo_img = torch.rand(2, 3, 32, 32)
        pseudo_output = model(pseudo_img)
        return pseudo_output.sum()


# NOTE(review): the backbone is depth=18, yet several tests below reference
# bottleneck-style unit keys such as `conv3_(0, 256)_256` — confirm these
# keys exist for the units the mutator actually discovers.
MODEL_CFG = dict(
    _scope_='mmcls',
    type='ImageClassifier',
    backbone=dict(
        type='ResNet',
        depth=18,
        num_stages=4,
        out_indices=(3, ),
        style='pytorch'),
    neck=dict(type='GlobalAveragePooling'),
    head=dict(
        type='LinearClsHead',
        num_classes=1000,
        in_channels=512,
        loss=dict(type='CrossEntropyLoss', loss_weight=1.0),
        topk=(1, 5),
    ))

# Mutator variants: choices expressed as absolute channel numbers vs ratios.
MUTATOR_CONFIG_NUM = dict(
    type='DCFFChannelMutator',
    channel_unit_cfg={
        'type': 'DCFFChannelUnit',
        'default_args': {
            'choice_mode': 'number'
        }
    })
MUTATOR_CONFIG_FLOAT = dict(
    type='DCFFChannelMutator',
    channel_unit_cfg={
        'type': 'DCFFChannelUnit',
        'default_args': {
            'choice_mode': 'ratio'
        }
    })

if torch.cuda.is_available():
    DEVICE = torch.device('cuda:0')
else:
    DEVICE = torch.device('cpu')


class TestDCFFAlgorithm(unittest.TestCase):

    def _set_epoch_ite(self, epoch, ite, max_epoch):
        """Publish epoch/iter counters on the global MessageHub, which DCFF
        reads to decide when to prune (assumes 10 iters per epoch)."""
        iter_per_epoch = 10
        message_hub = MessageHub.get_current_instance()
        message_hub.update_info('epoch', epoch)
        message_hub.update_info('max_epochs', max_epoch)
        message_hub.update_info('max_iters', max_epoch * iter_per_epoch)
        message_hub.update_info('iter', ite + iter_per_epoch * epoch)

    def fake_cifar_data(self):
        """Return a random 16-image CIFAR-sized batch on DEVICE."""
        imgs = torch.randn(16, 3, 32, 32).to(DEVICE)
        data_samples = [
            ClsDataSample().set_gt_label(torch.randint(0, 10,
                                                       (16, ))).to(DEVICE)
        ]

        return {'inputs': imgs, 'data_samples': data_samples}

    def test_ite_prune_config_manager(self):
        """Prune times fire every 2 epochs and interpolate origin->target
        linearly over 5 steps, for both float and int choice modes."""
        iter_per_epoch = 10
        float_origin, float_target = 1.0, 0.5
        int_origin, int_target = 10, 5
        for origin, target, manager in [
            (float_origin, float_target,
             ItePruneConfigManager({'a': float_target}, {'a': float_origin},
                                   2 * iter_per_epoch, 5)),
            (int_origin, int_target,
             ItePruneConfigManager({'a': int_target}, {'a': int_origin},
                                   2 * iter_per_epoch, 5))
        ]:
            times = 1
            # (0, 0) can never match because the epoch loop starts at 1.
            for e in range(1, 10):
                for ite in range(iter_per_epoch):
                    self._set_epoch_ite(e, ite, 10)
                    if (e, ite) in [(0, 0), (2, 0), (4, 0), (6, 0), (8, 0)]:
                        self.assertTrue(
                            manager.is_prune_time(e * iter_per_epoch + ite))
                        times += 1
                        self.assertEqual(
                            manager.prune_at(e * iter_per_epoch + ite)['a'],
                            origin - (origin - target) * times / 5)
                    else:
                        self.assertFalse(
                            manager.is_prune_time(e * iter_per_epoch + ite))

    def test_iterative_prune_int(self):
        # NOTE(review): despite the `_int` name this uses the ratio-mode
        # mutator config; MUTATOR_CONFIG_NUM is defined but unused here.

        data = self.fake_cifar_data()

        model = MODELS.build(MODEL_CFG)
        mutator = MODELS.build(MUTATOR_CONFIG_FLOAT)
        mutator.prepare_from_supernet(model)
        mutator.set_choices(mutator.sample_choices())
        prune_target = mutator.choice_template

        iter_per_epoch = 10
        epoch = 10
        epoch_step = 2
        times = 5

        algorithm = DCFF(
            MODEL_CFG,
            target_pruning_ratio=prune_target,
            mutator_cfg=MUTATOR_CONFIG_FLOAT,
            step_freq=epoch_step).to(DEVICE)

        for e in range(epoch):
            for ite in range(10):
                self._set_epoch_ite(e, ite, epoch)

                algorithm.forward(
                    data['inputs'], data['data_samples'], mode='loss')
                # prune_times is expected to stay at 5 for every iteration
                # — presumably fixed at construction; TODO confirm.
                self.assertEqual(times, algorithm.prune_times)
                self.assertEqual(epoch_step * iter_per_epoch,
                                 algorithm.step_freq)

        current_choices = algorithm.mutator.current_choices
        target_pruning_ratio = algorithm.set_target_pruning_ratio(
            prune_target, mutator.mutable_units)
        for key in current_choices:
            self.assertAlmostEqual(
                current_choices[key], target_pruning_ratio[key], delta=0.1)

    def test_load_pretrained(self):
        """DCFF built with a `Pretrained` init_cfg loads the checkpoint and
        converts the epoch-based step_freq into iterations."""
        iter_per_epoch = 10
        epoch_step = 20
        data = self.fake_cifar_data()

        # prepare checkpoint
        model_cfg = copy.deepcopy(MODEL_CFG)
        model: BaseModel = MODELS.build(model_cfg)
        checkpoint_path = os.path.dirname(__file__) + '/checkpoint'
        torch.save(model.state_dict(), checkpoint_path)

        # build algorithm
        model_cfg['init_cfg'] = {
            'type': 'Pretrained',
            'checkpoint': checkpoint_path
        }
        algorithm = DCFF(
            model_cfg,
            mutator_cfg=MUTATOR_CONFIG_FLOAT,
            target_pruning_ratio=None,
            step_freq=epoch_step).to(DEVICE)
        algorithm.init_weights()
        self._set_epoch_ite(10, 5, 200)
        algorithm.forward(data['inputs'], data['data_samples'], mode='loss')
        self.assertEqual(algorithm.step_freq, epoch_step * iter_per_epoch)

        # delete checkpoint
        os.remove(checkpoint_path)

    def test_group_target_ratio(self):
        """Overriding per-unit targets inside the sampled template still
        yields a runnable algorithm with iteration-based step_freq."""

        model = MODELS.build(MODEL_CFG)
        mutator = MODELS.build(MUTATOR_CONFIG_FLOAT)
        mutator.prepare_from_supernet(model)
        mutator.set_choices(mutator.sample_choices())
        prune_target = mutator.choice_template

        iter_per_epoch = 10
        epoch_step = 2
        epoch = 6
        data = self.fake_cifar_data()

        prune_target['backbone.layer1.0.conv1_(0, 64)_64'] = 0.1
        prune_target['backbone.layer1.1.conv1_(0, 64)_64'] = 0.1

        algorithm = DCFF(
            MODEL_CFG,
            target_pruning_ratio=prune_target,
            mutator_cfg=MUTATOR_CONFIG_FLOAT,
            step_freq=epoch_step).to(DEVICE)

        algorithm.init_weights()
        self._set_epoch_ite(1, 2, epoch)
        algorithm.forward(data['inputs'], data['data_samples'], mode='loss')
        self.assertEqual(algorithm.step_freq, epoch_step * iter_per_epoch)

    def test_export_subnet(self):
        """Prune with an explicit per-layer target, then export the fixed
        subnet config (JSON) and sliced weights (pth)."""

        model = MODELS.build(MODEL_CFG)
        mutator = MODELS.build(MUTATOR_CONFIG_FLOAT)
        mutator.prepare_from_supernet(model)
        mutator.set_choices(mutator.sample_choices())

        iter_per_epoch = 10
        epoch_step = 2
        epoch = 6
        data = self.fake_cifar_data()

        stage_ratio_1 = 0.65
        stage_ratio_2 = 0.6
        stage_ratio_3 = 0.9
        stage_ratio_4 = 0.7

        target_pruning_ratio = {
            'backbone.layer1.0.conv1_(0, 64)_64': stage_ratio_1,
            'backbone.layer1.0.conv2_(0, 64)_64': stage_ratio_2,
            'backbone.layer1.0.conv3_(0, 256)_256': stage_ratio_3,
            'backbone.layer1.1.conv1_(0, 64)_64': stage_ratio_1,
            'backbone.layer1.1.conv2_(0, 64)_64': stage_ratio_2,
            'backbone.layer1.2.conv1_(0, 64)_64': stage_ratio_1,
            'backbone.layer1.2.conv2_(0, 64)_64': stage_ratio_2,
            # block 1 [0.65, 0.6] downsample=[0.9]
            'backbone.layer2.0.conv1_(0, 128)_128': stage_ratio_1,
            'backbone.layer2.0.conv2_(0, 128)_128': stage_ratio_2,
            'backbone.layer2.0.conv3_(0, 512)_512': stage_ratio_3,
            'backbone.layer2.1.conv1_(0, 128)_128': stage_ratio_1,
            'backbone.layer2.1.conv2_(0, 128)_128': stage_ratio_2,
            'backbone.layer2.2.conv1_(0, 128)_128': stage_ratio_1,
            'backbone.layer2.2.conv2_(0, 128)_128': stage_ratio_2,
            'backbone.layer2.3.conv1_(0, 128)_128': stage_ratio_1,
            'backbone.layer2.3.conv2_(0, 128)_128': stage_ratio_2,
            # block 2 [0.65, 0.6] downsample=[0.9]
            'backbone.layer3.0.conv1_(0, 256)_256': stage_ratio_1,
            'backbone.layer3.0.conv2_(0, 256)_256': stage_ratio_2,
            'backbone.layer3.0.conv3_(0, 1024)_1024': stage_ratio_3,
            'backbone.layer3.1.conv1_(0, 256)_256': stage_ratio_1,
            'backbone.layer3.1.conv2_(0, 256)_256': stage_ratio_2,
            'backbone.layer3.2.conv1_(0, 256)_256': stage_ratio_1,
            'backbone.layer3.2.conv2_(0, 256)_256': stage_ratio_2,
            'backbone.layer3.3.conv1_(0, 256)_256': stage_ratio_4,
            'backbone.layer3.3.conv2_(0, 256)_256': stage_ratio_4,
            'backbone.layer3.4.conv1_(0, 256)_256': stage_ratio_4,
            'backbone.layer3.4.conv2_(0, 256)_256': stage_ratio_4,
            'backbone.layer3.5.conv1_(0, 256)_256': stage_ratio_4,
            'backbone.layer3.5.conv2_(0, 256)_256': stage_ratio_4,
            # block 3 [0.65, 0.6]*2+[0.7, 0.7]*2 downsample=[0.9]
            'backbone.layer4.0.conv1_(0, 512)_512': stage_ratio_4,
            'backbone.layer4.0.conv2_(0, 512)_512': stage_ratio_4,
            'backbone.layer4.0.conv3_(0, 2048)_2048': stage_ratio_3,
            'backbone.layer4.1.conv1_(0, 512)_512': stage_ratio_4,
            'backbone.layer4.1.conv2_(0, 512)_512': stage_ratio_4,
            'backbone.layer4.2.conv1_(0, 512)_512': stage_ratio_4,
            'backbone.layer4.2.conv2_(0, 512)_512': stage_ratio_4
            # block 4 [0.7, 0.7] downsample=[0.9]
        }

        algorithm = DCFF(
            MODEL_CFG,
            target_pruning_ratio=target_pruning_ratio,
            mutator_cfg=MUTATOR_CONFIG_FLOAT,
            step_freq=epoch_step).to(DEVICE)

        algorithm.init_weights()
        self._set_epoch_ite(0, 0, epoch)
        algorithm.forward(data['inputs'], data['data_samples'], mode='loss')
        self.assertEqual(algorithm.step_freq, epoch_step * iter_per_epoch)

        fix_subnet, static_model = export_fix_subnet(
            algorithm, export_subnet_mode='mutator', slice_weight=True)
        fix_subnet = json.dumps(fix_subnet, indent=4, separators=(',', ':'))
        subnet_name = 'subnet.json'
        weight_name = 'subnet_weight.pth'
        # NOTE(review): paths are relative to the CWD — assumes tests run
        # from the repository root; verify in CI.
        with open(osp.join('tests/data/test_registry/', subnet_name),
                  'w') as file:
            file.write(fix_subnet)
        torch.save({
            'state_dict': static_model.state_dict(),
            'meta': {}
        }, osp.join('tests/data/test_registry/', weight_name))
# Copyright (c) OpenMMLab. All rights reserved.
import copy
import os
from typing import Dict, Union
from unittest import TestCase

import pytest
import torch
import torch.distributed as dist
from mmcls.structures import ClsDataSample
from mmengine import MessageHub
from mmengine.optim.optimizer import OptimWrapper, OptimWrapperDict
from torch.optim import SGD

from mmrazor.models.algorithms import DMCP, DMCPDDP
from mmrazor.models.mutators import DMCPChannelMutator
from mmrazor.registry import MODELS

MUTATOR_TYPE = Union[torch.nn.Module, Dict]
DISTILLER_TYPE = Union[torch.nn.Module, Dict]

MUTATOR_CFG = dict(
    type='mmrazor.DMCPChannelMutator',
    channel_unit_cfg={'type': 'DMCPChannelUnit'},
    parse_cfg=dict(
        type='ChannelAnalyzer',
        demo_input=(1, 3, 224, 224),
        tracer_type='BackwardTracer'),
)

DISTILLER_CFG = dict(
    _scope_='mmrazor',
    type='ConfigurableDistiller',
    teacher_recorders=dict(fc=dict(type='ModuleOutputs', source='head.fc')),
    student_recorders=dict(fc=dict(type='ModuleOutputs', source='head.fc')),
    distill_losses=dict(
        loss_kl=dict(type='KLDivergence', tau=1, loss_weight=1)),
    loss_forward_mappings=dict(
        loss_kl=dict(
            preds_S=dict(recorder='fc', from_student=True),
            preds_T=dict(recorder='fc', from_student=False))))

ALGORITHM_CFG = dict(
    type='mmrazor.DMCP',
    architecture=dict(
        cfg_path='mmcls::resnet/resnet50_8xb32_in1k.py', pretrained=False),
    mutator_cfg=MUTATOR_CFG,
    distiller=DISTILLER_CFG,
    strategy=['max', 'min', 'scheduled_random', 'arch_random'],
    arch_start_train=10,
    distillation_times=10,
    arch_train_freq=10)


class TestDMCP(TestCase):
    """Tests for the DMCP pruning algorithm."""

    def _prepare_fake_data(self) -> Dict:
        """Random ImageNet-sized batch of 16 samples on ``self.device``."""
        imgs = torch.randn(16, 3, 224, 224).to(self.device)
        data_samples = [
            ClsDataSample().set_gt_label(torch.randint(0, 1000,
                                                       (16, ))).to(self.device)
        ]

        return {'inputs': imgs, 'data_samples': data_samples}

    def test_init(self):
        self.device = 'cuda' if torch.cuda.is_available() else 'cpu'

        ALGORITHM_CFG_SUPERNET = copy.deepcopy(ALGORITHM_CFG)
        # initiate dmcp with built `algorithm`.
        dmcp_algo = MODELS.build(ALGORITHM_CFG_SUPERNET)
        self.assertIsInstance(dmcp_algo, DMCP)
        # dmcp mutators include channel_mutator and value_mutator
        assert isinstance(dmcp_algo.mutator, DMCPChannelMutator)

        ALGORITHM_CFG_SUPERNET.pop('type')
        fake_distiller = 'distiller'
        # initiate dmcp without `distiller`.
        with self.assertRaisesRegex(
                TypeError, 'distiller should be a `dict` or '
                '`ConfigurableDistiller` instance, but got '
                f'{type(fake_distiller)}'):
            ALGORITHM_CFG_SUPERNET['distiller'] = fake_distiller
            _ = DMCP(**ALGORITHM_CFG_SUPERNET)

        # initiate dmcp without any `mutator`.
        ALGORITHM_CFG_SUPERNET['mutator_cfg'] = None
        with self.assertRaisesRegex(
                AttributeError, "'NoneType' object has no attribute 'get'"):
            _ = DMCP(**ALGORITHM_CFG_SUPERNET)

    def test_loss(self):
        # supernet
        inputs = torch.randn(1, 3, 224, 224)
        dmcp = MODELS.build(ALGORITHM_CFG)
        loss = dmcp(inputs, mode='tensor')
        assert loss.size(1) == 1000

    def test_dmcp_train_step(self):
        # supernet
        self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
        inputs = self._prepare_fake_data()
        dmcp = MODELS.build(ALGORITHM_CFG)
        optim_wrapper_dict = OptimWrapperDict(
            architecture=OptimWrapper(SGD(dmcp.parameters(), lr=0.1)),
            mutator=OptimWrapper(SGD(dmcp.parameters(), lr=0.01)))

        message_hub = MessageHub.get_current_instance()

        # Past `arch_start_train` (iter 20 > 10): architecture training and
        # distillation are active, so all subnet/arch/flops losses appear.
        message_hub.update_info('iter', 20)
        dmcp.cur_sample_prob = -1

        losses = dmcp.train_step(inputs, optim_wrapper_dict)

        assert len(losses) == 9
        assert losses['max_subnet1.loss'] > 0
        assert losses['min_subnet1.loss'] > 0
        assert losses['min_subnet1.loss_kl'] + 1e-5 > 0
        assert losses['direct_subnet1.loss'] > 0
        assert losses['direct_subnet1.loss_kl'] + 1e-5 > 0
        assert losses['direct_subnet2.loss'] > 0
        assert losses['direct_subnet2.loss_kl'] + 1e-5 > 0
        assert losses['arch.loss'] > 0
        assert losses['flops.loss'] > 0

        # Before arch training starts: only the four sampled-subnet losses.
        message_hub.update_info('iter', 0)
        dmcp.arch_train = False
        losses = dmcp.train_step(inputs, optim_wrapper_dict)

        assert len(losses) == 4
        assert losses['max_subnet1.loss'] > 0
        assert losses['min_subnet1.loss'] > 0
        assert losses['random_subnet1.loss'] > 0
        assert losses['random_subnet2.loss'] > 0

    def test_dmcp_compute_flops_loss(self):
        """Every supported flops-loss type evaluates without error."""
        dmcp = MODELS.build(ALGORITHM_CFG)
        # `loss_type` instead of shadowing the builtin `type`.
        for loss_type in ['l2', 'inverted_log_l1', 'log_l1', 'l1']:
            dmcp.flops_loss_type = loss_type
            fake_flops = torch.tensor(100)
            dmcp._compute_flops_loss(expected_flops=fake_flops)


class TestDMCPDDP(TestDMCP):
    """Run the DMCP checks through a ``DMCPDDP`` wrapper on a single-process
    process group (gloo on CPU, nccl on GPU)."""

    @classmethod
    def setUpClass(cls) -> None:
        os.environ['MASTER_ADDR'] = 'localhost'
        os.environ['MASTER_PORT'] = '12345'

        # initialize the process group
        backend = 'nccl' if torch.cuda.is_available() else 'gloo'
        dist.init_process_group(backend, rank=0, world_size=1)

    def prepare_model(self, device_ids=None) -> DMCPDDP:
        self.device = 'cuda' if torch.cuda.is_available() else 'cpu'

        dmcp_algo = MODELS.build(ALGORITHM_CFG).to(self.device)
        self.assertIsInstance(dmcp_algo, DMCP)

        return DMCPDDP(
            module=dmcp_algo,
            find_unused_parameters=True,
            device_ids=device_ids)

    @classmethod
    def tearDownClass(cls) -> None:
        dist.destroy_process_group()

    @pytest.mark.skipif(
        not torch.cuda.is_available(), reason='cuda device is not available')
    def test_init(self) -> None:
        ddp_model = self.prepare_model()
        self.assertIsInstance(ddp_model, DMCPDDP)

    def test_dmcpddp_train_step(self) -> None:
        ddp_model = self.prepare_model()
        data = self._prepare_fake_data()
        optim_wrapper_dict = OptimWrapperDict(
            architecture=OptimWrapper(SGD(ddp_model.parameters(), lr=0.1)),
            mutator=OptimWrapper(SGD(ddp_model.parameters(), lr=0.01)))

        message_hub = MessageHub.get_current_instance()

        message_hub.update_info('iter', 20)
        ddp_model.module.cur_sample_prob = -1
        loss = ddp_model.train_step(data, optim_wrapper_dict)

        message_hub.update_info('iter', 0)
        ddp_model.module.arch_train = False
        loss = ddp_model.train_step(data, optim_wrapper_dict)

        self.assertIsNotNone(loss)
# Copyright (c) OpenMMLab. All rights reserved.
import os
from unittest import TestCase
from unittest.mock import patch

import pytest
import torch
import torch.distributed as dist
import torch.nn as nn
from mmcls.structures import ClsDataSample
from mmengine.model import BaseModel
from mmengine.optim import build_optim_wrapper
from mmengine.optim.optimizer import OptimWrapper, OptimWrapperDict
from torch import Tensor
from torch.optim import SGD

from mmrazor.models import DSNAS, NasMutator, OneHotMutableOP
from mmrazor.models.algorithms.nas.dsnas import DSNASDDP
from mmrazor.registry import MODELS
from mmrazor.structures import load_fix_subnet

# Expose plain torch layers under registry names usable from configs.
MODELS.register_module(name='torchConv2d', module=nn.Conv2d, force=True)
MODELS.register_module(name='torchMaxPool2d', module=nn.MaxPool2d, force=True)
MODELS.register_module(name='torchAvgPool2d', module=nn.AvgPool2d, force=True)


@MODELS.register_module()
class ToyDiffModule(BaseModel):
    """Minimal supernet with one one-hot mutable conv followed by a BN."""

    def __init__(self, data_preprocessor=None):
        super().__init__(data_preprocessor=data_preprocessor, init_cfg=None)
        self.candidates = dict(
            torch_conv2d_3x3=dict(
                type='torchConv2d',
                kernel_size=3,
                padding=1,
            ),
            torch_conv2d_5x5=dict(
                type='torchConv2d',
                kernel_size=5,
                padding=2,
            ),
            torch_conv2d_7x7=dict(
                type='torchConv2d',
                kernel_size=7,
                padding=3,
            ),
        )
        module_kwargs = dict(in_channels=3, out_channels=8, stride=1)

        self.mutable = OneHotMutableOP(
            candidates=self.candidates, module_kwargs=module_kwargs)
        self.bn = nn.BatchNorm2d(8)

    def forward(self, batch_inputs, data_samples=None, mode='tensor'):
        # The +1/+2 offsets only distinguish the three modes in tests.
        if mode == 'loss':
            out = self.bn(self.mutable(batch_inputs))
            return dict(loss=out)
        elif mode == 'predict':
            out = self.bn(self.mutable(batch_inputs)) + 1
            return out
        elif mode == 'tensor':
            out = self.bn(self.mutable(batch_inputs)) + 2
            return out


class TestDsnas(TestCase):
    """Tests for the DSNAS algorithm."""

    def setUp(self) -> None:
        self.device: str = 'cpu'

        OPTIMIZER_CFG = dict(
            type='SGD',
            lr=0.5,
            momentum=0.9,
            nesterov=True,
            weight_decay=0.0001)

        self.OPTIM_WRAPPER_CFG = dict(optimizer=OPTIMIZER_CFG)

    def test_init(self) -> None:
        # initiate dsnas when `norm_training` is True.
        model = ToyDiffModule()
        mutator = NasMutator()
        algo = DSNAS(architecture=model, mutator=mutator, norm_training=True)
        algo.eval()
        self.assertTrue(model.bn.training)

        # initiate Dsnas with built mutator
        model = ToyDiffModule()
        mutator = NasMutator()
        algo = DSNAS(model, mutator)
        self.assertIs(algo.mutator, mutator)

        # initiate Dsnas with unbuilt mutator
        mutator = dict(type='NasMutator')
        algo = DSNAS(model, mutator)
        self.assertIsInstance(algo.mutator, NasMutator)

        # test load fix_subnet
        fix_subnet = {'mutable': {'chosen': 'torch_conv2d_5x5'}}
        load_fix_subnet(model, fix_subnet)
        algo = DSNAS(model, mutator)
        self.assertEqual(algo.architecture.mutable.num_choices, 1)

        # initiate Dsnas with error type `mutator`
        with self.assertRaisesRegex(TypeError, 'mutator should be'):
            DSNAS(model, model)

    def test_forward_loss(self) -> None:
        inputs = torch.randn(1, 3, 8, 8)
        model = ToyDiffModule()

        # supernet
        mutator = NasMutator()
        mutator.prepare_from_supernet(model)
        algo = DSNAS(model, mutator)
        loss = algo(inputs, mode='loss')
        self.assertIsInstance(loss, dict)

        # subnet
        fix_subnet = {'mutable': {'chosen': 'torch_conv2d_5x5'}}
        load_fix_subnet(model, fix_subnet)
        loss = model(inputs, mode='loss')
        self.assertIsInstance(loss, dict)

    def _prepare_fake_data(self):
        """Random ImageNet-sized batch of 16 samples on ``self.device``."""
        imgs = torch.randn(16, 3, 224, 224).to(self.device)
        data_samples = [
            ClsDataSample().set_gt_label(torch.randint(0, 1000,
                                                       (16, ))).to(self.device)
        ]
        return {'inputs': imgs, 'data_samples': data_samples}

    def test_search_subnet(self) -> None:
        model = ToyDiffModule()

        mutator = NasMutator()
        mutator.prepare_from_supernet(model)
        algo = DSNAS(model, mutator)
        subnet = algo.mutator.sample_choices()
        self.assertIsInstance(subnet, dict)

    @patch('mmengine.logging.message_hub.MessageHub.get_info')
    def test_dsnas_train_step(self, mock_get_info) -> None:
        """train_step with a single optim wrapper and an OptimWrapperDict;
        the MessageHub iter counter is mocked to a fixed value."""
        model = ToyDiffModule()
        mutator = NasMutator()
        mutator.prepare_from_supernet(model)
        mock_get_info.return_value = 2

        algo = DSNAS(model, mutator)
        data = self._prepare_fake_data()
        optim_wrapper = build_optim_wrapper(algo, self.OPTIM_WRAPPER_CFG)
        loss = algo.train_step(data, optim_wrapper)

        self.assertTrue(isinstance(loss['loss'], Tensor))

        algo = DSNAS(model, mutator)
        optim_wrapper_dict = OptimWrapperDict(
            architecture=OptimWrapper(SGD(model.parameters(), lr=0.1)),
            mutator=OptimWrapper(SGD(model.parameters(), lr=0.01)))
        loss = algo.train_step(data, optim_wrapper_dict)

        self.assertIsNotNone(loss)


class TestDsnasDDP(TestDsnas):
    """Run the DSNAS checks through a ``DSNASDDP`` wrapper on a
    single-process process group (gloo on CPU, nccl on GPU)."""

    @classmethod
    def setUpClass(cls) -> None:
        os.environ['MASTER_ADDR'] = 'localhost'
        os.environ['MASTER_PORT'] = '12345'

        # initialize the process group
        backend = 'nccl' if torch.cuda.is_available() else 'gloo'
        dist.init_process_group(backend, rank=0, world_size=1)

    def prepare_model(self, device_ids=None) -> DSNASDDP:
        """Build a supernet + mutator and wrap it in ``DSNASDDP``.

        Note: the return annotation is ``DSNASDDP`` (the wrapper that is
        actually returned), not the inner ``DSNAS`` algorithm.
        """
        self.device = 'cuda' if torch.cuda.is_available() else 'cpu'

        model = ToyDiffModule()
        mutator = NasMutator()
        mutator.prepare_from_supernet(model)

        algo = DSNAS(model, mutator).to(self.device)

        return DSNASDDP(
            module=algo, find_unused_parameters=True, device_ids=device_ids)

    @classmethod
    def tearDownClass(cls) -> None:
        dist.destroy_process_group()

    @pytest.mark.skipif(
        not torch.cuda.is_available(), reason='cuda device is not available')
    def test_init(self) -> None:
        ddp_model = self.prepare_model()
        self.assertIsInstance(ddp_model, DSNASDDP)

    @patch('mmengine.logging.message_hub.MessageHub.get_info')
    def test_dsnasddp_train_step(self, mock_get_info) -> None:
        ddp_model = self.prepare_model()
        mock_get_info.return_value = 2

        data = self._prepare_fake_data()
        optim_wrapper = build_optim_wrapper(ddp_model, self.OPTIM_WRAPPER_CFG)
        loss = ddp_model.train_step(data, optim_wrapper)

        self.assertIsNotNone(loss)

        ddp_model = self.prepare_model()
        optim_wrapper_dict = OptimWrapperDict(
            architecture=OptimWrapper(SGD(ddp_model.parameters(), lr=0.1)),
            mutator=OptimWrapper(SGD(ddp_model.parameters(), lr=0.01)))
        loss = ddp_model.train_step(data, optim_wrapper_dict)

        self.assertIsNotNone(loss)
# Copyright (c) OpenMMLab. All rights reserved.
from unittest import TestCase

import torch.nn as nn


class ToyModel(nn.Module):
    # Placeholder model for the quantization tests below.

    def __init__(self) -> None:
        super().__init__()
        # TODO


class TestGeneralQuant(TestCase):
    """Skeleton test-suite for general quantization.

    All cases are placeholders awaiting implementation (prepare/convert/
    state handling/forward of a quantized model).
    """

    def test_init(self):
        pass

    def test_prepare(self):
        pass

    def test_convert(self):
        pass

    def test_states(self):
        pass

    def test_forward(self):
        pass
# Copyright (c) OpenMMLab. All rights reserved.
import copy
import os
import shutil
import tempfile
from unittest import TestCase, skipIf

import torch
import torch.nn as nn

# torch.fx is only available from recent PyTorch; fall back to a placeholder
# that raises a helpful error when actually used.
try:
    from torch.fx import GraphModule
except ImportError:
    from mmrazor.utils import get_placeholder
    GraphModule = get_placeholder('torch>=1.13')

from mmengine import ConfigDict
from mmengine.model import BaseModel

# mmdeploy is an optional dependency; substitute a placeholder if missing.
try:
    import mmdeploy
except ImportError:
    from mmrazor.utils import get_package_placeholder
    mmdeploy = get_package_placeholder('mmdeploy')

from mmrazor import digit_version
from mmrazor.models.algorithms import MMArchitectureQuant
from mmrazor.registry import MODELS


class BasicBlock(nn.Module):
    # Residual block with two 1x1 convs; channels are preserved so the
    # skip connection can be added without a downsample branch.

    def __init__(self, in_channels, out_channels):
        super(BasicBlock, self).__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.mid_channels = out_channels

        self.norm1 = nn.BatchNorm2d(self.mid_channels)
        self.norm2 = nn.BatchNorm2d(out_channels)
        self.conv1 = nn.Conv2d(in_channels, self.mid_channels, 1)
        self.conv2 = nn.Conv2d(self.mid_channels, out_channels, 1)

        self.relu = nn.ReLU6()
        self.drop_path = nn.Identity()

    def forward(self, x):

        def _inner_forward(x):
            identity = x

            out = self.conv1(x)
            out = self.norm1(out)
            out = self.relu(out)

            out = 
self.conv2(out) + out = self.norm2(out) + + out = self.drop_path(out) + + out += identity + + return out + + out = _inner_forward(x) + + out = self.relu(out) + + return out + + +class ToyModel(nn.Module): + + def __init__(self): + super(ToyModel, self).__init__() + self.stem_layer = nn.Sequential( + nn.Conv2d(3, 3, 1), nn.BatchNorm2d(3), nn.ReLU()) + self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) + self.block = BasicBlock(3, 3) + self.block2 = BasicBlock(3, 3) + self.gap = nn.AdaptiveAvgPool2d((1, 1)) + self.fc = nn.Linear(3, 4) + + def forward(self, x): + x = self.stem_layer(x) + x = self.maxpool(x) + x = self.block(x) + x = self.block2(x) + x = self.gap(x) + x = x.flatten(1) + x = self.fc(x) + return x + + +class ToyQuantModel(BaseModel): + + def __init__(self): + super().__init__() + self.architecture = ToyModel() + + def loss(self, outputs, data_samples): + return dict(loss=outputs.sum() - data_samples.sum()) + + def forward(self, inputs, data_samples, mode: str = 'tensor'): + if isinstance(inputs, list): + inputs = torch.stack(inputs) + outputs = self.architecture(inputs) + + return outputs + + +DEPLOY_CFG = ConfigDict( + onnx_config=dict( + type='onnx', + export_params=True, + keep_initializers_as_inputs=False, + opset_version=11, + save_file='end2end.onnx', + input_names=['input'], + output_names=['output'], + input_shape=None, + optimize=True, + dynamic_axes={ + 'input': { + 0: 'batch', + 2: 'height', + 3: 'width' + }, + 'output': { + 0: 'batch' + } + }), + backend_config=dict( + type='openvino', + model_inputs=[dict(opt_shapes=dict(input=[1, 3, 224, 224]))]), + codebase_config=dict(type='mmcls', task='Classification'), + function_record_to_pop=[ + 'mmcls.models.classifiers.ImageClassifier.forward', + 'mmcls.models.classifiers.BaseClassifier.forward' + ], +) + + +@skipIf( + digit_version(torch.__version__) < digit_version('1.13.0'), + 'PyTorch version lower than 1.13.0 is not supported.') +class TestMMArchitectureQuant(TestCase): + + def 
setUp(self): + + MODELS.register_module(module=ToyQuantModel, force=True) + + self.temp_dir = tempfile.mkdtemp() + filename = 'fp_model.pth' + filename = os.path.join(self.temp_dir, filename) + toymodel = ToyQuantModel() + torch.save(toymodel.state_dict(), filename) + + global_qconfig = ConfigDict( + w_observer=dict(type='mmrazor.PerChannelMinMaxObserver'), + a_observer=dict(type='mmrazor.MovingAverageMinMaxObserver'), + w_fake_quant=dict(type='mmrazor.FakeQuantize'), + a_fake_quant=dict(type='mmrazor.FakeQuantize'), + w_qscheme=dict( + qdtype='qint8', + bit=8, + is_symmetry=True, + is_symmetric_range=True), + a_qscheme=dict( + qdtype='quint8', + bit=8, + is_symmetry=True, + averaging_constant=0.1), + ) + alg_kwargs = ConfigDict( + type='mmrazor.MMArchitectureQuant', + architecture=dict(type='ToyQuantModel'), + float_checkpoint=filename, + quantizer=dict( + type='mmrazor.OpenVINOQuantizer', + global_qconfig=global_qconfig, + tracer=dict(type='mmrazor.CustomTracer'))) + self.alg_kwargs = alg_kwargs + + def tearDown(self): + MODELS.module_dict.pop('ToyQuantModel') + shutil.rmtree(self.temp_dir) + + def test_init(self): + self.toy_model = MODELS.build(self.alg_kwargs) + assert isinstance(self.toy_model, MMArchitectureQuant) + assert hasattr(self.toy_model, 'quantizer') + + alg_kwargs = copy.deepcopy(self.alg_kwargs) + alg_kwargs.deploy_cfg = DEPLOY_CFG + assert isinstance(self.toy_model, MMArchitectureQuant) + assert hasattr(self.toy_model, 'quantizer') + + def test_sync_qparams(self): + self.toy_model = MODELS.build(self.alg_kwargs) + mode = self.toy_model.forward_modes[0] + self.toy_model.sync_qparams(mode) + w_loss = self.toy_model.qmodels[ + 'loss'].architecture.block.conv1.state_dict()['weight'] + w_tensor = self.toy_model.qmodels[ + 'tensor'].architecture.block.conv1.state_dict()['weight'] + w_pred = self.toy_model.qmodels[ + 'predict'].architecture.block.conv1.state_dict()['weight'] + assert w_loss.equal(w_pred) + assert w_loss.equal(w_tensor) + + def 
test_build_qmodels(self): + self.toy_model = MODELS.build(self.alg_kwargs) + for forward_modes in self.toy_model.forward_modes: + qmodels = self.toy_model.qmodels[forward_modes] + assert isinstance(qmodels, GraphModule) + + def test_get_deploy_model(self): + self.toy_model = MODELS.build(self.alg_kwargs) + deploy_model = self.toy_model.get_deploy_model() + self.assertIsInstance(deploy_model, torch.fx.graph_module.GraphModule) + + def test_calibrate_step(self): + # TODO + pass diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_algorithms/test_ofd_algo.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_algorithms/test_ofd_algo.py new file mode 100644 index 0000000000000000000000000000000000000000..4b7442d6863c640256ba604dc8200bb46b41fdb3 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_algorithms/test_ofd_algo.py @@ -0,0 +1,46 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy +from unittest import TestCase + +from mmengine import ConfigDict + +from mmrazor.models import OverhaulFeatureDistillation +from .toy_models import ToyOFDStudent + + +class TestSingleTeacherDistill(TestCase): + + def test_init(self): + + recorders_cfg = ConfigDict(bn=dict(type='ModuleOutputs', source='bn')) + + alg_kwargs = ConfigDict( + architecture=dict(type='ToyOFDStudent'), + teacher=dict(type='ToyOFDTeacher'), + distiller=dict( + type='OFDDistiller', + student_recorders=recorders_cfg, + teacher_recorders=recorders_cfg, + distill_losses=dict(loss_toy=dict(type='OFDLoss')), + connectors=dict(loss_1_tfeat=dict(type='OFDTeacherConnector')), + loss_forward_mappings=dict( + loss_toy=dict( + s_feature=dict(from_student=True, recorder='bn'), + t_feature=dict( + from_student=False, + recorder='bn', + connector='loss_1_tfeat'))))) + + alg = OverhaulFeatureDistillation(**alg_kwargs) + + teacher = ToyOFDStudent() + alg_kwargs_ = copy.deepcopy(alg_kwargs) + alg_kwargs_['teacher'] = teacher + alg = 
OverhaulFeatureDistillation(**alg_kwargs_) + self.assertEquals(alg.teacher, teacher) + + alg_kwargs_ = copy.deepcopy(alg_kwargs) + alg_kwargs_['teacher'] = 'teacher' + with self.assertRaisesRegex(TypeError, + 'teacher should be a `dict` or'): + _ = OverhaulFeatureDistillation(**alg_kwargs_) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_algorithms/test_prune_algorithm.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_algorithms/test_prune_algorithm.py new file mode 100644 index 0000000000000000000000000000000000000000..00d615815cc306cc0748cf156629709dc297b026 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_algorithms/test_prune_algorithm.py @@ -0,0 +1,264 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy +import os +import unittest + +import torch +from mmcls.structures import ClsDataSample +from mmengine import MessageHub +from mmengine.model import BaseModel + +from mmrazor.models.algorithms.pruning.ite_prune_algorithm import ( + ItePruneAlgorithm, ItePruneConfigManager) +from mmrazor.registry import MODELS +from ...utils.set_dist_env import SetDistEnv + + +# @TASK_UTILS.register_module() +class ImageClassifierPseudoLoss: + """Calculate the pseudo loss to trace the topology of a `ImageClassifier` + in MMClassification with `BackwardTracer`.""" + + def __call__(self, model) -> torch.Tensor: + pseudo_img = torch.rand(2, 3, 32, 32) + pseudo_output = model(pseudo_img) + return pseudo_output.sum() + + +MODEL_CFG = dict( + _scope_='mmcls', + type='ImageClassifier', + backbone=dict( + type='ResNet', + depth=18, + num_stages=4, + out_indices=(3, ), + style='pytorch'), + neck=dict(type='GlobalAveragePooling'), + head=dict( + type='LinearClsHead', + num_classes=1000, + in_channels=512, + loss=dict(type='CrossEntropyLoss', loss_weight=1.0), + topk=(1, 5), + )) + +MUTATOR_CONFIG_NUM = dict( + type='ChannelMutator', + channel_unit_cfg={ + 'type': 'SequentialMutableChannelUnit', + 'default_args': { + 
'choice_mode': 'number' + } + }) +MUTATOR_CONFIG_FLOAT = dict( + type='ChannelMutator', + channel_unit_cfg={ + 'type': 'SequentialMutableChannelUnit', + 'default_args': { + 'choice_mode': 'ratio' + } + }) + +if torch.cuda.is_available(): + DEVICE = torch.device('cuda:0') +else: + DEVICE = torch.device('cpu') + + +class TestItePruneAlgorithm(unittest.TestCase): + + def _set_epoch_ite(self, epoch, ite, max_epoch): + iter_per_epoch = 10 + message_hub = MessageHub.get_current_instance() + message_hub.update_info('epoch', epoch) + message_hub.update_info('max_epochs', max_epoch) + message_hub.update_info('max_iters', max_epoch * 10) + message_hub.update_info('iter', ite + iter_per_epoch * epoch) + + def fake_cifar_data(self): + imgs = torch.randn(16, 3, 32, 32).to(DEVICE) + data_samples = [ + ClsDataSample().set_gt_label(torch.randint(0, 10, + (16, ))).to(DEVICE) + ] + + return {'inputs': imgs, 'data_samples': data_samples} + + def test_ite_prune_config_manager(self): + iter_per_epoch = 10 + float_origin, float_target = 1.0, 0.5 + int_origin, int_target = 10, 5 + for origin, target, manager in [ + (float_origin, float_target, + ItePruneConfigManager({'a': float_target}, {'a': float_origin}, + 2 * iter_per_epoch, 5)), + (int_origin, int_target, + ItePruneConfigManager({'a': int_target}, {'a': int_origin}, + 2 * iter_per_epoch, 5)) + ]: + times = 1 + for e in range(1, 10): + for ite in range(iter_per_epoch): + self._set_epoch_ite(e, ite, 10) + if (e, ite) in [(0, 0), (2, 0), (4, 0), (6, 0), (8, 0)]: + self.assertTrue( + manager.is_prune_time(e * iter_per_epoch + ite)) + times += 1 + self.assertEqual( + manager.prune_at(e * iter_per_epoch + ite)['a'], + origin - (origin - target) * times / 5) + else: + self.assertFalse( + manager.is_prune_time(e * iter_per_epoch + ite)) + + def test_iterative_prune_int(self): + + data = self.fake_cifar_data() + + model = MODELS.build(MODEL_CFG) + mutator = MODELS.build(MUTATOR_CONFIG_FLOAT) + mutator.prepare_from_supernet(model) + 
prune_target = mutator.choice_template + + iter_per_epoch = 10 + epoch = 10 + epoch_step = 2 + times = 3 + + algorithm = ItePruneAlgorithm( + MODEL_CFG, + target_pruning_ratio=prune_target, + mutator_cfg=MUTATOR_CONFIG_FLOAT, + step_freq=epoch_step, + prune_times=times).to(DEVICE) + + for e in range(epoch): + for ite in range(10): + self._set_epoch_ite(e, ite, epoch) + + algorithm.forward( + data['inputs'], data['data_samples'], mode='loss') + self.assertEqual(times, algorithm.prune_times) + self.assertEqual(epoch_step * iter_per_epoch, + algorithm.step_freq) + + current_choices = algorithm.mutator.current_choices + target_pruning_ratio = algorithm.set_target_pruning_ratio( + prune_target, mutator.mutable_units) + for key in current_choices: + self.assertAlmostEqual( + current_choices[key], target_pruning_ratio[key], delta=0.1) + + def test_load_pretrained(self): + iter_per_epoch = 10 + epoch_step = 2 + times = 3 + data = self.fake_cifar_data() + + # prepare checkpoint + model_cfg = copy.deepcopy(MODEL_CFG) + model: BaseModel = MODELS.build(model_cfg) + checkpoint_path = os.path.dirname(__file__) + '/checkpoint' + torch.save(model.state_dict(), checkpoint_path) + + # build algorithm + model_cfg['init_cfg'] = { + 'type': 'Pretrained', + 'checkpoint': checkpoint_path + } + algorithm = ItePruneAlgorithm( + model_cfg, + mutator_cfg=MUTATOR_CONFIG_NUM, + target_pruning_ratio=None, + step_freq=epoch_step, + prune_times=times, + ).to(DEVICE) + algorithm.init_weights() + self._set_epoch_ite(4, 5, 6) + algorithm.forward(data['inputs'], data['data_samples'], mode='loss') + self.assertEqual(algorithm.step_freq, epoch_step * iter_per_epoch) + + # delete checkpoint + os.remove(checkpoint_path) + + def test_group_target_ratio(self): + + model = MODELS.build(MODEL_CFG) + mutator = MODELS.build(MUTATOR_CONFIG_FLOAT) + mutator.prepare_from_supernet(model) + mutator.set_choices(mutator.sample_choices()) + prune_target = mutator.choice_template + + iter_per_epoch = 10 + epoch_step = 
2 + time = 2 + epoch = 6 + data = self.fake_cifar_data() + + prune_target['backbone.layer1.0.conv1_(0, 64)_64'] = 0.1 + prune_target['backbone.layer1.1.conv1_(0, 64)_64'] = 0.1 + + algorithm = ItePruneAlgorithm( + MODEL_CFG, + target_pruning_ratio=prune_target, + mutator_cfg=MUTATOR_CONFIG_FLOAT, + step_freq=epoch_step, + prune_times=time).to(DEVICE) + + algorithm.init_weights() + self._set_epoch_ite(1, 2, epoch) + algorithm.forward(data['inputs'], data['data_samples'], mode='loss') + self.assertEqual(algorithm.step_freq, epoch_step * iter_per_epoch) + + def test_dist_init(self): + if DEVICE != torch.device('cuda:0'): + self.skipTest('not use cuda') + with SetDistEnv(DEVICE == torch.device('cuda:0')): + iter_per_epoch = 10 + epoch_step = 2 + times = 3 + data = self.fake_cifar_data() + + # prepare checkpoint + model_cfg = copy.deepcopy(MODEL_CFG) + + algorithm = ItePruneAlgorithm( + model_cfg, + mutator_cfg=MUTATOR_CONFIG_NUM, + target_pruning_ratio=None, + step_freq=epoch_step, + prune_times=times, + ).to(DEVICE) + algorithm.init_weights() + self._set_epoch_ite(4, 5, 6) + algorithm.forward( + data['inputs'], data['data_samples'], mode='loss') + self.assertEqual(algorithm.step_freq, epoch_step * iter_per_epoch) + + def test_resume(self): + algorithm: ItePruneAlgorithm = ItePruneAlgorithm( + MODEL_CFG, + mutator_cfg=MUTATOR_CONFIG_NUM, + target_pruning_ratio=None, + step_freq=1, + prune_times=1, + ).to(DEVICE) + algorithm.mutator.set_choices(algorithm.mutator.sample_choices()) + state_dict = algorithm.state_dict() + print(state_dict.keys()) + + algorithm2: ItePruneAlgorithm = ItePruneAlgorithm( + MODEL_CFG, + mutator_cfg=MUTATOR_CONFIG_NUM, + target_pruning_ratio=None, + step_freq=1, + prune_times=1, + ).to(DEVICE) + + algorithm2.load_state_dict(state_dict) + + print(algorithm.mutator.current_choices) + print(algorithm2.mutator.current_choices) + self.assertDictEqual(algorithm.mutator.current_choices, + algorithm2.mutator.current_choices) diff --git 
a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_algorithms/test_self_distill.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_algorithms/test_self_distill.py new file mode 100644 index 0000000000000000000000000000000000000000..5755d56a775ef71fcafe39361050767c1dacf518 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_algorithms/test_self_distill.py @@ -0,0 +1,57 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from unittest import TestCase + +import torch +from mmengine import ConfigDict + +from mmrazor.models import SelfDistill + + +class TestSelfDistill(TestCase): + + def test_init(self): + + student_recorders_cfg = ConfigDict( + conv=dict(type='ModuleOutputs', source='conv')) + teacher_recorders_cfg = ConfigDict( + conv=dict(type='ModuleOutputs', source='conv')) + + alg_kwargs = ConfigDict( + architecture=dict(type='ToyStudent'), + distiller=dict( + type='BYOTDistiller', + student_recorders=student_recorders_cfg, + teacher_recorders=teacher_recorders_cfg, + distill_losses=dict(loss_toy=dict(type='ToyDistillLoss')), + loss_forward_mappings=dict( + loss_toy=dict( + arg1=dict(from_student=True, recorder='conv'), + arg2=dict(from_student=False, recorder='conv'))))) + + _ = SelfDistill(**alg_kwargs) + + def test_loss(self): + + student_recorders_cfg = ConfigDict( + conv=dict(type='ModuleOutputs', source='conv')) + teacher_recorders_cfg = ConfigDict( + conv=dict(type='ModuleOutputs', source='conv')) + + alg_kwargs = ConfigDict( + architecture=dict(type='ToyStudent'), + distiller=dict( + type='BYOTDistiller', + student_recorders=student_recorders_cfg, + teacher_recorders=teacher_recorders_cfg, + distill_losses=dict(loss_toy=dict(type='ToyDistillLoss')), + loss_forward_mappings=dict( + loss_toy=dict( + arg1=dict(from_student=True, recorder='conv'), + arg2=dict(from_student=False, recorder='conv'))))) + + img = torch.randn(1, 3, 1, 1) + + alg = SelfDistill(**alg_kwargs) + losses = alg(img, mode='loss') + 
self.assertIn('distill.loss_toy', losses) + self.assertIn('student.loss', losses) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_algorithms/test_single_teacher_distill.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_algorithms/test_single_teacher_distill.py new file mode 100644 index 0000000000000000000000000000000000000000..249e4878c32ca50bd0d1a3ab6698751a4da10154 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_algorithms/test_single_teacher_distill.py @@ -0,0 +1,77 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy +from unittest import TestCase + +import torch +from mmengine import ConfigDict + +from mmrazor.models import SingleTeacherDistill +from .toy_models import ToyStudent + + +class TestSingleTeacherDistill(TestCase): + + def test_init(self): + + recorders_cfg = ConfigDict( + conv=dict(type='ModuleOutputs', source='conv')) + + alg_kwargs = ConfigDict( + architecture=dict(type='ToyStudent'), + teacher=dict(type='ToyTeacher'), + distiller=dict( + type='ConfigurableDistiller', + student_recorders=recorders_cfg, + teacher_recorders=recorders_cfg, + distill_losses=dict(loss_toy=dict(type='ToyDistillLoss')), + loss_forward_mappings=dict( + loss_toy=dict( + arg1=dict(from_student=True, recorder='conv'), + arg2=dict(from_student=False, recorder='conv'))))) + + alg = SingleTeacherDistill(**alg_kwargs) + + teacher = ToyStudent() + alg_kwargs_ = copy.deepcopy(alg_kwargs) + alg_kwargs_['teacher'] = teacher + alg = SingleTeacherDistill(**alg_kwargs_) + self.assertEquals(alg.teacher, teacher) + + alg_kwargs_ = copy.deepcopy(alg_kwargs) + alg_kwargs_['teacher'] = 'teacher' + with self.assertRaisesRegex(TypeError, + 'teacher should be a `dict` or'): + _ = SingleTeacherDistill(**alg_kwargs_) + + def test_loss(self): + + recorders_cfg = ConfigDict( + conv=dict(type='ModuleOutputs', source='conv')) + + alg_kwargs = ConfigDict( + architecture=dict(type='ToyStudent'), + 
teacher=dict(type='ToyTeacher'), + distiller=dict( + type='ConfigurableDistiller', + student_recorders=recorders_cfg, + teacher_recorders=recorders_cfg, + distill_losses=dict(loss_toy=dict(type='ToyDistillLoss')), + loss_forward_mappings=dict( + loss_toy=dict( + arg1=dict(from_student=True, recorder='conv'), + arg2=dict(from_student=False, recorder='conv'))))) + + img = torch.randn(1, 3, 1, 1) + + alg = SingleTeacherDistill(**alg_kwargs) + losses = alg(img, mode='loss') + self.assertIn('distill.loss_toy', losses) + self.assertIn('student.loss', losses) + + alg_kwargs_ = copy.deepcopy(alg_kwargs) + alg_kwargs_['teacher_trainable'] = True + alg = SingleTeacherDistill(**alg_kwargs_) + losses = alg(img, mode='loss') + self.assertIn('distill.loss_toy', losses) + self.assertIn('student.loss', losses) + self.assertIn('teacher.loss', losses) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_algorithms/test_slimmable_network.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_algorithms/test_slimmable_network.py new file mode 100644 index 0000000000000000000000000000000000000000..2402e2493c508540a3c84956239eb2c093b85583 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_algorithms/test_slimmable_network.py @@ -0,0 +1,182 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import copy +import os +from typing import Dict, List, Tuple +from unittest import TestCase +from unittest.mock import Mock + +import pytest +import torch +import torch.distributed as dist +from mmcls.structures import ClsDataSample +from mmengine.optim import build_optim_wrapper + +from mmrazor.models.algorithms import SlimmableNetwork, SlimmableNetworkDDP + +MODEL_CFG = dict( + _scope_='mmcls', + type='ImageClassifier', + backbone=dict(type='MobileNetV2', widen_factor=1.5), + neck=dict(type='GlobalAveragePooling'), + head=dict( + type='LinearClsHead', + num_classes=1000, + in_channels=1920, + loss=dict(type='CrossEntropyLoss', loss_weight=1.0), + topk=(1, 5))) +CHANNEL_CFG_PATH = 'tests/data/MBV2_slimmable_channel_config.json' + +MUTATOR_CFG = dict( + type='SlimmableChannelMutator', + channel_unit_cfg=dict(type='SlimmableChannelUnit', units=CHANNEL_CFG_PATH), + parse_cfg=dict(type='ChannelAnalyzer')) + +CHANNEL_CFG_PATHS = [ + 'tests/data/MBV2_220M.yaml', + 'tests/data/MBV2_320M.yaml', + 'tests/data/MBV2_530M.yaml', +] + +OPTIMIZER_CFG = dict( + type='SGD', lr=0.5, momentum=0.9, nesterov=True, weight_decay=0.0001) +OPTIM_WRAPPER_CFG = dict(optimizer=OPTIMIZER_CFG, accumulative_counts=3) + + +class FakeMutator: + ... 
+ + +class ToyDataPreprocessor(torch.nn.Module): + + def forward( + self, + data: Dict, + training: bool = True) -> Tuple[torch.Tensor, List[ClsDataSample]]: + return data + + +class TestSlimmable(TestCase): + device: str = 'cpu' + + def test_init(self) -> None: + + mutator_wrong_type = FakeMutator() + with pytest.raises(AttributeError): + _ = self.prepare_model(MODEL_CFG, mutator_wrong_type) + + # assert has prunable units + algo = SlimmableNetwork(MODEL_CFG, MUTATOR_CFG) + self.assertGreater(len(algo.mutator.mutable_units), 0) + + # assert can generate config template + mutator_cfg = copy.deepcopy(MUTATOR_CFG) + mutator_cfg['channel_unit_cfg']['units'] = {} + algo = SlimmableNetwork(MODEL_CFG, mutator_cfg) + try: + algo.mutator.config_template() + except Exception: + self.fail() + + def test_is_deployed(self) -> None: + slimmable_should_not_deployed = \ + SlimmableNetwork(MODEL_CFG, MUTATOR_CFG) + assert not slimmable_should_not_deployed.is_deployed + + slimmable_should_deployed = \ + SlimmableNetwork(MODEL_CFG, MUTATOR_CFG, deploy_index=0) + assert slimmable_should_deployed.is_deployed + + def test_slimmable_train_step(self) -> None: + algo = self.prepare_slimmable_model() + data = self._prepare_fake_data() + optim_wrapper_cfg = copy.deepcopy(OPTIM_WRAPPER_CFG) + optim_wrapper_cfg['accumulative_counts'] = 1 + optim_wrapper = build_optim_wrapper(algo, optim_wrapper_cfg) + fake_message_hub = Mock() + fake_message_hub.runtime_info = {'iter': 0, 'max_iters': 100} + optim_wrapper.message_hub = fake_message_hub + assert not algo._optim_wrapper_count_status_reinitialized + losses = algo.train_step(data, optim_wrapper) + + assert len(losses) == 3 + assert losses['subnet_0.loss'] > 0 + assert losses['subnet_1.loss'] > 0 + assert losses['subnet_2.loss'] > 0 + + self.assertTrue(algo._optim_wrapper_count_status_reinitialized) + self.assertEqual(optim_wrapper._inner_count, 3) + self.assertEqual(optim_wrapper._max_counts, 300) + + losses = algo.train_step(data, optim_wrapper) 
+ assert algo._optim_wrapper_count_status_reinitialized + + def test_fixed_train_step(self) -> None: + algo = self.prepare_fixed_model() + data = self._prepare_fake_data() + optim_wrapper = build_optim_wrapper(algo, OPTIM_WRAPPER_CFG) + losses = algo.train_step(data, optim_wrapper) + + assert len(losses) == 1 + assert losses['loss'] > 0 + + def _prepare_fake_data(self) -> Dict: + imgs = torch.randn(16, 3, 224, 224).to(self.device) + data_samples = [ + ClsDataSample().set_gt_label(torch.randint(0, 1000, + (16, ))).to(self.device) + ] + + return {'inputs': imgs, 'data_samples': data_samples} + + def prepare_slimmable_model(self) -> SlimmableNetwork: + return self.prepare_model(MODEL_CFG, MUTATOR_CFG) + + def prepare_fixed_model(self) -> SlimmableNetwork: + return self.prepare_model(MODEL_CFG, MUTATOR_CFG, deploy=0) + + def prepare_model(self, + model_cfg: Dict, + mutator_cfg: Dict, + deploy=-1) -> SlimmableNetwork: + model = SlimmableNetwork(model_cfg, mutator_cfg, deploy, + ToyDataPreprocessor()) + model.to(self.device) + return model + + +class TestSlimmableDDP(TestSlimmable): + + @classmethod + def setUpClass(cls) -> None: + os.environ['MASTER_ADDR'] = 'localhost' + os.environ['MASTER_PORT'] = '12355' + + # initialize the process group + if torch.cuda.is_available(): + backend = 'nccl' + cls.device = 'cuda' + else: + backend = 'gloo' + dist.init_process_group(backend, rank=0, world_size=1) + + def prepare_model(self, + model_cfg: Dict, + mutator_cfg: Dict, + deploy=-1) -> SlimmableNetwork: + model = super().prepare_model(model_cfg, mutator_cfg, deploy) + return SlimmableNetworkDDP(module=model, find_unused_parameters=True) + + def test_is_deployed(self) -> None: + ... 
+ + @pytest.mark.skipif( + not torch.cuda.is_available(), reason='cuda device is not avaliable') + def test_init(self) -> None: + model = super().prepare_slimmable_model() + ddp_model = SlimmableNetworkDDP(module=model, device_ids=[0]) + + self.assertIsInstance(ddp_model, SlimmableNetworkDDP) + + @classmethod + def tearDownClass(cls) -> None: + dist.destroy_process_group() diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_algorithms/test_spos.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_algorithms/test_spos.py new file mode 100644 index 0000000000000000000000000000000000000000..537a16438c3e70bf83b189e477129502b7d6bf0e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_algorithms/test_spos.py @@ -0,0 +1,85 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from unittest import TestCase + +import torch +import torch.nn as nn +from mmengine.model import BaseModel + +from mmrazor.models import SPOS, NasMutator, OneShotMutableOP +from mmrazor.registry import MODELS +from mmrazor.structures import load_fix_subnet + +MUTATOR_CFG = dict(type='NasMutator') + + +@MODELS.register_module() +class ToySearchableModel(BaseModel): + + def __init__(self, data_preprocessor=None): + super().__init__(data_preprocessor=data_preprocessor, init_cfg=None) + convs = nn.ModuleDict({ + 'conv1': nn.Conv2d(3, 8, 1), + 'conv2': nn.Conv2d(3, 8, 1), + 'conv3': nn.Conv2d(3, 8, 1), + }) + self.mutable = OneShotMutableOP(convs) + self.bn = nn.BatchNorm2d(8) + + def forward(self, batch_inputs, data_samples=None, mode='tensor'): + if mode == 'loss': + out = self.bn(self.mutable(batch_inputs)) + return dict(loss=out) + elif mode == 'predict': + out = self.bn(self.mutable(batch_inputs)) + 1 + return out + elif mode == 'tensor': + out = self.bn(self.mutable(batch_inputs)) + 2 + return out + + +class TestSPOS(TestCase): + + def test_init(self): + # initiate spos when `norm_training` is True. 
+ model = ToySearchableModel() + mutator = MODELS.build(MUTATOR_CFG) + alg = SPOS(model, mutator, norm_training=True) + alg.eval() + self.assertTrue(model.bn.training) + + # initiate spos with built `mutator`. + model = ToySearchableModel() + mutator = MODELS.build(MUTATOR_CFG) + alg = SPOS(model, mutator) + self.assertIs(alg.mutator, mutator) + + # initiate spos with unbuilt `mutator`. + mutator = dict(type='NasMutator') + alg = SPOS(model, mutator) + self.assertIsInstance(alg.mutator, NasMutator) + + # test load fix_subnet + fix_subnet = {'mutable': {'chosen': 'conv1'}} + load_fix_subnet(model, fix_subnet) + algo = SPOS(model, mutator) + self.assertEqual(algo.architecture.mutable.num_choices, 1) + + # initiate spos with error type `mutator`. + with self.assertRaisesRegex(TypeError, 'mutator should be'): + SPOS(model, model) + + def test_forward_loss(self): + inputs = torch.randn(1, 3, 8, 8) + model = ToySearchableModel() + + # supernet + mutator = MODELS.build(MUTATOR_CFG) + alg = SPOS(model, mutator) + loss = alg(inputs, mode='loss') + self.assertIsInstance(loss, dict) + + # subnet + fix_subnet = {'mutable': {'chosen': 'conv1'}} + load_fix_subnet(model, fix_subnet) + loss = model(inputs, mode='loss') + self.assertIsInstance(loss, dict) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_algorithms/toy_models.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_algorithms/toy_models.py new file mode 100644 index 0000000000000000000000000000000000000000..09858afe03907970598f6401b0b9b6edc2546417 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_algorithms/toy_models.py @@ -0,0 +1,94 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from dataclasses import dataclass + +import torch +from mmengine.model import BaseModel +from torch import nn + +from mmrazor.registry import MODELS + + +@MODELS.register_module() +class ToyStudent(BaseModel): + + def __init__(self, data_preprocessor=None): + super().__init__(data_preprocessor=data_preprocessor, init_cfg=None) + self.conv = nn.Conv2d(3, 1, 1) + + def forward(self, batch_inputs, data_samples=None, mode='tensor'): + if mode == 'loss': + out = self.conv(batch_inputs) + return dict(loss=out) + elif mode == 'predict': + out = self.conv(batch_inputs) + 1 + return out + elif mode == 'tensor': + out = self.conv(batch_inputs) + 2 + return out + + +@MODELS.register_module() +class ToyTeacher(ToyStudent): + + def __init__(self): + super().__init__() + + +@MODELS.register_module() +class ToyOFDStudent(BaseModel): + + def __init__(self, data_preprocessor=None): + super().__init__(data_preprocessor=data_preprocessor, init_cfg=None) + self.conv = nn.Conv2d(3, 1, 1) + self.bn = nn.BatchNorm2d(100) + + def forward(self, batch_inputs, data_samples=None, mode='tensor'): + if mode == 'loss': + out = self.bn(self.conv(batch_inputs)) + return dict(loss=out) + elif mode == 'predict': + out = self.bn(self.conv(batch_inputs) + 1) + return out + elif mode == 'tensor': + out = self.bn(self.conv(batch_inputs) + 2) + return out + + +@MODELS.register_module() +class ToyOFDTeacher(ToyOFDStudent): + + def __init__(self): + super().__init__() + + +@dataclass(frozen=True) +class Data: + latent_dim: int = 1 + + +@MODELS.register_module() +class ToyGenerator(BaseModel): + + def __init__(self, latent_dim=4, out_channel=3): + super().__init__(data_preprocessor=None, init_cfg=None) + self.latent_dim = latent_dim + self.out_channel = out_channel + self.conv = nn.Conv2d(self.latent_dim, self.out_channel, 1) + + # Imitate the structure of generator in separate model_wrapper. 
+ self.module = Data(latent_dim=self.latent_dim) + + def forward(self, data=None, batch_size=4): + fakeimg_init = torch.randn(batch_size, self.latent_dim, 1, 1) + fakeimg = self.conv(fakeimg_init) + return fakeimg + + +@MODELS.register_module() +class ToyDistillLoss(nn.Module): + + def __init__(self): + super().__init__() + + def forward(self, arg1, arg2): + return arg1 + arg2 diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_backbones/test_autoformerbackbone.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_backbones/test_autoformerbackbone.py new file mode 100644 index 0000000000000000000000000000000000000000..25217d1e83279b9208137d36b055f0ad177bceb4 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_backbones/test_autoformerbackbone.py @@ -0,0 +1,60 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from mmrazor.models.architectures.dynamic_ops import ( + DynamicLinear, DynamicMultiheadAttention, DynamicPatchEmbed, + DynamicRelativePosition2D, DynamicSequential) +from mmrazor.models.mutables import MutableChannelContainer +from mmrazor.registry import MODELS + +arch_setting = dict( + mlp_ratios=[3.0, 3.5, 4.0], + num_heads=[8, 9, 10], + depth=[14, 15, 16], + embed_dims=[528, 576, 624]) + +BACKBONE_CFG = dict( + type='mmrazor.AutoformerBackbone', + arch_setting=arch_setting, + img_size=224, + patch_size=16, + in_channels=3, + norm_cfg=dict(type='mmrazor.DynamicLayerNorm'), + act_cfg=dict(type='GELU')) + + +def test_searchable_autoformer_mutable() -> None: + backbone = MODELS.build(BACKBONE_CFG) + + num_heads = backbone.arch_setting['num_heads'] + mlp_ratios = backbone.arch_setting['mlp_ratios'] + depth = backbone.arch_setting['depth'] + embed_dims = backbone.arch_setting['embed_dims'] + embed_dims_expansion = [i * j for i in mlp_ratios for j in embed_dims] + head_expansion = [i * 64 for i in num_heads] + + for name, module in backbone.named_modules(): 
+ if isinstance(module, DynamicRelativePosition2D): + assert len(module.mutable_head_dims.current_choice) == 64 + elif isinstance(module, DynamicMultiheadAttention): + assert len( + module.mutable_embed_dims.current_choice) == max(embed_dims) + assert len(module.mutable_q_embed_dims.current_choice) == max( + head_expansion) + assert module.mutable_num_heads.choices == num_heads + elif isinstance(module, DynamicLinear): + if 'fc1' in name: + assert module.mutable_attrs['in_features'].num_channels == max( + embed_dims) + assert module.mutable_attrs[ + 'out_features'].num_channels == max(embed_dims_expansion) + elif 'fc2' in name: + assert module.mutable_attrs['in_features'].num_channels == max( + embed_dims_expansion) + assert module.mutable_attrs[ + 'out_features'].num_channels == max(embed_dims) + elif isinstance(module, DynamicPatchEmbed): + assert type(module.mutable_embed_dims) == MutableChannelContainer + assert len( + module.mutable_embed_dims.current_choice) == max(embed_dims) + elif isinstance(module, DynamicSequential): + assert module.mutable_depth.choices == depth + assert backbone.last_mutable.num_channels == max(embed_dims) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_backbones/test_dartsbackbone.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_backbones/test_dartsbackbone.py new file mode 100644 index 0000000000000000000000000000000000000000..a4ae05950f8cb9db69614eeacb6e5f53db3f0810 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_backbones/test_dartsbackbone.py @@ -0,0 +1,117 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import unittest
+from unittest import TestCase
+
+import torch
+import torch.nn as nn
+from mmcls.models import *  # noqa:F403,F401
+
+from mmrazor.models import *  # noqa:F403,F401
+from mmrazor.registry import MODELS
+
+MODELS.register_module(name='torchConv2d', module=nn.Conv2d, force=True)
+MODELS.register_module(name='torchMaxPool2d', module=nn.MaxPool2d, force=True)
+MODELS.register_module(name='torchAvgPool2d', module=nn.AvgPool2d, force=True)
+
+
+class TestDartsBackbone(TestCase):
+
+    def setUp(self) -> None:
+        self.mutable_cfg = dict(
+            type='DiffMutableOP',
+            candidates=dict(
+                torch_conv2d_3x3=dict(
+                    type='torchConv2d',
+                    kernel_size=3,
+                    padding=1,
+                ),
+                torch_conv2d_5x5=dict(
+                    type='torchConv2d',
+                    kernel_size=5,
+                    padding=2,
+                ),
+                torch_conv2d_7x7=dict(
+                    type='torchConv2d',
+                    kernel_size=7,
+                    padding=3,
+                ),
+            ))
+
+        self.route_cfg = dict(
+            type='DiffChoiceRoute',
+            with_arch_param=True,
+        )
+
+        self.backbone_cfg = dict(
+            type='mmrazor.DartsBackbone',
+            in_channels=3,
+            base_channels=16,
+            num_layers=8,
+            num_nodes=4,
+            stem_multiplier=3,
+            out_indices=(7, ),
+            mutable_cfg=self.mutable_cfg,
+            route_cfg=self.route_cfg)
+
+        self.mutator_cfg = dict(
+            type='NasMutator',
+            custom_groups=None,
+        )
+
+    def test_darts_backbone(self):
+        model = MODELS.build(self.backbone_cfg)
+        custom_group = self.generate_key(model)
+
+        assert model is not None
+        self.mutable_cfg.update(custom_groups=custom_group)  # plural key, matching the sibling test below
+        mutator = MODELS.build(self.mutator_cfg)
+        assert mutator is not None
+
+        mutator.prepare_from_supernet(model)
+        # mutator.modify_supernet_forward(mutator.arch_params)
+
+        inputs = torch.randn(4, 3, 224, 224)
+        outputs = model(inputs)
+        assert outputs is not None
+
+    def test_darts_backbone_with_auxliary(self):
+        self.backbone_cfg.update(
+            auxliary=True, aux_channels=256, aux_out_channels=512)
+        model = MODELS.build(self.backbone_cfg)
+        custom_group = self.generate_key(model)
+
+        assert model is not None
+        self.mutable_cfg.update(custom_groups=custom_group)
+        mutator = MODELS.build(self.mutator_cfg)
+        assert mutator is not None
+        mutator.prepare_from_supernet(model)
+        # mutator.modify_supernet_forward(mutator.arch_params)
+
+        inputs = torch.randn(4, 3, 224, 224)
+        outputs = model(inputs)
+        assert outputs is not None
+
+    def generate_key(self, model):
+        """auto generate custom group for darts."""
+        tmp_dict = dict()
+
+        for key, _ in model.named_modules():
+            node_type = key.split('._candidates')[0].split('.')[-1].split(
+                '_')[0]
+            if node_type not in ['normal', 'reduce']:
+                # not supported type
+                continue
+
+            node_name = key.split('._candidates')[0].split('.')[-1]
+            if node_name not in tmp_dict.keys():
+                tmp_dict[node_name] = [key.split('._candidates')[0]]
+            else:
+                current_key = key.split('._candidates')[0]
+                if current_key not in tmp_dict[node_name]:
+                    tmp_dict[node_name].append(current_key)
+
+        return list(tmp_dict.values())
+
+
+if __name__ == '__main__':
+    unittest.main()
diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_backbones/test_searchable_mobilenet_v2.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_backbones/test_searchable_mobilenet_v2.py
new file mode 100644
index 0000000000000000000000000000000000000000..e4ae70d5c29d28dd40c2125f3b88d7f4bebccd07
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_backbones/test_searchable_mobilenet_v2.py
@@ -0,0 +1,116 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+import copy +import sys + +import pytest +import torch +from mmcls.models import * # noqa: F401,F403 +from torch.nn.modules.batchnorm import _BatchNorm + +from mmrazor.models import * # noqa: F401,F403 +from mmrazor.models.mutables import * # noqa: F401,F403 +from mmrazor.registry import MODELS + +sys.path.append('tests/test_models/test_architectures/test_backbones') +from utils import MockMutable # noqa: E402 + +_FIRST_STAGE_MUTABLE = dict(type='MockMutable', choices=['c1']) +_OTHER_STAGE_MUTABLE = dict( + type='MockMutable', choices=['c1', 'c2', 'c3', 'c4']) +ARCHSETTING_CFG = [ + # Parameters to build layers. 4 parameters are needed to construct a + # layer, from left to right: channel, num_blocks, stride, mutable cfg. + [16, 1, 1, _FIRST_STAGE_MUTABLE], + [24, 2, 2, _OTHER_STAGE_MUTABLE], + [32, 3, 2, _OTHER_STAGE_MUTABLE], + [64, 4, 2, _OTHER_STAGE_MUTABLE], + [96, 3, 1, _OTHER_STAGE_MUTABLE], + [160, 3, 2, _OTHER_STAGE_MUTABLE], + [320, 1, 1, _OTHER_STAGE_MUTABLE] +] +NORM_CFG = dict(type='BN') +BACKBONE_CFG = dict( + type='mmrazor.SearchableMobileNetV2', + first_channels=32, + last_channels=1280, + widen_factor=1.0, + norm_cfg=NORM_CFG, + arch_setting=ARCHSETTING_CFG) + + +def test_searchable_mobilenet_mutable() -> None: + backbone = MODELS.build(BACKBONE_CFG) + + choices = ['c1', 'c2', 'c3', 'c4'] + mutable_nums = 0 + + for name, module in backbone.named_modules(): + if isinstance(module, MockMutable): + if 'layer1' in name: + assert module.choices == ['c1'] + else: + assert module.choices == choices + mutable_nums += 1 + + arch_setting = backbone.arch_setting + target_mutable_nums = 0 + for layer_cfg in arch_setting: + target_mutable_nums += layer_cfg[1] + assert mutable_nums == target_mutable_nums + + +def test_searchable_mobilenet_train() -> None: + backbone = MODELS.build(BACKBONE_CFG) + backbone.train(mode=True) + for m in backbone.modules(): + assert m.training + + backbone.norm_eval = True + backbone.train(mode=True) + for m in backbone.modules(): + 
if isinstance(m, _BatchNorm): + assert not m.training + else: + assert m.training + + x = torch.rand(10, 3, 224, 224) + assert len(backbone(x)) == 1 + + backbone_cfg = copy.deepcopy(BACKBONE_CFG) + backbone_cfg['frozen_stages'] = 5 + backbone = MODELS.build(backbone_cfg) + backbone.train() + + for param in backbone.conv1.parameters(): + assert not param.requires_grad + for i in range(1, 8): + layer = getattr(backbone, f'layer{i}') + for m in layer.modules(): + if i <= 5: + assert not m.training + else: + assert m.training + for param in layer.parameters(): + if i <= 5: + assert not param.requires_grad + else: + assert param.requires_grad + + +def test_searchable_mobilenet_init() -> None: + backbone_cfg = copy.deepcopy(BACKBONE_CFG) + backbone_cfg['out_indices'] = (10, ) + + with pytest.raises(ValueError): + MODELS.build(backbone_cfg) + + backbone_cfg = copy.deepcopy(BACKBONE_CFG) + backbone_cfg['frozen_stages'] = 8 + + with pytest.raises(ValueError): + MODELS.build(backbone_cfg) + + backbone_cfg = copy.deepcopy(BACKBONE_CFG) + backbone_cfg['widen_factor'] = 1.5 + backbone = MODELS.build(backbone_cfg) + assert backbone.out_channel == 1920 diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_backbones/test_searchable_mobilenet_v3.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_backbones/test_searchable_mobilenet_v3.py new file mode 100644 index 0000000000000000000000000000000000000000..558e002afc9ce7859f6d4e7126f95a6d172a1453 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_backbones/test_searchable_mobilenet_v3.py @@ -0,0 +1,124 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import copy +import sys + +import pytest +import torch +from mmcls.models import * # noqa: F401,F403 +from torch.nn.modules.batchnorm import _BatchNorm + +from mmrazor.models.architectures.dynamic_ops import (BigNasConv2d, + DynamicBatchNorm2d, + DynamicSequential) +from mmrazor.models.mutables import (MutableChannelContainer, + OneShotMutableValue) +from mmrazor.models.utils import parse_values +from mmrazor.registry import MODELS + +sys.path.append('tests/test_models/test_architectures/test_backbones') + +arch_setting = dict( + kernel_size=[ + [3, 5, 2], + [3, 5, 2], + [3, 5, 2], + [3, 5, 2], + ], + num_blocks=[ + [1, 2, 1], + [3, 6, 1], + [3, 6, 1], + [1, 2, 1], + ], + expand_ratio=[ + [1, 1, 1], + [4, 6, 1], + [4, 6, 1], + [4, 6, 1], + [4, 6, 1], + ], + num_out_channels=[ + [16, 24, 8], # first layer + [16, 24, 8], + [24, 32, 8], + [32, 40, 8], + [64, 72, 8], + [72, 72, 8], # last layer + ]) + +BACKBONE_CFG = dict( + type='mmrazor.AttentiveMobileNetV3', + arch_setting=arch_setting, + out_indices=(4, ), + conv_cfg=dict(type='mmrazor.BigNasConv2d'), + norm_cfg=dict(type='mmrazor.DynamicBatchNorm2d', momentum=0.0)) + + +def test_attentive_mobilenet_mutable() -> None: + backbone = MODELS.build(BACKBONE_CFG) + + out_channels = backbone.arch_setting['num_out_channels'] + out_channels = parse_values(out_channels) + + for module in backbone.modules(): + if isinstance(module, BigNasConv2d): + assert isinstance(module.mutable_attrs.in_channels, + MutableChannelContainer) + assert isinstance(module.mutable_attrs.out_channels, + MutableChannelContainer) + elif isinstance(module, DynamicBatchNorm2d): + assert isinstance(module.mutable_attrs.num_features, + MutableChannelContainer) + elif isinstance(module, DynamicSequential): + assert isinstance(module.mutable_depth, OneShotMutableValue) + + assert backbone.last_mutable_channels.num_channels == max(out_channels[-1]) + + +def test_attentive_mobilenet_train() -> None: + backbone = MODELS.build(BACKBONE_CFG) + 
backbone.train(mode=True) + for m in backbone.modules(): + assert m.training + + backbone.norm_eval = True + backbone.train(mode=True) + for m in backbone.modules(): + if isinstance(m, _BatchNorm): + assert not m.training + + x = torch.rand(10, 3, 224, 224) + assert len(backbone(x)) == 1 + + backbone_cfg = copy.deepcopy(BACKBONE_CFG) + backbone_cfg['frozen_stages'] = 2 + backbone = MODELS.build(backbone_cfg) + backbone.train() + + for param in backbone.first_conv.parameters(): + assert not param.requires_grad + for i, layer in enumerate(backbone.layers): + for param in layer.parameters(): + if i <= 1: + assert not param.requires_grad + else: + assert param.requires_grad + + +def test_searchable_mobilenet_init() -> None: + backbone_cfg = copy.deepcopy(BACKBONE_CFG) + backbone_cfg['out_indices'] = (10, ) + + with pytest.raises(ValueError): + MODELS.build(backbone_cfg) + + backbone_cfg = copy.deepcopy(BACKBONE_CFG) + backbone_cfg['frozen_stages'] = 8 + + with pytest.raises(ValueError): + MODELS.build(backbone_cfg) + + backbone_cfg = copy.deepcopy(BACKBONE_CFG) + backbone_cfg['widen_factor'] = 1.5 + backbone = MODELS.build(backbone_cfg) + assert backbone.out_channels == 112 diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_backbones/test_searchable_shufflenet_v2.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_backbones/test_searchable_shufflenet_v2.py new file mode 100644 index 0000000000000000000000000000000000000000..70060e4bbcca41d2a5a46d8b9ad0e5e800aff896 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_backbones/test_searchable_shufflenet_v2.py @@ -0,0 +1,160 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import copy +import os +import sys +import tempfile + +import pytest +import torch +from mmcls.models import * # noqa: F401,F403 +from torch.nn import GroupNorm +from torch.nn.modules.batchnorm import _BatchNorm + +from mmrazor.models import * # noqa: F401,F403 +from mmrazor.models.mutables import * # noqa: F401,F403 +from mmrazor.registry import MODELS + +sys.path.append('tests/test_models/test_architectures/test_backbones') +from utils import MockMutable # noqa: E402 + +STAGE_MUTABLE = dict(type='MockMutable', choices=['c1', 'c2', 'c3', 'c4']) +ARCHSETTING_CFG = [ + # Parameters to build layers. 3 parameters are needed to construct a + # layer, from left to right: channel, num_blocks, mutable_cfg. + [64, 4, STAGE_MUTABLE], + [160, 4, STAGE_MUTABLE], + [320, 8, STAGE_MUTABLE], + [640, 4, STAGE_MUTABLE], +] + +NORM_CFG = dict(type='BN') +BACKBONE_CFG = dict( + type='mmrazor.SearchableShuffleNetV2', + widen_factor=1.0, + norm_cfg=NORM_CFG, + arch_setting=ARCHSETTING_CFG) + + +def test_searchable_shufflenet_v2_mutable() -> None: + backbone = MODELS.build(BACKBONE_CFG) + + choices = ['c1', 'c2', 'c3', 'c4'] + mutable_nums = 0 + + for module in backbone.modules(): + if isinstance(module, MockMutable): + assert module.choices == choices + mutable_nums += 1 + + arch_setting = backbone.arch_setting + target_mutable_nums = 0 + for layer_cfg in arch_setting: + target_mutable_nums += layer_cfg[1] + assert mutable_nums == target_mutable_nums + + +def test_searchable_shufflenet_v2_train() -> None: + backbone = MODELS.build(BACKBONE_CFG) + backbone.train(mode=True) + for m in backbone.modules(): + assert m.training + + backbone.norm_eval = True + backbone.train(mode=True) + for m in backbone.modules(): + if isinstance(m, _BatchNorm): + assert not m.training + else: + assert m.training + + x = torch.rand(10, 3, 224, 224) + assert len(backbone(x)) == 1 + + backbone_cfg = copy.deepcopy(BACKBONE_CFG) + backbone_cfg['frozen_stages'] = 2 + backbone = MODELS.build(backbone_cfg) + 
backbone.train() + + for param in backbone.conv1.parameters(): + assert not param.requires_grad + for i in range(2): + layer = backbone.layers[i] + for m in layer.modules(): + if i < 2: + assert not m.training + else: + assert m.training + for param in layer.parameters(): + if i < 2: + assert not param.requires_grad + else: + assert param.requires_grad + + +def test_searchable_shufflenet_v2_init() -> None: + backbone_cfg = copy.deepcopy(BACKBONE_CFG) + backbone_cfg['out_indices'] = (5, ) + + with pytest.raises(ValueError): + MODELS.build(backbone_cfg) + + backbone_cfg = copy.deepcopy(BACKBONE_CFG) + backbone_cfg['frozen_stages'] = 5 + + with pytest.raises(ValueError): + MODELS.build(backbone_cfg) + + backbone_cfg = copy.deepcopy(BACKBONE_CFG) + backbone_cfg['with_last_layer'] = False + with pytest.raises(ValueError): + MODELS.build(backbone_cfg) + + backbone_cfg['out_indices'] = (3, ) + backbone = MODELS.build(backbone_cfg) + assert len(backbone.layers) == 4 + + +def test_searchable_shufflenet_v2_init_weights() -> None: + backbone = MODELS.build(BACKBONE_CFG) + backbone.init_weights() + + for m in backbone.modules(): + if isinstance(m, (_BatchNorm, GroupNorm)): + if hasattr(m, 'weight') and m.weight is not None: + assert torch.equal(m.weight, torch.ones_like(m.weight)) + if hasattr(m, 'bias') and m.bias is not None: + bias_tensor = torch.ones_like(m.bias) + bias_tensor *= 0.0001 + assert torch.equal(bias_tensor, m.bias) + + temp_dir = tempfile.mkdtemp() + checkpoint_path = os.path.join(temp_dir, 'checkpoint.pth') + backbone_cfg = copy.deepcopy(BACKBONE_CFG) + backbone = MODELS.build(backbone_cfg) + torch.save(backbone.state_dict(), checkpoint_path) + backbone_cfg['init_cfg'] = dict( + type='Pretrained', checkpoint=checkpoint_path) + backbone = MODELS.build(backbone_cfg) + + name2weight = dict() + for name, m in backbone.named_modules(): + if isinstance(m, (_BatchNorm, GroupNorm)): + if hasattr(m, 'weight') and m.weight is not None: + name2weight[name] = 
m.weight.clone() + + backbone.init_weights() + for name, m in backbone.named_modules(): + if isinstance(m, (_BatchNorm, GroupNorm)): + if hasattr(m, 'weight') and m.weight is not None: + if name in name2weight: + assert torch.equal(name2weight[name], m.weight) + + backbone_cfg = copy.deepcopy(BACKBONE_CFG) + backbone_cfg['norm_cfg'] = dict(type='BN', track_running_stats=False) + backbone = MODELS.build(backbone_cfg) + backbone.init_weights() + + backbone_cfg = copy.deepcopy(BACKBONE_CFG) + backbone_cfg['norm_cfg'] = dict(type='GN', num_groups=1) + backbone = MODELS.build(backbone_cfg) + backbone.init_weights() diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_backbones/utils.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_backbones/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..593faa4aa48c8f2cee4c4457ca7bf10391198909 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_backbones/utils.py @@ -0,0 +1,21 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from typing import Dict, List
+
+from torch import Tensor
+from torch.nn import Conv2d, Module
+
+from mmrazor.registry import MODELS
+
+
+@MODELS.register_module()
+class MockMutable(Module):
+
+    def __init__(self, choices: List[str], module_kwargs: Dict) -> None:
+        super().__init__()
+
+        self.choices = choices
+        self.module_kwargs = module_kwargs
+        self.conv = Conv2d(**module_kwargs, kernel_size=3, padding=3 // 2)
+
+    def forward(self, x: Tensor) -> Tensor:
+        return self.conv(x)
diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_connectors/test_connectors.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_connectors/test_connectors.py
new file mode 100644
index 0000000000000000000000000000000000000000..80b3f88b2a10e457fc2b2fb431c58301301a3e07
--- /dev/null
+++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_connectors/test_connectors.py
@@ -0,0 +1,169 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from unittest import TestCase + +import torch + +from mmrazor.models import (BYOTConnector, ConvModuleConnector, CRDConnector, + FBKDStudentConnector, FBKDTeacherConnector, + MGDConnector, NormConnector, Paraphraser, + TorchFunctionalConnector, TorchNNConnector, + Translator) + + +class TestConnector(TestCase): + + @classmethod + def setUpClass(cls): + cls.s_feat = torch.randn(1, 1, 5, 5) + cls.t_feat = torch.randn(1, 3, 5, 5) + + def test_convmodule_connector(self): + convmodule_connector_cfg = dict( + in_channel=1, out_channel=3, norm_cfg=dict(type='BN')) + convmodule_connector = ConvModuleConnector(**convmodule_connector_cfg) + + output = convmodule_connector.forward_train(self.s_feat) + assert output.size() == self.t_feat.size() + + convmodule_connector_cfg['order'] = ('conv', 'norm') + with self.assertRaises(AssertionError): + _ = ConvModuleConnector(**convmodule_connector_cfg) + + convmodule_connector_cfg['act_cfg'] = 'ReLU' + with self.assertRaises(AssertionError): + _ = ConvModuleConnector(**convmodule_connector_cfg) + + convmodule_connector_cfg['norm_cfg'] = 'BN' + with self.assertRaises(AssertionError): + _ = ConvModuleConnector(**convmodule_connector_cfg) + + convmodule_connector_cfg['conv_cfg'] = 'conv2d' + with self.assertRaises(AssertionError): + _ = ConvModuleConnector(**convmodule_connector_cfg) + + def test_crd_connector(self): + dim_out = 128 + crd_stu_connector = CRDConnector( + **dict(dim_in=1 * 5 * 5, dim_out=dim_out)) + + crd_tea_connector = CRDConnector( + **dict(dim_in=3 * 5 * 5, dim_out=dim_out)) + + assert crd_stu_connector.linear.in_features == 1 * 5 * 5 + assert crd_stu_connector.linear.out_features == dim_out + assert crd_tea_connector.linear.in_features == 3 * 5 * 5 + assert crd_tea_connector.linear.out_features == dim_out + + s_output = crd_stu_connector.forward_train(self.s_feat) + t_output = crd_tea_connector.forward_train(self.t_feat) + assert s_output.size() == t_output.size() + + def test_ft_connector(self): + stu_connector = 
Translator(**dict(in_channel=1, out_channel=2)) + + tea_connector = Paraphraser(**dict(in_channel=3, out_channel=2)) + + s_connect = stu_connector.forward_train(self.s_feat) + t_connect = tea_connector.forward_train(self.t_feat) + assert s_connect.size() == t_connect.size() + t_pretrain = tea_connector.forward_pretrain(self.t_feat) + assert t_pretrain.size() == torch.Size([1, 3, 5, 5]) + + def test_byot_connector(self): + byot_connector_cfg = dict( + in_channel=16, + out_channel=32, + num_classes=10, + expansion=4, + pool_size=4, + kernel_size=3, + stride=2, + init_cfg=None) + byot_connector = BYOTConnector(**byot_connector_cfg) + + s_feat = torch.randn(1, 16 * 4, 8, 8) + t_feat = torch.randn(1, 32 * 4) + labels = torch.randn(1, 10) + + output, logits = byot_connector.forward_train(s_feat) + assert output.size() == t_feat.size() + assert logits.size() == labels.size() + + def test_fbkd_connector(self): + fbkd_stuconnector_cfg = dict( + in_channels=16, reduction=2, sub_sample=True) + fbkd_stuconnector = FBKDStudentConnector(**fbkd_stuconnector_cfg) + + fbkd_teaconnector_cfg = dict( + in_channels=16, reduction=2, sub_sample=True) + fbkd_teaconnector = FBKDTeacherConnector(**fbkd_teaconnector_cfg) + + s_feat = torch.randn(1, 16, 8, 8) + t_feat = torch.randn(1, 16, 8, 8) + + s_output = fbkd_stuconnector(s_feat) + t_output = fbkd_teaconnector(t_feat) + + assert len(s_output) == 6 + assert len(t_output) == 5 + assert torch.equal(t_output[-1], t_feat) + + def test_torch_connector(self): + tensor1 = torch.rand(3, 3, 16, 16) + functional_pool_connector = TorchFunctionalConnector( + function_name='avg_pool2d', func_args=dict(kernel_size=4)) + tensor2 = functional_pool_connector.forward_train(tensor1) + assert tensor2.shape == torch.Size([3, 3, 4, 4]) + + with self.assertRaises(AssertionError): + functional_pool_connector = TorchFunctionalConnector() + with self.assertRaises(ValueError): + functional_pool_connector = TorchFunctionalConnector( + function_name='fake') + + 
nn_pool_connector = TorchNNConnector( + module_name='AvgPool2d', module_args=dict(kernel_size=4)) + tensor3 = nn_pool_connector.forward_train(tensor1) + assert tensor3.shape == torch.Size([3, 3, 4, 4]) + assert torch.equal(tensor2, tensor3) + + with self.assertRaises(AssertionError): + functional_pool_connector = TorchFunctionalConnector() + with self.assertRaises(ValueError): + functional_pool_connector = TorchNNConnector(module_name='fake') + + def test_mgd_connector(self): + s_feat = torch.randn(1, 16, 8, 8) + mgd_connector1 = MGDConnector( + student_channels=16, teacher_channels=16, lambda_mgd=0.65) + mgd_connector2 = MGDConnector( + student_channels=16, teacher_channels=32, lambda_mgd=0.65) + s_output1 = mgd_connector1.forward_train(s_feat) + s_output2 = mgd_connector2.forward_train(s_feat) + + assert s_output1.shape == torch.Size([1, 16, 8, 8]) + assert s_output2.shape == torch.Size([1, 32, 8, 8]) + + mgd_connector1 = MGDConnector( + student_channels=16, + teacher_channels=16, + lambda_mgd=0.65, + mask_on_channel=True) + mgd_connector2 = MGDConnector( + student_channels=16, + teacher_channels=32, + lambda_mgd=0.65, + mask_on_channel=True) + s_output1 = mgd_connector1.forward_train(s_feat) + s_output2 = mgd_connector2.forward_train(s_feat) + + assert s_output1.shape == torch.Size([1, 16, 8, 8]) + assert s_output2.shape == torch.Size([1, 32, 8, 8]) + + def test_norm_connector(self): + s_feat = torch.randn(2, 3, 2, 2) + norm_cfg = dict(type='BN', affine=False, track_running_stats=False) + norm_connector = NormConnector(3, norm_cfg) + output = norm_connector.forward_train(s_feat) + + assert output.shape == torch.Size([2, 3, 2, 2]) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ef101fec61e72abc0eb90266d453b5b22331378d --- /dev/null +++ 
b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/__init__.py @@ -0,0 +1 @@ +# Copyright (c) OpenMMLab. All rights reserved. diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/test_bricks/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/test_bricks/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ef101fec61e72abc0eb90266d453b5b22331378d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/test_bricks/__init__.py @@ -0,0 +1 @@ +# Copyright (c) OpenMMLab. All rights reserved. diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/test_bricks/test_dynamic_attention.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/test_bricks/test_dynamic_attention.py new file mode 100644 index 0000000000000000000000000000000000000000..4ed47c0ce1be7f097be976d21410cfa69a63197e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/test_bricks/test_dynamic_attention.py @@ -0,0 +1,50 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from unittest import TestCase + +import torch + +from mmrazor.models.architectures.dynamic_ops import DynamicMultiheadAttention +from mmrazor.models.architectures.ops import MultiheadAttention +from mmrazor.models.mutables import (MutableChannelContainer, + OneShotMutableChannel, + OneShotMutableChannelUnit, + OneShotMutableValue) + + +class TestDynamicMHA(TestCase): + + def setUp(self) -> None: + self.mutable_num_heads = OneShotMutableValue( + value_list=[2, 4, 8], default_value=8) + self.mutable_embed_dims = OneShotMutableChannel(num_channels=128) + self.base_embed_dims = OneShotMutableChannel( + num_channels=8, candidate_choices=[8]) + self.mutable_q_embed_dims = self.mutable_num_heads * \ + self.base_embed_dims + + self.dynamic_m = DynamicMultiheadAttention(embed_dims=128, num_heads=8) + + OneShotMutableChannelUnit._register_channel_container( + self.dynamic_m, MutableChannelContainer) + + self.dynamic_m.register_mutable_attr('num_heads', + self.mutable_num_heads) + + MutableChannelContainer.register_mutable_channel_to_module( + self.dynamic_m, self.mutable_embed_dims, False) + MutableChannelContainer.register_mutable_channel_to_module( + self.dynamic_m, self.mutable_q_embed_dims, True, end=64) + MutableChannelContainer.register_mutable_channel_to_module( + self.dynamic_m.rel_pos_embed_k, self.base_embed_dims, False) + MutableChannelContainer.register_mutable_channel_to_module( + self.dynamic_m.rel_pos_embed_v, self.base_embed_dims, False) + + def test_forward(self) -> None: + x = torch.randn(8, 197, 128) + output = self.dynamic_m(x) + self.assertIsNotNone(output) + + def test_convert(self) -> None: + static_m = MultiheadAttention(embed_dims=100, num_heads=10) + dynamic_m = DynamicMultiheadAttention.convert_from(static_m) + self.assertIsNotNone(dynamic_m) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/test_bricks/test_dynamic_container.py 
b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/test_bricks/test_dynamic_container.py new file mode 100644 index 0000000000000000000000000000000000000000..469ce0a9b662a1011dcb18d91cd348628e0f07a3 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/test_bricks/test_dynamic_container.py @@ -0,0 +1,46 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from unittest import TestCase + +import pytest +import torch.nn as nn +from torch.nn import Sequential + +from mmrazor.models.architectures.dynamic_ops import DynamicSequential +from mmrazor.models.mutables import OneShotMutableValue + + +class TestDynamicSequential(TestCase): + + def setUp(self) -> None: + self.layers = [ + nn.Linear(4, 5), + nn.Linear(5, 6), + nn.Linear(6, 7), + nn.Linear(7, 8), + ] + self.dynamic_m = DynamicSequential(*self.layers) + mutable_depth = OneShotMutableValue( + value_list=[2, 3, 4], default_value=3) + + self.dynamic_m.register_mutable_attr('depth', mutable_depth) + + def test_init(self) -> None: + self.assertEqual( + self.dynamic_m.get_mutable_attr('depth').current_choice, 3) + + def test_to_static_op(self) -> None: + with pytest.raises(RuntimeError): + self.dynamic_m.to_static_op() + + current_mutable = self.dynamic_m.get_mutable_attr('depth') + current_mutable.fix_chosen(current_mutable.dump_chosen().chosen) + + static_op = self.dynamic_m.to_static_op() + self.assertIsNotNone(static_op) + + def test_convert_from(self) -> None: + static_m = Sequential(*self.layers) + + dynamic_m = DynamicSequential.convert_from(static_m) + + self.assertIsNotNone(dynamic_m) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/test_bricks/test_dynamic_conv.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/test_bricks/test_dynamic_conv.py new file mode 100644 index 
0000000000000000000000000000000000000000..fd41d6b5ca9a7c1310a91f7dea5e5c9f8f581c9d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/test_bricks/test_dynamic_conv.py @@ -0,0 +1,314 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy +from typing import Type +from unittest import TestCase +from unittest.mock import MagicMock + +import pytest +import torch +from torch import nn + +from mmrazor.models.architectures.dynamic_ops import ( + BigNasConv2d, DynamicConv2d, DynamicConv2dAdaptivePadding, FuseConv2d, + OFAConv2d) +from mmrazor.models.mutables import (OneShotMutableValue, SimpleMutableChannel, + SquentialMutableChannel) +from mmrazor.structures.subnet import export_fix_subnet, load_fix_subnet +from ..utils import fix_dynamic_op + + +class TestDynamicConv2d(TestCase): + + def test_dynamic_conv2d_depthwise(self) -> None: + d_conv2d = DynamicConv2d( + in_channels=10, + out_channels=10, + groups=10, + kernel_size=3, + stride=1, + bias=True) + + mock_mutable = MagicMock() + with pytest.raises(ValueError): + d_conv2d.register_mutable_attr('in_channels', mock_mutable) + with pytest.raises(ValueError): + d_conv2d.register_mutable_attr('out_channels', mock_mutable) + + mock_mutable.current_mask = torch.rand(4) + with pytest.raises(ValueError): + d_conv2d.register_mutable_attr('in_channels', mock_mutable) + with pytest.raises(ValueError): + d_conv2d.register_mutable_attr('out_channels', mock_mutable) + + mutable_in_channels = SquentialMutableChannel(10) + mutable_out_channels = SquentialMutableChannel(10) + + d_conv2d.register_mutable_attr('in_channels', mutable_in_channels) + d_conv2d.register_mutable_attr('out_channels', mutable_out_channels) + + with pytest.raises(RuntimeError): + d_conv2d.to_static_op() + + d_conv2d.get_mutable_attr('in_channels').current_choice = 8 + d_conv2d.get_mutable_attr('out_channels').current_choice = 8 + + x = torch.rand(10, 8, 224, 224) + out1 = d_conv2d(x) + assert 
out1.size(1) == 8 + + with pytest.raises(RuntimeError): + _ = d_conv2d.to_static_op() + + fix_mutables = export_fix_subnet(d_conv2d)[0] + with pytest.raises(RuntimeError): + load_fix_subnet(d_conv2d, fix_mutables) + fix_dynamic_op(d_conv2d, fix_mutables) + + s_conv2d = d_conv2d.to_static_op() + assert s_conv2d.weight.size(0) == 8 + assert s_conv2d.weight.size(1) == 1 + assert s_conv2d.bias.size(0) == 8 + out2 = s_conv2d(x) + + assert torch.equal(out1, out2) + + +def mock_layeri_choice(d_conv2d: FuseConv2d) -> None: + # mock selected out channel proxy for `FuseConv2d` + c_out, _, _, _ = d_conv2d.weight.size() + print('d_conv2d.mutable_attrs:', d_conv2d.mutable_attrs) + if ('out_channels' in d_conv2d.mutable_attrs): + c_current_out = \ + d_conv2d.mutable_attrs['out_channels'].current_mask.sum().item() + else: + c_current_out = c_out + device = d_conv2d.weight.device + layeri_mock = torch.rand(c_current_out, c_out).to(device) + d_conv2d.set_forward_args(choice=layeri_mock) + + +@pytest.mark.parametrize('dynamic_class', [ + BigNasConv2d, DynamicConv2d, FuseConv2d, OFAConv2d, + DynamicConv2dAdaptivePadding +]) +@pytest.mark.parametrize('bias', [True, False]) +def test_dynamic_conv2d(bias: bool, dynamic_class: Type[nn.Conv2d]) -> None: + d_conv2d = dynamic_class( + in_channels=4, out_channels=10, kernel_size=3, stride=1, bias=bias) + + x_max = torch.rand(10, 4, 224, 224) + if (isinstance(d_conv2d, FuseConv2d)): + mock_layeri_choice(d_conv2d) + out_before_mutate = d_conv2d(x_max) + + mutable_in_channels = SquentialMutableChannel(4) + mutable_out_channels = SquentialMutableChannel(10) + d_conv2d.register_mutable_attr('in_channels', mutable_in_channels) + d_conv2d.register_mutable_attr('out_channels', mutable_out_channels) + + with pytest.raises(RuntimeError): + d_conv2d.to_static_op() + + d_conv2d.get_mutable_attr('in_channels').current_choice = 4 + d_conv2d.mutate_out_channels = 10 + + if (isinstance(d_conv2d, FuseConv2d)): + mock_layeri_choice(d_conv2d) + out_max = 
d_conv2d(x_max) + assert torch.equal(out_before_mutate, out_max) + + d_conv2d.get_mutable_attr('in_channels').current_choice = 3 + d_conv2d.mutable_out_channels.current_choice = 4 + + x = torch.rand(10, 3, 224, 224) + if (isinstance(d_conv2d, FuseConv2d)): + mock_layeri_choice(d_conv2d) + out1 = d_conv2d(x) + assert out1.size(1) == 4 + + fix_mutables = export_fix_subnet(d_conv2d)[0] + with pytest.raises(RuntimeError): + load_fix_subnet(d_conv2d, fix_mutables) + fix_dynamic_op(d_conv2d, fix_mutables) + + s_conv2d = d_conv2d.to_static_op() + assert s_conv2d.weight.size(0) == 4 + assert s_conv2d.weight.size(1) == 3 + if bias: + assert s_conv2d.bias.size(0) == 4 + out2 = s_conv2d(x) + + assert torch.equal(out1, out2) + + +@pytest.mark.parametrize('dynamic_class', + [BigNasConv2d, DynamicConv2d, FuseConv2d, OFAConv2d]) +@pytest.mark.parametrize( + ['is_mutate_in_channels', 'in_channels', 'out_channels'], [(True, 6, 10), + (False, 10, 4)]) +def test_dynamic_conv2d_mutable_single_channels( + is_mutate_in_channels: bool, in_channels: int, out_channels: int, + dynamic_class: Type[nn.Conv2d]) -> None: + d_conv2d = dynamic_class( + in_channels=10, out_channels=10, kernel_size=3, stride=1, bias=True) + mutable_channels = SquentialMutableChannel(10) + + if is_mutate_in_channels: + d_conv2d.register_mutable_attr('in_channels', mutable_channels) + else: + d_conv2d.register_mutable_attr('out_channels', mutable_channels) + + if (isinstance(d_conv2d, FuseConv2d)): + mock_layeri_choice(d_conv2d) + with pytest.raises(RuntimeError): + d_conv2d.to_static_op() + + if is_mutate_in_channels: + d_conv2d.get_mutable_attr('in_channels').current_choice = in_channels + assert d_conv2d.get_mutable_attr('out_channels') is None + else: + d_conv2d.get_mutable_attr('out_channels').current_choice = out_channels + assert d_conv2d.get_mutable_attr('in_channels') is None + + x = torch.rand(3, in_channels, 224, 224) + if (isinstance(d_conv2d, FuseConv2d)): + mock_layeri_choice(d_conv2d) + out1 = 
d_conv2d(x) + + assert out1.size(1) == out_channels + + with pytest.raises(RuntimeError): + _ = d_conv2d.to_static_op() + + fix_mutables = export_fix_subnet(d_conv2d)[0] + with pytest.raises(RuntimeError): + load_fix_subnet(d_conv2d, fix_mutables) + fix_dynamic_op(d_conv2d, fix_mutables) + + s_conv2d = d_conv2d.to_static_op() + assert s_conv2d.weight.size(0) == out_channels + assert s_conv2d.weight.size(1) == in_channels + out2 = s_conv2d(x) + + assert torch.equal(out1, out2) + + +@pytest.mark.parametrize('dynamic_class', [OFAConv2d, BigNasConv2d]) +@pytest.mark.parametrize('kernel_size_list', [[5], [3, 5, 7]]) +def test_kernel_dynamic_conv2d(dynamic_class: Type[nn.Conv2d], + kernel_size_list: bool) -> None: + + mutable_in_channels = SquentialMutableChannel(10) + mutable_out_channels = SquentialMutableChannel(10) + + mutable_kernel_size = OneShotMutableValue(value_list=kernel_size_list) + + d_conv2d = dynamic_class( + in_channels=10, + out_channels=10, + groups=1, + kernel_size=3 if kernel_size_list is None else max(kernel_size_list), + stride=1, + bias=True) + d_conv2d.register_mutable_attr('in_channels', mutable_in_channels) + d_conv2d.register_mutable_attr('out_channels', mutable_out_channels) + if kernel_size_list is not None: + copied_mutable_kernel_size = copy.deepcopy(mutable_kernel_size) + copied_d_conv2d = copy.deepcopy(d_conv2d) + + copied_mutable_kernel_size._value_list = [] + with pytest.raises(ValueError): + _ = copied_d_conv2d.register_mutable_attr( + 'kernel_size', copied_mutable_kernel_size) + + d_conv2d.register_mutable_attr('kernel_size', mutable_kernel_size) + assert d_conv2d.kernel_size_list == kernel_size_list + + with pytest.raises(RuntimeError): + d_conv2d.to_static_op() + + d_conv2d.get_mutable_attr('in_channels').current_choice = 8 + d_conv2d.get_mutable_attr('out_channels').current_choice = 8 + if kernel_size_list is not None: + kernel_size = mutable_kernel_size.sample_choice() + d_conv2d.mutable_attrs['kernel_size'].current_choice = 
kernel_size + + x = torch.rand(3, 8, 224, 224) + if (isinstance(d_conv2d, FuseConv2d)): + mock_layeri_choice(d_conv2d) + out1 = d_conv2d(x) + assert out1.size(1) == 8 + + fix_mutables = export_fix_subnet(d_conv2d)[0] + with pytest.raises(RuntimeError): + load_fix_subnet(d_conv2d, fix_mutables) + fix_dynamic_op(d_conv2d, fix_mutables) + + s_conv2d = d_conv2d.to_static_op() + assert s_conv2d.weight.size(0) == 8 + assert s_conv2d.weight.size(1) == 8 + assert s_conv2d.bias.size(0) == 8 + if kernel_size_list is not None: + assert s_conv2d.kernel_size == (kernel_size, kernel_size) + assert tuple(s_conv2d.weight.shape[2:]) == (kernel_size, kernel_size) + out2 = s_conv2d(x) + + assert torch.equal(out1, out2) + + +@pytest.mark.parametrize('dynamic_class', [OFAConv2d, BigNasConv2d]) +def test_mutable_kernel_dynamic_conv2d_grad( + dynamic_class: Type[nn.Conv2d]) -> None: + from mmrazor.models.architectures.dynamic_ops.mixins import \ + dynamic_conv_mixins + + kernel_size_list = [3, 5, 7] + d_conv2d = dynamic_class( + in_channels=3, + out_channels=10, + groups=1, + kernel_size=max(kernel_size_list), + stride=1, + bias=False) + + mutable_kernel_size = OneShotMutableValue(value_list=kernel_size_list) + d_conv2d.register_mutable_attr('kernel_size', mutable_kernel_size) + + x = torch.rand(3, 3, 224, 224, requires_grad=True) + + for kernel_size in kernel_size_list: + mutable_kernel_size.current_choice = kernel_size + if (isinstance(d_conv2d, FuseConv2d)): + mock_layeri_choice(d_conv2d) + out = d_conv2d(x).sum() + out.backward() + + start_offset, end_offset = dynamic_conv_mixins._get_current_kernel_pos( + max(kernel_size_list), kernel_size) + + mask = torch.ones_like( + d_conv2d.weight, requires_grad=False, dtype=torch.bool) + mask[:, :, start_offset:end_offset, start_offset:end_offset] = 0 + assert d_conv2d.weight.grad[mask].norm().item() == 0 + + d_conv2d.weight.grad.zero_() + + +def test_dynamic_group_wise_conv(): + conv = DynamicConv2d(8, 16, 3, 1, 1, groups=4) + in_mutable = 
SimpleMutableChannel(8) + out_mutable = SimpleMutableChannel(16) + in_mutable.current_choice = torch.tensor([0, 1] * 4).bool() + out_mutable.current_choice = torch.tensor([0, 1, 1, 0] * 4).bool() + conv.register_mutable_attr('in_channels', in_mutable) + conv.register_mutable_attr('out_channels', out_mutable) + + input = torch.rand([2, 4, 32, 32]) + y1 = conv(input) + assert list(y1.shape) == [2, 8, 32, 32] + + in_mutable.fix_chosen() + out_mutable.fix_chosen() + static_conv = conv.to_static_op() + y2 = static_conv(input) + assert torch.equal(y1, y2) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/test_bricks/test_dynamic_embed.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/test_bricks/test_dynamic_embed.py new file mode 100644 index 0000000000000000000000000000000000000000..a656b2d5b844998111d6f68644f121b94b6807b2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/test_bricks/test_dynamic_embed.py @@ -0,0 +1,64 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from unittest import TestCase + +import pytest +from mmcls.models.utils import PatchEmbed + +from mmrazor.models.architectures.dynamic_ops import DynamicPatchEmbed +from mmrazor.models.mutables import SquentialMutableChannel + + +class TestPatchEmbed(TestCase): + + def setUp(self): + self.dynamic_embed = DynamicPatchEmbed( + img_size=224, + in_channels=3, + embed_dims=100, + norm_cfg=dict(type='BN')) + + mutable_embed_dims = SquentialMutableChannel(num_channels=100) + mutable_embed_dims.current_choice = 50 + self.dynamic_embed.register_mutable_attr('embed_dims', + mutable_embed_dims) + + def test_patch_embed(self): + mutable = SquentialMutableChannel(num_channels=120) + + with pytest.raises(ValueError): + self.dynamic_embed.register_mutable_attr('embed_dims', mutable) + + self.assertTrue( + self.dynamic_embed.get_mutable_attr('embed_dims').current_choice == + 50) + + def test_convert(self): + static_m = PatchEmbed( + img_size=224, + in_channels=3, + embed_dims=768, + conv_cfg=dict(type='mmrazor.BigNasConv2d'), + norm_cfg=dict(type='mmrazor.DynamicBatchNorm2d')) + + dynamic_m = DynamicPatchEmbed.convert_from(static_m) + + self.assertIsNotNone(dynamic_m) + + mutable_embed_dims = SquentialMutableChannel(num_channels=768) + dynamic_m.register_mutable_attr('embed_dims', mutable_embed_dims) + mutable_embed_dims.current_choice = 512 + + def test_to_static_op(self): + mutable_embed_dims = SquentialMutableChannel(num_channels=100) + + mutable_embed_dims.current_choice = 10 + + with pytest.raises(RuntimeError): + self.dynamic_embed.to_static_op() + + mutable_embed_dims.fix_chosen(mutable_embed_dims.dump_chosen().chosen) + self.dynamic_embed.register_mutable_attr('embed_dims', + mutable_embed_dims) + static_op = self.dynamic_embed.to_static_op() + + self.assertIsNotNone(static_op) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/test_bricks/test_dynamic_layernorm.py 
b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/test_bricks/test_dynamic_layernorm.py new file mode 100644 index 0000000000000000000000000000000000000000..619881f33e08d5ec7e66c2f96dc51eba381b69c6 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/test_bricks/test_dynamic_layernorm.py @@ -0,0 +1,45 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from unittest import TestCase + +import pytest +from torch.nn import LayerNorm + +from mmrazor.models.architectures.dynamic_ops import DynamicLayerNorm +from mmrazor.models.mutables import SquentialMutableChannel + + +class TestDynamicLayerNorm(TestCase): + + def setUp(self) -> None: + self.dynamic_m = DynamicLayerNorm(100) + + mutable_num_features = SquentialMutableChannel(num_channels=100) + + mutable_num_features.current_choice = 50 + + self.dynamic_m.register_mutable_attr('num_features', + mutable_num_features) + + def test_init(self) -> None: + mutable = SquentialMutableChannel(num_channels=100) + self.dynamic_m.register_mutable_attr('in_channels', mutable) + self.dynamic_m.register_mutable_attr('out_channels', mutable) + + self.assertEqual( + self.dynamic_m.get_mutable_attr('num_features').current_choice, 50) + + def test_to_static_op(self): + with pytest.raises(RuntimeError): + self.dynamic_m.to_static_op() + + current_mutable = self.dynamic_m.get_mutable_attr('num_features') + current_mutable.fix_chosen(current_mutable.dump_chosen().chosen) + static_op = self.dynamic_m.to_static_op() + + self.assertIsNotNone(static_op) + + def test_convert(self) -> None: + static_m = LayerNorm(100) + dynamic_m = DynamicLayerNorm.convert_from(static_m) + + self.assertIsNotNone(dynamic_m) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/test_bricks/test_dynamic_linear.py 
b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/test_bricks/test_dynamic_linear.py new file mode 100644 index 0000000000000000000000000000000000000000..aef840c2c156bc5c1cfb106d92dc92abfc840fba --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/test_bricks/test_dynamic_linear.py @@ -0,0 +1,113 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import Optional +from unittest.mock import MagicMock + +import pytest +import torch +from torch import nn + +from mmrazor.models.mutables import SquentialMutableChannel +from mmrazor.structures.subnet import export_fix_subnet, load_fix_subnet +from ..utils import fix_dynamic_op + +from mmrazor.models.architectures.dynamic_ops import ( # isort:skip + DynamicLinear, DynamicLinearMixin) + + +@pytest.mark.parametrize('bias', [True, False]) +def test_dynamic_linear(bias) -> None: + mutable_in_features = SquentialMutableChannel(10) + mutable_out_features = SquentialMutableChannel(10) + + d_linear = DynamicLinear(in_features=10, out_features=10, bias=bias) + + mock_mutable = MagicMock() + with pytest.raises(ValueError): + d_linear.register_mutable_attr('in_features', mock_mutable) + with pytest.raises(ValueError): + d_linear.register_mutable_attr('out_features', mock_mutable) + + mock_mutable.current_mask = torch.rand(8) + with pytest.raises(ValueError): + d_linear.register_mutable_attr('in_features', mock_mutable) + with pytest.raises(ValueError): + d_linear.register_mutable_attr('out_features', mock_mutable) + + d_linear.register_mutable_attr('in_features', mutable_in_features) + d_linear.register_mutable_attr('out_features', mutable_out_features) + + with pytest.raises(RuntimeError): + d_linear.to_static_op() + + d_linear.get_mutable_attr('in_channels').current_choice = 8 + d_linear.get_mutable_attr('out_channels').current_choice = 4 + + x = torch.rand(10, 8) + out1 = d_linear(x) + assert out1.size(1) == 4 + + with 
pytest.raises(RuntimeError): + _ = d_linear.to_static_op() + + fix_mutables = export_fix_subnet(d_linear)[0] + with pytest.raises(RuntimeError): + load_fix_subnet(d_linear, fix_mutables) + fix_dynamic_op(d_linear, fix_mutables) + assert isinstance(d_linear, nn.Linear) + assert isinstance(d_linear, DynamicLinearMixin) + + s_linear = d_linear.to_static_op() + assert s_linear.weight.size(0) == 4 + assert s_linear.weight.size(1) == 8 + if bias: + assert s_linear.bias.size(0) == 4 + assert not isinstance(s_linear, DynamicLinearMixin) + assert isinstance(s_linear, nn.Linear) + out2 = s_linear(x) + + assert torch.equal(out1, out2) + + +@pytest.mark.parametrize( + ['is_mutate_in_features', 'in_features', 'out_features'], [(True, 6, 10), + (False, 10, 4), + (None, 10, 10)]) +def test_dynamic_linear_mutable_single_features( + is_mutate_in_features: Optional[bool], in_features: int, + out_features: int) -> None: + d_linear = DynamicLinear(in_features=10, out_features=10, bias=True) + mutable_channels = SquentialMutableChannel(10) + + if is_mutate_in_features is not None: + if is_mutate_in_features: + d_linear.register_mutable_attr('in_channels', mutable_channels) + else: + d_linear.register_mutable_attr('out_channels', mutable_channels) + + if is_mutate_in_features: + d_linear.get_mutable_attr('in_channels').current_choice = in_features + assert d_linear.get_mutable_attr('out_channels') is None + elif is_mutate_in_features is False: + d_linear.get_mutable_attr('out_channels').current_choice = out_features + assert d_linear.get_mutable_attr('in_channels') is None + + x = torch.rand(3, in_features) + out1 = d_linear(x) + + assert out1.size(1) == out_features + + if is_mutate_in_features is not None: + with pytest.raises(RuntimeError): + _ = d_linear.to_static_op() + + fix_mutables = export_fix_subnet(d_linear)[0] + with pytest.raises(RuntimeError): + load_fix_subnet(d_linear, fix_mutables) + fix_dynamic_op(d_linear, fix_mutables) + + s_linear = d_linear.to_static_op() + assert 
s_linear.weight.size(0) == out_features + assert s_linear.weight.size(1) == in_features + out2 = s_linear(x) + + assert torch.equal(out1, out2) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/test_bricks/test_dynamic_norm.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/test_bricks/test_dynamic_norm.py new file mode 100644 index 0000000000000000000000000000000000000000..9a5319a0988c760ba162eb8118da559b00e5f39e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/test_bricks/test_dynamic_norm.py @@ -0,0 +1,154 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import unittest +from typing import Tuple, Type +from unittest.mock import MagicMock + +import pytest +import torch +import torch.distributed as dist +from torch import nn + +from mmrazor.models.architectures.dynamic_ops import (DynamicBatchNorm1d, + DynamicBatchNorm2d, + DynamicBatchNorm3d, + DynamicMixin, + DynamicSyncBatchNorm) +from mmrazor.models.mutables import SquentialMutableChannel +from mmrazor.structures.subnet import export_fix_subnet, load_fix_subnet +from ..utils import fix_dynamic_op + + +@pytest.mark.parametrize('dynamic_class,input_shape', + [(DynamicBatchNorm1d, (10, 8, 224)), + (DynamicBatchNorm2d, (10, 8, 224, 224)), + (DynamicBatchNorm3d, (10, 8, 3, 224, 224))]) +@pytest.mark.parametrize('affine', [True, False]) +@pytest.mark.parametrize('track_running_stats', [True, False]) +def test_dynamic_bn(dynamic_class: Type[nn.modules.batchnorm._BatchNorm], + input_shape: Tuple[int], affine: bool, + track_running_stats: bool) -> None: + mutable_num_features = SquentialMutableChannel(10) + + d_bn = dynamic_class( + num_features=10, + affine=affine, + track_running_stats=track_running_stats) + if not affine and not track_running_stats: + with pytest.raises(RuntimeError): + d_bn.register_mutable_attr('num_features', mutable_num_features) + else: + 
mock_mutable = MagicMock() + with pytest.raises(ValueError): + d_bn.register_mutable_attr('num_features', mock_mutable) + mock_mutable.current_mask = torch.rand(5) + with pytest.raises(ValueError): + d_bn.register_mutable_attr('num_features', mock_mutable) + + d_bn.register_mutable_attr('num_features', mutable_num_features) + assert d_bn.get_mutable_attr('in_channels') is d_bn.get_mutable_attr( + 'out_channels') + + if affine or track_running_stats: + d_bn.get_mutable_attr('in_channels').current_choice = 8 + + with pytest.raises(ValueError): + wrong_shape_x = torch.rand(8) + _ = d_bn(wrong_shape_x) + + x = torch.rand(*input_shape) + out1 = d_bn(x) + assert out1.size(1) == 8 + + fix_mutables = export_fix_subnet(d_bn)[0] + with pytest.raises(RuntimeError): + load_fix_subnet(d_bn, fix_mutables) + fix_dynamic_op(d_bn, fix_mutables) + assert isinstance(d_bn, dynamic_class) + assert isinstance(d_bn, DynamicMixin) + + s_bn = d_bn.to_static_op() + if affine: + assert s_bn.weight.size(0) == 8 + assert s_bn.bias.size(0) == 8 + if track_running_stats: + assert s_bn.running_mean.size(0) == 8 + assert s_bn.running_var.size(0) == 8 + assert not isinstance(s_bn, DynamicMixin) + assert isinstance(s_bn, d_bn.static_op_factory) + out2 = s_bn(x) + + assert torch.equal(out1, out2) + + +@pytest.mark.parametrize(['static_class', 'dynamic_class', 'input_shape'], + [(nn.BatchNorm1d, DynamicBatchNorm1d, (10, 8, 224)), + (nn.BatchNorm2d, DynamicBatchNorm2d, + (10, 8, 224, 224)), + (nn.BatchNorm3d, DynamicBatchNorm3d, + (10, 8, 3, 224, 224))]) +def test_bn_track_running_stats( + static_class: Type[nn.modules.batchnorm._BatchNorm], + dynamic_class: Type[nn.modules.batchnorm._BatchNorm], + input_shape: Tuple[int], +) -> None: + mutable_num_features = SquentialMutableChannel(10) + mutable_num_features.current_choice = 8 + d_bn = dynamic_class( + num_features=10, track_running_stats=True, affine=False) + d_bn.register_mutable_attr('num_features', mutable_num_features) + + s_bn = 
static_class(num_features=8, track_running_stats=True, affine=False) + + d_bn.train() + s_bn.train() + mask = d_bn._get_num_features_mask() + for _ in range(10): + x = torch.rand(*input_shape) + _ = d_bn(x) + _ = s_bn(x) + + d_running_mean = d_bn.running_mean[mask] + d_running_var = d_bn.running_var[mask] + + assert torch.equal(s_bn.running_mean, d_running_mean) + assert torch.equal(s_bn.running_var, d_running_var) + + d_bn.eval() + s_bn.eval() + x = torch.rand(*input_shape) + + assert torch.equal(d_bn(x), s_bn(x)) + + +class TestDynamicSyncBn(unittest.TestCase): + + def test_init(self): + if not torch.cuda.is_available(): + self.skipTest('no cuda') + import os + os.environ['MASTER_ADDR'] = 'localhost' + os.environ['MASTER_PORT'] = '12355' + + # initialize the process group + if torch.cuda.is_available(): + backend = 'nccl' + device = torch.device('cuda:0') + else: + backend = 'gloo' + device = torch.device('cpu') + dist.init_process_group(backend, rank=0, world_size=1) + + x = torch.rand([2, 8, 224, 224]).to(device) + norm = DynamicSyncBatchNorm(8).to(device) + _ = norm(x) + + mutable_num_features = SquentialMutableChannel(8) + mutable_num_features.current_choice = 4 + norm.register_mutable_attr('in_channels', mutable_num_features) + + with pytest.raises(Exception): + norm(x) + + x = torch.rand([2, 4, 32, 32]).to(device) + _ = norm(x) + dist.destroy_process_group() diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/test_bricks/test_dynamic_relative_position.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/test_bricks/test_dynamic_relative_position.py new file mode 100644 index 0000000000000000000000000000000000000000..9f82fe1d3aaa66728faefb30f7222f20022b25ff --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/test_bricks/test_dynamic_relative_position.py @@ -0,0 +1,55 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from unittest import TestCase + +import pytest +import torch + +from mmrazor.models.architectures.dynamic_ops import DynamicRelativePosition2D +from mmrazor.models.architectures.ops import RelativePosition2D +from mmrazor.models.mutables import SquentialMutableChannel + + +class TestDynamicRP(TestCase): + + def setUp(self) -> None: + mutable_head_dims = SquentialMutableChannel(num_channels=8) + + self.dynamic_rp = DynamicRelativePosition2D( + head_dims=8, max_relative_position=14) + + mutable_head_dims.current_choice = 6 + self.dynamic_rp.register_mutable_attr('head_dims', mutable_head_dims) + + def test_mutable_attrs(self) -> None: + + assert self.dynamic_rp.mutable_head_dims.current_choice == 6 + + embed = self.dynamic_rp.forward(14, 14) + + self.assertIsNotNone(embed) + + def test_convert(self): + static_model = RelativePosition2D( + head_dims=10, max_relative_position=14) + + dynamic_model = DynamicRelativePosition2D.convert_from(static_model) + + self.assertIsNotNone(dynamic_model) + + def test_to_static_op(self): + with pytest.raises(RuntimeError): + static_m = self.dynamic_rp.to_static_op() + + mutable = SquentialMutableChannel(num_channels=8) + mutable.current_choice = 4 + + mutable.fix_chosen(mutable.dump_chosen().chosen) + + self.dynamic_rp.register_mutable_attr('head_dims', mutable) + static_m = self.dynamic_rp.to_static_op() + + self.assertIsNotNone(static_m) + + dynamic_output = self.dynamic_rp.forward(14, 14) + static_output = static_m.forward(14, 14) + self.assertTrue(torch.equal(dynamic_output, static_output)) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/test_bricks/test_dynamic_resizer.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/test_bricks/test_dynamic_resizer.py new file mode 100644 index 0000000000000000000000000000000000000000..7fc93094ec2d333e409a00f950f7aa4e2422ace1 --- /dev/null +++ 
b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/test_bricks/test_dynamic_resizer.py @@ -0,0 +1,81 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from unittest import TestCase + +import pytest +import torch + +from mmrazor.models.architectures.dynamic_ops import DynamicInputResizer +from mmrazor.models.architectures.ops import InputResizer +from mmrazor.models.mutables import OneShotMutableValue +from mmrazor.registry import MODELS + +_INPUT_MUTABLE = dict( + input_resizer=dict(type='DynamicInputResizer'), + mutable_shape=dict( + type='OneShotMutableValue', + value_list=[[192, 192], [224, 224], [256, 256], [288, 288]], + default_value=[224, 224])) + + +class TestInputResizer(TestCase): + + def setUp(self): + input_resizer_cfg_ = _INPUT_MUTABLE['input_resizer'] + self.dynamic_input_resizer = MODELS.build(input_resizer_cfg_) + + if not isinstance(self.dynamic_input_resizer, DynamicInputResizer): + raise TypeError('input_resizer should be a `dict` or ' + '`DynamicInputResizer` instance, but got ' + f'{type(self.dynamic_input_resizer)}') + + self.mutable_shape = OneShotMutableValue( + value_list=[[192, 192], [224, 224], [256, 256], [288, 288]], + default_value=[224, 224]) + + self.dynamic_input_resizer.register_mutable_attr( + 'shape', self.mutable_shape) + + self.assertTrue( + self.dynamic_input_resizer.get_mutable_attr('shape').current_choice + == [224, 224]) + + def test_convert(self): + static_m = InputResizer() + + dynamic_m = DynamicInputResizer.convert_from(static_m) + + self.assertIsNotNone(dynamic_m) + + def test_to_static_op(self): + input = torch.randn(1, 3, 224, 224) + + mutable_shape = OneShotMutableValue( + value_list=[192, 224, 256], default_value=224) + mutable_shape.current_choice = 192 + + with pytest.raises(RuntimeError): + self.dynamic_input_resizer.to_static_op() + + mutable_shape.fix_chosen(mutable_shape.dump_chosen().chosen) + self.dynamic_input_resizer.register_mutable_attr( + 'shape', mutable_shape) 
+ static_op = self.dynamic_input_resizer.to_static_op() + x = static_op(input) + static_m = InputResizer() + output = static_m(input, mutable_shape.current_choice) + self.assertTrue(torch.equal(x, output)) + + mutable_shape = OneShotMutableValue( + value_list=[[192, 192], [224, 224], [256, 256], [288, 288]], + default_value=[224, 224]) + mutable_shape.current_choice = [192, 192] + mutable_shape.fix_chosen(mutable_shape.dump_chosen().chosen) + self.dynamic_input_resizer.register_mutable_attr( + 'shape', mutable_shape) + + static_op = self.dynamic_input_resizer.to_static_op() + self.assertIsNotNone(static_op) + x = self.dynamic_input_resizer(input) + assert torch.equal( + self.dynamic_input_resizer(input), + static_op(input, mutable_shape.current_choice)) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/utils.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..e448f300e857fcbcb25f41b73aed5e8b0c1d9f9e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_dynamic_op/utils.py @@ -0,0 +1,20 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from typing import Dict, Optional + +from mmrazor.models.architectures.dynamic_ops import DynamicMixin +from mmrazor.utils.typing import DumpChosen + + +def fix_dynamic_op(op: DynamicMixin, + fix_mutables: Optional[Dict] = None) -> None: + for name, mutable in op.mutable_attrs.items(): + + if fix_mutables is not None: + chosen = fix_mutables[f'mutable_attrs.{name}'] + else: + chosen = mutable.dump_chosen() + + if not isinstance(chosen, DumpChosen): + chosen = DumpChosen(**chosen) + + mutable.fix_chosen(chosen.chosen) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_generators/test_generators.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_generators/test_generators.py new file mode 100644 index 0000000000000000000000000000000000000000..93548c067cbd22a841773b0646f4f14c57b21ae4 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_architectures/test_generators/test_generators.py @@ -0,0 +1,73 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import pytest +import torch + +from mmrazor.models import DAFLGenerator, ZSKTGenerator + + +def test_dafl_generator(): + dafl_generator = DAFLGenerator( + img_size=32, latent_dim=10, hidden_channels=32) + z_batch = torch.randn(8, 10) + fake_img = dafl_generator(z_batch) + assert fake_img.size() == torch.Size([8, 3, 32, 32]) + with pytest.raises(AssertionError): + z_batch = torch.randn(8, 11) + fake_img = dafl_generator(z_batch) + with pytest.raises(ValueError): + z_batch = torch.randn(8, 10, 1, 1) + fake_img = dafl_generator(z_batch) + + fake_img = dafl_generator(batch_size=8) + assert fake_img.size() == torch.Size([8, 3, 32, 32]) + + # scale_factor = 4 + dafl_generator = DAFLGenerator( + img_size=32, latent_dim=10, hidden_channels=32, scale_factor=4) + z_batch = torch.randn(8, 10) + fake_img = dafl_generator(z_batch) + assert fake_img.size() == torch.Size([8, 3, 32, 32]) + + # hidden_channels=64 + dafl_generator = DAFLGenerator( + img_size=32, latent_dim=10, hidden_channels=64) + z_batch = torch.randn(8, 10) + fake_img = dafl_generator(z_batch) + assert fake_img.size() == torch.Size([8, 3, 32, 32]) + + with pytest.raises(AssertionError): + fake_img = dafl_generator(data=None, batch_size=0) + + +def test_zskt_generator(): + zskt_generator = ZSKTGenerator( + img_size=32, latent_dim=10, hidden_channels=32) + z_batch = torch.randn(8, 10) + fake_img = zskt_generator(z_batch) + assert fake_img.size() == torch.Size([8, 3, 32, 32]) + with pytest.raises(AssertionError): + z_batch = torch.randn(8, 11) + fake_img = zskt_generator(z_batch) + with pytest.raises(ValueError): + z_batch = torch.randn(8, 10, 1, 1) + fake_img = zskt_generator(z_batch) + + fake_img = zskt_generator(batch_size=8) + assert fake_img.size() == torch.Size([8, 3, 32, 32]) + + # scale_factor = 4 + zskt_generator = ZSKTGenerator( + img_size=32, latent_dim=10, hidden_channels=32, scale_factor=4) + z_batch = torch.randn(8, 10) + fake_img = zskt_generator(z_batch) + assert fake_img.size() == torch.Size([8, 3, 
32, 32]) + + # hidden_channels=64 + zskt_generator = ZSKTGenerator( + img_size=32, latent_dim=10, hidden_channels=64) + z_batch = torch.randn(8, 10) + fake_img = zskt_generator(z_batch) + assert fake_img.size() == torch.Size([8, 3, 32, 32]) + + with pytest.raises(AssertionError): + fake_img = zskt_generator(data=None, batch_size=0) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_classifier/test_imageclassifier.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_classifier/test_imageclassifier.py new file mode 100644 index 0000000000000000000000000000000000000000..169d3499575a217418a01221042b7a309933352c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_classifier/test_imageclassifier.py @@ -0,0 +1,45 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from unittest import TestCase + +from mmrazor.models import SearchableImageClassifier + + +class TestSearchableImageClassifier(TestCase): + + def test_init(self): + + arch_setting = dict( + mlp_ratios=[3.0, 3.5, 4.0], + num_heads=[8, 9, 10], + depth=[14, 15, 16], + embed_dims=[528, 576, 624]) + + supernet_kwargs = dict( + backbone=dict( + _scope_='mmrazor', + type='AutoformerBackbone', + arch_setting=arch_setting), + neck=None, + head=dict( + _scope_='mmrazor', + type='DynamicLinearClsHead', + num_classes=1000, + in_channels=624, + loss=dict( + type='mmcls.LabelSmoothLoss', + mode='original', + num_classes=1000, + label_smooth_val=0.1, + loss_weight=1.0), + topk=(1, 5)), + connect_head=dict(connect_with_backbone='backbone.last_mutable'), + ) + + supernet = SearchableImageClassifier(**supernet_kwargs) + + # test connect_with_backbone + self.assertEqual( + supernet.backbone.last_mutable.activated_channels, + len( + supernet.head.fc.get_mutable_attr( + 'in_channels').current_choice)) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_distillers/test_byot_distill.py 
# ---- tests/test_models/test_distillers/test_byot_distill.py ----
# Copyright (c) OpenMMLab. All rights reserved.
import copy
from unittest import TestCase

from mmengine import ConfigDict

from mmrazor.models import BYOTDistiller


class TestBYOTDistiller(TestCase):
    """Config-validation tests for BYOTDistiller construction."""

    @staticmethod
    def _base_cfg():
        """Return a fresh, fully valid distiller config for one sub-case."""
        recorder = ConfigDict(conv=dict(type='ModuleOutputs', source='conv'))
        return ConfigDict(
            student_recorders=recorder,
            teacher_recorders=copy.deepcopy(recorder),
            distill_losses=dict(loss_toy=dict(type='ToyDistillLoss')),
            loss_forward_mappings=dict(
                loss_toy=dict(
                    arg1=dict(from_student=True, recorder='conv'),
                    arg2=dict(from_student=False, recorder='conv'),
                )),
        )

    def test_init(self):
        # A fully valid config constructs without error.
        _ = BYOTDistiller(**self._base_cfg())

        # Every loss referenced by the mappings must exist in distill_losses.
        cfg = self._base_cfg()
        cfg['distill_losses'] = None
        with self.assertRaisesRegex(AssertionError,
                                    '"loss_toy" is not in distill'):
            _ = BYOTDistiller(**cfg)

        # A loss key without the `loss_` prefix only triggers a warning.
        cfg = self._base_cfg()
        cfg['distill_losses'] = dict(toy=dict(type='ToyDistillLoss'))
        cfg['loss_forward_mappings'] = dict(
            toy=dict(
                arg1=dict(from_student=True, recorder='conv'),
                arg2=dict(from_student=False, recorder='conv')))
        with self.assertWarnsRegex(UserWarning, 'Warning: If toy is a'):
            _ = BYOTDistiller(**cfg)

        # The mappings themselves are optional.
        cfg = self._base_cfg()
        cfg['loss_forward_mappings'] = None
        _ = BYOTDistiller(**cfg)

        # The mappings must be a dict ...
        cfg = self._base_cfg()
        cfg['loss_forward_mappings'] = list('AAA')
        with self.assertRaisesRegex(TypeError,
                                    'loss_forward_mappings should be '):
            _ = BYOTDistiller(**cfg)

        # ... and so must each item inside them.
        cfg = self._base_cfg()
        cfg['loss_forward_mappings']['loss_toy'] = list()
        with self.assertRaisesRegex(
                TypeError, 'Each item of loss_forward_mappings should be '):
            _ = BYOTDistiller(**cfg)

        # `from_student` is strictly boolean.
        cfg = self._base_cfg()
        cfg.loss_forward_mappings.loss_toy.arg1.from_student = ''
        with self.assertRaisesRegex(TypeError,
                                    'from_student should be a bool'):
            _ = BYOTDistiller(**cfg)
+import copy +from unittest import TestCase + +import torch +import torch.nn as nn +from mmengine import ConfigDict + +from mmrazor.models import ConfigurableDistiller +from mmrazor.registry import MODELS + + +class ToyDistillLoss(torch.nn.Module): + + def __init__(self): + super().__init__() + + def forward(self, arg1, arg2): + return arg1 + arg2 + + +class TestConfigurableDistiller(TestCase): + + def setUp(self): + MODELS.register_module(module=ToyDistillLoss, force=True) + + def tearDown(self): + MODELS.module_dict.pop('ToyDistillLoss') + + def test_init(self): + + recorders_cfg = ConfigDict( + conv=dict(type='ModuleOutputs', source='conv')) + + distiller_kwargs = ConfigDict( + student_recorders=recorders_cfg, + teacher_recorders=recorders_cfg, + distill_losses=dict(loss_toy=dict(type='ToyDistillLoss')), + loss_forward_mappings=dict( + loss_toy=dict( + arg1=dict(from_student=True, recorder='conv'), + arg2=dict(from_student=False, recorder='conv'), + )), + ) + + _ = ConfigurableDistiller(**distiller_kwargs) + + distiller_kwargs_ = copy.deepcopy(distiller_kwargs) + distiller_kwargs_['distill_losses'] = None + with self.assertRaisesRegex(AssertionError, + '"loss_toy" is not in distill'): + _ = ConfigurableDistiller(**distiller_kwargs_) + + distiller_kwargs_ = copy.deepcopy(distiller_kwargs) + distiller_kwargs_['distill_losses'] = dict( + toy=dict(type='ToyDistillLoss')) + distiller_kwargs_['loss_forward_mappings'] = dict( + toy=dict( + arg1=dict(from_student=True, recorder='conv'), + arg2=dict(from_student=False, recorder='conv'))) + with self.assertWarnsRegex(UserWarning, 'Warning: If toy is a'): + _ = ConfigurableDistiller(**distiller_kwargs_) + + distiller_kwargs_ = copy.deepcopy(distiller_kwargs) + distiller_kwargs_['loss_forward_mappings'] = None + _ = ConfigurableDistiller(**distiller_kwargs_) + + distiller_kwargs_ = copy.deepcopy(distiller_kwargs) + distiller_kwargs_['loss_forward_mappings'] = list('AAA') + + with self.assertRaisesRegex(TypeError, + 
'loss_forward_mappings should be '): + _ = ConfigurableDistiller(**distiller_kwargs_) + + distiller_kwargs_ = copy.deepcopy(distiller_kwargs) + distiller_kwargs_['loss_forward_mappings']['loss_toy'] = list() + with self.assertRaisesRegex( + TypeError, 'Each item of loss_forward_mappings should be '): + _ = ConfigurableDistiller(**distiller_kwargs_) + + distiller_kwargs_ = copy.deepcopy(distiller_kwargs) + distiller_kwargs_.loss_forward_mappings.loss_toy.arg1.from_student = '' + with self.assertRaisesRegex(TypeError, + 'from_student should be a bool'): + _ = ConfigurableDistiller(**distiller_kwargs_) + + def test_connector_list(self): + recorders_cfg = ConfigDict( + conv=dict(type='ModuleOutputs', source='conv')) + norm_cfg = dict(type='BN', affine=False, track_running_stats=False) + + distiller_kwargs = ConfigDict( + student_recorders=recorders_cfg, + teacher_recorders=recorders_cfg, + distill_losses=dict(loss_toy=dict(type='ToyDistillLoss')), + loss_forward_mappings=dict( + loss_toy=dict( + arg1=dict( + from_student=True, + recorder='conv', + connector='loss_1_sfeat'), + arg2=dict(from_student=False, recorder='conv'), + )), + connectors=dict(loss_1_sfeat=[ + dict( + type='ConvModuleConnector', + in_channel=3, + out_channel=4, + act_cfg=None), + dict(type='NormConnector', norm_cfg=norm_cfg, in_channels=4) + ])) + + distiller = ConfigurableDistiller(**distiller_kwargs) + connectors = distiller.connectors + self.assertIsInstance(connectors['loss_1_sfeat'], nn.Sequential) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_fake_quants/test_lsq_fake_quants.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_fake_quants/test_lsq_fake_quants.py new file mode 100644 index 0000000000000000000000000000000000000000..dcbda5d40de87442bc937113e7aa84babaffad8b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_fake_quants/test_lsq_fake_quants.py @@ -0,0 +1,208 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
# ---- tests/test_models/test_fake_quants/test_lsq_fake_quants.py ----
# Copyright (c) OpenMMLab. All rights reserved.
from unittest import TestCase

import torch
from torch.nn.parameter import Parameter

from mmrazor import digit_version
from mmrazor.models import LearnableFakeQuantize

try:
    from torch.ao.quantization import (MovingAverageMinMaxObserver,
                                       MovingAveragePerChannelMinMaxObserver)
except ImportError:
    from mmrazor.utils import get_placeholder
    MovingAverageMinMaxObserver = get_placeholder('torch>=1.13')
    MovingAveragePerChannelMinMaxObserver = get_placeholder('torch>=1.13')


class TestLearnableFakeQuantize(TestCase):
    """Behavioural tests for the LSQ ``LearnableFakeQuantize`` module."""

    def setUp(self):
        if digit_version(torch.__version__) < digit_version('1.13.0'):
            self.skipTest('version of torch < 1.13.0')

        def per_tensor_factory(zero_point_trainable):
            # The per-tensor variants differ only in whether the zero
            # point is a learnable parameter.
            return LearnableFakeQuantize.with_args(
                observer=MovingAverageMinMaxObserver,
                quant_min=0,
                quant_max=255,
                dtype=torch.quint8,
                qscheme=torch.per_tensor_affine,
                reduce_range=True,
                zero_point_trainable=zero_point_trainable)

        self.zero_point_trainable_fakequant = per_tensor_factory(True)
        self.zero_point_untrainable_fakequant = per_tensor_factory(False)
        self.zero_point_untrainable_per_channel_fakequant = \
            LearnableFakeQuantize.with_args(
                observer=MovingAveragePerChannelMinMaxObserver,
                quant_min=0,
                quant_max=255,
                dtype=torch.quint8,
                qscheme=torch.per_channel_affine,
                reduce_range=True,
                zero_point_trainable=False)

    def _expected_extra_repr(self, zero_point, zero_point_trainable):
        """Assemble the exact ``extra_repr`` string of a fresh module."""
        return (
            f'static_enabled={torch.tensor([1], dtype=torch.uint8)}, '
            f'fake_quant_enabled={torch.tensor([1], dtype=torch.uint8)}, '
            'quant_min=0, '
            'quant_max=127, '
            f'dtype={torch.quint8}, '
            f'qscheme={torch.per_tensor_affine}, '
            f'scale={Parameter(torch.tensor([1.0]))}, '
            f'zero_point={zero_point}, '
            f'zero_point_trainable={zero_point_trainable}')

    def test_repr(self):
        # Untrainable zero point: rendered as a plain tensor.
        fq_module = self.zero_point_untrainable_fakequant()
        self.assertEqual(
            fq_module.extra_repr(),
            self._expected_extra_repr(torch.tensor([0.]), False))

        # Trainable zero point: rendered as a Parameter.
        fq_module = self.zero_point_trainable_fakequant()
        self.assertEqual(
            fq_module.extra_repr(),
            self._expected_extra_repr(Parameter(torch.tensor([0.])), True))

    def test_calculate_qparams(self):
        # Fresh modules report the default qparams regardless of zero-point
        # trainability.
        for factory in (self.zero_point_untrainable_fakequant,
                        self.zero_point_trainable_fakequant):
            scale, zero_point = factory().calculate_qparams()
            self.assertEqual(scale, 1.)
            self.assertEqual(zero_point, 0.)

    def _check_forward(self, fq_module):
        """Forward checks shared by trainable/untrainable zero points."""
        torch.manual_seed(42)
        inputs = torch.rand(20, 10, dtype=torch.float32)
        outputs = fq_module(inputs)
        # Fake quantization perturbs the tensor.
        self.assertFalse(torch.equal(outputs, inputs))

        fq_module.toggle_fake_quant(False)
        inputs = torch.rand(20, 10, dtype=torch.float32)
        outputs = fq_module(inputs)
        # Fake quant disabled: the module acts as an identity.
        self.assertTrue(torch.equal(outputs, inputs))

        # Copy now - FakeQuant keeps its state in mutable buffers.
        scale = fq_module.scale.clone().detach()
        zero_point = fq_module.zero_point.clone().detach()

        fq_module.toggle_observer_update(False)
        fq_module.toggle_fake_quant(True)
        inputs = 10.0 * torch.rand(20, 10, dtype=torch.float32) - 5.0
        outputs = fq_module(inputs)
        self.assertFalse(torch.equal(outputs, inputs))
        # Observer disabled: scale and zero point must not move.
        self.assertEqual(fq_module.scale, scale)
        self.assertEqual(fq_module.zero_point, zero_point)

        fq_module.toggle_observer_update(True)
        outputs = fq_module(inputs)
        self.assertFalse(torch.equal(outputs, inputs))
        # Observer enabled again: the qparams track the new data range.
        self.assertNotEqual(fq_module.scale, scale)
        self.assertNotEqual(fq_module.zero_point, zero_point)

    def test_forward(self):
        self._check_forward(self.zero_point_untrainable_fakequant())
        self._check_forward(self.zero_point_trainable_fakequant())

    def _assert_state(self, fq_module, learning, scale_grad, zp_grad,
                      fake_quant, static):
        """Check the five state flags of an LSQ fake-quant module."""
        self.assertEqual(fq_module.learning_enabled[0], learning)
        self.assertEqual(fq_module.scale.requires_grad, scale_grad)
        self.assertEqual(fq_module.zero_point.requires_grad, zp_grad)
        self.assertEqual(fq_module.fake_quant_enabled[0], fake_quant)
        self.assertEqual(fq_module.static_enabled[0], static)

    def test_state(self):
        fq_module = self.zero_point_untrainable_fakequant()

        fq_module.enable_param_learning()
        self._assert_state(fq_module, 1, 1, 0, 1, 0)

        fq_module.enable_static_estimate()
        self._assert_state(fq_module, 0, 0, 0, 1, 1)

        fq_module.enable_val()
        self._assert_state(fq_module, 0, 0, 0, 1, 0)

        fq_module.enable_static_observation()
        self._assert_state(fq_module, 0, 0, 0, 0, 1)

        # With a trainable zero point, learning also enables its gradient.
        fq_module = self.zero_point_trainable_fakequant()
        fq_module.enable_param_learning()
        self._assert_state(fq_module, 1, 1, 1, 1, 0)

    def test_load_state_dict(self):
        fq_module = self.zero_point_untrainable_per_channel_fakequant()
        state_dict = fq_module.state_dict()
        inputs = torch.rand(32, 16, 3, 3, dtype=torch.float32)
        # After a forward pass `scale`/`zero_point` grow to shape (32,),
        # while the saved copies are still shape (1,); loading must still
        # succeed.
        _ = fq_module(inputs)
        fq_module.load_state_dict(state_dict)
# ---- tests/test_models/test_fake_quants/test_torch_fake_quants.py ----
# Copyright (c) OpenMMLab. All rights reserved.
import pytest
import torch

from mmrazor import digit_version
from mmrazor.models.fake_quants import register_torch_fake_quants
from mmrazor.registry import MODELS


@pytest.mark.skipif(
    digit_version(torch.__version__) < digit_version('1.13.0'),
    reason='version of torch < 1.13.0')
def test_register_torch_fake_quants():
    """Every fake-quant name returned by the register helper must resolve
    through the MODELS registry."""
    torch_fake_quants = register_torch_fake_quants()
    assert isinstance(torch_fake_quants, list)
    for fake_quant in torch_fake_quants:
        assert MODELS.get(fake_quant)


# ---- tests/test_models/test_losses/test_distillation_losses.py ----
from unittest import TestCase

from mmengine.structures import BaseDataElement

from mmrazor.models import (ABLoss, ActivationLoss, ATLoss, CRDLoss, DKDLoss,
                            FBKDLoss, FTLoss, InformationEntropyLoss,
                            KDSoftCELoss, MGDLoss, OFDLoss, OnehotLikeLoss,
                            PKDLoss)


class TestLosses(TestCase):
    """Smoke tests: each distillation loss yields a scalar on shared
    random fixtures."""

    @classmethod
    def setUpClass(cls):
        # Shared random features of increasing rank plus matching labels.
        cls.feats_1d = torch.randn(5, 6)
        cls.feats_2d = torch.randn(5, 2, 3)
        cls.feats_3d = torch.randn(5, 2, 3, 3)

        num_classes = 6
        cls.labels = torch.randint(0, num_classes, [5])

    def _check_scalar_loss(self, loss_instance, feats, labels):
        """Forward identical student/teacher feats; the loss must be scalar."""
        args = (feats, feats)
        if labels:
            args += (self.labels, )
        loss = loss_instance.forward(*args)
        self.assertTrue(loss.numel() == 1)

    def normal_test_1d(self, loss_instance, labels=False):
        self._check_scalar_loss(loss_instance, self.feats_1d, labels)

    def normal_test_2d(self, loss_instance, labels=False):
        self._check_scalar_loss(loss_instance, self.feats_2d, labels)

    def normal_test_3d(self, loss_instance, labels=False):
        self._check_scalar_loss(loss_instance, self.feats_3d, labels)

    def test_ofd_loss(self):
        ofd_loss = OFDLoss()
        self.normal_test_1d(ofd_loss)
        self.normal_test_3d(ofd_loss)

        # Hand-picked pairs: the first two give a non-zero loss ...
        s_feat = torch.Tensor([[1, 1], [2, 2], [3, 3]])
        t_feat = torch.Tensor([[0, 0], [1, 1], [2, 2]])
        assert ofd_loss.forward(s_feat, t_feat) != torch.tensor(0.0)

        s_feat = torch.Tensor([[1, 1], [2, 2], [3, 3]])
        t_feat = torch.Tensor([[2, 2], [3, 3], [4, 4]])
        assert ofd_loss.forward(s_feat, t_feat) != torch.tensor(0.0)

        # ... while this all-negative pair makes the loss vanish.
        s_feat = torch.Tensor([[-3, -3], [-2, -2], [-1, -1]])
        t_feat = torch.Tensor([[-2, -2], [-1, -1], [0, 0]])
        assert ofd_loss.forward(s_feat, t_feat) == torch.tensor(0.0)

    def test_ab_loss(self):
        ab_loss = ABLoss(loss_weight=1.0, margin=1.0)
        self.normal_test_1d(ab_loss)
        self.normal_test_2d(ab_loss)
        self.normal_test_3d(ab_loss)

    def _mock_crd_data_sample(self, sample_idx_list):
        """Wrap each sample index into a BaseDataElement, as CRD expects."""
        data_samples = []
        for idx in sample_idx_list:
            sample = BaseDataElement()
            sample.set_data(dict(sample_idx=idx))
            data_samples.append(sample)
        return data_samples

    def test_crd_loss(self):
        crd_loss = CRDLoss(neg_num=5, sample_n=10, dim_out=6)
        data_samples = self._mock_crd_data_sample(
            torch.tensor(list(range(5))))
        loss = crd_loss.forward(self.feats_1d, self.feats_1d, data_samples)
        self.assertTrue(loss.numel() == 1)

        # Random student/teacher pairs should produce a non-zero loss.
        s_feat = torch.randn((5, 6))
        t_feat = torch.randn((5, 6))
        assert crd_loss.forward(s_feat, t_feat,
                                data_samples) != torch.tensor(0.0)

        s_feat = torch.randn((5, 6))
        t_feat = torch.rand((5, 6))
        data_samples_new = self._mock_crd_data_sample(
            torch.tensor(list(range(5))))
        assert crd_loss.forward(s_feat, t_feat,
                                data_samples_new) != torch.tensor(0.0)

    def test_dkd_loss(self):
        dkd_loss = DKDLoss(loss_weight=1.0)
        # DKD also consumes the ground-truth labels.
        self.normal_test_1d(dkd_loss, labels=True)

    def test_ft_loss(self):
        ft_loss = FTLoss(loss_weight=1.0)
        assert ft_loss.loss_weight == 1.0

        self.normal_test_1d(ft_loss)
        self.normal_test_2d(ft_loss)
        self.normal_test_3d(ft_loss)

    def test_dafl_loss(self):
        dafl_cfg = dict(loss_weight=1.0)
        ac_loss = ActivationLoss(**dafl_cfg, norm_type='abs')
        oh_loss = OnehotLikeLoss(**dafl_cfg)
        ie_loss = InformationEntropyLoss(**dafl_cfg, gather=False)

        # DAFL losses take a single input tensor.
        for loss_fn in (ac_loss, oh_loss, ie_loss):
            loss = loss_fn.forward(self.feats_1d)
            self.assertTrue(loss.numel() == 1)

        with self.assertRaisesRegex(AssertionError,
                                    '"norm_type" must be "norm" or "abs"'):
            _ = ActivationLoss(**dafl_cfg, norm_type='random')

        # gather=True needs an initialized process group; the error type
        # depends on the torch version.
        ie_loss = InformationEntropyLoss(**dafl_cfg, gather=True)
        ie_loss.world_size = 2

        if digit_version(torch.__version__) >= digit_version('1.8.0'):
            error_type = RuntimeError
            error_msg = 'Default process group has not been initialized'
        else:
            error_type = AssertionError
            error_msg = 'Default process group is not initialized'
        with self.assertRaisesRegex(error_type, error_msg):
            _ = ie_loss.forward(self.feats_1d)

    def test_kdSoftce_loss(self):
        kd_softce_loss = KDSoftCELoss(loss_weight=1.0)
        # The soft-CE KD loss also consumes labels.
        self.normal_test_1d(kd_softce_loss, labels=True)

    def test_at_loss(self):
        at_loss = ATLoss(loss_weight=1.0)
        assert at_loss.loss_weight == 1.0

        self.normal_test_1d(at_loss)
        self.normal_test_2d(at_loss)
        self.normal_test_3d(at_loss)

    def test_fbkdloss(self):
        fbkd_loss_fn = FBKDLoss(loss_weight=1.0)

        spatial_mask = torch.randn(1, 1, 3, 3)
        channel_mask = torch.randn(1, 4, 1, 1)
        channel_pool_adapt = torch.randn(1, 4)
        relation_adpt = torch.randn(1, 4, 3, 3)

        # The student tuple carries one extra (channel-pool) tensor
        # compared to the teacher tuple.
        s_input = (spatial_mask, channel_mask, channel_pool_adapt,
                   spatial_mask, channel_mask, relation_adpt)
        t_input = (spatial_mask, channel_mask, spatial_mask, channel_mask,
                   relation_adpt)

        fbkd_loss = fbkd_loss_fn(s_input, t_input)
        self.assertTrue(fbkd_loss.numel() == 1)

    def test_pkdloss(self):
        pkd_loss = PKDLoss(loss_weight=1.0)
        feats_s, feats_t = torch.rand(2, 256, 4, 4), torch.rand(2, 256, 4, 4)
        loss = pkd_loss(feats_s, feats_t)
        self.assertTrue(loss.numel() == 1)
        self.assertTrue(0. <= loss <= 1.)

        # Multi-stage inputs: one bounded term per stage.
        num_stages = 4
        feats_s = (torch.rand(2, 256, 4, 4) for _ in range(num_stages))
        feats_t = (torch.rand(2, 256, 4, 4) for _ in range(num_stages))
        loss = pkd_loss(feats_s, feats_t)
        self.assertTrue(loss.numel() == 1)
        self.assertTrue(0. <= loss <= num_stages * 1.)

        # Mismatched spatial sizes still yield a bounded scalar.
        feats_s, feats_t = torch.rand(2, 256, 2, 2), torch.rand(2, 256, 4, 4)
        loss = pkd_loss(feats_s, feats_t)
        self.assertTrue(loss.numel() == 1)
        self.assertTrue(0. <= loss <= 1.)

        # Same check with resize_stu disabled.
        pkd_loss = PKDLoss(loss_weight=1.0, resize_stu=False)
        feats_s, feats_t = torch.rand(2, 256, 2, 2), torch.rand(2, 256, 4, 4)
        loss = pkd_loss(feats_s, feats_t)
        self.assertTrue(loss.numel() == 1)
        self.assertTrue(0. <= loss <= 1.)

    def test_mgd_loss(self):
        mgd_loss = MGDLoss(alpha_mgd=0.00002)
        feats_s, feats_t = torch.rand(2, 256, 4, 4), torch.rand(2, 256, 4, 4)
        loss = mgd_loss(feats_s, feats_t)
        self.assertTrue(loss.numel() == 1)


# ---- tests/test_models/test_losses/test_general_losses.py ----
from mmrazor.models import L1Loss, L2Loss


# NOTE(review): this class shares its name with the distillation TestLosses
# above because the two originate from different test modules in the dump.
class TestLosses(TestCase):
    """Smoke tests for the plain L1/L2 regression losses."""

    @classmethod
    def setUpClass(cls):
        cls.feats_1d = torch.randn(5, 6)
        cls.feats_2d = torch.randn(5, 2, 3)
        cls.feats_3d = torch.randn(5, 2, 3, 3)

    def normal_test_1d(self, loss_instance):
        loss = loss_instance.forward(self.feats_1d, self.feats_1d)
        self.assertTrue(loss.numel() == 1)

    def normal_test_2d(self, loss_instance):
        loss = loss_instance.forward(self.feats_2d, self.feats_2d)
        self.assertTrue(loss.numel() == 1)

    def normal_test_3d(self, loss_instance):
        loss = loss_instance.forward(self.feats_3d, self.feats_3d)
        self.assertTrue(loss.numel() == 1)

    def test_l1_loss(self):
        l1_loss = L1Loss(loss_weight=10)
        self.normal_test_1d(l1_loss)
        self.normal_test_2d(l1_loss)
        self.normal_test_3d(l1_loss)

        # 'avg' is not a valid reduction and must be rejected.
        with pytest.raises(AssertionError):
            _ = L1Loss(loss_weight=10, reduction='avg')

    def test_l2_loss(self):
        l2_cfg = dict(loss_weight=10, normalize=True)
        l2_loss = L2Loss(**l2_cfg)
        self.normal_test_1d(l2_loss)
        self.normal_test_2d(l2_loss)
        self.normal_test_3d(l2_loss)

        # Also exercise the div_element variant.
        l2_cfg['div_element'] = True
        l2_loss = L2Loss(**l2_cfg)
        self.normal_test_1d(l2_loss)
        self.normal_test_2d(l2_loss)
        self.normal_test_3d(l2_loss)


# ---- tests/test_models/test_mutables/__init__.py ----
# Copyright (c) OpenMMLab. All rights reserved.
# ---- tests/test_models/test_mutables/test_derived_mutable.py ----
# Copyright (c) OpenMMLab. All rights reserved.
from unittest import TestCase

import pytest
import torch

from mmrazor.models.mutables import (DerivedMutable, OneShotMutableValue,
                                     SquentialMutableChannel)
from mmrazor.models.mutables.base_mutable import BaseMutable


class TestDerivedMutable(TestCase):
    """Unit tests for DerivedMutable and the arithmetic derivation helpers
    (`*`, `//`, concat) defined on mutable channels/values."""

    def test_is_fixed(self) -> None:
        """A derived mutable is fixed only once ALL source mutables are."""
        mc = SquentialMutableChannel(num_channels=10)
        mc.current_choice = 2

        mv = OneShotMutableValue(value_list=[2, 3, 4])
        mv.current_choice = 3

        derived_mutable = mc * mv
        assert not derived_mutable.is_fixed

        # `is_fixed` is read-only on derived mutables.
        with pytest.raises(RuntimeError):
            derived_mutable.is_fixed = True

        mc.fix_chosen(mc.dump_chosen().chosen)
        assert not derived_mutable.is_fixed
        mv.fix_chosen(mv.dump_chosen().chosen)
        assert derived_mutable.is_fixed

    def test_fix_dump_chosen(self) -> None:
        """dump_chosen tracks the source; fix_chosen is a no-op."""
        mv = OneShotMutableValue(value_list=[2, 3, 4])
        mv.current_choice = 3

        derived_mutable = mv * 2
        assert derived_mutable.dump_chosen().chosen == 6

        mv.current_choice = 4
        assert derived_mutable.dump_chosen().chosen == 8

        # nothing will happen
        derived_mutable.fix_chosen(derived_mutable.dump_chosen().chosen)

    def test_derived_same_mutable(self) -> None:
        """derive_same_mutable mirrors choice and mask of its source."""
        mc = SquentialMutableChannel(num_channels=3)
        mc_derived = mc.derive_same_mutable()
        assert mc_derived.source_mutables == {mc}

        mc.current_choice = 2
        assert mc_derived.current_choice == 2
        assert torch.equal(mc_derived.current_mask,
                           torch.tensor([1, 1, 0], dtype=torch.bool))

    def test_mutable_concat_derived(self) -> None:
        """Concat derivation sums choices and concatenates masks."""
        mc1 = SquentialMutableChannel(num_channels=3)
        mc2 = SquentialMutableChannel(num_channels=4)
        ms = [mc1, mc2]

        mc_derived = DerivedMutable.derive_concat_mutable(ms)
        assert mc_derived.source_mutables == set(ms)

        mc1.current_choice = 1
        mc2.current_choice = 4
        assert mc_derived.current_choice == 5
        assert torch.equal(
            mc_derived.current_mask,
            torch.tensor([1, 0, 0, 1, 1, 1, 1], dtype=torch.bool))

        mc1.current_choice = 1
        mc2.current_choice = 1
        assert mc_derived.current_choice == 2
        assert torch.equal(
            mc_derived.current_mask,
            torch.tensor([1, 0, 0, 1, 0, 0, 0], dtype=torch.bool))

        # Value mutables cannot take part in a concat derivation.
        mv = OneShotMutableValue(value_list=[1, 2, 3])
        ms = [mc1, mv]
        with pytest.raises(RuntimeError):
            _ = DerivedMutable.derive_concat_mutable(ms)

    def test_mutable_channel_derived(self) -> None:
        """`channel * k` scales both the choice and the mask length."""
        mc = SquentialMutableChannel(num_channels=3)
        mc_derived = mc * 3
        assert mc_derived.source_mutables == {mc}

        mc.current_choice = 1
        assert mc_derived.current_choice == 3
        assert torch.equal(
            mc_derived.current_mask,
            torch.tensor([1, 1, 1, 0, 0, 0, 0, 0, 0], dtype=torch.bool))

        mc.current_choice = 2
        assert mc_derived.current_choice == 6
        assert torch.equal(
            mc_derived.current_mask,
            torch.tensor([1, 1, 1, 1, 1, 1, 0, 0, 0], dtype=torch.bool))

        # The derived mask is computed, never assigned directly.
        with pytest.raises(RuntimeError):
            mc_derived.current_mask = torch.ones(
                mc_derived.current_mask.size())

    def test_mutable_divide(self) -> None:
        """`mutable // k` rounds the result to a divisor-friendly size:
        both 128 and 120 divided by 8 map to 16 (see the channel case)."""
        mc = SquentialMutableChannel(num_channels=128)
        mc_derived = mc // 8
        assert mc_derived.source_mutables == {mc}

        mc.current_choice = 128
        assert mc_derived.current_choice == 16
        assert torch.equal(mc_derived.current_mask,
                           torch.ones(16, dtype=torch.bool))
        mc.current_choice = 120
        assert mc_derived.current_choice == 16
        assert torch.equal(mc_derived.current_mask,
                           torch.ones(16, dtype=torch.bool))

        mv = OneShotMutableValue(value_list=[112, 120, 128])
        mv_derived = mv // 8
        assert mv_derived.source_mutables == {mv}

        # Fix: the original used `==` here (a no-op comparison), so the
        # value mutable's choice was never actually updated before these
        # assertions. Assign instead, mirroring the channel case above.
        mv.current_choice = 128
        assert mv_derived.current_choice == 16
        mv.current_choice = 120
        assert mv_derived.current_choice == 16

        # Same checks with a float divisor.
        mc_derived = mc // 8.0
        assert mc_derived.source_mutables == {mc}

        mc.current_choice = 128.
        assert mc_derived.current_choice == 16
        assert torch.equal(mc_derived.current_mask,
                           torch.ones(16, dtype=torch.bool))
        mc.current_choice = 120.
        assert mc_derived.current_choice == 16
        assert torch.equal(mc_derived.current_mask,
                           torch.ones(16, dtype=torch.bool))

        mv = OneShotMutableValue(value_list=[112, 120, 128])
        mv_derived = mv // 8.0
        assert mv_derived.source_mutables == {mv}

        # Fix: same `==` vs `=` defect as above.
        mv.current_choice = 128
        assert mv_derived.current_choice == 16
        mv.current_choice = 120
        assert mv_derived.current_choice == 16

    def test_source_mutables(self) -> None:
        """source_mutables are collected from choice_fn/mask_fn closures."""

        def useless_fn(x):
            return x  # noqa: E731

        # A derived mutable without traceable sources is rejected.
        with pytest.raises(RuntimeError):
            _ = DerivedMutable(choice_fn=useless_fn)

        mc1 = SquentialMutableChannel(num_channels=3)
        mc2 = SquentialMutableChannel(num_channels=4)
        ms = [mc1, mc2]

        mc_derived1 = DerivedMutable.derive_concat_mutable(ms)

        from mmrazor.models.mutables.derived_mutable import (_concat_choice_fn,
                                                             _concat_mask_fn)
        mc_derived2 = DerivedMutable(
            choice_fn=_concat_choice_fn(ms),
            mask_fn=_concat_mask_fn(ms),
            source_mutables=ms)
        assert mc_derived1.source_mutables == mc_derived2.source_mutables

        dd_mutable = mc_derived1.derive_same_mutable()
        assert dd_mutable.source_mutables == mc_derived1.source_mutables

        # Explicitly passing a derived mutable as a source is invalid.
        with pytest.raises(ValueError):
            _ = DerivedMutable(
                choice_fn=lambda x: x, source_mutables=[mc_derived1])

        def dict_closure_fn(x, y):
            # Captures x/y so sources can be discovered via the closure.

            def fn():
                nonlocal x, y

            return fn

        ddd_mutable = DerivedMutable(
            choice_fn=dict_closure_fn({
                mc1: [2, 3],
                mc2: 2
            }, None),
            mask_fn=dict_closure_fn({2: [mc1, mc2]}, {3: dd_mutable}))
        assert ddd_mutable.source_mutables == mc_derived1.source_mutables

        mc3 = SquentialMutableChannel(num_channels=4)
        dddd_mutable = DerivedMutable(
            choice_fn=dict_closure_fn({
                mc1: [2, 3],
                mc2: 2
            }, []),
            mask_fn=dict_closure_fn({2: [mc1, mc2, mc3]}, {3: dd_mutable}))
        assert dddd_mutable.source_mutables == {mc1, mc2, mc3}

    def test_nested_mutables(self) -> None:
        """Sources propagate transitively through chained derivations."""
        source_a = SquentialMutableChannel(num_channels=2)
        source_b = SquentialMutableChannel(num_channels=3)

        # derive from
        derived_c = source_a * 1
        concat_mutables = [source_b, derived_c]
        derived_d = DerivedMutable.derive_concat_mutable(concat_mutables)
        concat_mutables = [derived_c, derived_d]
        derived_e = DerivedMutable.derive_concat_mutable(concat_mutables)

        assert derived_c.source_mutables == {source_a}
        assert derived_d.source_mutables == {source_a, source_b}
        assert derived_e.source_mutables == {source_a, source_b}

        source_a.current_choice = 1
        source_b.current_choice = 3

        assert derived_c.current_choice == 1
        assert torch.equal(derived_c.current_mask,
                           torch.tensor([1, 0], dtype=torch.bool))

        assert derived_d.current_choice == 4
        assert torch.equal(derived_d.current_mask,
                           torch.tensor([1, 1, 1, 1, 0], dtype=torch.bool))

        assert derived_e.current_choice == 5
        assert torch.equal(
            derived_e.current_mask,
            torch.tensor([1, 0, 1, 1, 1, 1, 0], dtype=torch.bool))

    def test_mutable_channel_value_calculation(self) -> None:
        """`channel * value` multiplies the two current choices
        (truncated to int for fractional values)."""
        mc = SquentialMutableChannel(num_channels=10)
        mv = OneShotMutableValue(value_list=[2.0, 2.5, 3.0, 3.5])
        derived_mutable = mc * mv
        assert derived_mutable.source_mutables == {mv, mc}

        mc.current_choice = 6
        mv.current_choice = 3.5
        assert derived_mutable.current_choice == 21

        mc.current_choice = 9
        mv.current_choice = 3.5
        assert derived_mutable.current_choice == 31

        mc.current_choice = 7
        mv.current_choice = 2.5
        assert derived_mutable.current_choice == 17

        assert isinstance(derived_mutable, BaseMutable)
        assert isinstance(derived_mutable, DerivedMutable)
        assert not derived_mutable.is_fixed

        mc.current_choice = mc.num_channels
        mv.current_choice = mv.min_choice
        assert derived_mutable.current_choice == \
            mv.current_choice * mc.num_channels
        mv.current_choice = mv.max_choice
        assert derived_mutable.current_choice == \
            mv.current_choice * mc.current_choice

        with pytest.raises(RuntimeError):
            derived_mutable.is_fixed = True
        mc.fix_chosen(mc.dump_chosen().chosen)
        assert not derived_mutable.is_fixed
        mv.fix_chosen(mv.dump_chosen().chosen)
        assert derived_mutable.is_fixed


@pytest.mark.parametrize('expand_ratio', [1, 2, 3])
def test_derived_expand_mutable(expand_ratio: int) -> None:
    """`value * int_ratio` scales the current choice linearly."""
    mv = OneShotMutableValue(value_list=[3, 5, 7])

    mv_derived = mv * expand_ratio
    assert mv_derived.source_mutables == {mv}

    assert isinstance(mv_derived, BaseMutable)
    assert isinstance(mv_derived, DerivedMutable)
    assert not mv_derived.is_fixed
    assert mv_derived.num_choices == 1

    mv.current_choice = mv.max_choice
    assert mv_derived.current_choice == mv.current_choice * expand_ratio
    mv.current_choice = mv.min_choice
    assert mv_derived.current_choice == mv.current_choice * expand_ratio

    # The derived value is read-only and has no mask.
    with pytest.raises(RuntimeError):
        mv_derived.current_choice = 123
    with pytest.raises(RuntimeError):
        _ = mv_derived.current_mask

    mv.current_choice = 5
    assert mv_derived.current_choice == 5 * expand_ratio


@pytest.mark.parametrize('expand_ratio', [1.5, 2.0, 2.5])
def test_derived_expand_mutable_float(expand_ratio: float) -> None:
    """`value * float_ratio` scales and truncates to int."""
    mv = OneShotMutableValue(value_list=[3, 5, 7])

    mv_derived = mv * expand_ratio
    assert mv_derived.source_mutables == {mv}

    assert isinstance(mv_derived, BaseMutable)
    assert isinstance(mv_derived, DerivedMutable)
    assert not mv_derived.is_fixed
    assert mv_derived.num_choices == 1

    mv.current_choice = mv.max_choice
    assert mv_derived.current_choice == int(mv.current_choice * expand_ratio)
    mv.current_choice = mv.min_choice
    assert mv_derived.current_choice == int(mv.current_choice * expand_ratio)

    # The derived value is read-only and has no mask.
    with pytest.raises(RuntimeError):
        mv_derived.current_choice = 123
    with pytest.raises(RuntimeError):
        _ = mv_derived.current_mask

    mv.current_choice = 5
    assert mv_derived.current_choice == int(5 * expand_ratio)
b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_diffchoiceroute.py new file mode 100644 index 0000000000000000000000000000000000000000..a40c9f5250c1ddddc02d570c28936b73c5eddffe --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_diffchoiceroute.py @@ -0,0 +1,87 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from unittest import TestCase + +import pytest +import torch +import torch.nn as nn + +from mmrazor.models import * # noqa:F403,F401 +from mmrazor.registry import MODELS + +MODELS.register_module(name='torchConv2d', module=nn.Conv2d, force=True) + + +class TestDiffChoiceRoute(TestCase): + + def test_forward_arch_param(self): + edges_dict = nn.ModuleDict() + edges_dict.add_module('first_edge', nn.Conv2d(32, 32, 3, 1, 1)) + edges_dict.add_module('second_edge', nn.Conv2d(32, 32, 5, 1, 2)) + edges_dict.add_module('third_edge', nn.MaxPool2d(3, 1, 1)) + edges_dict.add_module('fourth_edge', nn.MaxPool2d(5, 1, 2)) + edges_dict.add_module('fifth_edge', nn.MaxPool2d(7, 1, 3)) + + diff_choice_route_cfg = dict( + type='DiffChoiceRoute', + edges=edges_dict, + with_arch_param=True, + ) + + # test with_arch_param = True + diffchoiceroute = MODELS.build(diff_choice_route_cfg) + arch_param = nn.Parameter(torch.randn(len(edges_dict))) + + x = [torch.randn(4, 32, 64, 64) for _ in range(5)] + output = diffchoiceroute.forward_arch_param(x=x, arch_param=arch_param) + assert output is not None + + # test with_arch_param = False + new_diff_choice_route_cfg = diff_choice_route_cfg.copy() + new_diff_choice_route_cfg['with_arch_param'] = False + + new_diff_choice_route = MODELS.build(new_diff_choice_route_cfg) + arch_param = nn.Parameter(torch.randn(len(edges_dict))) + output = new_diff_choice_route.forward_arch_param( + x=x, arch_param=arch_param) + assert output is not None + + new_diff_choice_route.fix_chosen(chosen=['first_edge']) + + # test sample choice + arch_param = nn.Parameter(torch.randn(len(edges_dict))) + 
new_diff_choice_route.sample_choice(arch_param) + + # test dump_chosen + with pytest.raises(AssertionError): + new_diff_choice_route.dump_chosen() + + def test_forward_fixed(self): + edges_dict = nn.ModuleDict({ + 'first_edge': nn.Conv2d(32, 32, 3, 1, 1), + 'second_edge': nn.Conv2d(32, 32, 5, 1, 2), + 'third_edge': nn.Conv2d(32, 32, 7, 1, 3), + 'fourth_edge': nn.MaxPool2d(3, 1, 1), + 'fifth_edge': nn.AvgPool2d(3, 1, 1), + }) + + diff_choice_route_cfg = dict( + type='DiffChoiceRoute', + edges=edges_dict, + with_arch_param=True, + ) + + # test with_arch_param = True + diffchoiceroute = MODELS.build(diff_choice_route_cfg) + + diffchoiceroute.fix_chosen( + chosen=['first_edge', 'second_edge', 'fifth_edge']) + assert diffchoiceroute.is_fixed is True + + x = [torch.randn(4, 32, 64, 64) for _ in range(5)] + output = diffchoiceroute.forward_fixed(x) + assert output is not None + assert diffchoiceroute.num_choices == 3 + + # after is_fixed = True, call fix_chosen + with pytest.raises(AttributeError): + diffchoiceroute.fix_chosen(chosen=['first_edge']) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_diffop.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_diffop.py new file mode 100644 index 0000000000000000000000000000000000000000..eab9fff2be5d9c05c08c758c5c1cfe38cd546164 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_diffop.py @@ -0,0 +1,200 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from unittest import TestCase + +import pytest +import torch +import torch.nn as nn + +from mmrazor.models import * # noqa:F403,F401 +from mmrazor.registry import MODELS + +MODELS.register_module(name='torchConv2d', module=nn.Conv2d, force=True) +MODELS.register_module(name='torchMaxPool2d', module=nn.MaxPool2d, force=True) +MODELS.register_module(name='torchAvgPool2d', module=nn.AvgPool2d, force=True) + + +class TestDiffOP(TestCase): + + def test_forward_arch_param(self): + op_cfg = dict( + type='DiffMutableOP', + candidates=dict( + torch_conv2d_3x3=dict( + type='torchConv2d', + kernel_size=3, + padding=1, + ), + torch_conv2d_5x5=dict( + type='torchConv2d', + kernel_size=5, + padding=2, + ), + torch_conv2d_7x7=dict( + type='torchConv2d', + kernel_size=7, + padding=3, + ), + ), + module_kwargs=dict(in_channels=32, out_channels=32, stride=1)) + + op = MODELS.build(op_cfg) + input = torch.randn(4, 32, 64, 64) + + arch_param = nn.Parameter(torch.randn(len(op_cfg['candidates']))) + output = op.forward_arch_param(input, arch_param=arch_param) + assert output is not None + + # test when some element of arch_param is 0 + arch_param = nn.Parameter(torch.ones(op.num_choices)) + output = op.forward_arch_param(input, arch_param=arch_param) + assert output is not None + + def test_forward_fixed(self): + op_cfg = dict( + type='DiffMutableOP', + candidates=dict( + torch_conv2d_3x3=dict( + type='torchConv2d', + kernel_size=3, + ), + torch_conv2d_5x5=dict( + type='torchConv2d', + kernel_size=5, + ), + torch_conv2d_7x7=dict( + type='torchConv2d', + kernel_size=7, + ), + ), + module_kwargs=dict(in_channels=32, out_channels=32, stride=1)) + + op = MODELS.build(op_cfg) + input = torch.randn(4, 32, 64, 64) + + op.fix_chosen('torch_conv2d_7x7') + output = op.forward_fixed(input) + + assert output is not None + assert op.is_fixed is True + + def test_forward(self): + op_cfg = dict( + type='DiffMutableOP', + candidates=dict( + torch_conv2d_3x3=dict( + type='torchConv2d', + kernel_size=3, 
+ padding=1, + ), + torch_conv2d_5x5=dict( + type='torchConv2d', + kernel_size=5, + padding=2, + ), + torch_conv2d_7x7=dict( + type='torchConv2d', + kernel_size=7, + padding=3, + ), + ), + module_kwargs=dict(in_channels=32, out_channels=32, stride=1)) + + op = MODELS.build(op_cfg) + input = torch.randn(4, 32, 64, 64) + + # test set_forward_args + arch_param = nn.Parameter(torch.randn(len(op_cfg['candidates']))) + op.set_forward_args(arch_param=arch_param) + output = op.forward(input) + assert output is not None + + # test dump_chosen + with pytest.raises(AssertionError): + op.dump_chosen() + + # test forward when is_fixed is True + op.fix_chosen('torch_conv2d_7x7') + output = op.forward(input) + + def test_property(self): + op_cfg = dict( + type='DiffMutableOP', + candidates=dict( + torch_conv2d_3x3=dict( + type='torchConv2d', + kernel_size=3, + padding=1, + ), + torch_conv2d_5x5=dict( + type='torchConv2d', + kernel_size=5, + padding=2, + ), + torch_conv2d_7x7=dict( + type='torchConv2d', + kernel_size=7, + padding=3, + ), + ), + module_kwargs=dict(in_channels=32, out_channels=32, stride=1)) + + op = MODELS.build(op_cfg) + + assert len(op.choices) == 3 + + # test is_fixed propty + assert op.is_fixed is False + + # test is_fixed setting + op.fix_chosen('torch_conv2d_5x5') + + with pytest.raises(AttributeError): + op.is_fixed = True + + # test fix choice when is_fixed is True + with pytest.raises(AttributeError): + op.fix_chosen('torch_conv2d_3x3') + + def test_module_kwargs(self): + op_cfg = dict( + type='DiffMutableOP', + candidates=dict( + torch_conv2d_3x3=dict( + type='torchConv2d', + kernel_size=3, + in_channels=32, + out_channels=32, + stride=1, + ), + torch_conv2d_5x5=dict( + type='torchConv2d', + kernel_size=5, + in_channels=32, + out_channels=32, + stride=1, + ), + torch_conv2d_7x7=dict( + type='torchConv2d', + kernel_size=7, + in_channels=32, + out_channels=32, + stride=1, + ), + torch_maxpool_3x3=dict( + type='torchMaxPool2d', + kernel_size=3, + stride=1, + 
), + torch_avgpool_3x3=dict( + type='torchAvgPool2d', + kernel_size=3, + stride=1, + ), + ), + ) + op = MODELS.build(op_cfg) + input = torch.randn(4, 32, 64, 64) + + op.fix_chosen('torch_avgpool_3x3') + output = op.forward(input) + assert output is not None diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_gumbelchoiceroute.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_gumbelchoiceroute.py new file mode 100644 index 0000000000000000000000000000000000000000..5d1a0fc3277acf35399db75b2806d4c07b973c20 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_gumbelchoiceroute.py @@ -0,0 +1,90 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from unittest import TestCase + +import pytest +import torch +import torch.nn as nn + +from mmrazor.models import * # noqa:F403,F401 +from mmrazor.registry import MODELS + +MODELS.register_module(name='torchConv2d', module=nn.Conv2d, force=True) + + +class TestGumbelChoiceRoute(TestCase): + + def test_forward_arch_param(self): + edges_dict = nn.ModuleDict({ + 'first_edge': nn.Conv2d(32, 32, 3, 1, 1), + 'second_edge': nn.Conv2d(32, 32, 5, 1, 2), + 'third_edge': nn.Conv2d(32, 32, 7, 1, 3), + 'fourth_edge': nn.MaxPool2d(3, 1, 1), + 'fifth_edge': nn.AvgPool2d(3, 1, 1), + }) + + gumbel_choice_route_cfg = dict( + type='GumbelChoiceRoute', + edges=edges_dict, + tau=1.0, + hard=True, + with_arch_param=True, + ) + + # test with_arch_param = True + GumbelChoiceRoute = MODELS.build(gumbel_choice_route_cfg) + + arch_param = nn.Parameter(torch.randn(len(edges_dict))) + assert len(arch_param) == 5 + GumbelChoiceRoute.set_temperature(1.0) + + x = [torch.randn(4, 32, 64, 64) for _ in range(5)] + + output = GumbelChoiceRoute.forward_arch_param( + x=x, arch_param=arch_param) + assert output is not None + + # test with_arch_param = False + new_gumbel_choice_route_cfg = gumbel_choice_route_cfg.copy() + new_gumbel_choice_route_cfg['with_arch_param'] = False 
+ + new_gumbel_choice_route = MODELS.build(new_gumbel_choice_route_cfg) + + arch_param = nn.Parameter(torch.randn(len(edges_dict))) + output = new_gumbel_choice_route.forward_arch_param( + x=x, arch_param=arch_param) + assert output is not None + + new_gumbel_choice_route.fix_chosen(chosen=['first_edge']) + + def test_forward_fixed(self): + edges_dict = nn.ModuleDict({ + 'first_edge': nn.Conv2d(32, 32, 3, 1, 1), + 'second_edge': nn.Conv2d(32, 32, 5, 1, 2), + 'third_edge': nn.Conv2d(32, 32, 7, 1, 3), + 'fourth_edge': nn.MaxPool2d(3, 1, 1), + 'fifth_edge': nn.AvgPool2d(3, 1, 1), + }) + + gumbel_choice_route_cfg = dict( + type='GumbelChoiceRoute', + edges=edges_dict, + tau=1.0, + hard=True, + with_arch_param=True, + ) + + # test with_arch_param = True + GumbelChoiceRoute = MODELS.build(gumbel_choice_route_cfg) + + GumbelChoiceRoute.fix_chosen( + chosen=['first_edge', 'second_edge', 'fifth_edge']) + assert GumbelChoiceRoute.is_fixed is True + + x = [torch.randn(4, 32, 64, 64) for _ in range(3)] + output = GumbelChoiceRoute.forward_fixed(x) + assert output is not None + assert GumbelChoiceRoute.num_choices == 3 + + # after is_fixed = True, call fix_chosen + with pytest.raises(AttributeError): + GumbelChoiceRoute.fix_chosen(chosen=['first_edge']) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_mutable_channel/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_mutable_channel/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ef101fec61e72abc0eb90266d453b5b22331378d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_mutable_channel/__init__.py @@ -0,0 +1 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_mutable_channel/test_mutable_channels.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_mutable_channel/test_mutable_channels.py new file mode 100644 index 0000000000000000000000000000000000000000..6330005d1d988d4c451a4d2af3d483a31b342dfa --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_mutable_channel/test_mutable_channels.py @@ -0,0 +1,33 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import unittest + +import torch + +from mmrazor.models.mutables import (SimpleMutableChannel, + SquentialMutableChannel) + + +class TestMutableChannels(unittest.TestCase): + + def test_SquentialMutableChannel(self): + mutable_channel = SquentialMutableChannel(4) + mutable_channel.current_choice = 3 + self.assertEqual(mutable_channel.activated_channels, + mutable_channel.current_choice) + self.assertTrue( + (mutable_channel.current_mask == torch.tensor([1, 1, 1, + 0]).bool()).all()) + channel_str = mutable_channel.__repr__() + self.assertEqual( + channel_str, + 'SquentialMutableChannel(num_channels=4, activated_channels=3)') + + mutable_channel.fix_chosen() + mutable_channel.dump_chosen() + + def test_SimpleMutableChannel(self): + channel = SimpleMutableChannel(4) + channel.current_choice = torch.tensor([1, 0, 0, 0]).bool() + self.assertEqual(channel.activated_channels, 1) + channel.fix_chosen() + channel.dump_chosen() diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_mutable_channel/test_sequential_mutable_channel.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_mutable_channel/test_sequential_mutable_channel.py new file mode 100644 index 0000000000000000000000000000000000000000..c807cabe598a98ff27c5238b26b5dfe31da27cff --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_mutable_channel/test_sequential_mutable_channel.py @@ -0,0 +1,57 
@@ +# Copyright (c) OpenMMLab. All rights reserved. +from unittest import TestCase + +import torch + +from mmrazor.models.mutables import (OneShotMutableValue, + SquentialMutableChannel) + + +class TestSquentialMutableChannel(TestCase): + + def _test_mutable(self, + mutable: SquentialMutableChannel, + set_choice, + get_choice, + activate_channels, + mask=None): + mutable.current_choice = set_choice + assert mutable.current_choice == get_choice + assert mutable.activated_channels == activate_channels + if mask is not None: + assert (mutable.current_mask == mask).all() + + def _generate_mask(self, num: int, all: int): + mask = torch.zeros([all]) + mask[0:num] = 1 + return mask.bool() + + def test_mul_float(self): + channel = SquentialMutableChannel(10) + new_channel = channel * 0.5 + self.assertEqual(new_channel.current_choice, 5) + channel.current_choice = 5 + self.assertEqual(new_channel.current_choice, 2) + + def test_int_choice(self): + channel = SquentialMutableChannel(10) + self._test_mutable(channel, 5, 5, 5, self._generate_mask(5, 10)) + self._test_mutable(channel, 0.2, 2, 2, self._generate_mask(2, 10)) + + def test_float_choice(self): + channel = SquentialMutableChannel(10, choice_mode='ratio') + self._test_mutable(channel, 0.5, 0.5, 5, self._generate_mask(5, 10)) + self._test_mutable(channel, 2, 0.2, 2, self._generate_mask(2, 10)) + + def test_mutable_channel_mul(self): + channel = SquentialMutableChannel(2) + self.assertEqual(channel.current_choice, 2) + mv = OneShotMutableValue(value_list=[1, 2, 3], default_value=3) + derived1 = channel * mv + derived2 = mv * channel + assert derived1.current_choice == 6 + assert derived2.current_choice == 6 + mv.current_choice = mv.min_choice + assert derived1.current_choice == 2 + assert derived2.current_choice == 2 + assert torch.equal(derived1.current_mask, derived2.current_mask) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_mutable_channel/test_units/__init__.py 
b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_mutable_channel/test_units/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ef101fec61e72abc0eb90266d453b5b22331378d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_mutable_channel/test_units/__init__.py @@ -0,0 +1 @@ +# Copyright (c) OpenMMLab. All rights reserved. diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_mutable_channel/test_units/test_dcff_channel_unit.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_mutable_channel/test_units/test_dcff_channel_unit.py new file mode 100644 index 0000000000000000000000000000000000000000..344d90acbf4fe5a69d4b1a97c54917e035fd5b8e --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_mutable_channel/test_units/test_dcff_channel_unit.py @@ -0,0 +1,77 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from typing import List +from unittest import TestCase + +import torch + +from mmrazor.models.architectures.dynamic_ops import FuseConv2d +from mmrazor.models.mutables import DCFFChannelUnit +from mmrazor.models.task_modules import ChannelAnalyzer +from .....data.models import SingleLineModel + +DEVICE = torch.device('cpu') + + +class TestDCFFChannelUnit(TestCase): + + def test_num(self): + unit = DCFFChannelUnit(48, choice_mode='number') + unit.current_choice = 24 + self.assertEqual(unit.current_choice, 24) + + unit.current_choice = 0.5 + self.assertEqual(unit.current_choice, 24) + + def test_ratio(self): + unit = DCFFChannelUnit(48, choice_mode='ratio') + unit.current_choice = 0.5 + self.assertEqual(unit.current_choice, 0.5) + unit.current_choice = 24 + self.assertEqual(unit.current_choice, 0.5) + + def test_divisor(self): + unit = DCFFChannelUnit(48, choice_mode='number', divisor=8) + unit.current_choice = 20 + self.assertEqual(unit.current_choice, 24) + 
self.assertTrue(unit.sample_choice() % 8 == 0) + + unit = DCFFChannelUnit(48, choice_mode='ratio', divisor=8) + unit.current_choice = 0.3 + self.assertEqual(unit.current_choice, 1 / 3) + + def test_config_template(self): + unit = DCFFChannelUnit(48, choice_mode='ratio', divisor=8) + config = unit.config_template(with_init_args=True) + unit2 = DCFFChannelUnit.init_from_cfg(None, config) + self.assertDictEqual( + unit2.config_template(with_init_args=True)['init_args'], + config['init_args']) + + def test_init_from_channel_unit(self): + # init using tracer + model = SingleLineModel() + unit_configs = ChannelAnalyzer().analyze(model) + units = [ + DCFFChannelUnit.init_from_cfg(model, unit_config) + for unit_config in unit_configs.values() + ] + + model = model.to(DEVICE) + self._test_units(units, model) + + def _test_units(self, units: List[DCFFChannelUnit], model): + for unit in units: + unit.prepare_for_pruning(model) + mutable_units = [unit for unit in units if unit.is_mutable] + self.assertGreaterEqual(len(mutable_units), 1) + for unit in mutable_units: + choice = unit.sample_choice() + unit.current_choice = choice + for channel in unit.output_related: + if isinstance(channel.module, FuseConv2d): + layeri_softmaxp = channel.module.get_pooled_channel(1.0) + # update fuseconv op's selected layeri_softmax + channel.module.set_forward_args(choice=layeri_softmaxp) + x = torch.rand([2, 3, 224, 224]).to(DEVICE) + y = model(x) + self.assertSequenceEqual(y.shape, [2, 1000]) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_mutable_channel/test_units/test_l1_mutable_channel_unit.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_mutable_channel/test_units/test_l1_mutable_channel_unit.py new file mode 100644 index 0000000000000000000000000000000000000000..94bbfe6b6a543959d76685408c13e0cce9fde0d3 --- /dev/null +++ 
b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_mutable_channel/test_units/test_l1_mutable_channel_unit.py @@ -0,0 +1,28 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from unittest import TestCase + +import torch.nn as nn + +from mmrazor.models.mutables import L1MutableChannelUnit +from mmrazor.models.mutators import ChannelMutator +from .....data.models import SingleLineModel + + +class TestL1MutableChannelUnit(TestCase): + + def test_init(self): + model = SingleLineModel() + mutator = ChannelMutator( + channel_unit_cfg={ + 'type': 'L1MutableChannelUnit', + 'default_args': { + 'choice_mode': 'ratio' + } + }) + mutator.prepare_from_supernet(model) + + def test_convnd(self): + unit = L1MutableChannelUnit(8) + conv = nn.Conv3d(3, 8, 3) + norm = unit._get_l1_norm(conv, 0, 8) + self.assertSequenceEqual(norm.shape, [8]) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_mutable_channel/test_units/test_mutable_channel_units.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_mutable_channel/test_units/test_mutable_channel_units.py new file mode 100644 index 0000000000000000000000000000000000000000..219125ecef23ef5ff8b6eb566c6ba94216ae7372 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_mutable_channel/test_units/test_mutable_channel_units.py @@ -0,0 +1,140 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from typing import List +from unittest import TestCase + +import torch +import torch.nn as nn + +from mmrazor.models.architectures.dynamic_ops.mixins import DynamicChannelMixin +from mmrazor.models.mutables.mutable_channel import ( + L1MutableChannelUnit, MutableChannelUnit, SequentialMutableChannelUnit) +from mmrazor.models.mutables.mutable_channel.units.channel_unit import \ + ChannelUnit +from .....data.models import SingleLineModel +from .....data.tracer_passed_models import backward_passed_library + +MUTABLE_CFG = dict(type='SimpleMutablechannel') +PARSE_CFG = dict( + type='ChannelAnalyzer', + demo_input=(1, 3, 224, 224), + tracer_type='BackwardTracer') + +DEVICE = torch.device('cpu') +UNITS: List[MutableChannelUnit] = [ + L1MutableChannelUnit, SequentialMutableChannelUnit +] + +DefaultChannelUnit = SequentialMutableChannelUnit + + +def _test_units(units: List[MutableChannelUnit], model): + for unit in units: + unit.prepare_for_pruning(model) + for unit in units: + _ = unit.current_choice + + mutable_units = [unit for unit in units if unit.is_mutable] + assert len(mutable_units) >= 1, \ + 'len of mutable units should greater or equal than 0.' 
+ for unit in mutable_units: + choice = unit.sample_choice() + unit.current_choice = choice + assert abs(unit.current_choice - choice) < 0.1 + x = torch.rand([2, 3, 224, 224]).to(DEVICE) + y = model(x) + assert list(y.shape) == [2, 1000] + + +class TestMutableChannelUnit(TestCase): + + def test_init_from_cfg(self): + model = SingleLineModel() + # init using tracer + + config = { + 'init_args': { + 'num_channels': 8 + }, + 'channels': { + 'input_related': [{ + 'name': 'net.1', + 'start': 0, + 'end': 8, + 'expand_ratio': 1, + 'is_output_channel': False + }, { + 'name': 'net.3', + 'start': 0, + 'end': 8, + 'expand_ratio': 1, + 'is_output_channel': False + }], + 'output_related': [{ + 'name': 'net.0', + 'start': 0, + 'end': 8, + 'expand_ratio': 1, + 'is_output_channel': True + }, { + 'name': 'net.1', + 'start': 0, + 'end': 8, + 'expand_ratio': 1, + 'is_output_channel': True + }] + } + } + units = [DefaultChannelUnit.init_from_cfg(model, config)] + _test_units(units, model) + + def test_init(self): + for UnitClass in UNITS: + with self.subTest(unit_class=UnitClass): + + def test_units(units, model): + mutable_units = [ + UnitClass.init_from_channel_unit(unit) + for unit in units + ] + _test_units(mutable_units, model) + + # init using tracer + model = SingleLineModel() + units: List[ + ChannelUnit] = ChannelUnit.init_from_channel_analyzer( + model) + test_units(units, model) + + # init using tracer config + model = SingleLineModel() + units: List[ + ChannelUnit] = ChannelUnit.init_from_channel_analyzer( + model, analyzer=dict(type='ChannelAnalyzer')) + test_units(units, model) + + print(units) + + def test_replace_with_dynamic_ops(self): + model_datas = backward_passed_library.include_models() + for model_data in model_datas: + for unit_type in UNITS: + with self.subTest(model=model_data, unit=unit_type): + model: nn.Module = model_data() + units: List[ + MutableChannelUnit] = unit_type.init_from_channel_analyzer( # noqa + model) + for unit in units: + 
unit.prepare_for_pruning(model) + + for module in model.modules(): + if isinstance(module, nn.Conv2d)\ + and module.groups == module.in_channels\ + and module.groups == 1: + self.assertTrue( + isinstance(module, DynamicChannelMixin)) + if isinstance(module, nn.Linear): + self.assertTrue( + isinstance(module, DynamicChannelMixin)) + if isinstance(module, nn.BatchNorm2d): + self.assertTrue( + isinstance(module, DynamicChannelMixin)) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_mutable_channel/test_units/test_one_shot_mutable_channel_unit.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_mutable_channel/test_units/test_one_shot_mutable_channel_unit.py new file mode 100644 index 0000000000000000000000000000000000000000..80d9800c6059cb35a52ce4c7250555682112fb77 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_mutable_channel/test_units/test_one_shot_mutable_channel_unit.py @@ -0,0 +1,33 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from unittest import TestCase + +from mmrazor.models.mutables import OneShotMutableChannelUnit +from mmrazor.models.mutators.channel_mutator import ChannelMutator +from tests.data.models import DynamicAttention + + +class TestSequentialMutableChannelUnit(TestCase): + + def test_init(self): + unit = OneShotMutableChannelUnit( + 48, [20, 30, 40], choice_mode='number', divisor=8) + self.assertSequenceEqual(unit.candidate_choices, [24, 32, 40]) + + unit = OneShotMutableChannelUnit( + 48, [0.3, 0.5, 0.7], choice_mode='ratio', divisor=8) + self.assertSequenceEqual(unit.candidate_choices, [1 / 3, 0.5, 2 / 3]) + + def test_unit_predefined(self): + model = DynamicAttention() + mutator = ChannelMutator( + channel_unit_cfg={ + 'type': 'OneShotMutableChannelUnit', + 'default_args': { + 'unit_predefined': False + } + }, + parse_cfg={'type': 'Predefined'}) + mutator.prepare_from_supernet(model) + self.assertSequenceEqual(mutator.units[0].candidate_choices, + [576, 624]) + self.assertSequenceEqual(mutator.units[1].candidate_choices, [64]) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_mutable_channel/test_units/test_sequential_mutable_channel_unit.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_mutable_channel/test_units/test_sequential_mutable_channel_unit.py new file mode 100644 index 0000000000000000000000000000000000000000..8981a8a2170ce2b43df7bf8d9bf31109c6b01a65 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_mutable_channel/test_units/test_sequential_mutable_channel_unit.py @@ -0,0 +1,41 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from unittest import TestCase + +from mmrazor.models.mutables import SequentialMutableChannelUnit + + +class TestSequentialMutableChannelUnit(TestCase): + + def test_num(self): + unit = SequentialMutableChannelUnit(48) + unit.current_choice = 24 + self.assertEqual(unit.current_choice, 24) + + unit.current_choice = 0.5 + self.assertEqual(unit.current_choice, 24) + + def test_ratio(self): + unit = SequentialMutableChannelUnit(48, choice_mode='ratio') + unit.current_choice = 0.5 + self.assertEqual(unit.current_choice, 0.5) + unit.current_choice = 24 + self.assertEqual(unit.current_choice, 0.5) + + def test_divisor(self): + unit = SequentialMutableChannelUnit( + 48, choice_mode='number', divisor=8) + unit.current_choice = 20 + self.assertEqual(unit.current_choice, 24) + self.assertTrue(unit.sample_choice() % 8 == 0) + + unit = SequentialMutableChannelUnit(48, choice_mode='ratio', divisor=8) + unit.current_choice = 0.3 + self.assertEqual(unit.current_choice, 1 / 3) + + def test_config_template(self): + unit = SequentialMutableChannelUnit(48, choice_mode='ratio', divisor=8) + config = unit.config_template(with_init_args=True) + unit2 = SequentialMutableChannelUnit.init_from_cfg(None, config) + self.assertDictEqual( + unit2.config_template(with_init_args=True)['init_args'], + config['init_args']) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_mutable_value.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_mutable_value.py new file mode 100644 index 0000000000000000000000000000000000000000..11ac7d49c46f1047b1e0ec0d36e177d039f12051 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_mutable_value.py @@ -0,0 +1,119 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from unittest import TestCase + +import pytest +import torch + +from mmrazor.models.mutables import (MutableValue, OneShotMutableValue, + SquentialMutableChannel) + + +class TestMutableValue(TestCase): + + def test_init_mutable_value(self) -> None: + value_list = [2, 4, 6] + mv = MutableValue(value_list=value_list) + assert mv.current_choice == 2 + assert mv.num_choices == 3 + + mv = MutableValue(value_list=value_list, default_value=4) + assert mv.current_choice == 4 + + with pytest.raises(ValueError): + mv = MutableValue(value_list=value_list, default_value=5) + + mv = MutableValue(value_list=[2]) + assert mv.current_choice == 2 + assert mv.choices == [2] + + with pytest.raises(TypeError): + mv = MutableValue(value_list=[2, 3.2]) + + def test_init_one_shot_mutable_value(self) -> None: + value_list = [6, 4, 2] + mv = OneShotMutableValue(value_list=value_list) + assert mv.current_choice == 6 + assert mv.choices == [2, 4, 6] + + mv = OneShotMutableValue(value_list=value_list, default_value=4) + assert mv.current_choice == 4 + + def test_fix_chosen(self) -> None: + mv = MutableValue([2, 3, 4]) + chosen = mv.dump_chosen() + assert chosen.chosen == mv.current_choice + assert chosen.meta['all_choices'] == mv.choices + + with pytest.raises(AssertionError): + mv.fix_chosen(5) + + mv.fix_chosen(3) + assert mv.current_choice == 3 + + with pytest.raises(RuntimeError): + mv.fix_chosen(chosen) + + def test_one_shot_mutable_value_sample(self) -> None: + mv = OneShotMutableValue(value_list=[2, 3, 4]) + assert mv.max_choice == 4 + assert mv.min_choice == 2 + + for _ in range(100): + assert mv.sample_choice() in mv.choices + + def test_mul(self) -> None: + mv = MutableValue(value_list=[1, 2, 3], default_value=3) + mul_derived_mv = mv * 2 + rmul_derived_mv = 2 * mv + + assert mul_derived_mv.current_choice == 6 + assert rmul_derived_mv.current_choice == 6 + + mv.current_choice = 2 + assert mul_derived_mv.current_choice == 4 + assert rmul_derived_mv.current_choice == 4 + + mv = 
MutableValue(value_list=[1, 2, 3], default_value=3) + mc = SquentialMutableChannel(num_channels=4) + + with pytest.raises(TypeError): + _ = mc * mv + with pytest.raises(TypeError): + _ = mv * mc + + mv = OneShotMutableValue(value_list=[1, 2, 3], default_value=3) + mc.current_choice = 2 + + derived1 = mc * mv + derived2 = mv * mc + + assert derived1.current_choice == 6 + assert derived2.current_choice == 6 + assert torch.equal(derived1.current_mask, derived2.current_mask) + + mv.current_choice = 2 + assert derived1.current_choice == 4 + assert derived2.current_choice == 4 + assert torch.equal(derived1.current_mask, derived2.current_mask) + + def test_floordiv(self) -> None: + mv = MutableValue(value_list=[120, 128, 136]) + derived_mv = mv // 8 + + mv.current_choice = 120 + assert derived_mv.current_choice == 16 + mv.current_choice = 128 + assert derived_mv.current_choice == 16 + + derived_mv = mv // (8, 3) + mv.current_choice = 120 + assert derived_mv.current_choice == 15 + mv.current_choice = 136 + assert derived_mv.current_choice == 18 + + def test_repr(self) -> None: + value_list = [2, 4, 6] + mv = MutableValue(value_list=value_list) + + assert repr(mv) == \ + f'MutableValue(value_list={value_list}, current_choice=2)' diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_onehotop.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_onehotop.py new file mode 100644 index 0000000000000000000000000000000000000000..a3b86d745ec6bd8d06a393af644f019f70cec72c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_onehotop.py @@ -0,0 +1,200 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from unittest import TestCase + +import pytest +import torch +import torch.nn as nn + +from mmrazor.models import * # noqa:F403,F401 +from mmrazor.registry import MODELS + +MODELS.register_module(name='torchConv2d', module=nn.Conv2d, force=True) +MODELS.register_module(name='torchMaxPool2d', module=nn.MaxPool2d, force=True) +MODELS.register_module(name='torchAvgPool2d', module=nn.AvgPool2d, force=True) + + +class TestOneHotOP(TestCase): + + def test_forward_arch_param(self): + op_cfg = dict( + type='mmrazor.OneHotMutableOP', + candidates=dict( + torch_conv2d_3x3=dict( + type='torchConv2d', + kernel_size=3, + padding=1, + ), + torch_conv2d_5x5=dict( + type='torchConv2d', + kernel_size=5, + padding=2, + ), + torch_conv2d_7x7=dict( + type='torchConv2d', + kernel_size=7, + padding=3, + ), + ), + module_kwargs=dict(in_channels=32, out_channels=32, stride=1)) + + op = MODELS.build(op_cfg) + input = torch.randn(4, 32, 64, 64) + + arch_param = nn.Parameter(torch.randn(len(op_cfg['candidates']))) + output = op.forward_arch_param(input, arch_param=arch_param) + assert output is not None + + # test when some element of arch_param is 0 + arch_param = nn.Parameter(torch.ones(op.num_choices)) + output = op.forward_arch_param(input, arch_param=arch_param) + assert output is not None + + def test_forward_fixed(self): + op_cfg = dict( + type='mmrazor.OneHotMutableOP', + candidates=dict( + torch_conv2d_3x3=dict( + type='torchConv2d', + kernel_size=3, + ), + torch_conv2d_5x5=dict( + type='torchConv2d', + kernel_size=5, + ), + torch_conv2d_7x7=dict( + type='torchConv2d', + kernel_size=7, + ), + ), + module_kwargs=dict(in_channels=32, out_channels=32, stride=1)) + + op = MODELS.build(op_cfg) + input = torch.randn(4, 32, 64, 64) + + op.fix_chosen('torch_conv2d_7x7') + output = op.forward_fixed(input) + + assert output is not None + assert op.is_fixed is True + + def test_forward(self): + op_cfg = dict( + type='mmrazor.OneHotMutableOP', + candidates=dict( + torch_conv2d_3x3=dict( + 
type='torchConv2d', + kernel_size=3, + padding=1, + ), + torch_conv2d_5x5=dict( + type='torchConv2d', + kernel_size=5, + padding=2, + ), + torch_conv2d_7x7=dict( + type='torchConv2d', + kernel_size=7, + padding=3, + ), + ), + module_kwargs=dict(in_channels=32, out_channels=32, stride=1)) + + op = MODELS.build(op_cfg) + input = torch.randn(4, 32, 64, 64) + + # test set_forward_args + arch_param = nn.Parameter(torch.randn(len(op_cfg['candidates']))) + op.set_forward_args(arch_param=arch_param) + output = op.forward(input) + assert output is not None + + # test dump_chosen + with pytest.raises(AssertionError): + op.dump_chosen() + + # test forward when is_fixed is True + op.fix_chosen('torch_conv2d_7x7') + output = op.forward(input) + + def test_property(self): + op_cfg = dict( + type='mmrazor.OneHotMutableOP', + candidates=dict( + torch_conv2d_3x3=dict( + type='torchConv2d', + kernel_size=3, + padding=1, + ), + torch_conv2d_5x5=dict( + type='torchConv2d', + kernel_size=5, + padding=2, + ), + torch_conv2d_7x7=dict( + type='torchConv2d', + kernel_size=7, + padding=3, + ), + ), + module_kwargs=dict(in_channels=32, out_channels=32, stride=1)) + + op = MODELS.build(op_cfg) + + assert len(op.choices) == 3 + + # test is_fixed propty + assert op.is_fixed is False + + # test is_fixed setting + op.fix_chosen('torch_conv2d_5x5') + + with pytest.raises(AttributeError): + op.is_fixed = True + + # test fix choice when is_fixed is True + with pytest.raises(AttributeError): + op.fix_chosen('torch_conv2d_3x3') + + def test_module_kwargs(self): + op_cfg = dict( + type='mmrazor.OneHotMutableOP', + candidates=dict( + torch_conv2d_3x3=dict( + type='torchConv2d', + kernel_size=3, + in_channels=32, + out_channels=32, + stride=1, + ), + torch_conv2d_5x5=dict( + type='torchConv2d', + kernel_size=5, + in_channels=32, + out_channels=32, + stride=1, + ), + torch_conv2d_7x7=dict( + type='torchConv2d', + kernel_size=7, + in_channels=32, + out_channels=32, + stride=1, + ), + 
torch_maxpool_3x3=dict( + type='torchMaxPool2d', + kernel_size=3, + stride=1, + ), + torch_avgpool_3x3=dict( + type='torchAvgPool2d', + kernel_size=3, + stride=1, + ), + ), + ) + op = MODELS.build(op_cfg) + input = torch.randn(4, 32, 64, 64) + + op.fix_chosen('torch_avgpool_3x3') + output = op.forward(input) + assert output is not None diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_oneshotop.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_oneshotop.py new file mode 100644 index 0000000000000000000000000000000000000000..3704e67a8fd8b0027aac0c149ee30c421bed47c2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_oneshotop.py @@ -0,0 +1,241 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from unittest import TestCase + +import pytest +import torch +import torch.nn as nn + +import mmrazor.models # noqa:F401 +from mmrazor.registry import MODELS + + +class TestMutables(TestCase): + + def test_oneshotmutableop(self): + norm_cfg = dict(type='BN', requires_grad=True) + op_cfg = dict( + type='OneShotMutableOP', + candidates=dict( + shuffle_3x3=dict( + type='ShuffleBlock', norm_cfg=norm_cfg, kernel_size=3), + shuffle_5x5=dict( + type='ShuffleBlock', norm_cfg=norm_cfg, kernel_size=5), + shuffle_7x7=dict( + type='ShuffleBlock', norm_cfg=norm_cfg, kernel_size=7), + shuffle_xception=dict( + type='ShuffleXception', + norm_cfg=norm_cfg, + ), + ), + module_kwargs=dict(in_channels=32, out_channels=32, stride=1)) + + op = MODELS.build(op_cfg) + input = torch.randn(4, 32, 64, 64) + + # test forward all + output = op.forward_all(input) + assert output is not None + + # test random choice + assert op.sample_choice() in [ + 'shuffle_3x3', 'shuffle_5x5', 'shuffle_7x7', 'shuffle_xception' + ] + + # test unfixed mode + op.current_choice = 'shuffle_3x3' + output1 = op.forward(input) + + op.current_choice = 'shuffle_7x7' + output2 = op.forward(input) + + assert not output1.equal(output2) + 
+ assert op.is_fixed is False + assert len(op.choices) == 4 + assert op.num_choices == 4 + + # compare set_forward_args with forward with choice + op.current_choice = 'shuffle_5x5' + output1 = op.forward(input) + output2 = op.forward_choice(input, choice='shuffle_5x5') + assert output1.equal(output2) + + # test fixed mode + op.fix_chosen('shuffle_3x3') + assert op.is_fixed is True + assert len(op.choices) == 1 + assert op.num_choices == 1 + + output = op.forward(input) + assert output.shape[1] == 32 + + with pytest.raises(AttributeError): + op.is_fixed = True + + with pytest.raises(AttributeError): + op.fix_chosen('shuffle_3x3') + + def test_oneshotprobop(self): + norm_cfg = dict(type='BN', requires_grad=True) + op_cfg = dict( + type='OneShotProbMutableOP', + choice_probs=[0.1, 0.2, 0.3, 0.4], + candidates=dict( + shuffle_3x3=dict( + type='ShuffleBlock', norm_cfg=norm_cfg, kernel_size=3), + shuffle_5x5=dict( + type='ShuffleBlock', norm_cfg=norm_cfg, kernel_size=5), + shuffle_7x7=dict( + type='ShuffleBlock', norm_cfg=norm_cfg, kernel_size=7), + shuffle_xception=dict( + type='ShuffleXception', + norm_cfg=norm_cfg, + ), + ), + module_kwargs=dict(in_channels=32, out_channels=32, stride=1)) + + op = MODELS.build(op_cfg) + + input = torch.randn(4, 32, 64, 64) + + # test forward choice with None + with pytest.raises(AssertionError): + output = op.forward_choice(input, choice=None) + + # test forward all + output = op.forward_all(input) + assert output.shape[1] == 32 + + # test random choice + assert op.sample_choice() in [ + 'shuffle_3x3', 'shuffle_5x5', 'shuffle_7x7', 'shuffle_xception' + ] + assert 1 - sum(op.choice_probs) < 0.00001 + + # test unfixed mode + op.current_choice = 'shuffle_3x3' + output = op.forward(input) + + assert output.shape[1] == 32 + + op.current_choice = 'shuffle_7x7' + output = op.forward(input) + assert output.shape[1] == 32 + + assert op.is_fixed is False + assert len(op.choices) == 4 + assert op.num_choices == 4 + + # test fixed mode + 
op.fix_chosen('shuffle_3x3') + assert op.is_fixed is True + assert len(op.choices) == 1 + assert op.num_choices == 1 + + output = op.forward(input) + assert output.shape[1] == 32 + + with pytest.raises(AttributeError): + op.is_fixed = True + + def test_forward_choice(self): + norm_cfg = dict(type='BN', requires_grad=True) + op_cfg = dict( + type='OneShotMutableOP', + candidates=dict( + shuffle_3x3=dict( + type='ShuffleBlock', norm_cfg=norm_cfg, kernel_size=3), + shuffle_5x5=dict( + type='ShuffleBlock', norm_cfg=norm_cfg, kernel_size=5), + shuffle_7x7=dict( + type='ShuffleBlock', norm_cfg=norm_cfg, kernel_size=7), + shuffle_xception=dict( + type='ShuffleXception', + norm_cfg=norm_cfg, + ), + ), + module_kwargs=dict(in_channels=32, out_channels=32, stride=1)) + + op = MODELS.build(op_cfg) + input = torch.randn(4, 32, 64, 64) + + assert op.forward_choice(input, choice='shuffle_3x3') is not None + + def test_fix_chosen(self): + norm_cfg = dict(type='BN', requires_grad=True) + op_cfg = dict( + type='OneShotMutableOP', + candidates=dict( + shuffle_3x3=dict( + type='ShuffleBlock', norm_cfg=norm_cfg, kernel_size=3), + shuffle_5x5=dict( + type='ShuffleBlock', norm_cfg=norm_cfg, kernel_size=5), + shuffle_7x7=dict( + type='ShuffleBlock', norm_cfg=norm_cfg, kernel_size=7), + shuffle_xception=dict( + type='ShuffleXception', + norm_cfg=norm_cfg, + ), + ), + module_kwargs=dict(in_channels=32, out_channels=32, stride=1)) + + op = MODELS.build(op_cfg) + + with pytest.raises(AttributeError): + op.fix_chosen('shuffle_xception') + op.fix_chosen('ShuffleBlock') + + def test_build_ops(self): + norm_cfg = dict(type='BN', requires_grad=True) + op_cfg = dict( + type='OneShotMutableOP', + candidates=dict( + shuffle_3x3=dict( + type='ShuffleBlock', + norm_cfg=norm_cfg, + kernel_size=3, + in_channels=32, + out_channels=32), + shuffle_5x5=dict( + type='ShuffleBlock', + norm_cfg=norm_cfg, + kernel_size=5, + in_channels=32, + out_channels=32), + shuffle_7x7=dict( + type='ShuffleBlock', + 
norm_cfg=norm_cfg, + kernel_size=7, + in_channels=32, + out_channels=32), + shuffle_xception=dict( + type='ShuffleXception', + norm_cfg=norm_cfg, + in_channels=32, + out_channels=32), + ), + ) + op = MODELS.build(op_cfg) + input = torch.randn(4, 32, 64, 64) + + output = op.forward_all(input) + assert output is not None + + def test_candidates(self): + + candidates = nn.ModuleDict({ + 'conv3x3': nn.Conv2d(32, 32, 3, 1, 1), + 'conv5x5': nn.Conv2d(32, 32, 5, 1, 2), + 'conv7x7': nn.Conv2d(32, 32, 7, 1, 3), + 'maxpool3x3': nn.MaxPool2d(3, 1, 1), + 'avgpool3x3': nn.AvgPool2d(3, 1, 1), + }) + + op_cfg = dict(type='OneShotMutableOP', candidates=candidates) + + op = MODELS.build(op_cfg) + + input = torch.randn(4, 32, 64, 64) + + output = op.forward_all(input) + assert output is not None diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_sequential_mutable_channel.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_sequential_mutable_channel.py new file mode 100644 index 0000000000000000000000000000000000000000..f7f4bb91e2ffb3329c1f0c950b4a2ff94d5864c3 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutables/test_sequential_mutable_channel.py @@ -0,0 +1,14 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from unittest import TestCase + +from mmrazor.models.mutables import SquentialMutableChannel + + +class TestSquentialMutableChannel(TestCase): + + def test_mul_float(self): + channel = SquentialMutableChannel(10) + new_channel = channel * 0.5 + self.assertEqual(new_channel.current_choice, 5) + channel.current_choice = 5 + self.assertEqual(new_channel.current_choice, 2) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutators/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutators/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ef101fec61e72abc0eb90266d453b5b22331378d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutators/__init__.py @@ -0,0 +1 @@ +# Copyright (c) OpenMMLab. All rights reserved. diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutators/test_channel_mutator.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutators/test_channel_mutator.py new file mode 100644 index 0000000000000000000000000000000000000000..1d9d290a25f4d4b9aaec404cfb27b0c3f4b3dab6 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutators/test_channel_mutator.py @@ -0,0 +1,180 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import copy +import unittest +from typing import Union + +import torch + +# from mmrazor.models.mutables import MutableChannelUnit +from mmrazor.models.mutables.mutable_channel import ( + L1MutableChannelUnit, SequentialMutableChannelUnit) +from mmrazor.models.mutators.channel_mutator import ChannelMutator +from mmrazor.models.task_modules import ChannelAnalyzer +from mmrazor.registry import MODELS +from ...data.models import DynamicAttention, DynamicLinearModel, DynamicMMBlock +from ...data.tracer_passed_models import backward_passed_library + + +@MODELS.register_module() +class RandomChannelUnit(SequentialMutableChannelUnit): + + def generate_mask(self, choice: Union[int, float]) -> torch.Tensor: + if isinstance(choice, float): + choice = max(1, int(self.num_channels * choice)) + assert 0 < choice <= self.num_channels + rand_imp = torch.rand([self.num_channels]) + ind = rand_imp.topk(choice)[1] + mask = torch.zeros([self.num_channels]) + mask.scatter_(-1, ind, 1) + return mask + + +DATA_UNITS = [ + SequentialMutableChannelUnit, RandomChannelUnit, L1MutableChannelUnit +] + + +class TestChannelMutator(unittest.TestCase): + + def _test_a_mutator(self, mutator: ChannelMutator, model): + self.assertGreater(len(mutator.mutable_units), 0) + x = torch.rand([2, 3, 224, 224]) + y = model(x) + self.assertEqual(list(y.shape), [2, 1000]) + + def test_init(self): + model = backward_passed_library.include_models()[0]() + mutator = ChannelMutator(parse_cfg=ChannelAnalyzer()) + mutator.prepare_from_supernet(model) + self.assertGreaterEqual(len(mutator.mutable_units), 1) + self._test_a_mutator(mutator, model) + + def test_sample_subnet(self): + data_models = backward_passed_library.include_models()[:2] + + for i, data in enumerate(data_models): + with self.subTest(i=i, data=data): + model = data() + + mutator = ChannelMutator() + mutator.prepare_from_supernet(model) + + self.assertGreaterEqual(len(mutator.mutable_units), 1) + + self._test_a_mutator(mutator, model) + + def 
test_generic_support(self): + data_models = backward_passed_library.include_models() + + for data_model in data_models[:1]: + for unit_type in DATA_UNITS: + with self.subTest(model=data_model, unit=unit_type): + + model = data_model() + + mutator = ChannelMutator(channel_unit_cfg=unit_type) + mutator.prepare_from_supernet(model) + mutator.units + + self._test_a_mutator(mutator, model) + + def test_init_units_from_cfg(self): + ARCHITECTURE_CFG = dict( + type='mmcls.ImageClassifier', + backbone=dict(type='mmcls.MobileNetV2', widen_factor=1.5), + neck=dict(type='mmcls.GlobalAveragePooling'), + head=dict( + type='mmcls.LinearClsHead', + num_classes=1000, + in_channels=1920, + loss=dict(type='mmcls.CrossEntropyLoss', loss_weight=1.0), + topk=(1, 5))) + model = MODELS.build(ARCHITECTURE_CFG) + + # generate config + model1 = copy.deepcopy(model) + mutator = ChannelMutator() + mutator.prepare_from_supernet(model1) + config = mutator.config_template( + with_channels=True, with_unit_init_args=True) + + # test passing config + model2 = copy.deepcopy(model) + config2 = copy.deepcopy(config) + config2['parse_cfg'] = {'type': 'Config'} + mutator2 = MODELS.build(config2) + mutator2.prepare_from_supernet(model2) + self.assertEqual( + len(mutator.mutable_units), len(mutator2.mutable_units)) + self._test_a_mutator(mutator2, model2) + + def test_mix_config_tracer(self): + model = backward_passed_library.include_models()[0]() + + model0 = copy.deepcopy(model) + mutator0 = ChannelMutator() + mutator0.prepare_from_supernet(model0) + config = mutator0.config_template(with_unit_init_args=True) + + model1 = copy.deepcopy(model) + mutator1 = MODELS.build(config) + mutator1.prepare_from_supernet(model1) + config1 = mutator1.config_template(with_unit_init_args=True) + + self.assertDictEqual(config1, config) + self._test_a_mutator(mutator1, model1) + + def test_models_with_predefined_dynamic_op(self): + for Model in [ + DynamicLinearModel, + ]: + with self.subTest(model=Model): + model = 
Model() + mutator = ChannelMutator( + channel_unit_cfg={ + 'type': 'OneShotMutableChannelUnit', + 'default_args': {} + }, + parse_cfg={'type': 'Predefined'}) + mutator.prepare_from_supernet(model) + self._test_a_mutator(mutator, model) + + def test_models_with_predefined_dynamic_op_without_pruning(self): + for Model in [ + DynamicAttention, + ]: + with self.subTest(model=Model): + model = Model() + mutator = ChannelMutator( + channel_unit_cfg={ + 'type': 'OneShotMutableChannelUnit', + 'default_args': { + 'unit_predefined': True + } + }, + parse_cfg={'type': 'Predefined'}) + mutator.prepare_from_supernet(model) + self.assertGreater(len(mutator.mutable_units), 0) + x = torch.rand([2, 3, 224, 224]) + y = model(x) + self.assertEqual(list(y.shape), [2, 624]) + + def test_related_shortcut_layer(self): + for Model in [ + DynamicMMBlock, + ]: + with self.subTest(model=Model): + model = Model() + mutator = ChannelMutator( + channel_unit_cfg={ + 'type': 'OneShotMutableChannelUnit', + 'default_args': { + 'unit_predefined': True + } + }, + parse_cfg={'type': 'Predefined'}) + mutator.prepare_from_supernet(model) + self.assertGreater(len(mutator.mutable_units), 0) + x = torch.rand([2, 3, 224, 224]) + y = model(x) + self.assertEqual(list(y[-1].shape), [2, 1984, 1, 1]) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutators/test_dcff_mutator.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutators/test_dcff_mutator.py new file mode 100644 index 0000000000000000000000000000000000000000..fc02502485f0c94999dca91104e25f65b2be1e50 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutators/test_dcff_mutator.py @@ -0,0 +1,101 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import torch +from mmcls.models import * # noqa: F401,F403 +from torch import Tensor, nn +from torch.nn import Module + +from mmrazor.models.mutators import DCFFChannelMutator + + +class MultiConcatModel(Module): + + def __init__(self) -> None: + super().__init__() + + self.op1 = nn.Conv2d(3, 8, 1) + self.op2 = nn.Conv2d(3, 8, 1) + self.op3 = nn.Conv2d(16, 8, 1) + self.op4 = nn.Conv2d(3, 8, 1) + + def forward(self, x: Tensor) -> Tensor: + x1 = self.op1(x) + x2 = self.op2(x) + cat1 = torch.cat([x1, x2], dim=1) + x3 = self.op3(cat1) + x4 = self.op4(x) + output = torch.cat([x3, x4], dim=1) + + return output + + +class MultiConcatModel2(Module): + + def __init__(self) -> None: + super().__init__() + + self.op1 = nn.Conv2d(3, 8, 1) + self.op2 = nn.Conv2d(3, 8, 1) + self.op3 = nn.Conv2d(3, 8, 1) + self.op4 = nn.Conv2d(24, 8, 1) + + def forward(self, x: Tensor) -> Tensor: + x1 = self.op1(x) + x2 = self.op2(x) + x3 = self.op3(x) + cat1 = torch.cat([x1, x2], dim=1) + cat2 = torch.cat([cat1, x3], dim=1) + output = self.op4(cat2) + + return output + + +class ConcatModel(Module): + + def __init__(self) -> None: + super().__init__() + + self.op1 = nn.Conv2d(3, 8, 1) + self.bn1 = nn.BatchNorm2d(8) + self.op2 = nn.Conv2d(3, 8, 1) + self.bn2 = nn.BatchNorm2d(8) + self.op3 = nn.Conv2d(16, 8, 1) + + def forward(self, x: Tensor) -> Tensor: + x1 = self.bn1(self.op1(x)) + x2 = self.bn2(self.op2(x)) + cat1 = torch.cat([x1, x2], dim=1) + x3 = self.op3(cat1) + + return x3 + + +class ResBlock(Module): + + def __init__(self) -> None: + super().__init__() + + self.op1 = nn.Conv2d(3, 8, 1) + self.bn1 = nn.BatchNorm2d(8) + self.op2 = nn.Conv2d(8, 8, 1) + self.bn2 = nn.BatchNorm2d(8) + self.op3 = nn.Conv2d(8, 8, 1) + + def forward(self, x: Tensor) -> Tensor: + x1 = self.bn1(self.op1(x)) + x2 = self.bn2(self.op2(x1)) + x3 = self.op3(x2 + x1) + return x3 + + +def test_DCFF_channel_mutator() -> None: + imgs = torch.randn(16, 3, 224, 224) + + # ResBlock + mutator = 
DCFFChannelMutator(channel_unit_cfg=dict(type='DCFFChannelUnit')) + + model = ResBlock() + mutator.prepare_from_supernet(model) + mutator.calc_information(1.0) + out3 = model(imgs) + + assert out3.shape == (16, 8, 224, 224) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutators/test_dmcp_mutator.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutators/test_dmcp_mutator.py new file mode 100644 index 0000000000000000000000000000000000000000..eea8e2a0848f4ea031f9f9297e4cfef770db0318 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutators/test_dmcp_mutator.py @@ -0,0 +1,40 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import torch +from mmcls.models import * # noqa: F401,F403 +from torch import Tensor, nn +from torch.nn import Module + +from mmrazor.models.mutators import DMCPChannelMutator + + +class ResBlock(Module): + + def __init__(self) -> None: + super().__init__() + + self.op1 = nn.Conv2d(3, 8, 1) + self.bn1 = nn.BatchNorm2d(8) + self.op2 = nn.Conv2d(8, 8, 1) + self.bn2 = nn.BatchNorm2d(8) + self.op3 = nn.Conv2d(8, 8, 1) + + def forward(self, x: Tensor) -> Tensor: + x1 = self.bn1(self.op1(x)) + x2 = self.bn2(self.op2(x1)) + x3 = self.op3(x2 + x1) + return x3 + + +def test_DMCP_channel_mutator() -> None: + imgs = torch.randn(16, 3, 224, 224) + + # ResBlock + mutator = DMCPChannelMutator(channel_unit_cfg=dict(type='DMCPChannelUnit')) + + model = ResBlock() + mutator.prepare_from_supernet(model) + for mode in ['max', 'min', 'random', 'expected', 'direct']: + mutator.sample_subnet(mode, arch_train=True) + out3 = model(imgs) + + assert out3.shape == (16, 8, 224, 224) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutators/test_nas_mutator.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutators/test_nas_mutator.py new file mode 100644 index 0000000000000000000000000000000000000000..dce6b6c38e838f142f71d68fd9e1ffa21ac50de8 --- /dev/null +++ 
b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_mutators/test_nas_mutator.py @@ -0,0 +1,196 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import unittest + +import pytest +import torch +import torch.nn as nn +from mmcv.cnn import ConvModule + +from mmrazor.models.architectures.utils import mutate_conv_module +from mmrazor.models.mutables import (MutableChannelContainer, MutableValue, + OneShotMutableChannel, + OneShotMutableChannelUnit, + OneShotMutableValue) +from mmrazor.models.mutables.mutable_module import MutableModule +from mmrazor.models.mutators import NasMutator +from mmrazor.registry import MODELS + +MODELS.register_module(name='torchConv2d', module=nn.Conv2d, force=True) +MODELS.register_module(name='torchMaxPool2d', module=nn.MaxPool2d, force=True) +MODELS.register_module(name='torchAvgPool2d', module=nn.AvgPool2d, force=True) + + +class SearchableLayer(nn.Module): + + def __init__(self, mutable_cfg: dict) -> None: + super().__init__() + self.op1 = MODELS.build(mutable_cfg) + self.op2 = MODELS.build(mutable_cfg) + self.op3 = MODELS.build(mutable_cfg) + + def forward(self, x): + x = self.op1(x) + x = self.op2(x) + return self.op3(x) + + +class SearchableModel(nn.Module): + """A searchable model with a mixed search space as follows: + + 1. value search. + 2. module search. + 3. channel search. 
+ """ + + def __init__(self, mutable_cfg: dict) -> None: + super().__init__() + + self.first_conv = ConvModule( + in_channels=3, + out_channels=32, + kernel_size=3, + stride=2, + padding=1, + conv_cfg=dict(type='mmrazor.BigNasConv2d'), + norm_cfg=dict(type='mmrazor.DynamicBatchNorm2d')) + + self.second_conv = ConvModule( + in_channels=32, + out_channels=32, + kernel_size=1, + stride=1, + padding=1, + conv_cfg=dict(type='mmrazor.BigNasConv2d')) + + self.slayer1 = SearchableLayer(mutable_cfg) + self.slayer2 = SearchableLayer(mutable_cfg) + self.slayer3 = SearchableLayer(mutable_cfg) + + self.register_mutables() + + def forward(self, x): + x = self.first_conv(x) + x = self.second_conv(x) + x = self.slayer1(x) + x = self.slayer2(x) + return self.slayer3(x) + + def register_mutables(self): + """Mutate the defined model.""" + OneShotMutableChannelUnit._register_channel_container( + self, MutableChannelContainer) + + mutable_kernel_size = OneShotMutableValue( + value_list=[1, 3], default_value=3) + mutable_out_channels = OneShotMutableChannel( + 32, candidate_choices=[16, 32]) + mutate_conv_module( + self.first_conv, + mutable_kernel_size=mutable_kernel_size, + mutable_out_channels=mutable_out_channels) + + # dont forget the last connection. 
+ MutableChannelContainer.register_mutable_channel_to_module( + self.second_conv.conv, mutable_out_channels, False) + + +class TestNasMutator(unittest.TestCase): + + def setUp(self): + self.MUTABLE_CFG = dict( + type='DiffMutableOP', + candidates=dict( + torch_conv2d_3x3=dict( + type='torchConv2d', + kernel_size=3, + padding=1, + ), + torch_conv2d_5x5=dict( + type='torchConv2d', + kernel_size=5, + padding=2, + ), + torch_conv2d_7x7=dict( + type='torchConv2d', + kernel_size=7, + padding=3, + ), + ), + module_kwargs=dict(in_channels=32, out_channels=32, stride=1)) + + self.MUTATOR_CFG = dict(type='NasMutator') + + def test_models_with_predefined_dynamic_op(self): + for Model in [SearchableModel]: + with self.subTest(model=Model): + model = SearchableModel(self.MUTABLE_CFG) + mutator = MODELS.build(self.MUTATOR_CFG) + assert isinstance(mutator, NasMutator) + + with pytest.raises(RuntimeError): + _ = mutator.search_groups + mutator.prepare_from_supernet(model) + assert hasattr(mutator, 'search_groups') + + with pytest.raises(AttributeError): + _ = mutator.arch_params + mutator.prepare_arch_params() + assert hasattr(mutator, 'arch_params') + + for name in mutator.search_groups.keys(): + assert 'value' or 'channel' or 'module' in name + + self.assertEqual(len(mutator.arch_params.keys()), 9) + for v in mutator.arch_params.values(): + self.assertEqual(v.size()[0], 3) + + mutable_values = [] + mutable_modules = [] + for name, module in model.named_modules(): + if isinstance(module, MutableValue): + mutable_values.append(name) + elif isinstance(module, MutableModule): + mutable_modules.append(name) + elif hasattr(module, 'source_mutables'): + for each_mutables in module.source_mutables: + if isinstance(each_mutables, MutableValue): + mutable_values.append(each_mutables) + elif isinstance(each_mutables, MutableModule): + mutable_modules.append(each_mutables) + + num_mutables = len(mutable_values) + \ + len(mutable_modules) + len(mutator.mutable_units) + 
self.assertEqual(len(mutator.search_groups), num_mutables) + + choices = mutator.sample_choices() + min_choices = mutator.sample_choices(kind='min') + max_choices = mutator.sample_choices(kind='max') + + self.assertEqual(choices.keys(), min_choices.keys()) + self.assertEqual(choices.keys(), max_choices.keys()) + + with self.assertRaises(NotImplementedError): + _ = mutator.sample_choices(kind='mun') + + assert hasattr(mutator, 'current_choices') + with self.assertWarnsRegex( + UserWarning, 'mutables with `arch param` detected'): + _ = mutator.max_choices + + with self.assertWarnsRegex( + UserWarning, 'mutables with `arch param` detected'): + _ = mutator.min_choices + + with self.assertWarnsRegex( + UserWarning, 'mutables with `arch param` detected'): + mutator.set_max_choices() + + with self.assertWarnsRegex( + UserWarning, 'mutables with `arch param` detected'): + mutator.set_min_choices() + + mutator.set_choices(choices) + + x = torch.rand([1, 3, 224, 224]) + y = model(x) + self.assertEqual(list(y.shape), [1, 32, 114, 114]) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_observers/test_lsq_observer.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_observers/test_lsq_observer.py new file mode 100644 index 0000000000000000000000000000000000000000..a61f95d7f76a01b4ebf8e31cf5a9434d485131f2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_observers/test_lsq_observer.py @@ -0,0 +1,77 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from unittest import TestCase + +import torch + +from mmrazor import digit_version +from mmrazor.models import LSQObserver, LSQPerChannelObserver + + +class TestLSQObserver(TestCase): + + def setUp(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + self.lsq = LSQObserver.with_args( + dtype=torch.quint8, + qscheme=torch.per_tensor_symmetric, + reduce_range=False, + quant_min=0, + quant_max=255) + + def test_forward(self): + lsq_observer = self.lsq() + torch.manual_seed(42) + X = torch.rand(20, 10, dtype=torch.float32) + Y = lsq_observer(X) + # Output of observer is identical to input + self.assertTrue(torch.equal(Y, X)) + + X = torch.rand(0, dtype=torch.float32) + Y = lsq_observer(X) + # Output of observer is identical to input + self.assertTrue(torch.equal(Y, X)) + + def test_calculate_qparams(self): + lsq_observer = self.lsq() + X = torch.ones(10, dtype=torch.float32) + _ = lsq_observer(X) + scale, zero_point = lsq_observer.calculate_qparams() + # tensor_norm = 1, quant_max = 255 + self.assertEqual(scale, 2 * torch.tensor([1.]) / (255**0.5)) + self.assertEqual(zero_point, 127) + + +class TestLSQPerChannelObserver(TestCase): + + def setUp(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + self.lsq = LSQPerChannelObserver.with_args( + dtype=torch.qint8, + qscheme=torch.per_channel_symmetric, + reduce_range=False, + quant_min=-127, + quant_max=127) + + def test_forward(self): + lsq_observer = self.lsq() + torch.manual_seed(42) + X = torch.rand(2, 10, dtype=torch.float32) + Y = lsq_observer(X) + # Output of observer is identical to input + self.assertTrue(torch.equal(Y, X)) + + X = torch.rand(0, dtype=torch.float32) + Y = lsq_observer(X) + # Output of observer is identical to input + self.assertTrue(torch.equal(Y, X)) + + def test_calculate_qparams(self): + lsq_observer = self.lsq() + X = torch.ones(2, 10, dtype=torch.float32) + X[0] 
-= 1 + _ = lsq_observer(X) + scale, zero_point = lsq_observer.calculate_qparams() + self.assertEqual(scale[0], 2 * torch.tensor([0.]) / (127**0.5)) + self.assertEqual(scale[1], 2 * torch.tensor([1.]) / (127**0.5)) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_observers/test_torch_observers.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_observers/test_torch_observers.py new file mode 100644 index 0000000000000000000000000000000000000000..cc32e69d8dd594f2881f027d5c2eb9ddc39b126b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_observers/test_torch_observers.py @@ -0,0 +1,18 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import pytest +import torch + +from mmrazor import digit_version +from mmrazor.models.observers import register_torch_observers +from mmrazor.registry import MODELS + + +@pytest.mark.skipif( + digit_version(torch.__version__) < digit_version('1.13.0'), + reason='version of torch < 1.13.0') +def test_register_torch_observers(): + + TORCH_observers = register_torch_observers() + assert isinstance(TORCH_observers, list) + for observer in TORCH_observers: + assert MODELS.get(observer) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_quantizers/test_academic_quantizer.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_quantizers/test_academic_quantizer.py new file mode 100644 index 0000000000000000000000000000000000000000..c95060a00afc40bb2294f0a3849a07def1b844b1 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_quantizers/test_academic_quantizer.py @@ -0,0 +1,167 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from copy import copy +from unittest import TestCase + +import torch +from mmengine.model import BaseModule + +try: + from torch.ao.nn.intrinsic import ConvBnReLU2d + from torch.ao.quantization.backend_config import BackendConfig + from torch.ao.quantization.fx.custom_config import PrepareCustomConfig + from torch.ao.quantization.fx.graph_module import ObservedGraphModule + from torch.ao.quantization.qconfig_mapping import QConfigMapping + from torch.ao.quantization.quant_type import QuantType +except ImportError: + from mmrazor.utils import get_placeholder + ConvBnReLU2d = get_placeholder('torch>=1.13') + BackendConfig = get_placeholder('torch>=1.13') + PrepareCustomConfig = get_placeholder('torch>=1.13') + ObservedGraphModule = get_placeholder('torch>=1.13') + QConfigMapping = get_placeholder('torch>=1.13') + QuantType = get_placeholder('torch>=1.13') + +from mmrazor import digit_version +from mmrazor.models.quantizers import AcademicQuantizer +from mmrazor.models.quantizers.academic_quantizer import ( + FLOAT_TO_OBSERVED_DICT_KEY, GLOBAL_DICT_KEY, MODULE_NAME_DICT_KEY, + OBJECT_TYPE_DICT_KEY, PRESERVED_ATTRIBUTES_DICT_KEY) +from mmrazor.registry import MODELS +from mmrazor.testing import ConvBNReLU + + +@MODELS.register_module() +class ToyFloatModel(BaseModule): + + def __init__(self) -> None: + super().__init__() + + +@MODELS.register_module() +class ToyObservedModel(BaseModule): + + def __init__(self) -> None: + super().__init__() + + +class TestAcademicQuantizer(TestCase): + + def setUp(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + + self.global_qconfig = dict( + w_observer=dict(type='mmrazor.PerChannelMinMaxObserver'), + a_observer=dict(type='mmrazor.MinMaxObserver'), + w_fake_quant=dict(type='mmrazor.FakeQuantize'), + a_fake_quant=dict(type='mmrazor.FakeQuantize'), + w_qscheme=dict(qdtype='qint8', bit=8, is_symmetry=True), + a_qscheme=dict(qdtype='quint8', bit=8,
is_symmetry=True), + ) + self.qconfig = dict( + w_observer=dict(type='mmrazor.PerChannelMinMaxObserver'), + a_observer=dict(type='mmrazor.MinMaxObserver'), + w_fake_quant=dict(type='mmrazor.FakeQuantize'), + a_fake_quant=dict(type='mmrazor.FakeQuantize'), + w_qscheme=dict(qdtype='qint8', bit=4, is_symmetry=True), + a_qscheme=dict(qdtype='quint8', bit=4, is_symmetry=True), + ) + self.model = ConvBNReLU(3, 3, norm_cfg=dict(type='BN')) + + def test_gen_qconfig_mapping(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + + # test set GLOBAL_DICT_KEY by QConfigMapping + global_qconfig = copy(self.global_qconfig) + qconfig_mapping = {GLOBAL_DICT_KEY: global_qconfig} + quantizer = AcademicQuantizer(qconfig_mapping=qconfig_mapping) + assert hasattr(quantizer, 'qconfig_mapping') + assert isinstance(quantizer.qconfig_mapping, QConfigMapping) + assert quantizer.qconfig_mapping.global_qconfig + + # test set OBJECT_TYPE_DICT_KEY by QConfigMapping + qconfig = copy(self.qconfig) + qconfig_mapping = { + OBJECT_TYPE_DICT_KEY: + [('torch.ao.nn.intrinsic.ConvBnReLU2d', qconfig)] + } + quantizer = AcademicQuantizer(qconfig_mapping=qconfig_mapping) + assert hasattr(quantizer, 'qconfig_mapping') + assert isinstance(quantizer.qconfig_mapping, QConfigMapping) + assert quantizer.qconfig_mapping.object_type_qconfigs.get(ConvBnReLU2d) + + # test set MODULE_NAME_DICT_KEY by QConfigMapping + qconfig = copy(self.qconfig) + qconfig_mapping = { + MODULE_NAME_DICT_KEY: [('conv_module.conv', qconfig)] + } + quantizer = AcademicQuantizer(qconfig_mapping=qconfig_mapping) + assert hasattr(quantizer, 'qconfig_mapping') + assert isinstance(quantizer.qconfig_mapping, QConfigMapping) + assert quantizer.qconfig_mapping.module_name_qconfigs.get( + 'conv_module.conv') + + def test_gen_prepare_custom_config(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + + # test 
prepare_custom_config is None + global_qconfig = copy(self.global_qconfig) + qconfig_mapping = {GLOBAL_DICT_KEY: global_qconfig} + quantizer = AcademicQuantizer(qconfig_mapping=qconfig_mapping) + assert hasattr(quantizer, 'prepare_custom_config') + assert isinstance(quantizer.prepare_custom_config, PrepareCustomConfig) + + # test set FLOAT_TO_OBSERVED_DICT_KEY and PRESERVED_ATTRIBUTES_DICT_KEY + # by PrepareCustomConfig + global_qconfig = copy(self.global_qconfig) + qconfig_mapping = {GLOBAL_DICT_KEY: global_qconfig} + flop_to_observed_list = [('ToyFloatModel', 'ToyObservedModel')] + preserved_attributes_list = ['toy_attr1', 'toy_attr2'] + prepare_custom_config = { + FLOAT_TO_OBSERVED_DICT_KEY: flop_to_observed_list, + PRESERVED_ATTRIBUTES_DICT_KEY: preserved_attributes_list + } + quantizer = AcademicQuantizer( + qconfig_mapping=qconfig_mapping, + prepare_custom_config=prepare_custom_config) + + assert hasattr(quantizer, 'prepare_custom_config') + assert isinstance(quantizer.prepare_custom_config, PrepareCustomConfig) + mapping = quantizer.prepare_custom_config.float_to_observed_mapping[ + QuantType.STATIC] + assert mapping.get(ToyFloatModel) + assert mapping[ToyFloatModel] == ToyObservedModel + + attributes = quantizer.prepare_custom_config.preserved_attributes + assert attributes == preserved_attributes_list + + def test_init(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + + global_qconfig = copy(self.global_qconfig) + qconfig_mapping = {GLOBAL_DICT_KEY: global_qconfig} + quantizer = AcademicQuantizer(qconfig_mapping=qconfig_mapping) + assert hasattr(quantizer, 'backend_config') + assert isinstance(quantizer.backend_config, BackendConfig) + + def test_prepare(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + + global_qconfig = copy(self.global_qconfig) + qconfig_mapping = {GLOBAL_DICT_KEY: global_qconfig} + 
preserved_attributes_list = ['toy_attr1', 'toy_attr2'] + prepare_custom_config = { + PRESERVED_ATTRIBUTES_DICT_KEY: preserved_attributes_list + } + quantizer = AcademicQuantizer( + qconfig_mapping=qconfig_mapping, + prepare_custom_config=prepare_custom_config) + model = copy(self.model) + prepared = quantizer.prepare(model) + assert isinstance(prepared, ObservedGraphModule) + assert hasattr(prepared, 'toy_attr1') + assert hasattr(prepared, 'toy_attr2') diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_quantizers/test_exporter.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_quantizers/test_exporter.py new file mode 100644 index 0000000000000000000000000000000000000000..04bd8a671339e01569602b29c4afd8d6a2a2b31d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_quantizers/test_exporter.py @@ -0,0 +1,348 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy +import os +import shutil +import tempfile +from unittest import TestCase, skipIf + +import torch +import torch.nn as nn + +try: + import onnx + from onnx import helper + from torch.fx import GraphModule +except ImportError: + from mmrazor.utils import get_package_placeholder, get_placeholder + GraphModule = get_placeholder('torch>=1.13') + onnx = get_package_placeholder('No module named onnx') + helper = get_package_placeholder('No module named onnx.helper') + +from mmengine import ConfigDict +from mmengine.model import BaseModel + +try: + import mmdeploy +except ImportError: + from mmrazor.utils import get_package_placeholder + mmdeploy = get_package_placeholder('mmdeploy') + +from mmrazor import digit_version +from mmrazor.models.quantizers.exporters import (OpenVinoQuantizeExportor, + TensorRTExplicitExporter) +from mmrazor.models.quantizers.exporters.optim_utils import ONNXOptimUtils +from mmrazor.registry import MODELS + + +class BasicBlock(nn.Module): + + def __init__(self, in_channels, out_channels): + super(BasicBlock, self).__init__() + 
self.in_channels = in_channels + self.out_channels = out_channels + self.mid_channels = out_channels + + self.norm1 = nn.BatchNorm2d(self.mid_channels) + self.norm2 = nn.BatchNorm2d(out_channels) + self.conv1 = nn.Conv2d(in_channels, self.mid_channels, 1) + self.conv2 = nn.Conv2d(self.mid_channels, out_channels, 1) + + self.relu = nn.ReLU6() + self.drop_path = nn.Identity() + + def forward(self, x): + + def _inner_forward(x): + identity = x + + out = self.conv1(x) + out = self.norm1(out) + out = self.relu(out) + + out = self.conv2(out) + out = self.norm2(out) + + out = self.drop_path(out) + + out += identity + + return out + + out = _inner_forward(x) + + out = self.relu(out) + + return out + + +class ToyModel(nn.Module): + + def __init__(self): + super(ToyModel, self).__init__() + self.stem_layer = nn.Sequential( + nn.Conv2d(3, 3, 1), nn.BatchNorm2d(3), nn.ReLU()) + self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) + self.block = BasicBlock(3, 3) + self.block2 = BasicBlock(3, 3) + self.gap = nn.AdaptiveAvgPool2d((1, 1)) + self.fc = nn.Linear(3, 4) + + def forward(self, x): + x = self.stem_layer(x) + x = self.maxpool(x) + x = self.block(x) + x = self.block2(x) + x = self.gap(x) + x = x.flatten(1) + x = self.fc(x) + return x + + +class ToyQuantModel(BaseModel): + + def __init__(self): + super().__init__() + self.architecture = ToyModel() + + def loss(self, outputs, data_samples): + return dict(loss=outputs.sum() - data_samples.sum()) + + def forward(self, inputs, data_samples, mode: str = 'tensor'): + if isinstance(inputs, list): + inputs = torch.stack(inputs) + outputs = self.architecture(inputs) + + return outputs + + +OpenVINO_GLOBAL_QCONFIG = ConfigDict( + w_observer=dict(type='mmrazor.PerChannelMinMaxObserver'), + a_observer=dict(type='mmrazor.MovingAverageMinMaxObserver'), + w_fake_quant=dict(type='mmrazor.FakeQuantize'), + a_fake_quant=dict(type='mmrazor.FakeQuantize'), + w_qscheme=dict( + qdtype='qint8', bit=8, is_symmetry=True, 
is_symmetric_range=True), + a_qscheme=dict( + qdtype='quint8', bit=8, is_symmetry=True, averaging_constant=0.1), +) + +OpenVINO_ALG_CONFIG = ConfigDict( + type='mmrazor.MMArchitectureQuant', + architecture=dict(type='ToyQuantModel'), + quantizer=dict( + type='mmrazor.OpenVINOQuantizer', + global_qconfig=OpenVINO_GLOBAL_QCONFIG, + tracer=dict(type='mmrazor.CustomTracer'))) + +TensorRT_GLOBAL_QCONFIG = ConfigDict( + w_observer=dict(type='mmrazor.PerChannelMinMaxObserver'), + a_observer=dict(type='mmrazor.MinMaxObserver'), + w_fake_quant=dict(type='mmrazor.FakeQuantize'), + a_fake_quant=dict(type='mmrazor.FakeQuantize'), + w_qscheme=dict(qdtype='qint8', bit=8, is_symmetry=True), + a_qscheme=dict(qdtype='quint8', bit=8, is_symmetry=True), +) + +TensorRT_ALG_CONFIG = ConfigDict( + type='mmrazor.MMArchitectureQuant', + architecture=dict(type='ToyQuantModel'), + quantizer=dict( + type='mmrazor.TensorRTQuantizer', + global_qconfig=OpenVINO_GLOBAL_QCONFIG, + tracer=dict(type='mmrazor.CustomTracer'))) + + +@skipIf( + digit_version(torch.__version__) < digit_version('1.13.0'), + 'PyTorch version lower than 1.13.0 is not supported.') +class TestONNXOptimUtils(TestCase): + + def setUp(self): + MODELS.register_module(module=ToyQuantModel, force=True) + self.temp_dir = tempfile.mkdtemp() + filename = 'symbolic.onnx' + filename = os.path.join(self.temp_dir, filename) + toy_model = MODELS.build(OpenVINO_ALG_CONFIG) + observed_model = toy_model.get_deploy_model() + torch.onnx.export( + observed_model, + torch.rand(2, 3, 16, 16), + filename, + opset_version=11) + self.onnx_model = onnx.load(filename) + self.optimizer = ONNXOptimUtils + + def tearDown(self): + MODELS.module_dict.pop('ToyQuantModel') + shutil.rmtree(self.temp_dir) + + def test_map_name_and_data(self): + params = self.optimizer.map_name_and_data(self.onnx_model) + params_keys = [ + 'activation_post_process_0.scale', + 'activation_post_process_0.zero_point', + 'architecture.stem_layer.0.weight', + 
'architecture.stem_layer.0.bias', + 'architecture.stem_layer.0.weight_fake_quant.scale', + 'architecture.stem_layer.0.weight_fake_quant.zero_point', + 'architecture.block.conv1.weight', 'architecture.block.conv1.bias', + 'architecture.block.conv1.weight_fake_quant.scale', + 'architecture.block.conv2.bias', + 'architecture.block2.conv1.weight', + 'architecture.block2.conv1.bias', + 'architecture.block2.conv1.weight_fake_quant.scale', + 'architecture.block2.conv2.weight', + 'architecture.block2.conv2.bias', + 'architecture.block2.conv2.weight_fake_quant.scale', + 'architecture.fc.weight', 'architecture.fc.bias', + 'architecture.fc.weight_fake_quant.scale', + 'architecture.fc.weight_fake_quant.zero_point', + 'activation_post_process_15.zero_point', + 'activation_post_process_15.scale', + 'activation_post_process_14.zero_point', + 'activation_post_process_14.scale', + 'activation_post_process_12.zero_point', + 'activation_post_process_12.scale', + 'activation_post_process_10.zero_point', + 'activation_post_process_10.scale', + 'activation_post_process_8.zero_point', + 'activation_post_process_8.scale', + 'activation_post_process_6.zero_point', + 'activation_post_process_6.scale', + 'activation_post_process_4.zero_point', + 'activation_post_process_4.scale', + 'activation_post_process_1.zero_point', + 'activation_post_process_1.scale', + 'architecture.block2.conv2.weight_fake_quant.zero_point', + 'architecture.block2.conv1.weight_fake_quant.zero_point', + 'architecture.block.conv2.weight_fake_quant.zero_point', + 'architecture.block.conv2.weight_fake_quant.scale', + 'architecture.block.conv2.weight', + 'architecture.block.conv1.weight_fake_quant.zero_point', + '/activation_post_process_0/Constant_output_0', + '/activation_post_process_0/Constant_1_output_0', + '/stem_layer.0/weight_fake_quant/Constant_output_0', + '/stem_layer.0/weight_fake_quant/Constant_1_output_0', + '/relu/Constant_output_0', '/relu/Constant_1_output_0', + '/relu_dup1/Constant_output_0', 
'/relu_dup1/Constant_1_output_0', + '/relu_1/Constant_output_0', '/relu_1/Constant_1_output_0', + '/relu_dup1_1/Constant_output_0', + '/relu_dup1_1/Constant_1_output_0' + ] + self.assertEqual(set(params.keys()), set(params_keys)) + + def test_map_name_and_initializer(self): + initializers = self.optimizer.map_name_and_initializer(self.onnx_model) + for init in self.onnx_model.graph.initializer: + self.assertIn(init.name, initializers.keys()) + # self.assertEqual(set(initializers.keys()), set(initializers_keys)) + + def test_map_output_and_node(self): + _ = self.optimizer.map_output_and_node(self.onnx_model) + + def test_map_input_and_node(self): + _ = self.optimizer.map_input_and_node(self.onnx_model) + + def test_remove_node_from_onnx(self): + onnx_model = copy.deepcopy(self.onnx_model) + node_to_remove = next(iter(onnx_model.graph.node)) + self.optimizer.remove_node_from_onnx(node_to_remove, onnx_model) + for node in onnx_model.graph.node: + self.assertNotEqual(node, node_to_remove) + + def test_remove_initializer_from_onnx(self): + onnx_model = copy.deepcopy(self.onnx_model) + initializer_to_remove = next(iter(onnx_model.graph.initializer)) + self.optimizer.remove_initializer_from_onnx(initializer_to_remove, + onnx_model) + for initializer in onnx_model.graph.initializer: + self.assertNotEqual(initializer, initializer_to_remove) + + def test_find_standalone_nodes(self): + standalone_nodes = self.optimizer.find_standalone_nodes( + self.onnx_model) + self.assertEqual(standalone_nodes, []) + + def test_find_redundant_initializers(self): + redundant_initializers = self.optimizer.find_redundant_initializers( + self.onnx_model) + self.assertEqual(redundant_initializers, []) + + def test_topo_sort(self): + onnx_model = copy.deepcopy(self.onnx_model) + onnx_model_topo_sort = self.optimizer.topo_sort(onnx_model) + self.assertEqual( + len(onnx_model_topo_sort.graph.node), + len(self.onnx_model.graph.node)) + + def test_optimize(self): + onnx_model = 
copy.deepcopy(self.onnx_model) + fake_node = helper.make_node('fake_node', [], [], mode='constant') + self.optimizer.insert_node_to_onnx(fake_node, onnx_model) + self.optimizer.optimize(onnx_model) + for node in onnx_model.graph.node: + self.assertNotEqual(node, fake_node) + + +@skipIf( + digit_version(torch.__version__) < digit_version('1.13.0'), + 'PyTorch version lower than 1.13.0 is not supported.') +class TestOpenVinoQuantizeExportor(TestCase): + + def setUp(self): + MODELS.register_module(module=ToyQuantModel, force=True) + self.temp_dir = tempfile.mkdtemp() + filename = 'toy_model_symbolic.onnx' + filename = os.path.join(self.temp_dir, filename) + toy_model = MODELS.build(OpenVINO_ALG_CONFIG) + observed_model = toy_model.get_deploy_model() + torch.onnx.export( + observed_model, + torch.rand(2, 3, 16, 16), + filename, + opset_version=11) + self.onnx_model = onnx.load(filename) + self.export_path = os.path.join(self.temp_dir, 'toy_model.onnx') + + def tearDown(self): + MODELS.module_dict.pop('ToyQuantModel') + shutil.rmtree(self.temp_dir) + + def test_export(self): + exporter = OpenVinoQuantizeExportor(self.onnx_model, self.export_path) + exporter.export() + self.assertTrue(os.path.exists(self.export_path)) + onnx_model = onnx.load(self.export_path) + self.assertIsInstance(onnx_model, onnx.ModelProto) + + +@skipIf( + digit_version(torch.__version__) < digit_version('1.13.0'), + 'PyTorch version lower than 1.13.0 is not supported.') +class TestTensorRTExplicitExporter(TestCase): + + def setUp(self): + MODELS.register_module(module=ToyQuantModel, force=True) + self.temp_dir = tempfile.mkdtemp() + filename = 'toy_model_symbolic.onnx' + filename = os.path.join(self.temp_dir, filename) + toy_model = MODELS.build(TensorRT_ALG_CONFIG) + observed_model = toy_model.get_deploy_model() + torch.onnx.export( + observed_model, + torch.rand(2, 3, 16, 16), + filename, + opset_version=11) + self.onnx_model = onnx.load(filename) + self.export_path = os.path.join(self.temp_dir, 
'toy_model.onnx') + + def tearDown(self): + MODELS.module_dict.pop('ToyQuantModel') + shutil.rmtree(self.temp_dir) + + def test_export(self): + exporter = TensorRTExplicitExporter(self.onnx_model, self.export_path) + exporter.export() + self.assertTrue(os.path.exists(self.export_path)) + onnx_model = onnx.load(self.export_path) + self.assertIsInstance(onnx_model, onnx.ModelProto) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_quantizers/test_native_quantizer.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_quantizers/test_native_quantizer.py new file mode 100644 index 0000000000000000000000000000000000000000..8f982c1394283807e29bfc45e7832e521a013998 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_quantizers/test_native_quantizer.py @@ -0,0 +1,224 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from unittest import TestCase + +import torch +import torch.nn as nn + +from mmrazor import digit_version +from mmrazor.models.quantizers import TorchNativeQuantizer +from mmrazor.models.quantizers.native_quantizer import SUPPORT_QAT_MODULES +from mmrazor.models.task_modules.tracer import CustomTracer +from mmrazor.models.task_modules.tracer.fx.custom_tracer import \ + build_graphmodule +from mmrazor.registry import MODELS +from mmrazor.structures.quantization import BackendConfigs, QConfigHandler + +try: + from torch.ao.quantization.fx import prepare + from torch.ao.quantization.fx.graph_module import ObservedGraphModule + from torch.ao.quantization.qconfig_mapping import QConfigMapping + from torch.ao.quantization.quantize_fx import _fuse_fx + from torch.fx import GraphModule +except ImportError: + from mmrazor.utils import get_placeholder + GraphModule = get_placeholder('torch>=1.13') + ObservedGraphModule = get_placeholder('torch>=1.13') + QConfigMapping = get_placeholder('torch>=1.13') + prepare = get_placeholder('torch>=1.13') + _fuse_fx = get_placeholder('torch>=1.13') + + +class BasicBlock(nn.Module): 
+ + def __init__(self, in_channels, out_channels): + super(BasicBlock, self).__init__() + self.in_channels = in_channels + self.out_channels = out_channels + self.mid_channels = out_channels + + self.norm1 = nn.BatchNorm2d(self.mid_channels) + self.norm2 = nn.BatchNorm2d(out_channels) + self.conv1 = nn.Conv2d(in_channels, self.mid_channels, 1) + self.conv2 = nn.Conv2d(self.mid_channels, out_channels, 1) + + self.relu = nn.ReLU6() + self.drop_path = nn.Identity() + + def forward(self, x): + + def _inner_forward(x): + identity = x + + out = self.conv1(x) + out = self.norm1(out) + out = self.relu(out) + + out = self.conv2(out) + out = self.norm2(out) + + out = self.drop_path(out) + + out += identity + + return out + + out = _inner_forward(x) + + out = self.relu(out) + + return out + + +class ToyQuantModel(nn.Module): + + def __init__(self): + super().__init__() + self.stem_layer = nn.Sequential( + nn.Conv2d(3, 3, 1), nn.BatchNorm2d(3), nn.ReLU()) + self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) + self.block = BasicBlock(3, 3) + self.block2 = BasicBlock(3, 3) + self.gap = nn.AdaptiveAvgPool2d((1, 1)) + self.fc = nn.Linear(3, 4) + + def forward(self, x): + x = self.stem_layer(x) + x = self.maxpool(x) + x = self.block(x) + x = self.block2(x) + x = self.gap(x) + x = x.flatten(1) + x = self.fc(x) + return x + + +global_qconfig = dict( + w_observer=dict(type='mmrazor.PerChannelMinMaxObserver'), + a_observer=dict(type='mmrazor.MovingAverageMinMaxObserver'), + w_fake_quant=dict(type='mmrazor.FakeQuantize'), + a_fake_quant=dict(type='mmrazor.FakeQuantize'), + w_qscheme=dict( + qdtype='qint8', bit=8, is_symmetry=True, is_symmetric_range=True), + a_qscheme=dict( + qdtype='quint8', bit=8, is_symmetry=True, averaging_constant=0.1)) + +no_observer_modules = [ + 'torch.nn.Conv2d', +] + +q_kwargs = dict( + type='mmrazor.TorchNativeQuantizer', + global_qconfig=global_qconfig, + no_observer_modules=no_observer_modules, + tracer=dict(type='CustomTracer'), +) + + +class 
TestTorchNativeQuantizer(TestCase): + """TODO. + + Args: + TestCase (_type_): _description_ + """ + + def setUp(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + self.q_kwargs = q_kwargs + self.tracer = CustomTracer() + self.backend_config = BackendConfigs['native'] + self.qconfig = QConfigHandler(global_qconfig) + self.qconfig_mapping = QConfigMapping().set_global( + self.qconfig.convert()) + self.example_inputs = (torch.randn(1, 3, 224, 224), ) + self.native_quantizer = MODELS.build(self.q_kwargs) + + def tearDown(self): + pass + + def swap_ff_with_fxff(self, model): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + + modules_to_swap = [] + for name, module in model.named_children(): + if isinstance(module, torch.ao.nn.quantized.FloatFunctional): + modules_to_swap.append(name) + else: + self.swap_ff_with_fxff(module) + + for name in modules_to_swap: + del model._modules[name] + model._modules[name] = torch.ao.nn.quantized.FXFloatFunctional() + + def test_init(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + native_quantizer = MODELS.build(self.q_kwargs) + self.assertIsInstance(native_quantizer, TorchNativeQuantizer) + + def test_prepare(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + toy_model = ToyQuantModel() + toy_model.eval() + + self.swap_ff_with_fxff(toy_model) + traced_graph = self.tracer.trace(toy_model) + graph_module = build_graphmodule(toy_model, traced_graph) + + graph_module = _fuse_fx( + graph_module=graph_module, + is_qat=True, + backend_config=self.backend_config) + assert isinstance(graph_module, GraphModule) + prepared = prepare( + model=graph_module, + qconfig_mapping=self.qconfig_mapping, + is_qat=True, + node_name_to_scope=self.tracer.node_name_to_scope, + 
example_inputs=self.example_inputs, + backend_config=self.backend_config) + assert isinstance(prepared, ObservedGraphModule) + + prepared = self.native_quantizer.del_redundant_fakequant(prepared) + assert isinstance(prepared, GraphModule) + + def post_process_for_deploy(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + toy_model = ToyQuantModel() + toy_model.eval() + + self.swap_ff_with_fxff(toy_model) + traced_graph = self.tracer.trace(toy_model) + graph_module = build_graphmodule(toy_model, traced_graph) + + graph_module = _fuse_fx( + graph_module=graph_module, + is_qat=True, + backend_config=self.backend_config) + assert isinstance(graph_module, GraphModule) + prepared = prepare( + model=graph_module, + qconfig_mapping=self.qconfig_mapping, + is_qat=True, + node_name_to_scope=self.tracer.node_name_to_scope, + example_inputs=self.example_inputs, + backend_config=self.backend_config) + assert isinstance(prepared, ObservedGraphModule) + + prepared = self.native_quantizer.del_redundant_fakequant(prepared) + assert isinstance(prepared, GraphModule) + + prepared_no_fq = prepared + + self.native_quantizer.post_process_weight_fakequant(prepared) + for name, child in prepared.named_children(): + if isinstance(child, SUPPORT_QAT_MODULES): + raise ValueError + self.native_quantizer.post_process_weight_fakequant( + prepared_no_fq, True) + for name, child in prepared_no_fq.named_children(): + if isinstance(child, SUPPORT_QAT_MODULES): + raise ValueError diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_quantizers/test_openvino_quantizer.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_quantizers/test_openvino_quantizer.py new file mode 100644 index 0000000000000000000000000000000000000000..7b60dc4a3060e0b8c99847c5ba006ed41f74a500 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_quantizers/test_openvino_quantizer.py @@ -0,0 +1,55 @@ +# Copyright (c) 
OpenMMLab. All rights reserved. +import shutil +import tempfile +from copy import copy +from unittest import TestCase + +import torch + +try: + from torch.ao.quantization.fx.graph_module import ObservedGraphModule +except ImportError: + from mmrazor.utils import get_placeholder + ObservedGraphModule = get_placeholder('torch>=1.13') + +from mmrazor import digit_version +from mmrazor.models.quantizers import OpenVINOQuantizer +from mmrazor.testing import ConvBNReLU + + +class TestOpenVINOQuantizer(TestCase): + + def setUp(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + + self.global_qconfig = dict( + w_observer=dict(type='mmrazor.PerChannelMinMaxObserver'), + a_observer=dict(type='mmrazor.MinMaxObserver'), + w_fake_quant=dict(type='mmrazor.FakeQuantize'), + a_fake_quant=dict(type='mmrazor.FakeQuantize'), + w_qscheme=dict(qdtype='qint8', bit=8, is_symmetry=True), + a_qscheme=dict(qdtype='quint8', bit=8, is_symmetry=True), + ) + self.temp_dir = tempfile.mkdtemp() + self.model = ConvBNReLU(3, 3, norm_cfg=dict(type='BN')) + + def tearDown(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + + shutil.rmtree(self.temp_dir) + + def test_property(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + + global_qconfig = copy(self.global_qconfig) + quantizer = OpenVINOQuantizer(global_qconfig=global_qconfig) + assert quantizer.backend == 'openvino' + assert quantizer.support_w_modes == ('per_tensor', 'per_channel') + assert quantizer.support_a_modes == ('per_tensor') + assert quantizer.module_prev_wo_fakequant + assert quantizer.module_next_wo_fakequant + assert quantizer.method_next_wo_fakequant + assert quantizer.op_prev_wo_fakequant diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_quantizers/test_tensorrt_quantizer.py 
b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_quantizers/test_tensorrt_quantizer.py new file mode 100644 index 0000000000000000000000000000000000000000..f5433a0f9b91eebbeed521946fb8f6dfb0f5d60c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_quantizers/test_tensorrt_quantizer.py @@ -0,0 +1,51 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import shutil +import tempfile +from copy import copy +from unittest import TestCase + +import torch + +try: + from torch.ao.quantization.fx.graph_module import ObservedGraphModule +except ImportError: + from mmrazor.utils import get_placeholder + ObservedGraphModule = get_placeholder('torch>=1.13') + +from mmrazor import digit_version +from mmrazor.models.quantizers import TensorRTQuantizer +from mmrazor.testing import ConvBNReLU + + +class TestTensorRTQuantizer(TestCase): + + def setUp(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + + self.global_qconfig = dict( + w_observer=dict(type='mmrazor.PerChannelMinMaxObserver'), + a_observer=dict(type='mmrazor.MinMaxObserver'), + w_fake_quant=dict(type='mmrazor.FakeQuantize'), + a_fake_quant=dict(type='mmrazor.FakeQuantize'), + w_qscheme=dict(qdtype='qint8', bit=8, is_symmetry=True), + a_qscheme=dict(qdtype='quint8', bit=8, is_symmetry=True), + ) + self.temp_dir = tempfile.mkdtemp() + self.model = ConvBNReLU(3, 3, norm_cfg=dict(type='BN')) + + def tearDown(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + + shutil.rmtree(self.temp_dir) + + def test_property(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + + global_qconfig = copy(self.global_qconfig) + quantizer = TensorRTQuantizer(global_qconfig=global_qconfig) + assert quantizer.backend == 'tensorrt' + assert quantizer.support_w_modes == ('per_tensor', 'per_channel') + assert 
# Copyright (c) OpenMMLab. All rights reserved.

from collections import UserList
from unittest import TestCase

from mmrazor.structures import Candidates


class TestCandidates(TestCase):
    """Unit tests for the ``Candidates`` search-space container."""

    def setUp(self) -> None:
        # Minimal subnet description used by every test below.
        self.fake_subnet = {'1': 'choice1', '2': 'choice2'}
        # Same subnet annotated with a full resource record.
        self.fake_subnet_with_resource = {
            str(self.fake_subnet): {
                'score': 0.,
                'flops': 50.,
                'params': 0.,
                'latency': 0.
            }
        }
        # Same subnet annotated with a non-zero score only.
        self.fake_subnet_with_score = {
            str(self.fake_subnet): {
                'score': 99.,
                'flops': 0.,
                'params': 0.,
                'latency': 0.
            }
        }
        # Subnet carrying only a 'flops' entry (other indicators absent).
        self.has_flops_network = {
            str(self.fake_subnet): {
                'flops': 50.,
            }
        }

    def test_init(self):
        """Candidates can be built from None, list, UserList or dicts."""
        # initlist is None
        candidates = Candidates()
        self.assertEqual(len(candidates.data), 0)
        # initlist is list
        data = [self.fake_subnet] * 2
        candidates = Candidates(data)
        self.assertEqual(len(candidates.data), 2)
        # initlist is UserList
        data = UserList([self.fake_subnet] * 2)
        # BUG FIX: the original test never rebuilt ``candidates`` from the
        # UserList, so the two assertions below re-checked the previous
        # (plain-list) instance and the UserList path was never exercised.
        candidates = Candidates(data)
        self.assertEqual(len(candidates.data), 2)
        self.assertEqual(candidates.resources('flops'), [-1, -1])
        # initlist is list(Dict[str, Dict])
        candidates = Candidates([self.has_flops_network] * 2)
        self.assertEqual(candidates.resources('flops'), [50., 50.])

    def test_scores(self):
        """Property ``scores`` returns the score of every stored subnet."""
        data = [self.fake_subnet_with_score] * 2
        candidates = Candidates(data)
        self.assertEqual(candidates.scores, [99., 99.])

    def test_resources(self):
        """``resources`` returns the requested indicator for every subnet."""
        data = [self.fake_subnet_with_resource] * 2
        candidates = Candidates(data)
        self.assertEqual(candidates.resources('flops'), [50., 50.])

    def test_subnets(self):
        """Property ``subnets`` recovers the raw subnet dicts."""
        data = [self.fake_subnet] * 2
        candidates = Candidates(data)
        self.assertEqual(candidates.subnets, [self.fake_subnet] * 2)

    def test_append(self):
        """``append`` accepts a dict, a list or another ``Candidates``."""
        # item is dict
        candidates = Candidates()
        candidates.append(self.fake_subnet)
        self.assertEqual(len(candidates), 1)
        # item is List
        candidates = Candidates()
        candidates.append([self.fake_subnet_with_score])
        # item is Candidates
        candidates_2 = Candidates([self.fake_subnet_with_resource])
        candidates.append(candidates_2)
        self.assertEqual(len(candidates), 2)

    def test_insert(self):
        """``insert`` works however the container was initialised."""
        # item is dict
        candidates = Candidates(self.fake_subnet_with_score)
        candidates.insert(1, self.fake_subnet)
        self.assertEqual(len(candidates), 2)
        # item is List
        candidates = Candidates([self.fake_subnet_with_score])
        candidates.insert(1, self.fake_subnet_with_score)
        self.assertEqual(len(candidates), 2)

    def test_extend(self):
        """``extend`` accepts a plain list or another ``Candidates``."""
        # other is list
        candidates = Candidates([self.fake_subnet_with_score])
        candidates.extend([self.fake_subnet])
        self.assertEqual(len(candidates), 2)
        # other is Candidates
        candidates = Candidates([self.fake_subnet_with_score])
        candidates_2 = Candidates([self.fake_subnet_with_resource])
        candidates.extend(candidates_2)
        self.assertEqual(len(candidates), 2)

    def test_set_resource(self):
        """``set_resource`` updates one indicator of one subnet in place."""
        candidates = Candidates([self.fake_subnet])
        # Unset indicators default to -1 until explicitly set.
        for kk in ['flops', 'params', 'latency']:
            self.assertEqual(candidates.resources(kk)[0], -1)
            candidates.set_resource(0, 49.9, kk)
            self.assertEqual(candidates.resources(kk)[0], 49.9)
        candidates.insert(0, self.fake_subnet_with_resource)
        self.assertEqual(len(candidates), 2)
        self.assertEqual(candidates.resources('flops'), [50., 49.9])
        self.assertEqual(candidates.resources('latency'), [0., 49.9])
        candidates = Candidates([self.fake_subnet_with_score])
        candidates.set_resource(0, 100.0, 'score')
        self.assertEqual(candidates.scores[0], 100.)
        candidates = Candidates([self.fake_subnet_with_score])
        candidates.set_resource(0, 100.0, 'score')
        candidates.extend(UserList([self.fake_subnet_with_resource]))
        candidates.set_resource(1, 99.9, 'score')
        self.assertEqual(candidates.scores, [100., 99.9])

    def test_update_resources(self):
        """``update_resources`` patches successive subnets from ``start``."""
        candidates = Candidates([self.fake_subnet])
        candidates.append([self.fake_subnet_with_score])
        candidates_2 = Candidates(self.fake_subnet_with_resource)
        candidates.append(candidates_2)
        self.assertEqual(len(candidates), 3)
        self.assertEqual(candidates.resources('flops'), [-1, 0., 50.])
        self.assertEqual(candidates.resources('latency'), [-1, 0., 0.])
        resources = [{'flops': -2}, {'latency': 4.}]
        candidates.update_resources(resources, start=1)
        self.assertEqual(candidates.resources('flops'), [-1, -2, 50.])
        self.assertEqual(candidates.resources('latency'), [-1, 0., 4])
        candidates.update_resources(resources, start=0)
        self.assertEqual(candidates.resources('flops'), [-2, -2, 50.])
        self.assertEqual(candidates.resources('latency'), [-1, 4., 4.])

    def test_sort(self):
        """``sort_by`` orders candidates by the chosen indicator."""
        candidates = Candidates([self.fake_subnet_with_score])
        candidates.extend(UserList([self.fake_subnet_with_resource]))
        candidates.insert(0, self.fake_subnet)
        candidates.set_resource(0, 100., 'score')
        candidates.set_resource(2, 98., 'score')
        self.assertEqual(candidates.scores, [100., 99., 98.])
        candidates.sort_by(key_indicator='score', reverse=False)
        self.assertEqual(candidates.scores, [98., 99., 100.])
        # Sorting by a different indicator must not disturb this order here.
        candidates.sort_by(key_indicator='latency')
        self.assertEqual(candidates.scores, [98., 99., 100.])
        candidates.sort_by(key_indicator='flops', reverse=False)
        self.assertEqual(candidates.scores, [100., 99., 98.])
# Copyright (c) OpenMMLab. All rights reserved.
from unittest import TestCase

import pytest
import torch.nn as nn

from mmrazor.models import *  # noqa:F403,F401
from mmrazor.models.architectures.dynamic_ops import BigNasConv2d
from mmrazor.models.mutables import OneShotMutableOP, OneShotMutableValue
from mmrazor.registry import MODELS
from mmrazor.structures import export_fix_subnet, load_fix_subnet
from mmrazor.utils import FixMutable

# NOTE(review): this bare call only returns an (unused) decorator and
# registers nothing; it looks like a ``@MODELS.register_module()`` decorator
# whose ``@`` was lost. Kept as-is to preserve import-time behaviour --
# confirm the intent upstream.
MODELS.register_module()


class MockModel(nn.Module):
    """Toy searchable model: two mutable OPs plus a mutable kernel size."""

    def __init__(self):
        super().__init__()
        convs1 = nn.ModuleDict({
            'conv1': nn.Conv2d(3, 8, 1),
            'conv2': nn.Conv2d(3, 8, 1),
            'conv3': nn.Conv2d(3, 8, 1),
        })
        convs2 = nn.ModuleDict({
            'conv1': nn.Conv2d(8, 16, 1),
            'conv2': nn.Conv2d(8, 16, 1),
            'conv3': nn.Conv2d(8, 16, 1),
        })

        self.mutable1 = OneShotMutableOP(convs1)
        self.mutable2 = OneShotMutableOP(convs2)
        self.mutable3 = nn.Sequential(BigNasConv2d(16, 16, 5))

        # The alias must match the key used in the fix-subnet dicts below.
        mutable_kernel_size = OneShotMutableValue(
            alias='mutable3.0.kernel_size', value_list=[3, 5])
        self.mutable3[0].register_mutable_attr('kernel_size',
                                               mutable_kernel_size)

    def forward(self, x):
        x = self.mutable1(x)
        x = self.mutable2(x)
        x = self.mutable3(x)
        return x


class MockModelWithDerivedMutable(nn.Module):
    """Model with a source mutable and a mutable derived from it (x2)."""

    def __init__(self) -> None:
        super().__init__()

        self.source_mutable = OneShotMutableValue([2, 3, 4], default_value=3)
        self.derived_mutable = self.source_mutable * 2


class TestFixSubnet(TestCase):
    """Tests for ``load_fix_subnet`` / ``export_fix_subnet``."""

    def test_load_fix_subnet(self):
        """A fix subnet may be given as a yaml path or a dict."""
        # fix subnet is str
        fix_subnet = 'tests/data/test_models/test_subnet/mockmodel_subnet.yaml'  # noqa: E501
        model = MockModel()

        load_fix_subnet(model, fix_subnet)

        # fix subnet is dict
        fix_subnet = {
            'mutable1': {
                'chosen': 'conv1'
            },
            'mutable2': {
                'chosen': 'conv2'
            },
            'mutable3.0.kernel_size': {
                'chosen': 3
            }
        }

        # BUG FIX: the original repeated this model-build/load pair twice
        # back to back; the second run added nothing, so it was dropped.
        model = MockModel()
        load_fix_subnet(model, fix_subnet)

        with pytest.raises(TypeError):
            # type int is not supported.
            model = MockModel()
            load_fix_subnet(model, fix_subnet=10)

        # A fix subnet missing a mutable must be rejected.
        model = MockModel()
        fix_subnet.pop('mutable1')
        with pytest.raises(RuntimeError):
            load_fix_subnet(model, fix_subnet)

    def test_export_fix_subnet(self):
        """Exporting a fixed model round-trips the choices."""
        # get FixSubnet
        fix_subnet = {
            'mutable1': {
                'chosen': 'conv1'
            },
            'mutable2': {
                'chosen': 'conv2'
            },
            'mutable3.0.kernel_size': {
                'chosen': 3
            }
        }

        model = MockModel()
        load_fix_subnet(model, fix_subnet)

        # An already-fixed model cannot be exported again.
        with pytest.raises(AssertionError):
            exported_fix_subnet: FixMutable = export_fix_subnet(model)[0]

        model = MockModel()
        model.mutable1.current_choice = 'conv1'
        model.mutable2.current_choice = 'conv2'
        model.mutable3[0].mutable_attrs.kernel_size.current_choice = 3
        exported_fix_subnet = export_fix_subnet(model)[0]

        # Normalise the exported DumpChosen records into plain dicts so the
        # result can be compared against the literal ``fix_subnet`` above.
        mutable1_dump_chosen = exported_fix_subnet['mutable1']
        mutable2_dump_chosen = exported_fix_subnet['mutable2']
        mutable3_0_ks_chosen = exported_fix_subnet['mutable3.0.kernel_size']

        mutable1_chosen_dict = dict(chosen=mutable1_dump_chosen.chosen)
        mutable2_chosen_dict = dict(chosen=mutable2_dump_chosen.chosen)
        mutable3_0_ks_chosen_dict = dict(chosen=mutable3_0_ks_chosen.chosen)

        exported_fix_subnet['mutable1'] = mutable1_chosen_dict
        exported_fix_subnet['mutable2'] = mutable2_chosen_dict
        exported_fix_subnet['mutable3.0.kernel_size'] = \
            mutable3_0_ks_chosen_dict
        self.assertDictEqual(fix_subnet, exported_fix_subnet)

    def test_export_fix_subnet_with_derived_mutable(self) -> None:
        """Derived mutables export the source's choice and track updates."""
        model = MockModelWithDerivedMutable()
        fix_subnet = export_fix_subnet(model)[0]
        self.assertDictEqual(
            fix_subnet, {
                'source_mutable': model.source_mutable.dump_chosen(),
                'derived_mutable': model.source_mutable.dump_chosen()
            })

        fix_subnet['source_mutable'] = dict(
            fix_subnet['source_mutable']._asdict())
        fix_subnet['source_mutable']['chosen'] = 4
        load_fix_subnet(model, fix_subnet)

        # derived = source * 2, so fixing source to 4 yields 8.
        assert model.source_mutable.current_choice == 4
        assert model.derived_mutable.current_choice == 8
# Copyright (c) OpenMMLab. All rights reserved.
from unittest import TestCase

import pytest
import torch
from mmcls.models.backbones.resnet import ResLayer
from mmengine.config import Config
from mmengine.registry import MODELS

try:
    from torch.fx import GraphModule
    from torch.fx._symbolic_trace import Graph
except ImportError:
    from mmrazor.utils import get_placeholder
    GraphModule = get_placeholder('torch>=1.13')
    Graph = get_placeholder('torch>=1.13')

from mmrazor import digit_version
from mmrazor.models.task_modules.tracer import (CustomTracer,
                                                UntracedMethodRegistry,
                                                build_graphmodule,
                                                custom_symbolic_trace)
from mmrazor.models.task_modules.tracer.fx.custom_tracer import \
    _prepare_module_dict


class ToyModel(torch.nn.Module):
    """Tiny model whose methods are used to exercise the custom tracer."""

    def __init__(self):
        super().__init__()

    def get_loss(self, x):
        return x * 0.1

    # FIX: renamed from the misspelled ``extrac_feature``.
    def extract_feature(self, x):
        return x * 2

    def forward(self, x):
        x = self.extract_feature(x)
        x = self.get_loss(x)
        return x


# FIX: renamed from ``testUntracedMethodRgistry`` -- the lowercase ``test``
# prefix is not matched by pytest's default ``Test*`` class-collection
# pattern, so the whole TestCase was silently skipped (and ``Rgistry`` was
# a typo).
class TestUntracedMethodRegistry(TestCase):

    def test_init(self):
        """The registry records the wrapped method and a lookup dict."""
        if digit_version(torch.__version__) < digit_version('1.13.0'):
            self.skipTest('version of torch < 1.13.0')

        method = ToyModel.get_loss
        method_registry = UntracedMethodRegistry(method)
        assert hasattr(method_registry, 'method')
        assert hasattr(method_registry, 'method_dict')

    def test_registry_method(self):
        """``__set_name__`` registers the owning class under the name."""
        if digit_version(torch.__version__) < digit_version('1.13.0'):
            self.skipTest('version of torch < 1.13.0')

        model = ToyModel
        method = ToyModel.get_loss
        method_registry = UntracedMethodRegistry(method)
        method_registry.__set_name__(model, 'get_loss')
        assert 'get_loss' in method_registry.method_dict.keys()
        assert method_registry.method_dict['get_loss']['mod'] == model


# FIX: renamed from ``testCustomTracer`` for the same collection reason.
class TestCustomTracer(TestCase):

    def setUp(self):
        self.cfg = Config.fromfile(
            'tests/data/test_models/test_task_modules/mmcls_cfg.py')
        self.skipped_methods = [
            'mmcls.models.heads.ClsHead._get_loss',
            'mmcls.models.heads.ClsHead._get_predictions'
        ]
        self.skipped_module_names = ['backbone.layer4.0']
        self.skipped_module_classes = [ResLayer]

    def test_init(self):
        """Constructor validates and registers ``skipped_methods``."""
        if digit_version(torch.__version__) < digit_version('1.13.0'):
            self.skipTest('version of torch < 1.13.0')

        # init without skipped_methods
        tracer = CustomTracer()
        assert hasattr(tracer, 'skipped_methods')
        assert len(tracer.skipped_methods) == 0
        # init with skipped_methods(list)
        UntracedMethodRegistry.method_dict = dict()
        tracer = CustomTracer(skipped_methods=self.skipped_methods)
        assert '_get_loss' in UntracedMethodRegistry.method_dict.keys()
        assert '_get_predictions' in UntracedMethodRegistry.method_dict.keys()
        # init with skipped_methods(str)
        UntracedMethodRegistry.method_dict = dict()
        tracer = CustomTracer(skipped_methods=self.skipped_methods[0])
        assert '_get_loss' in UntracedMethodRegistry.method_dict.keys()
        # init with skipped_methods(int, error)
        with self.assertRaises(TypeError):
            CustomTracer(skipped_methods=123)
        # init with skipped_methods(str, error): not a dotted path
        with self.assertRaises(AssertionError):
            CustomTracer(skipped_methods='_get_loss')

    def test_trace(self):
        """Skipped methods/modules appear as single nodes in the graph."""
        if digit_version(torch.__version__) < digit_version('1.13.0'):
            self.skipTest('version of torch < 1.13.0')

        # test trace with skipped_methods
        model = MODELS.build(self.cfg.model)
        UntracedMethodRegistry.method_dict = dict()
        tracer = CustomTracer(skipped_methods=self.skipped_methods)
        graph_tensor = tracer.trace(model, concrete_args={'mode': 'tensor'})
        graph_loss = tracer.trace(model, concrete_args={'mode': 'loss'})
        graph_predict = tracer.trace(model, concrete_args={'mode': 'predict'})
        assert isinstance(graph_tensor, Graph)
        assert isinstance(graph_loss, Graph)
        skip_flag_loss = False
        for node in graph_loss.nodes:
            if node.op == 'call_method' and node.target == '_get_loss':
                skip_flag_loss = True
        assert isinstance(graph_predict, Graph)
        skip_flag_predict = False
        for node in graph_predict.nodes:
            if node.op == 'call_method' and \
                    node.target == '_get_predictions':
                skip_flag_predict = True
        assert skip_flag_loss and skip_flag_predict

        # test trace with skipped_module_names
        model = MODELS.build(self.cfg.model)
        UntracedMethodRegistry.method_dict = dict()
        tracer = CustomTracer(skipped_module_names=self.skipped_module_names)
        graph_tensor = tracer.trace(model, concrete_args={'mode': 'tensor'})
        skip_flag = False
        for node in graph_tensor.nodes:
            skipped_module_name = self.skipped_module_names[0]
            if node.op == 'call_module' and node.target == skipped_module_name:
                skip_flag = True
        assert skip_flag

        # test trace with skipped_module_classes
        model = MODELS.build(self.cfg.model)
        UntracedMethodRegistry.method_dict = dict()
        tracer = CustomTracer(
            skipped_module_classes=self.skipped_module_classes)
        graph_tensor = tracer.trace(model, concrete_args={'mode': 'tensor'})
        skip_flag = False
        for node in graph_tensor.nodes:
            if node.op == 'call_module' and node.target == 'backbone.layer1':
                skip_flag = True
        assert skip_flag


@pytest.mark.skipif(
    digit_version(torch.__version__) < digit_version('1.13.0'),
    reason='version of torch < 1.13.0')
def test_custom_symbolic_trace():
    """``custom_symbolic_trace`` returns an fx ``GraphModule``."""
    cfg = Config.fromfile(
        'tests/data/test_models/test_task_modules/mmcls_cfg.py')
    model = MODELS.build(cfg.model)
    UntracedMethodRegistry.method_dict = dict()
    graph_module = custom_symbolic_trace(
        model, concrete_args={'mode': 'tensor'})
    assert isinstance(graph_module, GraphModule)


@pytest.mark.skipif(
    digit_version(torch.__version__) < digit_version('1.13.0'),
    reason='version of torch < 1.13.0')
def test_build_graphmodule():
    """``build_graphmodule`` wraps a traced graph; skipped-method stubs in
    ``_prepare_module_dict`` are plain Modules, not the original classes."""
    skipped_methods = ['mmcls.models.heads.ClsHead._get_predictions']
    cfg = Config.fromfile(
        'tests/data/test_models/test_task_modules/mmcls_cfg.py')
    model = MODELS.build(cfg.model)
    UntracedMethodRegistry.method_dict = dict()
    tracer = CustomTracer(skipped_methods=skipped_methods)
    graph_predict = tracer.trace(model, concrete_args={'mode': 'predict'})
    graph_module = build_graphmodule(model, graph_predict)
    assert isinstance(graph_module, GraphModule)

    # test _prepare_module_dict
    modules = dict(model.named_modules())
    module_dict = _prepare_module_dict(model, graph_predict)
    for k, v in module_dict.items():
        assert isinstance(v, torch.nn.Module)
        assert not isinstance(v, modules[k].__class__)
# Copyright (c) OpenMMLab. All rights reserved.
import unittest

from mmrazor.models.task_modules.demo_inputs import DefaultDemoInput
from ....data.tracer_passed_models import FxPassedModelManager


class TestDemoInputs(unittest.TestCase):
    """Smoke-test ``DefaultDemoInput`` against every fx-traceable model."""

    def test_demo_inputs(self):
        input_shape = [1, 3, 224, 224]
        for model_cls in FxPassedModelManager().include_models():
            with self.subTest(model=model_cls):
                demo = DefaultDemoInput(input_shape=input_shape)
                net = model_cls()
                net.eval()
                try:
                    # Both the __call__ path and the raw get_data payload
                    # must be accepted by the model's forward.
                    demo(net)
                    payload = demo.get_data(net)
                    if isinstance(payload, dict):
                        net(**payload)
                    else:
                        net(payload)
                except Exception as err:
                    self.fail(f'{err}')
# Copyright (c) OpenMMLab. All rights reserved.
import copy
from unittest import TestCase

import pytest
import torch
from mmcv.cnn.bricks import Conv2dAdaptivePadding
from torch import Tensor
from torch.nn import Conv2d, Module, Parameter

from mmrazor.models import OneShotMutableModule, ResourceEstimator
from mmrazor.models.task_modules.estimators.counters import BaseCounter
from mmrazor.registry import MODELS, TASK_UTILS
from mmrazor.structures import export_fix_subnet

_FIRST_STAGE_MUTABLE = dict(
    type='OneShotMutableOP',
    candidates=dict(
        mb_k3e1=dict(
            type='MBBlock',
            kernel_size=3,
            expand_ratio=1,
            norm_cfg=dict(type='BN'),
            act_cfg=dict(type='ReLU6'))))

_OTHER_STAGE_MUTABLE = dict(
    type='OneShotMutableOP',
    candidates=dict(
        mb_k3e3=dict(
            type='MBBlock',
            kernel_size=3,
            expand_ratio=3,
            norm_cfg=dict(type='BN'),
            act_cfg=dict(type='ReLU6')),
        mb_k5e3=dict(
            type='MBBlock',
            kernel_size=5,
            expand_ratio=3,
            norm_cfg=dict(type='BN'),
            act_cfg=dict(type='ReLU6')),
        identity=dict(type='Identity')))

ARCHSETTING_CFG = [
    # Parameters to build layers. 4 parameters are needed to construct a
    # layer, from left to right: channel, num_blocks, stride, mutable cfg.
    [16, 1, 1, _FIRST_STAGE_MUTABLE],
    [24, 2, 2, _OTHER_STAGE_MUTABLE],
    [32, 3, 2, _OTHER_STAGE_MUTABLE],
    [64, 4, 2, _OTHER_STAGE_MUTABLE],
    [96, 3, 1, _OTHER_STAGE_MUTABLE],
    [160, 3, 2, _OTHER_STAGE_MUTABLE],
    [320, 1, 1, _OTHER_STAGE_MUTABLE]
]

NORM_CFG = dict(type='BN')
BACKBONE_CFG = dict(
    type='mmrazor.SearchableMobileNetV2',
    first_channels=32,
    last_channels=1280,
    widen_factor=1.0,
    norm_cfg=NORM_CFG,
    arch_setting=ARCHSETTING_CFG)

# Shared estimator instance; all tests are read-only with respect to it.
estimator = ResourceEstimator()


class FoolAddConstant(Module):
    """Adds a learnable scalar; counted by ``FoolAddConstantCounter``."""

    def __init__(self, p: float = 0.1) -> None:
        super().__init__()

        self.register_parameter(
            name='p', param=Parameter(torch.tensor(p, dtype=torch.float32)))

    def forward(self, x: Tensor) -> Tensor:
        return x + self.p


@TASK_UTILS.register_module()
class FoolAddConstantCounter(BaseCounter):
    """Custom counter crediting fixed flops/params to ``FoolAddConstant``."""

    @staticmethod
    def add_count_hook(module, input, output):
        module.__flops__ += 1000000
        module.__params__ += 700000


class FoolConv2d(Module):
    """Single 3x3 conv wrapper used as a known-cost fixture."""

    def __init__(self) -> None:
        super().__init__()

        self.conv2d = Conv2d(3, 32, 3)

    def forward(self, x: Tensor) -> Tensor:
        return self.conv2d(x)


class FoolConvModule(Module):
    """Composite fixture: custom-counted op followed by a conv."""

    def __init__(self) -> None:
        super().__init__()

        self.add_constant = FoolAddConstant(0.1)
        self.conv2d = FoolConv2d()

    def forward(self, x: Tensor) -> Tensor:
        x = self.add_constant(x)

        return self.conv2d(x)


class TestResourceEstimator(TestCase):
    """Tests for ``ResourceEstimator`` flops/params accounting."""

    def sample_choice(self, model: Module) -> None:
        # Fix every one-shot mutable so the supernet becomes a subnet.
        for module in model.modules():
            if isinstance(module, OneShotMutableModule):
                module.current_choice = module.sample_choice()

    def test_estimate(self) -> None:
        """Known fixtures produce the expected flops/params totals."""
        fool_conv2d = FoolConv2d()
        flops_params_cfg = dict(input_shape=(1, 3, 224, 224))
        results = estimator.estimate(
            model=fool_conv2d, flops_params_cfg=flops_params_cfg)
        flops_count = results['flops']
        params_count = results['params']

        self.assertEqual(flops_count, 44.158)
        self.assertEqual(params_count, 0.001)

        fool_conv2d = Conv2dAdaptivePadding(3, 32, 3)
        results = estimator.estimate(
            model=fool_conv2d, flops_params_cfg=flops_params_cfg)
        flops_count = results['flops']
        params_count = results['params']

        self.assertEqual(flops_count, 44.958)
        self.assertEqual(params_count, 0.001)

    def test_register_module(self) -> None:
        """A registered custom counter contributes to the totals."""
        fool_add_constant = FoolConvModule()
        flops_params_cfg = dict(input_shape=(1, 3, 224, 224))
        results = estimator.estimate(
            model=fool_add_constant, flops_params_cfg=flops_params_cfg)
        flops_count = results['flops']
        params_count = results['params']

        self.assertEqual(flops_count, 45.158)
        self.assertEqual(params_count, 0.701)

    # FIX: renamed from ``test_disable_sepc_counter`` ('sepc' typo).
    def test_disable_spec_counter(self) -> None:
        """Disabled counters are excluded from the totals."""
        fool_add_constant = FoolConvModule()
        flops_params_cfg = dict(
            input_shape=(1, 3, 224, 224),
            disabled_counters=['FoolAddConstantCounter'])
        rest_results = estimator.estimate(
            model=fool_add_constant, flops_params_cfg=flops_params_cfg)
        rest_flops_count = rest_results['flops']
        rest_params_count = rest_results['params']

        self.assertLess(rest_flops_count, 45.158)
        self.assertLess(rest_params_count, 0.701)

        fool_conv2d = Conv2dAdaptivePadding(3, 32, 3)
        flops_params_cfg = dict(
            input_shape=(1, 3, 224, 224), disabled_counters=['Conv2dCounter'])
        rest_results = estimator.estimate(
            model=fool_conv2d, flops_params_cfg=flops_params_cfg)
        rest_flops_count = rest_results['flops']
        rest_params_count = rest_results['params']

        self.assertEqual(rest_flops_count, 0)
        self.assertEqual(rest_params_count, 0)

    def test_estimate_spec_module(self) -> None:
        """Listing all submodules in ``spec_modules`` matches the total."""
        fool_add_constant = FoolConvModule()
        flops_params_cfg = dict(
            input_shape=(1, 3, 224, 224),
            spec_modules=['add_constant', 'conv2d'])
        results = estimator.estimate(
            model=fool_add_constant, flops_params_cfg=flops_params_cfg)
        flops_count = results['flops']
        params_count = results['params']

        self.assertEqual(flops_count, 45.158)
        self.assertEqual(params_count, 0.701)

    def test_estimate_separation_modules(self) -> None:
        """Per-module estimation requires valid, non-empty ``spec_modules``."""
        fool_add_constant = FoolConvModule()
        flops_params_cfg = dict(
            input_shape=(1, 3, 224, 224), spec_modules=['add_constant'])
        results = estimator.estimate_separation_modules(
            model=fool_add_constant, flops_params_cfg=flops_params_cfg)
        self.assertGreater(results['add_constant']['flops'], 0)

        with pytest.raises(AssertionError):
            # 'backbone' is not a submodule of the fixture.
            flops_params_cfg = dict(
                input_shape=(1, 3, 224, 224), spec_modules=['backbone'])
            results = estimator.estimate_separation_modules(
                model=fool_add_constant, flops_params_cfg=flops_params_cfg)

        with pytest.raises(AssertionError):
            # spec_modules must not be empty.
            flops_params_cfg = dict(
                input_shape=(1, 3, 224, 224), spec_modules=[])
            results = estimator.estimate_separation_modules(
                model=fool_add_constant, flops_params_cfg=flops_params_cfg)

    def test_estimate_subnet(self) -> None:
        """Slicing a fixed subnet keeps its flops/params, and estimating a
        deep copy must not mutate the original supernet's statistics."""
        flops_params_cfg = dict(input_shape=(1, 3, 224, 224))
        model = MODELS.build(BACKBONE_CFG)
        self.sample_choice(model)
        copied_model = copy.deepcopy(model)

        results = estimator.estimate(
            model=copied_model, flops_params_cfg=flops_params_cfg)
        flops_count = results['flops']
        params_count = results['params']

        _, sliced_model = export_fix_subnet(model, slice_weight=True)
        subnet_results = estimator.estimate(
            model=sliced_model, flops_params_cfg=flops_params_cfg)
        subnet_flops_count = subnet_results['flops']
        subnet_params_count = subnet_results['params']

        self.assertEqual(flops_count, subnet_flops_count)
        self.assertEqual(params_count, subnet_params_count)

        # test whether subnet estimate will affect original model
        copied_model = copy.deepcopy(model)
        results_after_estimate = estimator.estimate(
            model=copied_model, flops_params_cfg=flops_params_cfg)
        flops_count_after_estimate = results_after_estimate['flops']
        params_count_after_estimate = results_after_estimate['params']

        self.assertEqual(flops_count, flops_count_after_estimate)
        self.assertEqual(params_count, params_count_after_estimate)
diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_task_modules/test_graph_utils.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_task_modules/test_graph_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..ea7f90565c371e7effc9b470292ca1dbc3cb78b0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_task_modules/test_graph_utils.py @@ -0,0 +1,536 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import operator +from unittest import TestCase + +import torch +import torch.nn as nn + +try: + from torch.ao.quantization import QConfigMapping + from torch.ao.quantization.fake_quantize import FakeQuantizeBase + from torch.ao.quantization.fx import prepare + from torch.ao.quantization.quantize_fx import _fuse_fx +except ImportError: + from mmrazor.utils import get_placeholder + QConfigMapping = get_placeholder('torch>=1.13') + FakeQuantizeBase = get_placeholder('torch>=1.13') + prepare = get_placeholder('torch>=1.13') + _fuse_fx = get_placeholder('torch>=1.13') + +from mmrazor import digit_version +from mmrazor.models.task_modules.tracer import CustomTracer, build_graphmodule +from mmrazor.models.task_modules.tracer.fx import ( + del_fakequant_after_function, del_fakequant_after_method, + del_fakequant_after_module, del_fakequant_after_op, + del_fakequant_before_function, del_fakequant_before_method, + del_fakequant_before_module, del_fakequant_before_op) +from mmrazor.structures.quantization import BackendConfigs, QConfigHandler + + +def _get_attrs(target, attrs): + attrs = attrs.split('.') + + for att in attrs: + target = getattr(target, att, None) + return target + + +class BasicBlock(nn.Module): + + def __init__(self, in_channels, out_channels): + super(BasicBlock, self).__init__() + self.in_channels = in_channels + self.out_channels = out_channels + self.mid_channels = out_channels + + self.norm1 = nn.BatchNorm2d(self.mid_channels) + self.norm2 = nn.BatchNorm2d(out_channels) + self.conv1 
= nn.Conv2d(in_channels, self.mid_channels, 1) + self.conv2 = nn.Conv2d(self.mid_channels, out_channels, 1) + + self.relu = nn.ReLU6() + self.drop_path = nn.Identity() + + def forward(self, x): + + def _inner_forward(x): + identity = x + + out = self.conv1(x) + out = self.norm1(out) + out = self.relu(out) + + out = self.conv2(out) + out = self.norm2(out) + + out = self.drop_path(out) + + out += identity + + return out + + out = _inner_forward(x) + + out = self.relu(out) + + return out + + +class ToyModel(nn.Module): + + def __init__(self): + super().__init__() + self.stem_layer = nn.Sequential( + nn.Conv2d(3, 3, 1), nn.BatchNorm2d(3), nn.ReLU()) + self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) + self.block = BasicBlock(3, 3) + self.block2 = BasicBlock(3, 3) + self.gap = nn.AdaptiveAvgPool2d((1, 1)) + self.fc = nn.Linear(3, 4) + + def forward(self, x): + x = self.stem_layer(x) + x = self.maxpool(x) + x = self.block(x) + x = self.block2(x) + x = self.gap(x) + x = x.flatten(1) + x = self.fc(x) + return x + + +global_qconfig = dict( + w_observer=dict(type='mmrazor.PerChannelMinMaxObserver'), + a_observer=dict(type='mmrazor.MovingAverageMinMaxObserver'), + w_fake_quant=dict(type='mmrazor.FakeQuantize'), + a_fake_quant=dict(type='mmrazor.FakeQuantize'), + w_qscheme=dict( + qdtype='qint8', bit=8, is_symmetry=True, is_symmetric_range=True), + a_qscheme=dict( + qdtype='quint8', bit=8, is_symmetry=True, averaging_constant=0.1), +) + + +class TestGraphUtils(TestCase): + + def setUp(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + + self.tracer = CustomTracer() + self.backend_config = BackendConfigs['native'] + self.qconfig = QConfigHandler(global_qconfig) + self.qconfig_mapping = QConfigMapping().set_global( + self.qconfig.convert()) + self.example_inputs = (torch.randn(1, 3, 224, 224), ) + + def swap_ff_with_fxff(self, model): + if digit_version(torch.__version__) < 
digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + + modules_to_swap = [] + for name, module in model.named_children(): + if isinstance(module, torch.ao.nn.quantized.FloatFunctional): + modules_to_swap.append(name) + else: + self.swap_ff_with_fxff(module) + + for name in modules_to_swap: + del model._modules[name] + model._modules[name] = torch.ao.nn.quantized.FXFloatFunctional() + + def test_del_fakequant_before_op(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + + model_to_quantize = ToyModel() + model_to_quantize.eval() + + self.swap_ff_with_fxff(model_to_quantize) + traced_graph = self.tracer.trace(model_to_quantize) + graph_module = build_graphmodule(model_to_quantize, traced_graph) + + graph_module = _fuse_fx( + graph_module=graph_module, + is_qat=True, + backend_config=self.backend_config) + prepared = prepare( + model=graph_module, + qconfig_mapping=self.qconfig_mapping, + is_qat=True, + node_name_to_scope=self.tracer.node_name_to_scope, + example_inputs=self.example_inputs, + backend_config=self.backend_config) + + op_del_prev_fakequant = ('output', ) + + prepared_after_del = del_fakequant_before_op( + prepared, op_del_prev_fakequant, inplace=False) + for node in prepared.graph.nodes: + if node.op in op_del_prev_fakequant: + args = node.args + self.assertIsInstance( + _get_attrs(prepared, args[0].target), FakeQuantizeBase) + + for node in prepared_after_del.graph.nodes: + if node.op in op_del_prev_fakequant: + args = node.args + self.assertNotIsInstance( + _get_attrs(prepared, args[0].target), FakeQuantizeBase) + + prepared_after_del = del_fakequant_before_op( + prepared, op_del_prev_fakequant, inplace=True) + for node in prepared_after_del.graph.nodes: + if node.op in op_del_prev_fakequant: + args = node.args + self.assertNotIsInstance( + _get_attrs(prepared, args[0].target), FakeQuantizeBase) + + def test_del_fakequant_after_op(self): + if 
digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + + model_to_quantize = ToyModel() + model_to_quantize.eval() + + self.swap_ff_with_fxff(model_to_quantize) + traced_graph = self.tracer.trace(model_to_quantize) + graph_module = build_graphmodule(model_to_quantize, traced_graph) + + graph_module = _fuse_fx( + graph_module=graph_module, + is_qat=True, + backend_config=self.backend_config) + prepared = prepare( + model=graph_module, + qconfig_mapping=self.qconfig_mapping, + is_qat=True, + node_name_to_scope=self.tracer.node_name_to_scope, + example_inputs=self.example_inputs, + backend_config=self.backend_config) + + op_del_next_fakequant = ('placeholder', ) + + prepared_after_del = del_fakequant_after_op( + prepared, op_del_next_fakequant, inplace=False) + for node in prepared.graph.nodes: + if node.op in op_del_next_fakequant: + self.assertIsInstance( + _get_attrs(prepared, node.next.target), FakeQuantizeBase) + + for node in prepared_after_del.graph.nodes: + if node.op in op_del_next_fakequant: + self.assertNotIsInstance( + _get_attrs(prepared, node.next.target), FakeQuantizeBase) + + prepared_after_del = del_fakequant_after_op( + prepared, op_del_next_fakequant, inplace=True) + for node in prepared_after_del.graph.nodes: + if node.op in op_del_next_fakequant: + self.assertNotIsInstance( + _get_attrs(prepared, node.next.target), FakeQuantizeBase) + + def test_del_fakequant_before_method(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + + model_to_quantize = ToyModel() + model_to_quantize.eval() + + self.swap_ff_with_fxff(model_to_quantize) + traced_graph = self.tracer.trace(model_to_quantize) + graph_module = build_graphmodule(model_to_quantize, traced_graph) + + graph_module = _fuse_fx( + graph_module=graph_module, + is_qat=True, + backend_config=self.backend_config) + prepared = prepare( + model=graph_module, + 
qconfig_mapping=self.qconfig_mapping, + is_qat=True, + node_name_to_scope=self.tracer.node_name_to_scope, + example_inputs=self.example_inputs, + backend_config=self.backend_config) + + method_del_prev_fakequant = ('flatten', ) + + prepared_after_del = del_fakequant_before_method( + prepared, method_del_prev_fakequant, inplace=False) + for node in prepared.graph.nodes: + if node.op == 'call_method' and \ + node.target in method_del_prev_fakequant: + args = node.args + self.assertIsInstance( + _get_attrs(prepared, args[0].target), FakeQuantizeBase) + + for node in prepared_after_del.graph.nodes: + if node.op == 'call_method' and \ + node.target in method_del_prev_fakequant: + args = node.args + self.assertNotIsInstance( + _get_attrs(prepared, args[0].target), FakeQuantizeBase) + + prepared_after_del = del_fakequant_before_method( + prepared, method_del_prev_fakequant, inplace=True) + for node in prepared_after_del.graph.nodes: + if node.op == 'call_method' and \ + node.target in method_del_prev_fakequant: + args = node.args + self.assertNotIsInstance( + _get_attrs(prepared, args[0].target), FakeQuantizeBase) + + def test_del_fakequant_after_method(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + + model_to_quantize = ToyModel() + model_to_quantize.eval() + + self.swap_ff_with_fxff(model_to_quantize) + traced_graph = self.tracer.trace(model_to_quantize) + graph_module = build_graphmodule(model_to_quantize, traced_graph) + + graph_module = _fuse_fx( + graph_module=graph_module, + is_qat=True, + backend_config=self.backend_config) + prepared = prepare( + model=graph_module, + qconfig_mapping=self.qconfig_mapping, + is_qat=True, + node_name_to_scope=self.tracer.node_name_to_scope, + example_inputs=self.example_inputs, + backend_config=self.backend_config) + + method_del_next_fakequant = ('flatten', ) + + prepared_after_del = del_fakequant_after_method( + prepared, method_del_next_fakequant, 
inplace=False) + for node in prepared.graph.nodes: + if node.op == 'call_method' and \ + node.target in method_del_next_fakequant: + self.assertIsInstance( + _get_attrs(prepared, node.next.target), FakeQuantizeBase) + + for node in prepared_after_del.graph.nodes: + if node.op == 'call_method' and \ + node.target in method_del_next_fakequant: + self.assertNotIsInstance( + _get_attrs(prepared, node.next.target), FakeQuantizeBase) + + prepared_after_del = del_fakequant_after_method( + prepared, method_del_next_fakequant, inplace=True) + for node in prepared_after_del.graph.nodes: + if node.op == 'call_method' and \ + node.target in method_del_next_fakequant: + self.assertNotIsInstance( + _get_attrs(prepared, node.next.target), FakeQuantizeBase) + + def test_del_fakequant_before_function(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + + model_to_quantize = ToyModel() + model_to_quantize.eval() + + self.swap_ff_with_fxff(model_to_quantize) + traced_graph = self.tracer.trace(model_to_quantize) + graph_module = build_graphmodule(model_to_quantize, traced_graph) + + graph_module = _fuse_fx( + graph_module=graph_module, + is_qat=True, + backend_config=self.backend_config) + prepared = prepare( + model=graph_module, + qconfig_mapping=self.qconfig_mapping, + is_qat=True, + node_name_to_scope=self.tracer.node_name_to_scope, + example_inputs=self.example_inputs, + backend_config=self.backend_config) + + function_del_prev_fakequant = (operator.add, ) + + prepared_after_del = del_fakequant_before_function( + prepared, function_del_prev_fakequant, inplace=False) + for node in prepared.graph.nodes: + if node.op == 'call_function' and \ + node.target in function_del_prev_fakequant: + args = node.args + self.assertIsInstance( + _get_attrs(prepared, args[0].target), FakeQuantizeBase) + + for node in prepared_after_del.graph.nodes: + if node.op == 'call_function' and \ + node.target in function_del_prev_fakequant: + 
args = node.args + self.assertEqual(len(args), 2) + self.assertNotIsInstance( + _get_attrs(prepared, args[0].target), FakeQuantizeBase) + self.assertNotIsInstance( + _get_attrs(prepared, args[1].target), FakeQuantizeBase) + + prepared_after_del = del_fakequant_before_function( + prepared, function_del_prev_fakequant, inplace=True) + for node in prepared_after_del.graph.nodes: + if node.op == 'call_function' and \ + node.target in function_del_prev_fakequant: + args = node.args + self.assertEqual(len(args), 2) + self.assertNotIsInstance( + _get_attrs(prepared, args[0].target), FakeQuantizeBase) + self.assertNotIsInstance( + _get_attrs(prepared, args[1].target), FakeQuantizeBase) + + def test_del_fakequant_after_function(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + + model_to_quantize = ToyModel() + model_to_quantize.eval() + + self.swap_ff_with_fxff(model_to_quantize) + traced_graph = self.tracer.trace(model_to_quantize) + graph_module = build_graphmodule(model_to_quantize, traced_graph) + + graph_module = _fuse_fx( + graph_module=graph_module, + is_qat=True, + backend_config=self.backend_config) + prepared = prepare( + model=graph_module, + qconfig_mapping=self.qconfig_mapping, + is_qat=True, + node_name_to_scope=self.tracer.node_name_to_scope, + example_inputs=self.example_inputs, + backend_config=self.backend_config) + + function_del_next_fakequant = (operator.add, ) + + prepared_after_del = del_fakequant_after_function( + prepared, function_del_next_fakequant, inplace=False) + for node in prepared.graph.nodes: + if node.op == 'call_function' and \ + node.target in function_del_next_fakequant: + self.assertIsInstance( + _get_attrs(prepared, node.next.target), FakeQuantizeBase) + + for node in prepared_after_del.graph.nodes: + if node.op == 'call_function' and \ + node.target in function_del_next_fakequant: + self.assertNotIsInstance( + _get_attrs(prepared, node.next.target), 
FakeQuantizeBase) + + prepared_after_del = del_fakequant_after_function( + prepared, function_del_next_fakequant, inplace=True) + for node in prepared_after_del.graph.nodes: + if node.op == 'call_function' and \ + node.target in function_del_next_fakequant: + self.assertNotIsInstance( + _get_attrs(prepared, node.next.target), FakeQuantizeBase) + + def test_del_fakequant_before_module(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + + model_to_quantize = ToyModel() + model_to_quantize.eval() + + self.swap_ff_with_fxff(model_to_quantize) + traced_graph = self.tracer.trace(model_to_quantize) + graph_module = build_graphmodule(model_to_quantize, traced_graph) + + graph_module = _fuse_fx( + graph_module=graph_module, + is_qat=True, + backend_config=self.backend_config) + prepared = prepare( + model=graph_module, + qconfig_mapping=self.qconfig_mapping, + is_qat=True, + node_name_to_scope=self.tracer.node_name_to_scope, + example_inputs=self.example_inputs, + backend_config=self.backend_config) + + module_del_prev_fakequant = (torch.nn.ReLU6, torch.nn.Identity) + + prepared_after_del = del_fakequant_before_module( + prepared, module_del_prev_fakequant, inplace=False) + for node in prepared.graph.nodes: + if node.op == 'call_module' and isinstance( + _get_attrs(prepared, node.target), + module_del_prev_fakequant): + args = node.args + self.assertIsInstance( + _get_attrs(prepared, args[0].target), FakeQuantizeBase) + + for node in prepared_after_del.graph.nodes: + if node.op == 'call_module' and isinstance( + _get_attrs(prepared, node.target), + module_del_prev_fakequant): + args = node.args + if args[0].op == 'call_module': + self.assertNotIsInstance( + _get_attrs(prepared, args[0].target), FakeQuantizeBase) + + prepared_after_del = del_fakequant_before_module( + prepared, module_del_prev_fakequant, inplace=True) + for node in prepared_after_del.graph.nodes: + if node.op == 'call_module' and isinstance( 
+ _get_attrs(prepared, node.target), + module_del_prev_fakequant): + args = node.args + if args[0].op == 'call_module': + self.assertNotIsInstance( + _get_attrs(prepared, args[0].target), FakeQuantizeBase) + + def test_del_fakequant_after_module(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + + model_to_quantize = ToyModel() + model_to_quantize.eval() + + self.swap_ff_with_fxff(model_to_quantize) + traced_graph = self.tracer.trace(model_to_quantize) + graph_module = build_graphmodule(model_to_quantize, traced_graph) + + graph_module = _fuse_fx( + graph_module=graph_module, + is_qat=True, + backend_config=self.backend_config) + prepared = prepare( + model=graph_module, + qconfig_mapping=self.qconfig_mapping, + is_qat=True, + node_name_to_scope=self.tracer.node_name_to_scope, + example_inputs=self.example_inputs, + backend_config=self.backend_config) + + module_del_next_fakequant = (torch.nn.MaxPool2d, ) + + prepared_after_del = del_fakequant_after_module( + prepared, module_del_next_fakequant, inplace=False) + for node in prepared.graph.nodes: + if node.op == 'call_module' and isinstance( + _get_attrs(prepared, node.target), + module_del_next_fakequant): + self.assertIsInstance( + _get_attrs(prepared, node.next.target), FakeQuantizeBase) + + for node in prepared_after_del.graph.nodes: + if node.op == 'call_module' and isinstance( + _get_attrs(prepared, node.target), + module_del_next_fakequant): + self.assertNotIsInstance( + _get_attrs(prepared, node.next.target), FakeQuantizeBase) + + prepared_after_del = del_fakequant_after_module( + prepared, module_del_next_fakequant, inplace=True) + for node in prepared_after_del.graph.nodes: + if node.op == 'call_module' and isinstance( + _get_attrs(prepared, node.target), + module_del_next_fakequant): + self.assertNotIsInstance( + _get_attrs(prepared, node.next.target), FakeQuantizeBase) diff --git 
a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_task_modules/test_predictors/test_metric_predictor.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_task_modules/test_predictors/test_metric_predictor.py new file mode 100644 index 0000000000000000000000000000000000000000..5da4ab4d1a6e1cb8716a0c79f8540ccbabc43f90 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_task_modules/test_predictors/test_metric_predictor.py @@ -0,0 +1,196 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import tempfile +from unittest import TestCase + +import numpy as np +import torch.nn as nn +from mmengine.model import BaseModel + +from mmrazor.models import OneShotMutableOP +from mmrazor.registry import TASK_UTILS + +convs = nn.ModuleDict({ + 'conv1': nn.Conv2d(3, 8, 1), + 'conv2': nn.Conv2d(3, 8, 1), + 'conv3': nn.Conv2d(3, 8, 1), +}) +MutableOP = OneShotMutableOP(convs) + + +class ToyModel(BaseModel): + + def __init__(self, data_preprocessor=None): + super().__init__(data_preprocessor=data_preprocessor, init_cfg=None) + self.mutable = MutableOP + self.bn = nn.BatchNorm2d(8) + + def forward(self, batch_inputs, data_samples=None, mode='tensor'): + if mode == 'loss': + out = self.bn(self.mutable(batch_inputs)) + return dict(loss=out) + elif mode == 'predict': + out = self.bn(self.mutable(batch_inputs)) + 1 + return out + elif mode == 'tensor': + out = self.bn(self.mutable(batch_inputs)) + 2 + return out + + +class TestMetricPredictorWithGP(TestCase): + + def setUp(self) -> None: + self.temp_dir = tempfile.mkdtemp() + self.search_groups = {0: [MutableOP]} + self.candidates = [{0: 'conv1'}, {0: 'conv2'}, {0: 'conv3'}] + predictor_cfg = dict( + type='MetricPredictor', + handler_cfg=dict(type='GaussProcessHandler'), + search_groups=self.search_groups, + train_samples=4, + ) + self.predictor = TASK_UTILS.build(predictor_cfg) + self.model = ToyModel() + + def generate_data(self): + inputs = [] + for candidate in self.candidates: + 
inputs.append(self.predictor.model2vector(candidate)) + inputs = np.array(inputs) + labels = np.random.rand(3) + return inputs, labels + + def test_init_predictor(self): + self.model.mutable.current_choice = 'conv1' + inputs, labels = self.generate_data() + self.assertFalse(self.predictor.initialize) + self.predictor.fit(inputs, labels) + self.assertTrue(self.predictor.initialize) + + def test_predictor(self): + self.model.mutable.current_choice = 'conv1' + inputs, labels = self.generate_data() + self.predictor.fit(inputs, labels) + + metrics = self.predictor.predict(self.model) + self.assertIsInstance(metrics, dict) + self.assertGreater(metrics['accuracy_top-1'], 0.0) + + +class TestMetricPredictorWithCart(TestCase): + + def setUp(self) -> None: + self.temp_dir = tempfile.mkdtemp() + self.search_groups = {0: [MutableOP]} + self.candidates = [{0: 'conv1'}, {0: 'conv2'}, {0: 'conv3'}] + predictor_cfg = dict( + type='MetricPredictor', + handler_cfg=dict(type='CartsHandler'), + search_groups=self.search_groups, + train_samples=4, + ) + self.predictor = TASK_UTILS.build(predictor_cfg) + self.model = ToyModel() + + def generate_data(self): + inputs = [] + for candidate in self.candidates: + inputs.append(self.predictor.model2vector(candidate)) + inputs = np.array(inputs) + labels = np.random.rand(3) + return inputs, labels + + def test_init_predictor(self): + self.model.mutable.current_choice = 'conv1' + inputs, labels = self.generate_data() + self.assertFalse(self.predictor.initialize) + self.predictor.fit(inputs, labels) + self.assertTrue(self.predictor.initialize) + + def test_predictor(self): + self.model.mutable.current_choice = 'conv1' + inputs, labels = self.generate_data() + self.predictor.fit(inputs, labels) + + metrics = self.predictor.predict(self.model) + self.assertIsInstance(metrics, dict) + self.assertGreater(metrics['accuracy_top-1'], 0.0) + + +class TestMetricPredictorWithRBF(TestCase): + + def setUp(self) -> None: + self.temp_dir = tempfile.mkdtemp() + 
self.search_groups = {0: [MutableOP]} + self.candidates = [{0: 'conv1'}, {0: 'conv2'}, {0: 'conv3'}] + predictor_cfg = dict( + type='MetricPredictor', + handler_cfg=dict(type='RBFHandler'), + search_groups=self.search_groups, + train_samples=4, + ) + self.predictor = TASK_UTILS.build(predictor_cfg) + self.model = ToyModel() + + def generate_data(self): + inputs = [] + for candidate in self.candidates: + inputs.append(self.predictor.model2vector(candidate)) + inputs = np.array(inputs) + labels = np.random.rand(3) + return inputs, labels + + def test_init_predictor(self): + self.model.mutable.current_choice = 'conv1' + inputs, labels = self.generate_data() + self.assertFalse(self.predictor.initialize) + self.predictor.fit(inputs, labels) + self.assertTrue(self.predictor.initialize) + + def test_predictor(self): + self.model.mutable.current_choice = 'conv1' + inputs, labels = self.generate_data() + self.predictor.fit(inputs, labels) + + metrics = self.predictor.predict(self.model) + self.assertIsInstance(metrics, dict) + self.assertGreater(metrics['accuracy_top-1'], 0.0) + + +class TestMetricPredictorWithMLP(TestCase): + + def setUp(self) -> None: + self.temp_dir = tempfile.mkdtemp() + self.search_groups = {0: [MutableOP]} + self.candidates = [{0: 'conv1'}, {0: 'conv2'}, {0: 'conv3'}] + predictor_cfg = dict( + type='MetricPredictor', + handler_cfg=dict(type='MLPHandler'), + search_groups=self.search_groups, + train_samples=4, + ) + self.predictor = TASK_UTILS.build(predictor_cfg) + self.model = ToyModel() + + def generate_data(self): + inputs = [] + for candidate in self.candidates: + inputs.append(self.predictor.model2vector(candidate)) + inputs = np.array(inputs) + labels = np.random.rand(3) + return inputs, labels + + def test_init_predictor(self): + self.model.mutable.current_choice = 'conv1' + inputs, labels = self.generate_data() + self.assertFalse(self.predictor.initialize) + self.predictor.fit(inputs, labels) + self.assertTrue(self.predictor.initialize) + + 
def test_predictor(self): + self.model.mutable.current_choice = 'conv1' + inputs, labels = self.generate_data() + self.predictor.fit(inputs, labels) + + metrics = self.predictor.predict(self.model) + self.assertIsInstance(metrics, dict) + self.assertGreater(metrics['accuracy_top-1'], 0.0) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_utils/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_utils/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ef101fec61e72abc0eb90266d453b5b22331378d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_utils/__init__.py @@ -0,0 +1 @@ +# Copyright (c) OpenMMLab. All rights reserved. diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_utils/test_expandable_utils/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_utils/test_expandable_utils/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ef101fec61e72abc0eb90266d453b5b22331378d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_utils/test_expandable_utils/__init__.py @@ -0,0 +1 @@ +# Copyright (c) OpenMMLab. All rights reserved. diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_utils/test_expandable_utils/test_expand.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_utils/test_expandable_utils/test_expand.py new file mode 100644 index 0000000000000000000000000000000000000000..f8f3b82a8be9eb5af00dd6c0f1141ae4b79e037a --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_models/test_utils/test_expandable_utils/test_expand.py @@ -0,0 +1,64 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import unittest + +import torch + +from mmrazor import digit_version +from mmrazor.models.mutables import SimpleMutableChannel +from mmrazor.models.utils.expandable_utils import ( + expand_expandable_dynamic_model, make_channel_divisible, + to_expandable_model) +from mmrazor.models.utils.expandable_utils.ops import ExpandLinear +from ....data.models import DwConvModel, MultiConcatModel, SingleLineModel + + +class TestExpand(unittest.TestCase): + + def check_torch_version(self): + if digit_version(torch.__version__) < digit_version('1.12.0'): + self.skipTest('version of torch < 1.12.0') + + def test_expand(self): + self.check_torch_version() + for Model in [MultiConcatModel, DwConvModel]: + x = torch.rand([1, 3, 224, 224]) + model = Model() + print(model) + mutator = to_expandable_model(model) + print(mutator.choice_template) + print(model) + y1 = model(x) + + for unit in mutator.mutable_units: + unit.expand(10) + print(unit.mutable_channel.mask.shape) + expand_expandable_dynamic_model(model, zero=True) + print(model) + y2 = model(x) + self.assertTrue((y1 - y2).abs().max() < 1e-3) + + def test_expand_static_model(self): + self.check_torch_version() + x = torch.rand([1, 3, 224, 224]) + model = SingleLineModel() + y1 = model(x) + make_channel_divisible(model, divisor=4) + y2 = model(x) + print(y1.reshape([-1])[:5]) + print(y2.reshape([-1])[:5]) + self.assertTrue((y1 - y2).abs().max() < 1e-3) + + def test_ExpandConv2d(self): + self.check_torch_version() + linear = ExpandLinear(3, 3) + mutable_in = SimpleMutableChannel(3) + mutable_out = SimpleMutableChannel(3) + linear.register_mutable_attr('in_channels', mutable_in) + linear.register_mutable_attr('out_channels', mutable_out) + + print(linear.weight) + + mutable_in.mask = torch.tensor([1.0, 1.0, 0.0, 1.0, 0.0]) + mutable_out.mask = torch.tensor([1.0, 1.0, 0.0, 1.0, 0.0]) + linear_ex = linear.expand(zero=True) + print(linear_ex.weight) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_registry/test_registry.py 
b/cv/distiller/CWD/pytorch/mmrazor/tests/test_registry/test_registry.py new file mode 100644 index 0000000000000000000000000000000000000000..c8340f3525bf650f91d6bc27db6a25f4c9a16d24 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_registry/test_registry.py @@ -0,0 +1,144 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import unittest +from typing import Dict, Optional, Union +from unittest import TestCase + +import torch.nn as nn +from mmengine import fileio +from mmengine.config import Config +from mmengine.model import BaseModel + +from mmrazor.models import * # noqa: F403, F401 +from mmrazor.models.algorithms.base import BaseAlgorithm +from mmrazor.models.mutables import OneShotMutableOP +from mmrazor.registry import MODELS +from mmrazor.structures import load_fix_subnet +from mmrazor.utils import ValidFixMutable + + +@MODELS.register_module() +class MockModel(BaseModel): + + def __init__(self): + super().__init__() + convs1 = nn.ModuleDict({ + 'conv1': nn.Conv2d(3, 8, 1), + 'conv2': nn.Conv2d(3, 8, 1), + 'conv3': nn.Conv2d(3, 8, 1), + }) + convs2 = nn.ModuleDict({ + 'conv1': nn.Conv2d(8, 16, 1), + 'conv2': nn.Conv2d(8, 16, 1), + 'conv3': nn.Conv2d(8, 16, 1), + }) + + self.mutable1 = OneShotMutableOP(convs1) + self.mutable2 = OneShotMutableOP(convs2) + + def forward(self, x): + x = self.mutable1(x) + x = self.mutable2(x) + return x + + +@MODELS.register_module() +class MockAlgorithm(BaseAlgorithm): + + def __init__(self, + architecture: Union[BaseModel, Dict], + fix_subnet: Optional[ValidFixMutable] = None): + super().__init__(architecture) + + if fix_subnet is not None: + # According to fix_subnet, delete the unchosen part of supernet + load_fix_subnet(self, fix_subnet, prefix='architecture.') + self.is_supernet = False + else: + self.is_supernet = True + + +class TestRegistry(TestCase): + + def setUp(self) -> None: + self.arch_cfg_path = dict( + cfg_path='mmdet::faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py', + pretrained=False) + + return 
super().setUp() + + def test_build_razor_from_cfg(self): + # test cfg_path + # TODO relay on mmengine:HAOCHENYE/config_new_feature + # model = MODELS.build(self.arch_cfg_path) + # self.assertIsNotNone(model) + + # test fix subnet + cfg = Config.fromfile( + 'tests/data/test_registry/registry_subnet_config.py') + model = MODELS.build(cfg.model) + + # test return architecture + cfg = Config.fromfile( + 'tests/data/test_registry/registry_architecture_config.py') + model = MODELS.build(cfg.model) + self.assertTrue(isinstance(model, BaseModel)) + + def test_build_subnet_prune_from_cfg(self): + mutator_cfg = fileio.load('tests/data/test_registry/subnet.json') + init_cfg = dict( + type='Pretrained', + checkpoint='tests/data/test_registry/subnet_weight.pth') + # test fix subnet + model_cfg = dict( + # use mmrazor's build_func + type='mmrazor.sub_model', + cfg=dict( + cfg_path='mmcls::resnet/resnet50_8xb32_in1k.py', + pretrained=False), + fix_subnet=mutator_cfg, + mode='mutator', + init_cfg=init_cfg) + model = MODELS.build(model_cfg) + self.assertTrue(isinstance(model, BaseModel)) + + def test_build_subnet_prune_from_cfg_by_mutator(self): + mutator_cfg = fileio.load('tests/data/test_registry/subnet.json') + init_cfg = dict( + type='Pretrained', + checkpoint='tests/data/test_registry/subnet_weight.pth') + # test fix subnet + model_cfg = dict( + # use mmrazor's build_func + type='mmrazor.sub_model', + cfg=dict( + cfg_path='mmcls::resnet/resnet50_8xb32_in1k.py', + pretrained=False), + fix_subnet=mutator_cfg, + mode='mutator', + init_cfg=init_cfg) + model = MODELS.build(model_cfg) + self.assertTrue(isinstance(model, BaseModel)) + # make sure the model is pruned + assert model.backbone.layer1[0].conv1.weight.size()[0] == 41 + + def test_build_subnet_prune_from_cfg_by_mutable(self): + mutator_cfg = fileio.load('tests/data/test_registry/subnet.json') + init_cfg = dict( + type='Pretrained', + checkpoint='tests/data/test_registry/subnet_weight.pth') + # test fix subnet + model_cfg = 
dict( + # use mmrazor's build_func + type='mmrazor.sub_model', + cfg=dict( + cfg_path='mmcls::resnet/resnet50_8xb32_in1k.py', + pretrained=False), + fix_subnet=mutator_cfg, + mode='mutable', + init_cfg=init_cfg) + model = MODELS.build(model_cfg) + self.assertTrue(isinstance(model, BaseModel)) + + +if __name__ == '__main__': + unittest.main() diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_runners/test_autoslim_greedy_search_loop.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_runners/test_autoslim_greedy_search_loop.py new file mode 100644 index 0000000000000000000000000000000000000000..87fab9939dc902ed2425dc4b5c2013a5b8b55991 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_runners/test_autoslim_greedy_search_loop.py @@ -0,0 +1,187 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import copy +import shutil +import tempfile +from typing import Dict, List, Tuple, Union +from unittest import TestCase +from unittest.mock import MagicMock + +import torch +import torch.nn as nn +import torch.nn.functional as F +from mmcls.structures import ClsDataSample +from mmengine.config import Config +from torch.utils.data import DataLoader, Dataset + +from mmrazor.engine import AutoSlimGreedySearchLoop +from mmrazor.models.algorithms import AutoSlim +from mmrazor.registry import LOOPS + +MUTATOR_TYPE = Union[torch.nn.Module, Dict] +DISTILLER_TYPE = Union[torch.nn.Module, Dict] + + +def collate_fn(data_batch): + return data_batch[0] + + +class ToyDataset(Dataset): + METAINFO = dict() # type: ignore + data = [torch.randn(2, 3, 4, 4)] * 4 + label = [[ClsDataSample().set_gt_label(torch.randint(0, 1000, (2, )))] + for _ in range(4)] + + @property + def metainfo(self): + return self.METAINFO + + def __len__(self): + return len(self.data) + + def __getitem__(self, index): + return dict(inputs=self.data[index], data_samples=self.label[index]) + + +ARCHITECTURE_CFG = dict( + _scope_='mmcls', + type='ImageClassifier', + backbone=dict(type='MobileNetV2', 
widen_factor=1.5), + neck=dict(type='GlobalAveragePooling'), + head=dict( + type='mmcls.LinearClsHead', + num_classes=1000, + in_channels=1920, + loss=dict(type='mmcls.CrossEntropyLoss', loss_weight=1.0), + topk=(1, 5))) + +MUTATOR_CFG = dict( + type='OneShotChannelMutator', + channel_unit_cfg=dict( + type='OneShotMutableChannelUnit', + default_args=dict( + candidate_choices=list(i / 12 for i in range(2, 13)), + choice_mode='ratio'))) + +DISTILLER_CFG = dict( + type='ConfigurableDistiller', + teacher_recorders=dict(fc=dict(type='ModuleOutputs', source='head.fc')), + student_recorders=dict(fc=dict(type='ModuleOutputs', source='head.fc')), + distill_losses=dict( + loss_kl=dict(type='KLDivergence', tau=1, loss_weight=1)), + loss_forward_mappings=dict( + loss_kl=dict( + preds_S=dict(recorder='fc', from_student=True), + preds_T=dict(recorder='fc', from_student=False)))) + + +class ToyDataPreprocessor(torch.nn.Module): + + def forward( + self, + data: Dict, + training: bool = True) -> Tuple[torch.Tensor, List[ClsDataSample]]: + return data + + +class Net(nn.Module): + + def __init__(self): + super(Net, self).__init__() + self.conv = nn.Conv2d(3, 100, 3) + + def forward(self, x): + out = F.conv2d( + x, + weight=self.conv.weight, + bias=self.conv.bias, + stride=self.conv.stride, + padding=self.conv.padding, + dilation=self.conv.dilation, + groups=self.conv.groups) + return out + + +class ToyRunner: + + @property + def distributed(self): + pass + + @property + def rank(self): + pass + + @property + def epoch(self): + pass + + @property + def work_dir(self): + pass + + def model(self): + pass + + def logger(self): + pass + + def call_hook(self, fn_name: str): + pass + + def visualizer(self): + pass + + +class TestAutoSlimGreedySearchLoop(TestCase): + device: str = 'cpu' + + def setUp(self): + self.temp_dir = tempfile.mkdtemp() + train_cfg = dict(type='AutoSlimGreedySearchLoop', target_flops=(700, )) + self.train_cfg = Config(train_cfg) + self.runner = 
MagicMock(spec=ToyRunner) + self.runner.model = self.prepare_model(MUTATOR_CFG, DISTILLER_CFG, + ARCHITECTURE_CFG) + self.runner.distributed = False + self.dataloader = DataLoader(ToyDataset(), collate_fn=collate_fn) + self.evaluator = MagicMock() + + def prepare_model(self, + mutator_cfg: MUTATOR_TYPE = MUTATOR_CFG, + distiller_cfg: DISTILLER_TYPE = DISTILLER_CFG, + architecture_cfg: Dict = ARCHITECTURE_CFG, + num_random_samples: int = 2) -> AutoSlim: + model = AutoSlim( + mutator=mutator_cfg, + distiller=distiller_cfg, + architecture=architecture_cfg, + data_preprocessor=ToyDataPreprocessor(), + num_random_samples=num_random_samples) + model.to(self.device) + + return model + + def tearDown(self): + shutil.rmtree(self.temp_dir) + + def test_init(self) -> None: + loop_cfg = copy.deepcopy(self.train_cfg) + loop_cfg.runner = self.runner + loop_cfg.dataloader = self.dataloader + loop_cfg.evaluator = self.evaluator + loop = LOOPS.build(loop_cfg) + self.assertIsInstance(loop, AutoSlimGreedySearchLoop) + + def test_run(self): + # test_run_epoch: distributed == False + loop_cfg = copy.deepcopy(self.train_cfg) + loop_cfg.runner = self.runner + loop_cfg.dataloader = self.dataloader + loop_cfg.evaluator = self.evaluator + loop = LOOPS.build(loop_cfg) + self.runner.rank = 0 + loop._epoch = 1 + self.runner.distributed = False + self.runner.work_dir = self.temp_dir + loop.run() + self.assertEqual(len(loop.searched_subnet), 1) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_runners/test_darts_loop.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_runners/test_darts_loop.py new file mode 100644 index 0000000000000000000000000000000000000000..70255b206b8729f7d3e69945ba30e2808202194d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_runners/test_darts_loop.py @@ -0,0 +1,258 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import copy +import shutil +import tempfile +from unittest import TestCase + +import torch +import torch.nn as nn +from mmengine.config import Config +from mmengine.hooks import Hook +from mmengine.model import BaseDataPreprocessor, BaseModel +from mmengine.runner import Runner +from torch.utils.data import DataLoader, Dataset + +from mmrazor.engine import DartsEpochBasedTrainLoop # noqa: F401 +from mmrazor.engine import DartsIterBasedTrainLoop # noqa: F401 +from mmrazor.registry import DATASETS, HOOKS, MODELS + + +class ToyDataPreprocessor(BaseDataPreprocessor): + + def collate_data(self, data): + data = [_data[0] for _data in data] + inputs = [_data['inputs'].to(self._device) for _data in data] + batch_data_samples = [] + # Model can get predictions without any data samples. + for _data in data: + if 'data_samples' in _data: + batch_data_samples.append(_data['data_samples']) + # Move data from CPU to corresponding device. + batch_data_samples = [ + data_sample.to(self._device) for data_sample in batch_data_samples + ] + + if not batch_data_samples: + batch_data_samples = None # type: ignore + + return inputs, batch_data_samples + + +@MODELS.register_module() +class ToyModel_DartsLoop(BaseModel): + + def __init__(self): + super().__init__() + self.linear1 = nn.Linear(2, 2) + self.linear2 = nn.Linear(2, 1) + + def train_step(self, data, optim_wrapper=None): + + data1, data2 = data + _ = self._run_forward(data1, mode='loss') + losses = self._run_forward(data2, mode='loss') + parsed_losses, log_vars = self.parse_losses(losses) + return log_vars + + def forward(self, inputs, data_samples, mode='tensor'): + batch_inputs = torch.stack(inputs).to(self.linear1.weight.device) + labels = torch.stack(data_samples).to(self.linear1.weight.device) + outputs = self.linear1(batch_inputs) + outputs = self.linear2(outputs) + + if mode == 'tensor': + return outputs + elif mode == 'loss': + loss = (labels - outputs).sum() + outputs = dict(loss=loss) + return outputs + elif mode == 
'predict': + outputs = dict(log_vars=dict(a=1, b=0.5)) + return outputs + + +@DATASETS.register_module() +class ToyDataset_DartsLoop(Dataset): + METAINFO = dict() # type: ignore + data = torch.randn(12, 2) + label = torch.ones(12) + + @property + def metainfo(self): + return self.METAINFO + + def __len__(self): + return self.data.size(0) + + def __getitem__(self, index): + return dict(inputs=self.data[index], data_samples=self.label[index]) + + +class TestDartsLoop(TestCase): + + def setUp(self): + self.temp_dir = tempfile.mkdtemp() + epoch_based_cfg = dict( + default_scope='mmrazor', + model=dict(type='ToyModel_DartsLoop'), + work_dir=self.temp_dir, + train_dataloader=dict( + dataset=dict(type='ToyDataset_DartsLoop'), + sampler=dict(type='DefaultSampler', shuffle=True), + batch_size=3, + num_workers=0), + optim_wrapper=dict( + type='OptimWrapper', optimizer=dict(type='SGD', lr=0.01)), + param_scheduler=dict(type='MultiStepLR', milestones=[1, 2]), + train_cfg=dict( + type='DartsEpochBasedTrainLoop', + max_epochs=3, + val_interval=1, + val_begin=2), + custom_hooks=[], + default_hooks=dict( + runtime_info=dict(type='RuntimeInfoHook'), + timer=dict(type='IterTimerHook'), + logger=dict(type='LoggerHook'), + param_scheduler=dict(type='ParamSchedulerHook'), + checkpoint=dict( + type='CheckpointHook', interval=1, by_epoch=True), + sampler_seed=dict(type='DistSamplerSeedHook')), + launcher='none', + env_cfg=dict(dist_cfg=dict(backend='nccl')), + ) + self.epoch_based_cfg = Config(epoch_based_cfg) + self.epoch_based_cfg.train_cfg['mutator_dataloader'] = \ + self.epoch_based_cfg.train_dataloader + self.iter_based_cfg = copy.deepcopy(self.epoch_based_cfg) + self.iter_based_cfg.train_dataloader = dict( + dataset=dict(type='ToyDataset_DartsLoop'), + sampler=dict(type='InfiniteSampler', shuffle=True), + batch_size=3, + num_workers=0) + self.iter_based_cfg.train_cfg = dict( + type='DartsIterBasedTrainLoop', + mutator_dataloader=self.iter_based_cfg.train_dataloader, + max_iters=12, 
+ val_interval=4, + val_begin=4) + self.iter_based_cfg.default_hooks = dict( + runtime_info=dict(type='RuntimeInfoHook'), + timer=dict(type='IterTimerHook'), + logger=dict(type='LoggerHook'), + param_scheduler=dict(type='ParamSchedulerHook'), + checkpoint=dict(type='CheckpointHook', interval=1, by_epoch=False), + sampler_seed=dict(type='DistSamplerSeedHook')) + + def tearDown(self): + shutil.rmtree(self.temp_dir) + + def test_init(self): + # 1. DartsEpochBasedTrainLoop + cfg = copy.deepcopy(self.epoch_based_cfg) + cfg.experiment_name = 'test_init1' + runner = Runner.from_cfg(cfg) + loop = runner.build_train_loop(cfg.train_cfg) + + self.assertIsInstance(loop, DartsEpochBasedTrainLoop) + self.assertIsInstance(loop.runner, Runner) + self.assertEqual(loop.max_epochs, 3) + self.assertEqual(loop.max_iters, 12) + self.assertIsInstance(loop.mutator_dataloader, DataLoader) + + # 2. DartsIterBasedTrainLoop + cfg = copy.deepcopy(self.iter_based_cfg) + cfg.experiment_name = 'test_init2' + runner = Runner.from_cfg(cfg) + loop = runner.build_train_loop(cfg.train_cfg) + + self.assertIsInstance(loop, DartsIterBasedTrainLoop) + self.assertIsInstance(loop.runner, Runner) + self.assertEqual(loop.max_iters, 12) + self.assertIsInstance(loop.mutator_dataloader, DataLoader) + + def test_run(self): + # 1. 
test DartsEpochBasedTrainLoop + epoch_results = [] + epoch_targets = [i for i in range(3)] + iter_results = [] + iter_targets = [i for i in range(4 * 3)] + batch_idx_results = [] + batch_idx_targets = [i for i in range(4)] * 3 # train and val + val_epoch_results = [] + val_epoch_targets = [i for i in range(2, 4)] + + @HOOKS.register_module() + class TestEpochHook(Hook): + + def before_train_epoch(self, runner): + epoch_results.append(runner.epoch) + + def before_train_iter(self, runner, batch_idx, data_batch=None): + iter_results.append(runner.iter) + batch_idx_results.append(batch_idx) + + def before_val_epoch(self, runner): + val_epoch_results.append(runner.epoch) + + cfg = copy.deepcopy(self.epoch_based_cfg) + cfg.experiment_name = 'test_train1' + cfg.custom_hooks = [dict(type='TestEpochHook', priority=50)] + runner = Runner.from_cfg(cfg) + runner.train() + + assert isinstance(runner.train_loop, DartsEpochBasedTrainLoop) + for result, target, in zip(epoch_results, epoch_targets): + self.assertEqual(result, target) + for result, target, in zip(iter_results, iter_targets): + self.assertEqual(result, target) + for result, target, in zip(batch_idx_results, batch_idx_targets): + self.assertEqual(result, target) + for result, target, in zip(val_epoch_results, val_epoch_targets): + self.assertEqual(result, target) + + # 2. 
test DartsIterBasedTrainLoop + epoch_results = [] + iter_results = [] + batch_idx_results = [] + val_iter_results = [] + val_batch_idx_results = [] + iter_targets = [i for i in range(12)] + batch_idx_targets = [i for i in range(12)] + val_iter_targets = [i for i in range(4, 12)] + val_batch_idx_targets = [i for i in range(4)] * 2 + + @HOOKS.register_module() + class TestIterHook(Hook): + + def before_train_epoch(self, runner): + epoch_results.append(runner.epoch) + + def before_train_iter(self, runner, batch_idx, data_batch=None): + iter_results.append(runner.iter) + batch_idx_results.append(batch_idx) + + def before_val_iter(self, runner, batch_idx, data_batch=None): + val_epoch_results.append(runner.iter) + val_batch_idx_results.append(batch_idx) + + cfg = copy.deepcopy(self.iter_based_cfg) + cfg.experiment_name = 'test_train2' + cfg.custom_hooks = [dict(type='TestIterHook', priority=50)] + runner = Runner.from_cfg(cfg) + runner.train() + + assert isinstance(runner.train_loop, DartsIterBasedTrainLoop) + self.assertEqual(len(epoch_results), 1) + self.assertEqual(epoch_results[0], 0) + self.assertEqual(runner.val_interval, 4) + self.assertEqual(runner.val_begin, 4) + for result, target, in zip(iter_results, iter_targets): + self.assertEqual(result, target) + for result, target, in zip(batch_idx_results, batch_idx_targets): + self.assertEqual(result, target) + for result, target, in zip(val_iter_results, val_iter_targets): + self.assertEqual(result, target) + for result, target, in zip(val_batch_idx_results, + val_batch_idx_targets): + self.assertEqual(result, target) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_runners/test_distill_val_loop.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_runners/test_distill_val_loop.py new file mode 100644 index 0000000000000000000000000000000000000000..49fa9ace445b19ff144491ad4d6285991742c3b0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_runners/test_distill_val_loop.py @@ -0,0 +1,180 @@ +# 
# Copyright (c) OpenMMLab. All rights reserved.
import copy
import shutil
import tempfile
from unittest import TestCase
from unittest.mock import MagicMock

import torch
import torch.nn as nn
from mmengine.config import Config
from mmengine.evaluator import BaseMetric
from mmengine.model import BaseModel
from mmengine.runner import Runner
from torch.utils.data import Dataset

from mmrazor.engine import SelfDistillValLoop  # noqa: F401
from mmrazor.engine import SingleTeacherDistillValLoop
from mmrazor.registry import DATASETS, METRICS, MODELS


@MODELS.register_module()
class ToyModel_DistillValLoop(BaseModel):
    """Two-layer toy model carrying a mocked ``teacher`` attribute."""

    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(2, 2)
        self.linear2 = nn.Linear(2, 1)
        # The distill val loops only need a teacher to exist; a mock suffices.
        self.teacher = MagicMock()

    def forward(self, inputs, data_samples, mode='tensor'):
        """Support the three mmengine run modes with toy outputs."""
        batch = torch.stack(inputs)
        targets = torch.stack(data_samples)
        preds = self.linear2(self.linear1(batch))

        if mode == 'tensor':
            return preds
        if mode == 'loss':
            return dict(loss=(targets - preds).sum())
        if mode == 'predict':
            return dict(log_vars=dict(a=1, b=0.5))


@DATASETS.register_module()
class ToyDataset_DistillValLoop(Dataset):
    """Fixed 12-sample in-memory dataset."""

    METAINFO = dict()  # type: ignore
    data = torch.randn(12, 2)
    label = torch.ones(12)

    @property
    def metainfo(self):
        return self.METAINFO

    def __len__(self):
        return self.data.size(0)

    def __getitem__(self, index):
        return dict(inputs=self.data[index], data_samples=self.label[index])


@METRICS.register_module()
class ToyMetric_DistillValLoop(BaseMetric):
    """Metric stub that always reports ``acc == 1``."""

    def __init__(self, collect_device='cpu', dummy_metrics=None):
        super().__init__(collect_device=collect_device)
        self.dummy_metrics = dummy_metrics

    def process(self, data_samples, predictions):
        self.results.append({'acc': 1})

    def compute_metrics(self, results):
        return dict(acc=1)


class TestSingleTeacherDistillValLoop(TestCase):
    """Build and run ``SingleTeacherDistillValLoop`` end to end."""

    def setUp(self):
        self.temp_dir = tempfile.mkdtemp()

        loader_cfg = dict(
            dataset=dict(type='ToyDataset_DistillValLoop'),
            sampler=dict(type='DefaultSampler', shuffle=False),
            batch_size=3,
            num_workers=0)
        evaluator_cfg = dict(type='ToyMetric_DistillValLoop')

        self.val_loop_cfg = Config(
            dict(
                default_scope='mmrazor',
                model=dict(type='ToyModel_DistillValLoop'),
                work_dir=self.temp_dir,
                val_dataloader=loader_cfg,
                val_evaluator=evaluator_cfg,
                val_cfg=dict(type='SingleTeacherDistillValLoop'),
                custom_hooks=[],
                default_hooks=dict(
                    runtime_info=dict(type='RuntimeInfoHook'),
                    timer=dict(type='IterTimerHook'),
                    logger=dict(type='LoggerHook'),
                    param_scheduler=dict(type='ParamSchedulerHook'),
                    checkpoint=dict(
                        type='CheckpointHook', interval=1, by_epoch=True),
                    sampler_seed=dict(type='DistSamplerSeedHook')),
                launcher='none',
                env_cfg=dict(dist_cfg=dict(backend='nccl')),
            ))

    def tearDown(self):
        shutil.rmtree(self.temp_dir)

    def test_init(self):
        cfg = copy.deepcopy(self.val_loop_cfg)
        cfg.experiment_name = 'test_init'
        runner = Runner.from_cfg(cfg)
        loop = runner.build_val_loop(cfg.val_cfg)

        self.assertIsInstance(loop, SingleTeacherDistillValLoop)

    def test_run(self):
        cfg = copy.deepcopy(self.val_loop_cfg)
        cfg.experiment_name = 'test_run'
        runner = Runner.from_cfg(cfg)
        runner.val()

        # The loop also evaluates the teacher and logs it under a prefix.
        self.assertIn('val/teacher.acc', runner.message_hub.log_scalars.keys())


class TestSelfDistillValLoop(TestCase):
    """Build and run ``SelfDistillValLoop`` end to end."""

    def setUp(self):
        self.temp_dir = tempfile.mkdtemp()

        loader_cfg = dict(
            dataset=dict(type='ToyDataset_DistillValLoop'),
            sampler=dict(type='DefaultSampler', shuffle=False),
            batch_size=3,
            num_workers=0)
        evaluator_cfg = dict(type='ToyMetric_DistillValLoop')

        self.val_loop_cfg = Config(
            dict(
                default_scope='mmrazor',
                model=dict(type='ToyModel_DistillValLoop'),
                work_dir=self.temp_dir,
                val_dataloader=loader_cfg,
                val_evaluator=evaluator_cfg,
                val_cfg=dict(type='SelfDistillValLoop'),
                custom_hooks=[],
                default_hooks=dict(
                    runtime_info=dict(type='RuntimeInfoHook'),
                    timer=dict(type='IterTimerHook'),
                    logger=dict(type='LoggerHook'),
                    param_scheduler=dict(type='ParamSchedulerHook'),
                    checkpoint=dict(
                        type='CheckpointHook', interval=1, by_epoch=True),
                    sampler_seed=dict(type='DistSamplerSeedHook')),
                launcher='none',
                env_cfg=dict(dist_cfg=dict(backend='nccl')),
            ))

    def tearDown(self):
        shutil.rmtree(self.temp_dir)

    def test_init(self):
        cfg = copy.deepcopy(self.val_loop_cfg)
        cfg.experiment_name = 'test_init_self'
        runner = Runner.from_cfg(cfg)
        loop = runner.build_val_loop(cfg.val_cfg)

        self.assertIsInstance(loop, SelfDistillValLoop)

    def test_run(self):
        cfg = copy.deepcopy(self.val_loop_cfg)
        cfg.experiment_name = 'test_run_self'
        runner = Runner.from_cfg(cfg)
        runner.val()
# Copyright (c) OpenMMLab. All rights reserved.
import copy
import os
import shutil
import tempfile
from unittest import TestCase
from unittest.mock import MagicMock, patch

import torch
import torch.nn as nn
from mmengine import fileio
from mmengine.config import Config
from torch.utils.data import DataLoader, Dataset

from mmrazor.engine import EvolutionSearchLoop
from mmrazor.models import OneShotMutableOP
from mmrazor.registry import LOOPS
from mmrazor.structures import Candidates


def collate_fn(data_batch):
    """Identity collate: keep the raw list of samples."""
    return data_batch


class ToyDataset(Dataset):
    """Fixed 12-sample in-memory dataset."""

    METAINFO = dict()  # type: ignore
    data = torch.randn(12, 2)
    label = torch.ones(12)

    @property
    def metainfo(self):
        return self.METAINFO

    def __len__(self):
        return self.data.size(0)

    def __getitem__(self, index):
        return dict(inputs=self.data[index], data_sample=self.label[index])


class ToyModel(nn.Module):
    """Minimal supernet stand-in with an ``architecture`` submodule."""

    def __init__(self):
        super().__init__()
        self.architecture = nn.Conv2d(1, 1, 1)

    def forward(self, x):
        return self.architecture(x)


class ToyRunner:
    """Spec object for ``MagicMock(spec=...)``: attribute surface only."""

    @property
    def distributed(self):
        pass

    @property
    def rank(self):
        pass

    @property
    def epoch(self):
        pass

    @property
    def work_dir(self):
        pass

    def model(self):
        return ToyModel()

    def logger(self):
        pass

    def call_hook(self, fn_name: str):
        pass

    def visualizer(self):
        pass


class TestEvolutionSearchLoop(TestCase):
    """EvolutionSearchLoop: init, one epoch, and a full (resumable) run."""

    def setUp(self):
        self.temp_dir = tempfile.mkdtemp()
        train_cfg = dict(
            type='EvolutionSearchLoop',
            max_epochs=4,
            max_keep_ckpts=3,
            resume_from=None,
            num_candidates=4,
            top_k=2,
            num_mutation=2,
            num_crossover=2,
            mutate_prob=0.1,
            constraints_range=dict(flops=(0, 330)),
            score_key='coco/bbox_mAP')
        self.train_cfg = Config(train_cfg)
        self.runner = MagicMock(spec=ToyRunner)
        self.runner.train_dataloader = MagicMock()
        self.dataloader = DataLoader(ToyDataset(), collate_fn=collate_fn)
        self.evaluator = MagicMock()
        self.calibrate_bn_statistics = MagicMock()

    def tearDown(self):
        shutil.rmtree(self.temp_dir)

    def _mock_torch_save(self):
        """Reversibly replace ``torch.save`` for the current test.

        FIX: the original did ``torch.save = MagicMock()``, permanently
        clobbering ``torch.save`` for every test that runs later in the
        process.  ``patch.object`` + ``addCleanup`` restores it.
        """
        patcher = patch.object(torch, 'save', MagicMock())
        patcher.start()
        self.addCleanup(patcher.stop)

    def test_init(self):
        # test_init: dataloader and evaluator are instances
        loop_cfg = copy.deepcopy(self.train_cfg)
        loop_cfg.runner = self.runner
        loop_cfg.dataloader = self.dataloader
        loop_cfg.evaluator = self.evaluator
        loop = LOOPS.build(loop_cfg)
        self.assertIsInstance(loop, EvolutionSearchLoop)

        # test init_candidates is not None
        fake_subnet = {'1': 'choice1', '2': 'choice2'}
        fake_candidates = Candidates(fake_subnet)
        init_candidates_path = os.path.join(self.temp_dir, 'candidates.yaml')
        fileio.dump(fake_candidates, init_candidates_path)
        loop_cfg.init_candidates = init_candidates_path
        loop = LOOPS.build(loop_cfg)
        self.assertIsInstance(loop, EvolutionSearchLoop)
        self.assertEqual(loop.candidates, fake_candidates)

    @patch('mmrazor.structures.subnet.fix_subnet.load_fix_subnet')
    @patch('mmrazor.structures.subnet.fix_subnet.export_fix_subnet')
    @patch('mmrazor.models.task_modules.estimators.resource_estimator.'
           'get_model_flops_params')
    def test_run_epoch(self, flops_params, mock_export_fix_subnet,
                       load_status):
        # test_run_epoch: distributed == False
        loop_cfg = copy.deepcopy(self.train_cfg)
        loop_cfg.runner = self.runner
        loop_cfg.dataloader = self.dataloader
        loop_cfg.evaluator = self.evaluator
        loop = LOOPS.build(loop_cfg)
        self.runner.rank = 0
        self.runner.distributed = False
        self.runner.work_dir = self.temp_dir
        fake_subnet = {'1': 'choice1', '2': 'choice2'}
        loop.model.mutator.sample_choices = MagicMock(return_value=fake_subnet)
        mock_export_fix_subnet.return_value = (fake_subnet, self.runner.model)
        load_status.return_value = True
        flops_params.return_value = 0, 0
        loop.run_epoch()
        self.assertEqual(len(loop.candidates), 4)
        self.assertEqual(len(loop.top_k_candidates), 2)
        self.assertEqual(loop._epoch, 1)

        # test_run_epoch: distributed == True
        loop = LOOPS.build(loop_cfg)
        self.runner.rank = 0
        self.runner.distributed = True
        self.runner.work_dir = self.temp_dir
        fake_subnet = {'1': 'choice1', '2': 'choice2'}
        self.runner.model.mutator.sample_choices = MagicMock(
            return_value=fake_subnet)
        loop.run_epoch()
        self.assertEqual(len(loop.candidates), 4)
        self.assertEqual(len(loop.top_k_candidates), 2)
        self.assertEqual(loop._epoch, 1)

        # test_check_constraints
        loop_cfg.constraints_range = dict(params=(0, 100))
        loop = LOOPS.build(loop_cfg)
        self.runner.rank = 0
        self.runner.distributed = True
        self.runner.work_dir = self.temp_dir
        fake_subnet = {'1': 'choice1', '2': 'choice2'}
        loop.model.mutator.sample_choices = MagicMock(return_value=fake_subnet)
        flops_params.return_value = (50., 1)
        loop.run_epoch()
        self.assertEqual(len(loop.candidates), 4)
        self.assertEqual(len(loop.top_k_candidates), 2)
        self.assertEqual(loop._epoch, 1)

    @patch('mmrazor.structures.subnet.fix_subnet.export_fix_subnet')
    @patch('mmrazor.models.task_modules.estimators.resource_estimator.'
           'get_model_flops_params')
    def test_run_loop(self, mock_flops, mock_export_fix_subnet):
        # test a new search: resume == None
        loop_cfg = copy.deepcopy(self.train_cfg)
        loop_cfg.runner = self.runner
        loop_cfg.dataloader = self.dataloader
        loop_cfg.evaluator = self.evaluator
        loop = LOOPS.build(loop_cfg)
        self.runner.rank = 0
        loop._epoch = 1

        fake_subnet = {'1': 'choice1', '2': 'choice2'}
        mock_export_fix_subnet.return_value = (fake_subnet, self.runner.model)
        self.runner.work_dir = self.temp_dir
        loop.update_candidate_pool = MagicMock()
        loop.val_candidate_pool = MagicMock()

        mutation_candidates = Candidates([fake_subnet] * loop.num_mutation)
        for i in range(loop.num_mutation):
            mutation_candidates.set_resource(i, 0.1 + 0.1 * i, 'flops')
            mutation_candidates.set_resource(i, 99 + i, 'score')
        crossover_candidates = Candidates([fake_subnet] * loop.num_crossover)
        for i in range(loop.num_crossover):
            crossover_candidates.set_resource(i, 0.1 + 0.1 * i, 'flops')
            crossover_candidates.set_resource(i, 99 + i, 'score')
        loop.gen_mutation_candidates = \
            MagicMock(return_value=mutation_candidates)
        loop.gen_crossover_candidates = \
            MagicMock(return_value=crossover_candidates)
        loop.candidates = Candidates([fake_subnet] * 4)
        mock_flops.return_value = (0.5, 101)
        self._mock_torch_save()
        loop.run()
        assert os.path.exists(
            os.path.join(self.temp_dir, 'best_fix_subnet.yaml'))
        self.assertEqual(loop._epoch, loop._max_epochs)
        assert os.path.exists(
            os.path.join(self.temp_dir,
                         f'search_epoch_{loop._max_epochs-1}.pkl'))
        # test resuming search
        loop_cfg.resume_from = os.path.join(
            self.temp_dir, f'search_epoch_{loop._max_epochs-1}.pkl')
        loop = LOOPS.build(loop_cfg)
        self.runner.rank = 0
        loop.run()
        self.assertEqual(loop._max_epochs, 1)


class TestEvolutionSearchLoopWithPredictor(TestCase):
    """Same scenarios as above, with a metric predictor configured."""

    def setUp(self):
        self.temp_dir = tempfile.mkdtemp()
        convs = nn.ModuleDict({
            'conv1': nn.Conv2d(3, 8, 1),
            'conv2': nn.Conv2d(3, 8, 1),
            'conv3': nn.Conv2d(3, 8, 1),
        })
        MutableOP = OneShotMutableOP(convs)
        self.search_groups = {0: [MutableOP], 1: [MutableOP]}
        train_cfg = dict(
            type='EvolutionSearchLoop',
            max_epochs=4,
            max_keep_ckpts=3,
            resume_from=None,
            num_candidates=4,
            top_k=2,
            num_mutation=2,
            num_crossover=2,
            mutate_prob=0.1,
            constraints_range=dict(flops=(0, 330)),
            score_key='bbox_mAP',
            predictor_cfg=dict(
                type='MetricPredictor',
                handler_cfg=dict(type='GaussProcessHandler'),
                search_groups=self.search_groups,
                train_samples=4,
            ))
        self.train_cfg = Config(train_cfg)
        self.runner = MagicMock(spec=ToyRunner)
        self.runner.train_dataloader = MagicMock()
        self.dataloader = DataLoader(ToyDataset(), collate_fn=collate_fn)
        self.evaluator = MagicMock()

    def tearDown(self):
        shutil.rmtree(self.temp_dir)

    def _mock_torch_save(self):
        """Reversibly replace ``torch.save`` for the current test.

        FIX: the original did ``torch.save = MagicMock()``, permanently
        clobbering ``torch.save`` for every test that runs later in the
        process.  ``patch.object`` + ``addCleanup`` restores it.
        """
        patcher = patch.object(torch, 'save', MagicMock())
        patcher.start()
        self.addCleanup(patcher.stop)

    def test_init(self):
        # test_init: dataloader and evaluator are instances
        loop_cfg = copy.deepcopy(self.train_cfg)
        loop_cfg.runner = self.runner
        loop_cfg.dataloader = self.dataloader
        loop_cfg.evaluator = self.evaluator
        loop = LOOPS.build(loop_cfg)
        self.assertIsInstance(loop, EvolutionSearchLoop)

        # test init_candidates is not None
        fake_subnet = {'1': 'choice1', '2': 'choice2'}
        fake_candidates = Candidates(fake_subnet)
        init_candidates_path = os.path.join(self.temp_dir, 'candidates.yaml')
        fileio.dump(fake_candidates, init_candidates_path)
        loop_cfg.init_candidates = init_candidates_path
        loop = LOOPS.build(loop_cfg)
        self.assertIsInstance(loop, EvolutionSearchLoop)
        self.assertEqual(loop.candidates, fake_candidates)

    @patch('mmrazor.structures.subnet.fix_subnet.load_fix_subnet')
    @patch('mmrazor.structures.subnet.fix_subnet.export_fix_subnet')
    @patch('mmrazor.models.task_modules.estimators.resource_estimator.'
           'get_model_flops_params')
    def test_run_epoch(self, flops_params, mock_export_fix_subnet,
                       load_status):
        loop_cfg = copy.deepcopy(self.train_cfg)
        loop_cfg.runner = self.runner
        loop_cfg.dataloader = self.dataloader
        loop_cfg.evaluator = self.evaluator
        loop = LOOPS.build(loop_cfg)
        self.runner.rank = 0
        self.runner.distributed = False
        self.runner.work_dir = self.temp_dir
        fake_subnet = {'1': 'choice1', '2': 'choice2'}
        loop.model.mutator.sample_choices = MagicMock(return_value=fake_subnet)
        mock_export_fix_subnet.return_value = (fake_subnet, self.runner.model)
        load_status.return_value = True
        flops_params.return_value = 0, 0
        loop.run_epoch()
        self.assertEqual(len(loop.candidates), 4)
        self.assertEqual(len(loop.top_k_candidates), 2)
        self.assertEqual(loop._epoch, 1)

        # test_run_epoch: distributed == True
        loop = LOOPS.build(loop_cfg)
        self.runner.rank = 0
        self.runner.distributed = True
        self.runner.work_dir = self.temp_dir
        fake_subnet = {'1': 'choice1', '2': 'choice2'}
        self.runner.model.mutator.sample_choices = MagicMock(
            return_value=fake_subnet)
        loop.run_epoch()
        self.assertEqual(len(loop.candidates), 4)
        self.assertEqual(len(loop.top_k_candidates), 2)
        self.assertEqual(loop._epoch, 1)

        # test_check_constraints
        loop_cfg.constraints_range = dict(params=(0, 100))
        loop = LOOPS.build(loop_cfg)
        self.runner.rank = 0
        self.runner.distributed = True
        self.runner.work_dir = self.temp_dir
        fake_subnet = {'1': 'choice1', '2': 'choice2'}
        loop.model.mutator.sample_choices = MagicMock(return_value=fake_subnet)
        flops_params.return_value = (50., 1)
        loop.run_epoch()
        self.assertEqual(len(loop.candidates), 4)
        self.assertEqual(len(loop.top_k_candidates), 2)
        self.assertEqual(loop._epoch, 1)

    @patch('mmrazor.structures.subnet.fix_subnet.export_fix_subnet')
    @patch('mmrazor.models.task_modules.predictor.metric_predictor.'
           'MetricPredictor.model2vector')
    @patch('mmrazor.models.task_modules.estimators.resource_estimator.'
           'get_model_flops_params')
    def test_run_loop(self, mock_flops, mock_model2vector,
                      mock_export_fix_subnet):
        # test a new search: resume == None
        loop_cfg = copy.deepcopy(self.train_cfg)
        loop_cfg.runner = self.runner
        loop_cfg.dataloader = self.dataloader
        loop_cfg.evaluator = self.evaluator
        loop = LOOPS.build(loop_cfg)
        self.runner.rank = 0
        loop._epoch = 1

        fake_subnet = {'1': 'choice1', '2': 'choice2'}
        loop.model.mutator.sample_choices = MagicMock(return_value=fake_subnet)
        mock_export_fix_subnet.return_value = (fake_subnet, self.runner.model)

        self.runner.work_dir = self.temp_dir
        loop.update_candidate_pool = MagicMock()
        loop.val_candidate_pool = MagicMock()

        mutation_candidates = Candidates([fake_subnet] * loop.num_mutation)
        for i in range(loop.num_mutation):
            mutation_candidates.set_resource(i, 0.1 + 0.1 * i, 'flops')
            mutation_candidates.set_resource(i, 99 + i, 'score')
        crossover_candidates = Candidates([fake_subnet] * loop.num_crossover)
        for i in range(loop.num_crossover):
            crossover_candidates.set_resource(i, 0.1 + 0.1 * i, 'flops')
            crossover_candidates.set_resource(i, 99 + i, 'score')
        loop.gen_mutation_candidates = \
            MagicMock(return_value=mutation_candidates)
        loop.gen_crossover_candidates = \
            MagicMock(return_value=crossover_candidates)
        loop.candidates = Candidates([fake_subnet] * 4)

        mock_flops.return_value = (0.5, 101)
        mock_model2vector.return_value = dict(
            normal_vector=[0, 1], onehot_vector=[0, 1, 0, 1])
        self._mock_torch_save()
        loop.run()
        assert os.path.exists(
            os.path.join(self.temp_dir, 'best_fix_subnet.yaml'))
        self.assertEqual(loop._epoch, loop._max_epochs)
        assert os.path.exists(
            os.path.join(self.temp_dir,
                         f'search_epoch_{loop._max_epochs-1}.pkl'))
        # test resuming search
        loop_cfg.resume_from = os.path.join(
            self.temp_dir, f'search_epoch_{loop._max_epochs-1}.pkl')
        loop = LOOPS.build(loop_cfg)
        self.runner.rank = 0
        loop.run()
        self.assertEqual(loop._max_epochs, 1)
# Copyright (c) OpenMMLab. All rights reserved.
import copy
import logging
import shutil
import tempfile
from unittest import TestCase

import torch
import torch.nn as nn
from mmengine.config import Config, ConfigDict
from mmengine.evaluator import BaseMetric
from mmengine.hooks import Hook
from mmengine.logging import MMLogger
from mmengine.model import BaseModel
from mmengine.optim import OptimWrapper
from mmengine.registry import DATASETS, HOOKS, METRICS, MODELS, OPTIM_WRAPPERS
from mmengine.runner import Runner
from torch.nn.intrinsic.qat import ConvBnReLU2d
from torch.utils.data import Dataset

from mmrazor import digit_version
from mmrazor.engine import (LSQEpochBasedLoop, PTQLoop, QATEpochBasedLoop,
                            QATValLoop)

try:
    from torch.ao.nn.quantized import FloatFunctional, FXFloatFunctional
    from torch.ao.quantization import QConfigMapping
    from torch.ao.quantization.fake_quantize import FakeQuantizeBase
    from torch.ao.quantization.fx import prepare
    from torch.ao.quantization.qconfig_mapping import \
        get_default_qconfig_mapping
    from torch.ao.quantization.quantize_fx import _fuse_fx
except ImportError:
    # On torch < 1.13 the ao-quantization API is unavailable; placeholders
    # raise a helpful error if anything actually touches them.
    from mmrazor.utils import get_placeholder
    QConfigMapping = get_placeholder('torch>=1.13')
    FakeQuantizeBase = get_placeholder('torch>=1.13')
    prepare = get_placeholder('torch>=1.13')
    _fuse_fx = get_placeholder('torch>=1.13')
    get_default_qconfig_mapping = get_placeholder('torch>=1.13')
    FloatFunctional = get_placeholder('torch>=1.13')
    FXFloatFunctional = get_placeholder('torch>=1.13')


class ToyDataset(Dataset):
    """Fixed 12-sample image-like dataset for the quantization loops."""

    METAINFO = dict()  # type: ignore
    data = torch.randn(12, 3, 4, 4)
    label = torch.ones(12)

    @property
    def metainfo(self):
        return self.METAINFO

    def __len__(self):
        return self.data.size(0)

    def __getitem__(self, index):
        return dict(inputs=self.data[index], data_sample=self.label[index])


class MMArchitectureQuant(BaseModel):
    """Minimal quantization-aware architecture wrapper used by the loops."""

    def __init__(self, data_preprocessor=None):
        super().__init__(data_preprocessor=data_preprocessor)
        self.architecture = ToyModel()

    def calibrate_step(self, data):
        data = self.data_preprocessor(data, False)
        return self.architecture(**data)

    def sync_qparams(self, src_mode):
        pass

    def forward(self, inputs, data_sample, mode='tensor'):
        return self.architecture(inputs, data_sample, mode)


class ToyModel(BaseModel):
    """Single fused ConvBnReLU2d module with a default QAT qconfig."""

    def __init__(self, data_preprocessor=None):
        super().__init__(data_preprocessor=data_preprocessor)
        qconfig = get_default_qconfig_mapping().to_dict()['']
        self.architecture = nn.Sequential(
            ConvBnReLU2d(3, 3, 1, qconfig=qconfig))

    def forward(self, inputs, data_sample, mode='tensor'):
        if isinstance(inputs, list):
            inputs = torch.stack(inputs)
        if isinstance(data_sample, list):
            data_sample = torch.stack(data_sample)
        outputs = self.architecture(inputs)

        if mode == 'tensor':
            return outputs
        elif mode == 'loss':
            loss = data_sample.sum() - outputs.sum()
            outputs = dict(loss=loss)
            return outputs
        elif mode == 'predict':
            return outputs


class ToyOptimWrapper(OptimWrapper):
    ...
class ToyMetric1(BaseMetric):
    """Metric stub that always reports ``acc == 1``."""

    def __init__(self, collect_device='cpu', dummy_metrics=None):
        super().__init__(collect_device=collect_device)
        self.dummy_metrics = dummy_metrics

    def process(self, data_batch, predictions):
        self.results.append({'acc': 1})

    def compute_metrics(self, results):
        return dict(acc=1)


# Shared runner configuration; each TestCase deep-copies and specialises it.
DEFAULT_CFG = ConfigDict(
    model=dict(type='MMArchitectureQuant'),
    train_dataloader=dict(
        dataset=dict(type='ToyDataset'),
        sampler=dict(type='DefaultSampler', shuffle=True),
        batch_size=3,
        num_workers=0),
    val_dataloader=dict(
        dataset=dict(type='ToyDataset'),
        sampler=dict(type='DefaultSampler', shuffle=False),
        batch_size=3,
        num_workers=0),
    test_dataloader=dict(
        dataset=dict(type='ToyDataset'),
        sampler=dict(type='DefaultSampler', shuffle=False),
        batch_size=3,
        num_workers=0),
    optim_wrapper=dict(
        type='OptimWrapper', optimizer=dict(type='SGD', lr=0.01)),
    val_evaluator=dict(type='ToyMetric1'),
    test_evaluator=dict(type='ToyMetric1'),
    train_cfg=dict(),
    val_cfg=dict(),
    test_cfg=dict(),
    custom_hooks=[],
    data_preprocessor=None,
    launcher='none',
    env_cfg=dict(dist_cfg=dict(backend='nccl')),
)


def _register_toy_modules():
    """Register the toy stubs in the mmengine registries (force=True)."""
    MODELS.register_module(module=MMArchitectureQuant, force=True)
    DATASETS.register_module(module=ToyDataset, force=True)
    METRICS.register_module(module=ToyMetric1, force=True)
    OPTIM_WRAPPERS.register_module(module=ToyOptimWrapper, force=True)


def _unregister_toy_modules():
    """Undo ``_register_toy_modules`` so other test files stay unaffected."""
    MODELS.module_dict.pop('MMArchitectureQuant')
    DATASETS.module_dict.pop('ToyDataset')
    METRICS.module_dict.pop('ToyMetric1')
    OPTIM_WRAPPERS.module_dict.pop('ToyOptimWrapper')


class TestQATEpochBasedLoop(TestCase):
    """QATEpochBasedLoop: construction, training, and its QAT-specific hooks."""

    def setUp(self):
        if digit_version(torch.__version__) < digit_version('1.13.0'):
            self.skipTest('version of torch < 1.13.0')
        self.temp_dir = tempfile.mkdtemp()
        _register_toy_modules()

        cfg = Config(copy.deepcopy(DEFAULT_CFG))
        cfg.work_dir = self.temp_dir
        cfg.train_cfg = ConfigDict(
            type='mmrazor.QATEpochBasedLoop',
            max_epochs=4,
            val_begin=1,
            val_interval=1,
            disable_observer_begin=-1,
            freeze_bn_begin=-1,
            dynamic_intervals=None)
        self.default_cfg = cfg

    def tearDown(self):
        _unregister_toy_modules()
        logging.shutdown()
        MMLogger._instance_dict.clear()
        shutil.rmtree(self.temp_dir)

    def test_init(self):
        cfg = copy.deepcopy(self.default_cfg)
        cfg.experiment_name = 'test_init_qat_train_loop'
        runner = Runner(**cfg)
        self.assertIsInstance(runner, Runner)
        self.assertIsInstance(runner.train_loop, QATEpochBasedLoop)

    def test_run_epoch(self):
        cfg = copy.deepcopy(self.default_cfg)
        cfg.experiment_name = 'test_train'
        Runner.from_cfg(cfg).train()

        @HOOKS.register_module(force=True)
        class TestFreezeBNHook(Hook):
            """Assert BN stats are frozen once ``freeze_bn_begin`` is reached."""

            def __init__(self, freeze_bn_begin):
                self.freeze_bn_begin = freeze_bn_begin

            def after_train_epoch(self, runner):

                def check_bn_stats(mod):
                    if isinstance(mod, ConvBnReLU2d):
                        assert mod.freeze_bn
                        assert not mod.bn.training

                if runner.train_loop._epoch + 1 >= self.freeze_bn_begin:
                    runner.model.apply(check_bn_stats)

        cfg = copy.deepcopy(self.default_cfg)
        cfg.experiment_name = 'test_freeze_bn'
        cfg.custom_hooks = [
            dict(type='TestFreezeBNHook', priority=50, freeze_bn_begin=1)
        ]
        cfg.train_cfg.freeze_bn_begin = 1
        Runner.from_cfg(cfg).train()

        @HOOKS.register_module(force=True)
        class TestDisableObserverHook(Hook):
            """Assert fake-quant observers are disabled past the threshold."""

            def __init__(self, disable_observer_begin):
                self.disable_observer_begin = disable_observer_begin

            def after_train_epoch(self, runner):

                def check_observer_stats(mod):
                    if isinstance(mod, FakeQuantizeBase):
                        assert mod.fake_quant_enabled[0] == 0

                if runner.train_loop._epoch + 1 >= self.disable_observer_begin:
                    runner.model.apply(check_observer_stats)

        cfg = copy.deepcopy(self.default_cfg)
        cfg.experiment_name = 'test_disable_observer'
        cfg.custom_hooks = [
            dict(
                type='TestDisableObserverHook',
                priority=50,
                disable_observer_begin=1)
        ]
        cfg.train_cfg.disable_observer_begin = 1
        Runner.from_cfg(cfg).train()


class TestLSQEpochBasedLoop(TestCase):
    """LSQEpochBasedLoop: construction, training, and BN freezing."""

    def setUp(self):
        if digit_version(torch.__version__) < digit_version('1.13.0'):
            self.skipTest('version of torch < 1.13.0')
        self.temp_dir = tempfile.mkdtemp()
        _register_toy_modules()

        cfg = Config(copy.deepcopy(DEFAULT_CFG))
        cfg.work_dir = self.temp_dir
        cfg.train_cfg = ConfigDict(
            type='mmrazor.LSQEpochBasedLoop',
            max_epochs=4,
            val_begin=1,
            val_interval=1,
            freeze_bn_begin=-1,
            dynamic_intervals=None)
        self.default_cfg = cfg

    def tearDown(self):
        _unregister_toy_modules()
        logging.shutdown()
        MMLogger._instance_dict.clear()
        shutil.rmtree(self.temp_dir)

    def test_init(self):
        cfg = copy.deepcopy(self.default_cfg)
        cfg.experiment_name = 'test_init_lsq_train_loop'
        runner = Runner(**cfg)
        self.assertIsInstance(runner, Runner)
        self.assertIsInstance(runner.train_loop, LSQEpochBasedLoop)

    def test_run_epoch(self):
        cfg = copy.deepcopy(self.default_cfg)
        cfg.experiment_name = 'test_train'
        Runner.from_cfg(cfg).train()

        @HOOKS.register_module(force=True)
        class TestFreezeBNHook(Hook):
            """Assert BN stats are frozen once ``freeze_bn_begin`` is reached."""

            def __init__(self, freeze_bn_begin):
                self.freeze_bn_begin = freeze_bn_begin

            def after_train_epoch(self, runner):

                def check_bn_stats(mod):
                    if isinstance(mod, ConvBnReLU2d):
                        assert mod.freeze_bn
                        assert not mod.bn.training

                if runner.train_loop._epoch + 1 >= self.freeze_bn_begin:
                    runner.model.apply(check_bn_stats)

        cfg = copy.deepcopy(self.default_cfg)
        cfg.experiment_name = 'test_freeze_bn'
        cfg.custom_hooks = [
            dict(type='TestFreezeBNHook', priority=50, freeze_bn_begin=1)
        ]
        cfg.train_cfg.freeze_bn_begin = 1
        Runner.from_cfg(cfg).train()


class TestQATValLoop(TestCase):
    """QATValLoop: construction and a validation-only run."""

    def setUp(self):
        if digit_version(torch.__version__) < digit_version('1.13.0'):
            self.skipTest('version of torch < 1.13.0')
        self.temp_dir = tempfile.mkdtemp()
        _register_toy_modules()

        cfg = Config(copy.deepcopy(DEFAULT_CFG))
        cfg.work_dir = self.temp_dir
        cfg.val_cfg = ConfigDict(type='mmrazor.QATValLoop')
        self.default_cfg = cfg

    def tearDown(self):
        _unregister_toy_modules()
        logging.shutdown()
        MMLogger._instance_dict.clear()
        shutil.rmtree(self.temp_dir)

    def test_init(self):
        cfg = copy.deepcopy(self.default_cfg)
        cfg.experiment_name = 'test_init_qat_val_loop'
        runner = Runner(**cfg)
        self.assertIsInstance(runner, Runner)
        self.assertIsInstance(runner.val_loop, QATValLoop)

    def test_run(self):
        cfg = copy.deepcopy(self.default_cfg)
        cfg.experiment_name = 'test_qat_val'
        # Strip everything but the validation pieces.
        cfg.pop('train_dataloader')
        cfg.pop('train_cfg')
        cfg.pop('optim_wrapper')
        cfg.pop('test_dataloader')
        cfg.pop('test_cfg')
        cfg.pop('test_evaluator')
        runner = Runner.from_cfg(cfg)
        runner.val()


class TestPTQLoop(TestCase):

    def setUp(self):
        if digit_version(torch.__version__) < digit_version('1.13.0'):
            self.skipTest('version of torch < 1.13.0')
        self.temp_dir = tempfile.mkdtemp()
+        # save_checkpoint in PTQLoop needs train_dataloader
+import copy +import os +import shutil +import tempfile +from unittest import TestCase +from unittest.mock import MagicMock, patch + +import torch +import torch.nn as nn +from mmengine.config import Config +from mmengine.evaluator import BaseMetric +from mmengine.model import BaseModel +from mmengine.runner import Runner +from torch.utils.data import Dataset + +from mmrazor.engine import GreedySamplerTrainLoop # noqa: F401 +from mmrazor.registry import DATASETS, METRICS, MODELS + + +@MODELS.register_module() +class ToyModel_GreedySamplerTrainLoop(BaseModel): + + @patch('mmrazor.models.mutators.NasMutator') + def __init__(self, mock_mutator): + super().__init__() + self.linear1 = nn.Linear(2, 2) + self.linear2 = nn.Linear(2, 1) + self.mutator = mock_mutator + + def forward(self, inputs, data_samples, mode='tensor'): + batch_inputs = torch.stack(inputs) + labels = torch.stack(data_samples) + outputs = self.linear1(batch_inputs) + outputs = self.linear2(outputs) + + if mode == 'tensor': + return outputs + elif mode == 'loss': + loss = (labels - outputs).sum() + outputs = dict(loss=loss) + return outputs + elif mode == 'predict': + outputs = dict(log_vars=dict(a=1, b=0.5)) + return outputs + + def sample_subnet(self): + return self.mutator.sample_choices() + + def set_subnet(self, subnet): + self.mutator.set_choices(subnet) + + def export_fix_subnet(self): + pass + + +@DATASETS.register_module() +class ToyDataset_GreedySamplerTrainLoop(Dataset): + METAINFO = dict() # type: ignore + data = torch.randn(12, 2) + label = torch.ones(12) + + @property + def metainfo(self): + return self.METAINFO + + def __len__(self): + return self.data.size(0) + + def __getitem__(self, index): + return dict(inputs=self.data[index], data_samples=self.label[index]) + + +@METRICS.register_module() +class ToyMetric_GreedySamplerTrainLoop(BaseMetric): + + def __init__(self, collect_device='cpu', dummy_metrics=None): + super().__init__(collect_device=collect_device) + self.dummy_metrics = 
dummy_metrics + + def process(self, data_samples, predictions): + result = {'acc': 1} + self.results.append(result) + + def compute_metrics(self, results): + return dict(acc=1) + + +class TestGreedySamplerTrainLoop(TestCase): + + def setUp(self): + self.temp_dir = tempfile.mkdtemp() + + val_dataloader = dict( + dataset=dict(type='ToyDataset_GreedySamplerTrainLoop'), + sampler=dict(type='DefaultSampler', shuffle=False), + batch_size=3, + num_workers=0) + val_evaluator = dict(type='ToyMetric_GreedySamplerTrainLoop') + + iter_based_cfg = dict( + default_scope='mmrazor', + model=dict(type='ToyModel_GreedySamplerTrainLoop'), + work_dir=self.temp_dir, + train_dataloader=dict( + dataset=dict(type='ToyDataset_GreedySamplerTrainLoop'), + sampler=dict(type='InfiniteSampler', shuffle=True), + batch_size=3, + num_workers=0), + val_dataloader=val_dataloader, + optim_wrapper=dict( + type='OptimWrapper', optimizer=dict(type='SGD', lr=0.01)), + param_scheduler=dict(type='MultiStepLR', milestones=[1, 2]), + val_evaluator=val_evaluator, + train_cfg=dict( + type='GreedySamplerTrainLoop', + dataloader_val=val_dataloader, + evaluator=val_evaluator, + max_iters=12, + val_interval=2, + score_key='acc', + constraints_range=None, + num_candidates=4, + num_samples=2, + top_k=2, + prob_schedule='linear', + schedule_start_iter=4, + schedule_end_iter=10, + init_prob=0., + max_prob=0.8), + val_cfg=dict(), + custom_hooks=[], + default_hooks=dict( + runtime_info=dict(type='RuntimeInfoHook'), + timer=dict(type='IterTimerHook'), + logger=dict(type='LoggerHook'), + param_scheduler=dict(type='ParamSchedulerHook'), + checkpoint=dict( + type='CheckpointHook', interval=1, by_epoch=False), + sampler_seed=dict(type='DistSamplerSeedHook')), + launcher='none', + env_cfg=dict(dist_cfg=dict(backend='nccl')), + ) + self.iter_based_cfg = Config(iter_based_cfg) + + def tearDown(self): + shutil.rmtree(self.temp_dir) + + def test_init(self): + cfg = copy.deepcopy(self.iter_based_cfg) + cfg.experiment_name = 
'test_init_GreedySamplerTrainLoop' + runner = Runner.from_cfg(cfg) + loop = runner.build_train_loop(cfg.train_cfg) + self.assertIsInstance(loop, GreedySamplerTrainLoop) + + def test_update_cur_prob(self): + # prob_schedule = linear + cfg = copy.deepcopy(self.iter_based_cfg) + cfg.experiment_name = 'test_update_cur_prob1' + runner = Runner.from_cfg(cfg) + loop = runner.build_train_loop(cfg.train_cfg) + + loop.update_cur_prob(loop.schedule_end_iter - 1) + self.assertGreater(loop.max_prob, loop.cur_prob) + loop.update_cur_prob(loop.schedule_end_iter + 1) + self.assertEqual(loop.max_prob, loop.cur_prob) + + # prob_schedule = consine + cfg = copy.deepcopy(self.iter_based_cfg) + cfg.experiment_name = 'test_update_cur_prob2' + cfg.train_cfg.prob_schedule = 'consine' + runner = Runner.from_cfg(cfg) + loop = runner.build_train_loop(cfg.train_cfg) + + loop.update_cur_prob(loop.schedule_end_iter - 1) + self.assertGreater(loop.max_prob, loop.cur_prob) + loop.update_cur_prob(loop.schedule_end_iter + 1) + self.assertEqual(loop.max_prob, loop.cur_prob) + + def test_sample_subnet(self): + cfg = copy.deepcopy(self.iter_based_cfg) + cfg.experiment_name = 'test_sample_subnet' + runner = Runner.from_cfg(cfg) + fake_subnet = {'1': 'choice1', '2': 'choice2'} + runner.model.sample_subnet = MagicMock(return_value=fake_subnet) + loop = runner.build_train_loop(cfg.train_cfg) + loop.cur_prob = loop.max_prob + self.assertEqual(len(loop.top_k_candidates), 0) + + loop._iter = loop.val_interval + subnet = loop.sample_subnet() + self.assertEqual(subnet, fake_subnet) + self.assertEqual(len(loop.top_k_candidates), loop.top_k) + + def test_run(self): + # test run with _check_constraints + cfg = copy.deepcopy(self.iter_based_cfg) + cfg.experiment_name = 'test_run1' + runner = Runner.from_cfg(cfg) + fake_subnet = {'1': 'choice1', '2': 'choice2'} + runner.model.sample_subnet = MagicMock(return_value=fake_subnet) + loop = runner.build_train_loop(cfg.train_cfg) + loop._check_constraints = 
MagicMock(return_value=(True, dict())) + runner.train() + + self.assertEqual(runner.iter, runner.max_iters) + assert os.path.exists(os.path.join(self.temp_dir, 'candidates.pkl')) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_runners/test_utils/test_calibrate_bn_mixin.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_runners/test_utils/test_calibrate_bn_mixin.py new file mode 100644 index 0000000000000000000000000000000000000000..ce482c268eb1d4dafdffa6b3daf9495dfe9dfe78 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_runners/test_utils/test_calibrate_bn_mixin.py @@ -0,0 +1,83 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from logging import Logger +from typing import Sequence +from unittest import TestCase + +import torch +from torch import Tensor, nn +from torch.utils.data import DataLoader, Dataset + +from mmrazor.engine.runner.utils import CalibrateBNMixin + + +class ToyModel(nn.Module): + + def __init__(self) -> None: + super().__init__() + + self.bn = nn.BatchNorm2d(3) + + def forward(self, x: Tensor) -> Tensor: + return self.bn(x) + + def test_step(self, x: Tensor) -> None: + self(x) + + +class ToyRunner: + + def __init__(self) -> None: + self.model = ToyModel() + self.logger = Logger('calibrate test logger') + + +class ToyValLoop(CalibrateBNMixin): + + def __init__(self) -> None: + self.fp16 = False + self.runner = ToyRunner() + + +class FakeDataset(Dataset): + + def __init__(self, + random_nums: int = 64, + x_shape: Sequence[int] = (3, 224, 224)) -> None: + self.random_x = torch.normal(1, 100, size=(random_nums, *x_shape)) + self.random_nums = random_nums + self.x_shape = list(x_shape) + + def __getitem__(self, index: int) -> Tensor: + return self.random_x[index] + + def __len__(self) -> int: + return self.random_nums + + @property + def data(self) -> Tensor: + return self.random_x + + +class TestCalibrateBNMixin(TestCase): + + def test_calibrate_bn_statistics(self) -> None: + dataloader = 
self.prepare_dataloader(random_nums=2000) + loop = ToyValLoop() + loop.calibrate_bn_statistics(dataloader, 2000) + + calibrated_data = dataloader.dataset.data + calibrated_mean = calibrated_data.mean((0, 2, 3)) + calibrated_var = calibrated_data.var((0, 2, 3), unbiased=True) + + assert torch.allclose(calibrated_mean, + loop.runner.model.bn.running_mean) + assert torch.allclose(calibrated_var, loop.runner.model.bn.running_var) + + def prepare_dataloader( + self, + random_nums: int = 2000, + x_shape: Sequence[int] = (3, 224, 224) + ) -> DataLoader: + dataset = FakeDataset(random_nums=random_nums, x_shape=x_shape) + + return DataLoader(dataset, batch_size=64) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_runners/test_utils/test_check.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_runners/test_utils/test_check.py new file mode 100644 index 0000000000000000000000000000000000000000..2f3a80eaa3acfaeac3dba94a8168b64c0a015c53 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_runners/test_utils/test_check.py @@ -0,0 +1,44 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+    # constraints_range is not None
+    # architecture is BaseDetector
+from mmrazor.engine.runner.utils import crossover + + +def test_crossover(): + fake_random_subnet1 = {} + fake_random_subnet2 = {} + for i in range(50): + fake_random_subnet1[i] = f'{i}_choice1' + fake_random_subnet2[i] = f'{i}_choice2' + + result = crossover(fake_random_subnet1, fake_random_subnet2) + + assert type(result) == type(fake_random_subnet1) + assert len(result) == len(fake_random_subnet1) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_structures/test_backendconfig.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_structures/test_backendconfig.py new file mode 100644 index 0000000000000000000000000000000000000000..24295e391ee0fd99874bfd0b1e22a2bbd8d5aca4 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_structures/test_backendconfig.py @@ -0,0 +1,62 @@ +# Copyright (c) OpenMMLab. All rights reserved. +try: + from torch.ao.quantization.backend_config import BackendConfig +except ImportError: + from mmrazor.utils import get_placeholder + BackendConfig = get_placeholder('torch>=1.13') + +import pytest +import torch + +from mmrazor import digit_version +from mmrazor.structures.quantization.backend_config import ( + BackendConfigs, get_academic_backend_config, + get_academic_backend_config_dict, get_native_backend_config, + get_native_backend_config_dict, get_openvino_backend_config, + get_openvino_backend_config_dict, get_tensorrt_backend_config, + get_tensorrt_backend_config_dict) + + +@pytest.mark.skipif( + digit_version(torch.__version__) < digit_version('1.13.0'), + reason='version of torch < 1.13.0') +def test_get_backend_config(): + + # test get_native_backend_config + native_backend_config = get_native_backend_config() + assert isinstance(native_backend_config, BackendConfig) + assert native_backend_config.name == 'native' + native_backend_config_dict = get_native_backend_config_dict() + assert isinstance(native_backend_config_dict, dict) + + # test get_academic_backend_config + academic_backend_config = 
get_academic_backend_config() + assert isinstance(academic_backend_config, BackendConfig) + assert academic_backend_config.name == 'academic' + academic_backend_config_dict = get_academic_backend_config_dict() + assert isinstance(academic_backend_config_dict, dict) + + # test get_openvino_backend_config + openvino_backend_config = get_openvino_backend_config() + assert isinstance(openvino_backend_config, BackendConfig) + assert openvino_backend_config.name == 'openvino' + openvino_backend_config_dict = get_openvino_backend_config_dict() + assert isinstance(openvino_backend_config_dict, dict) + + # test get_tensorrt_backend_config + tensorrt_backend_config = get_tensorrt_backend_config() + assert isinstance(tensorrt_backend_config, BackendConfig) + assert tensorrt_backend_config.name == 'tensorrt' + tensorrt_backend_config_dict = get_tensorrt_backend_config_dict() + assert isinstance(tensorrt_backend_config_dict, dict) + + +@pytest.mark.skipif( + digit_version(torch.__version__) < digit_version('1.13.0'), + reason='version of torch < 1.13.0') +def test_backendconfigs_mapping(): + + mapping = BackendConfigs + assert isinstance(mapping, dict) + assert 'academic' in mapping.keys() + assert isinstance(mapping['academic'], BackendConfig) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_structures/test_qconfig.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_structures/test_qconfig.py new file mode 100644 index 0000000000000000000000000000000000000000..7ab78243daeb519c2a8bc93aae893420de22dc63 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_structures/test_qconfig.py @@ -0,0 +1,172 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import copy +from unittest import TestCase + +import torch +from mmengine.config import Config + +try: + from torch.ao.quantization import FakeQuantize, QConfig +except ImportError: + from mmrazor.utils import get_placeholder + QConfig = get_placeholder('torch>=1.13') + FakeQuantize = get_placeholder('torch>=1.13') + +from mmrazor import digit_version +from mmrazor.models.fake_quants import register_torch_fake_quants +from mmrazor.models.observers import register_torch_observers +from mmrazor.registry import MODELS +from mmrazor.structures import QConfigHandler, QSchemeHandler + +register_torch_observers() +register_torch_fake_quants() + + +class TestQSchemeHandler(TestCase): + + def test_init(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + + # per_channel + qscheme = QSchemeHandler(is_symmetry=True, is_per_channel=True) + assert qscheme.torch_qscheme is torch.per_channel_symmetric + + # per_tensor + qscheme = QSchemeHandler(is_symmetry=True, is_per_channel=False) + assert qscheme.torch_qscheme is torch.per_tensor_symmetric + + # qdtype is incorrect + self.assertRaises(AssertionError, QSchemeHandler, 'float') + + # is_symmetric_range + kwargs = {'is_symmetric_range': True} + qscheme = QSchemeHandler(**kwargs) + assert qscheme.is_symmetric_range is True + + def test_to_observer_params(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + + # qdtype = quint8 + ret_params = QSchemeHandler(qdtype='quint8').to_observer_params() + assert ret_params['dtype'] == torch.quint8 + assert ret_params['quant_min'] == 0 and ret_params['quant_max'] == 255 + + # qdtype = qint8, is_symmetric_range=False + ret_params = QSchemeHandler(qdtype='qint8').to_observer_params() + assert ret_params['dtype'] == torch.qint8 + assert ret_params['quant_min'] == -128 and ret_params[ + 'quant_max'] == 127 + + # qdtype = qint8, is_symmetric_range=True + ret_params = 
QSchemeHandler( + qdtype='qint8', is_symmetric_range=True).to_observer_params() + assert ret_params['quant_min'] == -127 and ret_params[ + 'quant_max'] == 127 + + # per_channel + ret_params = QSchemeHandler(is_per_channel=True).to_observer_params() + assert ret_params['ch_axis'] == 0 + + # per_tensor + ret_params = QSchemeHandler(is_per_channel=False).to_observer_params() + assert 'ch_axis' not in ret_params.keys() + + +class TestQConfigHandler(TestCase): + + def setUp(self): + self.qconfig_dict = dict( + w_observer=dict(type='MovingAveragePerChannelMinMaxObserver'), + a_observer=dict(type='MovingAveragePerChannelMinMaxObserver'), + w_fake_quant=dict(type='FakeQuantize'), + a_fake_quant=dict(type='FakeQuantize'), + w_qscheme=dict( + qdtype='qint8', + bit=8, + is_symmetry=True, + is_symmetric_range=True), + a_qscheme=dict(qdtype='quint8', bit=8, is_symmetry=True), + ) + self.qconfig = Config(self.qconfig_dict) + + def test_check_qconfig(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + + assert QConfigHandler.check_qconfig(self.qconfig_dict) is True + assert QConfigHandler.check_qconfig(self.qconfig) is True + qconfig_dict = copy.copy(self.qconfig_dict) + print(qconfig_dict) + qconfig_dict.pop('w_observer') + assert QConfigHandler.check_qconfig(qconfig_dict) is False + + def test_init(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + + # test dict init + qconfig = QConfigHandler(self.qconfig_dict) + assert hasattr(qconfig, 'w_qscheme') + assert hasattr(qconfig, 'a_qscheme') + assert hasattr(qconfig, 'w_fake_quant') + assert hasattr(qconfig, 'a_fake_quant') + + # test mmengine's Config init + qconfig = QConfigHandler(self.qconfig) + assert hasattr(qconfig, 'w_qscheme') + assert hasattr(qconfig, 'a_qscheme') + assert hasattr(qconfig, 'w_fake_quant') + assert hasattr(qconfig, 'a_fake_quant') + + # per_channel + assert 
qconfig.w_qscheme.is_per_channel is True + assert qconfig.a_qscheme.is_per_channel is True + + def test_convert(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + + qconfig = QConfigHandler(self.qconfig) + torch_qconfig = qconfig.convert() + assert isinstance(torch_qconfig, QConfig) + + def test_replace_fakequant(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + + # update_qparams is False + qconfig = QConfigHandler(self.qconfig) + org_fakequant_ins = qconfig.w_fake_quant() + new_fakequant = qconfig.replace_fakequant( + org_fakequant_ins, qconfig.w_qscheme, update_qparams=False) + new_fakequant_ins = new_fakequant() + assert isinstance(new_fakequant_ins, FakeQuantize) + assert isinstance(new_fakequant_ins.activation_post_process, + MODELS.get('PerChannelMinMaxObserver')) + + # update_qparams is True + qconfig = QConfigHandler(self.qconfig) + org_fakequant_ins = qconfig.w_fake_quant() + org_fakequant_ins.scale = torch.Tensor([2]) + org_fakequant_ins.activation_post_process.min_val = torch.Tensor([1]) + new_fakequant_ins = qconfig.replace_fakequant( + org_fakequant_ins, qconfig.w_qscheme, update_qparams=True) + assert isinstance(new_fakequant_ins, FakeQuantize) + assert isinstance(new_fakequant_ins.activation_post_process, + MODELS.get('PerChannelMinMaxObserver')) + assert new_fakequant_ins.scale == org_fakequant_ins.scale + assert new_fakequant_ins.activation_post_process.min_val == \ + org_fakequant_ins.activation_post_process.min_val + + def test_fixed_w_fakequant(self): + if digit_version(torch.__version__) < digit_version('1.13.0'): + self.skipTest('version of torch < 1.13.0') + + qconfig = QConfigHandler(self.qconfig) + qconfig.fixed_w_fakequant() + new_fakequant_ins = qconfig.w_fake_quant() + assert isinstance(new_fakequant_ins, FakeQuantize) + assert isinstance(new_fakequant_ins.activation_post_process, + 
MODELS.get('PerChannelMinMaxObserver')) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_tools/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_tools/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ef101fec61e72abc0eb90266d453b5b22331378d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_tools/__init__.py @@ -0,0 +1 @@ +# Copyright (c) OpenMMLab. All rights reserved. diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_tools/test_tools.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_tools/test_tools.py new file mode 100644 index 0000000000000000000000000000000000000000..8af4a0d20a63e21ff5b76ab248fcd2504bc648b9 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_tools/test_tools.py @@ -0,0 +1,101 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import os +import shutil +import subprocess +from unittest import TestCase + +import torch + +from mmrazor import digit_version + +TEST_TOOLS = os.getenv('TEST_TOOLS') == 'true' + + +class TestTools(TestCase): + _config_path = None + + def setUp(self) -> None: + if not TEST_TOOLS: + self.skipTest('disabled') + + @property + def config_path(self): + if self._config_path is None: + self._config_path = self._get_config_path() + return self._config_path + + def _setUp(self) -> None: + self.workdir = os.path.dirname(__file__) + '/tmp/' + if not os.path.exists(self.workdir): + os.mkdir(self.workdir) + + def save_to_config(self, name, content): + with open(self.workdir + f'/{name}', 'w') as f: + f.write(content) + + def test_get_channel_unit(self): + if digit_version(torch.__version__) < digit_version('1.12.0'): + self.skipTest('version of torch < 1.12.0') + + for path in self.config_path: + with self.subTest(path=path): + self._setUp() + self.save_to_config('pretrain.py', f"""_base_=['{path}']""") + try: + subprocess.run([ + 'python', './tools/pruning/get_channel_units.py', + f'{self.workdir}/pretrain.py', '-o', + f'{self.workdir}/unit.json' + ]) + 
except Exception as e: + self.fail(f'{e}') + self.assertTrue(os.path.exists(f'{self.workdir}/unit.json')) + + self._tearDown() + + def test_get_prune_config(self): + if digit_version(torch.__version__) < digit_version('1.12.0'): + self.skipTest('version of torch < 1.12.0') + for path in self.config_path: + with self.subTest(path=path): + self._setUp() + self.save_to_config('pretrain.py', f"""_base_=['{path}']""") + try: + subprocess.run([ + 'python', + './tools/pruning/get_l1_prune_config.py', + f'{self.workdir}/pretrain.py', + '-o', + f'{self.workdir}/prune.py', + ]) + pass + except Exception as e: + self.fail(f'{e}') + self.assertTrue(os.path.exists(f'{self.workdir}/prune.py')) + + self._tearDown() + + def _tearDown(self) -> None: + print('delete') + shutil.rmtree(self.workdir) + pass + + def _get_config_path(self): + config_paths = [] + paths = [ + ('mmcls', 'mmcls::resnet/resnet34_8xb32_in1k.py'), + ('mmdet', 'mmdet::retinanet/retinanet_r18_fpn_1x_coco.py'), + ( + 'mmseg', + 'mmseg::deeplabv3plus/deeplabv3plus_r50-d8_4xb4-20k_voc12aug-512x512.py' # noqa + ), + ('mmyolo', + 'mmyolo::yolov5/yolov5_m-p6-v62_syncbn_fast_8xb16-300e_coco.py') + ] + for repo_name, path in paths: + try: + __import__(repo_name) + config_paths.append(path) + except Exception: + pass + return config_paths diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_utils/test_index_dict.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_utils/test_index_dict.py new file mode 100644 index 0000000000000000000000000000000000000000..767dd806cd940d30a96cb23e6a3ee6130bb0d85c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_utils/test_index_dict.py @@ -0,0 +1,16 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import unittest + +from mmrazor.utils.index_dict import IndexDict + + +class TestIndexDict(unittest.TestCase): + + def test_dict(self): + dict = IndexDict() + dict[(4, 5)] = 2 + dict[(1, 3)] = 1 + + self.assertSequenceEqual(list(dict.keys()), [(1, 3), (4, 5)]) + with self.assertRaisesRegex(AssertionError, 'overlap'): + dict[2, 3] = 3 diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_utils/test_placeholder.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_utils/test_placeholder.py new file mode 100644 index 0000000000000000000000000000000000000000..600cd09141c51e357a889aa85af7c3e5d5c0c554 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_utils/test_placeholder.py @@ -0,0 +1,18 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import unittest + +import pytest + +from mmrazor.utils import get_placeholder + + +class TestPlaceholder(unittest.TestCase): + + def test_placeholder(self): + holder = get_placeholder('test') + with pytest.raises(ImportError): + holder() + from mmrazor.models.architectures.dynamic_ops import DynamicMixin + + class tmp(holder, DynamicMixin): + pass diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/test_visualizer/test_visualizer.py b/cv/distiller/CWD/pytorch/mmrazor/tests/test_visualizer/test_visualizer.py new file mode 100644 index 0000000000000000000000000000000000000000..b1beaedce606716934bb8b594b9a80f4f1380fc2 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/test_visualizer/test_visualizer.py @@ -0,0 +1,128 @@ +# Copyright (c) OpenMMLab. All rights reserved. +from unittest import TestCase + +import numpy as np +import pytest +import torch +from mmengine.visualization import Visualizer + +from mmrazor.visualization.local_visualizer import modify + + +class TestVisualizer(TestCase): + + def setUp(self): + """Setup the demo image in every test method. 
+        """Set up the demo image in every test method.
less than 1, but the channel ' + 'dimension you input is 6, you can use the ' + 'channel_reduction parameter or set topk ' + 'greater than 0 to solve the error'): + visualizer.draw_featmap( + torch.randn(6, 3, 3), channel_reduction=None, topk=0) + + featmap = visualizer.draw_featmap( + torch.randn(6, 3, 3), channel_reduction='select_max', topk=10) + assert featmap.shape[:2] == (3, 3) + featmap = visualizer.draw_featmap( + torch.randn(1, 4, 3), channel_reduction=None, topk=-1) + assert featmap.shape[:2] == (4, 3) + + featmap = visualizer.draw_featmap( + torch.randn(3, 4, 3), + overlaid_image=image, + channel_reduction=None, + topk=-1) + assert featmap.shape[:2] == (3, 3) + featmap = visualizer.draw_featmap( + torch.randn(6, 3, 3), + channel_reduction=None, + topk=4, + arrangement=(2, 2)) + assert featmap.shape[:2] == (6, 6) + featmap = visualizer.draw_featmap( + torch.randn(6, 3, 3), + channel_reduction=None, + topk=4, + arrangement=(1, 4)) + assert featmap.shape[:2] == (3, 12) + with pytest.raises( + AssertionError, + match='The product of row and col in the `arrangement` ' + 'is less than topk, please set ' + 'the `arrangement` correctly'): + visualizer.draw_featmap( + torch.randn(6, 3, 3), + channel_reduction=None, + topk=4, + arrangement=(1, 2)) + + # test gray + featmap = visualizer.draw_featmap( + torch.randn(6, 3, 3), + overlaid_image=np.random.randint( + 0, 256, size=(3, 3), dtype='uint8'), + channel_reduction=None, + topk=4, + arrangement=(2, 2)) + assert featmap.shape[:2] == (6, 6) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/utils/__init__.py b/cv/distiller/CWD/pytorch/mmrazor/tests/utils/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..f4f0562e4641cdcb72adc6c1249908ecbbcd4e40 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/utils/__init__.py @@ -0,0 +1,4 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+from .set_torch_thread import SetTorchThread + +__all__ = ['SetTorchThread'] diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/utils/set_dist_env.py b/cv/distiller/CWD/pytorch/mmrazor/tests/utils/set_dist_env.py new file mode 100644 index 0000000000000000000000000000000000000000..66d41a170d99b9fc508ced9c44b56a44ad1dfae7 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/utils/set_dist_env.py @@ -0,0 +1,31 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import os +import random + +import torch +import torch.distributed as dist + + +class SetDistEnv: + + def __init__(self, using_cuda=False, port=None) -> None: + self.using_cuda = using_cuda + if self.using_cuda: + assert torch.cuda.is_available() + if port is None: + port = random.randint(10000, 20000) + self.port = port + + def __enter__(self): + os.environ['MASTER_ADDR'] = 'localhost' + os.environ['MASTER_PORT'] = str(self.port) + + # initialize the process group + if self.using_cuda: + backend = 'nccl' + else: + backend = 'gloo' + dist.init_process_group(backend, rank=0, world_size=1) + + def __exit__(self, exc_type, exc_value, tb): + dist.destroy_process_group() diff --git a/cv/distiller/CWD/pytorch/mmrazor/tests/utils/set_torch_thread.py b/cv/distiller/CWD/pytorch/mmrazor/tests/utils/set_torch_thread.py new file mode 100644 index 0000000000000000000000000000000000000000..a3cc482e87d2e26bbad47f4d0734ea6b0422faab --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tests/utils/set_torch_thread.py @@ -0,0 +1,17 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import torch + + +class SetTorchThread: + + def __init__(self, num_thread: int = -1) -> None: + self.prev_num_threads = torch.get_num_threads() + self.num_threads = num_thread + + def __enter__(self): + if self.num_threads != -1: + torch.set_num_threads(self.num_threads) + + def __exit__(self, exc_type, exc_value, tb): + if self.num_threads != -1: + torch.set_num_threads(self.prev_num_threads) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tools/dataset_converters/cityscapes.py b/cv/distiller/CWD/pytorch/mmrazor/tools/dataset_converters/cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..3f22c27ad1bc21c9e09f7b0fc45476019668714d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tools/dataset_converters/cityscapes.py @@ -0,0 +1,56 @@ +#copyright (c) OpenMMLab. All rights reserved. +import argparse +import os.path as osp + +from cityscapesscripts.preparation.json2labelImg import json2labelImg +from mmengine.utils import (mkdir_or_exist, scandir, track_parallel_progress, + track_progress) + + +def convert_json_to_label(json_file): + label_file = json_file.replace('_polygons.json', '_labelTrainIds.png') + json2labelImg(json_file, label_file, 'trainIds') + + +def parse_args(): + parser = argparse.ArgumentParser( + description='Convert Cityscapes annotations to TrainIds') + parser.add_argument('cityscapes_path', help='cityscapes data path') + parser.add_argument('--gt-dir', default='gtFine', type=str) + parser.add_argument('-o', '--out-dir', help='output path') + parser.add_argument( + '--nproc', default=1, type=int, help='number of process') + args = parser.parse_args() + return args + + +def main(): + args = parse_args() + cityscapes_path = args.cityscapes_path + out_dir = args.out_dir if args.out_dir else cityscapes_path + mkdir_or_exist(out_dir) + + gt_dir = osp.join(cityscapes_path, args.gt_dir) + + poly_files = [] + for poly in scandir(gt_dir, '_polygons.json', recursive=True): + poly_file = osp.join(gt_dir, poly) + 
poly_files.append(poly_file) + if args.nproc > 1: + track_parallel_progress(convert_json_to_label, poly_files, args.nproc) + else: + track_progress(convert_json_to_label, poly_files) + + split_names = ['train', 'val', 'test'] + + for split in split_names: + filenames = [] + for poly in scandir( + osp.join(gt_dir, split), '_polygons.json', recursive=True): + filenames.append(poly.replace('_gtFine_polygons.json', '')) + with open(osp.join(out_dir, f'{split}.txt'), 'w') as f: + f.writelines(f + '\n' for f in filenames) + + +if __name__ == '__main__': + main() diff --git a/cv/distiller/CWD/pytorch/mmrazor/tools/dist_test.sh b/cv/distiller/CWD/pytorch/mmrazor/tools/dist_test.sh new file mode 100644 index 0000000000000000000000000000000000000000..dea131b43ea8f1222661d20603d40c18ea7f28a1 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tools/dist_test.sh @@ -0,0 +1,22 @@ +#!/usr/bin/env bash + +CONFIG=$1 +CHECKPOINT=$2 +GPUS=$3 +NNODES=${NNODES:-1} +NODE_RANK=${NODE_RANK:-0} +PORT=${PORT:-29500} +MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"} + +PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \ +python -m torch.distributed.launch \ + --nnodes=$NNODES \ + --node_rank=$NODE_RANK \ + --master_addr=$MASTER_ADDR \ + --nproc_per_node=$GPUS \ + --master_port=$PORT \ + $(dirname "$0")/test.py \ + $CONFIG \ + $CHECKPOINT \ + --launcher pytorch \ + ${@:4} diff --git a/cv/distiller/CWD/pytorch/mmrazor/tools/dist_train.sh b/cv/distiller/CWD/pytorch/mmrazor/tools/dist_train.sh new file mode 100644 index 0000000000000000000000000000000000000000..a32ea3fa4d3d7c55f539d6dfed6788f4c5ae3e2b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tools/dist_train.sh @@ -0,0 +1,21 @@ +#!/usr/bin/env bash +# Copyright (c) 2023, Shanghai Iluvatar CoreX Semiconductor Co., Ltd. +# All Rights Reserved. 
+ +CONFIG=$1 +GPUS=$2 +NNODES=${NNODES:-1} +NODE_RANK=${NODE_RANK:-0} +PORT=${PORT:-29500} +MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"} + +PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \ +python3 -m torch.distributed.launch \ + --nnodes=$NNODES \ + --node_rank=$NODE_RANK \ + --master_addr=$MASTER_ADDR \ + --nproc_per_node=$GPUS \ + --master_port=$PORT \ + $(dirname "$0")/train.py \ + $CONFIG \ + --launcher pytorch ${@:3} diff --git a/cv/distiller/CWD/pytorch/mmrazor/tools/misc/print_config.py b/cv/distiller/CWD/pytorch/mmrazor/tools/misc/print_config.py new file mode 100644 index 0000000000000000000000000000000000000000..6829b29342008e391429ee8aa3ad71792d69f094 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tools/misc/print_config.py @@ -0,0 +1,51 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +import warnings + +from mmengine import Config, DictAction + + +def parse_args(): + parser = argparse.ArgumentParser(description='Print the whole config') + parser.add_argument('config', help='config file path') + parser.add_argument( + '--options', + nargs='+', + action=DictAction, + help='override some settings in the used config, the key-value pair ' + 'in xxx=yyy format will be merged into config file (deprecate), ' + 'change to --cfg-options instead.') + parser.add_argument( + '--cfg-options', + nargs='+', + action=DictAction, + help='override some settings in the used config, the key-value pair ' + 'in xxx=yyy format will be merged into config file. If the value to ' + 'be overwritten is a list, it should be like key="[a,b]" or key=a,b ' + 'It also allows nested list/tuple values, e.g. 
key="[(a,b),(c,d)]" ' + 'Note that the quotation marks are necessary and that no white space ' + 'is allowed.') + args = parser.parse_args() + + if args.options and args.cfg_options: + raise ValueError( + '--options and --cfg-options cannot be both ' + 'specified, --options is deprecated in favor of --cfg-options') + if args.options: + warnings.warn('--options is deprecated in favor of --cfg-options') + args.cfg_options = args.options + + return args + + +def main(): + args = parse_args() + + cfg = Config.fromfile(args.config) + if args.cfg_options is not None: + cfg.merge_from_dict(args.cfg_options) + print(f'Config:\n{cfg.pretty_text}') + + +if __name__ == '__main__': + main() diff --git a/cv/distiller/CWD/pytorch/mmrazor/tools/model_converters/convert_attentivenas_nas_ckpt.py b/cv/distiller/CWD/pytorch/mmrazor/tools/model_converters/convert_attentivenas_nas_ckpt.py new file mode 100644 index 0000000000000000000000000000000000000000..e320fdf94290a90848b0d1e1ef779a3bbfc146c0 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tools/model_converters/convert_attentivenas_nas_ckpt.py @@ -0,0 +1,170 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import argparse +from pathlib import Path + +import torch + + +def parse_args(): + parser = argparse.ArgumentParser( + description='Process a checkpoint to be published') + parser.add_argument('checkpoint', help='input checkpoint filename') + parser.add_argument( + '--inplace', action='store_true', help='replace origin ckpt') + args = parser.parse_args() + return args + + +def main(): + args = parse_args() + checkpoint = torch.load(args.checkpoint, map_location='cpu') + new_state_dict = dict() + + for key, value in checkpoint['state_dict'].items(): + key = key.replace('module.', 'architecture.backbone.') + if 'blocks.10' in key: + new_key = key.replace('blocks.10', 'layer3.3') + elif 'blocks.11' in key: + new_key = key.replace('blocks.11', 'layer3.4') + elif 'blocks.12' in key: + new_key = key.replace('blocks.12', 'layer3.5') + elif 'blocks.13' in key: + new_key = key.replace('blocks.13', 'layer4.0') + elif 'blocks.14' in key: + new_key = key.replace('blocks.14', 'layer4.1') + elif 'blocks.15' in key: + new_key = key.replace('blocks.15', 'layer4.2') + elif 'blocks.16' in key: + new_key = key.replace('blocks.16', 'layer4.3') + elif 'blocks.17' in key: + new_key = key.replace('blocks.17', 'layer4.4') + elif 'blocks.18' in key: + new_key = key.replace('blocks.18', 'layer4.5') + elif 'blocks.19' in key: + new_key = key.replace('blocks.19', 'layer5.0') + elif 'blocks.20' in key: + new_key = key.replace('blocks.20', 'layer5.1') + elif 'blocks.21' in key: + new_key = key.replace('blocks.21', 'layer5.2') + elif 'blocks.22' in key: + new_key = key.replace('blocks.22', 'layer5.3') + elif 'blocks.23' in key: + new_key = key.replace('blocks.23', 'layer5.4') + elif 'blocks.24' in key: + new_key = key.replace('blocks.24', 'layer5.5') + elif 'blocks.25' in key: + new_key = key.replace('blocks.25', 'layer5.6') + elif 'blocks.26' in key: + new_key = key.replace('blocks.26', 'layer5.7') + elif 'blocks.27' in key: + new_key = key.replace('blocks.27', 'layer6.0') + elif 'blocks.28' 
in key: + new_key = key.replace('blocks.28', 'layer6.1') + elif 'blocks.29' in key: + new_key = key.replace('blocks.29', 'layer6.2') + elif 'blocks.30' in key: + new_key = key.replace('blocks.30', 'layer6.3') + elif 'blocks.31' in key: + new_key = key.replace('blocks.31', 'layer6.4') + elif 'blocks.32' in key: + new_key = key.replace('blocks.32', 'layer6.5') + elif 'blocks.33' in key: + new_key = key.replace('blocks.33', 'layer6.6') + elif 'blocks.34' in key: + new_key = key.replace('blocks.34', 'layer6.7') + elif 'blocks.35' in key: + new_key = key.replace('blocks.35', 'layer7.0') + elif 'blocks.36' in key: + new_key = key.replace('blocks.36', 'layer7.1') + elif 'blocks.0' in key: + new_key = key.replace('blocks.0', 'layer1.0') + elif 'blocks.1' in key: + new_key = key.replace('blocks.1', 'layer1.1') + elif 'blocks.2' in key: + new_key = key.replace('blocks.2', 'layer2.0') + elif 'blocks.3' in key: + new_key = key.replace('blocks.3', 'layer2.1') + elif 'blocks.4' in key: + new_key = key.replace('blocks.4', 'layer2.2') + elif 'blocks.5' in key: + new_key = key.replace('blocks.5', 'layer2.3') + elif 'blocks.6' in key: + new_key = key.replace('blocks.6', 'layer2.4') + elif 'blocks.7' in key: + new_key = key.replace('blocks.7', 'layer3.0') + elif 'blocks.8' in key: + new_key = key.replace('blocks.8', 'layer3.1') + elif 'blocks.9' in key: + new_key = key.replace('blocks.9', 'layer3.2') + else: + new_key = key + + if 'mobile_inverted_conv.depth_conv.conv.conv' in new_key: + final_new_key = new_key.replace( + 'mobile_inverted_conv.depth_conv.conv.conv', + 'depthwise_conv.conv') + elif 'mobile_inverted_conv.depth_conv.bn.bn' in new_key: + final_new_key = new_key.replace( + 'mobile_inverted_conv.depth_conv.bn.bn', 'depthwise_conv.bn') + elif 'mobile_inverted_conv.point_linear.conv.conv' in new_key: + final_new_key = new_key.replace( + 'mobile_inverted_conv.point_linear.conv.conv', + 'linear_conv.conv') + elif 'mobile_inverted_conv.point_linear.bn.bn' in new_key: + 
final_new_key = new_key.replace( + 'mobile_inverted_conv.point_linear.bn.bn', 'linear_conv.bn') + elif 'shortcut.conv.conv' in new_key: + final_new_key = new_key.replace('shortcut.conv.conv', + 'shortcut.conv') + elif 'mobile_inverted_conv.inverted_bottleneck.conv.conv' in new_key: + final_new_key = new_key.replace( + 'mobile_inverted_conv.inverted_bottleneck.conv.conv', + 'expand_conv.conv') + elif 'mobile_inverted_conv.inverted_bottleneck.bn.bn' in new_key: + final_new_key = new_key.replace( + 'mobile_inverted_conv.inverted_bottleneck.bn.bn', + 'expand_conv.bn') + elif 'mobile_inverted_conv.depth_conv.se.fc.reduce' in new_key: + final_new_key = new_key.replace( + 'mobile_inverted_conv.depth_conv.se.fc.reduce', + 'se.conv1.conv') + elif 'mobile_inverted_conv.depth_conv.se.fc.expand' in new_key: + final_new_key = new_key.replace( + 'mobile_inverted_conv.depth_conv.se.fc.expand', + 'se.conv2.conv') + elif 'first_conv.conv.conv' in new_key: + final_new_key = new_key.replace('first_conv.conv.conv', + 'first_conv.conv') + elif 'first_conv.bn.bn' in new_key: + final_new_key = new_key.replace('first_conv.bn.bn', + 'first_conv.bn') + elif 'final_expand_layer.conv.conv' in new_key: + final_new_key = new_key.replace('final_expand_layer.conv.conv', + 'final_expand_layer.conv') + elif 'final_expand_layer.bn.bn' in new_key: + final_new_key = new_key.replace('final_expand_layer.bn.bn', + 'final_expand_layer.bn') + elif 'feature_mix_layer.conv.conv' in new_key: + final_new_key = new_key.replace('feature_mix_layer.conv.conv', + 'feature_mix_layer.conv') + elif 'classifier.linear.linear' in new_key: + final_new_key = new_key.replace( + 'backbone.classifier.linear.linear', 'head.fc') + else: + final_new_key = new_key + + new_state_dict[final_new_key] = value + + checkpoint['state_dict'] = new_state_dict + if args.inplace: + torch.save(checkpoint, args.checkpoint) + else: + ckpt_path = Path(args.checkpoint) + ckpt_name = ckpt_path.stem + ckpt_dir = ckpt_path.parent + new_ckpt_path = 
ckpt_dir / f'{ckpt_name}_latest.pth' + torch.save(checkpoint, new_ckpt_path) + + +if __name__ == '__main__': + main() diff --git a/cv/distiller/CWD/pytorch/mmrazor/tools/model_converters/convert_bignas_gml_ckpt.py b/cv/distiller/CWD/pytorch/mmrazor/tools/model_converters/convert_bignas_gml_ckpt.py new file mode 100644 index 0000000000000000000000000000000000000000..bba0b3dc98bb89248fc89d5de2bdfad38bcfb551 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tools/model_converters/convert_bignas_gml_ckpt.py @@ -0,0 +1,56 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +from pathlib import Path + +import torch + + +def parse_args(): + parser = argparse.ArgumentParser( + description='Process a checkpoint to be published') + parser.add_argument('checkpoint', help='input checkpoint filename') + parser.add_argument( + '--inplace', action='store_true', help='replace origin ckpt') + args = parser.parse_args() + return args + + +def main(): + args = parse_args() + checkpoint = torch.load(args.checkpoint, map_location='cpu') + new_state_dict = dict() + + for key, value in checkpoint['state_dict'].items(): + key = key.replace('model.', 'architecture.') + if 'blocks.0' in key: + new_key = key.replace('blocks.0', 'layer1') + elif 'blocks.1' in key: + new_key = key.replace('blocks.1', 'layer2') + elif 'blocks.2' in key: + new_key = key.replace('blocks.2', 'layer3') + elif 'blocks.3' in key: + new_key = key.replace('blocks.3', 'layer4') + elif 'blocks.4' in key: + new_key = key.replace('blocks.4', 'layer5') + elif 'blocks.5' in key: + new_key = key.replace('blocks.5', 'layer6') + elif 'blocks.6' in key: + new_key = key.replace('blocks.6', 'layer7') + else: + new_key = key + + new_state_dict[new_key] = value + + checkpoint['state_dict'] = new_state_dict + if args.inplace: + torch.save(checkpoint, args.checkpoint) + else: + ckpt_path = Path(args.checkpoint) + ckpt_name = ckpt_path.stem + ckpt_dir = ckpt_path.parent + new_ckpt_path = ckpt_dir / 
f'{ckpt_name}_latest.pth' + torch.save(checkpoint, new_ckpt_path) + + +if __name__ == '__main__': + main() diff --git a/cv/distiller/CWD/pytorch/mmrazor/tools/model_converters/convert_kd_ckpt.py b/cv/distiller/CWD/pytorch/mmrazor/tools/model_converters/convert_kd_ckpt.py new file mode 100644 index 0000000000000000000000000000000000000000..c6e966589b1ca50515c82c057455c150c2cceafe --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tools/model_converters/convert_kd_ckpt.py @@ -0,0 +1,47 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +from pathlib import Path + +import torch + + +def parse_args(): + parser = argparse.ArgumentParser( + description='Process a checkpoint to be published') + parser.add_argument('checkpoint', help='input checkpoint filename') + parser.add_argument( + '--inplace', action='store_true', help='replace origin ckpt') + args = parser.parse_args() + return args + + +def main(): + args = parse_args() + checkpoint = torch.load(args.checkpoint, map_location='cpu') + new_state_dict = dict() + + for key, value in checkpoint['state_dict'].items(): + if key.startswith('architecture.model.distiller.teacher'): + new_key = key.replace('architecture.model.distiller.teacher', + 'architecture.teacher') + elif key.startswith('architecture.model'): + new_key = key.replace('architecture.model', 'architecture') + else: + new_key = key + + new_state_dict[new_key] = value + + checkpoint['state_dict'] = new_state_dict + + if args.inplace: + torch.save(checkpoint, args.checkpoint) + else: + ckpt_path = Path(args.checkpoint) + ckpt_name = ckpt_path.stem + ckpt_dir = ckpt_path.parent + new_ckpt_path = ckpt_dir / f'{ckpt_name}_latest.pth' + torch.save(checkpoint, new_ckpt_path) + + +if __name__ == '__main__': + main() diff --git a/cv/distiller/CWD/pytorch/mmrazor/tools/model_converters/convert_kd_ckpt_to_student.py b/cv/distiller/CWD/pytorch/mmrazor/tools/model_converters/convert_kd_ckpt_to_student.py new file mode 100644 index 
0000000000000000000000000000000000000000..e44f66d02f518b12cbfdf98c53b450148354a44d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tools/model_converters/convert_kd_ckpt_to_student.py @@ -0,0 +1,48 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +from pathlib import Path + +import torch + + +def parse_args(): + parser = argparse.ArgumentParser( + description='Convert KD checkpoint to student-only checkpoint') + parser.add_argument('checkpoint', help='input checkpoint filename') + parser.add_argument('--out-path', help='save checkpoint path') + parser.add_argument( + '--inplace', action='store_true', help='replace origin ckpt') + args = parser.parse_args() + return args + + +def main(): + args = parse_args() + checkpoint = torch.load(args.checkpoint, map_location='cpu') + new_state_dict = dict() + new_meta = checkpoint['meta'] + + for key, value in checkpoint['state_dict'].items(): + if key.startswith('architecture.'): + new_key = key.replace('architecture.', '') + new_state_dict[new_key] = value + + checkpoint = dict() + checkpoint['meta'] = new_meta + checkpoint['state_dict'] = new_state_dict + + if args.inplace: + torch.save(checkpoint, args.checkpoint) + else: + ckpt_path = Path(args.checkpoint) + ckpt_name = ckpt_path.stem + if args.out_path: + ckpt_dir = Path(args.out_path) + else: + ckpt_dir = ckpt_path.parent + new_ckpt_path = ckpt_dir / f'{ckpt_name}_student.pth' + torch.save(checkpoint, new_ckpt_path) + + +if __name__ == '__main__': + main() diff --git a/cv/distiller/CWD/pytorch/mmrazor/tools/model_converters/convert_ofa_ckpt.py b/cv/distiller/CWD/pytorch/mmrazor/tools/model_converters/convert_ofa_ckpt.py new file mode 100644 index 0000000000000000000000000000000000000000..b28f15f290908f4de7ce40c295acae067617572b --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tools/model_converters/convert_ofa_ckpt.py @@ -0,0 +1,110 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
+import argparse +from pathlib import Path + +import torch + + +def parse_args(): + parser = argparse.ArgumentParser( + description='Process a checkpoint to be published') + parser.add_argument('checkpoint', help='input checkpoint filename') + parser.add_argument('--depth', nargs='+', type=int, help='layer depth') + parser.add_argument( + '--inplace', action='store_true', help='replace origin ckpt') + args = parser.parse_args() + return args + + +def block2layer_index_convert(layer_depth): + """Build index_table from OFA blocks to MMRazor layers.""" + index_table = dict() + i = 0 + first_index = 1 + second_index = 0 + for k in layer_depth: + for _ in range(k): + index_table[str(i)] = str(first_index) + '.' + str(second_index) + i += 1 + second_index += 1 + second_index = 0 + first_index += 1 + + return index_table + + +def main(): + args = parse_args() + checkpoint = torch.load(args.checkpoint, map_location='cpu') + new_state_dict = dict() + + index_table = block2layer_index_convert(args.depth) + + for key, value in checkpoint['state_dict'].items(): + if 'blocks' in key: + index = key.split('.')[1] + new_key = key.replace('blocks.' 
+ index, + 'layer' + index_table[index]) + else: + new_key = key + + if 'mobile_inverted_conv' in new_key: + new_key = new_key.replace('mobile_inverted_conv.', '') + if 'depth_conv' in key: + new_key = new_key.replace('depth_conv', 'depthwise_conv') + if 'point_linear' in key: + new_key = new_key.replace('point_linear', 'linear_conv') + if 'inverted_bottleneck' in key: + new_key = new_key.replace('inverted_bottleneck', 'expand_conv') + if '.conv.conv' in new_key: + new_key = new_key.replace('.conv.conv', '.conv') + if '.bn.bn' in new_key: + new_key = new_key.replace('.bn.bn', '.bn') + + if 'layer1.0.depthwise_conv.weight' in new_key: + new_key = new_key.replace('layer1.0.depthwise_conv.weight', + 'layer1.0.depthwise_conv.conv.weight') + if 'layer1.0.linear_conv.weight' in new_key: + new_key = new_key.replace('layer1.0.linear_conv.weight', + 'layer1.0.linear_conv.conv.weight') + + if 'depthwise_conv.se.fc.reduce' in new_key: + new_key = new_key.replace('depthwise_conv.se.fc.reduce', + 'se.conv1.conv') + if 'depthwise_conv.se.fc.expand' in new_key: + new_key = new_key.replace('depthwise_conv.se.fc.expand', + 'se.conv2.conv') + + if 'final_expand_layer' in new_key: + new_key = new_key.replace('final_expand_layer', + 'last_conv.final_expand_layer') + if 'feature_mix_layer' in new_key: + new_key = new_key.replace('feature_mix_layer', + 'last_conv.feature_mix_layer') + + if '5to3_matrix' in new_key: + new_key = new_key.replace('5to3_matrix', 'trans_matrix_5to3') + if '7to5_matrix' in new_key: + new_key = new_key.replace('7to5_matrix', 'trans_matrix_7to5') + + new_key = 'architecture.backbone.' 
+ new_key + + if 'classifier.linear' in new_key: + new_key = new_key.replace('classifier.linear', 'head.fc') + new_key = new_key.replace('backbone.', '') + + new_state_dict[new_key] = value + + checkpoint['state_dict'] = new_state_dict + if args.inplace: + torch.save(checkpoint, args.checkpoint) + else: + ckpt_path = Path(args.checkpoint) + ckpt_name = ckpt_path.stem + ckpt_dir = ckpt_path.parent + new_ckpt_path = ckpt_dir / f'{ckpt_name}_latest.pth' + torch.save(checkpoint, new_ckpt_path) + + +if __name__ == '__main__': + main() diff --git a/cv/distiller/CWD/pytorch/mmrazor/tools/model_converters/convert_quant_ckpt.py b/cv/distiller/CWD/pytorch/mmrazor/tools/model_converters/convert_quant_ckpt.py new file mode 100644 index 0000000000000000000000000000000000000000..9fbb0612516a5700d50d5e3312f63893dffb8730 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tools/model_converters/convert_quant_ckpt.py @@ -0,0 +1,53 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +from pathlib import Path + +import torch + + +def parse_args(): + parser = argparse.ArgumentParser( + description='Convert quantized checkpoint to deploy') + parser.add_argument('checkpoint', help='input checkpoint filename') + parser.add_argument('--out-path', help='save checkpoint path') + parser.add_argument( + '--inplace', action='store_true', help='replace origin ckpt') + args = parser.parse_args() + return args + + +def main(): + args = parse_args() + checkpoint = torch.load(args.checkpoint, map_location='cpu') + new_state_dict = dict() + new_meta = checkpoint['meta'] + + for key, value in checkpoint['state_dict'].items(): + if key.startswith('qmodels.predict.'): + new_key = key.replace('qmodels.predict.', '') + if '_val' in new_key and 'weight_fake_quant' in new_key: + new_key = new_key.replace('_val', '_vals') + new_state_dict[new_key] = value + # if key.startswith('architecture.'): + # new_key = key.replace('architecture.', '') + # new_state_dict[new_key] = value + + 
checkpoint = dict() + checkpoint['meta'] = new_meta + checkpoint['state_dict'] = new_state_dict + + if args.inplace: + torch.save(checkpoint, args.checkpoint) + else: + ckpt_path = Path(args.checkpoint) + ckpt_name = ckpt_path.stem + if args.out_path: + ckpt_dir = Path(args.out_path) + else: + ckpt_dir = ckpt_path.parent + new_ckpt_path = ckpt_dir / f'{ckpt_name}_deploy.pth' + torch.save(checkpoint, new_ckpt_path) + + +if __name__ == '__main__': + main() diff --git a/cv/distiller/CWD/pytorch/mmrazor/tools/model_converters/convert_supernet2subnet.py b/cv/distiller/CWD/pytorch/mmrazor/tools/model_converters/convert_supernet2subnet.py new file mode 100644 index 0000000000000000000000000000000000000000..b9f39a5174e3714272a87b7df4d31e4905a8e569 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tools/model_converters/convert_supernet2subnet.py @@ -0,0 +1,60 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +import os +import os.path as osp +import time + +import torch +from mmengine.config import Config +from mmengine.runner import Runner + +from mmrazor.structures.subnet import load_fix_subnet +from mmrazor.utils import register_all_modules + + +def parse_args(): + parser = argparse.ArgumentParser( + description='Process a NAS supernet checkpoint to be converted') + parser.add_argument('config', help='NAS model config file path') + parser.add_argument('checkpoint', help='supernet checkpoint file path') + parser.add_argument('yaml', help='YAML with subnet settings file path') + parser.add_argument( + '--launcher', + choices=['none', 'pytorch', 'slurm', 'mpi'], + default='none', + help='job launcher') + parser.add_argument('--local_rank', type=int, default=0) + args = parser.parse_args() + if 'LOCAL_RANK' not in os.environ: + os.environ['LOCAL_RANK'] = str(args.local_rank) + args = parser.parse_args() + return args + + +def main(): + register_all_modules(False) + args = parse_args() + + # load config + cfg = Config.fromfile(args.config) + 
cfg.launcher = args.launcher + + cfg.load_from = args.checkpoint + cfg.work_dir = '/'.join(args.checkpoint.split('/')[:-1]) + + runner = Runner.from_cfg(cfg) + + load_fix_subnet(runner.model, args.yaml) + + timestamp_subnet = time.strftime('%Y%m%d_%H%M', time.localtime()) + model_name = f'subnet_{timestamp_subnet}.pth' + save_path = osp.join(runner.work_dir, model_name) + torch.save({ + 'state_dict': runner.model.state_dict(), + 'meta': {} + }, save_path) + runner.logger.info(f'Successful converted. Saved in {save_path}.') + + +if __name__ == '__main__': + main() diff --git a/cv/distiller/CWD/pytorch/mmrazor/tools/model_converters/publish_model.py b/cv/distiller/CWD/pytorch/mmrazor/tools/model_converters/publish_model.py new file mode 100644 index 0000000000000000000000000000000000000000..e5bf482870e31853d6d831875defa85d04dbc472 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tools/model_converters/publish_model.py @@ -0,0 +1,91 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +import shutil +from pathlib import Path +from typing import Union + +import torch +from mmengine import digit_version + + +def parse_args(): + parser = argparse.ArgumentParser( + description='Process a checkpoint to be published') + parser.add_argument('ckpt', help='input checkpoint filename', type=str) + parser.add_argument('--model-name', help='model(config) name', type=str) + parser.add_argument('--timestamp', help='training timestamp', type=str) + parser.add_argument('--out-dir', help='output dir', type=str) + args = parser.parse_args() + return args + + +def cal_file_sha256(file_path: Union[str, Path]) -> str: + import hashlib + + BLOCKSIZE = 65536 + sha256_hash = hashlib.sha256() + + with open(file_path, 'rb') as f: + block = f.read(BLOCKSIZE) + while block: + sha256_hash.update(block) + block = f.read(BLOCKSIZE) + + return sha256_hash.hexdigest() + + +def process_checkpoint(ckpt_path_str: str, model_name: str, timestamp: str, + out_dir_str: str) -> None: + + 
ckpt_path = Path(ckpt_path_str) + work_dir = ckpt_path.parent + + out_dir: Path = Path(out_dir_str) + out_dir.mkdir(parents=True, exist_ok=True) + + tmp_ckpt_path = out_dir / 'tmp.pth' + + checkpoint = torch.load(ckpt_path, map_location='cpu') + # remove optimizer for smaller file size + if 'optimizer' in checkpoint: + del checkpoint['optimizer'] + # remove message_hub for smaller file size + if 'message_hub' in checkpoint: + del checkpoint['message_hub'] + # remove param_schedulers for smaller file size + if 'param_schedulers' in checkpoint: + del checkpoint['param_schedulers'] + + # if it is necessary to remove some sensitive data in checkpoint['meta'], + # add the code here. + if digit_version(torch.__version__) >= digit_version('1.6'): + torch.save( + checkpoint, tmp_ckpt_path, _use_new_zipfile_serialization=False) + else: + torch.save(checkpoint, tmp_ckpt_path) + + sha = cal_file_sha256(tmp_ckpt_path) + save_ckpt_path = f'{out_dir}/{model_name}_{timestamp}-{sha[:8]}.pth' + tmp_ckpt_path.rename(save_ckpt_path) + print(f'Successfully generated the publish-ckpt as {save_ckpt_path}.') + + log_path = work_dir / timestamp / f'{timestamp}.log' + save_log_path = f'{out_dir}/{model_name}_{timestamp}-{sha[:8]}.log' + shutil.copy(str(log_path), str(save_log_path)) + print(f'Successfully generated the publish-log as {save_log_path}.') + + log_path = work_dir / timestamp / f'{timestamp}.log' + json_path = work_dir / timestamp / f'vis_data/{timestamp}.json' + save_json_path = f'{out_dir}/{model_name}_{timestamp}-{sha[:8]}.json' + shutil.copy(str(json_path), str(save_json_path)) + print(f'Successfully generated the publish-log as {save_json_path}.') + + +def main(): + args = parse_args() + process_checkpoint(args.ckpt, args.model_name, args.timestamp, + args.out_dir) + + +if __name__ == '__main__': + main() diff --git a/cv/distiller/CWD/pytorch/mmrazor/tools/pruning/get_channel_units.py b/cv/distiller/CWD/pytorch/mmrazor/tools/pruning/get_channel_units.py new file mode 
100644 index 0000000000000000000000000000000000000000..cb3f890eda5e9210d48a801b7929aa7ef0a5637c --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tools/pruning/get_channel_units.py @@ -0,0 +1,84 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +import json +import sys + +import torch.nn as nn +from mmengine import MODELS +from mmengine.config import Config + +from mmrazor.models import BaseAlgorithm +from mmrazor.models.mutators import ChannelMutator + +sys.setrecursionlimit(int(pow(2, 20))) + + +def parse_args(): + parser = argparse.ArgumentParser( + description='Get channel unit of a model.') + parser.add_argument('config', help='config of the model') + parser.add_argument( + '-c', + '--with-channel', + action='store_true', + help='output with channel config') + parser.add_argument( + '-i', + '--with-init-args', + action='store_true', + help='output with init args') + parser.add_argument( + '--choice', + action='store_true', + help=('output choices template. When this flag is activated, ' + '-c and -i will be ignored')) + parser.add_argument( + '-o', + '--output-path', + default='', + help='the file path to store channel unit info') + return parser.parse_args() + + +def main(): + args = parse_args() + config = Config.fromfile(args.config) + default_scope = config['default_scope'] + + model = MODELS.build(config['model']) + if isinstance(model, BaseAlgorithm): + mutator = model.mutator + elif isinstance(model, nn.Module): + mutator: ChannelMutator = ChannelMutator( + channel_unit_cfg=dict( + type='L1MutableChannelUnit', + default_args=dict(choice_mode='ratio'), + ), + parse_cfg={ + 'type': 'ChannelAnalyzer', + 'demo_input': { + 'type': 'DefaultDemoInput', + 'scope': default_scope + }, + 'tracer_type': 'FxTracer' + }) + mutator.prepare_from_supernet(model) + if args.choice: + config = mutator.choice_template + else: + config = mutator.config_template( + with_channels=args.with_channel, + with_unit_init_args=args.with_init_args) + json_config 
def input_generator_wrapper(model, shape, training, scope=None):
    """Build an ``input_constructor`` callback for ``ResourceEstimator``.

    Args:
        model: Model the demo inputs are generated for.
        shape: Unused here (the estimator passes the shape to the inner
            callback); kept for interface compatibility.
        training (bool): Whether demo inputs are in training mode.
        scope (str | None): Registry scope used by ``DefaultDemoInput``.

    Returns:
        Callable[[tuple], Any]: Maps an ``input_shape`` to demo inputs.
    """

    def input_generator(input_shape):
        inputs = DefaultDemoInput(scope=scope).get_data(
            model, input_shape=input_shape, training=training)
        # Bug fix: the original tested `isinstance(input, dict)` -- the
        # *builtin* `input` function -- which is never a dict, so the
        # 'mode' key was never rewritten. Check the generated `inputs`
        # so dict-style demo inputs are forced into 'tensor' mode.
        if isinstance(inputs, dict) and 'mode' in inputs:
            inputs['mode'] = 'tensor'
        return inputs

    return input_generator
def parse_args(argv=None):
    """Parse command-line options for generating a pruning config.

    Args:
        argv (list[str] | None): Argument strings to parse. ``None``
            (the default) falls back to ``sys.argv[1:]``, keeping the
            original CLI behavior while allowing programmatic use.

    Returns:
        argparse.Namespace: Parsed options (``config``, ``checkpoint``,
        ``subnet``, ``o``).
    """
    parser = argparse.ArgumentParser(
        description='Get the config to prune a model.')
    parser.add_argument('config', help='config of the model')
    parser.add_argument(
        '--checkpoint',
        default=None,
        type=str,
        help='checkpoint path of the model')
    parser.add_argument(
        '--subnet',
        default=None,
        type=str,
        help='pruning structure for the model')
    parser.add_argument(
        '-o',
        type=str,
        default='./prune.py',
        help='output path to store the pruning config.')
    # argparse treats argv=None as sys.argv[1:], so callers that pass
    # nothing see identical behavior.
    args = parser.parse_args(argv)
    return args
def change_config(config):
    """Stamp the model config with the file's default registry scope.

    Copies ``config['default_scope']`` into ``config['model']['_scope_']``
    in place, then returns the very same config object so the call can
    be chained.
    """
    config['model']['_scope_'] = config['default_scope']
    return config
def get_save_path(config_path, checkpoint_path, target_path):
    """Resolve the output directory and pruned-checkpoint basename.

    Args:
        config_path (str): Path of the model config file.
        checkpoint_path (str): Path of the checkpoint to be pruned.
        target_path (str): User-specified output directory; the empty
            string means "derive ./work_dirs/<config stem> instead".

    Returns:
        tuple[str, str]: ``(work_dir, checkpoint_name)`` where
        ``checkpoint_name`` is the checkpoint stem plus ``'_pruned'``
        and carries no file extension.
    """
    if target_path != '':
        work_dir = target_path
    else:
        # splitext (instead of split('.')[0]) keeps the full stem when
        # a filename contains extra dots, e.g. 'cfg.v2.py' -> 'cfg.v2'.
        config_stem = os.path.splitext(os.path.basename(config_path))[0]
        work_dir = 'work_dirs/' + config_stem

    checkpoint_name = os.path.splitext(
        os.path.basename(checkpoint_path))[0] + '_pruned'

    return work_dir, checkpoint_name
BaseAlgorithm' + load_checkpoint(model, checkpoint_path, map_location='cpu') + + pruned_model, structure = get_static_model(model) + + # save + torch.save(pruned_model.state_dict(), + os.path.join(work_dir, checkpoint_name + '.pth')) + fileio.dump( + structure, os.path.join(work_dir, checkpoint_name + '.json'), indent=4) + + print_log('Save pruned model to {}'.format(work_dir)) + print_log('Pruning Structure: {}'.format(json.dumps(structure, indent=4))) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tools/ptq.py b/cv/distiller/CWD/pytorch/mmrazor/tools/ptq.py new file mode 100644 index 0000000000000000000000000000000000000000..2c00c5b11e46c3f9130c00cca0c77eacafaee233 --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tools/ptq.py @@ -0,0 +1,73 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +import os +import os.path as osp + +from mmengine.config import Config, DictAction +from mmengine.runner import Runner + +from mmrazor.utils import register_all_modules + + +# TODO: support fuse_conv_bn, visualization, and format_only +def parse_args(): + parser = argparse.ArgumentParser( + description='MMRazor test (and eval) a model') + parser.add_argument('config', help='test config file path') + # parser.add_argument('checkpoint', help='checkpoint file') + parser.add_argument( + '--work-dir', + help='the directory to save the file containing evaluation metrics') + parser.add_argument( + '--cfg-options', + nargs='+', + action=DictAction, + help='override some settings in the used config, the key-value pair ' + 'in xxx=yyy format will be merged into config file. If the value to ' + 'be overwritten is a list, it should be like key="[a,b]" or key=a,b ' + 'It also allows nested list/tuple values, e.g. 
def main():
    """Entry point: build a Runner from the PTQ config and run testing."""
    register_all_modules(False)
    args = parse_args()

    # Load the config and fold in launcher / CLI overrides.
    cfg = Config.fromfile(args.config)
    cfg.launcher = args.launcher
    if args.cfg_options is not None:
        cfg.merge_from_dict(args.cfg_options)

    # work_dir priority: CLI flag > value already in config > config filename.
    cli_work_dir = args.work_dir
    if cli_work_dir is not None:
        cfg.work_dir = cli_work_dir
    elif cfg.get('work_dir', None) is None:
        config_stem = osp.splitext(osp.basename(args.config))[0]
        cfg.work_dir = osp.join('./work_dirs', config_stem)

    # NOTE(review): checkpoint loading is deliberately not wired up here
    # (parse_args has no checkpoint option); weights are presumably
    # provided by the config itself -- confirm upstream.

    # Build the runner and start testing.
    runner = Runner.from_cfg(cfg)
    runner.test()
#!/usr/bin/env bash
# Launch a training job on a Slurm cluster.
#
# Usage:
#   [GPUS=8] [GPUS_PER_NODE=8] [CPUS_PER_TASK=5] [SRUN_ARGS=...] \
#     slurm_train.sh PARTITION JOB_NAME CONFIG WORK_DIR [extra train.py args]
#
# Positional args map to the srun partition/job name and the config and
# work dir passed to tools/train.py; anything after the 4th arg is
# forwarded verbatim to train.py.

# Echo each command before running it, for easier job-log debugging.
set -x

PARTITION=$1
JOB_NAME=$2
CONFIG=$3
WORK_DIR=$4
# Resource knobs are overridable via environment variables.
GPUS=${GPUS:-8}
GPUS_PER_NODE=${GPUS_PER_NODE:-8}
CPUS_PER_TASK=${CPUS_PER_TASK:-5}
SRUN_ARGS=${SRUN_ARGS:-""}
PY_ARGS=${@:5}

# Prepend the repo root so `tools/train.py` imports the local package.
# One srun task per GPU; --kill-on-bad-exit tears the job down if any
# rank fails.
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
srun -p ${PARTITION} \
    --job-name=${JOB_NAME} \
    --gres=gpu:${GPUS_PER_NODE} \
    --ntasks=${GPUS} \
    --ntasks-per-node=${GPUS_PER_NODE} \
    --cpus-per-task=${CPUS_PER_TASK} \
    --kill-on-bad-exit=1 \
    ${SRUN_ARGS} \
    python -u tools/train.py ${CONFIG} --work-dir=${WORK_DIR} --launcher="slurm" ${PY_ARGS}
def main():
    """Entry point: build a Runner from the config and run testing."""
    register_all_modules(False)
    args = parse_args()

    # Load the config and fold in launcher / CLI overrides.
    cfg = Config.fromfile(args.config)
    cfg.launcher = args.launcher
    if args.cfg_options is not None:
        cfg.merge_from_dict(args.cfg_options)

    # work_dir priority: CLI flag > value already in config > config filename.
    cli_work_dir = args.work_dir
    if cli_work_dir is not None:
        cfg.work_dir = cli_work_dir
    elif cfg.get('work_dir', None) is None:
        config_stem = osp.splitext(osp.basename(args.config))[0]
        cfg.work_dir = osp.join('./work_dirs', config_stem)

    if args.checkpoint == 'none':
        # The literal string 'none' means "no checkpoint on the CLI".
        # If the model's init_cfg does not supply weights either, the
        # evaluation results will be meaningless.
        cfg.load_from = None
    else:
        cfg.load_from = args.checkpoint
        # For post-training-quantization test loops, restrict the loop
        # to validation only.
        if 'type' in cfg.test_cfg and cfg.test_cfg.type.endswith('PTQLoop'):
            cfg.test_cfg.only_val = True

    # Build the runner and start testing.
    runner = Runner.from_cfg(cfg)
    runner.test()
def main():
    """Entry point: load the config, apply CLI tweaks, run training."""
    register_all_modules(False)
    args = parse_args()

    # load config
    cfg = Config.fromfile(args.config)
    cfg.launcher = args.launcher
    if args.cfg_options is not None:
        cfg.merge_from_dict(args.cfg_options)

    # work_dir is determined in this priority: CLI > segment in file > filename
    if args.work_dir is not None:
        # update configs according to CLI args if args.work_dir is not None
        cfg.work_dir = args.work_dir
    elif cfg.get('work_dir', None) is None:
        # use config filename as default work_dir if cfg.work_dir is None
        cfg.work_dir = osp.join('./work_dirs',
                                osp.splitext(osp.basename(args.config))[0])

    # enable automatic-mixed-precision training
    if args.amp:
        # Plain optimizer wrapper: swap OptimWrapper -> AmpOptimWrapper,
        # warn if AMP is already configured, reject any other type.
        if getattr(cfg.optim_wrapper, 'type', None):
            optim_wrapper = cfg.optim_wrapper.type
            if optim_wrapper == 'AmpOptimWrapper':
                print_log(
                    'AMP training is already enabled in your config.',
                    logger='current',
                    level=logging.WARNING)
            else:
                assert optim_wrapper == 'OptimWrapper', (
                    '`--amp` is only supported when the optimizer wrapper '
                    f'type is `OptimWrapper` but got {optim_wrapper}.')
                cfg.optim_wrapper.type = 'AmpOptimWrapper'
                cfg.optim_wrapper.loss_scale = 'dynamic'

        # NOTE(review): when a custom constructor is configured, the
        # architecture's nested wrapper is switched to AMP, but only if
        # its type is exactly 'OptimWrapper' -- presumably multi-optim
        # (NAS/pruning) configs; confirm this covers all constructors.
        if getattr(cfg.optim_wrapper, 'constructor', None):
            if cfg.optim_wrapper.architecture.type == 'OptimWrapper':
                cfg.optim_wrapper.architecture.type = 'AmpOptimWrapper'
                cfg.optim_wrapper.architecture.loss_scale = 'dynamic'

            # TODO: support amp training for mutator
            # if cfg.optim_wrapper.mutator.type == 'OptimWrapper':
            #     cfg.optim_wrapper.mutator.type = 'AmpOptimWrapper'
            #     cfg.optim_wrapper.mutator.loss_scale = 'dynamic'

    # enable automatically scaling LR; requires the config to carry a
    # complete `auto_scale_lr` section (enable flag + base batch size).
    if args.auto_scale_lr:
        if 'auto_scale_lr' in cfg and \
                'enable' in cfg.auto_scale_lr and \
                'base_batch_size' in cfg.auto_scale_lr:
            cfg.auto_scale_lr.enable = True
        else:
            raise RuntimeError('Can not find "auto_scale_lr" or '
                               '"auto_scale_lr.enable" or '
                               '"auto_scale_lr.base_batch_size" in your'
                               ' configuration file.')

    # resume from the latest checkpoint in work_dir when requested
    cfg.resume = args.resume

    # build the runner from config
    runner = Runner.from_cfg(cfg)

    # start training
    runner.train()
+import argparse +import os + +import mmcv +import torch +from mmengine.config import Config +from mmengine.registry import VISUALIZERS +from mmengine.utils import import_modules_from_strings + +from mmrazor.models.task_modules import RecorderManager +from mmrazor.utils import register_all_modules +from mmrazor.visualization.local_visualizer import modify + + +def parse_args(): + parser = argparse.ArgumentParser(description='Feature map visualization') + parser.add_argument('img', help='Image file') + parser.add_argument( + 'config1', help='train config file path for the first model') + parser.add_argument( + 'config2', help='train config file path for the second model') + parser.add_argument('vis_config', help='visualization config file path') + parser.add_argument( + 'checkpoint1', help='Checkpoint file for the first model') + parser.add_argument( + 'checkpoint2', help='Checkpoint file for the second model') + parser.add_argument('--out-file', default=None, help='Path to output file') + parser.add_argument( + '--device', default='cpu', help='Device used for inference') + parser.add_argument('--repo', help='the corresponding repo name') + parser.add_argument( + '--use-norm', + action='store_true', + help='normalize the featmap before visualization') + parser.add_argument( + '--overlaid', action='store_true', help='overlaid image') + parser.add_argument( + '--channel-reduction', + help='Reduce multiple channels to a single channel. The optional value' + ' is \'squeeze_mean\', \'select_max\' or \'pixel_wise_max\'.', + default=None) + parser.add_argument( + '--topk', + help='If channel_reduction is not None and topk > 0, it will select ' + 'topk channel to show by the sum of each channel. 
def norm(feat):
    """Standardize an (N, C, H, W) feature map channel by channel.

    For each of the C channels, the mean and (unbiased) std are computed
    over all N*H*W values of that channel and used to normalize it; the
    1e-6 term guards against division by zero. The returned tensor has
    the same (N, C, H, W) layout as the input.
    """
    batch, channels, height, width = feat.shape
    # Group every channel's values into one row: (C, N*H*W).
    per_channel = feat.permute(1, 0, 2, 3).reshape(channels, -1)
    mu = per_channel.mean(dim=-1, keepdim=True)
    sigma = per_channel.std(dim=-1, keepdim=True)
    standardized = (per_channel - mu) / (sigma + 1e-6)
    # Restore the original (N, C, H, W) layout.
    return standardized.reshape(channels, batch, height,
                                width).permute(1, 0, 2, 3)
recorder_manager1.initialize(model1) + + recorder_manager2 = RecorderManager(recorder_cfg2) + recorder_manager2.initialize(model2) + + with recorder_manager1: + # test a single image + _ = inference_model(model1, args.img) + + with recorder_manager2: + # test a single image + _ = inference_model(model2, args.img) + + overlaid_image = mmcv.imread( + args.img, channel_order='rgb') if args.overlaid else None + + for name1, name2 in zip(mappings1.keys(), mappings2.keys()): + record1 = mappings1[name1] + recorder1 = recorder_manager1.get_recorder(record1.recorder) + record_idx = getattr(record1, 'record_idx', 0) + data_idx = getattr(record1, 'data_idx') + feats1 = recorder1.get_record_data(record_idx, data_idx) + if isinstance(feats1, torch.Tensor): + feats1 = (feats1, ) + + record2 = mappings2[name2] + recorder2 = recorder_manager2.get_recorder(record2.recorder) + record_idx = getattr(record2, 'record_idx', 0) + data_idx = getattr(record2, 'data_idx') + feats2 = recorder2.get_record_data(record_idx, data_idx) + if isinstance(feats2, torch.Tensor): + feats2 = (feats2, ) + + for i, (feat1, feat2) in enumerate(zip(feats1, feats2)): + diff = torch.abs(feat1 - feat2) + if args.use_norm: + diff = norm(diff) + drawn_img = visualizer.draw_featmap( + diff[0], + overlaid_image, + args.channel_reduction, + topk=args.topk, + arrangement=tuple(args.arrangement), + resize_shape=tuple(args.resize_shape) + if args.resize_shape else None, + alpha=args.alpha) + visualizer.add_datasample( + f'model1_{name1}_model2_{name2}_{i}', + drawn_img, + show=args.out_file is None, + wait_time=0.1, + out_file=args.out_file) + + +if __name__ == '__main__': + args = parse_args() + main(args) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tools/visualizations/feature_visualization.py b/cv/distiller/CWD/pytorch/mmrazor/tools/visualizations/feature_visualization.py new file mode 100644 index 0000000000000000000000000000000000000000..617f9d016c30ed6e2366b89b988baf5ab1092e1d --- /dev/null +++ 
b/cv/distiller/CWD/pytorch/mmrazor/tools/visualizations/feature_visualization.py @@ -0,0 +1,155 @@ +# Copyright (c) OpenMMLab. All rights reserved. +import argparse +import os + +import mmcv +import torch +from mmengine.config import Config, DictAction +from mmengine.registry import VISUALIZERS +from mmengine.utils import import_modules_from_strings + +from mmrazor.models.task_modules import RecorderManager +from mmrazor.utils import register_all_modules +from mmrazor.visualization.local_visualizer import modify + + +def parse_args(): + parser = argparse.ArgumentParser(description='Feature map visualization') + parser.add_argument('img', help='Image file') + parser.add_argument('config', help='train config file path') + parser.add_argument('vis_config', help='visualization config file path') + parser.add_argument('checkpoint', help='Checkpoint file') + parser.add_argument('--out-file', default=None, help='Path to output file') + parser.add_argument( + '--device', default='cpu', help='Device used for inference') + parser.add_argument('--repo', help='the corresponding repo name') + parser.add_argument( + '--use-norm', + action='store_true', + help='normalize the featmap before visualization') + parser.add_argument( + '--overlaid', action='store_true', help='overlaid image') + parser.add_argument( + '--channel-reduction', + help='Reduce multiple channels to a single channel. The optional value' + ' is \'squeeze_mean\', \'select_max\' or \'pixel_wise_max\'.', + default=None) + parser.add_argument( + '--topk', + type=int, + help='If channel_reduction is not None and topk > 0, it will select ' + 'topk channel to show by the sum of each channel. 
def norm(feat):
    """Channel-wise standardization of an (N, C, H, W) feature map.

    Each channel is shifted and scaled by the mean/std of its own
    N*H*W values (std stabilized with 1e-6); the output keeps the
    input's (N, C, H, W) shape.
    """
    n, c, h, w = feat.shape
    # transpose(0, 1) is equivalent to permute(1, 0, 2, 3) here: it
    # brings the channel axis first so each row of `flat` is a channel.
    flat = feat.transpose(0, 1).reshape(c, -1)
    standardized = (flat - flat.mean(dim=-1, keepdim=True)) / (
        flat.std(dim=-1, keepdim=True) + 1e-6)
    return standardized.reshape(c, n, h, w).transpose(0, 1)
VISUALIZERS.build(model.cfg.visualizer) + visualizer.draw_featmap = modify + + visualization_cfg = Config.fromfile(args.vis_config) + recorder_cfg = visualization_cfg.recorders + mappings = visualization_cfg.mappings + recorder_manager = RecorderManager(recorder_cfg) + recorder_manager.initialize(model) + + with recorder_manager: + # test a single image + result = inference_model(model, args.img) + + overlaid_image = mmcv.imread( + args.img, channel_order='rgb') if args.overlaid else None + + for name, record in mappings.items(): + recorder = recorder_manager.get_recorder(record.recorder) + record_idx = getattr(record, 'record_idx', 0) + data_idx = getattr(record, 'data_idx') + feats = recorder.get_record_data(record_idx, data_idx) + if isinstance(feats, torch.Tensor): + feats = (feats, ) + + for i, feat in enumerate(feats): + if args.use_norm: + feat = norm(feat) + drawn_img = visualizer.draw_featmap( + feat[0], + overlaid_image, + args.channel_reduction, + topk=args.topk, + arrangement=tuple(args.arrangement), + resize_shape=tuple(args.resize_shape) + if args.resize_shape else None, + alpha=args.alpha) + visualizer.add_datasample( + f'{name}_{i}', + drawn_img, + data_sample=result, + draw_gt=False, + show=args.out_file is None, + wait_time=0.1, + out_file=args.out_file, + **args.cfg_options) + + +if __name__ == '__main__': + args = parse_args() + main(args) diff --git a/cv/distiller/CWD/pytorch/mmrazor/tools/visualizations/vis_configs/backbone_feature_diff_visualization.py b/cv/distiller/CWD/pytorch/mmrazor/tools/visualizations/vis_configs/backbone_feature_diff_visualization.py new file mode 100644 index 0000000000000000000000000000000000000000..7bc34d90ac38e715c12a21108e4dbea2422a738d --- /dev/null +++ b/cv/distiller/CWD/pytorch/mmrazor/tools/visualizations/vis_configs/backbone_feature_diff_visualization.py @@ -0,0 +1,18 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
# Config consumed by feature_diff_visualization.py: each model gets a
# recorder that captures its backbone's forward outputs, and a mapping
# from display names to indices into that recorded output tuple.
# NOTE(review): p3-p6 presumably name successive backbone stages /
# pyramid levels (data_idx 0-3) -- confirm against the backbone used.

# configs for the 1st model
recorders1 = dict(
    backbone=dict(_scope_='mmrazor', type='ModuleOutputs', source='backbone'))
mappings1 = dict(
    p3=dict(recorder='backbone', data_idx=0),
    p4=dict(recorder='backbone', data_idx=1),
    p5=dict(recorder='backbone', data_idx=2),
    p6=dict(recorder='backbone', data_idx=3))

# configs for the 2nd model
recorders2 = dict(
    backbone=dict(_scope_='mmrazor', type='ModuleOutputs', source='backbone'))
mappings2 = dict(
    p3=dict(recorder='backbone', data_idx=0),
    p4=dict(recorder='backbone', data_idx=1),
    p5=dict(recorder='backbone', data_idx=2),
    p6=dict(recorder='backbone', data_idx=3))
# --- vis_configs/fpn_feature_diff_visualization.py ---
# Both compared models record the neck (FPN) outputs; feature maps p3-p6
# map to output indices 0-3 of the neck module.

# configs for the 1st model
recorders1 = dict(
    neck=dict(_scope_='mmrazor', type='ModuleOutputs', source='neck'))
mappings1 = {
    f'p{level + 3}': dict(recorder='neck', data_idx=level)
    for level in range(4)
}

# configs for the 2nd model
recorders2 = dict(
    neck=dict(_scope_='mmrazor', type='ModuleOutputs', source='neck'))
mappings2 = {
    f'p{level + 3}': dict(recorder='neck', data_idx=level)
    for level in range(4)
}

# --- vis_configs/fpn_feature_visualization.py ---
# Copyright (c) OpenMMLab. All rights reserved.
# Single-model variant: one recorder on the neck, same p3-p6 mapping.
recorders = dict(
    neck=dict(_scope_='mmrazor', type='ModuleOutputs', source='neck'))
mappings = {
    f'p{level + 3}': dict(recorder='neck', data_idx=level)
    for level in range(4)
}

# --- vis_scheduler.py ---
# Copyright (c) OpenMMLab. All rights reserved.
import argparse
import json
import os.path as osp
import re
from pathlib import Path
from unittest.mock import MagicMock

import matplotlib.pyplot as plt
import rich
import torch.nn as nn
from mmcls.utils import register_all_modules
from mmengine import Config, DictAction, Hook, Runner, Visualizer
from mmengine.model import BaseModel
from rich.progress import BarColumn, MofNCompleteColumn, Progress, TextColumn


class SimpleModel(BaseModel):
    """Minimal model whose training step does nothing.

    It stands in for the real network so that a full training loop can be
    simulated cheaply: only the parameter schedulers' effect on the
    optimizer is of interest here.
    """

    def __init__(self):
        super().__init__()
        self.data_preprocessor = nn.Identity()
        # One parameterized layer so the optimizer has parameters to hold.
        self.conv = nn.Conv2d(1, 1, 1)

    def forward(self, batch_inputs, data_samples, mode='tensor'):
        pass

    def train_step(self, data, optim_wrapper):
        # Deliberately a no-op: no forward/backward is needed to observe
        # how the schedulers change lr/momentum each iteration.
        pass


class ParamRecordHook(Hook):
    """Hook that records lr and momentum after every training iteration.

    Also drives a ``rich`` progress bar that advances per epoch when
    ``by_epoch`` is True and per iteration otherwise.
    """

    def __init__(self, by_epoch):
        super().__init__()
        self.by_epoch = by_epoch
        self.lr_list = []
        self.momentum_list = []
        self.task_id = 0
        self.progress = Progress(BarColumn(), MofNCompleteColumn(),
                                 TextColumn('{task.description}'))

    def before_train(self, runner):
        # Size the progress bar from the training loop's configured length.
        if self.by_epoch:
            total = runner.train_loop.max_epochs
            self.task_id = self.progress.add_task(
                'epochs', start=True, total=total)
        else:
            total = runner.train_loop.max_iters
            self.task_id = self.progress.add_task(
                'iters', start=True, total=total)
        self.progress.start()

    def after_train_epoch(self, runner):
        if self.by_epoch:
            self.progress.update(self.task_id, advance=1)

    def after_train_iter(self, runner, batch_idx, data_batch, outputs):
        if not self.by_epoch:
            self.progress.update(self.task_id, advance=1)
        # Record the first parameter group's values each iteration.
        self.lr_list.append(runner.optim_wrapper.get_lr()['lr'][0])
        self.momentum_list.append(
            runner.optim_wrapper.get_momentum()['momentum'][0])

    def after_train(self, runner):
        self.progress.stop()


def parse_args():
    """Parse command-line arguments for the scheduler visualizer."""
    # NOTE: the description previously read 'Visualize a Dataset Pipeline',
    # a copy-paste from another tool; this script plots scheduler curves.
    parser = argparse.ArgumentParser(
        description='Visualize a hyper-parameter scheduler')
    parser.add_argument('config', help='config file path')

    parser.add_argument(
        '--param',
        type=str,
        default='lr',
        choices=['lr', 'momentum'],
        help='The param to visualize its change curve, choose from'
        '"lr" and "momentum". Defaults to "lr".')
    parser.add_argument(
        '--dataset-size',
        type=int,
        help='The size of the dataset. If specify, `build_dataset` will '
        'be skipped and use this size as the dataset size.')
    parser.add_argument(
        '--ngpus',
        type=int,
        default=1,
        help='The number of GPUs used in training.')
    parser.add_argument(
        '--log-level',
        default='WARNING',
        help='The log level of the handler and logger. Defaults to '
        'WARNING.')
    parser.add_argument('--title', type=str, help='title of figure')
    parser.add_argument(
        '--style', type=str, default='whitegrid', help='style of plt')
    parser.add_argument(
        '--save-path',
        type=Path,
        help='The learning rate curve plot save path')
    parser.add_argument('--not-show', default=False, action='store_true')
    parser.add_argument(
        '--window-size',
        default='12*7',
        help='Size of the window to display images, in format of "$W*$H".')
    parser.add_argument(
        '--cfg-options',
        nargs='+',
        action=DictAction,
        help='override some settings in the used config, the key-value pair '
        'in xxx=yyy format will be merged into config file. If the value to '
        'be overwritten is a list, it should be like key="[a,b]" or key=a,b '
        'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" '
        'Note that the quotation marks are necessary and that no white space '
        'is allowed.')
    args = parser.parse_args()
    if args.window_size != '':
        assert re.match(r'\d+\*\d+', args.window_size), \
            "'window-size' must be in format 'W*H'."

    return args


def plot_curve(lr_list, args, param_name, iters_per_epoch, by_epoch=True):
    """Plot the recorded parameter (lr or momentum) against iterations.

    When ``by_epoch`` is True a secondary x-axis in epochs is added below
    the iteration axis.
    """
    try:
        # seaborn is optional; fall back to default matplotlib style.
        import seaborn as sns
        sns.set_style(args.style)
    except ImportError:
        pass

    wind_w, wind_h = args.window_size.split('*')
    wind_w, wind_h = int(wind_w), int(wind_h)
    plt.figure(figsize=(wind_w, wind_h))

    ax: plt.Axes = plt.subplot()
    ax.plot(lr_list, linewidth=1)

    if by_epoch:
        ax.xaxis.tick_top()
        ax.set_xlabel('Iters')
        ax.xaxis.set_label_position('top')
        # Secondary axis converts iterations <-> epochs.
        # NOTE(review): iters_per_epoch == 0 (dataset smaller than one
        # batch) would divide by zero here — verify callers.
        sec_ax = ax.secondary_xaxis(
            'bottom',
            functions=(lambda x: x / iters_per_epoch,
                       lambda y: y * iters_per_epoch))
        sec_ax.set_xlabel('Epochs')
    else:
        plt.xlabel('Iters')
    plt.ylabel(param_name)

    if args.title is None:
        plt.title(f'{osp.basename(args.config)} {param_name} curve')
    else:
        plt.title(args.title)


def simulate_train(data_loader, cfg, by_epoch):
    """Run a dummy training loop and return (lr_list, momentum_list).

    Only the param scheduler hooks run; logging/checkpoint/timer hooks are
    disabled and the visualizer is mocked out.
    """
    model = SimpleModel()
    param_record_hook = ParamRecordHook(by_epoch=by_epoch)
    default_hooks = dict(
        param_scheduler=cfg.default_hooks['param_scheduler'],
        timer=None,
        logger=None,
        checkpoint=None,
        sampler_seed=None,
        param_record=param_record_hook)

    runner = Runner(
        model=model,
        work_dir=cfg.work_dir,
        train_dataloader=data_loader,
        train_cfg=cfg.train_cfg,
        log_level=cfg.log_level,
        optim_wrapper=cfg.optim_wrapper,
        param_scheduler=cfg.param_scheduler,
        default_scope=cfg.default_scope,
        default_hooks=default_hooks,
        visualizer=MagicMock(spec=Visualizer),
        custom_hooks=cfg.get('custom_hooks', None))

    runner.train()

    return param_record_hook.lr_list, param_record_hook.momentum_list


def main():
    args = parse_args()
    cfg = Config.fromfile(args.config)
    if args.cfg_options is not None:
        cfg.merge_from_dict(args.cfg_options)
    if cfg.get('work_dir', None) is None:
        # use config filename as default work_dir if cfg.work_dir is None
        cfg.work_dir = osp.join('./work_dirs',
                                osp.splitext(osp.basename(args.config))[0])

    cfg.log_level = args.log_level
    # register all modules in mmcls into the registries
    register_all_modules()

    # make sure save_root exists
    if args.save_path and not args.save_path.parent.exists():
        raise FileNotFoundError(
            f'The save path is {args.save_path}, and directory '
            f"'{args.save_path.parent}' does not exist.")

    # init logger
    print('Param_scheduler :')
    rich.print_json(json.dumps(cfg.param_scheduler))

    # prepare data loader
    batch_size = cfg.train_dataloader.batch_size * args.ngpus

    # Decide epoch- vs iter-based training from the train_cfg.
    if 'by_epoch' in cfg.train_cfg:
        by_epoch = cfg.train_cfg.get('by_epoch')
    elif 'type' in cfg.train_cfg:
        by_epoch = cfg.train_cfg.get('type') == 'EpochBasedTrainLoop'
    else:
        raise ValueError('please set `train_cfg`.')

    if args.dataset_size is None and by_epoch:
        # The real dataset length is needed to know iters per epoch.
        from mmcls.datasets import build_dataset
        dataset_size = len(build_dataset(cfg.train_dataloader.dataset))
    else:
        dataset_size = args.dataset_size or batch_size

    class FakeDataloader(list):
        # Runner only needs len() and a `.dataset.metainfo` attribute here.
        dataset = MagicMock(metainfo=None)

    data_loader = FakeDataloader(range(dataset_size // batch_size))
    dataset_info = (
        f'\nDataset infos:'
        f'\n - Dataset size: {dataset_size}'
        f'\n - Batch size per GPU: {cfg.train_dataloader.batch_size}'
        f'\n - Number of GPUs: {args.ngpus}'
        f'\n - Total batch size: {batch_size}')
    if by_epoch:
        dataset_info += f'\n - Iterations per epoch: {len(data_loader)}'
    rich.print(dataset_info + '\n')

    # simulation training process
    lr_list, momentum_list = simulate_train(data_loader, cfg, by_epoch)
    if args.param == 'lr':
        param_list = lr_list
    else:
        param_list = momentum_list

    param_name = 'Learning Rate' if args.param == 'lr' else 'Momentum'
    plot_curve(param_list, args, param_name, len(data_loader), by_epoch)

    if args.save_path:
        plt.savefig(args.save_path)
        print(f'\nThe {param_name} graph is saved at {args.save_path}')

    if not args.not_show:
        plt.show()


if __name__ == '__main__':
    main()